Monday, August 31, 2020

Introduction to matplotlib

matplotlib is a 2D plotting library for Python. It comes with a wide variety of plots, which help in understanding trends and patterns in data. Data visualization with matplotlib is usually done through its pyplot interface.

data visualization: the process of constructing graphical objects such as pie charts, bar graphs and histograms from data.

pyplot: a collection of functions that allows the user to construct 2D plots (graphs) from data.
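As a minimal sketch of what using pyplot looks like, the snippet below draws a simple line plot. The data values, the labels and the output file name line_plot.png are made up for illustration, and the non-interactive Agg backend is selected only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical sample data
years = [2016, 2017, 2018, 2019, 2020]
sales = [10, 14, 13, 18, 21]

plt.plot(years, sales, marker="o")  # line plot with a marker at each point
plt.xlabel("Year")
plt.ylabel("Sales")
plt.title("Sales per year")
plt.savefig("line_plot.png")        # write the figure to an image file
```

In an interactive session, plt.show() can be used instead of plt.savefig() to open the plot in a window.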

Installation:

Installation using pip: The steps are listed below. Note: pip has to be installed on the system in order for this to work.

  • open cmd or a terminal on the computer
  • type the following and press Enter:
    • pip install matplotlib
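
One way to check that the installation worked is to import the package in Python and print its version. This is only a quick sanity check; the version printed depends on what pip installed.

```python
import matplotlib

# If the import succeeds, matplotlib is installed for this Python interpreter
print(matplotlib.__version__)
```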

References:

  • Informatics Practices by Sumita Arora

Importing/Exporting Data between CSV files and Data Frames

Introduction:

The core idea of using pandas is to analyse data in order to get information that helps in decision making. To do that we need a large amount of data, and CSV files can be a source of that data. CSV stands for Comma Separated Values; these files can be generated by any spreadsheet application, such as Microsoft Excel. pandas has functions to read a CSV file into a DataFrame and to write a DataFrame to a CSV file. Example of a typical CSV file (data.csv):

[Image: csv file.JPG]
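
The image of the file is not reproduced here, so the sketch below writes an equivalent data.csv. Its rows are reconstructed from the read_csv() output printed later in this post, and the file is written to the current working directory rather than to the G:\ path used in the examples (an assumption made for illustration).

```python
# Recreate the sample file in the current working directory.
# The rows are reconstructed from the DataFrame printed in this post.
csv_text = """sl,name,age
1,ganesh,23
2,ramesh,24
3,suresh,21
4,dinesh,17
5,sayam,18
"""

with open("data.csv", "w", newline="") as f:
    f.write(csv_text)
```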

Importing data into a DataFrame from CSV:

pandas provides the function read_csv(), which reads the data in a CSV file into a DataFrame.

In [3]:
import pandas as pd
# raw string (r"...") so the backslashes in the Windows path are not treated as escapes
df=pd.read_csv(r"G:\pythonVirtualEnv\practicePro\data.csv")
print(df)
   sl    name  age
0   1  ganesh   23
1   2  ramesh   24
2   3  suresh   21
3   4  dinesh   17
4   5   sayam   18

Now let us look at a few important parameters of the read_csv() function:

read_csv(filepath_or_buffer, sep=',', header='infer', names=None, skiprows=None, nrows=None, dtype=None)

Parameters:

  • filepath_or_buffer: the location of the CSV file; it can be a local path on the system or a remote location reached over http, ftp etc.
  • sep: the character that separates the values in a row; the default is ","
  • header: row number(s) to use as the column names, and the start of the data
  • names: a list of column names to use for the DataFrame
  • skiprows: the number of rows to skip from the top of the CSV file
  • nrows: the number of rows of the file to read; useful for reading pieces of large files
  • dtype: the data type to use for the data or for individual columns
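
To see several of these parameters working together, here is a minimal sketch. The semicolon-separated text and its extra first line are hypothetical, and io.StringIO is used in place of a file path only so that the example runs without any file on disk.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data with an extra note on the first line
raw = "exported on monday\nsl;name;age\n1;ganesh;23\n2;ramesh;24\n3;suresh;21\n"

df = pd.read_csv(
    io.StringIO(raw),      # a file path or a file-like object is accepted
    sep=";",               # values are separated by ';' instead of ','
    skiprows=1,            # skip the first line of the file
    header=0,              # the first remaining line holds the column names
    nrows=2,               # read only the first two data rows
    dtype={"age": float},  # store the age column as floats
)
print(df)
print(df.dtypes)
```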

Use of names argument

In [5]:
import pandas as pd
# raw string (r"...") so the backslashes in the Windows path are not treated as escapes
df=pd.read_csv(r"G:\pythonVirtualEnv\practicePro\data.csv", names=["ROll","stuNm","stuAge"])
print(df)
  ROll   stuNm stuAge
0   sl    name    age
1    1  ganesh     23
2    2  ramesh     24
3    3  suresh     21
4    4  dinesh     17
5    5   sayam     18

Use of skiprows argument

In [8]:
import pandas as pd
# raw string (r"...") so the backslashes in the Windows path are not treated as escapes
df=pd.read_csv(r"G:\pythonVirtualEnv\practicePro\data.csv", names=["ROll","stuNm","stuAge"], skiprows=1)
print(df)
   ROll   stuNm  stuAge
0     1  ganesh      23
1     2  ramesh      24
2     3  suresh      21
3     4  dinesh      17
4     5   sayam      18
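
In the names example, the file's own header line was read as data (row 0), which also turns every column into strings. skiprows=1 avoids this by dropping that line. Another option is header=0, which tells read_csv() that the first line is a header to be replaced by names. A minimal sketch comparing the two, using an in-memory copy of the sample data (an assumption, so that it runs without the file on disk):

```python
import io
import pandas as pd

# In-memory copy of the sample data used in this post
raw = "sl,name,age\n1,ganesh,23\n2,ramesh,24\n3,suresh,21\n4,dinesh,17\n5,sayam,18\n"
cols = ["ROll", "stuNm", "stuAge"]

a = pd.read_csv(io.StringIO(raw), names=cols, skiprows=1)  # drop the header line
b = pd.read_csv(io.StringIO(raw), names=cols, header=0)    # replace the header line

print(a.equals(b))  # compare the two results
print(b.dtypes)     # the numeric columns stay numeric
```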

Use of nrows argument

In [9]:
import pandas as pd
# raw string (r"...") so the backslashes in the Windows path are not treated as escapes
df=pd.read_csv(r"G:\pythonVirtualEnv\practicePro\data.csv", names=["ROll","stuNm","stuAge"], nrows=3)
print(df)
  ROll   stuNm stuAge
0   sl    name    age
1    1  ganesh     23
2    2  ramesh     24
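
In the output above, the file's header line counts as one of the three rows, because passing names makes read_csv() treat it as data. When the header is left to be inferred, nrows counts only the data rows. A small sketch of that case, again with an in-memory copy of the sample data (an assumption for illustration):

```python
import io
import pandas as pd

# In-memory copy of the sample data used in this post
raw = "sl,name,age\n1,ganesh,23\n2,ramesh,24\n3,suresh,21\n4,dinesh,17\n5,sayam,18\n"

# The first line is used as the header, then three data rows are read
df = pd.read_csv(io.StringIO(raw), nrows=3)
print(df)
```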

Storing data from a DataFrame into CSV

After performing operations on a DataFrame, we may need to save its data to permanent storage such as a CSV file. pandas offers the to_csv() function to write the data in a DataFrame to a CSV file.

In [3]:
import pandas as pd
import numpy as np
df=pd.DataFrame((np.arange(0,6)).reshape(3,2), columns=['c1','c2'], index=['r1','r2','r3'])
print(df)
df.to_csv("data2.csv")
    c1  c2
r1   0   1
r2   2   3
r3   4   5

By default, when we export data to a CSV file, the column headers and the row index of the DataFrame are stored in the file. If we do not want to include them, the header and index arguments of to_csv() can be set to False.

In [4]:
import pandas as pd
import numpy as np
df=pd.DataFrame((np.arange(0,6)).reshape(3,2), columns=['c1','c2'], index=['r1','r2','r3'])
print(df)
df.to_csv("data3.csv", header=False, index=False)
    c1  c2
r1   0   1
r2   2   3
r3   4   5
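
The printed DataFrame looks the same in both cases; the difference is in the files. To look at the CSV text directly, to_csv() can be called without a path, in which case it returns the text as a string. A minimal sketch with the same DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["c1", "c2"], index=["r1", "r2", "r3"])

# With no path argument, to_csv() returns the CSV text instead of writing a file
with_labels = df.to_csv()
without_labels = df.to_csv(header=False, index=False)

print(with_labels)     # includes the header line and the r1..r3 index column
print(without_labels)  # only the values
```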

Handling missing values:

If the DataFrame has NaN values, i.e. missing data, they are written as empty fields when exported to a CSV file. to_csv() has an argument called na_rep, through which we can provide a custom representation for missing data (i.e. NaN). Example:

In [8]:
import pandas as pd
import numpy as np
df=pd.DataFrame((np.arange(0,6)).reshape(3,2), columns=['c1','c2'], index=['r1','r2','r3'])
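df['c1']=df['c1'].astype(float)  # NaN is a float, so make the column float first
df.loc['r2','c1']=np.nan  # set row r2, column c1 in a single .loc step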
df
Out[8]:
     c1  c2
r1  0.0   1
r2  NaN   3
r3  4.0   5

The above DataFrame has a missing value at column c1 and row r2. If it is exported as is, the value becomes an empty field in the CSV file, which can be avoided by using the na_rep argument.

In [9]:
df.to_csv("data4.csv", na_rep="NULL")
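
To compare the two representations without opening the files, the sketch below builds the same DataFrame directly (an assumption, so that it runs on its own) and uses the string returned by to_csv() when no path is given. It also reads the text back with read_csv(), which treats "NULL" as a missing value by default.

```python
import io
import numpy as np
import pandas as pd

# Same values as the DataFrame above, built directly for a self-contained example
df = pd.DataFrame({"c1": [0.0, np.nan, 4.0], "c2": [1, 3, 5]}, index=["r1", "r2", "r3"])

default_text = df.to_csv()              # NaN is written as an empty field
custom_text = df.to_csv(na_rep="NULL")  # NaN is written as the text NULL

print(default_text)
print(custom_text)

# "NULL" is one of the strings read_csv() recognises as missing by default
back = pd.read_csv(io.StringIO(custom_text), index_col=0)
print(back)
```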

References:

  • Informatics Practices with Python by Sumita Arora
  • pandas documentation