Introduction:

The core idea of using pandas is to analyse data and extract information that helps in decision making. To do that we need a large amount of data, and CSV files are a common source. CSV stands for Comma Separated Values; these files can be generated by any spreadsheet application, such as Microsoft Excel. pandas has functions to read a CSV file into a DataFrame and to write a DataFrame back to a CSV file. Example of a typical CSV file (data.csv):

[Image: csv file.JPG (the example data.csv)]

Importing data into a DataFrame from CSV:

pandas provides the function read_csv(), which reads the data in a CSV file into a DataFrame.

import pandas as pd
df = pd.read_csv(r"G:\pythonVirtualEnv\practicePro\data.csv")  # raw string, so the backslashes are not treated as escapes
print(df)

   sl    name  age
0   1  ganesh   23
1   2  ramesh   24
2   3  suresh   21
3   4  dinesh   17
4   5   sayam   18

Now let us look at a few important parameters of the read_csv function:

read_csv(filepath_or_buffer, sep=',', header='infer', names=None, skiprows=None, nrows=None, dtype=None)

Parameters:

filepath_or_buffer: the location of the CSV file; it can be a local path or a remote URL (http, ftp, etc.)
sep: the character that separates the values in a row; the default is ","
header: row number(s) to use as the column names, and the start of the data
names: a list of column names to use for the DataFrame
skiprows: the number of rows to skip from the top of the CSV file
nrows: the number of rows of the file to read; useful for reading pieces of large files
dtype: the data type to use for the data or for individual columns
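The parameters listed above can be tried out without a file on disk. The following is a minimal sketch, assuming a hypothetical semicolon-separated version of the sample data held in a string (the variable raw and its contents are made up for illustration); it shows sep, header and dtype working together.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data standing in for a file on disk.
raw = "sl;name;age\n1;ganesh;23\n2;ramesh;24\n3;suresh;21\n"

# sep tells read_csv which character separates the values,
# header=0 uses the first line as the column names,
# dtype forces the age column to float instead of the inferred integer type.
df = pd.read_csv(io.StringIO(raw), sep=";", header=0, dtype={"age": "float64"})

print(df)
print(df.dtypes)
```

Passing a dict to dtype changes only the named columns; the others are still inferred.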

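Use of names argument

When names is passed and header is left at its default, the first line of the file is treated as data, so the file's own header row appears as row 0 in the output below.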

import pandas as pd
df = pd.read_csv(r"G:\pythonVirtualEnv\practicePro\data.csv", names=["ROll", "stuNm", "stuAge"])
print(df)

  ROll   stuNm stuAge
0   sl    name    age
1    1  ganesh     23
2    2  ramesh     24
3    3  suresh     21
4    4  dinesh     17
5    5   sayam     18
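One side effect of the output above is worth checking: because the original header row is read as data, the age column is no longer numeric. The sketch below assumes an in-memory copy of the first lines of data.csv (the string raw is a stand-in, not the real file) and inspects the first row and the column type.

```python
import io

import pandas as pd

# Stand-in for the first lines of data.csv, including its own header row.
raw = "sl,name,age\n1,ganesh,23\n2,ramesh,24\n"

# names supplies the column labels, so the file's header line becomes row 0.
df = pd.read_csv(io.StringIO(raw), names=["ROll", "stuNm", "stuAge"])

first_row = df.iloc[0].tolist()
age_is_numeric = pd.api.types.is_numeric_dtype(df["stuAge"])

print(first_row)       # the old header, now ordinary data
print(age_is_numeric)  # the text "age" prevents the column from being numeric
```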

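Use of skiprows argument

Skipping the first row drops the file's own header, so only the data rows remain under the new column names.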

import pandas as pd
df = pd.read_csv(r"G:\pythonVirtualEnv\practicePro\data.csv", names=["ROll", "stuNm", "stuAge"], skiprows=1)
print(df)

   ROll   stuNm  stuAge
0     1  ganesh      23
1     2  ramesh      24
2     3  suresh      21
3     4  dinesh      17
4     5   sayam      18
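The same result can also be obtained by passing header=0 together with names, which tells read_csv to replace the existing header instead of skipping it. The sketch below is an illustration on an in-memory stand-in for data.csv (the string raw is an assumption, not the real file) and compares the two approaches.

```python
import io

import pandas as pd

# Stand-in for data.csv, including its own header row.
raw = "sl,name,age\n1,ganesh,23\n2,ramesh,24\n3,suresh,21\n"
cols = ["ROll", "stuNm", "stuAge"]

# Approach 1: skip the header line, then label the columns with names.
with_skip = pd.read_csv(io.StringIO(raw), names=cols, skiprows=1)

# Approach 2: declare line 0 as the header and override it with names.
with_header = pd.read_csv(io.StringIO(raw), names=cols, header=0)

same = with_skip.equals(with_header)
print(with_skip)
print(same)
```

Either form works here; header=0 states the intent (replace the header) more directly, while skiprows is the more general tool when several leading lines must be dropped.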

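Use of nrows argument

nrows counts the rows that are read as data; here the file's header row is one of them, because names is passed without skiprows.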

import pandas as pd
df = pd.read_csv(r"G:\pythonVirtualEnv\practicePro\data.csv", names=["ROll", "stuNm", "stuAge"], nrows=3)
print(df)

  ROll   stuNm stuAge
0   sl    name    age
1    1  ganesh     23
2    2  ramesh     24
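skiprows and nrows can be combined to read a slice from the middle of a file. The following is a small sketch, again assuming an in-memory stand-in for data.csv rather than the real file; it skips the header and the first two records, then reads the next two.

```python
import io

import pandas as pd

# Stand-in for data.csv: one header line followed by five records.
raw = (
    "sl,name,age\n"
    "1,ganesh,23\n"
    "2,ramesh,24\n"
    "3,suresh,21\n"
    "4,dinesh,17\n"
    "5,sayam,18\n"
)

# skiprows=3 drops the header line and the first two records;
# nrows=2 then reads only the next two records.
df = pd.read_csv(
    io.StringIO(raw),
    names=["ROll", "stuNm", "stuAge"],
    skiprows=3,
    nrows=2,
)
print(df)
```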

Storing data from a DataFrame into CSV

After performing operations on a DataFrame, we may need to save its data to permanent storage such as a CSV file. pandas offers the to_csv function to write the data in a DataFrame to a CSV file.

import pandas as pd
import numpy as np
df=pd.DataFrame((np.arange(0,6)).reshape(3,2), columns=['c1','c2'], index=['r1','r2','r3'])
print(df)
df.to_csv("data2.csv")

    c1  c2
r1   0   1
r2   2   3
r3   4   5
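To see what to_csv actually writes, and how to read it back, the sketch below calls to_csv without a path, in which case pandas returns the CSV text as a string instead of writing a file. Reading it back with index_col=0 is one way (an assumption about what is wanted here) to restore the row labels as the index.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 6).reshape(3, 2),
    columns=["c1", "c2"],
    index=["r1", "r2", "r3"],
)

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv()
print(csv_text)

# index_col=0 turns the first column (the saved row labels) back into the index.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored)
```

Without index_col=0 the saved row labels would come back as an ordinary, unnamed first column.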

By default, when we export data to a CSV file, both the column headers and the row index of the DataFrame are stored in the file. If we do not want to include them, the header and index arguments of to_csv can be set to False.

import pandas as pd
import numpy as np
df=pd.DataFrame((np.arange(0,6)).reshape(3,2), columns=['c1','c2'], index=['r1','r2','r3'])
print(df)
df.to_csv("data3.csv", header=False, index=False)

    c1  c2
r1   0   1
r2   2   3
r3   4   5
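The effect of the header and index arguments is easiest to see by comparing the text that gets written. This is a minimal sketch that uses to_csv without a path, so that the CSV text is returned as a string rather than saved to data3.csv.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(0, 6).reshape(3, 2),
    columns=["c1", "c2"],
    index=["r1", "r2", "r3"],
)

# Drop only the row index; the column headers are kept.
no_index = df.to_csv(index=False)

# Drop both the column headers and the row index; only the values remain.
bare = df.to_csv(header=False, index=False)

print(no_index)
print(bare)
```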

Handling missing values:

If the DataFrame contains NaN, i.e. missing data, those values are written as empty fields when exported to a CSV file. to_csv has an argument called na_rep, with which we can provide a custom representation for missing data (i.e. NaN). Example:

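import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(0, 6).reshape(3, 2), columns=['c1', 'c2'], index=['r1', 'r2', 'r3'])
# NaN is a float, so convert c1 to float before assigning it
df['c1'] = df['c1'].astype(float)
# use .loc instead of chained indexing, and np.nan (np.NaN was removed in NumPy 2.0)
df.loc['r2', 'c1'] = np.nan
print(df)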

The above DataFrame has a missing value in column c1, row r2. If it is exported as is, the CSV file will contain an empty field there, which can be avoided by using the na_rep argument:

df.to_csv("data4.csv", na_rep="NULL")
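The difference na_rep makes can be checked by comparing the exported text with and without it. The sketch below builds an equivalent DataFrame directly from a dict (a shortcut taken for this example) and also reads the file back; passing na_values=["NULL"] to read_csv is shown explicitly so the marker is turned into NaN again.

```python
import io

import numpy as np
import pandas as pd

# Same shape as the example above, with the missing value at (r2, c1).
df = pd.DataFrame(
    {"c1": [0.0, np.nan, 4.0], "c2": [1, 3, 5]},
    index=["r1", "r2", "r3"],
)

# Default export: the missing value becomes an empty field.
default_text = df.to_csv()

# With na_rep, the missing value is written as the given marker.
null_text = df.to_csv(na_rep="NULL")

# Reading back: na_values names the marker that should become NaN again.
restored = pd.read_csv(io.StringIO(null_text), index_col=0, na_values=["NULL"])

print(default_text)
print(null_text)
print(restored)
```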


Computers At School

Monday, August 31, 2020

Importing/Exporting Data between CSV files and Data Frames

Contents:

Introduction:

Importing data into a DataFrame from CSV:

Storing data from a DataFrame into CSV

Handling missing values:

Output of the DataFrame with the missing value (Handling missing values):

     c1  c2
r1  0.0   1
r2  NaN   3
r3  4.0   5