Introduction:
The core idea of using pandas is to analyse data to extract information that helps in decision making. To do that we need a large amount of data, and CSV files are a common source. CSV stands for Comma Separated Values; these files can be generated by any spreadsheet application, such as Microsoft Excel. pandas has functions both to read a CSV file into a DataFrame and to write a DataFrame to a CSV file. Example of reading a typical CSV file (data.csv):
import pandas as pd
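# raw string (r"...") so that the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r"G:\pythonVirtualEnv\practicePro\data.csv")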
print(df)
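To try read_csv without a file on disk, the following minimal sketch reads the same kind of data from an in-memory string. The column names and student records are made up for illustration; the actual contents of data.csv are not reproduced here.

```python
import io
import pandas as pd

# Hypothetical contents standing in for data.csv (made-up records).
csv_text = """roll,name,age
1,Asha,15
2,Ravi,16
3,Meena,15
"""

# read_csv accepts a file-like object as well as a path.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.shape)  # (number of rows, number of columns)
```

The first line of the text is used as the column names because header is left at its default value.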
Now let us look at a few important parameters of the read_csv function:
read_csv(filepath_or_buffer, sep=',', header='infer', names=None, skiprows=None, nrows=None, dtype=None)
Parameters:
- filepath_or_buffer: the location of the CSV file, either a local path or a remote URL (http, ftp, etc.)
- sep: the character that separates the values in a row; the default is ","
- header: row number(s) to use as the column names and as the start of the data; with the default 'infer', the first row is used unless names is passed
- names: a list of column names to use for the DataFrame
- skiprows: the number of rows to skip from the top of the CSV file
- nrows: the number of rows of the file to read; useful for reading pieces of large files
- dtype: the data type for the whole data or for individual columns (passed as a dict)
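The sep, header and dtype parameters can be sketched as follows. The semicolon-separated text and the column names are assumptions made up for this example, not part of data.csv.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data with no header row (made-up records).
csv_text = "1;Asha;15\n2;Ravi;16\n3;Meena;15\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                        # values are separated by ';' instead of ','
    header=None,                    # the data has no header row
    names=["roll", "name", "age"],  # so column names are supplied explicitly
    dtype={"roll": str},            # keep roll numbers as text, not integers
)
print(df)
print(df.dtypes)
```

Passing dtype as a dict changes only the listed columns; the age column is still inferred as an integer column.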
Use of names argument
import pandas as pd
df = pd.read_csv(r"G:\pythonVirtualEnv\practicePro\data.csv", names=["ROll", "stuNm", "stuAge"])
print(df)
Use of skiprows argument
import pandas as pd
df = pd.read_csv(r"G:\pythonVirtualEnv\practicePro\data.csv", names=["ROll", "stuNm", "stuAge"], skiprows=1)
print(df)
Use of nrows argument
import pandas as pd
df = pd.read_csv(r"G:\pythonVirtualEnv\practicePro\data.csv", names=["ROll", "stuNm", "stuAge"], nrows=3)
print(df)
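How names, skiprows and nrows interact is easier to see on a small in-memory file. This is a sketch with made-up records: it assumes a file whose first line is a header, followed by four data rows. When names is passed, that header line is read as ordinary data unless it is skipped, and nrows counts it as one of the rows read.

```python
import io
import pandas as pd

# Hypothetical file contents: one header line and four made-up data rows.
csv_text = """roll,name,age
1,Asha,15
2,Ravi,16
3,Meena,15
4,Kiran,17
"""
cols = ["ROll", "stuNm", "stuAge"]

# names only: the original header line is read as an ordinary data row
with_names = pd.read_csv(io.StringIO(csv_text), names=cols)

# names + skiprows=1: the original header line is skipped
skipped = pd.read_csv(io.StringIO(csv_text), names=cols, skiprows=1)

# names + nrows=3: three rows are read, and the header line is one of them
three_incl_header = pd.read_csv(io.StringIO(csv_text), names=cols, nrows=3)

# names + skiprows=1 + nrows=3: the first three data rows are read
first_three = pd.read_csv(io.StringIO(csv_text), names=cols, skiprows=1, nrows=3)

print(len(with_names), len(skipped), len(three_incl_header), len(first_three))
# prints: 5 4 3 3
```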
Use of to_csv function
import pandas as pd
import numpy as np
df=pd.DataFrame((np.arange(0,6)).reshape(3,2), columns=['c1','c2'], index=['r1','r2','r3'])
print(df)
df.to_csv("data2.csv")
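To see what to_csv produces without opening the file, the sketch below calls it without a path, in which case it returns the CSV text as a string. It then reads that text back; index_col=0 is used here on the assumption that the first column should become the row index again.

```python
import io
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 6).reshape(3, 2),
                  columns=["c1", "c2"], index=["r1", "r2", "r3"])

# With no path, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv()
print(csv_text)

# index_col=0 tells read_csv to use the first column as the row index.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored)
```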
By default, when we export data to a CSV file, both the column headers and the row index of the DataFrame are written to it. If we do not want to include them, the header and index arguments of the to_csv function can be set to False.
import pandas as pd
import numpy as np
df=pd.DataFrame((np.arange(0,6)).reshape(3,2), columns=['c1','c2'], index=['r1','r2','r3'])
print(df)
df.to_csv("data3.csv", header=False, index=False)
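The effect of the two arguments can be compared side by side. This sketch uses the string returned by to_csv when no path is given, so nothing is written to disk.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(0, 6).reshape(3, 2),
                  columns=["c1", "c2"], index=["r1", "r2", "r3"])

full = df.to_csv()                                   # header row and index column
no_index = df.to_csv(index=False)                    # header row only
values_only = df.to_csv(header=False, index=False)   # data values only

for label, text in [("default", full),
                    ("index=False", no_index),
                    ("header=False, index=False", values_only)]:
    print(label)
    print(text)
```

Note that a file written with header=False has no column names, so reading it back needs header=None or an explicit names list.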
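import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(0, 6).reshape(3, 2), columns=['c1', 'c2'], index=['r1', 'r2', 'r3'])
# cast c1 to float so that it can hold NaN, then set one cell with .loc (avoids chained assignment)
df['c1'] = df['c1'].astype(float)
df.loc['r2', 'c1'] = np.nan
print(df)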
The above DataFrame has a missing value in column c1, row r2. If it is exported as is, the missing value becomes an empty field in the CSV file. The na_rep argument can be used to write a placeholder string instead.
df.to_csv("data4.csv", na_rep="NULL")
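The difference na_rep makes can be checked on the returned CSV text. The sketch below builds an equivalent DataFrame directly from a dict (an assumption made for brevity) and also reads the result back.

```python
import io
import numpy as np
import pandas as pd

df = pd.DataFrame({"c1": [0.0, np.nan, 4.0], "c2": [1, 3, 5]},
                  index=["r1", "r2", "r3"])

blank = df.to_csv()                 # missing value written as an empty field
marked = df.to_csv(na_rep="NULL")   # missing value written as NULL
print(blank)
print(marked)

# "NULL" is one of the strings read_csv treats as missing by default,
# so the value comes back as NaN.
restored = pd.read_csv(io.StringIO(marked), index_col=0)
print(restored["c1"].isna().tolist())
```

A placeholder that read_csv does not recognise by default would need to be listed in its na_values argument to be read back as missing.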