Wednesday, April 22, 2020

Series Data Structure in Python pandas:

Series Data Structure 
Note:
1. One-dimensional array of indexed (labelled) data
2. It consists of two parts
a) An array of actual data
b) An associated array of index for the data array
3. pandas must be imported before a Series object can be created
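
To illustrate the two parts listed above, here is a minimal sketch. The variable names and sample values are made up for illustration; only the `values` and `index` attributes come from pandas itself.

```python
import pandas

# Hypothetical sample data: three marks labelled by student name
marks = pandas.Series([72, 85, 91], index=["asha", "ben", "chen"])

data_part = marks.values   # the array of actual data
index_part = marks.index   # the associated array of index labels

print(data_part)
print(list(index_part))
```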


Creation of Series Data Structure
Syntax:
pandas.Series( data=None, index=None, dtype=None )
Parameters:
data : array-like, Iterable, dict, or scalar value
    Contains data stored in Series.
index : array-like or Index (1d) but not a scalar
    Values must be hashable and have the same length as `data`.
    Non-unique index values are allowed. Will default to
    RangeIndex (0, 1, 2, ..., n) if not provided. If both a dict and index
    sequence are used, the index will override the keys found in the
    dict.
dtype : str, numpy.dtype, or ExtensionDtype, optional
    Data type for the output Series. If not specified, this will be
    inferred from `data`.
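
The parameter notes above can be tried out with a short sketch. The dictionary and labels here are invented for illustration. Note that when a dict is passed together with an index, the index decides which labels appear and in what order, and a label with no matching key gets NaN.

```python
import pandas

prices = {"a": 2, "b": 5, "c": 45}   # hypothetical sample data

# No index given: labels default to 0, 1, 2
default_index = pandas.Series([10, 20, 30])

# dict plus index: the index picks the labels; "z" has no key, so it is NaN
picked = pandas.Series(prices, index=["c", "a", "z"])

# dtype forces the element type instead of inferring it from the data
as_float = pandas.Series([1, 2, 3], dtype="float64")

print(default_index)
print(picked)
print(as_float)
```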


Syntax
Example
Remarks
Empty series object
<identifier>=pandas.Series()
srObj=pandas.Series()
Creates an empty Series object whose values can be added later
Creating a series object from Sequence
<identifier>=pandas.Series(data,index)
srObj=pandas.Series(range(10))
Creates a Series from the range function; the index starts from 0
srObj=pandas.Series([12,13,14,15,16],index=[1,2,3,4,5])
Creates a Series from a list; the index starts from 1 (the index must have the same length as the data)
srObj=pandas.Series((1,2,3,4,5))
Creates a Series from a tuple
Creating a series object from ndarray
<numPyArray>=numpy.array(<sequence>)
<identifier>=pandas.Series(<numPyArray>)

import numpy,pandas
npArray=numpy.array([4,5,6,7,8])
srObj=pandas.Series(npArray)
Any one-dimensional ndarray can be passed to pandas.Series()
Creating a series object from a Scalar
<identifier>=pandas.Series(<scalar>)

srObj=pandas.Series(10,index=[1])
A scalar is a single value such as a number; it is repeated for every label given in index
Creating a series object from dictionary

srObj=pandas.Series({'a':2, 'b':5, 'c':45, 'd':32})
Dictionary values are used as the data and keys are used as the index
Adding data, index and data type to a series object
srObject=pandas.Series(data=[1,2,3,4],index=['a','b','c','d'],dtype=numpy.float64)


Creating a series object from arithmetic operation

Ls=[2,3,4]
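srObject=pandas.Series(data=numpy.array(Ls)*2)

Creates a series object by multiplying each element by 2 (the list is converted to an ndarray first, because Ls*2 on a plain list repeats the list instead)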
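
A small sketch of the difference between multiplying a list and multiplying a Series may help here. The names `repeated` and `doubled` are only illustrative.

```python
import pandas

Ls = [2, 3, 4]

# Multiplying a plain Python list repeats it: 2, 3, 4, 2, 3, 4
repeated = pandas.Series(Ls * 2)

# Multiplying the Series itself works element by element: 4, 6, 8
doubled = pandas.Series(Ls) * 2

print(repeated.tolist())
print(doubled.tolist())
```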

Series Data Structure Attributes
Note: Using the series data structure attributes we can access the various details about the Series Object

Attribute
Use
SeriesObject.index
Returns the index (labels) of the series object; for a default index this is a RangeIndex showing start, stop and step
SeriesObject.values
Return Series as ndarray or ndarray-like depending on the dtype
SeriesObject.dtype
Returns the data type of the underlying data in the series object
SeriesObject.shape
Return a tuple of the shape of the underlying data.
SeriesObject.nbytes
Returns the number of bytes in the underlying data, i.e. the complete data of the series object
SeriesObject.size
Returns the number of elements in the series object
SeriesObject.itemsize
Returns the size in bytes of each element (removed in pandas 1.0; use SeriesObject.dtype.itemsize instead)
SeriesObject.hasnans
Returns True if the series object has any NaN (Not a Number) value, else returns False
SeriesObject.empty
Returns True if the series object is empty, else returns False
Example:
srObj=pandas.Series([2,3,4,5,6])
print(srObj.size)
print(srObj.empty)
Output:
5
False
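
As a further sketch, the remaining attributes can be checked the same way. The float data with one NaN is an assumption chosen so that `hasnans` is True; the values in the comments follow from four 8-byte floats.

```python
import numpy
import pandas

srObj = pandas.Series([2.5, numpy.nan, 4.0, 5.5])

print(srObj.index)    # RangeIndex(start=0, stop=4, step=1)
print(srObj.dtype)    # float64
print(srObj.shape)    # (4,)
print(srObj.size)     # 4
print(srObj.nbytes)   # 32, i.e. 4 elements of 8 bytes each
print(srObj.hasnans)  # True
print(srObj.empty)    # False
```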

Operation on Series Data Structure

Syntax
Example
 Accessing an element in a series Object

SeriesObject[<valid index>]
dc={'a':45,  'b':56,  'c':78}
srObj=pandas.Series(dc)
print(srObj['b'])
Output:
56

 Accessing a Slice of elements from a series Object

SeriesObject[start_index: stop_index: step_value]
dc={'a':45,  'b':56,  'c':78,  'd':57,  'e':62}
srObj=pandas.Series(dc)
print(srObj['a':'d'])
Output:
a    45
b    56
c    78
d    57
dtype: int64
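
Note that a slice by label includes the stop label, as the output above shows. The sketch below contrasts this with slicing by position through `iloc`, which excludes the stop position like ordinary Python slicing. The variable names are illustrative.

```python
import pandas

dc = {'a': 45, 'b': 56, 'c': 78, 'd': 57, 'e': 62}
srObj = pandas.Series(dc)

by_label = srObj['a':'d']       # labels a to d, stop label included
by_position = srObj.iloc[0:3]   # positions 0, 1, 2; position 3 excluded
every_other = srObj.iloc[::2]   # step of 2: a, c, e

print(by_label.tolist())
print(by_position.tolist())
print(list(every_other.index))
```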

Modification of elements in a series object

seriesObject[index]=new_data_value

seriesObject[start: stop]=sequence_of_new_values
dc={'a':45,  'b':56,  'c':78}
srObj=pandas.Series(dc)
srObj['b']=32
print(srObj)
a    45
b    32
c    78
dtype: int64
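
The slice form of assignment can be sketched as follows. Using `iloc` and `loc` here is a choice made for this example, to make it explicit whether positions or labels are meant.

```python
import pandas

dc = {'a': 45, 'b': 56, 'c': 78}
srObj = pandas.Series(dc)

# Replace the first two elements (positions 0 and 1) with a sequence
srObj.iloc[0:2] = [10, 20]

# Assign one value to a label slice; the stop label 'c' is included
srObj.loc['b':'c'] = 0

print(srObj.tolist())
```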

Vectorized operation on series object (the operation applies to each element of the object individually)
<seriesObject> operator <scalar>
srObj +2  # adds 2 to each element of the series object
srObj * 5  # multiplies each element by 5
srObj > 3  # compares each element to 3 and returns True/False for each
srObj = srObj+3  # adds 3 to each element and then assigns the result back to srObj
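
Here is a runnable sketch of the operations above, with made-up sample data. The last line goes one step further and uses the True/False result to select elements, which is a common use of a comparison.

```python
import pandas

srObj = pandas.Series([2, 3, 4, 5, 6])

plus_two = srObj + 2      # 4, 5, 6, 7, 8
times_five = srObj * 5    # 10, 15, 20, 25, 30
mask = srObj > 3          # False, False, True, True, True
filtered = srObj[mask]    # keeps only the elements where mask is True

print(plus_two.tolist())
print(times_five.tolist())
print(mask.tolist())
print(filtered.tolist())
```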

Arithmetic operation between series objects (performs the operation on matching index labels)
<seriesObject1> operator <seriesObject2>
Note:
1. The index of the resulting object is the union of the indexes of the two series objects
2. Where an index label is present in only one of the objects, the result for that label is NaN

srObj1 + srObj2
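
A minimal sketch with two partly overlapping indexes shows both notes in action. The data is invented, and the `add` method with `fill_value` is included as one possible way to avoid NaN, not something the text above requires.

```python
import pandas

srObj1 = pandas.Series([1, 2, 3], index=['a', 'b', 'c'])
srObj2 = pandas.Series([10, 20, 30], index=['b', 'c', 'd'])

# Labels b and c match; a and d exist in only one object, so they become NaN
total = srObj1 + srObj2

# Treat a missing label as 0 instead of producing NaN
filled = srObj1.add(srObj2, fill_value=0)

print(total)
print(filled)
```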


Reindexing a series object
Identifier=seriesObject.reindex(<sequence>)
srObj=pandas.Series([10,12,13,14],index=['a','b','c','d'])
Obj1=srObj.reindex(['d','c','b','a'])  # rearranges by existing labels; a label not present in srObj gets NaN
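
One point worth checking with a sketch: `reindex` does not rename labels. It looks up each requested label in the existing index and fills NaN where the label is missing. To give a series new labels, assigning to its `index` attribute is one option. The names below are illustrative.

```python
import pandas

srObj = pandas.Series([10, 12, 13, 14], index=['a', 'b', 'c', 'd'])

# Reorder by existing labels; 'e' is not in srObj, so its value is NaN
reordered = srObj.reindex(['d', 'b', 'a', 'e'])

# Renaming labels is a different operation: assign a new index
relabelled = pandas.Series([10, 12, 13, 14])
relabelled.index = ['a', 'b', 'c', 'd']

print(reordered)
print(relabelled)
```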
Removing Element from a series Object
seriesObject.drop('valid_index')
dc={'a':45,  'b':56,  'c':78}
srObj=pandas.Series(dc)
srObj=srObj.drop('b')  # drop returns a new series object, so assign the result back
print(srObj)

Output:
a    45
c    78
dtype: int64
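
The sketch below shows that `drop` leaves the original series object unchanged and returns a new one, and that a list of labels can be dropped in one call. The variable names are illustrative.

```python
import pandas

dc = {'a': 45, 'b': 56, 'c': 78}
srObj = pandas.Series(dc)

without_b = srObj.drop('b')           # new series object without 'b'
without_ac = srObj.drop(['a', 'c'])   # several labels at once

print(list(srObj.index))        # the original still has a, b, c
print(list(without_b.index))
print(list(without_ac.index))
```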




References:
1. Informatics practices by Sumita Arora
2. Python Documentation in Jupyter Notebook


