Thursday, September 10, 2020

Data Visualization

Purpose of plotting:

Plotting makes it easier to make sense of data that would otherwise be just rows and columns of numbers. A few of the advantages are listed below:

  • Makes the decision-making process easier
  • Makes it easier to compare datasets
  • Makes it easier to present data to a non-specialist audience
  • Makes it easier to present large amounts of data compactly in documents

Line plot:

  1. Drawing a basic Line plot
In [3]:
import matplotlib.pyplot as plt 
a=[1,2,3,4,5]
b=[1,4,9,16,25]
plt.plot(a,b)
Out[3]:
[<matplotlib.lines.Line2D at 0x4422220>]
  2. Drawing multiple Line plots:
In [8]:
import matplotlib.pyplot as plt 
a=[1,2,3,4,5]
b=[1,4,9,16,25]
c=[1, 8, 27, 64, 125]
plt.plot(a,b)
plt.plot(a,c)
Out[8]:
[<matplotlib.lines.Line2D at 0xb2e0fd0>]
  3. Customizing a Line plot

ADDING LABELS

In [1]:
import matplotlib.pyplot as plt 
a=[1,2,3,4,5]
b=[1,4,9,16,25]
plt.xlabel("a values")
plt.ylabel("square of a values")
plt.plot(a,b)
Out[1]:
[<matplotlib.lines.Line2D at 0x44cb8e0>]
In [12]:
# ADDING TITLE
In [13]:
import matplotlib.pyplot as plt 
a=[1,2,3,4,5]
b=[1,4,9,16,25]
plt.title("a vs a square graph")
plt.plot(a,b)
Out[13]:
[<matplotlib.lines.Line2D at 0xb357928>]
In [14]:
# ADDING LEGENDS
In [15]:
import matplotlib.pyplot as plt 
a=[1,2,3,4,5]
b=[1,4,9,16,25]
c=[1, 8, 27, 64, 125]
plt.plot(a,b, label="a square")
plt.plot(a,c, label="a cube")
plt.legend()
Out[15]:
<matplotlib.legend.Legend at 0xc366580>

Practice Program:

Create a Line plot from the following data

State     Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Tripura   140  130  130  190  160  200  150  170  190  170  150  120
Mizoram   160  200  130  200  170  110  160  130  140  170  200  170
Manipur   110  160  130  110  120  170  130  200  150  160  170  130

NOTE: The plot should have three lines, one per state. Also add a legend, axis labels and a title.
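One possible solution, as a minimal sketch: each state becomes one line drawn against the month names, and the `label` of each line feeds the legend. The exercise does not say what the numbers measure, so the y-axis label "Value" and the title text are placeholders you should adapt.

```python
import matplotlib.pyplot as plt

# Monthly figures from the practice table (the unit is not given in the exercise)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
data = {
    "Tripura": [140, 130, 130, 190, 160, 200, 150, 170, 190, 170, 150, 120],
    "Mizoram": [160, 200, 130, 200, 170, 110, 160, 130, 140, 170, 200, 170],
    "Manipur": [110, 160, 130, 110, 120, 170, 130, 200, 150, 160, 170, 130],
}

# One line per state; the label is the text shown in the legend
for state, values in data.items():
    plt.plot(months, values, label=state)

plt.xlabel("Month")
plt.ylabel("Value")                    # placeholder: unit not stated in the exercise
plt.title("Monthly values by state")   # placeholder title
plt.legend()
```

Passing the month names directly as x values makes matplotlib treat them as categories, so no separate tick-label step is needed.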

Bar Graph:

  1. Drawing a basic Bar Graph
In [17]:
import matplotlib.pyplot as plt 
a=[1,2,3,4,5]
b=[1,4,9,16,25]
plt.bar(a,b)
Out[17]:
<BarContainer object of 5 artists>
  2. Drawing multiple Bar graphs
In [3]:
import matplotlib.pyplot as plt 
import numpy as np
a=[1,2,3,4,5]
b=[1,4,9,16,25]
c=[1,8,27,64,125]
x=np.arange(len(a))
plt.bar(x,a, width=.2)
plt.bar(x+.2,b, width=.2)
plt.bar(x+.4,c, width=.2)
Out[3]:
<BarContainer object of 5 artists>

3. Customising Bar Graph:

Adding Labels:

In [5]:
import matplotlib.pyplot as plt 
a=[1,2,3,4,5]
b=[1,4,9,16,25]
plt.xlabel("a values")
plt.ylabel("a square values")
plt.bar(a,b)
Out[5]:
<BarContainer object of 5 artists>

Adding Title:

In [1]:
import matplotlib.pyplot as plt 
a=[1,2,3,4,5]
b=[1,4,9,16,25]
plt.xlabel("a values")
plt.ylabel("a square values")
plt.title("a vs a square")
plt.bar(a,b)
Out[1]:
<BarContainer object of 5 artists>

Adding Legends:

In [1]:
import matplotlib.pyplot as plt 
import numpy as np
a=[1,2,3,4,5]
b=[1,4,9,16,25]
c=[1,8,27,64,125]
x=np.arange(len(a))
print(x)
plt.bar(x,a, width=.2, label="x vs a")
plt.bar(x+.2,b, width=.2,label="x vs b")
plt.bar(x+.4,c, width=.2,label="x vs c")
plt.legend()
[0 1 2 3 4]
Out[1]:
<matplotlib.legend.Legend at 0x582ca78>
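In the grouped bar examples above, the x ticks stay at 0, 1, 2, 3, 4, which is under the first bar of each group. A minimal sketch of a general version, assuming any number of series of equal length: series i is shifted right by i × width, and the ticks are moved to the centre of each group and labelled with the original a values. The series names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

a = [1, 2, 3, 4, 5]
b = [1, 4, 9, 16, 25]
c = [1, 8, 27, 64, 125]
series = {"a": a, "a square": b, "a cube": c}   # illustrative names

width = 0.2                  # width of each individual bar
x = np.arange(len(a))        # one group of bars at each position 0..4

# Shift series i to the right by i * width so the bars sit side by side
for i, (name, values) in enumerate(series.items()):
    plt.bar(x + i * width, values, width=width, label=name)

# Place each tick under the middle of its group and label it with the a value
group_centre = x + width * (len(series) - 1) / 2
plt.xticks(group_centre, [str(v) for v in a])
plt.legend()
```

With three series and width 0.2 the group centre is x + 0.2, the position of the middle bar; for more series, reduce `width` so a group stays narrower than 1.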

Histogram:

A histogram is a tool for summarising discrete or continuous data. It gives a visual interpretation of numerical data by showing how many data points fall within each specified range of values (called a bin). In a bar graph the bars are separated by gaps, whereas in a histogram adjacent bins touch with no gap. NOTE: If the data is continuous, such as weight or height, a histogram is a good choice to represent it; if the data is categorical, such as country or subject, a bar graph is the better option.

Syntax (commonly used parameters only): matplotlib.pyplot.hist(x, bins=None, histtype='bar', align='mid', orientation='vertical')

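Parameters:

  • x: array or sequence of values from which the histogram is computed; a list of arrays draws one histogram per array
  • bins: if an integer, the number of equal-width bins (groups) the data range is divided into; a sequence of bin edges can also be given
  • histtype: type of histogram to draw; possible values: 'bar', 'barstacked', 'step', 'stepfilled'
  • align: horizontal alignment of the bars relative to the bin edges; possible values: 'left', 'mid', 'right'
  • orientation: orientation of the histogram; possible values: 'vertical' (default) and 'horizontal'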
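When `bins` is an integer, plt.hist hands the data to numpy.histogram, which splits the range from the smallest to the largest value into that many equal-width bins. For the height data below, (191 - 161) / 20 = 1.5, which matches the bin edges in Out[11]. The following is a simplified sketch of that counting rule; the helper `equal_width_histogram` and the short sample are hypothetical, and numpy additionally guards against floating-point rounding at bin edges, which this sketch does not.

```python
import numpy as np

def equal_width_histogram(data, bins):
    """Hypothetical helper: count values in `bins` equal-width bins over min..max."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for value in data:
        # Bins are half-open [left, right); the maximum value goes into the last bin
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    return counts, edges

sample = [161, 163, 170, 171, 174, 176, 179, 185, 190, 191]   # illustrative data
counts, edges = equal_width_histogram(sample, bins=3)
print(counts)   # [3, 4, 3]
print(edges)    # [161.0, 171.0, 181.0, 191.0]

# numpy.histogram, which plt.hist uses internally, gives the same counts here
np_counts, np_edges = np.histogram(sample, bins=3)
print(np_counts.tolist())   # [3, 4, 3]
```

Note that 171 lands in the second bin, not the first: every bin except the last includes its left edge and excludes its right edge.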

1. Plotting a basic histogram

In [11]:
import matplotlib.pyplot as plt
height = [185, 172, 172, 169, 181, 162, 186, 171, 177, 174, 184, 163, 174, 173, 
          182, 169, 174, 170, 176, 179, 169, 182, 181, 179, 181, 171, 175, 170, 
          174, 179, 171, 173, 171, 170, 171, 175, 169, 177, 185, 180, 174, 170, 
          171, 186, 176, 172, 177, 188, 176, 179, 177, 173, 169, 173, 174, 179, 
          181, 181, 177, 181, 171, 183, 179, 174, 178, 175, 182, 185, 189, 167, 
          167, 172, 176, 181, 177, 163, 174, 180, 177, 180, 174, 174, 177, 178, 
          177, 176, 171, 178, 176, 182, 183, 177, 173, 172, 178, 176, 173, 176, 
          172, 180, 173, 183, 178, 179, 169, 177, 180, 170, 174, 176, 167, 177, 
          181, 170, 178, 168, 175, 166, 182, 178, 175, 171, 183, 187, 164, 183, 
          185, 178, 168, 181, 174, 172, 168, 179, 180, 172, 179, 169, 180, 176, 
          174, 175, 181, 180, 179, 176, 176, 179, 177, 180, 174, 161, 182, 189, 
          178, 175, 175, 175, 176, 169, 172, 170, 177, 174, 178, 174, 181, 177, 
          189, 164, 172, 181, 191, 174, 176, 174, 183, 174, 180, 174, 168, 177, 
          179, 183, 175, 172, 179, 177, 177, 175, 182, 178, 187, 182, 179, 166, 
          179, 178, 180, 182, 173, 180, 172, 187, 168, 165, 166, 170, 169, 187, 
          174, 167, 182, 172, 168, 181, 179, 173, 184, 176, 185, 179, 185, 176, 
          168, 190, 172, 174, 171, 174, 177, 177, 179, 186, 175, 168, 168, 172, 
          165, 180, 173, 174, 175, 167, 170, 180, 179, 173, 186, 168]
plt.hist(height,bins=20)
Out[11]:
(array([ 2.,  2.,  4.,  3., 15.,  9., 19., 15., 35., 13., 36., 12., 33.,
        13., 17.,  2., 10.,  4.,  4.,  2.]),
 array([161. , 162.5, 164. , 165.5, 167. , 168.5, 170. , 171.5, 173. ,
        174.5, 176. , 177.5, 179. , 180.5, 182. , 183.5, 185. , 186.5,
        188. , 189.5, 191. ]),
 <a list of 20 Patch objects>)
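The Out[11] tuple above is the return value of plt.hist: the count in each bin, the bin edges, and the drawn bars. A minimal sketch of unpacking it, using an illustrative subset of the heights so the printout stays short:

```python
import matplotlib.pyplot as plt

# Illustrative subset: the first 14 heights from the list above
height = [185, 172, 172, 169, 181, 162, 186, 171, 177, 174, 184, 163, 174, 173]

# plt.hist returns the bin counts, the bin edges and the bar patches it drew
n, bins, patches = plt.hist(height, bins=5)

# Every value falls into exactly one bin, so the counts add up to the data size
print(int(n.sum()), len(height))
# There is always one more edge than there are bins (and bars)
print(len(bins), len(patches))

# Show each bin as a range together with its count
for left, right, count in zip(bins[:-1], bins[1:], n):
    print(f"{left:.1f} - {right:.1f}: {int(count)}")
```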

2. Plotting multiple histograms:

In [10]:
import matplotlib.pyplot as plt
height_M = [185, 172, 172, 169, 181, 162, 186, 171, 177, 174, 184, 163, 174, 173, 
          182, 169, 174, 170, 176, 179, 169, 182, 181, 179, 181, 171, 175, 170, 
          174, 179, 171, 173, 171, 170, 171, 175, 169, 177, 185, 180, 174, 170, 
          171, 186, 176, 172, 177, 188, 176, 179, 177, 173, 169, 173, 174, 179, 
          181, 181, 177, 181, 171, 183, 179, 174, 178, 175, 182, 185, 189, 167, 
          167, 172, 176, 181, 177, 163, 174, 180, 177, 180, 174, 174, 177, 178, 
          177, 176, 171, 178, 176, 182, 183, 177, 173, 172, 178, 176, 173, 176, 
          172, 180, 173, 183, 178, 179, 169, 177, 180, 170, 174, 176, 167, 177]
height_F=[181, 170, 178, 168, 175, 166, 182, 178, 175, 171, 183, 187, 164, 183, 
          185, 178, 168, 181, 174, 172, 168, 179, 180, 172, 179, 169, 180, 176, 
          174, 175, 181, 180, 179, 176, 176, 179, 177, 180, 174, 161, 182, 189, 
          178, 175, 175, 175, 176, 169, 172, 170, 177, 174, 178, 174, 181, 177, 
          189, 164, 172, 181, 191, 174, 176, 174, 183, 174, 180, 174, 168, 177, 
          179, 183, 175, 172, 179, 177, 177, 175, 182, 178, 187, 182, 179, 166, 
          179, 178, 180, 182, 173, 180, 172, 187, 168, 165, 166, 170, 169, 187, 
          174, 167, 182, 172, 168, 181, 179, 173, 184, 176, 185, 179, 185, 176, 
          168, 190, 172, 174, 171, 174, 177, 177, 179, 186, 175, 168, 168, 172, 
          165, 180, 173, 174, 175, 167, 170, 180, 179, 173, 186, 168]
          
plt.hist([height_M,height_F],bins=10)
Out[10]:
(array([[ 3.,  0.,  9., 19., 21., 26., 19.,  8.,  5.,  2.],
        [ 1.,  7., 15., 15., 27., 22., 27., 11.,  9.,  4.]]),
 array([161., 164., 167., 170., 173., 176., 179., 182., 185., 188., 191.]),
 <a list of 2 Lists of Patches objects>)
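Passing a list of datasets returns one row of counts per dataset, as the two-row array in Out[10] shows. When the groups overlap, the histtype parameter listed earlier can help: a minimal sketch with histtype="step", which draws outlines only, using short illustrative samples taken from the start of height_M and height_F.

```python
import matplotlib.pyplot as plt

# Short illustrative samples from the start of height_M and height_F above
height_M = [185, 172, 172, 169, 181, 162, 186, 171, 177, 174]
height_F = [181, 170, 178, 168, 175, 166, 182, 178, 175, 171]

# histtype="step" draws only the outline of each histogram, so both stay visible
n, bins, patches = plt.hist([height_M, height_F], bins=5, histtype="step",
                            label=["height of male", "height of female"])
plt.legend()

# One row of counts per dataset, all sharing the same bin edges
print(len(n), len(bins))
```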

3. Customizing Histogram:

In [12]:
#Adding label and title to the histogram
In [18]:
import matplotlib.pyplot as plt
height_M = [185, 172, 172, 169, 181, 162, 186, 171, 177, 174, 184, 163, 174, 173, 
          182, 169, 174, 170, 176, 179, 169, 182, 181, 179, 181, 171, 175, 170, 
          174, 179, 171, 173, 171, 170, 171, 175, 169, 177, 185, 180, 174, 170, 
          171, 186, 176, 172, 177, 188, 176, 179, 177, 173, 169, 173, 174, 179, 
          181, 181, 177, 181, 171, 183, 179, 174, 178, 175, 182, 185, 189, 167, 
          167, 172, 176, 181, 177, 163, 174, 180, 177, 180, 174, 174, 177, 178, 
          177, 176, 171, 178, 176, 182, 183, 177, 173, 172, 178, 176, 173, 176, 
          172, 180, 173, 183, 178, 179, 169, 177, 180, 170, 174, 176, 167, 177]
plt.title("Histogram of height of people")
plt.xlabel("Height")
plt.ylabel("Number of people")
plt.hist(height_M,bins=20)
Out[18]:
(array([ 3.,  0.,  0.,  3.,  0., 11.,  8.,  6., 18.,  3.,  9., 17.,  7.,
         5., 11.,  3.,  1.,  5.,  0.,  2.]),
 array([162.  , 163.35, 164.7 , 166.05, 167.4 , 168.75, 170.1 , 171.45,
        172.8 , 174.15, 175.5 , 176.85, 178.2 , 179.55, 180.9 , 182.25,
        183.6 , 184.95, 186.3 , 187.65, 189.  ]),
 <a list of 20 Patch objects>)
In [17]:
# Adding a legend to the histogram
In [20]:
import matplotlib.pyplot as plt
height_M = [185, 172, 172, 169, 181, 162, 186, 171, 177, 174, 184, 163, 174, 173, 
          182, 169, 174, 170, 176, 179, 169, 182, 181, 179, 181, 171, 175, 170, 
          174, 179, 171, 173, 171, 170, 171, 175, 169, 177, 185, 180, 174, 170, 
          171, 186, 176, 172, 177, 188, 176, 179, 177, 173, 169, 173, 174, 179, 
          181, 181, 177, 181, 171, 183, 179, 174, 178, 175, 182, 185, 189, 167, 
          167, 172, 176, 181, 177, 163, 174, 180, 177, 180, 174, 174, 177, 178, 
          177, 176, 171, 178, 176, 182, 183, 177, 173, 172, 178, 176, 173, 176, 
          172, 180, 173, 183, 178, 179, 169, 177, 180, 170, 174, 176, 167, 177]
height_F=[181, 170, 178, 168, 175, 166, 182, 178, 175, 171, 183, 187, 164, 183, 
          185, 178, 168, 181, 174, 172, 168, 179, 180, 172, 179, 169, 180, 176, 
          174, 175, 181, 180, 179, 176, 176, 179, 177, 180, 174, 161, 182, 189, 
          178, 175, 175, 175, 176, 169, 172, 170, 177, 174, 178, 174, 181, 177, 
          189, 164, 172, 181, 191, 174, 176, 174, 183, 174, 180, 174, 168, 177, 
          179, 183, 175, 172, 179, 177, 177, 175, 182, 178, 187, 182, 179, 166, 
          179, 178, 180, 182, 173, 180, 172, 187, 168, 165, 166, 170, 169, 187, 
          174, 167, 182, 172, 168, 181, 179, 173, 184, 176, 185, 179, 185, 176, 
          168, 190, 172, 174, 171, 174, 177, 177, 179, 186, 175, 168, 168, 172, 
          165, 180, 173, 174, 175, 167, 170, 180, 179, 173, 186, 168]
          
plt.hist([height_M,height_F],bins=10, label=['height of male','height of female'])
plt.legend()
Out[20]:
<matplotlib.legend.Legend at 0xc9078b0>

Saving a plot

1. Saving a plot to pdf

In [22]:
import matplotlib.pyplot as plt 
a=[1,2,3,4,5]
b=[1,4,9,16,25]
c=[1, 8, 27, 64, 125]
plt.plot(a,b)
plt.plot(a,c)
plt.savefig("multiLineGraph.pdf")

2. Saving a plot to png

In [26]:
import matplotlib.pyplot as plt 
a=[1,2,3,4,5]
b=[1,4,9,16,25]
c=[1, 8, 27, 64, 125]
plt.plot(a,b)
plt.plot(a,c)
plt.savefig("multiLineGraph.png")
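plt.savefig infers the file format from the extension. A minimal sketch of two commonly used options: `dpi` sets the resolution of raster formats such as PNG, and `bbox_inches="tight"` trims unused white margins. The value 150 is only an example.

```python
import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [1, 4, 9, 16, 25]
c = [1, 8, 27, 64, 125]

plt.plot(a, b, label="a square")
plt.plot(a, c, label="a cube")
plt.legend()

# Format comes from the extension; dpi=150 is an example resolution,
# bbox_inches="tight" crops the empty border around the plot
plt.savefig("multiLineGraph.png", dpi=150, bbox_inches="tight")

# Release the figure once it has been saved
plt.close()
```

In a script, call plt.savefig before plt.show(): once the window is closed the figure is discarded, and a later savefig writes an empty image. In a notebook, save in the same cell that draws the plot.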
