CS109 Lecture 2
Concepts
- Infographics
- Distribution
- CDF (cumulative distribution function)
python
import scipy.stats
scipy.stats.norm.cdf(2)  # P(Z <= 2) for a standard normal Z, approximately 0.977
- Histograms
A histogram is usually easier to interpret than a CDF.
- Normal Approximation
Most data is not normally distributed, so it is important to plot the data, look at it closely, and find an appropriate way to tell its story.
- QQ-plots
Observed quantiles plotted against the quantiles of the normal approximation
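A QQ-plot can be sketched numerically with `scipy.stats.probplot`, which pairs the ordered observations with the quantiles a normal distribution would predict. The two samples below are synthetic (drawn with NumPy purely for illustration, not course data), and the helper name `qq_points` is made up for this sketch. The correlation `r` of the QQ points should be close to 1 when the normal approximation fits well.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed (an assumption, for reproducibility)
normal_sample = rng.normal(loc=5.0, scale=2.0, size=500)  # synthetic, roughly normal
skewed_sample = rng.exponential(scale=2.0, size=500)      # synthetic, right-skewed


def qq_points(sample):
    """Return normal quantiles, ordered observations, and the QQ correlation r."""
    (theoretical, observed), (slope, intercept, r) = stats.probplot(sample, dist="norm")
    return theoretical, observed, r


_, _, r_normal = qq_points(normal_sample)
_, _, r_skewed = qq_points(skewed_sample)
print(f"QQ correlation, normal sample: {r_normal:.3f}")
print(f"QQ correlation, skewed sample: {r_skewed:.3f}")
```

To draw the plot instead of only computing the points, `probplot` also accepts a `plot` argument, for example `plot=plt` after `import matplotlib.pyplot as plt`.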
Topics
Data Wrangling
In most cases, we have to clean our data before we can use it.
Install Anaconda and Use IPython
Anaconda is a great environment for working with Python.
Get Started With Data
import pandas as pd
- Read data from Web
python
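url = 'urls'  # placeholder: replace with the address of the data file
Data = pd.read_table(url)  # read_table expects tab-separated values by default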
- Plot data
python
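data_to_plot = Data.column1  # a single column (a Series)
data_to_plot.plot()  # line plot by default
data_to_plot.plot(kind='bar')  # bar plot of the same column
data_to_plot_multiple = Data  # the whole DataFrame, all columns
data_to_plot_multiple.plot(kind='bar')  # one group of bars per row, one bar per column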
- Fix the Legend
python
ax = data_to_plot.plot(kind='bar', legend=False)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))  # place the legend outside the axes, to the right
- Stacked Bar Plot
python
data_to_plot_multiple.plot(kind='bar', legend=False, stacked=True)
data_to_plot_multiple.plot(kind='barh', legend=False, stacked=True)  # horizontal bar plot
- Add Labels For Horizontal Barplot
python
# index_names: the column label(s) to use as row labels
data_to_plot_multiple.set_index(index_names, inplace=True)
data_to_plot_multiple.plot(kind='barh', legend=False, stacked=True)  # bars are now labeled by the index
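The steps in this list can be strung together into one short sketch. Since the notes leave the URL as a placeholder, an in-memory tab-separated string stands in for it here. The column names (`city`, `apples`, `pears`) and all values are made up for illustration.

```python
import io

import pandas as pd

# Made-up tab-separated data standing in for the file behind the URL.
raw = "city\tapples\tpears\nA\t3\t5\nB\t4\t1\nC\t2\t6\n"
Data = pd.read_table(io.StringIO(raw))  # read_table splits on tabs by default

# Use the 'city' column as row labels so each bar is labeled by name.
Data.set_index("city", inplace=True)

# Stacked horizontal bar plot, with the legend moved outside the axes.
ax = Data.plot(kind="barh", stacked=True, legend=False)
ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))
```

In IPython with inline plotting the figure appears directly. In a plain script, something like `ax.figure.savefig("plot.png", bbox_inches="tight")` should keep the outside legend from being cut off.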
Anscombe’s Quartet
Same mean, variance, correlation, and linear regression line, but the data sets look very different.
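The claim can be checked numerically. The sketch below uses the commonly published values of the quartet (Anscombe, 1973) and computes the summary statistics with NumPy. The variable names are chosen here for illustration, and the regression line is fitted with `np.polyfit`, which is one of several ways to do it.

```python
import numpy as np

# Anscombe's quartet: four data sets of 11 (x, y) points each.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by sets I, II, and III
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {}
for name, (x, y) in quartet.items():
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line y = slope * x + intercept
    summary[name] = {
        "mean_x": x.mean(),
        "mean_y": y.mean(),
        "var_x": x.var(ddof=1),  # sample variance
        "var_y": y.var(ddof=1),
        "corr": np.corrcoef(x, y)[0, 1],
        "slope": slope,
        "intercept": intercept,
    }
    print(name, {k: round(float(v), 2) for k, v in summary[name].items()})
```

The printed summaries agree to about two decimal places across all four sets, so only a plot (for example a scatter plot of each set) reveals how different they are.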