CS109 Lecture 2

Concepts

  1. Infographics
  2. Distribution
    • CDF (cumulative distribution function)
      python
      import scipy.stats
      scipy.stats.norm.cdf(2)
    • Histograms
      A histogram is usually easier to interpret than a CDF
  3. Normal Approximation
    Most data is not normal, so it is important to picture the data, look into it, and find an appropriate way to tell a story
  4. QQ-plots
    Observed versus normal approximation quantiles
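
The distribution concepts above can be tried out on synthetic data. The following is a minimal sketch, not part of the lecture: the sample is randomly generated, and the names `sample`, `counts`, `theoretical_q` and `observed_q` are made up for illustration. It evaluates the normal CDF, bins the sample as a histogram would, and computes the pairs of quantiles that a QQ-plot displays.

```python
import numpy as np
from scipy import stats

# Synthetic sample (an assumption for illustration, not course data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=500)

# CDF: probability that a standard normal value is <= 2
p = stats.norm.cdf(2)

# Histogram: counts of observations falling in each of 20 bins
counts, bin_edges = np.histogram(sample, bins=20)

# QQ-plot data: theoretical normal quantiles versus sorted observed values,
# plus a least-squares line through them (r is its correlation coefficient)
(theoretical_q, observed_q), (slope, intercept, r) = stats.probplot(sample, dist="norm")

print(p, counts.sum(), r)
```

If the points (`theoretical_q`, `observed_q`) fall close to a straight line, the normal approximation is reasonable; strong curvature suggests it is not.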

Topics

Data Wrangling

In most cases, we have to clean our data before we can use it

Install Anaconda and Use IPython

Anaconda is a convenient environment for working with Python

Get Started With Data
import pandas as pd
  1. Read data from Web
    python
    url = 'urls'  # placeholder: the address of a tab-separated data file
    Data = pd.read_table(url)
  2. Plot data
    python
    data_to_plot = Data.column1
    data_to_plot.plot()
    data_to_plot.plot(kind='bar')
    data_to_plot_multiple = Data
    data_to_plot_multiple.plot(kind = 'bar')
  3. Fix the Legend
    python
    ax = data_to_plot.plot(kind = 'bar' , legend = False)
    ax.legend(loc = 'center left' , bbox_to_anchor = (1,0.5))
  4. Stacked Bar Plot
    python
    data_to_plot_multiple.plot(kind = 'bar' , legend = False , stacked = True)
    data_to_plot_multiple.plot(kind = 'barh' , legend = False , stacked = True) # horizontal bar plot
  5. Add Labels For Horizontal Barplot
    python
    data_to_plot_multiple.set_index(index_names,inplace = True)
    data_to_plot_multiple.plot(kind = 'barh' , legend = False , stacked = True)
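
The steps above can be combined into one runnable sketch. Everything specific here is an assumption for illustration: the tiny tab-separated table is made up and read from a string instead of a URL, the column names `city`, `apples` and `pears` are hypothetical, and the `Agg` backend is chosen only so the code runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside IPython
import pandas as pd

# Made-up tab-separated data standing in for a file fetched from the web
raw = "city\tapples\tpears\nA\t3\t1\nB\t2\t4\nC\t5\t2\n"
df = pd.read_table(io.StringIO(raw))

# Use the label column as the index so it becomes the bar labels
df.set_index("city", inplace=True)

# Stacked horizontal bar plot, drawn without the default legend
ax = df.plot(kind="barh", stacked=True, legend=False)

# Place the legend outside the plot area, to the right of the axes
ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

ax.figure.savefig("stacked_barh.png", bbox_inches="tight")
```

Passing `bbox_inches="tight"` when saving keeps the legend from being cut off, since it sits outside the axes.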
Anscombe’s Quartet

The four datasets have the same mean, variance, correlation, and linear regression line, but the data look very different
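
The claim can be checked numerically. This is a small sketch using the standard published values of the quartet (Anscombe, 1973); the dictionary layout and variable names are my own choices, not from the lecture.

```python
import numpy as np

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {}
for name, (x, y) in quartet.items():
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # least-squares regression line
    corr = np.corrcoef(x, y)[0, 1]           # Pearson correlation
    summary[name] = (x.mean(), y.mean(), x.var(ddof=1), y.var(ddof=1),
                     corr, slope, intercept)
    print(name, np.round(summary[name], 2))
```

The printed summaries agree to about two decimal places across all four datasets, which is why plotting the data is the only way to see how different they are.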
