高编课后作业------第十四周-Jupyter

上一篇博客应该是13周,上次打错了,已纠正。 用jupyter写完可以直接生成 markdown,然后复制过来就行了,挺方便的
%matplotlib inline

import random

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
import statsmodels.formula.api as smf

sns.set_context("talk")
# Anscombe’s quartet Anscombe’s quartet comprises of four datasets, and is rather famous. Why? You’ll find out in this exercise.
anascombe = pd.read_csv('C:/Users/sunyy/Desktop/cme193-ipython-notebooks-lecture-master/data/anscombe.csv')
anascombe.head()
.dataframe tbody tr th:only-of-type { vertical-align: middle; } .dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; }
datasetxy
0I10.08.04
1I8.06.95
2I13.07.58
3I9.08.81
4I11.08.33

计算均值

print(anascombe.groupby('dataset')['x'].mean())
print(anascombe.groupby('dataset')['y'].mean())
dataset
I      9.0
II     9.0
III    9.0
IV     9.0
Name: x, dtype: float64
dataset
I      7.500909
II     7.500909
III    7.500000
IV     7.500909
Name: y, dtype: float64

计算方差

print(anascombe.groupby('dataset')['x'].var())
print(anascombe.groupby('dataset')['y'].var())
dataset
I      11.0
II     11.0
III    11.0
IV     11.0
Name: x, dtype: float64
dataset
I      4.127269
II     4.127629
III    4.122620
IV     4.123249
Name: y, dtype: float64

相关系数不好直接用groupby分组算,所以先手动获取每组数据

Y1 = anascombe.y[0:10].values  
Y2 = anascombe.y[11:21].values  
Y3 = anascombe.y[22:32].values  
Y4 = anascombe.y[33:43].values  
X1 = anascombe.x[0:10].values  
X2 = anascombe.x[11:21].values  
X3 = anascombe.x[22:32].values  
X4 = anascombe.x[33:43].values 

计算相关系数

cof = [0,0,0,0]
cof[0] = sp.stats.pearsonr(X1, Y1)[0]  #返回的第一个参数是相关系数
cof[1] = sp.stats.pearsonr(X2, Y2)[0]  
cof[2] = sp.stats.pearsonr(X3, Y3)[0]  
cof[3] = sp.stats.pearsonr(X4, Y4)[0] 
for co in cof:
    print(co)
0.7970815759062526
0.7773093020784241
0.7985632617088811
0.8146722146933596

线性回归

执行过程大概分成3步:
X=sm.add_constant(X)  #为模型增加常数项
est=sm.OLS(Y,X)
est=est.fit()
est.params
params 里包括了 前2个系数 ,第三个是误差项
X1add=sm.add_constant(X1)  #为模型增加常数项
est=sm.OLS(Y1,X1add)
est=est.fit()
print("For dataset I, β0 is " + str(est.params[0]) + " ,  β1 is" + str(est.params[1]))
X2add=sm.add_constant(X2)  #为模型增加常数项
est=sm.OLS(Y2,X2add)
est=est.fit()
print("For dataset II, β0 is " + str(est.params[0]) + " ,  β1 is" + str(est.params[1]))
X3add=sm.add_constant(X3)  #为模型增加常数项
est=sm.OLS(Y3,X3add)
est=est.fit()
print("For dataset III, β0 is " + str(est.params[0]) + " ,  β1 is" + str(est.params[1]))
X4add=sm.add_constant(X4)  #为模型增加常数项
est=sm.OLS(Y4,X4add)
est=est.fit()
print("For dataset IIII, β0 is " + str(est.params[0]) + " ,  β1 is" + str(est.params[1]))
For dataset I, β0 is 2.9018181818181805 ,  β1 is0.5086363636363642
For dataset II, β0 is 3.417597402597403 ,  β1 is0.46376623376623394
For dataset III, β0 is 2.877099567099566 ,  β1 is0.5106277056277062
For dataset IIII, β0 is 3.023030303030302 ,  β1 is0.49878787878787884

官方的用法就是这样,也没有什么好解释的:

m = sns.FacetGrid(anascombe, col="dataset")    
m.map(plt.scatter, "x","y")  

这里写图片描述

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值