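This exercise computes the mean, variance, and correlation coefficient of each dataset, along with the linear-regression coefficients (printed below as β0, the intercept, and β1, the slope).
The code is as follows: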
import pandas as pd
import scipy as sp
import scipy.stats  # load the stats submodule explicitly so sp.stats.pearsonr resolves
import statsmodels.api as sm
from statsmodels.formula.api import ols
anascombe = pd.read_csv('anscombe.csv')
anascombe.head()
print(anascombe.groupby('dataset')['x'].mean())
print(anascombe.groupby('dataset')['y'].mean())
print("--------------------")
print(anascombe.groupby('dataset')['x'].var())
print(anascombe.groupby('dataset')['y'].var())
print("--------------------")
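# Each dataset occupies 11 consecutive rows, so every slice must span 11 rows;
# a 10-row slice such as [0:10] would drop the last observation of its group.
y1 = anascombe.y[0:11].values
y2 = anascombe.y[11:22].values
y3 = anascombe.y[22:33].values
y4 = anascombe.y[33:44].values
x1 = anascombe.x[0:11].values
x2 = anascombe.x[11:22].values
x3 = anascombe.x[22:33].values
x4 = anascombe.x[33:44].values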
print(sp.stats.pearsonr(x1, y1)[0])
print(sp.stats.pearsonr(x2, y2)[0])
print(sp.stats.pearsonr(x3, y3)[0])
print(sp.stats.pearsonr(x4, y4)[0])
print("--------------------")
x12=sm.add_constant(x1)
y12=sm.OLS(y1,x12)
y12=y12.fit()
print("I:β0 = "+str(y12.params[0])+" β1="+str(y12.params[1]))
x22=sm.add_constant(x2)
y22=sm.OLS(y2,x22)
y22=y22.fit()
print("II:β0 = "+str(y22.params[0])+" β1="+str(y22.params[1]))
x32=sm.add_constant(x3)
y32=sm.OLS(y3,x32)
y32=y32.fit()
print("III:β0 = "+str(y32.params[0])+" β1="+str(y32.params[1]))
x42=sm.add_constant(x4)
y42=sm.OLS(y4,x42)  # regress dataset IV's own y values on x4
y42=y42.fit()
print("IV:β0 = "+str(y42.params[0])+" β1="+str(y42.params[1]))
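The results are as follows. The correlation and regression figures were recorded from 10-row slices ([0:10], [11:21], ...), and the last regression line pairs y1 with x4, so they differ from the full 11-row values: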
dataset
I 9.0
II 9.0
III 9.0
IV 9.0
Name: x, dtype: float64
dataset
I 7.500909
II 7.500909
III 7.500000
IV 7.500909
Name: y, dtype: float64
--------------------
dataset
I 11.0
II 11.0
III 11.0
IV 11.0
Name: x, dtype: float64
dataset
I 4.127269
II 4.127629
III 4.122620
IV 4.123249
Name: y, dtype: float64
--------------------
0.797081575906
0.777309302078
0.798563261709
0.814672214693
--------------------
I:β0 = 2.90181818182 β1=0.508636363636
II:β0 = 3.4175974026 β1=0.463766233766
III:β0 = 2.8770995671 β1=0.510627705628
I:β0 = 10.8293939394 β1=-0.345757575758
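As a cross-check on the statistics above, the same quantities can be obtained from the closed-form formulas, grouping rows by their dataset label rather than by row position. The following is a minimal sketch, not the original method: it uses only the standard library, the helper names `group_by_dataset` and `summarize` are hypothetical, and the embedded numbers are the standard Anscombe quartet values, which is an assumption about what anscombe.csv contains.

```python
import math

# Standard Anscombe quartet values, embedded as an ASSUMPTION about the
# contents of anscombe.csv (the file itself is not shown in the post).
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Flatten into (dataset, x, y) rows, mimicking a long-format CSV.
ROWS = [(label, x, y)
        for label, (xs, ys) in QUARTET.items()
        for x, y in zip(xs, ys)]


def group_by_dataset(rows):
    """Collect x and y values per dataset label (hypothetical helper)."""
    groups = {}
    for label, x, y in rows:
        xs, ys = groups.setdefault(label, ([], []))
        xs.append(x)
        ys.append(y)
    return groups


def summarize(xs, ys):
    """Mean, sample variance, Pearson r, and simple OLS coefficients."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # beta1 = Sxy / Sxx
    intercept = mean_y - slope * mean_x    # beta0 = mean_y - beta1 * mean_x
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),            # sample variance (ddof=1), as pandas uses
        "var_y": syy / (n - 1),
        "r": sxy / math.sqrt(sxx * syy),
        "beta0": intercept,
        "beta1": slope,
    }


results = {label: summarize(xs, ys)
           for label, (xs, ys) in group_by_dataset(ROWS).items()}

for label, s in results.items():
    print(f"{label}: r={s['r']:.4f} beta0={s['beta0']:.4f} beta1={s['beta1']:.4f}")
```

Under these assumptions, every dataset gives a correlation of about 0.816 and a fitted line close to y = 3.00 + 0.50x. Grouping by label also avoids hard-coding row offsets, so the calculation does not depend on the order of rows in the file.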