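1. Installation and Usage
Install IPython and Jupyter, then install the pandas, seaborn, and statsmodels libraries.
Alternatively, install Anaconda, which bundles Jupyter Notebook. Launching it opens a browser window automatically, where you can create a new Python 3 notebook.
2. Solutions
Import the libraries needed for the analysis and load the data.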
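The import and loading step is not shown in the text, so the following is only a minimal sketch of what it might look like. It assumes the data is the standard Anscombe quartet with columns `dataset`, `x`, and `y`, and it builds the DataFrame inline rather than reading a file, since the original file path is not given. The variable name `anascombe` follows the code used later in this section.

```python
import pandas as pd
# seaborn and statsmodels are only needed for later steps (assumed aliases):
# import seaborn as sns
# import statsmodels.formula.api as smf

# Standard Anscombe quartet values, typed in by hand (assumption: the
# original post loads the same data from a file).
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    'I':   (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                       7.24, 4.26, 10.84, 4.82, 5.68]),
    'II':  (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                       6.13, 3.10, 9.13, 7.26, 4.74]),
    'III': (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                       6.08, 5.39, 8.15, 6.42, 5.73]),
    'IV':  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
             5.25, 12.50, 5.56, 7.91, 6.89]),
}

# One small frame per dataset, stacked into a single long-format table
frames = [pd.DataFrame({'dataset': name, 'x': xs, 'y': ys})
          for name, (xs, ys) in quartet.items()]
anascombe = pd.concat(frames, ignore_index=True)

print(anascombe.head())
```

If the data is available as a CSV file instead, `pd.read_csv` with the appropriate path would replace the inline construction.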
Part 1
For each of the four datasets...
- Compute the mean and variance of both x and y
- Compute the correlation coefficient between x and y
- Compute the linear regression line: y=β0+β1x+ϵ (hint: use statsmodels and look at the Statsmodels notebook)
(1) Compute the mean and variance of both x and y
gp = anascombe.groupby('dataset')  # group the rows by the 'dataset' column
# Print the mean and variance of x and y for each of the four datasets
for index in ['I', 'II', 'III', 'IV']:
    print("The " + index + " dataset:")
    mean_x = gp.get_group(index)['x'].mean()
    mean_y = gp.get_group(index)['y'].mean()
    var_x = gp.get_group(index)['x'].var()
    var_y = gp.get_group(index)['y'].var()
    print(" mean of x", mean_x)
    print(" mean of y", mean_y)
    print(" variance of x", var_x)
    print(" variance of y", var_y)
    print("")
Output:
The I dataset:
 mean of x 9.0
 mean of y 7.500909090909093
 variance of x 11.0
 variance of y 4.127269090909091
The II dataset:
 mean of x 9.0
 mean of y 7.500909090909091
 variance of x 11.0
 variance of y 4.127629090909091
The III dataset:
 mean of x 9.0
 mean of y 7.500000000000001
 variance of x 11.0
 variance of y 4.12262
The IV dataset:
 mean of x 9.0
 mean of y 7.50090909090909
 variance of x 11.0
 variance of y 4.12324909090909
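One detail behind these numbers: by default pandas' `.var()` computes the sample variance (divisor n - 1), not the population variance (divisor n). The sketch below checks this with the standard library only. The x values for dataset I are typed in by hand from the standard Anscombe quartet, which is an assumption, since the original loads them from its data file.

```python
import statistics

# x values of dataset I in Anscombe's quartet (typed in by hand; assumption)
x = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]

mean_x = statistics.mean(x)
sample_var = statistics.variance(x)       # divides by n - 1, like pandas .var()
population_var = statistics.pvariance(x)  # divides by n

print("mean:", mean_x)
print("sample variance:", sample_var)
print("population variance:", population_var)
```

The sample variance matches the 11.0 reported above, while the population variance is smaller. Passing `ddof=0` to `.var()` would give the population figure in pandas.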
(2) Compute the correlation coefficient between x and y
co