Jupyter exercise answers (for reference only)

weixin_38224302

Article link: https://blog.csdn.net/weixin_38224302/article/details/80653042

Data source and course materials for this exercise:
https://github.com/schmit/cme193-ipython-notebooks-lecture

Part 1

For each of the four datasets...

Compute the mean and variance of both x and y
Compute the correlation coefficient between x and y
Compute the linear regression line: y=β0+β1x+ϵ (hint: use statsmodels and look at the Statsmodels notebook)

Compute the mean and variance of both x and y.

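import pandas as pd
# Assumption: the CSV path follows the course repository; adjust as needed.
anascombe = pd.read_csv("data/anscombe.csv")
# Per-dataset mean and sample variance (var() divides by n - 1)
print("          mean")
print(anascombe.groupby("dataset").mean())
print("\n        variance")
print(anascombe.groupby("dataset").var())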

Output:

          mean
           x         y
dataset               
I        9.0  7.500909
II       9.0  7.500909
III      9.0  7.500000
IV       9.0  7.500909

        variance
            x         y
dataset                
I        11.0  4.127269
II       11.0  4.127629
III      11.0  4.122620
IV       11.0  4.123249
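
The variance of x comes out as 11.0 rather than 10.0 because pandas' `var()` computes the sample variance (divisor n - 1) by default. As a cross-check, here is a minimal standard-library sketch for dataset I. It assumes the hard-coded values below (Anscombe's dataset I as commonly published) match the CSV used in the exercise.

```python
import statistics

# Assumed values: Anscombe's dataset I as commonly published.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)
var_x = statistics.variance(x)    # sample variance, divisor n - 1
var_y = statistics.variance(y)
pvar_x = statistics.pvariance(x)  # population variance, divisor n

print(f"mean: x={mean_x}, y={mean_y:.6f}")
print(f"sample variance: x={var_x}, y={var_y:.6f}")
print(f"population variance of x: {pvar_x}")
```

The sample variances should agree with the first row of the pandas table, while the population variance of x is smaller by a factor of (n - 1)/n.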

Compute the correlation coefficient between x and y

anascombe.groupby("dataset").corr()

Output:

                  x         y
dataset
I       x  1.000000  0.816421
        y  0.816421  1.000000
II      x  1.000000  0.816237
        y  0.816237  1.000000
III     x  1.000000  0.816287
        y  0.816287  1.000000
IV      x  1.000000  0.816521
        y  0.816521  1.000000
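
The off-diagonal entries are Pearson correlation coefficients, r = S_xy / sqrt(S_xx · S_yy), where each S term is a sum of products of deviations from the mean. A minimal sketch of that formula for dataset I, again assuming the hard-coded values (Anscombe's dataset I as commonly published) match the exercise's CSV:

```python
import math

# Assumed values: Anscombe's dataset I as commonly published.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and of cross-products of deviations
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r = {r:.6f}")
```

The divisor (n or n - 1) cancels between numerator and denominator, so r does not depend on which variance convention is used.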

Compute the linear regression line: y=β0+β1x+ϵ (hint: use statsmodels and look at the Statsmodels notebook)

import numpy as np
import statsmodels.formula.api as smf

def root_sse(y, yhat):
    # Square root of the summed squared residuals. This is not the RMSE:
    # the RMSE would use np.mean instead of np.sum.
    return np.sum((y - yhat)**2)**0.5

def ols_by_dataset(anascombe, dataset):
    print("For dataset {}:".format(dataset))
    is_dataset = anascombe["dataset"] == dataset
    subset = anascombe[is_dataset].reset_index(drop=True)
    lin_model = smf.ols("y ~ x", subset).fit()
    # Look up coefficients by name: "Intercept" is β0 and "x" is the slope β1.
    intercept, slope = lin_model.params["Intercept"], lin_model.params["x"]
    print("y = {}x + {}".format(slope, intercept))
    preds = lin_model.predict(subset[["x"]])
    print('The root of the sum of squared errors is {}\n'.format(root_sse(subset['y'], preds)))

ols_by_dataset(anascombe, 'I')
ols_by_dataset(anascombe, 'II')
ols_by_dataset(anascombe, 'III')
ols_by_dataset(anascombe, 'IV')

Output:

For dataset I:
y = 0.500090909090909x + 3.000090909090909
The root of the sum of squared errors is 3.7098099681789622

For dataset II:
y = 0.5x + 3.0009090909090905
The root of the sum of squared errors is 3.711642616024731

For dataset III:
y = 0.4997272727272728x + 3.002454545454545
The root of the sum of squared errors is 3.708934054169988

For dataset IV:
y = 0.49990909090909075x + 3.0017272727272735
The root of the sum of squared errors is 3.7070864570441304
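
For a single predictor, the least-squares estimates of y = β0 + β1x have a closed form: β1 = S_xy / S_xx and β0 = ȳ - β1·x̄. The sketch below applies it to dataset I without statsmodels, assuming the hard-coded values (Anscombe's dataset I as commonly published) match the exercise's CSV. It also computes both the square root of the summed squared residuals and the root-mean-square error, which differ by a factor of sqrt(n).

```python
import math

# Assumed values: Anscombe's dataset I as commonly published.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

s_xx = sum((a - mean_x) ** 2 for a in x)
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

slope = s_xy / s_xx                    # β1
intercept = mean_y - slope * mean_x    # β0

# Residuals of the fitted line and two summaries of their size
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
root_sse = math.sqrt(sse)      # square root of the summed squared residuals
rmse = math.sqrt(sse / n)      # root-mean-square error

print(f"y = {intercept:.6f} + {slope:.6f}x")
print(f"sqrt(SSE) = {root_sse:.6f}, RMSE = {rmse:.6f}")
```

The slope is close to 0.5 and the intercept close to 3.0, which is the fit all four datasets share to about two decimal places.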

Part 2

Using Seaborn, visualize all four datasets.

hint: use sns.FacetGrid combined with plt.scatter

import matplotlib.pyplot as plt
import seaborn as sns
g = sns.FacetGrid(anascombe, col="dataset")
g.map(plt.scatter, "x", "y")
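
The scatter plots make the point of the exercise visible: nearly identical summary statistics can hide very different shapes. As an optional variation on the hinted approach, `sns.lmplot` (a different seaborn function from the `FacetGrid` plus `plt.scatter` combination) overlays a fitted line on each panel. The sketch below is self-contained and uses only datasets I and II, with values hard-coded as commonly published for Anscombe's quartet; that they match the exercise's CSV is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import pandas as pd
import seaborn as sns

# Assumed values: Anscombe's datasets I and II as commonly published.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

df = pd.DataFrame({
    "dataset": ["I"] * len(x) + ["II"] * len(x),
    "x": x + x,
    "y": y1 + y2,
})

# One panel per dataset, each with a scatter plot and a least-squares line;
# ci=None turns off the confidence band.
g = sns.lmplot(data=df, x="x", y="y", col="dataset", ci=None)
```

With the full `anascombe` DataFrame in place of `df`, the same call would draw all four panels.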
