pandas

Latest recommended article published 2022-01-08 07:55:00

JairZhu

Column: Coursework · Tags: Coursework

Article link: https://blog.csdn.net/jairzhu/article/details/80616213

Part 1

For each of the four datasets...

Compute the mean and variance of both x and y
Compute the correlation coefficient between x and y
Compute the linear regression line: $y = \beta_0 + \beta_1 x + \epsilon$ (hint: use statsmodels and look at the Statsmodels notebook)
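For a single predictor, the least-squares estimates of the regression line have a closed form. It also ties the slope to the correlation coefficient from the previous item. Here $r_{xy}$ is the sample correlation, and $s_x$, $s_y$ are the sample standard deviations:

$$\hat\beta_1 = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2} = r_{xy}\,\frac{s_y}{s_x}, \qquad \hat\beta_0 = \bar y - \hat\beta_1\,\bar x$$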

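import pandas as pd
from statsmodels.formula.api import ols

# Assumes anscombe.csv has the columns dataset (I-IV), x and y
anscombe = pd.read_csv('anscombe.csv')

# Compute every statistic separately for each of the four datasets
for name, group in anscombe.groupby('dataset'):
    print('dataset', name)
    print('the mean of x is', group['x'].mean())
    print('the variance of x is', group['x'].var())  # sample variance (ddof=1)
    print('the mean of y is', group['y'].mean())
    print('the variance of y is', group['y'].var())
    # Pearson correlation of the two numeric columns only (dataset is text)
    print('the correlation coefficient between x and y is', group['x'].corr(group['y']))

    # Regress y on x, i.e. fit y = b0 + b1*x + e
    model = ols('y ~ x', data=group).fit()
    print(model.summary())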
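The statsmodels results can be cross-checked without any third-party packages. This is a minimal sketch that applies the closed-form least-squares formulas with Python's `statistics` module. It assumes the standard published values of Anscombe's quartet rather than the actual contents of `anscombe.csv`, which the original does not show. `QUARTET` and `summarize` are hypothetical names used only for illustration.

```python
from statistics import mean, variance

# Assumed data: the standard published values of Anscombe's quartet.
# Datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Hypothetical helper: means, sample variances, Pearson r and OLS line."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)  # sample variance (n - 1), like pandas .var()
    # Sample covariance of x and y
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    r = sxy / (vx * vy) ** 0.5  # correlation = cov / (s_x * s_y)
    b1 = sxy / vx               # slope = cov(x, y) / var(x)
    b0 = my - b1 * mx           # intercept = mean(y) - slope * mean(x)
    return mx, vx, my, vy, r, b0, b1


results = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}
for name, (mx, vx, my, vy, r, b0, b1) in results.items():
    print(f"{name:>3}: mean x={mx:.2f} var x={vx:.2f} mean y={my:.2f} "
          f"var y={vy:.2f} r={r:.3f} line y={b0:.2f}+{b1:.3f}x")
```

With these values, every dataset comes out with mean x = 9, variance of x = 11, mean y ≈ 7.50, r ≈ 0.816, and a line close to y = 3.00 + 0.500x. That near-identity of the summary statistics is why the datasets also need to be plotted.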

Part 2

Using Seaborn, visualize all four datasets.

hint: use sns.FacetGrid combined with plt.scatter

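import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Assumes anscombe.csv has the columns dataset (I-IV), x and y
anscombe = pd.read_csv('anscombe.csv')

# One scatter panel per dataset, arranged in a 2x2 grid
g = sns.FacetGrid(anscombe, col='dataset', col_wrap=2)
g.map(plt.scatter, 'x', 'y')
g.set_axis_labels('x', 'y')
plt.show()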
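The four datasets share almost the same regression line, so it can help to draw that line on each panel. This sketch swaps in seaborn's `lmplot`, which fits and draws a least-squares line per facet, in place of the `FacetGrid` + `plt.scatter` combination from the hint. So the snippet runs on its own, the data are typed in from the standard published quartet values instead of read from `anscombe.csv`. That substitution is an assumption.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Assumed data: standard published Anscombe's quartet values in long format.
# In practice this would come from pd.read_csv('anscombe.csv').
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X = {"I": X123, "II": X123, "III": X123, "IV": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]}
Y = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}
anscombe = pd.DataFrame(
    [(name, xi, yi) for name in Y for xi, yi in zip(X[name], Y[name])],
    columns=["dataset", "x", "y"],
)

# One scatter panel per dataset plus its own least-squares line;
# ci=None hides the confidence band around the line.
g = sns.lmplot(data=anscombe, x="x", y="y", col="dataset", col_wrap=2, ci=None)
plt.show()
```

With the lines drawn in, the panels show very different shapes behind nearly the same fit: a noisy linear trend, a curve, a line with one outlier, and a single high-leverage point.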