Jupyter Assignment

qq_36325159

Category: python作业 (Python assignments)

Article link: https://blog.csdn.net/qq_36325159/article/details/80658595

Anscombe's quartet

Anscombe's quartet comprises four datasets and is rather famous. Why? You'll find out in this exercise.

All modules:

%matplotlib inline
import random
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf

sns.set_context("talk")

Part 1

For each of the four datasets...

Compute the mean and variance of both x and y
Compute the correlation coefficient between x and y
Compute the linear regression line: $y = \beta_0 + \beta_1 x + \epsilon$ (hint: use statsmodels and look at the Statsmodels notebook)

Computing the means:

anscombe=pd.read_csv('data/anscombe.csv')
print('mean of x:')
print(anscombe.groupby("dataset").x.mean(),'\n')
print('mean of y:')
print(anscombe.groupby("dataset").y.mean(),'\n')

Result: [output image]

Computing the variances:

print('variance of x:')
print(anscombe.groupby("dataset").x.var(),'\n')
print('variance of y:')
print(anscombe.groupby("dataset").y.var(),'\n')

Result: [output image]
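
The per-dataset means and variances can also be collected into one table with a single `agg` call. The sketch below is only an illustration: it embeds the published Anscombe values instead of reading `data/anscombe.csv`, and it assumes the CSV holds the same numbers in long format with `dataset`, `x`, and `y` columns. The labels `I` to `IV` and the name `summary` are assumptions, not taken from the original notebook.

```python
import pandas as pd

# Published Anscombe values, embedded so the sketch runs without data/anscombe.csv.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared x for datasets I-III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]         # x for dataset IV
ys = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}

# Build one long-format frame; the labels "I".."IV" are assumed, not read from the CSV.
frames = [
    pd.DataFrame({"dataset": name, "x": x4 if name == "IV" else x123, "y": y})
    for name, y in ys.items()
]
anscombe = pd.concat(frames, ignore_index=True)

# One row per dataset: mean and sample variance (ddof=1) of x and y.
summary = anscombe.groupby("dataset")[["x", "y"]].agg(["mean", "var"])
print(summary.round(3))
```

Seeing the four rows side by side makes the point of the exercise visible early: the summary statistics of the four datasets are nearly identical.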

Correlation coefficient:

print('correlation coefficient between x and y:')
print(anscombe.groupby("dataset").x.corr(anscombe.y))
# print(anscombe.groupby("dataset").y.corr(anscombe.x))  # gives the same result as above

Result: [output image]
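
The correlation coefficient asked for in Part 1 is Pearson's r, the covariance of x and y divided by the product of their standard deviations. As a cross-check on the pandas call, here is a minimal NumPy sketch of that formula. It assumes the published Anscombe values (embedded below rather than read from `data/anscombe.csv`); the helper `pearson_r` and the labels `I` to `IV` are hypothetical names introduced for illustration.

```python
import numpy as np

# Published Anscombe values, embedded so the sketch runs without the CSV file.
x123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
datasets = {
    "I": (x123, np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])),
    "II": (x123, np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])),
    "III": (x123, np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])),
    "IV": (x4, np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])),
}


def pearson_r(x, y):
    """Pearson correlation from centered sums (hypothetical helper)."""
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


correlations = {name: pearson_r(x, y) for name, (x, y) in datasets.items()}
for name, r in correlations.items():
    print(f"dataset {name}: r = {r:.3f}")
```

The formula is symmetric in x and y, which is why swapping the two series in the pandas call gives the same result.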

Linear regression equation:

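def regression(X,Y,num):
    print("dataset "+str(num)+':')
    # add an intercept column named 'const' next to 'x'
    X=sm.add_constant(X)
    est=sm.OLS(Y,X)
    est=est.fit()
    # est.params is a labelled Series: 'const' is the intercept, 'x' is the slope
    print('y='+str(est.params['x'])+'x+'+str(est.params['const']))
    # evaluate the fitted line over the observed range of x
    x=np.linspace(X.x.min(), X.x.max(),100)
    y=est.params['x']*x+est.params['const']
    plt.figure()
    plt.scatter(X.x, Y, alpha=0.3)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.plot(x,y,color='r')

# each dataset occupies 11 consecutive rows of the CSV
for i in range(4):
    regression(anscombe[i*11:(i+1)*11].x,anscombe[i*11:(i+1)*11].y,i+1)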

Results and fitted lines:

dataset 1:
y=0.5000909090909089x+3.0000909090909085

dataset 2:
y=0.4999999999999999x+3.000909090909091

dataset 3:
y=0.4997272727272726x+3.002454545454545

dataset 4:
y=0.49990909090909114x+3.0017272727272735
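
For a single predictor, the least-squares coefficients have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept is mean(y) minus slope times mean(x). The following NumPy sketch applies that formula to dataset 1 as a sanity check on the statsmodels fit. It assumes the published Anscombe values for dataset 1 (embedded here rather than read from `data/anscombe.csv`); the variable names are illustrative.

```python
import numpy as np

# Dataset 1 of Anscombe's quartet (published values), embedded so the sketch
# runs without the CSV file.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# Closed-form simple linear regression for y = b0 + b1*x:
#   b1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)**2)
#   b0 = mean_y - b1 * mean_x
dx = x - x.mean()
slope = (dx * (y - y.mean())).sum() / (dx ** 2).sum()
intercept = y.mean() - slope * x.mean()
print(f"y={slope:.4f}x+{intercept:.4f}")
```

Up to floating-point rounding, these coefficients agree with the values printed for dataset 1 above.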

Part 2

Using Seaborn, visualize all four datasets.

hint: use sns.FacetGrid combined with plt.scatter

Code:

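def visualize(datasetx,y):
    # FacetGrid creates its own figure, so a separate plt.figure() would only add an empty one
    sns.FacetGrid(datasetx)
    plt.scatter(datasetx.x,y)

# each dataset occupies 11 consecutive rows of the CSV
for i in range(4):
    visualize(anscombe[i*11:(i+1)*11],anscombe[i*11:(i+1)*11].y)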

Results:

dataset1:

dataset2:

dataset3:

dataset4:
