[Advanced Programming Techniques Assignment - Week 14] IPython notebooks, Pandas, Statsmodels

Latest recommended article published on 2018-06-17 11:50:14

ZacharyLei

Category: Advanced Programming Techniques. Tags: Python

Article link: https://blog.csdn.net/ZacharyLei/article/details/80650830

Anscombe's quartet

Anscombe's quartet comprises four datasets and is rather famous. Why? You'll find out in this exercise.

	dataset	x	y
0	I	10	8.04
1	I	8	6.95
2	I	13	7.58
3	I	9	8.81
4	I	11	8.33

Part 1

For each of the four datasets...

Compute the mean and variance of both x and y
Compute the correlation coefficient between x and y
Compute the linear regression line: y = β₀ + β₁x + ϵ (hint: use statsmodels and look at the Statsmodels notebook)
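
As a cross-check on the three Part 1 quantities, here is a minimal sketch that computes them for dataset I using only the standard library. It assumes the closed-form least-squares solution for a single predictor (slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)). The first five rows are the ones shown in the table above; the remaining six are taken from the published quartet, on the assumption that anscombe.csv holds the same values.

```python
from statistics import mean, variance

# Dataset I of Anscombe's quartet (assumed to match the rows in anscombe.csv)
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

x_bar, y_bar = mean(x), mean(y)
x_var, y_var = variance(x), variance(y)  # sample variance (n - 1 denominator)

# Sample covariance, using the same n - 1 denominator
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)

r = cov_xy / (x_var * y_var) ** 0.5  # Pearson correlation coefficient
beta_1 = cov_xy / x_var              # least-squares slope
beta_0 = y_bar - beta_1 * x_bar      # least-squares intercept

print(f"mean x = {x_bar:.3f}, var x = {x_var:.3f}")
print(f"mean y = {y_bar:.3f}, var y = {y_var:.3f}")
print(f"corr(x, y) = {r:.3f}")
print(f"y = {beta_0:.3f} + {beta_1:.3f}x")
```

The sample variance here uses the same n - 1 denominator as the pandas default for var(), so the numbers should agree with the pandas and statsmodels results to within rounding.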

Part 2

Using Seaborn, visualize all four datasets.

hint: use sns.FacetGrid combined with plt.scatter

Code:

import random

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
import statsmodels.formula.api as smf

sns.set_context("talk")

anascombe = pd.read_csv('anscombe.csv')
anascombe.head()

#1-1 Compute the mean and variance of both x and y 
x_mean = anascombe.groupby('dataset')['x'].mean()
print('mean of x', x_mean)
y_mean = anascombe.groupby('dataset')['y'].mean()
print('mean of y', y_mean)

x_var = anascombe.groupby('dataset')['x'].var()
print('variance of x', x_var)
y_var = anascombe.groupby('dataset')['y'].var()
print('variance of y', y_var)

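#1-2 Compute the correlation coefficient between x and y
# Each group's x is aligned with the full y column by row index, so only that group's rows are compared
cor = anascombe.groupby("dataset")['x'].corr(anascombe['y'])
print('correlation coefficient between x and y', cor)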

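#1-3 Compute the linear regression line: y=beta_0+beta_1*x+epsilon
# Iterate over the groups by label instead of assuming 11 consecutive rows per dataset
for name, group in anascombe.groupby('dataset'):
    X = sm.add_constant(group['x'])  # adds the intercept column 'const'
    Y = group['y']
    reg_func = sm.OLS(Y, X).fit()
    # params is a labelled Series, so read the coefficients by label rather than by position
    print('dataset '+str(name), "y = "+str(reg_func.params['const'])+"+"+str(reg_func.params['x'])+"x")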

#2 Using Seaborn, visualize all four datasets
m = sns.FacetGrid(anascombe, col="dataset")    
m.map(plt.scatter, "x","y")
plt.show()
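
The per-dataset statistics from Part 1 can also be collected into one table, which makes it easier to compare datasets side by side. The following is only a sketch: it builds a small DataFrame inline for datasets I and II (values taken from the published quartet, assumed to match anscombe.csv) instead of reading the file, and the column names x_mean, x_var, y_mean, y_var and corr_xy are illustrative choices, not part of the assignment.

```python
import pandas as pd

# Datasets I and II share the same x values (assumed to match anscombe.csv)
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

df = pd.DataFrame({
    "dataset": ["I"] * 11 + ["II"] * 11,
    "x": x + x,
    "y": y1 + y2,
})

# Named aggregation: one row per dataset, one column per statistic
summary = df.groupby("dataset").agg(
    x_mean=("x", "mean"),
    x_var=("x", "var"),
    y_mean=("y", "mean"),
    y_var=("y", "var"),
)

# Pearson correlation between x and y, computed within each group
summary["corr_xy"] = df.groupby("dataset")[["x", "y"]].apply(
    lambda g: g["x"].corr(g["y"])
)

print(summary.round(3))
```

With the real file, the same two groupby calls should work on the DataFrame returned by pd.read_csv('anscombe.csv'), since it has the same dataset, x and y columns. The rows of the table come out almost identical, while the scatter plots from Part 2 look quite different.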
