Exercise: Pandas and Statsmodels

仗剑逐风

Article link: https://blog.csdn.net/weixin_40029849/article/details/80666794

Part 1

For each of the four datasets...

Compute the mean and variance of both x and y
Compute the correlation coefficient between x and y
Compute the linear regression line: y=β0+β1x+ϵ (hint: use statsmodels and look at the Statsmodels notebook)

import random

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
import statsmodels.formula.api as smf
sns.set_context("talk")

df = pd.read_csv('anscombe.csv')
#part1
x_mean = df.groupby('dataset')['x'].mean()
x_var = df.groupby('dataset')['x'].var()

y_mean = df.groupby('dataset')['y'].mean()
y_var = df.groupby('dataset')['y'].var()

print( x_mean , x_var )
print( round(y_mean,3) , round(y_var,3) )

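#correlation coefficient between x and y within each dataset
#(each group's x is aligned with df['y'] on the row index, so only that group's y values are used)
c = df.groupby('dataset')['x'].corr(df['y'])
print("the correlation coefficient is :" , c )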
x = df['x']
y = df['y']
data = df['dataset']

#OLS: fit y = b0 + b1*x separately for each dataset
for i, flag in enumerate(['I', 'II', 'III', 'IV']):
    #select the rows that belong to the current dataset
    subset = df[df['dataset'] == flag]

    #add_constant prepends the intercept column, so params[0] is b0 and params[1] is b1
    X = sm.add_constant(subset['x'].to_numpy())
    Y = subset['y'].to_numpy()
    reg_func = sm.OLS(Y, X).fit()
    print('dataset '+str(i+1), "y = "+str(reg_func.params[0])+"+"+str(reg_func.params[1])+"x")
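
As a cross-check on the Part 1 code, the same three quantities (mean/variance, correlation, regression line) can be sketched more compactly with groupby and the statsmodels formula interface. This is only a sketch under assumptions: the `toy` frame and its group names `A` and `B` are made up for illustration and are not the Anscombe values, so the numbers it prints say nothing about anscombe.csv.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data (NOT the Anscombe values): group A lies on y = 1 + 2x,
# group B lies on y = 5 - x, so the expected fits are easy to verify by hand.
toy = pd.DataFrame({
    "dataset": ["A"] * 4 + ["B"] * 4,
    "x": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "y": [3.0, 5.0, 7.0, 9.0, 4.0, 3.0, 2.0, 1.0],
})

# Mean and sample variance of x and y per group.
# pandas' var() uses ddof=1 by default, unlike numpy's var() (ddof=0).
summary = toy.groupby("dataset")[["x", "y"]].agg(["mean", "var"])

# Pearson correlation between x and y inside each group.
corr = {name: group["x"].corr(group["y"])
        for name, group in toy.groupby("dataset")}

# One OLS fit per group; the formula "y ~ x" adds the intercept automatically,
# so no explicit add_constant call is needed.
fits = {name: smf.ols("y ~ x", data=group).fit().params
        for name, group in toy.groupby("dataset")}

print(summary)
for name, params in fits.items():
    b0, b1 = params["Intercept"], params["x"]
    print(f"dataset {name}: r = {corr[name]:.3f}, y = {b0:.3f} + {b1:.3f}x")
```

With the real data, replacing `toy` by the `df` loaded from anscombe.csv should give one fit per dataset label. The formula interface returns parameters labelled `Intercept` and `x`, which avoids relying on positional indexing into `params`.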

 
   
Part 2

Using Seaborn, visualize all four datasets.
hint: use sns.FacetGrid combined with plt.scatter

Building on Part 1, we add the following code:
#part 2
#one scatter panel per dataset, laid out side by side
g = sns.FacetGrid(df, col="dataset")
g.map(plt.scatter, "x", "y")
plt.show()
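
To connect the scatter panels with the regression lines from Part 1, one possible variant is seaborn's `lmplot`, which draws the points and a fitted line in each panel. The sketch below is not part of the original solution: the `toy` frame and the output file name `toy_lmplot.png` are hypothetical, and the `Agg` backend is chosen only so that it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view the figure interactively
import pandas as pd
import seaborn as sns

# Hypothetical toy data (NOT the Anscombe values): two noisy linear groups.
toy = pd.DataFrame({
    "dataset": ["A"] * 5 + ["B"] * 5,
    "x": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "y": [2.1, 3.9, 6.2, 7.8, 10.1, 5.0, 4.2, 2.9, 2.1, 0.8],
})

# One panel per dataset: scatter points plus a fitted regression line.
# ci=None hides the confidence band so only the line itself is drawn.
g = sns.lmplot(data=toy, x="x", y="y", col="dataset", ci=None)

# lmplot returns a FacetGrid, the same kind of object built by sns.FacetGrid.
g.savefig("toy_lmplot.png")
```

Passing the `df` loaded from anscombe.csv in place of `toy` should produce four panels, one per dataset label.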
