6.3 python ipython

在ipython notebook上完成

%matplotlib inline

import random

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
import statsmodels.formula.api as smf

sns.set_context("talk")


Anscombe's quartet

Anscombe's quartet comprises of four datasets, and is rather famous. Why? You'll find out in this exercise.


解释:Anscombe四重奏,用4组数据说明了画图的重要性。这四组数据均值,方差均相等,这将导致线性回归的结果与图像完全不符。


Part 1

For each of the four datasets...

  • Compute the mean and variance of both x and y
  • Compute the correlation coefficient between x and y
  • Compute the linear regression line: y=β0+β1x+ϵy=β0+β1x+ϵ (hint: use statsmodels and look at the Statsmodels notebook)

均值和方差:

调用groupby函数聚类,然后调用mean和var函数对每组的x和y求均值和方差

相关性:

没找到好的函数,因此用了循环,获取每组的10个数据,之后对每组求corr。

(这里不知道为什么不能只用groupby函数,可能这个函数返回的对象不是原来的数据吧)

线性回归:

调用statsmodels.api中的OLS函数,但是没找到如何单个输出所求的东西的方法。

print("每组x的均值")
print(anascombe.groupby('dataset')['x'].mean())
print("\n每组x的方差")
print(anascombe.groupby('dataset')['x'].var())
print("\n每组y的均值")
print(anascombe.groupby('dataset')['y'].mean())
print("\n每组y的方差")
print(anascombe.groupby('dataset')['y'].var())


print("\n相关性")
for i in range(4):
    x = anascombe.x[i*10:(i+1)*10]
    y = anascombe.y[i*10:(i+1)*10]
    corrlation = x.corr(y)
    print("corrlation of group", i, ':', corrlation)
    print()

    
print("\n线性回归")
for i in range(4):
    x = anascombe.x[i*10:(i+1)*10]
    y = anascombe.y[i*10:(i+1)*10]
    mod = sm.OLS(y,x)
    result = mod.fit()
    print(result.summary())

结果如下

 
 
每组x的均值
dataset
I      9.0
II     9.0
III    9.0
IV     9.0
Name: x, dtype: float64

每组x的方差
dataset
I      11.0
II     11.0
III    11.0
IV     11.0
Name: x, dtype: float64

每组y的均值
dataset
I      7.500909
II     7.500909
III    7.500000
IV     7.500909
Name: y, dtype: float64

每组y的方差
dataset
I      4.127269
II     4.127629
III    4.122620
IV     4.123249
Name: y, dtype: float64

相关性
corrlation of group: 0 0.797081575906253

corrlation of group: 1 0.8107567988514719

corrlation of group: 2 0.828558301914895

corrlation of group: 3 0.4695259621639301


线性回归
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.965
Model:                            OLS   Adj. R-squared:                  0.962
Method:                 Least Squares   F-statistic:                     251.5
Date:                Sat, 09 Jun 2018   Prob (F-statistic):           6.95e-08
Time:                        17:13:54   Log-Likelihood:                -18.061
No. Observations:                  10   AIC:                             38.12
Df Residuals:                       9   BIC:                             38.43
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x              0.7881      0.050     15.859      0.000       0.676       0.901
==============================================================================
Omnibus:                        0.651   Durbin-Watson:                   2.507
Prob(Omnibus):                  0.722   Jarque-Bera (JB):                0.396
Skew:                          -0.424   Prob(JB):                        0.820
Kurtosis:                       2.519   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.961
Model:                            OLS   Adj. R-squared:                  0.957
Method:                 Least Squares   F-statistic:                     221.7
Date:                Sat, 09 Jun 2018   Prob (F-statistic):           1.20e-07
Time:                        17:13:54   Log-Likelihood:                -18.584
No. Observations:                  10   AIC:                             39.17
Df Residuals:                       9   BIC:                             39.47
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x              0.7894      0.053     14.889      0.000       0.669       0.909
==============================================================================
Omnibus:                        3.223   Durbin-Watson:                   2.351
Prob(Omnibus):                  0.200   Jarque-Bera (JB):                1.584
Skew:                          -0.969   Prob(JB):                        0.453
Kurtosis:                       2.795   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.963
Model:                            OLS   Adj. R-squared:                  0.959
Method:                 Least Squares   F-statistic:                     235.0
Date:                Sat, 09 Jun 2018   Prob (F-statistic):           9.34e-08
Time:                        17:13:54   Log-Likelihood:                -18.117
No. Observations:                  10   AIC:                             38.23
Df Residuals:                       9   BIC:                             38.54
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x              0.8175      0.053     15.329      0.000       0.697       0.938
==============================================================================
Omnibus:                        0.753   Durbin-Watson:                   1.401
Prob(Omnibus):                  0.686   Jarque-Bera (JB):                0.590
Skew:                          -0.489   Prob(JB):                        0.745
Kurtosis:                       2.323   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.964
Model:                            OLS   Adj. R-squared:                  0.960
Method:                 Least Squares   F-statistic:                     243.1
Date:                Sat, 09 Jun 2018   Prob (F-statistic):           8.06e-08
Time:                        17:13:54   Log-Likelihood:                -17.121
No. Observations:                  10   AIC:                             36.24
Df Residuals:                       9   BIC:                             36.54
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x              0.8537      0.055     15.591      0.000       0.730       0.978
==============================================================================
Omnibus:                        1.048   Durbin-Watson:                   1.199
Prob(Omnibus):                  0.592   Jarque-Bera (JB):                0.714
Skew:                          -0.287   Prob(JB):                        0.700
Kurtosis:                       1.823   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


可以看到均值,方差,相关性都相等。线性回归的结果也相同。从Part2的练习就可以看到图表的重要性了。

Part 2

Using Seaborn, visualize all four datasets.

hint: use sns.FacetGrid combined with plt.scatter

如hint所说的,调用这两个函数即可。但是注意第一个形成Grid的时候需要使用dataset为(标签?)


g = sns.FacetGrid(anascombe, col = 'dataset')
g_map = g.map(plt.scatter, 'x', 'y')

结果如下



  • 1
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
编译原理是计算机专业的一门核心课程,旨在介绍编译程序构造的一般原理和基本方法。编译原理不仅是计算机科学理论的重要组成部分,也是实现高效、可靠的计算机程序设计的关键。本文将对编译原理的基本概念、发展历程、主要内容和实际应用进行详细介绍编译原理是计算机专业的一门核心课程,旨在介绍编译程序构造的一般原理和基本方法。编译原理不仅是计算机科学理论的重要组成部分,也是实现高效、可靠的计算机程序设计的关键。本文将对编译原理的基本概念、发展历程、主要内容和实际应用进行详细介绍编译原理是计算机专业的一门核心课程,旨在介绍编译程序构造的一般原理和基本方法。编译原理不仅是计算机科学理论的重要组成部分,也是实现高效、可靠的计算机程序设计的关键。本文将对编译原理的基本概念、发展历程、主要内容和实际应用进行详细介绍编译原理是计算机专业的一门核心课程,旨在介绍编译程序构造的一般原理和基本方法。编译原理不仅是计算机科学理论的重要组成部分,也是实现高效、可靠的计算机程序设计的关键。本文将对编译原理的基本概念、发展历程、主要内容和实际应用进行详细介绍编译原理是计算机专业的一门核心课程,旨在介绍编译程序构造的一般原理和基本

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值