Jupyter Exercise

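The anscombe data contains four datasets.

Part 1 asks for three things per dataset: the mean of x and y, the correlation coefficient between x and y, and a linear regression of y on x.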

import random

import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
import statsmodels.formula.api as smf

sns.set_context('talk')

anacombe = pd.read_csv('D:/python_project/data/anscombe.csv')
print(anacombe.head())
print(anacombe.groupby('dataset')['x'].mean())    # group by 'dataset', then take the mean of x in each group
print(anacombe.groupby('dataset')['y'].mean())

Output:

  dataset     x     y
0       I  10.0  8.04
1       I   8.0  6.95
2       I  13.0  7.58
3       I   9.0  8.81
4       I  11.0  8.33
dataset
I      9.0
II     9.0
III    9.0
IV     9.0
Name: x, dtype: float64
dataset
I      7.500909
II     7.500909
III    7.500000
IV     7.500909
Name: y, dtype: float64

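To make the grouping step concrete, here is a minimal sketch that reproduces the per-dataset means with only the standard library. It assumes the CSV holds the standard published Anscombe values (the first five rows printed above match them); `QUARTET`, `rows`, and `group_means` are names invented for this illustration and are not part of the original notebook.

```python
from collections import defaultdict

# Standard published Anscombe values (assumed to match anscombe.csv).
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Long-format rows (dataset, x, y), the same layout as the CSV.
rows = [(name, x, y) for name, (xs, ys) in QUARTET.items() for x, y in zip(xs, ys)]


def group_means(records):
    """Return {dataset: (mean_x, mean_y)}, mimicking groupby('dataset').mean()."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # running sum_x, sum_y, count
    for name, x, y in records:
        acc = totals[name]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {name: (sx / n, sy / n) for name, (sx, sy, n) in totals.items()}


means = group_means(rows)
for name, (mean_x, mean_y) in means.items():
    print(f"{name:<5}{mean_x:.1f}  {mean_y:.6f}")
```

With pandas available, both columns can probably be handled in a single call by selecting `[['x', 'y']]` after the `groupby` instead of calling `mean()` twice.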

Compute the correlation coefficient for each dataset:

print(anacombe.groupby('dataset').corr())

Output:

                  x         y
dataset                      
I       x  1.000000  0.816421
        y  0.816421  1.000000
II      x  1.000000  0.816237
        y  0.816237  1.000000
III     x  1.000000  0.816287
        y  0.816287  1.000000
IV      x  1.000000  0.816521
        y  0.816521  1.000000
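The off-diagonal entries above are Pearson correlation coefficients, r = Sxy / sqrt(Sxx * Syy), where Sxy, Sxx, and Syy are sums of products of deviations from the means. The sketch below evaluates that formula directly as a cross-check. It assumes the standard published Anscombe values rather than reading the CSV, and `pearson_r` and `QUARTET` are illustrative names, not something from the original code.

```python
import math

# Standard published Anscombe values (assumed to match anscombe.csv).
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


correlations = {name: pearson_r(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, r in correlations.items():
    print(f"{name:<5}{r:.6f}")
```

If the assumed data matches the file, the printed values should agree with the table above up to rounding.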


Linear regression

dataset_1 = anacombe['dataset'] == 'I'
dataset_1 = anacombe[dataset_1].reset_index(drop=True)  # keep only the rows where dataset is I
print(dataset_1)
lin_model = smf.ols('y ~ x', dataset_1).fit()
print(lin_model.summary())

Output:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.667
Model:                            OLS   Adj. R-squared:                  0.629
Method:                 Least Squares   F-statistic:                     17.99
Date:                Mon, 11 Jun 2018   Prob (F-statistic):            0.00217
Time:                        22:17:06   Log-Likelihood:                -16.841
No. Observations:                  11   AIC:                             37.68
Df Residuals:                       9   BIC:                             38.48
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.0001      1.125      2.667      0.026       0.456       5.544
x              0.5001      0.118      4.241      0.002       0.233       0.767
==============================================================================
Omnibus:                        0.082   Durbin-Watson:                   3.212
Prob(Omnibus):                  0.960   Jarque-Bera (JB):                0.289
Skew:                          -0.122   Prob(JB):                        0.865
Kurtosis:                       2.244   Cond. No.                         29.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

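For a one-predictor model, the numbers in the coefficient table can be checked by hand: slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x), R-squared = 1 - SSE / Syy, and the standard errors follow from the residual variance SSE / (n - 2). The sketch below is a hand-rolled check for dataset I, not how statsmodels computes its results internally. It assumes the standard published values for dataset I, and `simple_ols` is a hypothetical helper name.

```python
import math

# Dataset I, standard published Anscombe values (assumed to match the CSV).
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def simple_ols(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Sum of squared residuals and the unbiased residual variance (n - 2 df).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = sse / (n - 2)

    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1 / n + mean_x ** 2 / sxx))
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": 1 - sse / syy,
        "se_intercept": se_intercept,
        "se_slope": se_slope,
        "t_slope": slope / se_slope,
    }


fit = simple_ols(xs, ys)
for key, value in fit.items():
    print(f"{key:<13}{value:.4f}")
```

The results should line up with the `coef`, `std err`, `t`, and `R-squared` entries in the summary above, up to rounding.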

Linear regression for dataset II:

dataset_2 = anacombe['dataset'] == 'II'
dataset_2 = anacombe[dataset_2].reset_index(drop=True)
lin_model = smf.ols('y ~ x', dataset_2).fit()
print(lin_model.summary())

Output:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.666
Model:                            OLS   Adj. R-squared:                  0.629
Method:                 Least Squares   F-statistic:                     17.97
Date:                Mon, 11 Jun 2018   Prob (F-statistic):            0.00218
Time:                        22:22:12   Log-Likelihood:                -16.846
No. Observations:                  11   AIC:                             37.69
Df Residuals:                       9   BIC:                             38.49
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.0009      1.125      2.667      0.026       0.455       5.547
x              0.5000      0.118      4.239      0.002       0.233       0.767
==============================================================================
Omnibus:                        1.594   Durbin-Watson:                   2.188
Prob(Omnibus):                  0.451   Jarque-Bera (JB):                1.108
Skew:                          -0.567   Prob(JB):                        0.575
Kurtosis:                       1.936   Cond. No.                         29.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.



Linear regression for dataset III:

dataset_3 = anacombe['dataset'] == 'III'
dataset_3 = anacombe[dataset_3].reset_index(drop=True)
lin_model = smf.ols('y ~ x', dataset_3).fit()
print(lin_model.summary())


Output:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.666
Model:                            OLS   Adj. R-squared:                  0.629
Method:                 Least Squares   F-statistic:                     17.97
Date:                Mon, 11 Jun 2018   Prob (F-statistic):            0.00218
Time:                        22:23:51   Log-Likelihood:                -16.838
No. Observations:                  11   AIC:                             37.68
Df Residuals:                       9   BIC:                             38.47
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.0025      1.124      2.670      0.026       0.459       5.546
x              0.4997      0.118      4.239      0.002       0.233       0.766
==============================================================================
Omnibus:                       19.540   Durbin-Watson:                   2.144
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               13.478
Skew:                           2.041   Prob(JB):                      0.00118
Kurtosis:                       6.571   Cond. No.                         29.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


Linear regression for dataset IV:

dataset_4 = anacombe['dataset'] == 'IV'
dataset_4 = anacombe[dataset_4].reset_index(drop=True)
lin_model = smf.ols('y ~ x', dataset_4).fit()
print(lin_model.summary())

Output:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.667
Model:                            OLS   Adj. R-squared:                  0.630
Method:                 Least Squares   F-statistic:                     18.00
Date:                Mon, 11 Jun 2018   Prob (F-statistic):            0.00216
Time:                        22:25:40   Log-Likelihood:                -16.833
No. Observations:                  11   AIC:                             37.67
Df Residuals:                       9   BIC:                             38.46
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.0017      1.124      2.671      0.026       0.459       5.544
x              0.4999      0.118      4.243      0.002       0.233       0.766
==============================================================================
Omnibus:                        0.555   Durbin-Watson:                   1.662
Prob(Omnibus):                  0.758   Jarque-Bera (JB):                0.524
Skew:                           0.010   Prob(JB):                        0.769
Kurtosis:                       1.931   Cond. No.                         29.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

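The four summaries report nearly the same intercept (about 3.00) and slope (about 0.50), yet the residual diagnostics differ: dataset III, for example, shows a Skew of 2.041. One way to see where that comes from is to fit all four datasets in a loop and look at the largest residual in each. This is a rough sketch under the assumption that the CSV holds the standard published Anscombe values; `fit_line`, `QUARTET`, and `summary` are illustrative names.

```python
# Standard published Anscombe values (assumed to match anscombe.csv).
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


summary = {}
for name, (xs, ys) in QUARTET.items():
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # The point whose residual has the largest absolute value.
    worst = max(zip(xs, ys, residuals), key=lambda point: abs(point[2]))
    summary[name] = (intercept, slope, worst)
    print(f"{name:<5}intercept={intercept:.4f}  slope={slope:.4f}  "
          f"largest residual={worst[2]:+.3f} at x={worst[0]}, y={worst[1]}")
```

Under the assumed data, the coefficients agree across all four fits, while dataset III carries one residual several times larger than the rest. That is the kind of difference the summary statistics alone do not reveal and the plots in part 2 make visible.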



Part 2: plotting

anacombe = pd.read_csv('D:/python_project/data/anscombe.csv')
g = sns.FacetGrid(anacombe, col='dataset')  # one panel (column) per value of 'dataset'
g = g.map(plt.scatter, 'x', 'y')
plt.show()

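Output:

[Figure not preserved: four scatter plots of y against x in a single row, one panel per dataset.]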