This was my first time using Jupyter, and it feels very powerful.
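At its core, Jupyter Notebook is a web application for creating and sharing literate-programming documents that combine live code, mathematical equations, visualizations, and Markdown. Typical uses include data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and more.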
Input:
import random
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
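# Load Anscombe's quartet: columns dataset (I-IV), x, y
anascombe = pd.read_csv('anscombe.csv')
# Per-dataset mean and sample variance of x and y
print(anascombe.groupby('dataset')['x'].mean())
print(anascombe.groupby('dataset')['y'].mean())
print(anascombe.groupby('dataset')['x'].var())
print(anascombe.groupby('dataset')['y'].var())
# Per-dataset correlation matrix of x and y
print(anascombe.groupby('dataset').corr())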
dataset_names = ['I', 'II', 'III', 'IV']
for i in dataset_names:
    subset = anascombe[anascombe.dataset == i]
    n = len(subset)
    # Randomly assign roughly 70% of the rows to the training set
    is_train = np.random.rand(n) < 0.7
    train = subset[is_train].reset_index(drop=True)
    test = subset[~is_train].reset_index(drop=True)  # held out; not used below
    # Fit an ordinary least squares model y ~ x on the training rows only
    lin_model = smf.ols('y ~ x', train).fit()
    print(lin_model.summary())

# One scatter-plot panel per dataset
g = sns.FacetGrid(anascombe, col='dataset')
g.map(plt.scatter, 'x', 'y')
plt.show()
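The regression summary in the output is fitted on a random ~70% training split (note "No. Observations: 8"), so its coefficients change from run to run. The sketch below is a cross-check. It embeds Anscombe's published quartet values directly, on the assumption that they match the contents of `anscombe.csv`, whose columns the code above implies are `dataset`, `x`, `y`. It then recomputes the per-dataset statistics, plus a least-squares line fitted on all points with `np.polyfit`, which is swapped in here for `smf.ols`. The names `x_123`, `x_4`, `ys` and `stats` are illustrative only.

```python
import numpy as np
import pandas as pd

# Anscombe's quartet (Anscombe, 1973), embedded so the sketch runs without
# anscombe.csv (assumed to hold the same values in columns dataset, x, y).
x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x for I, II, III
x_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]        # x for IV
ys = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV":  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}
frames = [
    pd.DataFrame({"dataset": name, "x": x_4 if name == "IV" else x_123, "y": y})
    for name, y in ys.items()
]
anscombe = pd.concat(frames, ignore_index=True)

# Same per-dataset statistics as the groupby calls above (var uses ddof=1).
stats = anscombe.groupby("dataset").agg(
    x_mean=("x", "mean"), y_mean=("y", "mean"),
    x_var=("x", "var"), y_var=("y", "var"),
)
# Pearson correlation between x and y within each dataset.
stats["corr"] = anscombe.groupby("dataset")[["x", "y"]].apply(
    lambda g: g["x"].corr(g["y"])
)

# Least-squares line on all rows of each dataset (no train/test split);
# np.polyfit with deg=1 returns [slope, intercept].
fits = {name: np.polyfit(g["x"], g["y"], 1) for name, g in anscombe.groupby("dataset")}
stats["slope"] = [fits[name][0] for name in stats.index]
stats["intercept"] = [fits[name][1] for name in stats.index]
print(stats.round(3))
```

Every row should show the same x mean (9) and variance (11). The y statistics, the correlation (about 0.816), the slope (about 0.5) and the intercept (about 3.0) agree to roughly two decimal places, yet the FacetGrid scatter plots look completely different. That contrast is the point of the quartet.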
Output:
dataset
I      9.0
II     9.0
III    9.0
IV     9.0
Name: x, dtype: float64
dataset
I      7.500909
II     7.500909
III    7.500000
IV     7.500909
Name: y, dtype: float64
dataset
I      11.0
II     11.0
III    11.0
IV     11.0
Name: x, dtype: float64
dataset
I      4.127269
II     4.127629
III    4.122620
IV     4.123249
Name: y, dtype: float64
                  x         y
dataset
I       x  1.000000  0.816421
        y  0.816421  1.000000
II      x  1.000000  0.816237
        y  0.816237  1.000000
III     x  1.000000  0.816287
        y  0.816287  1.000000
IV      x  1.000000  0.816521
        y  0.816521  1.000000
C:\Users\10617\AppData\Local\Programs\Python\Python36\lib\site-packages\scipy\stats\stats.py:1394: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=8
"anyway, n=%i" % int(n))
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.650
Model:                            OLS   Adj. R-squared:                  0.592
Method:                 Least Squares   F-statistic:                     11.15
Date:                Sun, 10 Jun 2018   Prob (F-statistic):             0.0156
Time:                        12:18:34   Log-Likelihood:                -12.931
No. Observations:                   8   AIC:                             29.86
Df Residuals:                       6   BIC:                             30.02
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.4459      1.497      1.634      0.153      -1.216       6.108
x              0.5464      0.164      3.339      0.016       0.146       0.947
==============================================================================
Omnibus:                        0.157   Durbin-Watson:                   3.211
Prob(Omnibus):                  0.925   Jarque-Bera (JB):                0.343
Skew:                          -0.096   Prob(JB):                        0.842
Kurtosis:                       2.004   Cond. No.                         27.8
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
C:\Users\10617\AppData\Local\Programs\Python\Python36\lib\sit