Assignment URL:
Code:
import statsmodels.formula.api as smf

# `anascombe` is assumed to be a pandas DataFrame that is already loaded,
# with the columns 'dataset' (I, II, III, IV), 'x' and 'y'.
# print(anascombe)

# Mean of x and y within each dataset
x_mean = anascombe.groupby('dataset')['x'].mean()
y_mean = anascombe.groupby('dataset')['y'].mean()
print('x_mean:', x_mean)
print('y_mean:', y_mean)

# Sample variance of x and y within each dataset
x_var = anascombe.groupby('dataset')['x'].var()
y_var = anascombe.groupby('dataset')['y'].var()
print('x_variance:', x_var)
print('y_variance:', y_var)

# Correlation between x and y within each dataset
corr_mat = anascombe.groupby('dataset').corr()
# print(corr_mat)
print('correlation coefficient:')
for key in ['I', 'II', 'III', 'IV']:
    print(key + ':', corr_mat.loc[(key, 'x'), 'y'])

# Fit y ~ x by ordinary least squares on all rows of each dataset
# (the original mask `np.random.rand(n) > -np.inf` was always True).
data_group = anascombe.groupby('dataset')
print('the linear regression:')
for key, group in data_group:
    lin_model = smf.ols('y ~ x', data=group).fit()
    print('dataset ' + str(key) + ':')
    print(lin_model.summary())
Results:
x_mean: dataset
I      9.0
II     9.0
III    9.0
IV     9.0
Name: x, dtype: float64
y_mean: dataset
I      7.500909
II     7.500909
III    7.500000
IV     7.500909
Name: y, dtype: float64
x_variance: dataset
I      11.0
II     11.0
III    11.0
IV     11.0
Name: x, dtype: float64
y_variance: dataset
I      4.127269
II     4.127629
III    4.122620
IV     4.123249
Name: y, dtype: float64
correlation coefficient:
I: 0.816420516345
II: 0.816236506
III: 0.81628673949
IV: 0.816521436889
the linear regression:
dataset I:
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.667
Model:                            OLS   Adj. R-squared:                  0.629
Method:                 Least Squares   F-statistic:                     17.99
Date:                Thu, 07 Jun 2018   Prob (F-statistic):            0.00217
Time:                        12:36:23   Log-Likelihood:                -16.841
No. Observations:                  11   AIC:                             37.68
Df Residuals:                       9   BIC:                             38.48
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.0001      1.125      2.667      0.026       0.456       5.544
x              0.5001      0.118      4.241      0.002       0.233       0.767
==============================================================================
Omnibus:                        0.082   Durbin-Watson:                   3.212
Prob(Omnibus):                  0.960   Jarque-Bera (JB):                0.289
Skew:                          -0.122   Prob(JB):                        0.865
Kurtosis:                       2.244   Cond. No.                         29.1
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
dataset II:
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.666
Model:                            OLS   Adj. R-squared:                  0.629
Method:                 Least Squares   F-statistic:                     17.97
Date:                Thu, 07 Jun 2018   Prob (F-statistic):            0.00218
Time:                        12:36:23   Log-Likelihood:                -16.846
No. Observations:                  11   AIC:                             37.69
Df Residuals:                       9   BIC:                             38.49
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.0009      1.125      2.667      0.026       0.455       5.547
x              0.5000      0.118      4.239      0.002       0.233       0.767
==============================================================================
Omnibus:                        1.594   Durbin-Watson:                   2.188
Prob(Omnibus):                  0.451   Jarque-Bera (JB):                1.108
Skew:                          -0.567   Prob(JB):                        0.575
Kurtosis:                       1.936   Cond. No.                         29.1
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
dataset III:
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.666
Model:                            OLS   Adj. R-squared:                  0.629
Method:                 Least Squares   F-statistic:                     17.97
Date:                Thu, 07 Jun 2018   Prob (F-statistic):            0.00218
Time:                        12:36:23   Log-Likelihood:                -16.838
No. Observations:                  11   AIC:                             37.68
Df Residuals:                       9   BIC:                             38.47
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.0025      1.124      2.670      0.026       0.459       5.546
x              0.4997      0.118      4.239      0.002       0.233       0.766
==============================================================================
Omnibus:                       19.540   Durbin-Watson:                   2.144
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               13.478
Skew:                           2.041   Prob(JB):                      0.00118
Kurtosis:                       6.571   Cond. No.                         29.1
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
dataset IV:
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.667
Model:                            OLS   Adj. R-squared:                  0.630
Method:                 Least Squares   F-statistic:                     18.00
Date:                Thu, 07 Jun 2018   Prob (F-statistic):            0.00216
Time:                        12:36:23   Log-Likelihood:                -16.833
No. Observations:                  11   AIC:                             37.67
Df Residuals:                       9   BIC:                             38.46
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.0017      1.124      2.671      0.026       0.459       5.544
x              0.4999      0.118      4.243      0.002       0.233       0.766
==============================================================================
Omnibus:                        0.555   Durbin-Watson:                   1.662
Prob(Omnibus):                  0.758   Jarque-Bera (JB):                0.524
Skew:                           0.010   Prob(JB):                        0.769
Kurtosis:                       1.931   Cond. No.                         29.1
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
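The four fits are nearly identical because, for a regression of y on a single x, the slope, intercept and R-squared are determined entirely by the means, variances and correlation, and those agree across the datasets. The sketch below is not part of the original assignment; it plugs the printed statistics for dataset I into the standard formulas and should reproduce the coefficients in the summary up to rounding.

```python
import math

# Summary statistics for dataset I, copied from the printed results.
x_mean, y_mean = 9.0, 7.500909
x_var, y_var = 11.0, 4.127269
r = 0.816420516345

# Simple linear regression y ~ x in terms of summary statistics:
#   slope     = r * sd(y) / sd(x)
#   intercept = mean(y) - slope * mean(x)
#   R-squared = r ** 2
slope = r * math.sqrt(y_var / x_var)
intercept = y_mean - slope * x_mean
r_squared = r ** 2

print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r_squared:.3f}")
```

Swapping in the statistics of datasets II to IV changes the inputs only in the fourth decimal place or so, which is why the regression tables differ so little.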
Code:
import matplotlib.pyplot as plt
import seaborn as sns
g = sns.FacetGrid(anascombe, row="dataset")
g.map(plt.scatter, 'x', 'y')
Results:
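The scatter plots are where the four datasets stop looking alike. As a numeric counterpart to the panel for dataset III, here is a minimal sketch that fits the line by hand and finds the point that sits farthest from it. The x and y values are reproduced from the published Anscombe quartet; that the author's `anascombe` data holds exactly these numbers is an assumption, although the resulting mean and coefficients agree with the output above.

```python
# Dataset III of Anscombe's quartet (published values; assumed to match
# the rows with dataset == 'III' in the author's `anascombe` DataFrame).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least squares for y = intercept + slope * x
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual of each point from the fitted line
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
worst = max(range(n), key=lambda i: abs(residuals[i]))

print(f"fit: y = {intercept:.4f} + {slope:.4f} * x")
print(f"largest residual at x={x[worst]}, y={y[worst]}: {residuals[worst]:.2f}")
```

A single point with a residual this large relative to the others is consistent with the high skew and kurtosis reported in the dataset III summary, and it is something the means, variances and correlation alone do not reveal.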