Problem 1:
Code 1:
# your code here
# anascombe: DataFrame of the Anscombe quartet with columns dataset, x, y
grouped = anascombe.groupby('dataset')
x_mean = grouped['x'].mean()
y_mean = grouped['y'].mean()
x_var = grouped['x'].var()
y_var = grouped['y'].var()
print('X_mean:', x_mean)
print('X_variance:', x_var)
print('Y_mean:', y_mean)
print('Y_variance:', y_var)
Output 1:
X_mean: dataset
I 9.0
II 9.0
III 9.0
IV 9.0
Name: x, dtype: float64
X_variance: dataset
I 11.0
II 11.0
III 11.0
IV 11.0
Name: x, dtype: float64
Y_mean: dataset
I 7.500909
II 7.500909
III 7.500000
IV 7.500909
Name: y, dtype: float64
Y_variance: dataset
I 4.127269
II 4.127629
III 4.122620
IV 4.123249
Name: y, dtype: float64
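The `anascombe` DataFrame is assumed to have been loaded earlier (seaborn ships the quartet as its `anscombe` dataset). The near-identical summary statistics can be verified on dataset I alone, whose well-known values are reproduced below:

```python
import pandas as pd

# Anscombe dataset I; the other three datasets match these statistics
# to two or more decimal places, which is the point of the quartet
d1 = pd.DataFrame({
    'x': [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    'y': [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
          7.24, 4.26, 10.84, 4.82, 5.68],
})
print(d1['x'].mean(), d1['x'].var())     # 9.0, 11.0
print(round(d1['y'].mean(), 6))          # 7.500909
print(round(d1['x'].corr(d1['y']), 6))   # 0.816421
```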
Code 2:
corr_mat = anascombe.groupby('dataset').corr()
print('Correlation Coefficient:')
for name in ['I', 'II', 'III', 'IV']:
    # corr_mat has a (dataset, column) MultiIndex; pick corr(x, y) per group
    print(name + ':', corr_mat.loc[(name, 'x'), 'y'])
Output 2:
Correlation Coefficient:
I: 0.81642051634484
II: 0.8162365060002428
III: 0.8162867394895981
IV: 0.8165214368885028
Code 3:
# requires: import numpy as np; import statsmodels.formula.api as smf
data_group = anascombe.groupby('dataset')
print('the linear regression:')
for name, group in data_group:
    n = len(group)
    is_train = np.random.rand(n) < 0.8  # random ~80% train split
    train = group[is_train].reset_index(drop=True)
    lin_model = smf.ols('y ~ x', train).fit()  # ordinary least squares, y ~ x
    print(lin_model.summary())
Output 3:
the linear regression:
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.635
Model: OLS Adj. R-squared: 0.590
Method: Least Squares F-statistic: 13.94
Date: Sun, 10 Jun 2018 Prob (F-statistic): 0.00576
Time: 23:20:06 Log-Likelihood: -15.771
No. Observations: 10 AIC: 35.54
Df Residuals: 8 BIC: 36.15
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 2.9018 1.346 2.156 0.063 -0.202 6.006
x 0.5086 0.136 3.733 0.006 0.194 0.823
==============================================================================
Omnibus: 0.212 Durbin-Watson: 2.962
Prob(Omnibus): 0.900 Jarque-Bera (JB): 0.384
Skew: -0.082 Prob(JB): 0.825
Kurtosis: 2.054 Cond. No. 32.4
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.666
Model: OLS Adj. R-squared: 0.629
Method: Least Squares F-statistic: 17.97
Date: Sun, 10 Jun 2018 Prob (F-statistic): 0.00218
Time: 23:20:06 Log-Likelihood: -16.846
No. Observations: 11 AIC: 37.69
Df Residuals: 9 BIC: 38.49
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 3.0009 1.125 2.667 0.026 0.455 5.547
x 0.5000 0.118 4.239 0.002 0.233 0.767
==============================================================================
Omnibus: 1.594 Durbin-Watson: 2.188
Prob(Omnibus): 0.451 Jarque-Bera (JB): 1.108
Skew: -0.567 Prob(JB): 0.575
Kurtosis: 1.936 Cond. No. 29.1
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.684
Model: OLS Adj. R-squared: 0.644
Method: Least Squares F-statistic: 17.31
Date: Sun, 10 Jun 2018 Prob (F-statistic): 0.00316
Time: 23:20:06 Log-Likelihood: -15.457
No. Observations: 10 AIC: 34.91
Df Residuals: 8 BIC: 35.52
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 2.8438 1.174 2.422 0.042 0.136 5.552
x 0.5277 0.127 4.160 0.003 0.235 0.820
==============================================================================
Omnibus: 14.071 Durbin-Watson: 2.077
Prob(Omnibus): 0.001 Jarque-Bera (JB): 7.003
Skew: 1.666 Prob(JB): 0.0301
Kurtosis: 5.390 Cond. No. 27.4
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.664
Model: OLS Adj. R-squared: 0.622
Method: Least Squares F-statistic: 15.80
Date: Sun, 10 Jun 2018 Prob (F-statistic): 0.00409
Time: 23:20:06 Log-Likelihood: -15.707
No. Observations: 10 AIC: 35.41
Df Residuals: 8 BIC: 36.02
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 3.0825 1.207 2.554 0.034 0.299 5.866
x 0.4957 0.125 3.975 0.004 0.208 0.783
==============================================================================
Omnibus: 0.865 Durbin-Watson: 1.659
Prob(Omnibus): 0.649 Jarque-Bera (JB): 0.607
Skew: -0.106 Prob(JB): 0.738
Kurtosis: 1.812 Cond. No. 28.7
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
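The ~80% split above depends on the random state, so each run trains on a different subset. A reproducible sketch of the same idea, on hypothetical data and with `np.polyfit` standing in for statsmodels' `y ~ x` fit:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)        # fixed seed -> reproducible split
df = pd.DataFrame({'x': np.arange(20.0)})
df['y'] = 2.0 * df['x'] + 1.0         # exact line y = 2x + 1
is_train = rng.random(len(df)) < 0.8  # random ~80% train mask
train = df[is_train].reset_index(drop=True)
# least-squares fit of y ~ x; recovers the true slope and intercept
slope, intercept = np.polyfit(train['x'], train['y'], 1)
print(slope, intercept)               # 2.0, 1.0 (up to float error)
```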
Problem 2:
Code:
# your code here
g = sns.FacetGrid(anascombe, col='dataset')
g.map(plt.scatter, 'x', 'y')
plt.show()
Output: (figure: one scatter panel per dataset)