Solution:
# Assumes the anascombe DataFrame (columns: dataset, x, y) was loaded earlier.
import statsmodels.formula.api as smf
print("the mean of x and y:")
print(anascombe.groupby(["dataset"])[['x', 'y']].mean())
print("the variance of x and y:")
print(anascombe.groupby(["dataset"])[['x', 'y']].var())
print("the correlation coefficient between x and y:")
print(anascombe.groupby(["dataset"])[['x', 'y']].corr())
datasets = ['I', 'II', 'III', 'IV']
for dataset in datasets:
    # Fit an ordinary least squares model y ~ x on the rows of this dataset only.
    lin_model = smf.ols('y ~ x', anascombe[anascombe["dataset"] == dataset]).fit()
    print('\nThe linear model for dataset %s:' % dataset)
    print(lin_model.summary())
Output:
the mean of x and y:
           x         y
dataset
I        9.0  7.500909
II       9.0  7.500909
III      9.0  7.500000
IV       9.0  7.500909
the variance of x and y:
            x         y
dataset
I        11.0  4.127269
II       11.0  4.127629
III      11.0  4.122620
IV       11.0  4.123249
the correlation coefficient between x and y:
                  x         y
dataset
I       x  1.000000  0.816421
        y  0.816421  1.000000
II      x  1.000000  0.816237
        y  0.816237  1.000000
III     x  1.000000  0.816287
        y  0.816287  1.000000
IV      x  1.000000  0.816521
        y  0.816521  1.000000
The linear model for dataset I:
OLS Regression Results
===================================