time X Y
0.000543 0 10
0.000575 0 10
0.041324 1 10
0.041331 2 10
0.041336 3 10
0.04134 4 10
...
9.987735 55 239
9.987739 56 239
9.987744 57 239
9.987749 58 239
9.987938 59 239
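The third column (Y) in my dataset is the true value, which is what I want to predict (estimate). I want to predict Y (i.e. predict the current value of {} from the previous 3 rolling values of X). To do this, I am using the following Python script with statsmodels.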
^{pr2}$
It produces output in the following format (sample):
time X Y a b1 b2 predicted
0 0.000543 0 10 None None None NaN
1 0.000575 0 10 None None None NaN
2 0.041324 1 10 None None None NaN
3 0.041331 2 10 None None None NaN
4 0.041336 3 10 None None None NaN
.. ... .. .. ... ... ... ...
50 0.041340 4 10 10 0 1.55431e-15 NaN
51 0.041345 5 10 10 1.7053e-13 7.77156e-16 10
52 0.041350 6 10 10 1.74623e-09 -7.99361e-15 10
53 0.041354 7 10 10 6.98492e-10 -6.21725e-15 10
.. ... .. .. ... ... ... ...
509 0.160835 38 20 20 4.88944e-09 -1.15463e-14 20
510 0.160839 39 20 20 1.86265e-09 5.32907e-15 20
.. ... .. .. ... ... ... ...
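Finally, I want to include the mean squared error (MSE) over all predicted values, as part of a summary of the OLS regression analysis. For example, if we look at row 5, the value of X is 2 and the value of Y is 10. Suppose the predicted value of y for the current row is 6, so the mse would be {}. sm.OLS returns an instance of this class when we call print(RollOLS.summary()).
OLS Regression Results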
==============================================================================
Dep. Variable: Y R-squared: -inf
Model: OLS Adj. R-squared: -inf
Method: Least Squares F-statistic: -48.50
Date: Tue, 04 Jul 2017 Prob (F-statistic): 1.00
Time: 22:19:18 Log-Likelihood: 2359.7
No. Observations: 100 AIC: -4713.
Df Residuals: 97 BIC: -4706.
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [95.0% Conf. Int.]
------------------------------------------------------------------------------
const 239.0000 2.58e-09 9.26e+10 0.000 239.000 239.000
time 4.547e-13 2.58e-10 0.002 0.999 -5.12e-10 5.13e-10
X -3.886e-16 1.1e-13 -0.004 0.997 -2.19e-13 2.19e-13
==============================================================================
Omnibus: 44.322 Durbin-Watson: 0.000
Prob(Omnibus): 0.000 Jarque-Bera (JB): 86.471
Skew: -1.886 Prob(JB): 1.67e-19
Kurtosis: 5.556 Cond. No. 9.72e+04
==============================================================================
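But the value of rsquared (print(RollOLS.rsquared)) should be between 0 and {}, not {}, which seems to be where the problem with {} lies. If we want to print the mse, we use print(RollOLS.mse_model), and so on, according to the documentation. How can we add intercepts and print the regression statistics with correct values, as we did for the predicted values? What am I doing wrong here? Alternatively, is there another way to do this using the scikit-learn library?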
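
To make the two statistics concrete, here is a minimal sketch in plain Python (no statsmodels) of how MSE and R-squared are usually defined. The helper names `mse` and `r_squared` are hypothetical and are not part of statsmodels or scikit-learn. Note that R-squared divides by the total sum of squares of Y, which is zero when Y is constant within a window (as it is at 239 in the summary above). That may be why the summary reports -inf, though I have not confirmed this against the statsmodels source.

```python
import math


def mse(actual, predicted):
    """Mean squared error over rows that have a prediction.

    Rows whose prediction is None or NaN (the warm-up rows of the
    rolling window) are skipped rather than counted as errors.
    """
    pairs = [
        (a, p)
        for a, p in zip(actual, predicted)
        if p is not None and not math.isnan(p)
    ]
    if not pairs:
        return math.nan
    return sum((a - p) ** 2 for a, p in pairs) / len(pairs)


def r_squared(actual, predicted):
    """Centered R-squared: 1 - SS_res / SS_tot.

    Returns NaN when `actual` is constant, because SS_tot is zero
    and the ratio is undefined in that case.
    """
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    if ss_tot == 0:
        return math.nan
    return 1 - ss_res / ss_tot
```

For example, `mse(df["Y"].tolist(), df["predicted"].tolist())` would give one MSE across all rows that have a prediction, assuming the output frame is called `df` (a name I am assuming here). Returning NaN for a constant window is a design choice of this sketch: it flags the statistic as undefined instead of reporting an infinite value.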