Local Linear Model, Semi Local Linear Model and Local Level Model of TFP.STS

Latest recommended article published on 2021-03-16 19:41:00

sphw

Category: machine learning

Article link: https://blog.csdn.net/zxxr123/article/details/104124746

Local Linear Model

tfp.sts.LocalLinearTrend is a formal representation of a local linear trend model. (Source)

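In essence, the Local Linear Model can be thought of as piecewise linear fitting. The core of linear fitting is ordinary least squares (OLS) together with its confidence intervals; that is, the confidence intervals of the separately fitted line segments do not overlap (If 95% confidence intervals for these two means are calculated (approximately) by adding or subtracting two standard errors, the intervals do not overlap, so the difference in means is statistically very significant.).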

This model suits time series whose trend direction and slope stay consistent over short periods but change over the long run (This model is appropriate for data where the trend direction and magnitude (latent slope) is consistent within short periods but may evolve over time.).

How are the segments determined?
The levels are defined by a random walk model, specifically the simple random walk model (good simulation). When faced with a time series that shows irregular growth, the best strategy may not be to try to directly predict the level of the series at each period (i.e., the quantity $Y_t$). Instead, it may be better to try to predict the change that occurs from one period to the next (i.e., the quantity $Y_t - Y_{t-1}$).
That is, it may be better to look at the first difference of the series, to see if a predictable pattern can be found there. For purposes of one-period-ahead forecasting, it is just as good to predict the next change as to predict the next level of the series, since the predicted change can be added to the current level to yield a predicted level. The simplest case of such a model is one that always predicts that the next change will be zero, as if the series is equally likely to go up or down in the next period regardless of what it has done in the past.
(Source)
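
The random-walk forecast described in the quoted passage can be sketched in a few lines of NumPy. This is only an illustration on simulated data: the step scale, series length, and variable names are assumptions, not part of the quoted source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a simple random walk: Y_t = Y_{t-1} + step_t, with Gaussian steps.
steps = rng.normal(loc=0.0, scale=1.0, size=200)
y = np.cumsum(steps)

# First difference: the period-to-period change Y_t - Y_{t-1}.
diff = np.diff(y)

# Random-walk forecast: predict zero change, so the forecast of Y_t is Y_{t-1}.
forecast = y[:-1]
errors = y[1:] - forecast  # one-step-ahead errors equal the first differences

print("mean change:", diff.mean())
print("RMSE of random-walk forecast:", np.sqrt(np.mean(errors ** 2)))
```

Because the forecast is simply the previous value, the one-step-ahead errors are exactly the first differences, which is why looking at the differenced series tells you how well this forecast performs.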
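This passage seems to mean that segments are detected as follows: first fit a linear model to the first few data points; then use that linear model to generate a prediction and compute the difference between the prediction and the actual value; if the difference is small, no new segment is started, and if it is large, a new segment begins. This reading may well be flawed, and I hope a passing expert will correct it.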

In tfp.sts.LocalLinearTrend, the levels are defined by a Gaussian random walk model (i.e., a random walk with Gaussian steps), whose steps are continuous normal (i.e., Gaussian) random variables rather than discrete random variables. While this loses the simplicity of the random walk on a lattice, it gains in uniformity; the distribution of values at each time step is always Gaussian.

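tfp.sts.LocalLinearTrend has two key terms: level and slope. Intuitively, level plays the role of the segment (the current baseline of the data) and slope is the slope of the fitted line. More precisely, both are latent states that can change at every time step, so level is the current value of the trend and slope is its current rate of change.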
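
To make level and slope concrete, here is a minimal NumPy sketch that draws one path from a local linear trend model, assuming the usual formulation in which the level takes a Gaussian step plus the previous slope and the slope itself follows a Gaussian random walk. The function name, its parameters, and the chosen scales are hypothetical and only for illustration; this is not TFP code.

```python
import numpy as np

def simulate_local_linear_trend(num_steps, level_scale, slope_scale,
                                observation_noise_scale,
                                initial_level=0.0, initial_slope=0.0, seed=0):
    """Draw one path from a local linear trend model (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    level = np.empty(num_steps)
    slope = np.empty(num_steps)
    level[0], slope[0] = initial_level, initial_slope
    for t in range(1, num_steps):
        # The level moves by the previous slope plus a Gaussian step.
        level[t] = level[t - 1] + slope[t - 1] + rng.normal(0.0, level_scale)
        # The slope itself follows a Gaussian random walk.
        slope[t] = slope[t - 1] + rng.normal(0.0, slope_scale)
    # Observations are the latent level plus Gaussian noise.
    observed = level + rng.normal(0.0, observation_noise_scale, size=num_steps)
    return level, slope, observed

level, slope, observed = simulate_local_linear_trend(
    num_steps=100, level_scale=0.5, slope_scale=0.05,
    observation_noise_scale=1.0, initial_slope=0.2)
print(observed[:5])
```

With a small slope_scale the slope changes slowly, so the simulated series looks roughly linear over short windows while its direction drifts over longer ones.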

The piecewise linear fit behind this view of the Local Linear Model is illustrated by the plot that the following code produces:

>>> import numpy as np
>>> import statsmodels.api as sm
>>> import matplotlib.pyplot as plt
>>> from statsmodels.sandbox.regression.predstd import wls_prediction_std
>>> 
>>> np.random.seed(9876789)
>>> nsample = 50
>>> groups = np.zeros(nsample, int)
>>> groups[20:40] = 1
>>> groups[40:] = 2
>>> # One-hot encode the groups with NumPy (sm.categorical is deprecated in recent statsmodels releases)
>>> dummy = (groups[:, None] == np.unique(groups)).astype(float)
>>> x = np.linspace(0, 20, nsample)
>>> # drop reference category
... X = np.column_stack((x, dummy[:,1:]))
>>> X = sm.add_constant(X, prepend=False)
>>> 
>>> beta = [1., 3, -3, 10]
>>> y_true = np.dot(X, beta)
>>> e = np.random.normal(size=nsample)
>>> y = y_true + e
>>> print(X[:5,:])
[[0.         0.         0.         1.        ]
 [0.40816327 0.         0.         1.        ]
 [0.81632653 0.         0.         1.        ]
 [1.2244898  0.         0.         1.        ]
 [1.63265306 0.         0.         1.        ]]
>>> print(y[:5])
[ 9.15948411 12.00565852 11.28186857 10.71633086 14.56695876]
>>> print(groups)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 2 2 2 2 2 2 2 2 2 2]
>>> print(dummy[:5,:])
[[1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]]
>>> res2 = sm.OLS(y, X).fit()
>>> print(res2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.968
Model:                            OLS   Adj. R-squared:                  0.966
Method:                 Least Squares   F-statistic:                     459.4
Date:                Fri, 31 Jan 2020   Prob (F-statistic):           2.78e-34
Time:                        15:46:13   Log-Likelihood:                -73.569
No. Observations:                  50   AIC:                             155.1
Df Residuals:                      46   BIC:                             162.8
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.8866      0.072     12.379      0.000       0.742       1.031
x2             3.7504      0.680      5.514      0.000       2.381       5.119
x3            -1.3477      1.108     -1.216      0.230      -3.578       0.883
const         10.6017      0.371     28.592      0.000       9.855      11.348
==============================================================================
Omnibus:                        1.111   Durbin-Watson:                   2.314
Prob(Omnibus):                  0.574   Jarque-Bera (JB):                1.155
Skew:                           0.309   Prob(JB):                        0.561
Kurtosis:                       2.583   Cond. No.                         96.3
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
>>> prstd, iv_l, iv_u = wls_prediction_std(res2)
>>> 
>>> fig, ax = plt.subplots(figsize=(8,6))
>>> 
>>> ax.plot(x, y, 'o', label="Data")
[<matplotlib.lines.Line2D object at 0x7f4716c0c0d0>]
>>> ax.plot(x, y_true, 'b-', label="True")
[<matplotlib.lines.Line2D object at 0x7f4716c0c1d0>]
>>> ax.plot(x, res2.fittedvalues, 'r--.', label="Predicted")
[<matplotlib.lines.Line2D object at 0x7f4716c0cdd0>]
>>> ax.plot(x, iv_u, 'r--')
[<matplotlib.lines.Line2D object at 0x7f4716c0cd90>]
>>> ax.plot(x, iv_l, 'r--')
[<matplotlib.lines.Line2D object at 0x7f4716c1d810>]
>>> legend = ax.legend(loc="best")
>>> plt.show()

(Code Source)

Reference

Semi-Local Linear Model

tfp.sts.SemiLocalLinearTrend
The main difference between the Local Linear Model and the Semi-Local Linear Model is how the trend evolves (or, in the piecewise view, how segments change): in the former the slope follows a Gaussian random walk, while in the latter it follows an AR1 model. For long-term forecasts, the Semi-Local Linear Model will produce more reasonable confidence intervals than the Local Linear Model.
Unlike the random walk used in LocalLinearTrend, a stationary AR1 process (coefficient in (-1, 1)) maintains bounded variance over time, so a SemiLocalLinearTrend model will often produce more reasonable uncertainties when forecasting over long timescales. (Source)
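
The bounded-variance claim can be checked with the standard closed-form variances of the two processes, started from a known value. The sketch below is a side calculation rather than TFP code; the function names, the step scale of 0.1, and the coefficient of 0.9 are assumptions chosen for illustration.

```python
import numpy as np

def random_walk_variance(num_steps, scale):
    """Variance of a Gaussian random walk after num_steps steps from a known start."""
    return num_steps * scale ** 2

def ar1_variance(num_steps, scale, coef):
    """Variance of an AR1 process after num_steps steps from a known start (|coef| < 1)."""
    return scale ** 2 * (1.0 - coef ** (2 * num_steps)) / (1.0 - coef ** 2)

horizons = np.array([1, 10, 100, 1000])
print("random walk:", random_walk_variance(horizons, 0.1))
print("AR1 (0.9):  ", ar1_variance(horizons, 0.1, 0.9))
```

The random-walk variance grows linearly with the horizon, whereas the AR1 variance levels off at scale² / (1 − coef²), which is why long-range forecast intervals stay tighter under the semi-local model.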

Local Level Model

tfp.sts.LocalLevel
This model only produces the levels (segments) and does no linear trend analysis; the level likewise evolves via a Gaussian random walk model.

The local level model posits a level evolving via a Gaussian random walk. The latent state is [level]. We observe a noisy realization of the current level: f[t] = level[t] + Normal(0., observation_noise_scale) at each timestep. (Source)
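
As a rough illustration of what inference in such a model involves, the sketch below simulates a local level series and recovers the level with a hand-written scalar Kalman filter. It assumes the scales are known and uses a very diffuse prior; the function name and all parameter values are hypothetical, and this is not how tfp.sts.LocalLevel is implemented or fitted.

```python
import numpy as np

def local_level_filter(observed, level_scale, observation_noise_scale,
                       initial_mean=0.0, initial_variance=1e6):
    """Scalar Kalman filter for a local level model (illustrative sketch)."""
    mean, variance = initial_mean, initial_variance
    filtered = np.empty(len(observed))
    for t, obs in enumerate(observed):
        # Predict: a random-walk step keeps the mean and adds the step variance.
        variance = variance + level_scale ** 2
        # Update: blend the prediction and the observation by their precision.
        gain = variance / (variance + observation_noise_scale ** 2)
        mean = mean + gain * (obs - mean)
        variance = (1.0 - gain) * variance
        filtered[t] = mean
    return filtered

rng = np.random.default_rng(0)
true_level = np.cumsum(rng.normal(0.0, 0.1, size=500))   # Gaussian random walk
observed = true_level + rng.normal(0.0, 1.0, size=500)   # noisy realization
filtered = local_level_filter(observed, level_scale=0.1,
                              observation_noise_scale=1.0)

print("observation MSE:", np.mean((observed - true_level) ** 2))
print("filtered MSE:   ", np.mean((filtered - true_level) ** 2))
```

The filtered estimate tracks the latent level much more closely than the raw observations do, because each update averages the new observation against what the random walk predicted.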