目录
权重分配:Exponential Almon polynomial 约束一致系数
示例:
根据上述函数及数据要求,利用R代码实现混频回归示例。具体代码如下:
R代码实现
加载包
library(midasr)
生成符合条件的随机数
set.seed(1001) # 设置随机数种子
n <- 250#低频数据样本容量
trend <- c(1:n)#线性趋势
x <- rnorm(4 * n)#高频解释变量,eg:季度数据
z <- rnorm(12 * n)#高频解释变量,eg:月度数据
权重分配:Exponential Almon polynomial 约束一致系数
常见加权函数形式:
#权重分配:指数阿尔蒙滞后多项式nealmon
fn_x <- nealmon(p = c(1, -0.5), d = 8)
fn_z <- nealmon(p = c(2, 0.5, -0.1), d = 17)
低频序列模拟 (e.g. 年度)
y <- 2 + 0.1 * trend + mls(x, 0:7, 4) %*% fn_x + mls(z, 0:16, 12) %*% fn_z + rnorm(n)
绘制系数散点图
plot(fn_z, col = "red")
points(fn_x,col = "blue")
MIDAS 回归示例 月度、季度数据转化为同频
基于最小二乘的线性模型
eq_u1 <- lm(y ~ trend + mls(x, k = 0:7, m = 4) + mls(z, k = 0:16, m = 12))
summary(eq_u1)
eq_u1结果:
Call:
lm(formula = y ~ trend + mls(x, k = 0:7, m = 4) + mls(z, k = 0:16,
m = 12))
Residuals:
Min 1Q Median 3Q Max
-2.2651 -0.6489 0.1073 0.6780 2.7707
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.9694327 0.1261210 15.615 < 2e-16 ***
trend 0.1000072 0.0008769 114.047 < 2e-16 ***
mls(x, k = 0:7, m = 4)X.0/m 0.5268124 0.0643322 8.189 2.07e-14 ***
mls(x, k = 0:7, m = 4)X.1/m 0.3782006 0.0641497 5.896 1.38e-08 ***
mls(x, k = 0:7, m = 4)X.2/m 0.1879689 0.0680465 2.762 0.006219 **
mls(x, k = 0:7, m = 4)X.3/m -0.0052409 0.0658730 -0.080 0.936658
mls(x, k = 0:7, m = 4)X.4/m 0.1504419 0.0627623 2.397 0.017358 *
mls(x, k = 0:7, m = 4)X.5/m 0.0104345 0.0655386 0.159 0.873647
mls(x, k = 0:7, m = 4)X.6/m 0.0698753 0.0692803 1.009 0.314270
mls(x, k = 0:7, m = 4)X.7/m 0.1463317 0.0650285 2.250 0.025412 *
mls(z, k = 0:16, m = 12)X.0/m 0.3671055 0.0618960 5.931 1.14e-08 ***
mls(z, k = 0:16, m = 12)X.1/m 0.3502401 0.0599615 5.841 1.83e-08 ***
mls(z, k = 0:16, m = 12)X.2/m 0.4514656 0.0659546 6.845 7.37e-11 ***
mls(z, k = 0:16, m = 12)X.3/m 0.3733747 0.0577702 6.463 6.42e-10 ***
mls(z, k = 0:16, m = 12)X.4/m 0.3609667 0.0700954 5.150 5.75e-07 ***
mls(z, k = 0:16, m = 12)X.5/m 0.2155748 0.0632036 3.411 0.000769 ***
mls(z, k = 0:16, m = 12)X.6/m 0.0648163 0.0630194 1.029 0.304828
mls(z, k = 0:16, m = 12)X.7/m 0.0665581 0.0673284 0.989 0.323955
mls(z, k = 0:16, m = 12)X.8/m -0.0014853 0.0624569 -0.024 0.981048
mls(z, k = 0:16, m = 12)X.9/m 0.0466486 0.0675116 0.691 0.490305
mls(z, k = 0:16, m = 12)X.10/m 0.0384882 0.0664136 0.580 0.562824
mls(z, k = 0:16, m = 12)X.11/m -0.0077722 0.0591409 -0.131 0.895564
mls(z, k = 0:16, m = 12)X.12/m -0.0283221 0.0620632 -0.456 0.648589
mls(z, k = 0:16, m = 12)X.13/m -0.0375062 0.0608348 -0.617 0.538179
mls(z, k = 0:16, m = 12)X.14/m 0.0297271 0.0652273 0.456 0.649018
mls(z, k = 0:16, m = 12)X.15/m 0.0184075 0.0588059 0.313 0.754558
mls(z, k = 0:16, m = 12)X.16/m -0.0546460 0.0677214 -0.807 0.420574
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9383 on 222 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.9855, Adjusted R-squared: 0.9838
F-statistic: 579.3 on 26 and 222 DF, p-value: < 2.2e-16
基于无约束的混频回归
eq_u2 <- midas_u(y ~ trend + mls(x, 0:7, 4) + mls(z, 0:16, 12))
summary(eq_u2)
eq_u2结果:
Call:
lm(formula = y ~ trend + mls(x, 0:7, 4) + mls(z, 0:16, 12), data = ee)
Residuals:
Min 1Q Median 3Q Max
-2.2651 -0.6489 0.1073 0.6780 2.7707
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.9694327 0.1261210 15.615 < 2e-16 ***
trend 0.1000072 0.0008769 114.047 < 2e-16 ***
mls(x, 0:7, 4)X.0/m 0.5268124 0.0643322 8.189 2.07e-14 ***
mls(x, 0:7, 4)X.1/m 0.3782006 0.0641497 5.896 1.38e-08 ***
mls(x, 0:7, 4)X.2/m 0.1879689 0.0680465 2.762 0.006219 **
mls(x, 0:7, 4)X.3/m -0.0052409 0.0658730 -0.080 0.936658
mls(x, 0:7, 4)X.4/m 0.1504419 0.0627623 2.397 0.017358 *
mls(x, 0:7, 4)X.5/m 0.0104345 0.0655386 0.159 0.873647
mls(x, 0:7, 4)X.6/m 0.0698753 0.0692803 1.009 0.314270
mls(x, 0:7, 4)X.7/m 0.1463317 0.0650285 2.250 0.025412 *
mls(z, 0:16, 12)X.0/m 0.3671055 0.0618960 5.931 1.14e-08 ***
mls(z, 0:16, 12)X.1/m 0.3502401 0.0599615 5.841 1.83e-08 ***
mls(z, 0:16, 12)X.2/m 0.4514656 0.0659546 6.845 7.37e-11 ***
mls(z, 0:16, 12)X.3/m 0.3733747 0.0577702 6.463 6.42e-10 ***
mls(z, 0:16, 12)X.4/m 0.3609667 0.0700954 5.150 5.75e-07 ***
mls(z, 0:16, 12)X.5/m 0.2155748 0.0632036 3.411 0.000769 ***
mls(z, 0:16, 12)X.6/m 0.0648163 0.0630194 1.029 0.304828
mls(z, 0:16, 12)X.7/m 0.0665581 0.0673284 0.989 0.323955
mls(z, 0:16, 12)X.8/m -0.0014853 0.0624569 -0.024 0.981048
mls(z, 0:16, 12)X.9/m 0.0466486 0.0675116 0.691 0.490305
mls(z, 0:16, 12)X.10/m 0.0384882 0.0664136 0.580 0.562824
mls(z, 0:16, 12)X.11/m -0.0077722 0.0591409 -0.131 0.895564
mls(z, 0:16, 12)X.12/m -0.0283221 0.0620632 -0.456 0.648589
mls(z, 0:16, 12)X.13/m -0.0375062 0.0608348 -0.617 0.538179
mls(z, 0:16, 12)X.14/m 0.0297271 0.0652273 0.456 0.649018
mls(z, 0:16, 12)X.15/m 0.0184075 0.0588059 0.313 0.754558
mls(z, 0:16, 12)X.16/m -0.0546460 0.0677214 -0.807 0.420574
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9383 on 222 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.9855, Adjusted R-squared: 0.9838
F-statistic: 579.3 on 26 and 222 DF, p-value: < 2.2e-16
同频时基于最小二乘和无约束混频回归的eq_u1与eq_u2结果一致!
基于midas_r的非线性估计
eq_r <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:16, 12, nealmon), start = list(x = c(1,-0.5), z = c(2, 0.5, -0.1)))
summary(eq_r)
收敛性检验
通过计算梯度和黑森矩阵检验是否满足收敛的充分必要条件,计算梯度的欧氏范数和hessian的特征值。然后判断梯度范数是否接近零,特征值是否为正
deriv_tests(eq_r, tol = 1e-06)
结果:
$`first`
[1] FALSE
$second
[1] TRUE
$gradient
[1] 0.005542527 0.173490055 -0.005754945 0.028655981 -0.030150807 0.019224017 0.053829528
$eigenval
[1] 1.047988e+07 5.891393e+04 3.664513e+02 1.221231e+02 8.116275e+01 5.148844e+01 4.598507e+01
系数输出
coef(eq_r)#返回t检验显著的系数
结果:
(Intercept) trend x1 x2 z1 z2 z3
1.98820762 0.09988275 1.35336157 -0.50747744 2.26312231 0.40927986 -0.07293300
coef(eq_r, midas = TRUE)#返回所有系数
结果:
(Intercept) trend x1 x2 x3 x4 x5 x6 x7
1.988208e+00 9.988275e-02 5.480768e-01 3.299490e-01 1.986333e-01 1.195797e-01 7.198845e-02 4.333793e-02 2.608997e-02
x8 z1 z2 z3 z4 z5 z6 z7 z8
1.570648e-02 3.346806e-01 4.049071e-01 4.233810e-01 3.826120e-01 2.988388e-01 2.017282e-01 1.176921e-01 5.934435e-02
z9 z10 z11 z12 z13 z14 z15 z16 z17
2.586203e-02 9.740853e-03 3.170901e-03 8.921121e-04 2.169239e-04 4.558760e-05 8.280130e-06 1.299807e-06 1.763484e-07
eq_r的基础上使用函数amweights来形成几个标准的周期函数约束
amweights(p = c(1, -0.5), d = 8, m = 4, weight = nealmon, type = "C")
nealmon(p = c(1, -0.5), d = 4)
eq_r1 <- midas_r(y ~ trend + mls(x, 0:7, 4, amweights, nealmon, "C") + mls(z, 0:16, 12, nealmon), start = list(x = c(1, -0.5), z = c(2, 0.5, -0.1)))
summary(eq_r1)
其它加权形式
fn <- gompertzp#概率密度函数的权重设定
eq_r2 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:16, 12, fn), start = list(x = c(1,-0.5), z = c(1, 0.5, 0.1)))
summary(eq_r2)
eq_r3 <- midas_r(y ~ trend + mls(x, 0:7, 4) + mls(z, 0:16, 12, nealmon), start = list(z = c(1,-0.5)))
summary(eq_r3)
eq_r4 <- midas_r(y ~ trend + mls(y, 1:2, 1) + mls(x, 0:7, 4, nealmon), start = list(x = c(1,-0.5)))
summary(eq_r4)
eq_r5 <- midas_r(y ~ trend + mls(y, 1:2, 1, "*") + mls(x, 0:7, 4, nealmon), start = list(x = c(1,-0.5)))
summary(eq_r5)
eq_r6 <- midas_r(y ~ trend + mls(y, 1:4, 1, nealmon) + mls(x, 0:7, 4, nealmon), start = list(y = c(1, -0.5), x = c(1, -0.5)))
summary(eq_r6)
eq_r7 <- midas_r(y ~ trend + mls(x, 0:7, 4, amweights, nealmon, "B"), start = list(x = c(1, 1, -0.5))) summary(eq_r7)
eq_r8 <- midas_r(y ~ trend + mls(x, 0:7, 4, amweights, nealmon, "C"), start = list(x = c(1, -0.5))) summary(eq_r8)
eq_r9 <- midas_r(y ~ trend + mls(x, 0:7, 4, amweights, nealmon, "A"), start = list(x = c(1, 1, 1, -0.5)))
summary(eq_r9)
fn <- function(p, d) {
p[1] * c(1:d)^p[2]
}
eq_r10 <- midas_r(y ~ trend + mls(x, 0:101, 4, fn), start = list(x = rep(0, 2)))
summary(eq_r10)
约束的充分性检验
只要误差是独立同分布的,就可以使用hAh_test检验,而hAhr_test是HAC-Roust稳健版本检验。只要在残差中没有观察到显著的hac,在小样本中使用hAh_test检验更精确。
eq_r12 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:16, 12, nealmon), start = list(x = c(1, -0.5), z = c(2, 0.5, -0.1)))
summary(eq_r12)
#约束的充分性检验
hAh_test(eq_r12)
hAhr_test(eq_r12)
结果:
hAh restriction test
data:
hAh = 16.552, df = 20, p-value = 0.6818
hAh restriction test (robust version)
data:
hAhr = 14.854, df = 20, p-value = 0.7847
对比:将变量z的函数约束的参数数量由17减少为2个
eq_r13 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:12, 12, nealmon), start = list(x = c(1, -0.5), z = c(2, -0.1)))
hAh_test(eq_r13)
hAhr_test(eq_r13)
summary(eq_r13)
结果:
hAh restriction test
data:
hAh = 36.892, df = 17, p-value = 0.00348
hAh restriction test (robust version)
data:
hAhr = 32.879, df = 17, p-value = 0.01168
拒绝原假设,即约束不充分
最优模型选取
step1:函数expand_weights_lags定义每个模型的潜在模型集
set_x <- expand_weights_lags(weights = c("nealmon", "almonp"), from = 0, to = c(5, 10), m = 1, start = list(nealmon = rep(1, 2), almonp = rep(1, 3)))
set_z <- expand_weights_lags(weights = c("nealmon", "nbeta"), from = 1, to = c(2, 3), m = 1, start = list(nealmon = rep(0,2), nbeta = rep(0.5, 3)))
step2:所有可能模型的估计
函数midas_r_ic_table返回所有模型的汇总表,以及常用信息准则的相应值和参数约束充分性检验(默认是hAh_test检验)的经验大小。给出了导数检验的结果和优化函数的收敛状态。
eqs.ic <- midas_r_ic_table(y ~ trend + mls(x, 0, m = 4) + fmls(z, 0, m = 12), table = list(z = set_z,x = set_x), start = c(`(Intercept)` = 0, trend = 0))
mod <- modsel(eqs.ic, IC = "AIC", type = "restricted")
step3:查看某个候选模型可微调其收敛性,并更新结果:
eqs_ic$candlist[[5]] <- update(eqs_ic$candlist[[5]], Ofunction = "nls")
eqs_ic <- update(eqs_ic)
手动选取模型
step1:训练集(样本内预测)、测试集(样本外预测)分割
datasplit <- split_data(list(y = y, x = x, z = z, trend = trend), insample = 1:200, outsample = 201:250)
step2:分别拟合待选模型
mod1 <- midas_r(y ~ trend + mls(x, 4:14, 4, nealmon) + mls(z, 12:22, 12, nealmon), data = datasplit$indata, start = list(x = c(10, 1, -0.1), z = c(2, -0.1)))
mod2 <- midas_r(y ~ trend + mls(x, 4:20, 4, nealmon) + mls(z, 12:25, 12, nealmon), data = datasplit$indata, start = list(x = c(10, 1, -0.1), z = c(2, -0.1)))
step3:计算
avgf <- average_forecast(list(mod1, mod2), data = list(y = y, x = x, z = z, trend = trend), insample = 1:200, outsample = 201:250, type = "fixed", measures = c("MSE", "MAPE", "MASE"), fweights = c("EW", "BICW", "MSFE", "DMSFE"))
样本内、样本外预测精度输出:
avgf$accuracy
结果对比:
$`individual`
Model MSE.out.of.sample MAPE.out.of.sample
1 y ~ trend + mls(x, 4:14, 4, nealmon) + mls(z, 12:22, 12, nealmon) 1.848474 4.474242
2 y ~ trend + mls(x, 4:20, 4, nealmon) + mls(z, 12:25, 12, nealmon) 1.763432 4.391594
MASE.out.of.sample MSE.in.sample MAPE.in.sample MASE.in.sample
1 0.8155477 1.768720 14.05880 0.6992288
2 0.7976764 1.726365 13.56287 0.6897242
$average
Scheme MSE MAPE MASE
1 EW 1.803640 4.432918 0.8066121
2 BICW 1.763433 4.391595 0.7976766
3 MSFE 1.802640 4.431945 0.8064017
4 DMSFE 1.801787 4.431112 0.8062216
mod1,mod2预测结果输出:
avgf$forecast
结果对比:
[,1] [,2]
[1,] 22.24057 22.38300
[2,] 22.12774 22.13778
[3,] 22.22678 22.21942
[4,] 22.44535 22.61872
[5,] 22.36448 22.46157
[6,] 22.49847 22.66663
[7,] 22.70192 22.70818
[8,] 22.49411 22.53326
[9,] 22.47978 22.54235
[10,] 22.43439 22.46522
[11,] 22.61563 22.57788
[12,] 22.87686 22.88271
[13,] 22.59329 22.64370
[14,] 23.39675 23.47606
[15,] 23.25864 23.27932
[16,] 22.97588 23.09488
[17,] 23.13196 23.21956
[18,] 24.05489 24.04334
[19,] 24.11690 24.16742
[20,] 23.79328 23.91429
[21,] 24.58047 24.58925
[22,] 24.33794 24.45825
[23,] 23.96658 24.03107
[24,] 24.00510 24.12939
[25,] 23.96473 23.99645
[26,] 24.02198 24.04416
[27,] 25.01521 24.91966
[28,] 24.47063 24.59918
[29,] 24.58748 24.51964
[30,] 24.69662 24.72768
[31,] 25.19092 25.22659
[32,] 25.57452 25.72345
[33,] 25.06312 25.17678
[34,] 24.99302 25.09359
[35,] 25.13005 25.16293
[36,] 25.85870 25.85633
[37,] 25.43593 25.46734
[38,] 25.80690 25.88224
[39,] 26.80900 26.82394
[40,] 26.44056 26.63095
[41,] 26.04786 26.17996
[42,] 26.09545 26.29647
[43,] 25.99309 25.98112
[44,] 25.89755 26.03643
[45,] 25.99814 26.18442
[46,] 26.25727 26.38797
[47,] 26.79563 26.92593
[48,] 26.57034 26.68217
[49,] 26.53808 26.51001
[50,] 26.72772 26.76183
组合预测效果输出:
avgf$avgforecast
结果对比:
其中EW:基于特征值加权组合;BICW:BIC信息准则加权组合;MSFE: 预测均方误差加权组合;DMSFE:discounted MSFE
EW BICW MSFE DMSFE
[1,] 22.31178 22.38299 22.31346 22.31489
[2,] 22.13276 22.13778 22.13288 22.13298
[3,] 22.22310 22.21942 22.22301 22.22294
[4,] 22.53204 22.61872 22.53408 22.53582
[5,] 22.41303 22.46157 22.41417 22.41515
[6,] 22.58255 22.66663 22.58453 22.58622
[7,] 22.70505 22.70818 22.70512 22.70519
[8,] 22.51369 22.53326 22.51415 22.51454
[9,] 22.51107 22.54235 22.51180 22.51244
[10,] 22.44980 22.46522 22.45017 22.45048
[11,] 22.59675 22.57788 22.59631 22.59593
[12,] 22.87979 22.88271 22.87986 22.87991
[13,] 22.61849 22.64370 22.61909 22.61960
[14,] 23.43641 23.47606 23.43734 23.43814
[15,] 23.26898 23.27932 23.26922 23.26943
[16,] 23.03538 23.09488 23.03678 23.03798
[17,] 23.17576 23.21956 23.17679 23.17767
[18,] 24.04912 24.04334 24.04898 24.04887
[19,] 24.14216 24.16742 24.14276 24.14326
[20,] 23.85379 23.91429 23.85521 23.85643
[21,] 24.58486 24.58925 24.58497 24.58505
[22,] 24.39809 24.45825 24.39951 24.40072
[23,] 23.99882 24.03107 23.99958 24.00023
[24,] 24.06724 24.12938 24.06870 24.06996
[25,] 23.98059 23.99645 23.98096 23.98128
[26,] 24.03307 24.04416 24.03334 24.03356
[27,] 24.96744 24.91967 24.96631 24.96535
[28,] 24.53491 24.59918 24.53642 24.53771
[29,] 24.55356 24.51964 24.55277 24.55208
[30,] 24.71215 24.72768 24.71252 24.71283
[31,] 25.20876 25.22659 25.20918 25.20954
[32,] 25.64898 25.72345 25.65074 25.65224
[33,] 25.11995 25.17678 25.12129 25.12244
[34,] 25.04331 25.09359 25.04449 25.04550
[35,] 25.14649 25.16293 25.14688 25.14721
[36,] 25.85751 25.85633 25.85749 25.85746
[37,] 25.45163 25.46734 25.45200 25.45232
[38,] 25.84457 25.88224 25.84546 25.84622
[39,] 26.81647 26.82394 26.81665 26.81680
[40,] 26.53575 26.63095 26.53799 26.53991
[41,] 26.11391 26.17996 26.11546 26.11679
[42,] 26.19596 26.29647 26.19833 26.20035
[43,] 25.98711 25.98112 25.98696 25.98684
[44,] 25.96699 26.03642 25.96862 25.97002
[45,] 26.09128 26.18442 26.09348 26.09535
[46,] 26.32262 26.38797 26.32416 26.32548
[47,] 26.86078 26.92593 26.86232 26.86363
[48,] 26.62626 26.68217 26.62757 26.62870
[49,] 26.52404 26.51001 26.52371 26.52343
[50,] 26.74478 26.76183 26.74518 26.74552
参考文献:《Mixed Frequency Data Sampling Regression Models: The R Package midasr》