R语言实现混频数据分析(二)----示例解析

目录

示例: 

R代码实现

 加载包

生成符合条件的随机数

权重分配:Exponential Almon polynomial 约束一致系数

低频序列模拟 (e.g. 年度)

MIDAS 回归示例 月度、季度数据转化为同频

基于最小二乘的线性模型

基于无约束的混频回归

基于midas_r的非线性估计

收敛性检验

其它加权形式

约束的充分性检验

最优模型选取

手动选取模型


示例: 

根据上述函数及数据要求,利用R代码实现混频回归示例。具体代码如下:

R代码实现

 加载包

library(midasr)

生成符合条件的随机数

set.seed(1001)  # 设置随机数种子
n <- 250#低频数据样本容量
trend <- c(1:n)#线性趋势
x <- rnorm(4 * n)#高频解释变量,eg:季度数据
z <- rnorm(12 * n)#高频解释变量,eg:月度数据

权重分配:Exponential Almon polynomial 约束一致系数

常见加权函数形式:

#权重分配:指数阿尔蒙滞后多项式nealmon
fn_x <- nealmon(p = c(1, -0.5), d = 8)
fn_z <- nealmon(p = c(2, 0.5, -0.1), d = 17)

低频序列模拟 (e.g. 年度)

y <- 2 + 0.1 * trend + mls(x, 0:7, 4) %*% fn_x + mls(z, 0:16, 12) %*% fn_z + rnorm(n)

绘制系数散点图

plot(fn_z, col = "red")
points(fn_x,col = "blue")

MIDAS 回归示例 月度、季度数据转化为同频

基于最小二乘的线性模型

eq_u1 <- lm(y ~ trend + mls(x, k = 0:7, m = 4) + mls(z, k = 0:16, m = 12))
summary(eq_u1)

eq_u1结果:

Call:
lm(formula = y ~ trend + mls(x, k = 0:7, m = 4) + mls(z, k = 0:16, 
    m = 12))

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2651 -0.6489  0.1073  0.6780  2.7707 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     1.9694327  0.1261210  15.615  < 2e-16 ***
trend                           0.1000072  0.0008769 114.047  < 2e-16 ***
mls(x, k = 0:7, m = 4)X.0/m     0.5268124  0.0643322   8.189 2.07e-14 ***
mls(x, k = 0:7, m = 4)X.1/m     0.3782006  0.0641497   5.896 1.38e-08 ***
mls(x, k = 0:7, m = 4)X.2/m     0.1879689  0.0680465   2.762 0.006219 ** 
mls(x, k = 0:7, m = 4)X.3/m    -0.0052409  0.0658730  -0.080 0.936658    
mls(x, k = 0:7, m = 4)X.4/m     0.1504419  0.0627623   2.397 0.017358 *  
mls(x, k = 0:7, m = 4)X.5/m     0.0104345  0.0655386   0.159 0.873647    
mls(x, k = 0:7, m = 4)X.6/m     0.0698753  0.0692803   1.009 0.314270    
mls(x, k = 0:7, m = 4)X.7/m     0.1463317  0.0650285   2.250 0.025412 *  
mls(z, k = 0:16, m = 12)X.0/m   0.3671055  0.0618960   5.931 1.14e-08 ***
mls(z, k = 0:16, m = 12)X.1/m   0.3502401  0.0599615   5.841 1.83e-08 ***
mls(z, k = 0:16, m = 12)X.2/m   0.4514656  0.0659546   6.845 7.37e-11 ***
mls(z, k = 0:16, m = 12)X.3/m   0.3733747  0.0577702   6.463 6.42e-10 ***
mls(z, k = 0:16, m = 12)X.4/m   0.3609667  0.0700954   5.150 5.75e-07 ***
mls(z, k = 0:16, m = 12)X.5/m   0.2155748  0.0632036   3.411 0.000769 ***
mls(z, k = 0:16, m = 12)X.6/m   0.0648163  0.0630194   1.029 0.304828    
mls(z, k = 0:16, m = 12)X.7/m   0.0665581  0.0673284   0.989 0.323955    
mls(z, k = 0:16, m = 12)X.8/m  -0.0014853  0.0624569  -0.024 0.981048    
mls(z, k = 0:16, m = 12)X.9/m   0.0466486  0.0675116   0.691 0.490305    
mls(z, k = 0:16, m = 12)X.10/m  0.0384882  0.0664136   0.580 0.562824    
mls(z, k = 0:16, m = 12)X.11/m -0.0077722  0.0591409  -0.131 0.895564    
mls(z, k = 0:16, m = 12)X.12/m -0.0283221  0.0620632  -0.456 0.648589    
mls(z, k = 0:16, m = 12)X.13/m -0.0375062  0.0608348  -0.617 0.538179    
mls(z, k = 0:16, m = 12)X.14/m  0.0297271  0.0652273   0.456 0.649018    
mls(z, k = 0:16, m = 12)X.15/m  0.0184075  0.0588059   0.313 0.754558    
mls(z, k = 0:16, m = 12)X.16/m -0.0546460  0.0677214  -0.807 0.420574    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9383 on 222 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.9855,	Adjusted R-squared:  0.9838 
F-statistic: 579.3 on 26 and 222 DF,  p-value: < 2.2e-16

基于无约束的混频回归

eq_u2 <- midas_u(y ~ trend + mls(x, 0:7, 4) + mls(z, 0:16, 12))
summary(eq_u2)

eq_u2结果:

Call:
lm(formula = y ~ trend + mls(x, 0:7, 4) + mls(z, 0:16, 12), data = ee)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2651 -0.6489  0.1073  0.6780  2.7707 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)             1.9694327  0.1261210  15.615  < 2e-16 ***
trend                   0.1000072  0.0008769 114.047  < 2e-16 ***
mls(x, 0:7, 4)X.0/m     0.5268124  0.0643322   8.189 2.07e-14 ***
mls(x, 0:7, 4)X.1/m     0.3782006  0.0641497   5.896 1.38e-08 ***
mls(x, 0:7, 4)X.2/m     0.1879689  0.0680465   2.762 0.006219 ** 
mls(x, 0:7, 4)X.3/m    -0.0052409  0.0658730  -0.080 0.936658    
mls(x, 0:7, 4)X.4/m     0.1504419  0.0627623   2.397 0.017358 *  
mls(x, 0:7, 4)X.5/m     0.0104345  0.0655386   0.159 0.873647    
mls(x, 0:7, 4)X.6/m     0.0698753  0.0692803   1.009 0.314270    
mls(x, 0:7, 4)X.7/m     0.1463317  0.0650285   2.250 0.025412 *  
mls(z, 0:16, 12)X.0/m   0.3671055  0.0618960   5.931 1.14e-08 ***
mls(z, 0:16, 12)X.1/m   0.3502401  0.0599615   5.841 1.83e-08 ***
mls(z, 0:16, 12)X.2/m   0.4514656  0.0659546   6.845 7.37e-11 ***
mls(z, 0:16, 12)X.3/m   0.3733747  0.0577702   6.463 6.42e-10 ***
mls(z, 0:16, 12)X.4/m   0.3609667  0.0700954   5.150 5.75e-07 ***
mls(z, 0:16, 12)X.5/m   0.2155748  0.0632036   3.411 0.000769 ***
mls(z, 0:16, 12)X.6/m   0.0648163  0.0630194   1.029 0.304828    
mls(z, 0:16, 12)X.7/m   0.0665581  0.0673284   0.989 0.323955    
mls(z, 0:16, 12)X.8/m  -0.0014853  0.0624569  -0.024 0.981048    
mls(z, 0:16, 12)X.9/m   0.0466486  0.0675116   0.691 0.490305    
mls(z, 0:16, 12)X.10/m  0.0384882  0.0664136   0.580 0.562824    
mls(z, 0:16, 12)X.11/m -0.0077722  0.0591409  -0.131 0.895564    
mls(z, 0:16, 12)X.12/m -0.0283221  0.0620632  -0.456 0.648589    
mls(z, 0:16, 12)X.13/m -0.0375062  0.0608348  -0.617 0.538179    
mls(z, 0:16, 12)X.14/m  0.0297271  0.0652273   0.456 0.649018    
mls(z, 0:16, 12)X.15/m  0.0184075  0.0588059   0.313 0.754558    
mls(z, 0:16, 12)X.16/m -0.0546460  0.0677214  -0.807 0.420574    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9383 on 222 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.9855,	Adjusted R-squared:  0.9838 
F-statistic: 579.3 on 26 and 222 DF,  p-value: < 2.2e-16

同频时基于最小二乘和无约束混频回归的eq_u1与eq_u2结果一致!

基于midas_r的非线性估计

eq_r <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:16, 12, nealmon), start = list(x = c(1,-0.5), z = c(2, 0.5, -0.1)))
summary(eq_r)

收敛性检验

通过计算梯度和黑森矩阵检验是否满足收敛的充分必要条件,计算梯度的欧氏范数和hessian的特征值。然后判断梯度范数是否接近零,特征值是否为正

deriv_tests(eq_r, tol = 1e-06)

结果:

$`first`
[1] FALSE

$second
[1] TRUE

$gradient
[1]  0.005542527  0.173490055 -0.005754945  0.028655981 -0.030150807  0.019224017  0.053829528

$eigenval
[1] 1.047988e+07 5.891393e+04 3.664513e+02 1.221231e+02 8.116275e+01 5.148844e+01 4.598507e+01

系数输出

coef(eq_r)#返回t检验显著的系数

结果:

(Intercept)       trend          x1          x2          z1          z2          z3 
 1.98820762  0.09988275  1.35336157 -0.50747744  2.26312231  0.40927986 -0.07293300 

 

coef(eq_r, midas = TRUE)#返回所有系数

结果:

 (Intercept)        trend           x1           x2           x3           x4           x5           x6           x7 
1.988208e+00 9.988275e-02 5.480768e-01 3.299490e-01 1.986333e-01 1.195797e-01 7.198845e-02 4.333793e-02 2.608997e-02 
          x8           z1           z2           z3           z4           z5           z6           z7           z8 
1.570648e-02 3.346806e-01 4.049071e-01 4.233810e-01 3.826120e-01 2.988388e-01 2.017282e-01 1.176921e-01 5.934435e-02 
          z9          z10          z11          z12          z13          z14          z15          z16          z17 
2.586203e-02 9.740853e-03 3.170901e-03 8.921121e-04 2.169239e-04 4.558760e-05 8.280130e-06 1.299807e-06 1.763484e-07

eq_r的基础上使用函数amweights来形成几个标准的周期函数约束

amweights(p = c(1, -0.5), d = 8, m = 4, weight = nealmon, type = "C")
nealmon(p = c(1, -0.5), d = 4)

eq_r1 <- midas_r(y ~ trend + mls(x, 0:7, 4, amweights, nealmon, "C") + mls(z, 0:16, 12, nealmon), start = list(x = c(1, -0.5), z = c(2, 0.5, -0.1)))
summary(eq_r1)

其它加权形式

fn <- gompertzp#概率密度函数的权重设定
eq_r2 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:16, 12, fn), start = list(x = c(1,-0.5), z = c(1, 0.5, 0.1)))
summary(eq_r2)
eq_r3 <- midas_r(y ~ trend + mls(x, 0:7, 4) + mls(z, 0:16, 12, nealmon), start = list(z = c(1,-0.5)))
summary(eq_r3)

eq_r4 <- midas_r(y ~ trend + mls(y, 1:2, 1) + mls(x, 0:7, 4, nealmon), start = list(x = c(1,-0.5)))                                                                                         
summary(eq_r4)

eq_r5 <- midas_r(y ~ trend + mls(y, 1:2, 1, "*") + mls(x, 0:7, 4, nealmon), start = list(x = c(1,-0.5)))                                                                                               
summary(eq_r5)

eq_r6 <- midas_r(y ~ trend + mls(y, 1:4, 1, nealmon) + mls(x, 0:7, 4, nealmon), start = list(y = c(1, -0.5), x = c(1, -0.5)))                                                                                           
summary(eq_r6)

eq_r7 <- midas_r(y ~ trend + mls(x, 0:7, 4, amweights, nealmon, "B"), start = list(x = c(1, 1, -0.5)))                                                                                         summary(eq_r7)

eq_r8 <- midas_r(y ~ trend + mls(x, 0:7, 4, amweights, nealmon, "C"), start = list(x = c(1, -0.5)))                                                                                         summary(eq_r8)

eq_r9 <- midas_r(y ~ trend + mls(x, 0:7, 4, amweights, nealmon, "A"), start = list(x = c(1, 1, 1, -0.5)))                                                                                      
summary(eq_r9)
fn <- function(p, d) {
  p[1] * c(1:d)^p[2]
}
eq_r10 <- midas_r(y ~ trend + mls(x, 0:101, 4, fn), start = list(x = rep(0, 2)))
summary(eq_r10)

约束的充分性检验

只要误差是独立同分布的,就可以使用hAh_test检验,而hAhr_test是HAC-Roust稳健版本检验。只要在残差中没有观察到显著的hac,在小样本中使用hAh_test检验更精确。

eq_r12 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:16, 12, nealmon), start = list(x = c(1, -0.5), z = c(2, 0.5, -0.1)))
summary(eq_r12)
#约束的充分性检验
hAh_test(eq_r12)
hAhr_test(eq_r12)

结果:

hAh restriction test

data:  
hAh = 16.552, df = 20, p-value = 0.6818

hAh restriction test (robust version)

data:  
hAhr = 14.854, df = 20, p-value = 0.7847

对比:将变量z的函数约束的参数数量由17减少为2个

eq_r13 <- midas_r(y ~ trend + mls(x, 0:7, 4, nealmon) + mls(z, 0:12, 12, nealmon), start = list(x = c(1, -0.5), z = c(2, -0.1)))                                                                                                
hAh_test(eq_r13)
hAhr_test(eq_r13)
summary(eq_r13)

结果:

	hAh restriction test

data:  
hAh = 36.892, df = 17, p-value = 0.00348


	hAh restriction test (robust version)

data:  
hAhr = 32.879, df = 17, p-value = 0.01168

拒绝原假设,即约束不充分

最优模型选取

step1:函数expand_weights_lags定义每个模型的潜在模型集

set_x <- expand_weights_lags(weights = c("nealmon", "almonp"), from = 0, to = c(5, 10), m = 1, start = list(nealmon = rep(1, 2), almonp = rep(1, 3)))


set_z <- expand_weights_lags(weights = c("nealmon", "nbeta"), from = 1, to = c(2, 3), m = 1, start = list(nealmon = rep(0,2), nbeta = rep(0.5, 3)))

step2:所有可能模型的估计

函数midas_r_ic_table返回所有模型的汇总表,以及常用信息准则的相应值和参数约束充分性检验(默认是hAh_test检验)的经验大小。给出了导数检验的结果和优化函数的收敛状态。

eqs.ic <- midas_r_ic_table(y ~ trend + mls(x, 0, m = 4) + fmls(z, 0, m = 12), table = list(z = set_z,x = set_x), start = c(`(Intercept)` = 0, trend = 0))

mod <- modsel(eqs.ic, IC = "AIC", type = "restricted")

step3:查看某个候选模型可微调其收敛性,并更新结果:

eqs_ic$candlist[[5]] <- update(eqs_ic$candlist[[5]], Ofunction = "nls")

eqs_ic <- update(eqs_ic)

手动选取模型

step1:训练集(样本内预测)、测试集(样本外预测)分割

datasplit <- split_data(list(y = y, x = x, z = z, trend = trend), insample = 1:200, outsample = 201:250)

step2:分别拟合待选模型

mod1 <- midas_r(y ~ trend + mls(x, 4:14, 4, nealmon) + mls(z, 12:22, 12, nealmon), data = datasplit$indata, start = list(x = c(10, 1, -0.1), z = c(2, -0.1)))

mod2 <- midas_r(y ~ trend + mls(x, 4:20, 4, nealmon) + mls(z, 12:25, 12, nealmon), data = datasplit$indata, start = list(x = c(10, 1, -0.1), z = c(2, -0.1)))

step3:计算

avgf <- average_forecast(list(mod1, mod2), data = list(y = y, x = x, z = z, trend = trend), insample = 1:200, outsample = 201:250, type = "fixed", measures = c("MSE", "MAPE", "MASE"), fweights = c("EW", "BICW", "MSFE", "DMSFE"))

样本内、样本外预测精度输出:

avgf$accuracy

结果对比:

$`individual`
                                                              Model MSE.out.of.sample MAPE.out.of.sample
1 y ~ trend + mls(x, 4:14, 4, nealmon) + mls(z, 12:22, 12, nealmon)          1.848474           4.474242
2 y ~ trend + mls(x, 4:20, 4, nealmon) + mls(z, 12:25, 12, nealmon)          1.763432           4.391594
  MASE.out.of.sample MSE.in.sample MAPE.in.sample MASE.in.sample
1          0.8155477      1.768720       14.05880      0.6992288
2          0.7976764      1.726365       13.56287      0.6897242

$average
  Scheme      MSE     MAPE      MASE
1     EW 1.803640 4.432918 0.8066121
2   BICW 1.763433 4.391595 0.7976766
3   MSFE 1.802640 4.431945 0.8064017
4  DMSFE 1.801787 4.431112 0.8062216

mod1,mod2预测结果输出:

avgf$forecast

结果对比:

          [,1]     [,2]
 [1,] 22.24057 22.38300
 [2,] 22.12774 22.13778
 [3,] 22.22678 22.21942
 [4,] 22.44535 22.61872
 [5,] 22.36448 22.46157
 [6,] 22.49847 22.66663
 [7,] 22.70192 22.70818
 [8,] 22.49411 22.53326
 [9,] 22.47978 22.54235
[10,] 22.43439 22.46522
[11,] 22.61563 22.57788
[12,] 22.87686 22.88271
[13,] 22.59329 22.64370
[14,] 23.39675 23.47606
[15,] 23.25864 23.27932
[16,] 22.97588 23.09488
[17,] 23.13196 23.21956
[18,] 24.05489 24.04334
[19,] 24.11690 24.16742
[20,] 23.79328 23.91429
[21,] 24.58047 24.58925
[22,] 24.33794 24.45825
[23,] 23.96658 24.03107
[24,] 24.00510 24.12939
[25,] 23.96473 23.99645
[26,] 24.02198 24.04416
[27,] 25.01521 24.91966
[28,] 24.47063 24.59918
[29,] 24.58748 24.51964
[30,] 24.69662 24.72768
[31,] 25.19092 25.22659
[32,] 25.57452 25.72345
[33,] 25.06312 25.17678
[34,] 24.99302 25.09359
[35,] 25.13005 25.16293
[36,] 25.85870 25.85633
[37,] 25.43593 25.46734
[38,] 25.80690 25.88224
[39,] 26.80900 26.82394
[40,] 26.44056 26.63095
[41,] 26.04786 26.17996
[42,] 26.09545 26.29647
[43,] 25.99309 25.98112
[44,] 25.89755 26.03643
[45,] 25.99814 26.18442
[46,] 26.25727 26.38797
[47,] 26.79563 26.92593
[48,] 26.57034 26.68217
[49,] 26.53808 26.51001
[50,] 26.72772 26.76183

组合预测效果输出:

avgf$avgforecast

结果对比:

其中EW:基于特征值加权组合;BICW:BIC信息准则加权组合;MSFE: 预测均方误差加权组合;DMSFE:discounted MSFE

         EW     BICW     MSFE    DMSFE
 [1,] 22.31178 22.38299 22.31346 22.31489
 [2,] 22.13276 22.13778 22.13288 22.13298
 [3,] 22.22310 22.21942 22.22301 22.22294
 [4,] 22.53204 22.61872 22.53408 22.53582
 [5,] 22.41303 22.46157 22.41417 22.41515
 [6,] 22.58255 22.66663 22.58453 22.58622
 [7,] 22.70505 22.70818 22.70512 22.70519
 [8,] 22.51369 22.53326 22.51415 22.51454
 [9,] 22.51107 22.54235 22.51180 22.51244
[10,] 22.44980 22.46522 22.45017 22.45048
[11,] 22.59675 22.57788 22.59631 22.59593
[12,] 22.87979 22.88271 22.87986 22.87991
[13,] 22.61849 22.64370 22.61909 22.61960
[14,] 23.43641 23.47606 23.43734 23.43814
[15,] 23.26898 23.27932 23.26922 23.26943
[16,] 23.03538 23.09488 23.03678 23.03798
[17,] 23.17576 23.21956 23.17679 23.17767
[18,] 24.04912 24.04334 24.04898 24.04887
[19,] 24.14216 24.16742 24.14276 24.14326
[20,] 23.85379 23.91429 23.85521 23.85643
[21,] 24.58486 24.58925 24.58497 24.58505
[22,] 24.39809 24.45825 24.39951 24.40072
[23,] 23.99882 24.03107 23.99958 24.00023
[24,] 24.06724 24.12938 24.06870 24.06996
[25,] 23.98059 23.99645 23.98096 23.98128
[26,] 24.03307 24.04416 24.03334 24.03356
[27,] 24.96744 24.91967 24.96631 24.96535
[28,] 24.53491 24.59918 24.53642 24.53771
[29,] 24.55356 24.51964 24.55277 24.55208
[30,] 24.71215 24.72768 24.71252 24.71283
[31,] 25.20876 25.22659 25.20918 25.20954
[32,] 25.64898 25.72345 25.65074 25.65224
[33,] 25.11995 25.17678 25.12129 25.12244
[34,] 25.04331 25.09359 25.04449 25.04550
[35,] 25.14649 25.16293 25.14688 25.14721
[36,] 25.85751 25.85633 25.85749 25.85746
[37,] 25.45163 25.46734 25.45200 25.45232
[38,] 25.84457 25.88224 25.84546 25.84622
[39,] 26.81647 26.82394 26.81665 26.81680
[40,] 26.53575 26.63095 26.53799 26.53991
[41,] 26.11391 26.17996 26.11546 26.11679
[42,] 26.19596 26.29647 26.19833 26.20035
[43,] 25.98711 25.98112 25.98696 25.98684
[44,] 25.96699 26.03642 25.96862 25.97002
[45,] 26.09128 26.18442 26.09348 26.09535
[46,] 26.32262 26.38797 26.32416 26.32548
[47,] 26.86078 26.92593 26.86232 26.86363
[48,] 26.62626 26.68217 26.62757 26.62870
[49,] 26.52404 26.51001 26.52371 26.52343
[50,] 26.74478 26.76183 26.74518 26.74552

参考文献:《Mixed Frequency Data Sampling Regression Models: The R Package midasr》

  • 13
    点赞
  • 91
    收藏
    觉得还不错? 一键收藏
  • 18
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 18
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值