R语言时间序列分析小例

最新推荐文章于 2023-03-01 16:17:01 发布

_唯欣主义

最新推荐文章于 2023-03-01 16:17:01 发布

阅读量2.1k

点赞数

分类专栏：基础数学学习文章标签： r语言数学建模

本文链接：https://blog.csdn.net/m0_50523097/article/details/122135206

版权

基础数学学习专栏收录该内容

17 篇文章 2 订阅

订阅专栏

本文介绍了使用ARIMA模型对工资数据进行时间序列分析的过程，包括数据预处理、建模选择、参数估计和模型诊断。通过对数据进行一阶差分和Box-Cox变换消除上升趋势，然后尝试了多种ARIMA模型组合。最终，选择了两个模型进行预测，但发现预测结果高于实际值，可能由于数据的对数上升趋势。建议可能需要考虑趋势项的惩罚来改善预测效果。

摘要由CSDN通过智能技术生成

复习心烦，偶遇大作业，故摸鱼

作业题目

自由选取一组数据(可以是R 自带的数据集、或者其它来源，鼓励选取一些有趣的课题进行数据分析)，利用我们这学期所学知识建立恰当模型（ARIMA、GARCH 等），小组成员（不超过三人一组, 自由组队）中的一人安排 5 至 10 分钟左右的PPT 课堂展示（教学周第16 周、无需展示代码，仅汇报初步结果即可，我给出点评后再完善后提交）。内容基本要求如下:
• 说明数据来源背景, 需要分析解决的问题;
• 包含完整的建模过程(拟合、估计、诊断检验、预测)；
• 结论或者建议(依赖于你拟解决的问题)。
注意！！！数据量的选取需要适中, 留取部分原始数据作为评估模型预测能力使用。
此部分对最后成绩的贡献比例为20%。

Background of Data Set

A multivariate yearly time series from 1960 to 1979 with variables (Included in dataset TSA)

Test set and fit set

Select the first 60 data for fitting , while the last 12 (one year) as the test data to test the final model we struct (cut the data frame by using R function window( ))

Data preprocessing

Observe the image directly and there is an obvious logarithmic upward trend. Carry out differential processing on dataset——wages to eliminate the upward trend,
在这里插入图片描述
test the unit root(DK test), the result are as followings:

> adf.test(model)

	Augmented Dickey-Fuller Test

data:  model
Dickey-Fuller = -3.6536, Lag order = 3, p-value = 0.03639
alternative hypothesis: stationary

We are not satisfied with the $p$ value. In order to avoid excessive difference, we try to use Box-Cox transformation. From the results, it can be seen that the value of $\lambda$ is 1. Although the value of $p$ value for significance 0.01 is not good, it can only reduce the significance requirement ( $\alpha$ = 0.05), rather than searching complex alternatives.
在这里插入图片描述
We finally preprocess the dataset by difference : model = diff (log(waves)). From the figure below, we can see that the peaks and troughs are distributed regularly. Combined with the actual background and common sense of life of the data set, workers’ salary will increase at the end of the year. After all, everyone wants to run. Workers’ salary should be fluctuated in an annual cycle. We can see from the image of data after difference. The salary at the end of each year is relatively high, but there are three hidden dangers:
① the periodicity is not very obvious. Although it has practical significance, for the size of sample is so small.
② Although it is expected that the salary at the end of middle age should be high, the data in 1986 conflicts with this statement, which is likely to be related to the social background at that time, and this information is unknown to us. The prediction work we do later may not be so good
③ It can also be seen from the figure that the growth rate of wages has slowed down (because we have just done differential processing, in fact, wages have been rising all the time)
在这里插入图片描述

Choose Model

Calculate the ACF and PACF of the model. In fact, the diagrams of ACF and PACF can’t put forward feasible suggestions for us to choose $p$ and $q$ , hence it’s not necessary to calculate eacf. The results given must not be so good. We try to make a seasonal difference to the model.
在这里插入图片描述
After seasonal difference, it is also difficult to select appropriate models according to ACF and PACF images,

This model is a little troublesome, for model $\rm ARIMA(p,d,q)\times(P,D,Q)_s$ ，now we juat know the value of $d$ and $D$ $ARIMA(p,1,q)\times(P,1,Q)_{11}$ , Next we select the optimal subset based on the BIC criterion. The following two figures respectively select the optimal subset from fit_set and model2.
在这里插入图片描述
We consider the models： $ARIMA(0,1,0)\times(0,1,1)_{12}$ , $ARIMA(0,1,0)\times(1,1,1)_{12}$ , $ARIMA(0,1,1)\times(0,1,0)_{12}$ ,
$ARIMA(0,1,0)\times(1,1,0)_{12}$ , $ARIMA(1,1,0)\times(0,1,0)_{12}$ , $ARIMA(1,1,0)\times(1,1,1)_{12}$

Parameter estimation

a1=arima(fit_set,order=c(0,1,0),seasonal = list(order=c(0,1,1),period = 12))
a2=arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,1),period = 12))
a3=arima(fit_set,order=c(0,1,1),seasonal = list(order=c(0,1,0),period = 12))
a4=arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,0),period = 12))
a5=arima(fit_set,order=c(1,1,0),seasonal = list(order=c(0,1,0),period = 12))
a6=arima(fit_set,order=c(1,1,0),seasonal = list(order=c(1,1,1),period = 12))

Model diagnosis

Diagnose the six models by R function tsdiag( ). During the diagnosing, it was found that the Ljung box test of model a1 was not good, one point was below the critical line and one point was very close to the critical line; The residual autocorrelation diagram of model a3 and model a5 is also not good, there’re some multiple outliers, so we exclude model a1, model a3 and model a5 first. There are still model a2, model a4 and model a6 models left. From the results of parameter estimation and model diagnosis, these models have no obvious advantages or disadvantages. However, based on the modeling principle of “simple but not simpler”, we exclude model a6 and leave model a2 and model a4 models for final prediction. The following are the results of testing a2 and a4 model.
在这里插入图片描述

Model prediction

a2=stats::arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,1),period = 12))
a2_fore=forecast(a2,h=11,level = c(95))
plot(a2_fore)
lines(a2_fore$fitted,col="black")
lines(wages,col="red")

在这里插入图片描述

a4=stats::arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,0),period = 12))
a4_fore=forecast(a4,h=11,level = c(95))
plot(a4_fore)
lines(a4_fore$fitted,col="black")
lines(wages,col="red")

在这里插入图片描述

Conclusion

We can see the prediction is not successful, for each predictive value is a little higher than the true value. It may be caused by the logarithmic upward trend of the data. Differential processing is suitable for linear trends, but not for logarithmic trend. A possible solution is that we can add a approxiate penalization on t, in order to slow down the upward trend.

Code

library(forecast)
library(TSA)
library(xts)
library(tseries)
data(wages)
help(wages)
plot(wages)
# 分组
fit_set=window(wages,start=c(1981,7),end=c(1986,6))
test_set=window(wages,start=c(1986,7),end=c(1987,6))
# # 确定趋势
# vec_wage=as.vector(wages)
# newwage=data.frame(log_wage=log(vec_wage),day=c(1:length(vec_wage)))
# fit=lm(newwage$log_wage~newwage$day,data=newwage)
# summary(fit)
# rm(fit)
# 一次差分
model=diff(fit_set)
plot(model,ylab="model")
abline(h=0)
abline(v=c(1982:1986),lty=2)
adf.test(model)
# 尝试Box-Cox变换
b=BoxCox.lambda(model)
lambda=which.max(b)
b=BoxCox.ar(model-min(model)*1.1)
# 模型识别
par(mfrow=c(1,2))
acf(as.vector(model),lag.max=60,main="model")
pacf(as.vector(model),lag.max = 60,main="model")
# 考虑季节模型
model2=diff(model,lag = 12)
acf(as.vector(model2),lag.max=60,main="model2")
pacf(as.vector(model2),lag.max = 60,main="model2")
par(mfrow=c(1,1))
# 模型识别
par(mfrow=c(1,2))
dis=armasubsets(y=fit_set,nar=13,nma=13,y.name="test",ar.method="ols")
plot(dis)
dis=armasubsets(y=model2,nar=4,nma=4,y.name="diff(1)",ar.method="ols")
plot(dis)
# 模型拟合
a1=arima(fit_set,order=c(0,1,0),seasonal = list(order=c(0,1,1),period = 12))
a2=arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,1),period = 12))
a3=arima(fit_set,order=c(0,1,1),seasonal = list(order=c(0,1,0),period = 12))
a4=arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,0),period = 12))
a5=arima(fit_set,order=c(1,1,0),seasonal = list(order=c(0,1,0),period = 12))
a6=arima(fit_set,order=c(1,1,0),seasonal = list(order=c(1,1,1),period = 12))
# 模型诊断
tsdiag(a1)
tsdiag(a2)
tsdiag(a3)
tsdiag(a4)
tsdiag(a5)
tsdiag(a6)
# 模型预测
a2=stats::arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,1),period = 12))
a2_fore=forecast(a2,h=11,level = c(95))
plot(a2_fore)
lines(a2_fore$fitted,col="black")
lines(wages,col="red")

a4=stats::arima(fit_set,order=c(0,1,0),seasonal = list(order=c(1,1,0),period = 12))
a4_fore=forecast(a4,h=11,level = c(95))
plot(a4_fore)
lines(a4_fore$fitted,col="black")
lines(wages,col="red")