# R语言中ARMA，ARIMA（Box-Jenkins），SARIMA和ARIMAX模型用于预测时间序列数据


By <- diff(y) <span style="color:#888888"># y_i - B y_i</span>
B3y <- diff(y, <span style="color:#880000">3</span>) <span style="color:#888888"># y_i - B^3 y_i</span>
message(paste0(<span style="color:#880000">"y is: "</span>, paste(y, collapse = <span style="color: 
## y is: 1,3,5,10,20
## By is: 2,2,5,10
## B^3y is: 9,17

自相关函数

<span style="color:#000000"><span style="color:#000000"><code>get_autocor <- <strong>function</strong>(x, lag) {
x.left <- x[<span style="color:#880000">1</span>:(length(x) - lag)]
x.right <- x[(<span style="color:#880000">1</span>+lag):(length(x))]
autocor <- cor(x.left, x.right)
<strong>return</strong>(autocor)
}</code></span></span>

<span style="color:#000000"><span style="color:#000000"><code><span style="color:#888888"># correlation of measurements 1 time point apart (lag 1)</span>
get_autocor(y, <span style="color:#880000">1</span>) </code></span></span>
## [1] 0.9944627
<span style="color:#000000"><span style="color:#000000"><code><span style="color:#888888"># correlation of measurements 2 time points apart (lag 2)</span>
get_autocor(y, <span style="color:#880000">2</span>)</code></span></span>
## [1] 0.9819805

φkk:=Corr(yt,yt−k|yt−1,⋯,yt−k+1)k=0,1,2,⋯φkk:=Corr⁡(yt,yt−k|yt−1,⋯,yt−k+1)k=0,1,2,⋯

<span style="color:#000000"><span style="color:#000000"><code>par(mfrow = c(<span style="color:#880000">1</span>,<span style="color:#880000">2</span>))
acf(y) <span style="color:#888888"># conventional ACF</span>
pacf(y) <span style="color:#888888"># pACF</span></code></span></span>

### 分解时间序列数据

•  StSt
• TtTt
• ϵtϵt

yt = St + Tt + ϵt.yt = St + Tt + ϵt.

• 添加剂：每个时期的季节效应放大器相似。
• 乘法：季节性趋势随时间序列的变化而变化。

AirPassengers数据集提供了乘法时间序列的示例。

<span style="color:#000000"><span style="color:#000000"><code>data(AirPassengers)
plot(AirPassengers)</code></span></span>

log(StTtϵt)=log(St)+log(Tt)+log(ϵt)log⁡(StTtϵt)=log⁡(St)+log⁡(Tt)+log⁡(ϵt)AirPassengers 数据集：

<span style="color:#000000"><span style="color:#000000"><code>plot(log(AirPassengers))</code></span></span>

<span style="color:#000000"><span style="color:#000000"><code>plot(decompose(AirPassengers, type = <span style="color:#880000">"multiplicative"</span>))</code></span></span>

<span style=" /span>] <span style="color:#888888"># DAX data</span>
<span style="color:#888888"># data do not seem to be multiplicative, use additive decomposition</span>
decomposed <- decompose(daxData, type = < 

• 整体价值稳步上升。
• 季节性趋势强烈：每年年初，股价相对较低，并在夏季结束时达到相对最大值。
• 除1997年和1998年之间的最终测量外，随机噪声的贡献可以忽略不计。

### 固定与非固定过程

<span style="color "># climate data </span>
<strong>library</strong>(tseries)
data(nino) 

## ARMA模型

ARMA代表自回归移动平均线。ARMA模型仅适用于固定过程，并具有两个参数：

• p：自回归（AR）模型的顺序
• q：移动平均（MA）模型的顺序

ARMA模型可以指定为

ÿ^Ť= c + εŤ+ Σi = 1pφ一世ÿt - 我- Σj = 1qθĴεt - j。y^t=c+ϵt+∑i=1pϕiyt−i−∑j=1qθjϵt−j.

• cc
• ϵtϵtttϵt∼N(0,σ2)ϵt∼N(0,σ2)
• ϕ∈Rpϕ∈Rp
• ytyttt
• θ∈Rqθ∈Rq
• ϵtϵttt

### 使用backshift运算符制定ARMA模型

（ 1 - Σi = 1pφ一世乙一世）yŤ= （ 1 - Σj = 1qθĴ乙Ĵ）εĴ(1−∑i=1pϕiBi)yt=(1−∑j=1qθjBj)ϵj

ϕp(B)=1−∑pi=1ϕiBiϕp(B)=1−∑i=1pϕiBiθq(B)=1−∑qj=1θjBjθq(B)=1−∑j=1qθjBj

ϕp(B)yt=θq(B)ϵt.ϕp(B)yt=θq(B)ϵt.

## ARIMA模型

dd

• p：自回归（AR）模型的顺序
• d：差异程度
• q：移动平均（MA）模型的顺序

(1−B)dyt.(1−B)dyt.

ϕp(B)(1−B)dyt=θq(B)ϵt.ϕp(B)(1−B)dyt=θq(B)ϵt.

d=0d=0(1−B)0yt=yt(1−B)0yt=ytdd

(1−B)1yt(1−B)2yt=yt−yt−1=(1−2B+B2)yt=yt−2yt−1+yt−2(1−B)1yt=yt−yt−1(1−B)2yt=(1−2B+B2)yt=yt−2yt−1+yt−2

### pp

p∈N0p∈N0d=0d=0Byt=yt−1Byt=yt−1ϕ1ϕ1yt−1yt−1yt−2yt−2ϕ1ϕ1ϕ2ϕ2

p=1p=1d=0d=0q=0q=0

y^t=μϵt+ϕ1yt−1y^t=μϵt+ϕ1yt−1

<span styl 0" /span>)
par(mfrow = c(<span style="color:#880000">2</span>, <span style="color:#880000">2</span>))
<span style="color:#888888">#  880000">"ARIMA(1,0,0)"</span>)
<span style="color:#888888"># plot partial acf</span>
acf(x, type = <span style="color:#880000">"partial"</span>, main = <span style="color:#880000">"Partial aut "color:#880000">0.65</span>, <span style="color:#880000">0.3</span>)),
n = <span style="color:#880000">1000</span>)
plot(x, main = <span style="color:#880000">"ARIMA(2,0,0)"</span>)
acf(x, type = <span style="color:#880000">"partial"< span></span>

### dd

ARIMA（0,1,0）模型简化为随机游走模型

y^t=μ+ϵt+yt−1.y^t=μ+ϵt+yt−1.

差异的影响

<span style="color:#0 "><span style /span>), main = <span style="color:#880000">"After differencing"</span>)</code></span></span>

### 移动平均线的影响

<span style="color:#00 # Example for ARIMA(0,0,1)</span>
x <- arima.sim(list(ma = <span style="color:#880000">0.75</span>),
n = <span style="color:#880000">1000</span>)
plot(x, main = <span style="color:#880000">"ARIMA(0,0,1)"</span>)
acf(x, main = <span style="color:#88000 n"</span>)</code></span></span>

### 在AR和MA术语之间进行选择

• pp
• rr

AR和MA术语的影响

AR和MA术语的组合导致以下时间序列数据：

<span style="color:# "># ARIMA(1,0,1)</span>
x <- arima.sim(list(order = c(<span style="color:#880000">1</span>,<span style="color:#880000">0</span>,<span style="color:#880000">1</span>), ar = <span style="color:#880000">0.8</span>, ma = <span style="color:#880000">0.8</span>), n = <span style="color:#880000">1000</span>)
plot(x, main = <span style="color:#880000">"ARIMA(1,0,1)"</span>)
acf(x, main = <span   ARIMA(2,0,2)</span>
x <- arima.sim( )
plot(x, main = <span style="color:#880000">"ARIMA(2,0,2)"</span>)
acf(x, main = <span style="color:#880000">"ACF"</span>)
pacf(x, main = <span style="color:#880000">"pACF"</span>)</code></span></span>

SARIMA模型

•  P：季节性自回归（SAR）项的数量
• D：季节差异程度
• 问：季节性移动平均线（SMA）的数量

## R中的预测

auto.arimaforecastppddqqPPDDQQstepwiseapproximationFALSE

### SARIMA模型用于固定过程

<span style="color:#000000"><span style="color:#000000"><code>plot(nino3.4)</code></span></span>

d=0d=0

<span style="color:#000000"><span style="color:#000000"><code>nino.components <- decompose(nino3.4) an>

(P,D,Q)S(P,D,Q)SD=0D=0ninoS=12S=12

<span style="color:#000000"><span style="color:#000000"><code>nino.season <- nino.components$seasonal ode></span></span> P=2P=2Q=0Q=0 非季节性模型 ppqq <span style="color:#000000">< tyle="color:#000000"><code>par(mfrow = c(<span style="color:#880000">1</span>,<span style="color:#880000">2</span>)) acfp <- acf(n #888888"># transform lag from years to months</span> acfp$lag <- acfp$lag * <span style="color:#880000">12</span> plot(acfp, main = <span style="color:#880000"> months</span> acfpl$lag <- acfpl$lag * <span style="color:#880000">12</span> plot(acfpl, main = <span style="color:#880000">"pACF"</span>)</code></span></span> 我们可以使用包中的Arima函数来拟合模型forecast <span style=" r:#888888"># non-seasonal model: (p,d,q)</span> order.non.seasonal <- c(<span style="color:#880000">2</span>,<span style="color:#880000">0</span> r:#880000">2</span>,<span style="color:#880000">0</span>,<span style="color:#880000">0</span>) A <- Arima(nino3.4, order = order.non.seasonal, seasonal = order.seasonal)</code> </span> 我们现在可以使用该模型来预测未来Nino 3.4地区的气温如何变化。有两种方法可以从预测模型中获得预测。第一种方法依赖于predict函数，而第二种方法使用包中的forecast函数forecast。使用该predict功能，我们可以通过以下方式预测和可视化结果： <span sty #000000"><span style="color:#000000"><code><span style="color:#888888"># to construct a custom plot, we can us n> ## ## Attaching package: 'ggplot2' ## The following object is masked from 'package:forecast': ## ## autolayer <span style="color:#00000 yle="color:#880000">0</span>), cbind(fortify(fore df$y + plot.df$sd * <span style="color:#880000">1.96</span> plot.df$lower <- plot.df$y - plot.df$sd * <span style="color:#880000">1.96</span>
ggplot(plot.df, aes(x = x ,y = y)) +
ylab(<span style="color:#880000">"Temperature"</span>) + xlab(<span style="color:#880000">"Year"</span>)</code></span></span>

# use the forecast function to use the built-in plotting function:
forecast <- forecast(A, h = 60) # predict 5 years into the future
plot(forecast)

### 用于非平稳数据的ARIMA模型

<span style="color:#000000"><span style=" 0"><code><strong>library</strong>(astsa)
data(gtemp)  

d=1d=1

<span style="color:#000000"><span style=" 

<span style="color:#000 0"><span style="color:#000000"><code>par(mfrow = c(<span style="color:#880000">1</span>,<span s code></span></span>

p=0p=0q=1q=1

<span style="color:#000000"><span style=" 

<span style="color:#000000"> tyle="color:#000000"><cod 8"># predict 30 years into the future</span> pan></span>

### 关于空气质量数据集的ARIMAX

<span style="color:#000000"><span style="color:#000000"><code>data(airquality)
ozone <- subset(na.omit(airquality))
set.seed(<span style="color:#880000">123</span>)
N.train <- ceiling(<span style="color:#880000">0.7</span> * nrow(ozone))
N.test <- nrow(ozone) - N.train
<span style="color:#888888"># ensure to take only subsequent measurements for time-series analysis:</span>
trainset <- seq_len(nrow(ozone))[<span style="color:#880000">1</span>:N.train]
testset <- setdiff(seq_len(nrow(ozone)), trainset)</code></span></span>

<span :#000000"><span st style="color:#880000">"%j"</span>))
max.date <- as.numeric(format(max(dates), <span style="color:#880000">"%j"</span>))
ozone.ts <- ts(ozone$Ozone, start = min.date, end = max.date, frequency = <span style="color:#880000">1</span>) ozone.ts <- window(ozone.ts, <span style="color:#880000" 21</span>, <span style="color:#880000">231</span>) <span style="color:#888888"># deal with repetition due to missing time values</span> ozone$t <- seq(start(ozone.ts t</span></code></span></span>

<span style="color:#000000"><span style="color:#000000"><code><strong>library</strong>(ggplot2)
ggplot(ozone, aes(x = t, y = Ozone)) + geom_line() +
geom_point()</code></span></span>

<span style="color:#000000">< :#880000">"partial"</span>)</code></span></span>

<span style="color eviously developed weighted negative binomial model</span>
<strong>library</strong>(MASS)
get.weights <- <strong>function</strong>(ozone) {
z.scores <- (ozone$Ozone - mean(ozone$Ozone)) / sd(ozone$Ozone) weights <- exp(z.scores) weights <- l$pred, ozone[testset, <span style="color:#880000">"Ozone"</span>])^<span style="color:#880000">2</span>
print(Rsquared.linear)</code></span></span>
## [1] 0.7676977
<span style="color:#000000"><span style="color:#000000"><code>print(Rsquared.temporal)</code></span></span>
## [1] 0.7569718

### 关于空气质量数据集的ARIMAX

<span style="color:#000000">< 

Icecream数据集包含以下变量：

• 缺点：人均品脱的冰淇淋消费量。
• 收入::美元平均每周家庭收入。
• 价格：每品脱冰淇淋的价格。
• temp：华氏温度的平均温度。

<span style="col  style="color:#880000">"1951-03-18"</span>), as.Date(<span style="color:#880000">"1953-07-11"</span>)))
months <- c(seq(<span style="color:#880000">3</span>,<span style="color:#880000">12</span>), se or:#880000">1</span>, <span style="color:#880000">52</span>, <span style="color:#880000">4</span>))
ice.ts <- ts(Icecream$cons, start = c(<span style="color:#880000">1951</span>, <span style="color:#880000">3</span>), end = c(<span style="color:#880000">1953</span>, <span style="color:#880000">6</span>), frequency = <span style="color:#880000">52</span>/<span style="color:#880000">4</span>)</code></span></span> 我们现在调查数据： <span style="color:#000000"><span style="color:#000000"><code>plot(decompose(ice.ts))</code></span></span> 因此，数据有两种趋势： 1. 总体而言，1951年至1953年间，冰淇淋的消费量大幅增加。 2. 冰淇淋销售在夏季达到顶峰。 ppqq <span style="color:#000000"><span style acf(ice.ts, type = <span style="color:#880000">"partial"</span>)</code></span></span> 由于季节性趋势，我们可能适合ARIMA（1,0,0）（1,0,0）模型。但是，由于我们知道温度和外生变量的收入，因此它们可以解释数据的趋势：   <span style="color:#000000"><span style="color:#000000"><code>plot(Icecream$temp) <span style="color:#888888"># explains the seasonal trend</span></code>< 

 yle="color:#880000">"income"</span>, <span style="color:#880000">"temp"</span>)],
order = c(<span style="color:#880000">1</span>,<span style="color:#880000">0</span>,<span style="color:#880000">0</span>))
preds <- forecast(A, xreg = Icecream[test, c(<span style="color:#880000">"income"</span>, <span style="color:#880000">"temp"</span>)])
plot(preds)
lines(window(ice.ts, c(<span style="color:#880000">1951</span>, <span style="color:#880000">22</span>), c(<span style="color:#880000">1951</span>, <span  ="color:#880000">24</span>)
lines(x = as.numeric(rownames(as.data.frame(preds))), y = as.data.frame(preds)[,<span style="color:#880000">2</span>], lty = <span style="color:#880000">2</span>)</code></span></span>

ARIMAX（1,0,0）模型的预测显示为蓝色，而ARIMA（1,0,0）（1,0,0）模型的预测显示为虚线。实际观察值显示为黑线。结果表明，ARIMAX（1,0,0）明显比ARIMA（1,0,0）（1,0,0）模型更准确。

<span style="color:#000000"><span style="color:#000000"><code>preds <- forecast(A.season, h = <span style="color:#880000">60</span>)
plot(preds)</code></span></span>