Time series Analysis with R(二)

本文来源: <<Introductory Time Series With R>>

1. Time Series Data

(1)Air passenger bookings

trend:非period
seasonal variation:repeating pattern.

#########################################################
# Time Series Data
#########################################################

#########################################################
# A flying start:Air passenger bookings
#########################################################
data(AirPassengers)
head(AirPassengers)
AP <- AirPassengers
class(AP)
start(AP)
end(AP)
frequency(AP)


plot(AP, ylab = "Passengers(1000's)")



aggregate:可消除季节的影响,将trend限制在可控制的范围内;

cycle:抽取出每条数据中的season数据

layout(1:2)
layout.show(2)
plot(aggregate(AP))
boxplot(AP ~ cycle(AP))



(2)Decomposition of series

Models

x_t = m_t + s_t + z_t(x_t 是observed series, m_t是trend,s_t是seasonal effect, z_t是error term【服从均值为0的同方差的正态分布】)


(3)smoothing

时间序列平滑预测是指用平均的方法,把时间序列中的随机波动剔除掉,使序列变得比较平滑,以反映出其基本轨迹,并结合一定的模型进行预测。所平均的范围可以是整个序列(整体平均数),也可以是序列中的一部分(局部平均数)

方法:

在将预测评估的时间点前后使用points,当在smoothing algorithm在begin 及end points处的点缺失若干个时,算法仍能与end points相适应时,该序列即可满足要求。

R中的stl方法即是平滑计算的方法,还有filtering


(4)Decompostion in R

data(AirPassengers)
head(AirPassengers)
AP <- AirPassengers
class(AP)
start(AP)
end(AP)
frequency(AP)
plot(AP, ylab = "Passengers(1000's)")
AP.decom <- decompose(AP, 'multiplicative')
plot(AP.decom)
Trend <- AP.decom$trend
Seasonal <- AP.decom$seasonal
ts.plot(cbind(Trend, Trend*Seasonal), lty = 1:2)






(5)随机游走模型

x <- w <- rnorm(1000)
for (t in 2:1000) x[t] <- x[t-1] + w[t]
plot(x, type = 'l')


(6)fitted AR model

wwww = 'http://www.massey.ac.nz/~pscowper/ts/global.dat'
Global <- scan(www)
Global.ts <- ts(Global, st = c(1856, 1), end = c(2005, 12), fr = 12)
Global.ar <- ar(aggregate(Global.ts, FUN = mean), method = "mle") #拟合一个AR模型
mean(aggregate(Global.ts, FUN = mean)) #求Global.ts中各子集的均值
Global.ar$order  #求Global.ar的除数
Global.ar$ar #求Global.ar中ar的coefficient
acf(Global.ar$res[-(1:Global.ar$order)], lag = 50)





2. Regression

Trends: stochastic(随机)或deterministic(确定)

随机趋势:random walk或autoregressive process

确定趋势及季节变动可用regression进行modelled

(1)线型模型


simulation

############################################################
# simulation Z_t是Z_t = 0.8*Z_t-1 + w_t的AR(1),w_t是白噪声
############################################################
set.seed(1)
z <- w <- rnorm(100, sd = 20)
for (t in 2:100) z[t] <- 0.8*z[t-1] + w[t]
Time <- 1:100
x <- 50 + 3*Time +z
plot(x, xlab = 'time', type = 'l')


Fitted

############################################################
# Fitted models
############################################################
x.lm <- lm(x~ Time)
coef(x.lm)
vcov(x.lm) #计算一个模型的variance-covariance,返回的是模型的var与co-var
sqrt(diag(vcov(x.lm)))
############################################################
# > vcov(x.lm)
#(Intercept)         Time
#(Intercept)   23.815013 -0.355447951
#Time          -0.355448  0.007038573

#sqrt(diag(vcov(x.lm)))
#(Intercept)        Time 
#4.88006278  0.08389621 
############################################################

acf(resid(x.lm))
pacf(resid(x.lm))




(2)广义最小二乘

library(nlme)
x.gls <- gls(x ~ Time, cor = corAR1(0.8))
coef(x.gls)
sqrt(diag(vcov(x.gls)))

> coef(x.gls)
(Intercept)        Time 
  58.233018    3.042245 
> sqrt(diag(vcov(x.gls)))
(Intercept)        Time 
 11.9245679   0.2024447 


temp.gls <- gls(temp ~ time(temp), cor = corAR1(0.7))
confint(temp.gls)  #confint求置信区间

(3)Linear models with seasonal variables

叠加的季节模型
x_t = m_t + s_t + z_t
(m_t确定趋势,s_t季节趋势,z_t白噪声)

###########################################
# 预测
###########################################
new.t <- seq(2006, len = 2*12, by = 1/12)
alpha <- coef(temp.lm)[1]
beta <- rep(coef(temp.lm)[2:13], 2)
(alpha*new.t + beta)[1:4]


new .dat <- data.frame9(Time = new.t, Seas = rep(1:12, 2))
predict(temp.ln, new.dat)[1:24]


(4)Harmonic seasonal models

TIME <- seq(1, 12, len = 1000)
plot(TIME, sin(2*pi*TIME/12), type = 'l')
plot(TIME, sin(2*pi*TIME/12) + 0.2* sin(2*pi*2*TIME/12) + 0.1* sin(2*pi*4*TIME/12) + 0.1 * cos(2*pi*4*TIME/12), type = 'l')



###########################################
# Fit to simulated series
###########################################
SIN <- COS <- matrix(nr = length(TIME), nc = 6)
for (i in 1: 6) {
  COS[, i] <- cos(2*pi*i*TIME/12)
  SIN[, i] <- sin(2*pi*i*TIME/12)
} 
x.lm1 <- lm(x~ TIME + I(TIME^2) + COS[, 1] + SIN[,1] +
              COS[,2] + SIN[,2] + COS[, 3] + SIN[,3] +COS[, 4] +
              SIN[,4] + COS[,5] + SIN[,5] + COS[,6] + SIN[,6])
coef(x.lm1)/sqrt(diag(vcov(x.lm1)))

x.lm2 <- lm(x ~ I(TIME^2) + SIN[, 1] + SIN[, 2])
coef(x.lm2)/sqrt(diag(vcov(x.lm2)))

coef(x.lm2)

AIC(x.lm1)
AIC(x.lm2)
AIC(lm(x ~ TIME + I(TIME^2) + SIN[,1] + SIN[,2] + SIN[,4] + COS[,4]))


(5)Logarithmic transformations

data(AirPassengers)
AP <- AirPassengers
plot(AP)
plot(log(AP))
SIN <- COS <- matrix(nr = length(AP), nc = 6)
for (i in 1:6) {
  SIN[, i] <- sin(2*pi*i*time(AP))
  COS[, i] <- cos(2*pi*i*time(AP))
}
TIME <- (time(AP) - mean(time(AP)))/sd(time(AP))
mean(time(AP))
sd(time(AP))
AP.lm1 <- lm(log(AP) ~ TIME + I(TIME^2) + I(TIME^3) + I(TIME^4) +
               SIN[,1] + COS[,1] + SIN[,2] + COS[,2] + SIN[,3] + COS[,3] +
               SIN[,4] + COS[,4] + SIN[,5] + COS[, 5] + SIN[,6] + COS[,6])
coef(AP.lm1)/sqrt(diag(vcov(AP.lm1)))


AP.lm2 <- lm(log(AP) ~ TIME + I(TIME^2) + SIN[,1] + COS[,1] + 
               SIN[, 2] + COS[, 2] + SIN[,3] + SIN[, 4] + COS[, 4] + SIN[,5])
coef(AP.lm2)/sqrt(diag(vcov(AP.lm2)))

AIC(AP.lm1)
AIC(AP.lm2)

acf(resid(AP.lm2))

AP.gls <- gls(log(AP) ~ TIME + I(TIME^2) + SIN[, 1] + COS[,1] +
                SIN[,2] + COS[,2] + SIN[,3] + SIN[,4] + COS[, 4] + SIN[,5],
              cor = corAR1(0.6))
coef(AP.gls)/sqrt(diag(vcov(AP.gls)))

AP.ar <- ar(resid(AP.lm2), order = 1, method = 'mle')
AP.ar$ar

运行结果 

> AP <- AirPassengers
> plot(AP)
> plot(log(AP))
> SIN <- COS <- matrix(nr = length(AP), nc = 6)
> for (i in 1:6) {
+ SIN[, i] <- sin(2*pi*i*time(AP))
+ COS[, i] <- cos(2*pi*i*time(AP))
+ }
> TIME <- (time(AP) ~ mean(time(AP)))/sd(time(AP))
Error in (time(AP) ~ mean(time(AP)))/sd(time(AP)) : 
  non-numeric argument to binary operator
> TIME <- (time(AP) - mean(time(AP)))/sd(time(AP))
> mean(time(AP))
[1] 1954.958
> sd(time(AP))
[1] 3.476109
> AP.lm1 <- lm(log(AP) ~ TIME + I(TIME^2) + I(TIME^3) + I(TIME^4) +
+ SIN[,1] + COS[,1] + SIN[,2] + COS[,2] + SIN[,3] + COS[,3] +
+ SIN[,4] + COS[,4] + SIN[,5] + COS[, 5] + SIN[,6] + COS[,6])
> coef(AP.lm1)/sqrt(diag(vcov(AP.lm1)))
(Intercept)        TIME   I(TIME^2)   I(TIME^3)   I(TIME^4)    SIN[, 1]    COS[, 1] 
744.6853769  42.3818033  -4.1623354  -0.7514937   1.8725575   4.8678543 -26.0548660 
   SIN[, 2]    COS[, 2]    SIN[, 3]    COS[, 3]    SIN[, 4]    COS[, 4]    SIN[, 5] 
 10.3951486  10.0039542  -4.8436269  -1.5601366  -5.6662082   1.9459153  -3.7662018 
   COS[, 5]    SIN[, 6]    COS[, 6] 
  1.0259589   0.1500951  -0.5213860 
> AP.lm2 <- lm(log(AP) ~ TIME + I(TIME^2) + SIN[,1] + COS[,1] +
+ SIN[, 2] + COS[, 2] + SIN[,3] + SIN[, 4] + COS[, 4] + SIN[,5])
> coef(AP.lm2)/sqrt(diag(vcov(AP.lm2)))
(Intercept)        TIME   I(TIME^2)    SIN[, 1]    COS[, 1]    SIN[, 2]    COS[, 2] 
 922.625445  103.520628   -8.235569    4.916041  -25.813114   10.355319    9.961335 
   SIN[, 3]    SIN[, 4]    COS[, 4]    SIN[, 5] 
  -4.789997   -5.612235    1.949391   -3.730666 
> AIC(AP.lm1)
[1] -447.9475
> AIC(AP.lm2)
[1] -451.0749
> acf(resid(AP.lm2))
> AP.gls <- gls(log(AP) ~ TIME + I(TIME^2) + SIN[, 1] + COS[,1] +
+ SIN[,2] + COS[,2] + SIN[,3] + SIN[,4] + COS[, 4] + SIN[,5],
+ cor = corAR1(0.6))
> coef(AP.gls)/sqrt(diag(vcov(AP.gls)))
(Intercept)        TIME   I(TIME^2)    SIN[, 1]    COS[, 1]    SIN[, 2]    COS[, 2] 
 398.841575   45.850520   -3.651287    3.299713  -18.177844   11.768934   11.432678 
   SIN[, 3]    SIN[, 4]    COS[, 4]    SIN[, 5] 
  -7.630660  -10.749938    3.571395   -7.920777 
> AP.ar <- ar(resid(AP.lm2), order = 1, method = 'mle')
> AP.ar$ar
[1] 0.6410685





(6)Non-linear models

set.seed(1)
w <- rnorm(100, sd = 10)
z <- rep(0, 100)
for (t in 2:100) z[t] <- 0.7*z[t-1] + w[t]
Time <- 1:100
f <- function(x) exp(1 + 0.05*x)
x <- f(Time) +z
plot(x, type = 'l')
abline(0,0)

x.nls <- nls(x ~ exp(apl0 + apl1*Time), start = list(apl0 = 0.1, apl1 = 0.5))
summary(x.nls)$parameters


运行结果:

> set.seed(1)
> w <- rnorm(100, sd = 10)
> z <- rep(0, 100)
> for (t in 2:100) z[t] <- 0.7*z[t-1] + w[t]
> Time <- 1:100
> f <- function(x) exp(1 + 0.05*x)
> x <- f(Time) +z
> plot(x, type = 'l')
> abline(0,0)
> x.nls <- nls(x ~ exp(apl0 + apl1*Time), start = list(apl0 = 0.1, apl1 = 0.5))
> summary(x.nls)$parameters
       Estimate   Std. Error  t value     Pr(>|t|)
apl0 1.17640122 0.0742954415 15.83410 9.204016e-29
apl1 0.04828424 0.0008188461 58.96619 2.347533e-78

(7)Forecasting from regression

new.t <- time(ts(start = 1961, end = c(1970, 12), fr = 12))
TIME <- (new.t - mean(time(AP)))/sd(time(AP))
SIN <- COS <- matrix(nr = length(new.t), nc = 6)
for ( i in 1:6) {
  COS[, i] <- cos(2*pi*i*new.t)
  SIN[, i] <- sin(2*pi*i*new.t)
}
SIN <- SIN[, -6]
new.dat <- data.frame(TIME = as.vector(TIME), SIn= SIN, COS = COS)
AP.pred.ts <- exp(ts(predict(AP.lm2, new.dat), st = 1961, fr = 12))
ts.plot(log(AP), log(AP.pred.ts), lty = 1:2)
ts.plot(AP, AP.pred.ts, lty = 1:2)



(8)Inverse transform and bias correction

set.seed(1)
sigma <- 1
w <- rnorm(1e+06, sd = sigma)
mean(w)
mean(exp(w))
exp(sigma^2/2)



summary(AP.lm2)$r.sq
sigma <- summary(AP.lm2)$sigma
lognorm.correction.factor <- exp((1/2)*sigma^2)
empirical.correction.factor <- mean(exp(resid(AP.lm2)))
lognorm.correction.factor
empirical.correction.factor

AP.pred.ts <- AP.pred.ts*empirical.correction.factor

> set.seed(1)
> sigma <- 1
> w <- rnorm(1e+06, sd = sigma)
> mean(w)
[1] 4.690776e-05
> mean(exp(w))
[1] 1.64861
> exp(sigma^2/2)
[1] 1.648721
> summary(AP.lm2)$r.sq
[1] 0.9888318
> sigma <- summary(AP.lm2)$sigma
> lognorm.correction.factor <- exp((1/2)*sigma^2)
> empirical.correction.factor <- mean(exp(resid(AP.lm2)))
> lognorm.correction.factor
[1] 1.001171
> empirical.correction.factor
[1] 1.00108
> AP.pred.ts <- AP.pred.ts*empirical.correction.factor


3.平稳时间序列与非平稳时间序列 

(1)平稳时间序列

simulation and fitting

set.seed(1)
x <- arima.sim(n = 10000, list(ar = -0.6, ma = 0.5))
coef(arima(x, order = c(1, 0, 1)))
set.seed(1)
> x <- arima.sim(n = 10000, list(ar = -0.6, ma = 0.5))
> coef(arima(x, order = c(1, 0, 1)))
         ar1          ma1    intercept 
-0.596966371  0.502703368 -0.006571345 

(2)非平稳时间序列

simulation and fitting

set.seed(1)
x <- w <- rnorm(1000)
for ( i in 3:1000) x[i] <- 0.5 * x[i-1] + x[i - 1] - 0.5 * x[i - 2] + w[i] + 0.3 * w[i-1]
arima(x, order = c(1, 1 , 1))

x <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = 1000)
arima(x, order = c(1, 1, 1))

> set.seed(1)
> x <- w <- rnorm(1000)
> for ( i in 3:1000) x[i] <- 0.5 * x[i-1] + x[i - 1] - 0.5 * x[i - 2] + w[i] + 0.3 * w[i-1]
> arima(x, order = c(1, 1 , 1))

Call:
arima(x = x, order = c(1, 1, 1))

Coefficients:
         ar1     ma1
      0.4235  0.3308
s.e.  0.0433  0.0450

sigma^2 estimated as 1.067:  log likelihood = -1450.13,  aic = 2906.26
> x <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = 1000)
> arima(x, order = c(1, 1, 1))

Call:
arima(x = x, order = c(1, 1, 1))

Coefficients:
         ar1     ma1
      0.5567  0.2502
s.e.  0.0372  0.0437

sigma^2 estimated as 1.079:  log likelihood = -1457.45,  aic = 2920.91

季节ARIMA模型






GARCH series



fit  example




4. Long-Memory Processes

library(fracdiff)
set.seed(1)
fds.sim <- fracdiff.sim(10000, ar = 0.9, d = 0.4)
x <- fds.sim$series
fds.fit <- fracdiff(x, nar = 1)

n <- lenght(x)
L <- 30
d <- fds.fit$d
fdc <- d
fdc[1] <-fdc
for (k in 2:L) fdc[k] <- fdc[k-1] * (d + 1 - k) / k
y <- rep(0, L)

for (i in (L+1) : n) {
  csm <- x[i]
  for (j in 1: L) csm <- csm + ((-1)^j)*fdc[j]*x[i-j]
  y[i] <- csm
}

y <- y[(L+1): n]
z.ar <- ar(y)
ns <- 1 + z.ar$order
z <- z.ar$res [ns: length(y)]
par(mfcol = c(2, 2))
plot(as.ts(x), ylab = 'x')
acf(x); acf(y); acf(z)

summary(fds.fit)
ar(y)


5. Spectral Analysis

(1)periodic signals

periodic signals: 周期性的发生

周期,频率

(2)Spectrum

sine波

cosin波

(3)Spectral of simulated series

layout(1:2)
set.seed(1)
x <- rnorm(2048)
spectrum(x, log = c("no")) #默认的为log方法进行处理,此处我们将log设置为no
spectrum(x, log = c("no"), span = 65)#span是有关滑动平均的


(4)Discrete Fourier Transform(DFT)离散傅立叶转换

6. State Space Models




























  • 0
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值