Forecasting with daily data

I’ve had several emails recently asking how to forecast daily data in R. Unless the time series is very long, the easiest approach is to simply set the frequency attribute to 7.

y <- ts(x, frequency=7)

Then any of the usual time series forecasting methods should produce reasonable forecasts. For example:

library(forecast)
fit <- ets(y)
fc <- forecast(fit)

When the time series is long enough to take in more than a year, it may be necessary to allow for annual seasonality as well as weekly seasonality. In that case, a multiple seasonal model such as TBATS is required.

y <- msts(x, seasonal.periods=c(7,365.25))
fit <- tbats(y)
fc <- forecast(fit)

This should capture the weekly pattern as well as the longer annual pattern. The period 365.25 is the average length of a year allowing for leap years. In some countries, alternative or additional year lengths may be necessary. For example, with the Turkish electricity data analysed in De Livera et al. (JASA 2011), we used three seasonal periods: 7, 354.37 and 365.25. The period 354.37 is the average length of a year in the Islamic calendar.
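As a quick check on where those two year lengths come from, here is the arithmetic. It assumes one leap year in every four and a mean lunar month of about 29.53059 days; both figures are my own additions for illustration rather than values taken from the paper.

```r
# Average solar year, assuming three ordinary years and one leap year in every four
solar_year <- (3 * 365 + 366) / 4
solar_year   # 365.25

# Average Islamic (lunar) year: twelve lunar months of roughly 29.53059 days each
lunar_year <- 12 * 29.53059
round(lunar_year, 2)   # 354.37
```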

Capturing seasonality associated with moving events such as Easter or the Chinese New Year is more difficult. Even with monthly data, this can be tricky as the festivals can fall in either March or April (for Easter) or in January or February (for the Chinese New Year). The usual seasonal models don’t allow for this, and even the complex seasonality discussed in my JASA paper assumes that the seasonal patterns occur at the same time in each year. The best way to deal with moving holiday effects is to use dummy variables. However, neither ETS nor TBATS models allow for covariates. A state space model of the same form as TBATS but with multiple sources of error and covariates could be used, but I don’t have any R code to do that.
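A moving-holiday dummy variable of the kind mentioned above is easy to build from a list of known dates. The sketch below is only an illustration: the date range, the choice of Good Friday to Easter Monday as the holiday window, and the 100-day horizon are all assumptions, and the Easter Sunday dates are typed in by hand.

```r
# Daily dates covering the observed series (hypothetical range)
dates <- seq(as.Date("2010-01-01"), as.Date("2012-12-31"), by = "day")
# Dates for the next 100 days, which we want to forecast
datesf <- seq(max(dates) + 1, by = "day", length.out = 100)

# Easter Sunday for each year touched by the observed and future dates
easter <- as.Date(c("2010-04-04", "2011-04-24", "2012-04-08", "2013-03-31"))

# Assumed holiday window: Good Friday (-2) through Easter Monday (+1)
easter_days <- rep(easter, each = 4) + rep(-2:1, times = length(easter))

# 1 on holiday dates, 0 otherwise
holiday  <- as.numeric(dates %in% easter_days)
holidayf <- as.numeric(datesf %in% easter_days)
```

The same idea works for the Chinese New Year or any other moving festival, given a table of its dates.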

In practice, I would use a regression model with ARIMA errors, where the regression terms include any dummy holiday effects as well as the longer annual seasonality. Unless there are many decades of data, it is usually reasonable to assume that the annual seasonal shape is unchanged from year to year, and so Fourier terms can be used to model the annual seasonality. Suppose we use K=5 Fourier terms to model annual seasonality, and that the holiday dummy variables are in the vector holiday, with the next 100 future values in holidayf. Then the following code will fit an appropriate model.

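y <- ts(x, frequency=7)
# Fourier terms for the annual pattern: in-sample, then for the next 100 days
z <- fourier(ts(x, frequency=365.25), K=5)
zf <- fourier(ts(x, frequency=365.25), K=5, h=100)
# Give the holiday column the same name in the fitted and future regressors
fit <- auto.arima(y, xreg=cbind(z, holiday=holiday))
fc <- forecast(fit, xreg=cbind(zf, holiday=holidayf), h=100)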
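To see what the Fourier regressors in this model look like, here is a minimal base R sketch that builds K sine and cosine pairs by hand. The helper fourier_terms() and the series length of 730 days are hypothetical, introduced only for illustration; in practice the forecast package does this for you.

```r
# Hypothetical helper: K sine/cosine pairs with period m, evaluated at time indices t
fourier_terms <- function(t, m, K) {
  X <- do.call(cbind, lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / m), cos(2 * pi * k * t / m))
  }))
  # Columns are ordered S1, C1, S2, C2, ...
  colnames(X) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2))
  X
}

n <- 730   # assumed number of observed days
h <- 100   # forecast horizon

# In-sample terms use t = 1, ..., n; future terms continue from n + 1
z_manual  <- fourier_terms(seq_len(n), m = 365.25, K = 5)
zf_manual <- fourier_terms(n + seq_len(h), m = 365.25, K = 5)

dim(z_manual)    # 730 rows, 10 columns
dim(zf_manual)   # 100 rows, 10 columns
```

Each extra pair adds a faster harmonic, so a larger K allows a more wiggly annual shape at the cost of two more coefficients.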

The order K can be chosen by minimizing the AIC of the fitted model.
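One way to carry out that search is a simple loop over candidate values of K. The sketch below runs on a simulated series and tries only K = 1 to 4, both of which are assumptions for illustration. It also fixes the differencing orders (d = 0, D = 1), because AIC values are only comparable between models fitted with the same differencing; that choice is mine and may not suit real data.

```r
library(forecast)

# Simulated daily data (illustrative only): weekly and annual patterns plus noise
set.seed(1)
n <- 3 * 365
t <- seq_len(n)
x <- 10 + 2 * sin(2 * pi * t / 7) + 5 * cos(2 * pi * t / 365.25) + rnorm(n)

y <- ts(x, frequency = 7)            # weekly seasonality handled by the ARIMA part
annual <- ts(x, frequency = 365.25)  # used only to generate the annual Fourier terms

# Fit one model per candidate K and record its AIC
candidates <- 1:4
aic <- sapply(candidates, function(K) {
  fit <- auto.arima(y, xreg = fourier(annual, K = K), d = 0, D = 1)
  fit$aic
})

# The K with the smallest AIC
bestK <- candidates[which.min(aic)]
```

The value of bestK can then be used in place of K=5 when generating the Fourier terms for the final model and its forecasts.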

