- Stationarity test
The first step in time series preprocessing is to judge its stationarity. Only the time series of non-white noise has the value of analysis and prediction.
Stationary means that the statistical value fluctuates around a constant and the range of fluctuation is bounded.
We can distinguish between two different types of stationary processes. If the common distribution function of a stochastic process does not change with time, it is called a strict stationary , and the other is a weak stationary.[1]
There are three general ways of judging:
(1)Draw the trend graph of time series and see the trend judgment
(2)Draw ACF graph and PACF graph, ACF graph and PACF graph of stationary time series, and either drag tail or truncate tail.
(3)Check whether there is unit root in the sequence, if there is unit root, it is non-stationary time series.
Next, the AirPassengers were analyzed in time series
Build time series:
1:30 represents the establishment of 30 time sequence values based on 1-30, frequence==4 represents quarterly cycle, and start represents the starting date
Install the “tseries” package, just install it on first run.
Install the “forecast” package, just install it on first run
Using tsdisplay function to display ACF and PACF graphs, this function can also be used to determine the parameters of arima function (we use arima model later), the results drawn are as follows:
It can be seen from the autocorrelation graph that the time series is not very smooth, and the partial autocorrelation graph works well. Although there are a few popping points, but the overall is still shaking around 0. Now we can use another method to do the stationarity test, by unit root.
Adf. test was used to test the unit root. According to the autocorrelation diagram, the autocorrelation coefficient was reduced to 0 without express, showing a tail. The unit root test further verified that there was a unit root, so the sequence was non-stationary.
When the time series is non-stationary, the solution is difference. What is difference?
First-order difference refers to the subtraction between two sequence values one phase apart from the original sequence value. The k-order difference is the subtraction between the values of two sequences separated by k periods. If a time series is stationary after difference operation, it is a differential stationary series and can be analyzed by ARIMA model.
We can use ndiffs to determine how many orders of difference are needed
So let’s do a first order difference, and then check it out.
Check autocorrelation graph ACF:
Partial autocorrelation PACF was tested as shown in the figure below:
Establish ARIMA model
ARIMA model is also called autoregressive moving average model. It refers to the model established by converting non-stationary time series into stationary time series, and then regression of dependent variable only to its lag value, present value and lag value of random error term. ARIMA model regards the data series formed by prediction indicators over time as a random sequence. The dependence of this set of random variables reflects the continuity of the original data in time. It is not only affected by external factors, but also has its own variation rules[2]
Use decompose () function to break up the different components, the results the graph is:
As can be seen from the figure, the seasonality of time series is still very strong.
Start building models:
To build the arima model, we first use the auto-arima in the forcast package for parameter estimation, and automatically find the most P and q values
The summary() function can also be used to summarize the results of the fitting,Training set error measures:
It can be seen from the above results that the optimal model for fitting is ARIMA(2,1,1)(0,1,0)[12].
Use models to predict
Prediction model is the most important work when using quantitative prediction method to make prediction. A predictive model is one that predicts the quantitative relationships between things described in mathematical language or formulas. It reveals the inner regularity of things to some extent, and it is taken as the direct basis to calculate the predicted value. Therefore, it has a great impact on the accuracy of prediction. Any specific prediction method is characterized by its specific mathematical model. There are many kinds of prediction methods, each with its own prediction model
According to the above code, I chose the data predicted 10 years later, and the result figure is:
It is predicted that the time series data value will still show an upward trend after 10 years, and it is very similar to the curve change radians of previous years.