Inspect the data structure:
import pandas as pd
data = pd.read_csv('/Users/liailan/tmp/AirPassengers.csv')
print(data.head())
Output:
   Unnamed: 0     time  value
0           1  1949-01    112
1           2  1949-02    118
2           3  1949-03    132
3           4  1949-04    129
4           5  1949-05    121
Load the data:
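# pd.datetime and the date_parser argument are no longer available in recent pandas,
# so parse the 'YYYY-MM' strings explicitly with pd.to_datetime instead
data = pd.read_csv('/Users/liailan/tmp/AirPassengers.csv')
data['time'] = pd.to_datetime(data['time'], format='%Y-%m')
data = data.set_index('time')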
Check whether the series is stationary:
Visual inspection:
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(data.value)
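As the plot shows, the series has a clear trend and clear seasonality, so it first has to be transformed into a stationary series. Differencing is the usual way to do this; here we take a different approach and decompose the (log-transformed) data into a trend series, a seasonal series, and a residual series.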
import numpy as np
ts_log = np.log(data['value'])
from statsmodels.tsa.seasonal import seasonal_decompose
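# Monthly data, so one seasonal cycle is 12 observations
# (the older `freq` argument was renamed to `period` in newer statsmodels)
decomposition = seasonal_decompose(ts_log, period=12)
trend = decomposition.trend  # trend component
seasonal = decomposition.seasonal  # seasonal component
residual = decomposition.resid  # residual component
# The moving-average trend is undefined at both ends, so the residual has NaNs there
residual = residual.dropna()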
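For comparison, the differencing approach mentioned above can be sketched roughly as follows. This is only a minimal illustration, not part of the original workflow: it hard-codes the first five observations from the `head()` output instead of reading the CSV, and the names `values`, `log_values`, and `log_diff` are made up for the example.

```python
import numpy as np
import pandas as pd

# First five observations, copied from the head() output shown earlier
# (hard-coded here so the sketch runs without the CSV file).
values = pd.Series(
    [112, 118, 132, 129, 121],
    index=pd.to_datetime(
        ['1949-01', '1949-02', '1949-03', '1949-04', '1949-05'],
        format='%Y-%m',
    ),
)

# Log transform first, mirroring what is done before the decomposition.
log_values = np.log(values)

# First-order differencing: y[t] - y[t-1].
# The first element has no predecessor and becomes NaN, so drop it.
log_diff = log_values.diff().dropna()
print(log_diff)
```

On the full monthly series, a seasonal difference such as `ts_log.diff(12)` would presumably be the counterpart for removing the yearly pattern; whether one or both differences are needed depends on the data.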
Check whether the residual series is stationary: