季节性时间序列模型与多重季节性模型（代码+分析）

最新推荐文章于 2025-03-05 20:49:16 发布

finance褪黑素

最新推荐文章于 2025-03-05 20:49:16 发布

阅读量8.7k

点赞数 9

文章标签： python 金融大数据数据挖掘

本文链接：https://blog.csdn.net/celiaweiwei/article/details/133860795

版权

本文聚焦季节性时间序列模型，介绍了SARIMA、STL、ETS、SARIMAX等常见模型，阐述了季节指数计算方法和季节性差分操作。还提及多重季节性模型的定阶、建模、预测及还原季节性，详细讲解了SARIMA模型的组成、表示形式和建立步骤。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

专题三季节性模型

3.1 季节性时间序列模型

季节性时间序列模型是一种用于处理时间序列数据中包含季节性模式的方法。它通常用于预测和分析在固定时间间隔（如每季度、每月或每周）内出现的重复性模式。

常见的季节性时间序列模型包括：

季节性自回归移动平均模型（Seasonal Autoregressive Integrated Moving Average，SARIMA）：它是ARIMA模型的季节性扩展。ARIMA模型是自回归（AR）和移动平均（MA）的组合，而SARIMA模型还考虑了季节性成分。它的参数包括季节性自回归阶数、季节性移动平均阶数、非季节性自回归阶数、非季节性移动平均阶数和差分阶数。
季节性分解模型（Seasonal Decomposition of Time Series，STL）：它通过将时间序列拆分成季节性、趋势和残差三个成分，来捕捉季节性模式。这种方法可以帮助我们理解时间序列数据中季节性和趋势的变化。
季节性指数平滑模型（Seasonal Exponential Smoothing，ETS）：它是指数平滑模型的季节性扩展。指数平滑模型用于对时间序列进行平滑和预测，而ETS模型还考虑了季节性因素。
季节性自回归集成移动平均模型（Seasonal Autoregressive Integrated Moving-Average Exogenous，SARIMAX）：它是SARIMA模型的扩展，可以包含外生变量（如其他影响因素）来改进预测性能。

季节性时间序列模型的选择取决于数据的特点、季节性模式的复杂性以及是否有外生变量等因素。在应用季节性时间序列模型之前，通常需要对数据进行季节性调整、平稳化处理，以及选择合适的模型参数。这些模型在处理季节性数据方面都有一定的优势和适用场景。

# 载入相关的库
from scipy import  stats
import statsmodels.api as sm  # 统计相关的库
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True) #seaborn设置背景

from statsmodels.graphics.tsaplots import plot_acf as ACF #自相关图
from statsmodels.tsa.stattools import adfuller as ADF  #平稳性检测
from statsmodels.graphics.tsaplots import plot_pacf as PACF   #偏自相关图
from statsmodels.stats.diagnostic import acorr_ljungbox as LjungBox   #白噪声检验
from statsmodels.tsa.arima.model import ARIMA

# 导入数据
data=pd.read_csv('AirPassengers.csv')
data.head()

	Unnamed: 0	Month	#Passengers
0	0	1949-01-01	112
1	1	1949-02-01	118
2	2	1949-03-01	132
3	3	1949-04-01	129
4	4	1949-05-01	121

del data['Unnamed: 0']
data=data.set_index(['Month'])
data.head()

	#Passengers
Month
1949-01-01	112
1949-02-01	118
1949-03-01	132
1949-04-01	129
1949-05-01	121

# 可视化时间序列

data.plot(figsize=(12,4))
plt.xlabel('Date')
plt.ylabel('Number of air passengers')

在这里插入图片描述

总体上可以看出可以看出序列有着季节性，我们在看看每年的情况

compare={}
for index in data.index:
    year = index[:4]
    month = index[5:7]
    if year not in compare.keys():
        compare.update({year:{month:data['#Passengers'][index]}})
    else:
        compare[year].update({month:data['#Passengers'][index]})
compare = pd.DataFrame(compare)     

compare.plot(figsize=(20,6))
plt.legend(loc=0)

在这里插入图片描述

可以看出序列整体走势非常相似

3.1.1 季节指数

所谓季节指数就是用简单平均法计算的周期内各时期季节性影响的相对数。

$\large x_{ij} = \bar x \cdot S_j + I_{ij}$

首先计算周期内各期平均数 $\large \bar x_k = \frac {\sum_{i=1}^n x_{ik}}{n} , k=1,2 \cdots ,m$

然后计算总平均数 $\large \bar x = \frac{\sum_{i=1}^n \sum_{k=1}^m x_{ik}}{nm}$

再计算季节指数 $\large S_k = \frac{\bar x_k}{\bar x},k=1,2 \cdots ,m$

季节指数反映了该季度与总平均值之间的一种比较稳定的关系：

如果比值大于1，说明该季度的值常常会高于总平均值；

如果比值小于1，说明该季度的值常常低于总平均值；

如果序列的季节指数都近似为1，就说明该序列没有明显的季节性。

我们来计算一下上面数据的季节指数：

ave_k = compare.mean(axis=1)
ave = ave_k.mean()
compare['S'] = ave_k/ave
compare

	1949	1950	1951	1952	1953	1954	1955	1956	1957	1958	1959	1960	S
01	112	115	145	171	196	204	242	284	315	340	360	417	0.862473
02	118	126	150	180	196	188	233	277	301	318	342	391	0.838392
03	132	141	178	193	236	235	267	317	356	362	406	419	0.963853
04	129	135	163	181	235	227	269	313	348	348	396	461	0.952853
05	121	125	172	183	229	234	270	318	355	363	420	472	0.969799
06	135	149	178	218	243	264	315	374	422	435	472	535	1.111909
07	148	170	199	230	264	302	364	413	465	491	548	622	1.253425
08	148	170	199	242	272	293	347	405	467	505	559	606	1.252533
09	136	158	184	209	237	259	312	355	404	404	463	508	1.078909
10	119	133	162	191	211	229	274	306	347	359	407	461	0.951069
11	104	114	146	172	180	203	237	271	305	310	362	390	0.830662
12	118	140	166	194	201	229	278	306	336	337	405	432	0.934123

根据季节指数来看数据具有一定的季节性，各月份的季节指数存在差异

3.1.2 季节性差分

在有些应用中，季节性是次要的，我们需要把它从数据中消除，这个过程叫季节调整，其中季节性差分化是一种常见的方法。

在之前我们介绍过差分（正规差分化），其形式为： $\large \Delta x_t = x_t - x_{t-1}$

我们将它推广，如果一个序列具有周期性，且周期为 $s$ ，则季节性差分化为： $\large \Delta_s x_t = x_t - x_{t-s}$

dataDiff = data.diff(12)[12:]
dataDiff.plot(figsize=(15,6))

在这里插入图片描述

可以看出，季节性基本上被消除，但是序列不够平稳，我们不妨再做1阶差分看看

diff1 = dataDiff.diff()[1:]
diff1.plot(figsize=(15,6))

在这里插入图片描述

t = sm.tsa.stattools.adfuller(diff1)  # ADF检验
print("p-value:   ",t[1])

p-value:    1.856511600123444e-28

经过ADF检验，我们发现p-value小于显著性水平0.05，序列是平稳的。

这里我也写了一个函数来进行季节性检验

#用于检测季节性的函数
def test_stationarity(timeseries):
    
    #设置滚动窗口
    movingAverage = timeseries.rolling(window=12).mean()
    movingSTD = timeseries.rolling(window=12).std()
    
    #绘制滚动平均值和方差
    plt.figure(figsize=(15,4))
    # 使用range函数生成连续的数字列表作为横坐标
    x_values = list(range(len(timeseries)))
    orig = plt.plot(x_values,timeseries, color='blue', label='Original')
    mean = plt.plot(x_values,movingAverage, color='red', label='Rolling Mean')
    std = plt.plot(x_values,movingSTD, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
   
    plt.show(block=False)
    
    #ADF检验:
    print('Results of Dickey Fuller Test:')
    dftest = ADF(timeseries['#Passengers'], autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for key,value in dftest[4].items():
        dfoutput['Critical Value (%s)'%key] = value
    print(dfoutput)

diff1.dropna(inplace=True)
test_stationarity(diff1)

在这里插入图片描述

Results of Dickey Fuller Test:
Test Statistic                -1.559562e+01
p-value                        1.856512e-28
#Lags Used                     0.000000e+00
Number of Observations Used    1.300000e+02
Critical Value (1%)           -3.481682e+00
Critical Value (5%)           -2.884042e+00
Critical Value (10%)          -2.578770e+00
dtype: float64

移动平均值和标准差几乎平行于 x 轴，因此它们没有趋势性。这说明季节性已经被消除了。

3.2 多重季节性模型

上一节中，经过了季节性差分和正规差分后，序列成为了平稳时间序列，则我们可以用ARMA模型对多重差分后的模型建模。则我们有模型：
$\large (1-B^s)(1-B)x_t = a_t - \theta a_{t-1} - \Theta a_{t-s} + \theta \Theta a_{t-s-1} \\ \large |\theta| < 1 ,|\Theta| < 1$

其中， $B$ 为向后推移算子，代表前一时刻的值。 $B$ 的 $s$ 次方则为前 $s$ 时刻的值。 { $a_t$ }为白噪声序列， $s$ 为序列的周期。上述模型为航空模型。

另外，序列经过了季节性差分和正规差分消除序列相关性（序列平稳），则可以认为是间隔 $s$ 和间隔1的周期共同影响，因此该模型称为多重季节性ARMA模型

3.2.1 定阶

#ACF & PACF plots

lag_acf = ACF(diff1, lags=30)
lag_pacf = PACF(diff1, lags=30)
plt.show()

在这里插入图片描述

从图中可知，q的取值在9阶、23阶都有可能，p的取值在9阶、22阶都有可能

进一步结合信息准则来判断

print(sm.tsa.arma_order_select_ic(diff1,max_ar=10,max_ma=10,ic='aic')['aic_min_order'])

(6, 9)

3.2.2 建模和预测

from statsmodels.tsa.arima.model import ARIMA

temp = np.array(dataDiff.values) # 载入passengers序列，使用去除季节性后的数据
model =ARIMA(temp,order=(6, 1, 9))  
res = model.fit()  
print(res.summary())

plt.rcParams['font.sans-serif'] = ['simhei'] #字体为黑体
plt.rcParams['axes.unicode_minus'] = False #正常显示负号 
plt.figure(figsize=(10,4))
plt.plot(dataDiff.values,'b',label='#passengers')
plt.plot(res.fittedvalues[1:], 'r',label='SARIMA model')
# plt.title('RSS: %.4f'%sum((res.fittedvalues - dataDiff['#Passengers'])**2))
plt.xticks(None)
plt.legend()
plt.show()

                               SARIMAX Results                                
==============================================================================
Dep. Variable:                      y   No. Observations:                  132
Model:                 ARIMA(6, 1, 9)   Log Likelihood                -492.804
Date:                Sun, 06 Aug 2023   AIC                           1017.607
Time:                        13:36:26   BIC                           1063.610
Sample:                             0   HQIC                          1036.300
                                - 132                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.5441      0.199     -2.738      0.006      -0.934      -0.155
ar.L2         -0.5765      0.177     -3.257      0.001      -0.923      -0.230
ar.L3         -0.0131      0.260     -0.051      0.960      -0.523       0.496
ar.L4          0.2455      0.203      1.210      0.226      -0.152       0.643
ar.L5          0.6091      0.200      3.045      0.002       0.217       1.001
ar.L6          0.7437      0.145      5.140      0.000       0.460       1.027
ma.L1          0.2867     42.163      0.007      0.995     -82.351      82.924
ma.L2          0.5228     30.148      0.017      0.986     -58.567      59.612
ma.L3         -0.1872     52.095     -0.004      0.997    -102.292     101.917
ma.L4         -0.4904     60.006     -0.008      0.993    -118.100     117.119
ma.L5         -0.6672     39.349     -0.017      0.986     -77.789      76.455
ma.L6         -0.8580     67.335     -0.013      0.990    -132.833     131.117
ma.L7          0.2039     31.195      0.007      0.995     -60.938      61.346
ma.L8         -0.1734     22.657     -0.008      0.994     -44.581      44.234
ma.L9          0.3646     15.300      0.024      0.981     -29.622      30.351
sigma2        96.9965   4077.036      0.024      0.981   -7893.847    8087.840
===================================================================================
Ljung-Box (L1) (Q):                   0.23   Jarque-Bera (JB):                 9.62
Prob(Q):                              0.63   Prob(JB):                         0.01
Heteroskedasticity (H):               2.23   Skew:                             0.06
Prob(H) (two-sided):                  0.01   Kurtosis:                         4.32
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

在这里插入图片描述

res.fittedvalues

array([ 0.00000000e+00,  2.99987745e+00,  6.63906357e+00,  8.47045456e+00,
        6.43421418e+00,  3.96721019e+00,  1.16104199e+01,  1.95559745e+01,
        1.98457537e+01,  1.79953720e+01,  1.64304319e+01,  1.27013626e+01,
        1.69093980e+01,  2.55311769e+01,  2.63350764e+01,  3.15830421e+01,
        2.97081039e+01,  4.10358898e+01,  2.82549533e+01,  2.60002427e+01,
        2.95319863e+01,  3.20272994e+01,  2.09427277e+01,  3.13813568e+01,
        2.49872644e+01,  3.19489266e+01,  2.55643820e+01,  1.63835304e+01,
        1.46053515e+01,  1.75140003e+01,  3.26483739e+01,  3.74955701e+01,
        3.24416747e+01,  2.37433037e+01,  3.44731413e+01,  2.36261594e+01,
        2.68838810e+01,  2.11135456e+01,  2.24752008e+01,  3.52466452e+01,
        5.65939181e+01,  3.89366952e+01,  2.56592326e+01,  2.96908778e+01,
        3.69353872e+01,  2.77045383e+01,  1.39875950e+01,  1.15654045e+01,
        1.81691062e+01,  7.18002453e+00, -9.15149400e+00,  4.82689410e-01,
        4.88001848e-02,  7.29863475e+00,  1.74440408e+01,  2.96417253e+01,
        2.38426846e+01,  2.63265181e+01,  1.14139002e+01,  2.48281715e+01,
        2.24997723e+01,  3.57679424e+01,  4.62938499e+01,  3.97311106e+01,
        2.16377247e+01,  3.94896840e+01,  4.90119607e+01,  6.13330608e+01,
        5.29900976e+01,  4.36337683e+01,  4.01339935e+01,  4.24827977e+01,
        4.35634260e+01,  4.38982085e+01,  3.87204841e+01,  4.58979163e+01,
        4.75860322e+01,  4.77075795e+01,  5.08555424e+01,  4.69759556e+01,
        5.48545596e+01,  4.72034517e+01,  3.18699450e+01,  2.98443004e+01,
        2.95219985e+01,  3.03749871e+01,  3.31955506e+01,  3.15639656e+01,
        3.41521806e+01,  3.74938971e+01,  4.06140179e+01,  5.58595243e+01,
        5.85403370e+01,  4.25735494e+01,  3.80869760e+01,  3.85368106e+01,
        3.17146301e+01,  2.54165139e+01,  1.75410261e+01,  8.10108958e+00,
        9.64188329e+00,  8.27189345e+00,  1.14725762e+01,  2.58387040e+01,
        3.64049736e+01,  1.07058552e+01,  7.00774219e+00,  1.94196267e+00,
        8.52177415e+00,  1.86848853e+01,  2.79325668e+01,  3.60733657e+01,
        4.97577773e+01,  4.16140939e+01,  4.43206968e+01,  5.45823638e+01,
        4.54610159e+01,  5.62632173e+01,  4.80053929e+01,  4.77196774e+01,
        6.64954835e+01,  6.03449924e+01,  3.36602549e+01,  2.73630528e+01,
        5.33440593e+01,  5.79827318e+01,  6.09581713e+01,  5.48110579e+01,
        6.02274489e+01,  4.55147931e+01,  4.48001212e+01,  2.47040159e+01])

a=dataDiff.values.tolist()
one_dim_array = [item for sublist in a for item in sublist]
delta = res.fittedvalues  - one_dim_array  # 残差
plt.figure(figsize=(10,6))
plt.plot(delta,'r',label=' residual error')
plt.legend(loc=0)

在这里插入图片描述

acf,q,p = sm.tsa.acf(delta,nlags=10,qstat=True)  ## 计算自相关系数 及p-value
out = np.c_[range(1,11), acf[1:], q, p]
output=pd.DataFrame(out, columns=['lag', "AC", "Q", "P-value"])
output = output.set_index('lag')
output

	AC	Q	P-value
lag
1.0	-0.040826	0.225047	0.635221
2.0	-0.031512	0.360155	0.835206
3.0	-0.055589	0.783861	0.853323
4.0	-0.021818	0.849639	0.931672
5.0	-0.031630	0.988980	0.963450
6.0	0.000664	0.989042	0.986024
7.0	0.000901	0.989157	0.995002
8.0	0.037945	1.194539	0.996696
9.0	-0.037657	1.398459	0.997832
10.0	-0.068114	2.071103	0.995766

观察p-value可知，该序列可以认为没有相关性，近似得可以认为残差序列接近白噪声。说明模型是显著的。

score = 1 - delta.var()/temp.var()
print(score)

0.6643583483988917

model =ARIMA(temp,order=(6, 1, 9)).fit()  
y = model.forecast(5) #使用.forecast()方法可以直接外推时间线预测（仅ARIMA模型可以）
y

array([39.89301892, 32.95434062, 39.57820142, 36.28694024, 31.5818024 ])

3.2.3 还原序列季节性

predictions_ARIMA_diff = pd.Series(res.fittedvalues[1:], copy=True)
print(predictions_ARIMA_diff.head())

0    2.999877
1    6.639064
2    8.470455
3    6.434214
4    3.967210
dtype: float64

predictions_ARIMA = np.cumsum(predictions_ARIMA_diff.values)

# 使用步长12进行差分后，进行还原操作
restored_data = np.zeros(len(predictions_ARIMA_diff) + 12)  # 创建一个与差分数据长度+12的数组

for i in range(12, len(restored_data)):
    restored_data[i] = restored_data[i - 12] + predictions_ARIMA_diff.values[i - 12]

plt.plot(data.values,'b')
plt.plot(restored_data,'orange')

在这里插入图片描述

3.3 SARIMA

SARIMA（Seasonal Autoregressive Integrated Moving Average）模型是一种时间序列预测模型，是ARIMA模型的扩展，用于处理具有季节性的时间序列数据。SARIMA模型在预测和分析季节性数据方面非常有用，它结合了自回归（AR）、差分（I）和移动平均（MA）的概念，以及季节性分析。

SARIMA模型由以下几个部分组成：

季节性成分（Seasonal Component）：SARIMA模型考虑了数据的季节性成分，该成分是在时间序列中以一定周期（例如每年、每月、每周）重复出现的模式。SARIMA模型使用季节性差分来处理这种季节性成分。
自回归成分（Autoregressive Component）：这部分和ARIMA模型中的自回归部分类似，表示当前时刻的观测值与过去若干时刻的观测值之间的关系。
差分成分（Integrated Component）：这部分和ARIMA模型中的差分部分类似，表示通过差分操作将时间序列转化为平稳序列，以便进行建模。
移动平均成分（Moving Average Component）：这部分和ARIMA模型中的移动平均部分类似，表示当前时刻的观测值与过去若干时刻的误差项之间的关系。

SARIMA模型的表示形式为 SARIMA(p, d, q)(P, D, Q)s，其中：

p: 自回归阶数（Autoregressive Order）
d: 差分阶数（Differencing Order）
q: 移动平均阶数（Moving Average Order）
P: 季节性自回归阶数（Seasonal Autoregressive Order）
D: 季节性差分阶数（Seasonal Differencing Order）
Q: 季节性移动平均阶数（Seasonal Moving Average Order）
s: 季节性周期（Seasonal Period）

SARIMA模型的建立通常涉及参数选择、模型拟合、残差分析和预测等步骤。它可以通过统计方法、优化算法等进行参数估计，然后利用已知数据进行模型拟合，最终用来预测未来的时间序列值。

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# 示例数据（假设你有一个时间序列数据）
np.random.seed(0)
n = 200
time = np.arange(n)
data = 10 + np.sin(2*np.pi*time/12) + np.random.normal(0, 1, n)

# 创建时间序列 DataFrame
df = pd.DataFrame({'data': data}, index=pd.date_range(start='2020-01-01', periods=n, freq='M'))

# 可视化时间序列
plt.figure(figsize=(10, 4))
plt.plot(df.index, df['data'])
plt.title('Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()

在这里插入图片描述

# 绘制 ACF 和 PACF 图来帮助选择 SARIMA 参数
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
plot_acf(df['data'], lags=20, ax=axes[0])
plot_pacf(df['data'], lags=20, ax=axes[1])
plt.show()

在这里插入图片描述

# 创建并拟合 SARIMA 模型
model = SARIMAX(df['data'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
results = model.fit()

# 打印模型摘要
print(results.summary())

# 预测未来的数据
forecast_steps = 12
forecast = results.forecast(steps=forecast_steps)

# 可视化原始数据和预测结果
plt.figure(figsize=(10, 4))
plt.plot(df.index, df['data'], label='Original Data')
plt.plot(forecast.index, forecast, label='Forecast')
# plt.fill_between(forecast.index, forecast.conf_int()['lower data'], forecast.conf_int()['upper data'], color='gray', alpha=0.2)
plt.title('SARIMA Forecast')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.show()

                                     SARIMAX Results                                      
==========================================================================================
Dep. Variable:                               data   No. Observations:                  200
Model:             SARIMAX(1, 1, 1)x(1, 1, 1, 12)   Log Likelihood                -280.985
Date:                            Sun, 06 Aug 2023   AIC                            571.969
Time:                                    14:29:44   BIC                            588.125
Sample:                                01-31-2020   HQIC                           578.515
                                     - 08-31-2036                                         
Covariance Type:                              opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.0134      0.082      0.164      0.870      -0.146       0.173
ma.L1         -0.9094      0.041    -22.086      0.000      -0.990      -0.829
ar.S.L12      -0.0475      0.091     -0.523      0.601      -0.226       0.131
ma.S.L12      -0.9990      9.996     -0.100      0.920     -20.590      18.593
sigma2         0.9696      9.635      0.101      0.920     -17.915      19.855
===================================================================================
Ljung-Box (L1) (Q):                   0.04   Jarque-Bera (JB):                 1.27
Prob(Q):                              0.84   Prob(JB):                         0.53
Heteroskedasticity (H):               0.97   Skew:                             0.07
Prob(H) (two-sided):                  0.90   Kurtosis:                         2.63
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

在这里插入图片描述