通过SARIMA模型进行销量预测

本篇文章想通过历史的销量数据来预测未来几个月的数据,用以定目标和供运营、供应链人员备货需求参考,本文通过SARIMA模型来实现,预测下一个月的销量相对误差在17%,预测结果仅供参考,具体还得结合运营人员经验共同预测得出,后续也会尝试其他模型来提高预测准确性

一、导入需要的包

from __future__ import division
from datetime import datetime, timedelta,date
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
#import statsmodels as sm
#from tqdm import tqdm_notebook
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.statespace.sarimax import SARIMAX
import warnings
warnings.filterwarnings("ignore")

二、导入历史数据

df = pd.read_excel("C:\\Users\\10237\\Desktop\\F7NP历史销量.xlsx")
plt.figure(figsize=(10,5))
df["sales"].plot()
plt.legend(["sales"])
plt.show()

在这里插入图片描述

三、自相关图

df["sales"]=df["sales"].astype("float")
plot_acf(df["sales"]).show()

在这里插入图片描述

自相关系数

平稳序列通常具有短期相关性,即随着延迟期数k的增加,平稳序列的自相关系数会很快地衰减向零,而非平稳序列的自相关系数的衰减速度会比较慢,这就是我们利用自相关图判断平稳性的标准。
其横轴表示延迟期数,纵轴表示自相关系数,从图中可以看出自相关系数衰减到零的速度比较缓慢,在很长的延迟期内,自相关系数一直为正,然后为负,呈现出三角对称性,这是具有单调趋势的非平稳序列的一种典型的自相关图形式

四、平稳性检验

adfuller(df["sales"])
(1.8408412567719112,
 0.9984266704565955,
 8,
 16,
 {'1%': -3.9240193847656246, '5%': -3.0684982031250003, '10%': -2.67389265625},
 283.93234809442)

P值为0.99大于显著性水平0.05,接受原假设(非平稳序列),说明原始序列是非平稳序列

五、一阶差分序列的检验

df1 = df.diff(periods=1,axis=0).dropna()
df1 = df1["sales"]
plt.figure(figsize=(10,5))
df1.plot()
#解读:在均值附近比较平稳波动

在这里插入图片描述

#自相关图
plot_acf(df1).show()
#解读:有短期相关性,但趋向于零

在这里插入图片描述

#平稳性检验
adfuller(df1)
#解读:P值为0.98大于显著性水平,接受原假设,说明一阶差分非平稳序列
(0.6199434003434391,
 0.9880970219599929,
 7,
 16,
 {'1%': -3.9240193847656246, '5%': -3.0684982031250003, '10%': -2.67389265625},
 267.98692610966367)

可见原始序列波动太大,直接差分意义不大,所以下面先取对数把数值控制在较小的波动区间内,再进行差分序列容易平稳

六、取对数进行差分

df["npsales"]=np.log(df["sales"])
df["diffsales"]=df["npsales"].diff().dropna()
df2 = df.drop(df.index[0])
df
datesalesnpsalesdiffsales
02021-12-013972.08.287025NaN
12022-01-012140.07.668561-0.618464
22022-02-011884.07.541152-0.127409
32022-03-012293.07.7376160.196464
42022-04-011681.07.427144-0.310472
52022-05-011788.07.4888530.061709
62022-06-011742.07.462789-0.026064
72022-07-012560.07.8477630.384973
82022-08-012329.07.753194-0.094568
92022-09-012309.07.744570-0.008624
102022-10-013343.08.1146240.370054
112022-11-016817.08.8271750.712551
122022-12-018446.09.0414480.214273
132023-01-016769.08.820109-0.221340
142023-02-016538.08.785387-0.034722
152023-03-018709.09.0721120.286726
162023-04-019131.09.1194300.047318
172023-05-0112873.09.4628870.343457
182023-06-0115708.09.6619250.199038
192023-07-0129836.010.3034710.641546
202023-08-0117128.09.748470-0.555001
212023-09-0116098.09.686450-0.062020
222023-10-0120588.09.9324640.246013
232023-11-0136124.010.4947130.562249
242023-12-0136785.010.5128450.018133
plt.figure(figsize=[15,7.5])
plt.plot(df["diffsales"])

在这里插入图片描述

plot_acf(df2["diffsales"]).show()

在这里插入图片描述

adfuller(df2["diffsales"])
(-1.9799606891943407,
 0.2954281636375788,
 8,
 15,
 {'1%': -3.9644434814814815,
  '5%': -3.0849081481481484,
  '10%': -2.6818144444444445},
 -2.6195422547466265)

P值0.29大于显著性水平0.05,接受原假设,一阶差分不平稳,继续二阶差分

df2["diff2sales"] = df["diffsales"].diff(1)
df3 = df2.drop([1,2],axis=0).reset_index(drop=True)
df3
datesalesnpsalesdiffsalesdiff2sales
02022-03-012293.07.7376160.1964640.323872
12022-04-011681.07.427144-0.310472-0.506936
22022-05-011788.07.4888530.0617090.372181
32022-06-011742.07.462789-0.026064-0.087773
42022-07-012560.07.8477630.3849730.411037
52022-08-012329.07.753194-0.094568-0.479542
62022-09-012309.07.744570-0.0086240.085944
72022-10-013343.08.1146240.3700540.378679
82022-11-016817.08.8271750.7125510.342497
92022-12-018446.09.0414480.214273-0.498277
102023-01-016769.08.820109-0.221340-0.435613
112023-02-016538.08.785387-0.0347220.186618
122023-03-018709.09.0721120.2867260.321448
132023-04-019131.09.1194300.047318-0.239407
142023-05-0112873.09.4628870.3434570.296139
152023-06-0115708.09.6619250.199038-0.144419
162023-07-0129836.010.3034710.6415460.442508
172023-08-0117128.09.748470-0.555001-1.196547
182023-09-0116098.09.686450-0.0620200.492982
192023-10-0120588.09.9324640.2460130.308033
202023-11-0136124.010.4947130.5622490.316236
212023-12-0136785.010.5128450.018133-0.544116
plt.figure(figsize=[15,7.5])
plt.plot(df3["diff2sales"])

在这里插入图片描述

plot_acf(df3["diff2sales"]).show()

在这里插入图片描述

adfuller(df3["diff2sales"])
(-4.377278173091498,
 0.00032555635791855304,
 9,
 12,
 {'1%': -4.137829282407408,
  '5%': -3.1549724074074077,
  '10%': -2.7144769444444443},
 -26.776419186897854)

P值0.0003小于显著性0.05,拒绝原假设,说明二阶差分是平稳序列

七、白噪声检验

acorr_ljungbox(df3["diff2sales"],lags=[6,12,20])
lb_statlb_pvalue
617.2708670.008338
1230.7205640.002173
2052.2567620.000104

P值均小于0.05,拒绝原假设,说明二阶差分序列非白噪声序列

八、拟合参数

plot_acf(df3['diff2sales']).show()
plot_pacf(df3['diff2sales'],lags=10).show()

在这里插入图片描述

在这里插入图片描述

import itertools
import warnings
p = q = range(0, 4) 
d = range(0,4)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]
warnings.filterwarnings("ignore") # 忽略警告信息
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(df3["diff2sales"],
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit()            
            print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, results.aic))
        except:
            continue
#之前是可以运行出结果的,后来不知道咋了结果死活运行不出,结果中找到最小的AIC

综上,我们得知SARIMA模型使AIC最小的参数组合为:SARIMA(0,0,0)(0,2,0)s s=12

best_model1 = SARIMAX(df3['diff2sales'], order=(0, 0, 0), seasonal_order=(0, 2, 0, 12)).fit(dis=-1)
print(best_model1.summary())
                                SARIMAX Results                                 
================================================================================
Dep. Variable:               diff2sales   No. Observations:                   22
Model:             SARIMAX(0, 2, 0, 12)   Log Likelihood                   0.000
Date:                  Wed, 07 Feb 2024   AIC                              2.000
Time:                          13:59:29   BIC                                nan
Sample:                               0   HQIC                               nan
                                   - 22                                         
Covariance Type:                    opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
sigma2          1e-10         -0       -inf      0.000       1e-10       1e-10
===================================================================================
Ljung-Box (L1) (Q):                    nan   Jarque-Bera (JB):                  nan
Prob(Q):                               nan   Prob(JB):                          nan
Heteroskedasticity (H):                nan   Skew:                              nan
Prob(H) (two-sided):                   nan   Kurtosis:                          nan
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number    inf. Standard errors may be unstable.
df3['sarima_model'] = best_model1.fittedvalues
df3['sarima_model'][:11+1] = np.NaN
forecast = best_model1.predict(start=df3.shape[0], end=df3.shape[0] + 12)
forecast = df3['sarima_model'].append(forecast)
plt.figure(figsize=(15, 7.5))
plt.plot(forecast, color='r', label='model')
plt.axvspan(df3.index[-1], forecast.index[-1], alpha=0.5, color='lightgrey')
plt.plot(df3['diff2sales'], label='actual')
plt.legend()
plt.show()

在这里插入图片描述

从预测值与实际值的曲线来看,误差不大相对吻合

forecast
0          NaN
1          NaN
2          NaN
3          NaN
4          NaN
5          NaN
6          NaN
7          NaN
8          NaN
9          NaN
10         NaN
11         NaN
12    0.485809
13   -0.760404
14    0.558271
15   -0.131659
16    0.616556
17   -0.719312
18    0.128916
19    0.568018
20    0.513745
21   -0.747416
22   -0.653420
23    0.279926
24    0.319023
25    0.028121
26    0.220096
27   -0.201065
28    0.473978
29   -1.913552
30    0.900020
31    0.237387
32    0.289975
33   -0.589955
34   -0.871226
dtype: float64
df4 = pd.concat([df3,forecast],axis=1)
df4
datesalesnpsalesdiffsalesdiff2salessarima_model0
02022-03-012293.07.7376160.1964640.323872NaNNaN
12022-04-011681.07.427144-0.310472-0.506936NaNNaN
22022-05-011788.07.4888530.0617090.372181NaNNaN
32022-06-011742.07.462789-0.026064-0.087773NaNNaN
42022-07-012560.07.8477630.3849730.411037NaNNaN
52022-08-012329.07.753194-0.094568-0.479542NaNNaN
62022-09-012309.07.744570-0.0086240.085944NaNNaN
72022-10-013343.08.1146240.3700540.378679NaNNaN
82022-11-016817.08.8271750.7125510.342497NaNNaN
92022-12-018446.09.0414480.214273-0.498277NaNNaN
102023-01-016769.08.820109-0.221340-0.435613NaNNaN
112023-02-016538.08.785387-0.0347220.186618NaNNaN
122023-03-018709.09.0721120.2867260.3214480.4858090.485809
132023-04-019131.09.1194300.047318-0.239407-0.760404-0.760404
142023-05-0112873.09.4628870.3434570.2961390.5582710.558271
152023-06-0115708.09.6619250.199038-0.144419-0.131659-0.131659
162023-07-0129836.010.3034710.6415460.4425080.6165560.616556
172023-08-0117128.09.748470-0.555001-1.196547-0.719312-0.719312
182023-09-0116098.09.686450-0.0620200.4929820.1289160.128916
192023-10-0120588.09.9324640.2460130.3080330.5680180.568018
202023-11-0136124.010.4947130.5622490.3162360.5137450.513745
212023-12-0136785.010.5128450.018133-0.544116-0.747416-0.747416
22NaTNaNNaNNaNNaNNaN-0.653420
23NaTNaNNaNNaNNaNNaN0.279926
24NaTNaNNaNNaNNaNNaN0.319023
25NaTNaNNaNNaNNaNNaN0.028121
26NaTNaNNaNNaNNaNNaN0.220096
27NaTNaNNaNNaNNaNNaN-0.201065
28NaTNaNNaNNaNNaNNaN0.473978
29NaTNaNNaNNaNNaNNaN-1.913552
30NaTNaNNaNNaNNaNNaN0.900020
31NaTNaNNaNNaNNaNNaN0.237387
32NaTNaNNaNNaNNaNNaN0.289975
33NaTNaNNaNNaNNaNNaN-0.589955
34NaTNaNNaNNaNNaNNaN-0.871226

九、还原预测值

def diff1_reduction(y_pred_test_1, y_train_real):
    '''
    :param y_pred_test_1: 一阶差分后测试集的预测结果
    :param y_train_real: 训练集的输出
    :return: 测试集的真实预测结果(未经差分)
    '''
    y_test_real = y_pred_test_1.cumsum() + y_train_real.values[-1]
    return y_test_real.dropna()


def diff2_reduction(y_pred_test_2, y_train_real):
    '''
    :param y_pred_test_2: 二阶差分后测试集的预测结果
    :param y_train_real: 训练集的输出
    :return: 测试集的真实预测结果(未经差分)
    '''

    data_diff_1_reduction = y_pred_test_2.cumsum() + (
                y_train_real.values[-1] - y_train_real.values[-2])  # 先还原到y_pred_test_1
    y_test_real = diff1_reduction(data_diff_1_reduction, y_train_real)
    return y_test_real


def diff3_reduction(y_pred_test_3, y_train_real):
    '''
    :param y_pred_test_3: 三阶差分后测试集的预测结果
    :param y_train_real: 训练集的输出
    :return: 测试集的真实预测结果(未经差分)
    '''

    data_diff_2_reduction = y_pred_test_3.cumsum() + (
               y_train_real.values[-1] -2*y_train_real.values[-2]+ y_train_real.values[-3])  # 先还原到y_pred_test_2
    y_test_real = diff2_reduction(data_diff_2_reduction, y_train_real)
    return y_test_real    

y_return = diff2_reduction(forecast[22:], df3["npsales"])
# 二阶差分还原
y_return
22     9.877559
23     9.522198
24     9.485860
25     9.477644
26     9.689524
27     9.700338
28    10.185131
29     8.756372
30     8.227632
31     7.936280
32     7.934902
33     7.343569
34     5.881009
dtype: float64
# 对数还原
np.exp(y_return)
22    19488.085127
23    13659.601305
24    13172.153768
25    13064.368259
26    16147.550146
27    16323.129592
28    26506.122567
29     6351.027108
30     3742.960429
31     2796.935436
32     2793.084329
33     1546.220029
34      358.170530
dtype: float64

十、预测值与实际值对比结论

1、得到下一个月也就是24年1月的预测值是19488,实际24年1月的销售额是16559,相对误差(|预测值-实际值|/实际值)为17.68%。

2、另外从预测数据来看,29行之后数据没有太大的价值,后面每月会填入最新的实际值来滚动预测最新的数据,也会尝试其他模型来提高预测准确性。

  • 21
    点赞
  • 36
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值