Case 1: Forecasting Market Size with ARIMA

weixin_45552398

Category: Data Analysis Practice. Tags: python, machine learning, data analysis

Article link: https://blog.csdn.net/weixin_45552398/article/details/105805904

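I finally had a chance to work through an example on my own, which also served as a check on what I have learned recently. Overall it went well: I was able to adapt the Python code as needed and implement the data preprocessing for this particular application. After studying a few case studies, implementing one yourself should not be too difficult. Below I walk through the modeling process for this forecast.
First, import the required packages: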

import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.stattools import adfuller as ADF
from statsmodels.graphics.tsaplots import plot_pacf
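# Legacy ARIMA class used in this post; deprecated in statsmodels 0.12 and no longer
# usable from 0.13 onward (its replacement is statsmodels.tsa.arima.model.ARIMA).
from statsmodels.tsa.arima_model import ARIMA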

Load the data:

plt.rcParams['font.sans-serif'] = ['SimHei']  # use a font that can render Chinese labels
df = pd.read_csv('/Users/Desktop/data.csv')

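Next, preprocess the data. Here I mainly check an equality relationship among the columns: in each group of four columns, the first three should sum to the fourth. Rows that violate this are smoothed by replacing each component with the average of the values in the previous and next rows, after which the total is recomputed: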

    # (inside a loop over the region index k; the loop header is not shown in the post)
    # Columns 4k+1 .. 4k+4: three components followed by their total.
    north_america = df.iloc[:, 4 * k + 1:4 * (k + 1) + 1].copy()  # copy so edits do not touch df
    north_america = north_america.fillna(0)  # treat missing values as 0
    # Start at 1 so that i - 1 never wraps around to the last row;
    # the final 5 rows are skipped, as in the original loop.
    for i in range(1, north_america.shape[0] - 5):
        if north_america.iloc[i, 3] != 0:
            if north_america.iloc[i, 0] + north_america.iloc[i, 1] + north_america.iloc[i, 2] == north_america.iloc[i, 3]:
                continue
            # Equality violated: replace each component with the mean of its neighbors
            for j in range(3):
                north_america.iloc[i, j] = (north_america.iloc[i - 1, j] + north_america.iloc[i + 1, j]) / 2
            # then recompute the total from the smoothed components
            north_america.iloc[i, 3] = north_america.iloc[i, :3].sum()
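
To see the smoothing rule in isolation, here is a minimal sketch on a made-up table. The column names `a`, `b`, `c`, `total`, the helper name `smooth_inconsistent_rows`, and all values are assumptions for illustration, not the real data. Row 2 violates a + b + c == total, so its components are replaced by the average of rows 1 and 3 and the total is recomputed.

```python
import pandas as pd

# Toy data, invented for illustration: three components and their total.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 9.0, 4.0, 5.0],
    "b": [1.0, 2.0, 3.0, 4.0, 5.0],
    "c": [1.0, 2.0, 3.0, 4.0, 5.0],
    "total": [3.0, 6.0, 9.0, 12.0, 15.0],
})


def smooth_inconsistent_rows(frame):
    """Replace the components of rows whose parts do not sum to the total
    with the mean of the neighboring rows, then recompute the total."""
    out = frame.copy()
    for i in range(1, len(out) - 1):  # interior rows only, so both neighbors exist
        parts_sum = out.iloc[i, :3].sum()
        if out.iloc[i, 3] != 0 and parts_sum != out.iloc[i, 3]:
            for j in range(3):
                out.iloc[i, j] = (out.iloc[i - 1, j] + out.iloc[i + 1, j]) / 2
            out.iloc[i, 3] = out.iloc[i, :3].sum()
    return out


cleaned = smooth_inconsistent_rows(toy)
print(cleaned)
```

Exact equality is fine for this toy table; with real floating-point data a small tolerance may be a safer comparison.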

Next comes the time series model. The best p and q are found by fitting every candidate pair and comparing AIC:

    for i in range(north_america.shape[1]):
        plot_acf(north_america.iloc[:, i])  # autocorrelation plot of the raw series
        plt.show()
        print('Ljung-Box white-noise test result:', acorr_ljungbox(north_america.iloc[:, i], lags=1))
        print('ADF test result:', ADF(north_america.iloc[:, i]))
        D_data = north_america.iloc[:, i].diff().dropna()  # first-order difference; drop the leading NaN
        D_data.plot()  # time series plot
        plot_acf(D_data)  # autocorrelation plot
        plot_pacf(D_data)  # partial autocorrelation plot
        plt.show()
        print('ADF test result for the differenced series:', ADF(D_data))  # stationarity check
        north_america.iloc[:, i] = north_america.iloc[:, i].astype(float)
        pmax = int(len(D_data) / 10)  # orders usually do not exceed length / 10
        qmax = int(len(D_data) / 10)
        e_matrix = []  # AIC for every (p, q) pair
        for p in range(pmax + 1):
            tmp = []
            for q in range(qmax + 1):
                try:  # some (p, q) pairs fail to fit, so skip them
                    tmp.append(ARIMA(north_america.iloc[:, i], order=(p, 1, q)).fit().aic)
                except Exception:
                    tmp.append(float('nan'))  # NaN keeps the matrix numeric
            e_matrix.append(tmp)
        e_matrix = pd.DataFrame(e_matrix)  # rows are p, columns are q
        p, q = e_matrix.stack().idxmin()  # flatten with stack, then locate the minimum AIC
        print('p and q with the smallest AIC: %s, %s' % (p, q))
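
The selection step at the end of the loop is terse, so here is a small sketch of how `stack().idxmin()` recovers the orders. The AIC values below are invented purely for illustration, and NaN stands for a (p, q) pair whose fit failed.

```python
import numpy as np
import pandas as pd

# Hypothetical AIC values: the row index is p, the column index is q.
aic = pd.DataFrame([
    [310.2, 305.7, np.nan],
    [304.1, 298.6, 301.3],
    [np.nan, 299.9, 302.8],
])

# stack() turns the matrix into a Series indexed by (p, q);
# dropna() discards failed fits; idxmin() returns the index of the smallest AIC.
best_p, best_q = aic.stack().dropna().idxmin()
print(best_p, best_q)
```

The explicit `dropna()` is there so the result does not depend on whether the installed pandas version drops NaN inside `stack()`.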

Finally, use the fitted model to forecast:

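        # (still inside the `for i` loop) fit ARIMA(p, 1, q) with the selected orders
        model = ARIMA(north_america.iloc[:, i], order=(p, 1, q)).fit()
        print(model.summary2())  # print the model report
        print('---------------------------')
        print(model.forecast(4))  # 4-step forecast; the legacy API returns the forecast, standard errors, and confidence intervals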