Short Semester 09

早早下班

Article link: https://blog.csdn.net/weixin_45155046/article/details/118764994

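Experiment Name

Time Series Analysis Practice

Objectives

Become familiar with basic Pandas operations and with visual analysis of a data sample;
Learn how to build a time series model with Prophet;
Learn how to compute the MAPE and MAE between predicted and actual values;
Learn how to use the Dickey-Fuller test to check whether a series is stationary.

Background

In this assignment we use the daily page view statistics of the Wikipedia Machine Learning page.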

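Principle

The autoregressive moving average model, ARMA(p, q), is one of the most important models in time series analysis. It consists of two parts: AR denotes an autoregressive process of order p, and MA denotes a moving average process of order q.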
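The formula that followed this sentence is missing from the post. For reference, the usual textbook form of ARMA(p, q) is sketched below. The symbols are assumptions, not the author's notation: c is a constant, the \varphi_i are the AR coefficients, the \theta_j are the MA coefficients, and \varepsilon_t is white noise.

```latex
X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
```

The first sum is the AR(p) part (dependence on past values) and the second sum is the MA(q) part (dependence on past noise terms).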

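Steps

Environment setup

Open a terminal and install statsmodels and the other dependencies

pip install statsmodels==0.11.0
pip install pystan
pip install plotly
# Install fbprophet with either conda or pip; one of the two is enough
conda install -c conda-forge fbprophet
pip install fbprophet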

Data preparation

1. Open Jupyter Notebook
2. Import the libraries needed for the experiment

import warnings
import os
import numpy as np
import pandas as pd
import requests
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from plotly import graph_objs as go

3. Set up the environment: initialize plotly's notebook mode and suppress warnings

init_notebook_mode(connected=True) 
warnings.filterwarnings('ignore')

4. Read and load the page view data set

df = pd.read_csv('wiki_machine_learning1.csv', sep=' ')
# Drop the records that have no page views
df = df[df['count'] != 0]
df.head()

	date	count	lang	page	rank	month	title
81	2015-01-01	1414	en	Machine_learning	8708	201501	Machine_learning
80	2015-01-02	1920	en	Machine_learning	8708	201501	Machine_learning
79	2015-01-03	1338	en	Machine_learning	8708	201501	Machine_learning
78	2015-01-04	1404	en	Machine_learning	8708	201501	Machine_learning
77	2015-01-05	2264	en	Machine_learning	8708	201501	Machine_learning

5. Check the size of the data set

df.shape

(383, 7)

Modeling and forecasting with Prophet

6. Convert the date strings in the original data to datetime values

df.date = pd.to_datetime(df.date) 
df.head()

	date	count	lang	page	rank	month	title
81	2015-01-01	1414	en	Machine_learning	8708	201501	Machine_learning
80	2015-01-02	1920	en	Machine_learning	8708	201501	Machine_learning
79	2015-01-03	1338	en	Machine_learning	8708	201501	Machine_learning
78	2015-01-04	1404	en	Machine_learning	8708	201501	Machine_learning
77	2015-01-05	2264	en	Machine_learning	8708	201501	Machine_learning

7. View the last 5 rows of the data set

df.tail()

	date	count	lang	page	rank	month	title
382	2016-01-16	1644	en	Machine_learning	8708	201601	Machine_learning
381	2016-01-17	1836	en	Machine_learning	8708	201601	Machine_learning
376	2016-01-18	2983	en	Machine_learning	8708	201601	Machine_learning
375	2016-01-19	3389	en	Machine_learning	8708	201601	Machine_learning
372	2016-01-20	3559	en	Machine_learning	8708	201601	Machine_learning

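8. Next, define a plotly_df function with the methods plotly provides, so that interactive charts are easy to draw.

def plotly_df(df, title=''):
    data = []
    for column in df.columns:
        # Draw one line per column with plotly
        trace = go.Scatter(x=df.index, y=df[column], mode='lines', name=column)
        # Add the trace to data
        data.append(trace)
    # Set the title
    layout = dict(title=title)
    fig = dict(data=data, layout=layout)
    # Draw the figure (plot writes an HTML file and opens it in the browser)
    plot(fig, show_link=False)

9. Then use this plotting function to draw how the page views in the data set change over time.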

plotly_df(df.set_index('date')[['count']])

[Figure: line chart of daily page views (count) over time]

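10. Next, use Prophet to forecast the time series. First convert the DataFrame to the format Prophet expects.

# Keep only the date and the page view count
df = df[['date', 'count']]
# Rename the columns to the names Prophet requires
df.columns = ['ds', 'y']
df.tail()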

	ds	y
382	2016-01-16	1644
381	2016-01-17	1836
376	2016-01-18	2983
375	2016-01-19	3389
372	2016-01-20	3559

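11. Then hold out the last 30 records of the original data for evaluating the forecast, and build the model only on the history before them.

predictions = 30
# Everything except the last 30 records
train_df = df[:-predictions].copy()
train_df.tail()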

	ds	y
358	2015-12-17	2870
363	2015-12-18	2475
364	2015-12-19	1659
344	2015-12-20	1534
343	2015-12-21	2425
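The negative slice used for the hold-out can be illustrated on a plain list. This is a minimal sketch, not part of the original notebook: holdout_split is a hypothetical helper name, and the list of integers only stands in for the 383 daily records.

```python
def holdout_split(rows, n_test):
    """Split rows into (train, test), where test is the last n_test rows."""
    if not 0 < n_test < len(rows):
        raise ValueError("n_test must be between 1 and len(rows) - 1")
    # rows[:-n] keeps everything except the last n items,
    # rows[-n:] keeps only the last n items.
    return rows[:-n_test], rows[-n_test:]


# Stand-in for the 383 daily records (hypothetical data, indices only)
rows = list(range(383))
train, test = holdout_split(rows, 30)
print(len(train), len(test))  # prints: 353 30
```

Because the rows are ordered by date, the test part is always the most recent period, which is what a forecast evaluation needs.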

12. Use Prophet to model the train_df data, forecast the following 30 days, and display the forecast for January 20.

from fbprophet import Prophet

# Create the Prophet object
m = Prophet()
# Fit the model on the training data
m.fit(train_df)
# Forecast the page views for the next 30 days
future = m.make_future_dataframe(periods=predictions)
forecast = m.predict(future)
forecast.tail()

INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.

	ds	trend	yhat_lower	yhat_upper	trend_lower	trend_upper	additive_terms	additive_terms_lower	additive_terms_upper	weekly	weekly_lower	weekly_upper	yhat
378	2016-01-16	2971.125440	1693.020364	2505.084457	2950.572621	2992.357064	-861.666255	-861.666255	-861.666255	-861.666255	-861.666255	-861.666255	2109.459185
379	2016-01-17	2976.423576	1863.399604	2682.683119	2954.866142	2999.140416	-720.685978	-720.685978	-720.685978	-720.685978	-720.685978	-720.685978	2255.737598
380	2016-01-18	2981.721713	2859.464791	3672.168653	2958.893371	3006.114622	281.426215	281.426215	281.426215	281.426215	281.426215	281.426215	3263.147928
381	2016-01-19	2987.019849	3150.060866	3931.155735	2962.577162	3012.714353	541.366440	541.366440	541.366440	541.366440	541.366440	541.366440	3528.386289
382	2016-01-20	2992.317986	3020.316557	3836.613409	2966.983683	3019.515352	425.464520	425.464520	425.464520	425.464520	425.464520	425.464520	3417.782506
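With yearly and daily seasonality disabled, the only additive term left in this forecast is the weekly component, so each yhat should equal trend plus weekly. The short check below copies the five rows from the output above and verifies that relation. It is only an arithmetic sanity check on the printed numbers, not a description of how Prophet computes them internally.

```python
# (ds, trend, weekly, yhat) copied from the forecast.tail() output above
rows = [
    ("2016-01-16", 2971.125440, -861.666255, 2109.459185),
    ("2016-01-17", 2976.423576, -720.685978, 2255.737598),
    ("2016-01-18", 2981.721713, 281.426215, 3263.147928),
    ("2016-01-19", 2987.019849, 541.366440, 3528.386289),
    ("2016-01-20", 2992.317986, 425.464520, 3417.782506),
]

# Largest absolute difference between trend + weekly and yhat
max_gap = max(abs(trend + weekly - yhat) for _, trend, weekly, yhat in rows)
print(max_gap < 1e-5)  # prints: True
```

The weekly term is negative on January 16 and 17 (a Saturday and a Sunday) and positive on the following weekdays, which matches the weekday/weekend pattern visible in the raw counts.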

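13. Join the forecast with the actual values and compute MAPE and MAE over the last 30 days.

# Join the DataFrames so that actual and predicted values can be compared
cmp_df = forecast.set_index('ds')[['yhat', 'yhat_lower', 'yhat_upper']].join(df.set_index('ds'))
cmp_df.head()
# Error: actual value minus predicted value
cmp_df['e'] = cmp_df['y'] - cmp_df['yhat']
# Percentage error relative to the actual value
cmp_df['p'] = 100 * cmp_df['e'] / cmp_df['y']
print('MAPE = ', round(np.mean(abs(cmp_df[-predictions:]['p'])), 2))
print('MAE = ', round(np.mean(abs(cmp_df[-predictions:]['e'])), 2))

MAPE =  34.19
MAE =  593.37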
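The two metrics can also be written as small standalone functions, which makes the definitions explicit. The sketch below uses only the standard library. The function names mae and mape are illustrative, and the sample values are the five final days shown in the tables above, so the result differs from the 30-day figures printed by the notebook.

```python
def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return sum(abs(e) for e in errors) / len(errors)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Every actual value must be non-zero, otherwise the ratio is undefined.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)


# Actual counts and yhat for 2016-01-16 .. 2016-01-20, copied from the tables above
y = [1644, 1836, 2983, 3389, 3559]
yhat = [2109.459185, 2255.737598, 3263.147928, 3528.386289, 3417.782506]

print('MAPE = ', round(mape(y, yhat), 2))
print('MAE = ', round(mae(y, yhat), 2))
```

MAE is expressed in page views, while MAPE is relative to the actual value, so MAPE is easier to compare across series of different scale. The zero-count rows were filtered out when the data was loaded, so the division in MAPE is safe for this data set.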

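14. Import statsmodels and matplotlib and set the figure size.

import statsmodels.api as sm
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline
# Set the default figure size
plt.rcParams['figure.figsize'] = (15, 10)

15. Decompose the training series with a weekly period and run the Dickey-Fuller test.

# period=7 is the weekly cycle (statsmodels 0.11 deprecates the older keyword freq in favor of period)
sm.tsa.seasonal_decompose(train_df['y'].values, period=7).plot()
print("Dickey-Fuller test: p=%f" % sm.tsa.stattools.adfuller(train_df['y'])[1])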

Dickey-Fuller test: p=0.107392

[Figure: seasonal decomposition of the training series (observed, trend, seasonal, residual)]
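To make clearer what an additive decomposition with a period of 7 does, here is a rough standard-library sketch of the classical procedure: a centered moving average for the trend, per-position averages of the detrended values for the seasonal part, and the remainder as the residual. It is an illustration under simplifying assumptions (odd period only, synthetic data) and is not the statsmodels implementation. The name decompose_additive and the sample pattern are made up for the example.

```python
from statistics import mean


def decompose_additive(values, period):
    """Classical additive decomposition for an odd period (illustrative sketch)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half = period // 2
    n = len(values)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")

    # Trend: centered moving average, undefined for the first and last `half` points
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(values[i - half:i + half + 1])

    # Seasonal: average the detrended values by position within the period
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(values[i] - trend[i])
    raw = [mean(b) for b in buckets]
    offset = mean(raw)
    pattern = [r - offset for r in raw]  # centered so the pattern averages to zero
    seasonal = [pattern[i % period] for i in range(n)]

    # Residual: what is left after removing trend and seasonal parts
    resid = [None if trend[i] is None else values[i] - trend[i] - seasonal[i]
             for i in range(n)]
    return trend, seasonal, resid


# Synthetic series: linear trend plus a weekly pattern (hypothetical data)
weekly = [3, 5, 4, 2, -1, -7, -6]
values = [100 + 2 * i + weekly[i % 7] for i in range(28)]
trend, seasonal, resid = decompose_additive(values, 7)
print(seasonal[:7])
```

On this synthetic series the recovered seasonal pattern matches the weekly list and the residual is zero wherever the trend is defined. On real data such as the page views, the residual is what remains unexplained.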

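Summary

In this assignment we practiced basic Pandas operations and visual analysis of a data sample, learned to build a time series model with Prophet and to forecast with it, and on that basis came to understand the simple evaluation metrics MAPE and MAE. On the 30 held-out days the forecast reached MAPE = 34.19 and MAE = 593.37. The Dickey-Fuller test on the training series gave p = 0.107392, which is above 0.05, so the hypothesis that the series is non-stationary cannot be rejected at the 5% level.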
