Time Series Analysis with Prophet: Air Passenger Data

Prophet is a forecasting model by Facebook that forecasts time series using special adjustments for factors such as seasonality, holiday periods, and changepoints.
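
The sections below rely on the structure of Prophet's model. The Prophet documentation and the accompanying paper (Taylor and Letham, "Forecasting at Scale") describe it as a decomposition of the series into additive components. A compact statement of that model:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here g(t) is the trend, which is piecewise linear by default and has changepoints where its growth rate may shift. s(t) is the periodic seasonal component, h(t) captures holiday effects, and \varepsilon_t is the error term.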

Let’s investigate this further by building a Prophet model to forecast air passenger numbers.

Background

The dataset is sourced from the San Francisco International Airport Report on Monthly Passenger Traffic Statistics by Airline, which is available from data.world (original source: San Francisco Open Data).

Specifically, the adjusted passenger counts for KLM (enplaned passengers only) are extracted as the time series for analysis, covering the period from May 2005 to March 2016.
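
The article does not show the filtering step itself. The pandas sketch below shows one way it might be done. Several details are assumptions about the raw San Francisco Open Data export and should be checked against your copy of the file:

- the column names (Operating Airline, Adjusted Activity Type Code, Activity Period, Adjusted Passenger Count);
- the exact airline spelling;
- the YYYYMM format of Activity Period.

```python
import pandas as pd

def klm_enplaned_series(raw: pd.DataFrame,
                        airline: str = "KLM Royal Dutch Airlines") -> pd.DataFrame:
    """Monthly adjusted enplaned passenger counts for one airline, May 2005 to March 2016.

    Assumed (hypothetical) columns: 'Operating Airline', 'Adjusted Activity Type Code',
    'Activity Period' (YYYYMM integer) and 'Adjusted Passenger Count'.
    The default airline spelling is also an assumption.
    """
    mask = (raw["Operating Airline"] == airline) & (
        raw["Adjusted Activity Type Code"] == "Enplaned"
    )
    # An airline can have several rows per month (e.g. per terminal), so sum them
    monthly = (
        raw.loc[mask]
        .groupby("Activity Period", as_index=False)["Adjusted Passenger Count"]
        .sum()
    )
    # Turn YYYYMM periods such as 200505 into month-start dates such as 2005-05-01
    monthly["Date"] = pd.to_datetime(monthly["Activity Period"].astype(str), format="%Y%m")
    in_range = (monthly["Date"] >= pd.Timestamp("2005-05-01")) & (
        monthly["Date"] <= pd.Timestamp("2016-03-01")
    )
    return monthly.loc[in_range, ["Date", "Adjusted Passenger Count"]].reset_index(drop=True)
```

The returned Date and Adjusted Passenger Count columns match the names that the article's own code later reads from train_df.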

[Figure: KLM adjusted passenger counts over time. Source: Jupyter Notebook Output]

As we can see, the time series appears fairly stationary. A stationary series is one whose mean, variance and autocorrelation remain constant over time.
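
As an informal illustration of this definition, one can split a series in half and compare the mean, variance and lag-1 autocorrelation of the two halves. Roughly equal values are consistent with stationarity. This is not a formal test such as ADF or KPSS. The split-half approach and the function names below are simplifications chosen for illustration.

```python
from statistics import mean, pvariance

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    m = mean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

def split_half_summary(xs):
    """Mean, variance and lag-1 autocorrelation of each half of xs.

    Clearly different values between the halves hint at non-stationarity;
    this is an informal check, not a statistical test.
    """
    half = len(xs) // 2
    first, second = list(xs[:half]), list(xs[half:])
    stats = {"mean": mean, "variance": pvariance, "lag1_autocorr": lag1_autocorr}
    return {name: (f(first), f(second)) for name, f in stats.items()}
```

For example, split_half_summary(list(range(20))) reports means of 4.5 and 14.5 for the two halves, which is the kind of drift a trending, non-stationary series shows.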

We will not formally test for this condition here. However, there also appears to be significant seasonality in the dataset, i.e. shifts in the time series that recur at regular time intervals.

From a visual inspection of the time series, this shift appears to happen roughly every eight months.
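
The eight-month figure comes from eyeballing the plot. One way to check an estimate like this is to find the lag with the highest sample autocorrelation. The pure-Python sketch below does that, with lags measured in observations (months for this dataset). The function names are hypothetical, and a simple peak search can be misled by a strong trend, so treat the result as a rough check only.

```python
from statistics import mean

def autocorr(xs, lag):
    """Sample autocorrelation of xs at the given lag (biased estimator)."""
    m = mean(xs)
    den = sum((x - m) ** 2 for x in xs)
    num = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    return num / den

def dominant_period(xs, min_lag=2, max_lag=None):
    """Lag (in observations; months here) with the highest autocorrelation."""
    if max_lag is None:
        max_lag = len(xs) // 2
    return max(range(min_lag, max_lag + 1), key=lambda k: autocorr(xs, k))
```

For instance, a synthetic series that repeats every eight steps, such as [0, 0, 0, 0, 10, 10, 10, 10] * 12, yields a dominant period of 8.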

Model Building

With that in mind, let’s get started on building a forecasting model.

The first step is to format the data the way Prophet expects: a dataframe with a ds column holding the dates and a y column holding the values to forecast.

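import pandas as pd

# train_df holds the filtered KLM training data; Prophet expects a 'ds' (date) and a 'y' (value) column
train_dataset = pd.DataFrame()
train_dataset['ds'] = pd.to_datetime(train_df['Date'])
train_dataset['y'] = train_df['Adjusted Passenger Count']
train_dataset.head(115)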
[Figure: the formatted training dataframe. Source: Jupyter Notebook Output]

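We first define a baseline model and fit it to the training data:

from prophet import Prophet  # releases before 1.0 were published as fbprophet

prophet_basic = Prophet()
prophet_basic.fit(train_dataset)

The fitted model is then used to build the dataframe of dates to forecast over. make_future_dataframe must be called after fit, and it returns every date in the training history plus 14 further monthly periods. Those final 14 months correspond to the test set, the part of the time series we are trying to predict:

# freq='M' generates month-end dates; 'MS' would match month-start datestamps
future = prophet_basic.make_future_dataframe(periods=14, freq='M')
future.tail(15)

[Figure: the last 15 rows of the future dataframe. Source: Jupyter Notebook Output]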
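
To make the contents of future concrete, here is a simplified pandas re-implementation of make_future_dataframe with its default include_history=True. It keeps every historical date and appends periods new dates on the freq grid after the last one. The helper future_dates is hypothetical and only approximates Prophet's own code.

```python
import pandas as pd

def future_dates(history_ds: pd.Series, periods: int, freq: str = "MS") -> pd.DataFrame:
    """Approximate Prophet's make_future_dataframe(periods, freq) with the default
    include_history=True: all historical dates, then `periods` new dates on the
    `freq` grid after the last historical date. Illustrative only."""
    last = history_ds.max()
    new = pd.date_range(start=last, periods=periods + 1, freq=freq)
    new = new[new > last][:periods]  # drop `last` itself if it lies on the grid
    all_dates = pd.concat([history_ds, pd.Series(new)], ignore_index=True)
    return pd.DataFrame({"ds": all_dates})
```

The choice of freq matters. Suppose the history's last date is 2005-07-01. With freq='M' (month end), the first appended date would be 2005-07-31 rather than 2005-08-01. If ds holds month-start dates, 'MS' therefore keeps the forecast dates aligned with the training dates. Recent pandas versions prefer the alias 'ME' for month end.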

Here is a plot of the forecast:

forecast = prophet_basic.predict(future)  # one row per date in future, incl. yhat, yhat_lower, yhat_upper
fig1 = prophet_basic.plot(forecast)
[Figure: forecast from the baseline model. Source: Jupyter Notebook Output]

Here are the components of the forecast:

fig2 = prophet_basic.plot_components(forecast)
[Figure: forecast components]

Some observations:

  • We can see significant growth in the trend from 2007 until 2009, with passenger numbers levelling off after that.
  • We also observe that passenger numbers appear to be highest from approximately May to September, after which they dip for the rest of the year.
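
The May to September peak is read off Prophet's seasonal component. A crude, model-free cross-check is to average y by calendar month. The sketch below assumes a Prophet-style frame with ds (as datetime) and y columns, as in train_dataset. It does not remove the trend, so it only roughly mirrors the seasonal component. The function name is hypothetical.

```python
import pandas as pd

def monthly_profile(df: pd.DataFrame) -> pd.Series:
    """Mean of y for each calendar month (1 = January ... 12 = December).

    Expects Prophet-style columns 'ds' (datetime) and 'y'. Trend is not
    removed, so this is only a rough counterpart of Prophet's yearly component.
    """
    return df.groupby(df["ds"].dt.month)["y"].mean()
```

Applied to train_dataset, monthly_profile(train_dataset).idxmax() would return the calendar month with the highest average passenger count.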

Note that we observed visually that seasonality appears to be present in the dataset. However, given that we are working with a monthly dataset, we will not explicitly configure seasonality in Prophet in this instance; the model keeps Prophet's default seasonality settings.

There are two reasons for this:

  1. Detection of seasonality would likely be more accurate with daily data, which we are not using in this case.
  2. Making an assumption of yearly seasonality may not be particularly accurate in this case. Inspecting the dataset shows that while certain seasonal shifts occur every year, others occur every 6 to 8 months. Therefore, explicitly defining a seasonality parameter in the model may do more harm than good in this instance.

Changepoints

A changepoint represents a significant structural shift in a time series.

For instance, the big drop in air passenger numbers after the onset of COVID-19 would represent a significant structural shift in the data.
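
In Prophet's default trend, a changepoint is a time at which the growth rate of a piecewise-linear trend changes. The offset is adjusted at the same time so that the line stays continuous. The pure-Python sketch below evaluates such a trend at a single time t. The function name and arguments are illustrative. Prophet itself works on rescaled time and values, and it estimates the rate changes with a sparse prior rather than taking them as inputs.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Continuous piecewise-linear trend in the style of Prophet's default trend.

    k: initial growth rate; m: initial offset;
    changepoints: sorted times s_j; deltas: rate change delta_j applied from s_j on.
    The offset moves by -s_j * delta_j so the line is continuous at each s_j.
    """
    rate, offset = k, m
    for s, d in zip(changepoints, deltas):
        if t >= s:
            rate += d
            offset -= s * d
    return rate * t + offset
```

For example, with k=1, m=0 and one changepoint at t=5 with delta -1, the trend rises with slope 1 until t=5 and is flat at 5 afterwards.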

For example, here are the changepoints the model identifies when the n_changepoints parameter is set to 4:

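from prophet import Prophet
from prophet.plot import add_changepoints_to_plot

pro_change = Prophet(n_changepoints=4)  # 4 potential changepoints instead of the default 25
forecast = pro_change.fit(train_dataset).predict(future)
fig = pro_change.plot(forecast)
# Overlay the changepoints with a significant rate change as red dashed lines
a = add_changepoints_to_plot(fig.gca(), pro_change, forecast)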
[Figure: forecast with changepoints marked (n_changepoints=4). Source: Jupyter Notebook Output]
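
Prophet does not search freely for changepoint locations. By default it places n_changepoints potential changepoints (25 unless overridden) uniformly over the first changepoint_range share of the history (80% by default). It then estimates how much the rate changes at each one. The hypothetical helper below approximates that placement and returns row indices into the history.

```python
def potential_changepoints(n_obs, n_changepoints=25, changepoint_range=0.8):
    """Row indices of potential changepoints, spread uniformly over the first
    changepoint_range share of an n_obs-row history (an approximation of
    Prophet's default placement)."""
    hist_size = int(n_obs * changepoint_range)
    # There cannot be more candidates than rows available to place them on
    n = min(n_changepoints, hist_size - 1)
    if n <= 0:
        return []
    step = (hist_size - 1) / n
    # Like np.linspace(0, hist_size - 1, n + 1), rounded, with the first point dropped
    return [round(i * step) for i in range(1, n + 1)]
```

For a 115-row history with n_changepoints=4, the candidates fall at rows 23, 46, 68 and 91. No changepoint can therefore be placed in the final 20% of the training data.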

We see that the most significant changepoint identified by the model lies between 2007 and 2009.

What is interesting is that while passenger numbers did decline significantly in 2009, they were still higher on average than in 2005–2007. This indicates that overall demand for air travel (at least for KLM flights from San Francisco) actually grew towards the end of the decade.

Model Validation

Now that the forecasting model has been built, we compare the predicted passenger numbers with the test set to determine model accuracy.

With n_changepoints set to 4, we obtain the following error metrics:

  • Root Mean Squared Error: 524
  • Mean Forecast Error: 71

[Figure: error metrics. Source: Jupyter Notebook Output]
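
The article does not show how these metrics were computed. A minimal sketch of both follows. It assumes the mean forecast error is the average of actual minus predicted values; sign conventions differ between sources. The function name is hypothetical.

```python
from math import sqrt

def forecast_errors(actual, predicted):
    """Return (RMSE, mean forecast error) for paired actual/predicted values.

    Mean forecast error is taken as mean(actual - predicted), so a positive
    value means the model under-forecasts on average (assumed convention).
    """
    actual, predicted = list(actual), list(predicted)
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    mfe = sum(errors) / len(errors)
    return rmse, mfe
```

With Prophet, actual would be the test-set y values. predicted would be the yhat values for the same 14 dates, i.e. the last 14 rows of forecast['yhat'].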

The mean is 8,799 passengers per month, so these errors are low by comparison; the RMSE of 524 is roughly 6% of the mean. This indicates that the model performs well in forecasting monthly passenger numbers.

However, it is important to note that model accuracy is significantly influenced by the n_changepoints parameter.

Let’s see what happens to the RMSE when the number of changepoints is varied.

[Table: RMSE for different numbers of changepoints. Source: Author’s Calculations]

We can see that the RMSE drops quite dramatically as more changepoints are introduced, but it reaches its minimum at 4 changepoints.
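
The RMSE values in the table come from the author's own calculations. One way to run such a sweep is sketched below. The rmse_by_changepoints function is hypothetical, and it takes a model factory so that it can be written without importing Prophet. With Prophet, make_model would be lambda n: Prophet(n_changepoints=n), and test_df would be a dataframe holding the 14 held-out months in ds and y columns.

```python
from math import sqrt

def rmse_by_changepoints(train_df, test_df, make_model, candidates):
    """Fit a fresh model for each candidate number of changepoints and return
    {n_changepoints: test-set RMSE}.

    make_model(n) must return an unfitted object with Prophet-style fit(df)
    and predict(df) methods, where predict's result can be indexed by 'yhat'.
    A new model is built each time because a Prophet object can only be fit once.
    """
    actual = list(test_df["y"])
    results = {}
    for n in candidates:
        model = make_model(n)
        model.fit(train_df)
        predicted = list(model.predict(test_df)["yhat"])
        squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
        results[n] = sqrt(sum(squared) / len(squared))
    return results
```

min(results, key=results.get) then gives the candidate with the lowest test RMSE. Choosing n_changepoints this way tunes it on the test set itself. A separate validation split, or Prophet's cross_validation utility, gives a less optimistic error estimate.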

Conclusion

In this example, you have seen:

  • How Prophet can be used to make time series forecasts
  • How to analyse trends and seasonal fluctuations using Prophet
  • The importance of changepoints in determining model accuracy

I hope you found this useful, and I am grateful for any thoughts or feedback!

You can access the code and datasets for this example at the MGCodesandStats repository.

Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way.

Original article: https://towardsdatascience.com/time-series-analysis-with-prophet-air-passenger-data-6f29c7989681
