I used to use Rob J Hyndman’s fpp2 forecasting package quite a lot, and it’s still my go-to forecasting library. I like it so much because it covers a wide range of forecasting techniques and comes with an invaluable open-access book that explains the theory behind them. Pretty much everything you need for academic research on time series is there.
But that’s also the downside of the package: it’s not beginner-friendly. Who wants to build a car just to drive it on the road?
Then Facebook Prophet came along.
Prophet simplifies forecasting to an unbelievable degree. You can use it out of the box without needing to understand much theory, as you are about to see below.
The package is very intuitive to use and is especially powerful for business forecasting. You can even specify weekends, special days and events (e.g. the Super Bowl) that impact business activities.
As a cherry on top, Prophet is available in both Python and R!
Let’s do a quick demo.
1. Install package
I’m working in Python, so all you need is the pandas package for manipulating data.
And of course Prophet.
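# import libraries
import pandas as pd
try:  # Prophet 1.0 and later is published as "prophet"
    from prophet import Prophet
except ImportError:  # older releases used the name "fbprophet"
    from fbprophet import Prophet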
2. Import & format data
The dataset I’m going to use is a time series of daily minimum temperatures recorded over the 10 years from 1981 to 1990.
# import data
df = pd.read_csv("https://bit.ly/3hJwIm0")

# check out first few rows
df.head()
As you can see, the dataframe has just two columns: one for the time dimension and the other for the observations.
Some data formatting is needed. Prophet requires that the datetime column be named “ds” and the observation column “y”.
Let's rename both columns.
# data formatting
df = df.rename(columns = {"Date": "ds", "Temp": "y"})
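If the rename succeeds but Prophet still complains, the column types are a likely culprit. Here is a minimal sketch of the same formatting step with an explicit type conversion. The three-row frame is an illustrative stand-in for the downloaded CSV, assuming it has the "Date" and "Temp" columns used above.

```python
import pandas as pd

# Illustrative stand-in for the downloaded CSV (assumed column names: Date, Temp)
raw = pd.DataFrame({
    "Date": ["1981-01-01", "1981-01-02", "1981-01-03"],
    "Temp": [20.7, 17.9, 18.8],
})

# Rename to the column names Prophet expects
df = raw.rename(columns={"Date": "ds", "Temp": "y"})

# Make the types explicit: dates as datetime64, observations as numbers
df["ds"] = pd.to_datetime(df["ds"])
df["y"] = pd.to_numeric(df["y"], errors="coerce")  # non-numeric entries become NaN

print(df.dtypes)
```

The `errors="coerce"` choice is just one option: it turns stray text in the observation column into missing values instead of raising an error.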
3. Model building
Similar to scikit-learn algorithms, Prophet follows a simple “instantiate → fit → predict” workflow for forecasting.
# instantiate model
m = Prophet()

# fit model to data
m.fit(df)
You can play with the parameters, but building your first forecasting model out of the box is as simple as those two tiny lines of code.
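For a taste of what playing with the parameters can look like, here is a hedged sketch. The helper names `merged_settings` and `make_model` and the particular values are my own choices for illustration, not recommendations. The keyword arguments themselves (`yearly_seasonality`, `weekly_seasonality`, `daily_seasonality`, `interval_width`, `changepoint_prior_scale`) are regular Prophet constructor arguments.

```python
# Illustrative starting values (assumptions, not tuned for any dataset)
DEFAULT_SETTINGS = {
    "yearly_seasonality": True,       # temperatures follow a yearly cycle
    "weekly_seasonality": False,      # weather does not care about weekdays
    "daily_seasonality": False,       # the data has one value per day
    "interval_width": 0.95,           # Prophet's own default is 0.8
    "changepoint_prior_scale": 0.05,  # Prophet's default trend flexibility
}


def merged_settings(**overrides):
    """Return a copy of the defaults with any overrides applied."""
    settings = dict(DEFAULT_SETTINGS)
    settings.update(overrides)
    return settings


def make_model(**overrides):
    """Hypothetical helper: build a Prophet model from the settings above."""
    # Imported here so the settings can be inspected without Prophet installed
    try:
        from prophet import Prophet    # package name since version 1.0
    except ImportError:
        from fbprophet import Prophet  # older releases
    return Prophet(**merged_settings(**overrides))


print(merged_settings(interval_width=0.8))
```

With Prophet installed, `make_model(interval_width=0.8).fit(df)` would then replace the two lines above.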
4. Forecasting
Now that you have your model, you are ready to make a forecast.
Just like building the model, forecasting takes two lines of code. The first line builds a dataframe of dates that covers the history plus the next 365 days; it holds no forecast values yet. The second line passes that dataframe to the model, which fills in a predicted value for every date.
# make a forecast dataframe
future = m.make_future_dataframe(periods = 365)

# make a forecast
forecast = m.predict(future)
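To see what the future dataframe contains, here is a rough pandas-only imitation of it for daily data. `sketch_future_dataframe` is a hypothetical helper written for this illustration, not part of Prophet, and it assumes the default behaviour of keeping the historical dates.

```python
import pandas as pd


def sketch_future_dataframe(history, periods):
    """Return a one-column frame: historical dates followed by `periods` new days."""
    last_date = history["ds"].max()
    # periods + 1 daily points starting at the last observed date,
    # then drop the first point because it is already in the history
    new_dates = pd.date_range(start=last_date, periods=periods + 1, freq="D")[1:]
    all_dates = pd.concat([history["ds"], pd.Series(new_dates)], ignore_index=True)
    return pd.DataFrame({"ds": all_dates})


# Illustrative history: the last month of the temperature series' date range
history = pd.DataFrame({"ds": pd.date_range("1990-12-01", "1990-12-31", freq="D")})
future = sketch_future_dataframe(history, periods=365)

print(len(history), len(future))
print(future["ds"].iloc[-1])
```

The frame has only a "ds" column; the predicted values appear once it goes through `m.predict`.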
You are done with forecasting. You can now check out the forecast values in a dataframe.
# check out forecast values
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
The column names are intuitive:
- ds: forecast time steps
- yhat: forecast values
- yhat_lower & yhat_upper: lower and upper bounds of the uncertainty interval (80% wide by default)
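One way to put these columns to work is to line them up against the observed values for the dates where both exist. The sketch below uses made-up numbers in place of the real `df` and `forecast` frames, and the two metrics (mean absolute error and interval coverage) are my choice of illustration rather than anything Prophet prescribes.

```python
import pandas as pd

# Made-up observations and forecast values, shaped like the frames above
actual = pd.DataFrame({
    "ds": pd.to_datetime(["1990-12-29", "1990-12-30", "1990-12-31"]),
    "y": [13.0, 15.0, 12.0],
})
forecast = pd.DataFrame({
    "ds": pd.to_datetime(["1990-12-29", "1990-12-30", "1990-12-31", "1991-01-01"]),
    "yhat": [14.0, 14.0, 14.0, 14.0],
    "yhat_lower": [12.5, 12.5, 12.5, 12.5],
    "yhat_upper": [15.5, 15.5, 15.5, 15.5],
})

# Keep only the dates that have both an observation and a forecast
joined = actual.merge(forecast, on="ds", how="inner")

# Mean absolute error between observed and predicted values
mae = (joined["y"] - joined["yhat"]).abs().mean()

# Share of observations that fall inside the uncertainty interval
inside = joined["y"].between(joined["yhat_lower"], joined["yhat_upper"])
coverage = inside.mean()

print(round(mae, 3), round(coverage, 3))
```

Keep in mind that these are in-sample numbers: the model has already seen those observations, so they flatter the forecast.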
5. Plotting
It’s always good to visualize the data and the forecast values together to see how well they fit.
Again, it’s just one line of code.
# plot
fig1 = m.plot(forecast)
In the figure below, each black dot is an original observation, the dark green line is the model’s forecast, and the light green band marks the uncertainty interval.
There! You made a real forecasting model in under 10 lines of code!
What’s next?
In this article I didn’t mean to go deep into the model. The purpose was rather to convince you that building a forecasting model isn’t complicated, even if you know little about the theory.
As a next step, you can explore other functionality; the Prophet documentation is very easy to follow. I would specifically encourage folks to explore how special days and events affect forecasts (not of temperature, of course, but of business activities!).
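As a starting point for that exploration, here is a minimal sketch of how special events can be handed to Prophet through its `holidays` argument. The helper names `build_events` and `fit_with_events` are hypothetical. The table layout (columns `holiday`, `ds`, `lower_window`, `upper_window`) is the one Prophet expects, and the dates are three past Super Bowl Sundays.

```python
import pandas as pd


def build_events():
    """Return an events table in the layout Prophet's `holidays` argument expects."""
    return pd.DataFrame({
        "holiday": "superbowl",
        "ds": pd.to_datetime(["2014-02-02", "2015-02-01", "2016-02-07"]),
        "lower_window": 0,  # no extra days before the event
        "upper_window": 1,  # also treat the day after as affected
    })


def fit_with_events(df, events):
    """Hypothetical helper: fit a Prophet model that knows about the events."""
    # Imported here so the events table can be built without Prophet installed
    try:
        from prophet import Prophet    # package name since version 1.0
    except ImportError:
        from fbprophet import Prophet  # older releases
    model = Prophet(holidays=events)
    model.fit(df)  # df needs the usual "ds" and "y" columns
    return model


events = build_events()
print(events)
```

For the events to matter, `df` should be a business series (daily sales, say) whose dates cover them; include future event dates too if you want them reflected in the forecast.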
As I said, you can use the package in both Python and R, which gives you some extra freedom.