Approaching Time Series with a Tree-Based Model

In this article, I’m going to show a generic way to approach a time series prediction problem with a machine learning model. I’m using the tree-based model LightGBM as an example, but in practice it could be almost any classical machine learning model, including linear regression. I’ll cover the required data transformation, basic feature engineering, and the modeling itself.

Introduction to a Time Series Problem

A time series is essentially a list of values at known points in time. Some examples: daily purchases at a store, the daily number of website visitors, hourly temperature measurements. The interval between data points is often fixed, but that is not a requirement. The problem we want to solve is to predict the values at points in the future.

There are many specialized ways to approach this problem: exponential smoothing, dedicated models such as ARIMA and Prophet, and of course neural networks (RNN, LSTM), to name a few. In this article, however, I will focus on using generic tree-based models for time series forecasting.

Out of the box, tree models don’t support forecasting on raw time series data. The data must first be pre-processed in a specific way, and corresponding features have to be generated.
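To give a rough idea of what generated features can look like, here is a minimal sketch using pandas. The column names (`value`, `lag_1`, `lag_7`, `dayofweek`), the choice of lags, and the toy data are assumptions made for illustration; they are not taken from the article's dataset.

```python
import pandas as pd

# Toy daily series (hypothetical values 0..9, for illustration only).
dates = pd.date_range("2020-01-01", periods=10, freq="D")
df = pd.DataFrame({"value": range(10)}, index=dates)

# Lag features: the value observed 1 day and 7 days before each row.
df["lag_1"] = df["value"].shift(1)
df["lag_7"] = df["value"].shift(7)

# A simple calendar feature derived from the timestamp (Monday=0).
df["dayofweek"] = df.index.dayofweek

# Rows without a full set of lags cannot be used for training.
features = df.dropna()
```

Each remaining row now describes one point in time using only information that was available before it, which is the form a tree model can learn from.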

Typical raw time series data looks as follows.

[Figure: raw daily time series, with known past data in blue and the future values to forecast in yellow]

There is one value for each time series step. In this particular example the step is a day, but it could be an hour, a month, or any other time interval. We have known training data from the past (blue in the graph above) and unknown future data that we want to forecast (yellow in the example).
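In code, such a setup could be represented roughly as follows. This is only a sketch: the values are synthetic, and the names `history`, `future_dates`, and the 14-day `horizon` are hypothetical choices, not part of the original example.

```python
import numpy as np
import pandas as pd

# Known past data: 100 daily values (synthetic, for illustration only).
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=100, freq="D")
history = pd.Series(
    rng.normal(loc=50.0, scale=5.0, size=len(dates)),
    index=dates,
    name="value",
)

# Unknown future: the dates right after the last known day that we want to forecast.
horizon = 14  # assumed forecast length
future_dates = pd.date_range(
    history.index[-1] + pd.Timedelta(days=1), periods=horizon, freq="D"
)
```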

Data Transform

Tree-based models need many training samples, each with a target value, but here there is only one long line of time series data. To make it usable, let’s transform it: go through all the values, take each one as the target of one sample, and use the data points preceding it as that sample’s training features, as in the visualization below.
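The transformation described here can be sketched with NumPy as a sliding window over the series. The helper name `make_windows` and the parameter `n_lags` (how many prior points each sample sees) are hypothetical; a fixed window length is an assumption of this sketch, not something the text prescribes.

```python
import numpy as np

def make_windows(values, n_lags):
    """Turn a 1-D series into (X, y).

    Each row of X holds the n_lags values that precede the
    corresponding target in y.
    """
    values = np.asarray(values, dtype=float)
    if len(values) <= n_lags:
        raise ValueError("series must be longer than n_lags")
    # Every consecutive window of length n_lags + 1:
    # the first n_lags entries are features, the last one is the target.
    windows = np.lib.stride_tricks.sliding_window_view(values, n_lags + 1)
    X = windows[:, :-1]
    y = windows[:, -1]
    return X, y

series = [10, 11, 12, 13, 14, 15]
X, y = make_windows(series, n_lags=3)
# X rows: [10, 11, 12], [11, 12, 13], [12, 13, 14]
# y:      13, 14, 15
```

The resulting `X` and `y` have the usual tabular shape, so they could be passed to any regressor with a scikit-learn style `fit(X, y)` interface.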
