An Overview of Different Time Series Models for Forecasting

When one wants to start predicting the future, it must come as no surprise that one must deal with time. This is the essence of time series modeling. A time series model can be thought of as a function F(t) = y that takes a given point in time (t) and produces some output (y).

The difficulty arises when we want to estimate what y will be at a point in time that hasn't occurred yet. The model then becomes F(t+h) = y, where h is some amount of time into the future.

Several time series models exist, and today I will give an overview of the most common ones.

Seasonal Decomposition

Oftentimes, time series data demonstrates clear seasonality, i.e. patterns that emerge over consistent periods of time and repeat themselves (these could be daily, weekly, monthly, yearly, etc.). A seasonal decomposition breaks the time series model into three parts:

F(t) = S(t) + T(t) + R(t)

where S(t) is a seasonal component, T(t) is a trend component, and R(t) is the remainder component. In a seasonal decomposition, the trend explains a certain proportion of the observations, the seasonality another portion, and what is left over is caused by some unknown factor or series of factors ("the remainder" R(t)). A plot of a seasonal decomposition typically shows the original data on top, with the trend, seasonal, and remainder components below it.
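
To make the three components concrete, here is a minimal sketch of a classical additive decomposition in plain Python. It is one common way to split a series, not necessarily the method behind any particular plot: the trend is estimated with a centered moving average, the seasonal component by averaging the detrended values at each position in the cycle, and the remainder is whatever is left. The function name `decompose_additive` and the assumption that the series covers at least two full periods are mine.

```python
from statistics import mean


def decompose_additive(y, period):
    """Split y into (seasonal, trend, remainder) so that y = S + T + R.

    Assumes an additive model and a series spanning at least two full periods.
    Trend and remainder are None at the edges, where the moving average
    window does not fit.
    """
    n = len(y)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Even period: centered moving average with half weight on both ends.
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period

    # Average the detrended values for each position in the seasonal cycle.
    detrended = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            detrended[t % period].append(y[t] - trend[t])
    raw = [mean(values) for values in detrended]
    offset = mean(raw)  # center the seasonal effects so they average to zero
    cycle = [s - offset for s in raw]

    seasonal = [cycle[t % period] for t in range(n)]
    remainder = [
        None if trend[t] is None else y[t] - seasonal[t] - trend[t]
        for t in range(n)
    ]
    return seasonal, trend, remainder
```

On a synthetic series built as a straight line plus a repeating four-step pattern, this recovers the line as the trend and the pattern as the seasonal component, leaving a remainder of essentially zero.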

Exponential Smoothing

Exponential smoothing is a simple model that proposes that the model's output at time t is a weighted average of the observations at t-1, t-2, all the way back to t=1, with the weight assigned to each observation decaying exponentially the further back in time it lies. Put more simply, t-1 will have the highest weight in the average, t-2 the second highest weight, and so on.

As time proceeds, the model values the information provided by an observation at a single point in time less and less.

The weights are calculated by choosing a starting weight for the most recent observation (alpha); the remaining weights are derived from this alpha as follows:

[Image: formula for the exponential smoothing weights]
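
The original formula image is not available here. As a hedged reconstruction, the standard textbook form of simple exponential smoothing, which matches the description above, writes the output at time t as:

```latex
\hat{y}_{t} = \alpha\, y_{t-1} + \alpha(1-\alpha)\, y_{t-2} + \alpha(1-\alpha)^{2}\, y_{t-3} + \cdots ,
\qquad
w_{k} = \alpha(1-\alpha)^{k-1}
```

Here w_k is the weight on the observation k steps back (y at t-k), so each step further into the past multiplies the weight by (1 - alpha). The symbols \hat{y} and w_k are my notation, not the author's.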

Below is a table of examples demonstrating how weights assigned to each observation vary depending on the initial alpha value:

[Image: table of observation weights for several alpha values]
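
Since the table image is missing, a small sketch can regenerate comparable numbers. It assumes the standard weights alpha * (1 - alpha)^j for the observation j steps before the most recent one; the alpha values below are illustrative choices of mine and may differ from those in the original table.

```python
def smoothing_weights(alpha, n):
    """Weights for the n most recent observations, newest first."""
    return [alpha * (1 - alpha) ** j for j in range(n)]


# Illustrative alpha values (an assumption, not taken from the original table).
for alpha in (0.2, 0.4, 0.6, 0.8):
    weights = smoothing_weights(alpha, 6)
    print(alpha, [round(w, 4) for w in weights], "sum:", round(sum(weights), 4))
```

A larger alpha concentrates the weight on the most recent observations, while a smaller alpha spreads it further back in time.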

Note that for any 0 < alpha < 1, the weights sum to approximately 1 once enough past observations are included (over n observations the sum is 1 - (1 - alpha)^n, which approaches 1 as n grows).
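
In practice the weighted average is rarely computed term by term. A common equivalent is to keep a running "level" and update it with each new observation. The sketch below assumes the level is initialized to the first observation (one of several conventions); the function name `ses_forecast` is hypothetical.

```python
def ses_forecast(y, alpha):
    """One-step-ahead forecast from simple exponential smoothing.

    Assumption: the level starts at the first observation, y[0].
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = y[0]
    for obs in y[1:]:
        # New level = alpha * newest observation + (1 - alpha) * previous level.
        level = alpha * obs + (1 - alpha) * level
    return level
```

Unrolling the loop reproduces the exponentially decaying weights, with the leftover weight (1 - alpha)^(n-1) landing on the initial level, so in this recursive form the weights sum to exactly 1.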

AutoRegressive Integrated Moving Average (ARIMA)

Along with exponential smoothing, the ARIMA family of models is one of the most common forecasting techniques used by practitioners.

An ARIMA model combines two approaches to explaining future observations. The first is the autoregressive portion (AR), which postulates that a portion of the observation is some linear combination of the past observations. The second is the moving average portion (MA), which postulates that a portion of the observation is some linear combination of the past errors in the forecast.

These models require the time series to be stationary (i.e. its statistical properties, such as its mean, do not change over time, so there is no trend in the data). Because a trend usually does exist in time series data, differencing the series is a necessary step in most cases; this is what the "integrated" (I) part of the name refers to.
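
Differencing simply replaces each observation with its change from the previous one. A minimal sketch, with `difference` as a hypothetical helper name:

```python
def difference(y, lag=1):
    """Return the series of changes y[t] - y[t - lag]."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


linear = [3, 5, 7, 9]            # steady upward trend
print(difference(linear))         # the trend becomes a constant level

quadratic = [1, 4, 9, 16, 25]    # curved trend
print(difference(difference(quadratic)))  # differencing twice flattens it
```

The number of times the series is differenced is the d in the usual ARIMA(p, d, q) notation. Differencing removes trends in the level of the series; it does not by itself fix other kinds of non-stationarity, such as changing variance.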

The autoregressive portion of the model is expressed as follows:

[Image: equation for the autoregressive (AR) portion]

Here p is the number of lags that the model assumes influence the outcome (this is also known as the order of the AR portion).
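
The original equation image is not available here. As a hedged reconstruction, the standard textbook form of an autoregressive model of order p is:

```latex
y_{t} = c + \phi_{1} y_{t-1} + \phi_{2} y_{t-2} + \cdots + \phi_{p} y_{t-p} + \varepsilon_{t}
```

The constant c, the coefficients \phi_1 through \phi_p, and the error term \varepsilon_t are the conventional symbols and are assumed here; the author's image may have used different notation.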

The moving average portion of the model is expressed as follows:

[Image: equation for the moving average (MA) portion]

Here q is the number of lagged error terms that the model assumes influence the outcome (this is also known as the order of the MA portion).
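
Again as a hedged reconstruction of the missing image, the standard textbook form of a moving average model of order q is:

```latex
y_{t} = c + \varepsilon_{t} + \theta_{1} \varepsilon_{t-1} + \theta_{2} \varepsilon_{t-2} + \cdots + \theta_{q} \varepsilon_{t-q}
```

Each \varepsilon is a past forecast error and \theta_1 through \theta_q are the coefficients applied to them; these symbols are conventional and assumed here.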

Together, the expression for the ARIMA model becomes:

[Image: equation for the combined ARIMA model]
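
In the usual textbook form, the combined model adds the two linear combinations on the differenced series: a constant, plus the AR coefficients times the last p values, plus the MA coefficients times the last q forecast errors. The sketch below computes a one-step prediction that way. It assumes the coefficients have already been estimated and the series has already been differenced; the function name and argument layout are hypothetical.

```python
def arma_one_step(history, errors, c, phi, theta):
    """One-step-ahead prediction from the AR and MA parts.

    history: (differenced) observations, most recent last
    errors:  past one-step forecast errors, most recent last
    c:       constant term
    phi:     AR coefficients [phi_1, ..., phi_p]
    theta:   MA coefficients [theta_1, ..., theta_q]
    """
    p, q = len(phi), len(theta)
    if len(history) < p or len(errors) < q:
        raise ValueError("not enough past values for the requested orders")
    # phi_1 multiplies the most recent value, phi_2 the one before it, etc.
    ar_part = sum(phi[i] * history[-1 - i] for i in range(p))
    ma_part = sum(theta[j] * errors[-1 - j] for j in range(q))
    return c + ar_part + ma_part
```

The future error term does not appear in the prediction because its expected value is zero. If the series was differenced, the prediction is a predicted change and has to be added back to the last observed value to get a forecast on the original scale.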

To determine which orders of p and q are needed, ACF and PACF plots are typically used. An ACF plot shows the autocorrelations between yt and each of its lags, from yt-1 to yt-k. A PACF plot shows the partial autocorrelations: the correlation between yt and yt-k that isn't explained by their mutual correlations with the observations in between (yt-1 through yt-k+1).
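
The numbers behind these plots can be computed directly. The sketch below uses the usual sample autocorrelation and derives the partial autocorrelations from it with the Durbin-Levinson recursion, which is one standard approach among several; the function names are hypothetical.

```python
def acf(y, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [value - mean for value in y]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf_from_acf(r):
    """Partial autocorrelations for lags 1..len(r)-1 via Durbin-Levinson.

    r is a list of autocorrelations with r[0] == 1.
    """
    pacf = []
    prev = []  # prev[j] holds the order-(k-1) coefficient for lag j + 1
    for k in range(1, len(r)):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf
```

As a rough rule of thumb, a PACF that cuts off after lag p suggests an AR(p) component, and an ACF that cuts off after lag q suggests an MA(q) component.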

In a future post I will demonstrate how to create all three of these models in Python and how to use them to perform simple forecasts.

Original article: https://medium.com/@teamastertoast/an-overview-of-different-time-series-models-for-forecasting-c06c071c7684
