Happy learning starts with translation: Multivariate Forecasting

Latest recommended article published on 2022-06-05 23:06:11

dreamscape9999

Multivariate Forecasting

Another important type of time series is the multivariate time series.

Here we have observations of several different measures and want to forecast one or more of them.

For example, we may have two sets of time series observations, obs1 and obs2, and wish to forecast one or both of them.

We can call series_to_supervised() in exactly the same way as for a univariate series.

For example:

from pandas import DataFrame
from pandas import concat


def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg


raw = DataFrame()
raw['ob1'] = [x for x in range(10)]
raw['ob2'] = [x for x in range(50, 60)]
values = raw.values
data = series_to_supervised(values)
print(data)

Running the example prints the new framing of the data, showing an input pattern of one time step for both variables and an output pattern of one time step for both variables.

Again, depending on the specifics of the problem, the columns can be divided into X and y components arbitrarily. For example, the current observation of var1 could also be provided as input, leaving only var2 to be predicted.

 
   var1(t-1)  var2(t-1)  var1(t)  var2(t)
1        0.0       50.0        1       51
2        1.0       51.0        2       52
3        2.0       52.0        3       53
4        3.0       53.0        4       54
5        4.0       54.0        5       55
6        5.0       55.0        6       56
7        6.0       56.0        7       57
8        7.0       57.0        8       58
9        8.0       58.0        9       59
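
The column split described above can be sketched as follows. This is a minimal illustration rather than part of the original listing: it rebuilds the same one-step framing with shift() directly so that it runs on its own, and the names framed, input_cols and target_col are assumptions chosen for the example. It assumes var1 at time t is already known when the forecast is made, so var1(t) joins the lagged columns as input and only var2(t) is predicted.

```python
from pandas import DataFrame, concat

# Same toy data as the listing above, with columns named var1 and var2.
raw = DataFrame({'var1': list(range(10)), 'var2': list(range(50, 60))})

# Rebuild the one-step framing: lagged copy (t-1) next to the current values (t).
lagged = raw.shift(1).add_suffix('(t-1)')
current = raw.add_suffix('(t)')
framed = concat([lagged, current], axis=1).dropna()

# Assumed split: var1(t) is available as an input, only var2(t) is the target.
input_cols = ['var1(t-1)', 'var2(t-1)', 'var1(t)']
target_col = 'var2(t)'
X = framed[input_cols].values
y = framed[target_col].values

print(X.shape, y.shape)  # (9, 3) (9,)
```

Any other division of the four columns works the same way; only the two column selections change.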

By specifying the length of the input and output sequences as above, the same function can be used for sequence forecasting with multivariate time series.

For example, below is a reframing with 1 time step as input and 2 time steps as the forecast sequence.

from pandas import DataFrame
from pandas import concat


def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg


raw = DataFrame()
raw['ob1'] = [x for x in range(10)]
raw['ob2'] = [x for x in range(50, 60)]
values = raw.values
data = series_to_supervised(values, 1, 2)
print(data)

Running the example shows the large reframed DataFrame.

 
   var1(t-1)  var2(t-1)  var1(t)  var2(t)  var1(t+1)  var2(t+1)
1        0.0       50.0        1       51        2.0       52.0
2        1.0       51.0        2       52        3.0       53.0
3        2.0       52.0        3       53        4.0       54.0
4        3.0       53.0        4       54        5.0       55.0
5        4.0       54.0        5       55        6.0       56.0
6        5.0       55.0        6       56        7.0       57.0
7        6.0       56.0        7       57        8.0       58.0
8        7.0       57.0        8       58        9.0       59.0

Experiment with your own dataset and try several different framings to see what works best.
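
One way to start experimenting is to loop over a few framings and compare the shape of each result. The sketch below uses a condensed copy of series_to_supervised() from the listings above so that it runs on its own. The list of (n_in, n_out) pairs and the shapes dictionary are hypothetical choices for illustration, not recommendations.

```python
from pandas import DataFrame, concat


def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    # Condensed copy of the function from the listings above.
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = [], []
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += ['var%d(t-%d)' % (j + 1, i) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(n_out):
        cols.append(df.shift(-i))
        suffix = '(t)' if i == 0 else '(t+%d)' % i
        names += ['var%d%s' % (j + 1, suffix) for j in range(n_vars)]
    agg = concat(cols, axis=1)
    agg.columns = names
    return agg.dropna() if dropnan else agg


raw = DataFrame({'ob1': list(range(10)), 'ob2': list(range(50, 60))})

# Hypothetical set of (n_in, n_out) framings to compare.
framings = [(1, 1), (2, 1), (1, 2), (3, 2)]
shapes = {}
for n_in, n_out in framings:
    framed = series_to_supervised(raw.values, n_in, n_out)
    shapes[(n_in, n_out)] = framed.shape
    print('n_in=%d n_out=%d -> %d rows, %d columns' % ((n_in, n_out) + framed.shape))
```

Longer input or output windows add columns but cost rows, because every extra shift leaves one more row with a NaN value that dropna() removes.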

Summary

In this tutorial, you discovered how to reframe time series datasets as supervised learning problems with Python.

Specifically, you learned:

About the Pandas shift() function and how it can be used to automatically define supervised learning datasets from time series data.
How to reframe a univariate time series into one-step and multi-step supervised learning problems.
How to reframe multivariate time series into one-step and multi-step supervised learning problems.
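
As a quick reminder of the first point, the sketch below shows what shift() does to a single column. The column names t, t-1 and t+1 are labels chosen for this illustration only.

```python
from pandas import DataFrame

df = DataFrame({'t': list(range(5))})
# A positive shift moves values down: each row sees the previous observation.
df['t-1'] = df['t'].shift(1)
# A negative shift moves values up: each row sees the next observation.
df['t+1'] = df['t'].shift(-1)
print(df)
```

The first row of the t-1 column and the last row of the t+1 column are NaN, which is why series_to_supervised() drops rows with missing values by default.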

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.