Time Series - Naive Methods

Introduction

Naive methods, such as predicting the value at time 't' as the actual value of the variable at time 't-1', or as the rolling mean of the series, serve as baselines: they show how much statistical and machine learning models actually improve on a trivial forecast, and so justify the need for them.

In this chapter, let us try these methods on one of the features of our time-series data.

First, we look at the mean of the 'temperature' feature of our data (column 'T') and the deviation around it. It is also useful to see the maximum and minimum temperature values. We can use the numpy library for this.

Showing statistics

In [135]:



import numpy

# Summary statistics for the temperature column 'T'
print(
    'Mean: ', numpy.mean(df['T']),
    '; Standard Deviation: ', numpy.std(df['T']),
    ';\nMaximum Temperature: ', max(df['T']),
    '; Minimum Temperature: ', min(df['T'])
)

These statistics cover all 9357 observations, which are equally spaced in time, and they help us understand the data.
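One detail worth knowing when reading the standard deviation above: numpy.std divides by N by default (ddof=0), while the pandas Series.std method divides by N-1 by default (ddof=1), so the two can give slightly different numbers for the same column. A minimal sketch on a small made-up series (the values are illustrative and not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy values, standing in for a temperature column
values = pd.Series([1.0, 2.0, 3.0, 4.0])

population_std = np.std(values.to_numpy())   # divides by N (ddof=0)
sample_std = values.std()                    # pandas default divides by N-1 (ddof=1)
matched_std = values.std(ddof=0)             # same convention as numpy's default

print('numpy std:', population_std)
print('pandas std:', sample_std)
print('pandas std with ddof=0:', matched_std)
```

With thousands of observations the difference is negligible, but it becomes visible on short series.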

Now we try the first naive method: set the predicted value at the present time equal to the actual value at the previous time, then calculate the root mean squared error (RMSE) to quantify how well this method performs.
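To make the idea concrete before applying it to the real data, here is a small sketch on a made-up four-value series. The numbers and the names toy, prev and valid are illustrative assumptions, not part of the dataset. The prediction for each step is simply the previous value, and the RMSE is the square root of the mean of the squared differences.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series, used only for illustration
toy = pd.DataFrame({'T': [10.0, 12.0, 11.0, 15.0]})

# Naive forecast: the value at t-1 is the prediction for t
toy['prev'] = toy['T'].shift(1)

# The first row has no previous value, so drop it
valid = toy.dropna()

# Errors are 2, -1 and 4; squared they are 4, 1 and 16, whose mean is 7
errors = valid['T'] - valid['prev']
rmse_toy = float(np.sqrt((errors ** 2).mean()))
print('Toy RMSE:', rmse_toy)  # square root of 7, about 2.6458
```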

Showing 1st naive method

In [136]:



# Shift the series by one step so each row also holds the previous observation
df['T_t-1'] = df['T'].shift(1)

In [137]:



# Drop the first row, which has no previous value (NaN after the shift)
df_naive = df[['T','T_t-1']][1:]

In [138]:



from sklearn import metrics
from math import sqrt

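# Compare the actual values with the previous-step predictions
true = df_naive['T']
prediction = df_naive['T_t-1']
# RMSE is the square root of the mean squared error
error = sqrt(metrics.mean_squared_error(true, prediction))
print('RMSE for Naive Method 1: ', error)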

RMSE for Naive Method 1: 12.901140576492974


Now let us look at the next naive method, where the predicted value at the present time is the mean of the time periods immediately preceding it. We will calculate the RMSE for this method too.
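A sketch of how the rolling mean lines up with the value it predicts, again on a made-up series (the values and column names are illustrative assumptions). Note that rolling(3).mean() at a given row includes that row's own value, which is why a shift by one step is needed so that each prediction uses only earlier observations.

```python
import pandas as pd

# Hypothetical toy series, used only for illustration
toy = pd.DataFrame({'T': [10.0, 12.0, 11.0, 15.0, 14.0]})

# Rolling mean over 3 rows, including the current row
toy['rm_current'] = toy['T'].rolling(3).mean()

# Shifted by one step: the mean of the 3 rows strictly before the current one
toy['rm_previous'] = toy['rm_current'].shift(1)

print(toy)
# Row 3 (T = 15) is predicted by the mean of 10, 12 and 11, which is 11
# Row 4 (T = 14) is predicted by the mean of 12, 11 and 15
```

The first three rows have no prediction, because fewer than three earlier values exist for them.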

Showing 2nd naive method

In [139]:



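# Mean of the 3 preceding observations: rolling(3) includes the current row, so shift by one step
df['T_rm'] = df['T'].rolling(3).mean().shift(1)
# Drop rows with missing values, such as the first three, which have no rolling mean
df_naive = df[['T','T_rm']].dropna()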

In [140]:



true = df_naive['T']
prediction = df_naive['T_rm']
error = sqrt(metrics.mean_squared_error(true,prediction))
print ('RMSE for Naive Method 2: ', error)

RMSE for Naive Method 2: 14.957633272839242


Here, you can experiment with the number of previous time periods, also called 'lags', that you want to consider; it is set to 3 here. For this data, the error increases as you increase the number of lags. If the number of lags is set to 1, this method becomes the same as the naive method used earlier.
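One way to run this experiment is to loop over window sizes and compute the RMSE for each. The sketch below uses a synthetic random-walk series so that it runs on its own; the series, the seed, and the helper name rolling_mean_rmse are assumptions for illustration, and the numbers it prints say nothing about the temperature data. To try it on the real data, pass df['T'] instead of the synthetic series.

```python
import numpy as np
import pandas as pd

def rolling_mean_rmse(series, window):
    """RMSE of predicting each value by the mean of the `window` values before it."""
    prediction = series.rolling(window).mean().shift(1)
    pair = pd.DataFrame({'true': series, 'pred': prediction}).dropna()
    return float(np.sqrt(((pair['true'] - pair['pred']) ** 2).mean()))

# Synthetic random walk standing in for a real series (illustrative only)
rng = np.random.default_rng(0)
series = pd.Series(rng.normal(size=500).cumsum())

results = {window: rolling_mean_rmse(series, window) for window in (1, 2, 3, 5, 10)}
for window, rmse in results.items():
    print('window =', window, ' RMSE =', rmse)

# With window = 1 the rolling mean is just the previous value (the first naive method)
persistence = series.shift(1)
persistence_rmse = float(np.sqrt(((series - persistence) ** 2).dropna().mean()))
print('previous-value RMSE =', persistence_rmse)
```

The last two lines check the remark about a lag of 1: the window of size 1 and the previous-value forecast give the same error.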

Points to Note


  • You can write a very simple function for calculating the root mean squared error. Here, we have used the mean squared error function from the package 'sklearn' and then taken its square root.

  • In pandas, df['column_name'] can also be written as df.column_name. For this dataset, however, df.T will not work the same as df['T'], because df.T is the attribute that returns the transposed dataframe. So use only df['T'], or consider renaming this column before using the other syntax.
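The two notes above can be sketched as follows. The rmse helper is a hypothetical name for the 'very simple function' mentioned in the first note, written with numpy only, and the small dataframe (including its 'RH' column) is made up to show how df.T differs from df['T'].

```python
import numpy as np
import pandas as pd

def rmse(true, prediction):
    """Root mean squared error between two equal-length sequences."""
    true = np.asarray(true, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return float(np.sqrt(np.mean((true - prediction) ** 2)))

print(rmse([10, 12, 11], [10, 10, 14]))  # errors 0, 2, -3 -> sqrt(13 / 3)

# Hypothetical dataframe with a column named 'T', as in this chapter
demo = pd.DataFrame({'T': [20.5, 21.0, 19.8], 'RH': [40.0, 42.0, 45.0]})

column = demo['T']     # the temperature column, a Series of length 3
transposed = demo.T    # the transposed dataframe, of shape (2, 3)
print(column.shape, transposed.shape)

# Renaming the column makes attribute access safe
renamed = demo.rename(columns={'T': 'temperature'})
print(renamed.temperature.tolist())
```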

Translated from: https://www.tutorialspoint.com/time_series/time_series_naive_methods.htm
