Time Series: Removing Seasonality

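A time series dataset can contain a seasonal component: a cycle that repeats over time, such as monthly or yearly. This repeating cycle may obscure the signal we want to model when forecasting. On the other hand, the cycle itself can be a strong signal for a forecasting model.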

Minimum Daily Temperatures

The plot shows a strong seasonal component.

Method 1: Differencing

From each observation, subtract the observation for the same day of the previous year (365 days earlier).

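from pandas import read_csv
from matplotlib import pyplot
# Series.from_csv was removed in pandas 1.0; read_csv(...).squeeze('columns') returns a Series
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0,
                  parse_dates=True).squeeze('columns')
X = series.values
diff = list()
days_in_year = 365
# subtract the value from 365 days earlier; the first year has no partner and is skipped
for i in range(days_in_year, len(X)):
	value = X[i] - X[i - days_in_year]
	diff.append(value)
pyplot.plot(diff)
pyplot.show()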
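The loop above is a lag-365 difference. pandas can compute it in one call with `Series.diff(periods=365)`. The sketch below checks that the two agree. Because the temperature CSV is not loaded here, it uses a hypothetical synthetic series (a yearly sine wave plus noise).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the temperature data: 3 years of daily values
# with a yearly cycle plus Gaussian noise.
rng = np.random.default_rng(0)
index = pd.date_range("1981-01-01", periods=3 * 365, freq="D")
day = np.arange(len(index))
series = pd.Series(10 + 5 * np.sin(2 * np.pi * day / 365) + rng.normal(0, 1, len(index)),
                   index=index)

days_in_year = 365

# Loop version, as in the script above
loop_diff = [series.iloc[i] - series.iloc[i - days_in_year]
             for i in range(days_in_year, len(series))]

# Vectorized version: diff(periods=365) subtracts the value 365 rows earlier;
# the first 365 results are NaN and are dropped
vec_diff = series.diff(periods=days_in_year).dropna()

print(len(loop_diff), len(vec_diff))
print(np.allclose(loop_diff, vec_diff.to_numpy()))
```

Both versions count rows, not calendar days. They therefore share the leap-year offset described below.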

The result looks like this:

Differencing Seasonal Adjusted Minimum Daily Temperature

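The dataset contains two leap years (1984 and 1988). They are not handled explicitly, so observations after March 1984 are offset by one day, and observations after March 1988 are offset by two days.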
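One possible way to avoid the leap-day offset, which is not part of the original method, is to difference against the same calendar date one year earlier instead of against the row 365 positions back. The sketch below does this with `pd.DateOffset(years=1)` on a hypothetical series whose value is simply the day count, so the differences show how many days apart the two dates are. Feb 29 is mapped to Feb 28 of the previous year by `DateOffset`.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series spanning the 1984 leap year; value = day count
index = pd.date_range("1983-01-01", "1985-12-31", freq="D")
series = pd.Series(np.arange(len(index), dtype=float), index=index)

# Same calendar date one year earlier (1984-03-01 -> 1983-03-01)
prev_dates = series.index - pd.DateOffset(years=1)

# Look those dates up; dates before the start of the data become NaN
last_year = series.reindex(prev_dates).to_numpy()

diff = pd.Series(series.to_numpy() - last_year, index=series.index).dropna()

print(diff[pd.Timestamp("1984-03-01")])  # spans Feb 29, 1984
print(diff[pd.Timestamp("1985-03-01")])
```

Because the lag is set by the calendar rather than by a fixed row count, 1 March is always compared with 1 March, even after a leap day.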

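Instead of differencing day by day, we can difference monthly means. From each month's mean, subtract the mean of the same month in the previous year.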

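from pandas import read_csv
from matplotlib import pyplot
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0,
                  parse_dates=True).squeeze('columns')
# monthly means; 'MS' bins by calendar month ('M' is deprecated in favor of 'ME' since pandas 2.2)
monthly_mean = series.resample('MS').mean()
diff = list()
months_in_year = 12
# subtract the mean of the same month one year earlier; .iloc makes the lookup positional
for i in range(months_in_year, len(monthly_mean)):
	value = monthly_mean.iloc[i] - monthly_mean.iloc[i - months_in_year]
	diff.append(value)
pyplot.plot(diff)
pyplot.show()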
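The monthly version can also be written without a loop. Resample to monthly means, then take a lag-12 difference. The sketch below uses a hypothetical, purely seasonal daily series over three non-leap years. Every year has exactly the same pattern, so the differenced monthly means should all be (numerically) zero. This shows that the operation removes a cycle that repeats exactly.

```python
import numpy as np
import pandas as pd

# Hypothetical purely seasonal daily data for 1981-1983 (no leap years)
index = pd.date_range("1981-01-01", "1983-12-31", freq="D")
doy = index.dayofyear.to_numpy()
series = pd.Series(10 + 5 * np.cos(2 * np.pi * doy / 365.25), index=index)

# One mean per calendar month, labelled by month start
monthly_mean = series.resample("MS").mean()

# Same month, previous year: lag of 12 rows on the monthly series
monthly_diff = monthly_mean.diff(12).dropna()

print(len(monthly_mean), len(monthly_diff))
print(monthly_diff.abs().max())
```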

Minimum Monthly Temperature Dataset

The plot above shows the monthly mean values; a clear seasonal cycle is visible.

After subtracting the mean of the same month in the previous year, the monthly series looks like this:

Seasonal Adjusted Minimum Monthly Temperature Dataset

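Next, we can use the mean minimum temperature of the same month in the previous year to adjust the daily minimum temperature dataset. Again, we simply skip the first year of data. Correcting with monthly rather than daily values may be a more stable approach.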

from pandas import read_csv
from matplotlib import pyplot
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0,
                  parse_dates=True).squeeze('columns')
X = series.values
diff = list()
days_in_year = 365
for i in range(days_in_year, len(X)):
	date = series.index[i]
	# same month one year earlier, e.g. '1981-01', used for partial-string indexing
	month_str = f'{date.year - 1}-{date.month:02d}'
	month_mean_last_year = series.loc[month_str].mean()
	value = X[i] - month_mean_last_year
	diff.append(value)
pyplot.plot(diff)
pyplot.show()
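The loop above recomputes a monthly mean for every day. A minimal sketch of a faster variant follows. It computes each (year, month) mean once with `groupby`, then looks up the key (previous year, same month) for every day. The data is a hypothetical synthetic series, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data for 1981-1983
index = pd.date_range("1981-01-01", "1983-12-31", freq="D")
doy = index.dayofyear.to_numpy()
series = pd.Series(10 + 5 * np.cos(2 * np.pi * doy / 365.25), index=index)

# Mean for every (year, month) pair, computed once
monthly_mean = series.groupby([index.year, index.month]).mean()

# For each day, the key (previous year, same month)
keys = pd.MultiIndex.from_arrays([index.year - 1, index.month])
last_year_mean = monthly_mean.reindex(keys).to_numpy()   # NaN for the first year

adjusted = pd.Series(series.to_numpy() - last_year_mean, index=index).dropna()

print(len(adjusted))
```

Each day still gets the same value as in the loop: its observation minus the mean of its calendar month one year earlier.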

The adjusted series looks like this:

More Stable Seasonal Adjusted Minimum Monthly Temperature Dataset With 

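A more flexible approach may be to take the mean over a window (for example, one week) around the same date in the previous year; this may again work better. Seasonality in temperature data may also exist at several scales, each of which can be corrected directly or indirectly:
- Day level.
- Multiple-day level, such as a week or several weeks.
- Multiple-week level, such as a month.
- Multiple-month level, such as a quarter or season.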
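The weekly-window idea can be sketched as follows. Take a centred 7-day rolling mean, shift it forward by 365 rows so that each date lines up with the window around the same date last year, and subtract it. The window size and the 365-row shift (which ignores leap days, like the daily script) are assumptions for illustration. The data is a hypothetical synthetic series.

```python
import numpy as np
import pandas as pd

# Hypothetical noisy seasonal daily data for 1981-1983
rng = np.random.default_rng(1)
index = pd.date_range("1981-01-01", "1983-12-31", freq="D")
doy = index.dayofyear.to_numpy()
series = pd.Series(10 + 5 * np.cos(2 * np.pi * doy / 365.25)
                   + rng.normal(0, 1, len(index)), index=index)

window = 7          # assumed window: one week centred on the date
days_in_year = 365  # assumed lag: leap days ignored, as in the daily script

# Centred 7-day mean around every date (NaN for the first and last 3 days)
weekly_mean = series.rolling(window, center=True).mean()

# Shift forward one year: row i now holds the mean around row i - 365
weekly_mean_last_year = weekly_mean.shift(days_in_year)

adjusted = (series - weekly_mean_last_year).dropna()

print(len(adjusted))
```

Averaging over a week smooths out single-day noise in last year's reference value, but the reference still comes from the previous year.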

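Method 2: Modeling the seasonal component

Fit a polynomial curve to the data as a function of the day of the year (i % 365). The fitted curve serves as the seasonal component.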

from pandas import read_csv
from matplotlib import pyplot
from numpy import polyfit
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0,
                  parse_dates=True).squeeze('columns')
# fit polynomial: b1*x^4 + b2*x^3 + b3*x^2 + b4*x + b5, where x is the day of the year (0-364)
X = [i % 365 for i in range(0, len(series))]
y = series.values
degree = 4
coef = polyfit(X, y, degree)
print('Coefficients: %s' % coef)
# create curve; coef is ordered from the highest power down, coef[-1] is the constant
curve = list()
for i in range(len(X)):
	value = coef[-1]
	for d in range(degree):
		value += X[i]**(degree-d) * coef[d]
	curve.append(value)
# plot curve over original data
pyplot.plot(series.values)
pyplot.plot(curve, color='red', linewidth=3)
pyplot.show()
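`numpy.polyfit` returns the coefficients with the highest power first. This is why the loop above starts from `coef[-1]` (the constant) and multiplies `coef[d]` by `x**(degree - d)`. `numpy.polyval` evaluates the same polynomial in one call. The sketch below checks the equivalence on hypothetical synthetic data with the same `i % 365` day-of-year input.

```python
import numpy as np

# Hypothetical data: 3 years of daily values, x = day of year as in the script
rng = np.random.default_rng(2)
n = 3 * 365
X = np.arange(n) % 365
y = 10 + 5 * np.cos(2 * np.pi * X / 365) + rng.normal(0, 1, n)

degree = 4
coef = np.polyfit(X, y, degree)   # [x^4, x^3, x^2, x^1, x^0] coefficients

# Manual evaluation, mirroring the loop above
manual = np.array([coef[-1] + sum(x ** (degree - d) * coef[d] for d in range(degree))
                   for x in X])

# Same curve via polyval
curve = np.polyval(coef, X)

print(np.allclose(manual, curve))
```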

The fitted curve plotted over the data:

Curve Fit Seasonal Model of Daily Minimum Temperature

Then subtract the fitted curve from the original series:

from pandas import read_csv
from matplotlib import pyplot
from numpy import polyfit
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0,
                  parse_dates=True).squeeze('columns')
# fit polynomial: b1*x^4 + b2*x^3 + b3*x^2 + b4*x + b5, where x is the day of the year (0-364)
X = [i % 365 for i in range(0, len(series))]
y = series.values
degree = 4
coef = polyfit(X, y, degree)
print('Coefficients: %s' % coef)
# create curve
curve = list()
for i in range(len(X)):
	value = coef[-1]
	for d in range(degree):
		value += X[i]**(degree-d) * coef[d]
	curve.append(value)
# create seasonally adjusted series: observation minus fitted seasonal value
values = series.values
diff = list()
for i in range(len(values)):
	value = values[i] - curve[i]
	diff.append(value)
pyplot.plot(diff)
pyplot.show()
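The subtraction loop can also be written as a single element-wise NumPy difference. The sketch below (hypothetical synthetic data) makes three checks. The adjusted series has a much smaller spread than the raw one. Its mean is essentially zero, because the least-squares fit includes a constant term. Adding the curve back recovers the original values.

```python
import numpy as np

# Hypothetical data with a yearly cycle plus noise
rng = np.random.default_rng(3)
n = 3 * 365
X = np.arange(n) % 365
y = 10 + 5 * np.cos(2 * np.pi * X / 365) + rng.normal(0, 1, n)

coef = np.polyfit(X, y, 4)
curve = np.polyval(coef, X)

# Seasonally adjusted series: element-wise difference, no loop
adjusted = y - curve

print(round(y.std(), 2), round(adjusted.std(), 2))

# Adding the fitted curve back recovers the original values
restored = adjusted + curve
```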

The final result:

Curve Fit Seasonal Adjusted Daily Minimum Temperature 

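Why remove the seasonal component?

My own view is that the seasonal component itself can also be used as a feature for forecasting, rather than simply being discarded.

Data with the seasonal component removed may also be more representative of the underlying signal, because the model no longer has to learn the repeating cycle.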
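The idea of using the seasonal component as a feature can be sketched as follows. Keep the fitted curve as an input column next to the adjusted series, instead of throwing it away. The column names (`seasonal`, `adjusted`, `adjusted_lag1`, `target`) and the lag-1 input are hypothetical choices for illustration, and the data is synthetic. This sketch is not taken from the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data
rng = np.random.default_rng(4)
index = pd.date_range("1981-01-01", periods=3 * 365, freq="D")
X = np.arange(len(index)) % 365
y = 10 + 5 * np.cos(2 * np.pi * X / 365) + rng.normal(0, 1, len(index))

# Seasonal component from the polynomial fit (Method 2)
coef = np.polyfit(X, y, 4)
seasonal = np.polyval(coef, X)

# Hypothetical feature table for a forecasting model
features = pd.DataFrame({
    "seasonal": seasonal,        # fitted seasonal value for that day of year
    "adjusted": y - seasonal,    # seasonally adjusted value
}, index=index)
features["adjusted_lag1"] = features["adjusted"].shift(1)  # previous day's adjusted value
features["target"] = y                                      # value to forecast
features = features.dropna()

print(features.head())
```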

https://machinelearningmastery.com/time-series-seasonality-with-python/

 
