Follow me, IT界搬运喵, for more Python content!
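Time series problems are among the hardest problems in data science. Traditional methods such as ARIMA and SARIMA work well in many cases, but they often struggle to produce satisfactory forecasts when a series is nonlinear or non-stationary.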
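To get better forecasts while keeping the work simple and efficient, in this article I'll share 7 Python packages for working with time series. If you find it useful, likes and bookmarks are welcome.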
1. tsfresh
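tsfresh is a great Python package that automatically computes a large number of time series features. It includes many feature extraction methods as well as powerful feature selection algorithms.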
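tsfresh's real feature set is far larger than anything shown here. To make "automatically computing time series features" concrete, below is a small hand-rolled sketch in plain Python. It is not tsfresh's implementation. It groups long-format `(id, time, value)` rows by series and computes a few summary features for each series. The function name `extract_basic_features` and the tuple layout are hypothetical choices made for illustration.

```python
import math
import statistics
from collections import defaultdict


def extract_basic_features(rows):
    """Compute a few summary features per series from long-format rows.

    rows: iterable of (id, time, value) tuples (hypothetical toy layout).
    Returns {id: {feature_name: value}}.
    """
    # Group values by series id; each series is sorted by time below
    series = defaultdict(list)
    for sid, t, v in rows:
        series[sid].append((t, v))

    features = {}
    for sid, points in series.items():
        points.sort()
        values = [v for _, v in points]
        n = len(values)
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)

        # Lag-1 autocorrelation; NaN when the series is constant or has one point
        if n > 1 and std > 0:
            cov = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
            acf1 = cov / (n * std ** 2)
        else:
            acf1 = math.nan

        # Slope of an ordinary least-squares line fitted against positions 0..n-1
        x_mean = (n - 1) / 2
        sxx = sum((i - x_mean) ** 2 for i in range(n))
        if sxx > 0:
            slope = sum((i - x_mean) * (v - mean) for i, v in enumerate(values)) / sxx
        else:
            slope = math.nan

        features[sid] = {
            "mean": mean,
            "std": std,
            "acf_lag1": acf1,
            "slope": slope,
            "min": min(values),
            "max": max(values),
        }
    return features
```

tsfresh applies the same idea with hundreds of feature calculators. The resulting table, with one row per series id, is what its feature selection step then filters.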
Let's use the standard airline passengers dataset as an example to get to know tsfresh:
```python
# Importing libraries
import pandas as pd
from tsfresh import extract_features, extract_relevant_features, select_features
from tsfresh.utilities.dataframe_functions import impute, make_forecasting_frame
from tsfresh.feature_extraction import ComprehensiveFCParameters, settings

# Reading the data (one row per month, with the month written like "1949-01")
data = pd.read_csv('../input/air-passengers/AirPassengers.csv')

# Some preprocessing for the time component:
data.columns = ['month', 'Passengers']
# Parse "1949-01"-style strings with an explicit format
# (infer_datetime_format is deprecated in pandas 2.x, and '%y%m' does not match this data)
data['month'] = pd.to_datetime(data['month'], format='%Y-%m')
df_air = data.set_index('month')

# Use tsfresh's forecasting frame for rolling-forecast training:
# each window holds up to 12 past months, and y_air holds the values to predict
df_shift, y_air = make_forecasting_frame(df_air["Passengers"], kind="Passengers",
                                         max_timeshift=12, rolling_direction=1)
print(df_shift)
```
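The printed `df_shift` is in the long format that tsfresh expects. It has one row per observation: an `id` column identifies each rolling window, a `time` column orders the observations, and a `value` column holds the passenger count. There is also a `kind` column naming the series.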
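The rolling-window idea behind `make_forecasting_frame` can be sketched in plain Python. This is a simplified, hypothetical illustration. The function `rolling_forecast_frame` is not part of tsfresh, and tsfresh encodes window ids and times differently.

In the sketch, each window ends just before position `t` and holds at most `max_timeshift` earlier values. Its target is the value at `t`. As a result, the features computed on a window never see the value they are meant to predict.

```python
def rolling_forecast_frame(values, max_timeshift):
    """Turn one series into (long-format rows, targets) for one-step-ahead forecasting.

    Conceptual sketch only (hypothetical helper, not tsfresh's API):
    window t contains up to `max_timeshift` values observed strictly before
    position t, and its target is values[t].
    Returns (rows, targets): rows are (window_id, time, value) tuples and
    targets maps window_id -> value to predict.
    """
    if max_timeshift < 1:
        raise ValueError("max_timeshift must be at least 1")
    rows = []
    targets = {}
    # The first value has no history, so windows start at position 1
    for t in range(1, len(values)):
        start = max(0, t - max_timeshift)
        for time in range(start, t):
            rows.append((t, time, values[time]))
        targets[t] = values[t]
    return rows, targets
```

With the airline data and `max_timeshift=12`, each window would hold up to a year of monthly history. The first month gets no window because there is nothing before it to forecast from. Each window id then becomes one row of the feature matrix, and its target is the following month's passenger count.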
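```python
# Getting comprehensive features:
# ComprehensiveFCParameters() selects tsfresh's full default set of feature calculators
extraction_settings = ComprehensiveFCParameters()
# impute_function=impute replaces NaN/inf values in the resulting feature matrix
X = extract_features(df_shift, column_id="id", column_sort="time", column_value="value",
                     impute_function=impute, show_warnings=False,
                     default_fc_parameters=extraction_settings)
```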
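Many features are undefined on short windows, and the first windows contain only a month or two of data. The raw feature matrix can therefore contain NaN or infinite values, which `impute_function=impute` cleans up.

According to the tsfresh documentation, `impute` works column by column:

- `-inf` becomes the column minimum.
- `+inf` becomes the column maximum.
- NaN becomes the column median.
- Columns with no finite values are filled with zeros.

The plain-Python sketch below mirrors that rule on a dict of lists instead of a DataFrame. `impute_columns` is a hypothetical name, not a tsfresh function.

```python
import math
import statistics


def impute_columns(table):
    """Column-wise imputation following the rule documented for tsfresh's impute.

    -inf -> column min, +inf -> column max, NaN -> column median, where
    min/max/median are computed over the finite values only. Columns without
    any finite value become all zeros. Returns a new dict (input untouched).
    """
    result = {}
    for name, column in table.items():
        finite = [v for v in column if math.isfinite(v)]
        if not finite:
            result[name] = [0.0] * len(column)
            continue
        lo, hi, med = min(finite), max(finite), statistics.median(finite)
        fixed = []
        for v in column:
            if math.isnan(v):
                fixed.append(med)
            elif v == math.inf:
                fixed.append(hi)
            elif v == -math.inf:
                fixed.append(lo)
            else:
                fixed.append(v)
        result[name] = fixed
    return result
```

Using finite-value statistics keeps every imputed entry inside the range already observed in that column. This matters because `-inf` and `+inf` would otherwise dominate any minimum, maximum, or mean.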