AI in Practice: A Roundup of Open-Source Code for Numerical Time Series Forecasting with Transformers

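The Transformer is a model that uses the attention mechanism to speed up training. It is a deep learning model built entirely on self-attention. Because it is well suited to parallel computation and has considerable model capacity, it generally outperforms the previously popular recurrent neural networks (RNNs) in both accuracy and performance.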
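
To make the self-attention idea concrete, here is a minimal single-head sketch in NumPy. The shapes and the weight names (`w_q`, `w_k`, `w_v`) are illustrative assumptions and are not taken from any of the repositories listed in this post. The scores for all time steps come from one matrix product, which is what makes the computation easy to parallelize, in contrast to the step-by-step recurrence of an RNN.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (T, d_in) sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # each (T, d_k)
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (T, T) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d_in, d_k = 6, 4, 8                             # assumed toy sizes
x = rng.normal(size=(T, d_in))
w_q, w_k, w_v = (rng.normal(size=(d_in, d_k)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)
```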

This post collects some open-source code that uses Transformers for numerical time series forecasting.

time_series_forecasting

  • Code
    https://github.com/CVxTz/time_series_forecasting

Transformer-Time-Series-Forecasting

  • Code
    https://github.com/nklingen/Transformer-Time-Series-Forecasting

    Article: https://natasha-klingenbrunn.medium.com/transformer-implementation-for-time-series-forecasting-a9db2db5c820
    szZack's blog

Transformer_Time_Series

  • Code
    https://github.com/mlpotter/Transformer_Time_Series

  • Paper:
    Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting (NeurIPS 2019)
    https://arxiv.org/pdf/1907.00235.pdf

Non-AR Spatial-Temporal Transformer

  • Introduction
    Implementation of the paper NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting (submitted to ICML 2021).

    We propose a Non-Autoregressive Transformer architecture for time series forecasting, aiming at overcoming the time delay and accumulative error issues in the canonical Transformer. Moreover, we present a novel spatial-temporal attention mechanism, building a bridge by a learned temporal influence map to fill the gaps between the spatial and temporal attention, so that spatial and temporal dependencies can be processed integrally.

    • Paper: https://arxiv.org/pdf/2102.05624.pdf
    • Code
      https://github.com/Flawless1202/Non-AR-Spatial-Temporal-Transformer
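
The time-delay and accumulated-error issues mentioned in the introduction above come from autoregressive decoding, where each prediction is fed back as input for the next step. The toy sketch below contrasts that with predicting all steps in one pass. It is not the NAST model: `one_step_model`, `direct_forecast` and the constant `BIAS` are hypothetical stand-ins chosen only to show the structural difference.

```python
import numpy as np

BIAS = 0.1  # assumed per-call model error, for illustration only

def one_step_model(window):
    """Hypothetical one-step forecaster: linear extrapolation plus a small bias."""
    return 2 * window[-1] - window[-2] + BIAS

def autoregressive_forecast(history, horizon):
    """Predict one step at a time, feeding each prediction back as input."""
    window = list(history)
    preds = []
    for _ in range(horizon):           # one sequential model call per step
        nxt = one_step_model(window)
        preds.append(nxt)
        window.append(nxt)             # any error in nxt reaches later steps
    return np.array(preds)

def direct_forecast(history, horizon):
    """Predict every step in one pass, using observed values only."""
    steps = np.arange(1, horizon + 1)
    slope = history[-1] - history[-2]
    return history[-1] + slope * steps + BIAS

history = [1.0, 2.0, 3.0]              # the true continuation is 4, 5, 6
ar = autoregressive_forecast(history, 3)
direct = direct_forecast(history, 3)
print(np.abs(ar - [4, 5, 6]))          # error grows with the step index
print(np.abs(direct - [4, 5, 6]))      # error stays at BIAS
```

In this toy setting the autoregressive rollout needs as many sequential calls as there are forecast steps, and its error compounds, while the one-pass variant does neither.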

Multidimensional-time-series-with-transformer

Transformer/self-attention for multidimensional time series forecasting.

Refer to https://github.com/oliverguhr/transformer-time-series-prediction

  • Code
    https://github.com/RuifMaxx/Multidimensional-time-series-with-transformer

TCCT2021

Convolutional Transformer Architectures Complementary to Time Series Forecasting Transformer Models

Paper: TCCT: Tightly-Coupled Convolutional Transformer on Time Series Forecasting https://arxiv.org/abs/2108.12784

It has already been accepted by Neurocomputing:

Journal ref.: Neurocomputing, Volume 480, 1 April 2022, Pages 131-145

doi: 10.1016/j.neucom.2022.01.039

  • Code
    https://github.com/OrigamiSL/TCCT2021-Neurocomputing-

Time_Series_Transformers

  • Introduction
    This directory contains a PyTorch/PyTorch Lightning implementation of transformers applied to time series. We focus on Transformer-XL and Compressive Transformers.

    Transformer-XL is described in the paper "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution), preprint 2018.

    Part of this code is from the authors at https://github.com/kimiyoung/transformer-xl.

  • Code
    https://github.com/Emmanuel-R8/Time_Series_Transformers

Multi-Transformer: A new neural network-based architecture for forecasting S&P volatility

Transformer layers have already been successfully applied for NLP purposes. This repository adapts Transformer layers for use within hybrid volatility forecasting models. Following the intuition of bagging, this repository also introduces Multi-Transformer layers. The aim of this novel architecture is to improve the stability and accuracy of Transformer layers by averaging multiple attention mechanisms.

The article covering the theoretical background and empirical results of the proposed model is linked from the repository. The Transformer- and Multi-Transformer-based stock volatility models (T-GARCH, TL-GARCH, MT-GARCH and MTL-GARCH) outperform traditional autoregressive algorithms and other hybrid models based on feed-forward layers or LSTM units. The repository README reports the validation error (RMSE) by year and model in a table.

  • Code
    https://github.com/EduardoRamosP/MultiTransformer
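
The idea of averaging several attention mechanisms can be sketched as follows. This is a rough NumPy illustration under stated assumptions, not the repository's implementation: each unit here is a single-head attention with its own randomly initialized weights, all units see the same input, and `averaged_attention` is a hypothetical name. Any resampling of the inputs that the actual Multi-Transformer layer performs is not shown.

```python
import numpy as np

def attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (T, d) sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def averaged_attention(x, params):
    """Average the outputs of several independently parameterized attention units."""
    return np.mean([attention(x, *p) for p in params], axis=0)

rng = np.random.default_rng(0)
T, d, n_units = 10, 4, 5                           # assumed toy sizes
x = rng.normal(size=(T, d))
params = [tuple(rng.normal(size=(d, d)) for _ in range(3)) for _ in range(n_units)]

avg = averaged_attention(x, params)
print(avg.shape)
```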


A good, complete example

  • Code
    https://github.com/OrigamiSL/TCCT2021-Neurocomputing-
    https://github.com/zhouhaoyi/Informer2020

    import argparse

    parser = argparse.ArgumentParser(description='[Informer] Long Sequences Forecasting')
    
    parser.add_argument('--model', type=str, required=True, default='informer',help='model of experiment, options: [informer, informerstack, informerlight(TBD)]')
    
    parser.add_argument('--data', type=str, required=True, default='ETTh1', help='data')
    parser.add_argument('--root_path', type=str, default='./data/ETT/', help='root path of the data file')
    parser.add_argument('--data_path', type=str, default='ETTh1.csv', help='data file')    
    parser.add_argument('--features', type=str, default='M', help='forecasting task, options:[M, S, MS]; M:multivariate predict multivariate, S:univariate predict univariate, MS:multivariate predict univariate')
    parser.add_argument('--target', type=str, default='OT', help='target feature in S or MS task')
    parser.add_argument('--freq', type=str, default='h', help='freq for time features encoding, options:[s:secondly, t:minutely, h:hourly, d:daily, b:business days, w:weekly, m:monthly], you can also use more detailed freq like 15min or 3h')
    parser.add_argument('--checkpoints', type=str, default='./checkpoints/', help='location of model checkpoints')
    
    parser.add_argument('--seq_len', type=int, default=96, help='input sequence length of Informer encoder')
    parser.add_argument('--label_len', type=int, default=48, help='start token length of Informer decoder')
    parser.add_argument('--pred_len', type=int, default=24, help='prediction sequence length')
    # Informer decoder input: concat[start token series(label_len), zero padding series(pred_len)]
    
    parser.add_argument('--enc_in', type=int, default=7, help='encoder input size')
    parser.add_argument('--dec_in', type=int, default=7, help='decoder input size')
    parser.add_argument('--c_out', type=int, default=7, help='output size')
    parser.add_argument('--d_model', type=int, default=512, help='dimension of model')
    parser.add_argument('--n_heads', type=int, default=8, help='num of heads')
    parser.add_argument('--e_layers', type=int, default=2, help='num of encoder layers')
    parser.add_argument('--d_layers', type=int, default=1, help='num of decoder layers')
    parser.add_argument('--s_layers', type=str, default='3,2,1', help='num of stack encoder layers')
    parser.add_argument('--d_ff', type=int, default=2048, help='dimension of fcn')
    parser.add_argument('--factor', type=int, default=5, help='probsparse attn factor')
    parser.add_argument('--distil', action='store_false', help='whether to use distilling in encoder, using this argument means not using distilling', default=True)
    parser.add_argument('--CSP', action='store_true', help='whether to use CSPAttention, default=False', default=False)
    parser.add_argument('--dilated', action='store_true', help='whether to use dilated causal convolution in encoder, default=False', default=False)
    parser.add_argument('--passthrough', action='store_true', help='whether to use passthrough mechanism in encoder, default=False', default=False)
    parser.add_argument('--dropout', type=float, default=0.05, help='dropout')
    parser.add_argument('--attn', type=str, default='prob', help='attention used in encoder, options:[prob, full, log]')
    parser.add_argument('--embed', type=str, default='timeF', help='time features encoding, options:[timeF, fixed, learned]')
    parser.add_argument('--activation', type=str, default='gelu',help='activation')
    parser.add_argument('--output_attention', action='store_true', help='whether to output attention in encoder')
    parser.add_argument('--do_predict', action='store_true', help='whether to predict unseen future data')
    
    parser.add_argument('--num_workers', type=int, default=0, help='data loader num workers')
    parser.add_argument('--itr', type=int, default=2, help='experiments times')
    parser.add_argument('--train_epochs', type=int, default=6, help='train epochs')
    parser.add_argument('--batch_size', type=int, default=16, help='batch size of train input data')
    parser.add_argument('--patience', type=int, default=3, help='early stopping patience')
    parser.add_argument('--learning_rate', type=float, default=0.0001, help='optimizer learning rate')
    parser.add_argument('--des', type=str, default='test',help='exp description')
    parser.add_argument('--loss', type=str, default='mse',help='loss function')
    parser.add_argument('--lradj', type=str, default='type1',help='adjust learning rate')
    parser.add_argument('--use_amp', action='store_true', help='use automatic mixed precision training', default=False)
    parser.add_argument('--inverse', action='store_true', help='inverse output data', default=False)
    
    parser.add_argument('--use_gpu', type=bool, default=True, help='use gpu')
    parser.add_argument('--gpu', type=int, default=0, help='gpu')
    parser.add_argument('--use_multi_gpu', action='store_true', help='use multiple gpus', default=False)
    parser.add_argument('--devices', type=str, default='0,1,2,3',help='device ids of multiple gpus')
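
Two kinds of argument in the list above need care after parsing. `--use_gpu` is declared with `type=bool`, and argparse applies `bool()` to the raw string, so any non-empty value, including `False`, parses as `True`. `--s_layers` and `--devices` are comma-separated strings that the caller has to split. The sketch below reproduces both on a cut-down parser. The `str2bool` helper and the `--use_gpu_fixed` flag are hypothetical additions for illustration and are not part of either repository.

```python
import argparse

def str2bool(value):
    """Hypothetical helper: parse common true/false spellings explicitly."""
    if value.lower() in ('true', '1', 'yes', 'y'):
        return True
    if value.lower() in ('false', '0', 'no', 'n'):
        return False
    raise argparse.ArgumentTypeError(f'expected a boolean, got {value!r}')

parser = argparse.ArgumentParser()
parser.add_argument('--use_gpu', type=bool, default=True)            # as declared above
parser.add_argument('--use_gpu_fixed', type=str2bool, default=True)  # hypothetical variant
parser.add_argument('--s_layers', type=str, default='3,2,1')
parser.add_argument('--devices', type=str, default='0,1,2,3')

args = parser.parse_args(['--use_gpu', 'False', '--use_gpu_fixed', 'False'])
print(args.use_gpu)        # True, because bool('False') is True
print(args.use_gpu_fixed)  # False

# Comma-separated strings become integer lists
s_layers = [int(n) for n in args.s_layers.split(',')]
device_ids = [int(i) for i in args.devices.replace(' ', '').split(',')]
print(s_layers, device_ids)
```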
    

  • Dataset
    https://github.com/zhouhaoyi/ETDataset
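
The `--seq_len`, `--label_len` and `--pred_len` defaults in the argument list above decide how each sample is cut from a series such as ETTh1. According to the comment in that list, the decoder input is the start-token series (`label_len` steps) concatenated with zero padding (`pred_len` steps). The sketch below builds one such sample from synthetic data. `make_window` is a hypothetical helper, and the exact indexing used in the repositories may differ.

```python
import numpy as np

seq_len, label_len, pred_len = 96, 48, 24    # defaults from the argument list
n_features = 7                               # matches the enc_in / dec_in defaults

def make_window(data, start):
    """Hypothetical helper: cut one (encoder input, decoder input, target) sample."""
    enc_end = start + seq_len
    enc_x = data[start:enc_end]                        # (seq_len, n_features)
    start_token = data[enc_end - label_len:enc_end]    # last label_len observed steps
    target = data[enc_end:enc_end + pred_len]          # values the model should predict
    padding = np.zeros((pred_len, data.shape[1]))      # placeholder for the unknown future
    dec_x = np.concatenate([start_token, padding], axis=0)
    return enc_x, dec_x, target

# Synthetic stand-in for a multivariate series; not real ETT data
data = np.random.default_rng(0).normal(size=(500, n_features))
enc_x, dec_x, target = make_window(data, start=0)
print(enc_x.shape, dec_x.shape, target.shape)
```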
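### Transformer-based trajectory prediction: a code example

Below is a simple trajectory-prediction example built on the Transformer architecture. It takes multivariate input and, as written, predicts a single next step. It is suited to time series data such as position coordinates.

#### Data preprocessing

So that the model can learn useful patterns, the raw trajectory data is standardized and cut into sliding windows before training.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def preprocess_data(data, window_size=10):
    # Standardize each feature to zero mean and unit variance
    scaler = StandardScaler()
    scaled_data = scaler.fit_transform(data)

    # Each sample is window_size consecutive steps; the target is the step that follows
    X, y = [], []
    for i in range(len(scaled_data) - window_size):
        X.append(scaled_data[i:i + window_size])
        y.append(scaled_data[i + window_size])
    return np.array(X), np.array(y), scaler
```

#### Transformer encoder definition

Below is a simple Transformer encoder block with multi-head self-attention and a feed-forward network.

```python
import tensorflow as tf
from tensorflow.keras.layers import LayerNormalization, Dense, Dropout

class TransformerEncoder(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads, ff_dim, dropout_rate=0.1):
        super().__init__()
        # key_dim is the projection size of each attention head
        self.attention = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model)
        self.ffn = tf.keras.Sequential([
            Dense(ff_dim, activation="relu"),
            Dense(d_model),
        ])
        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)
        self.dropout1 = Dropout(dropout_rate)
        self.dropout2 = Dropout(dropout_rate)

    def call(self, inputs, training=None):
        # Self-attention sublayer with residual connection (inputs must have d_model features)
        attn_output = self.attention(inputs, inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        # Feed-forward sublayer with residual connection
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)
```

#### Building the full model

The final trajectory-prediction model stacks several Transformer encoder layers and adds a fully connected output layer. The input features are first projected to `d_model` so that the residual connections inside each encoder layer have matching shapes.

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Input

def build_transformer_model(input_shape, d_model=64, num_heads=8, ff_dim=128,
                            num_layers=3, output_dim=2):
    inputs = Input(shape=input_shape)
    # Project the raw features to d_model before the encoder stack
    x = Dense(d_model)(inputs)
    for _ in range(num_layers):
        x = TransformerEncoder(d_model, num_heads, ff_dim)(x)
    x = Flatten()(x)
    outputs = Dense(output_dim)(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam', loss='mse')
    return model
```

#### Training

Load the data, build the model, and fit it to complete the workflow.

```python
# train_data is assumed to be an array of shape (n_steps, n_features), e.g. x and y coordinates
X_train, y_train, scaler = preprocess_data(train_data, window_size=10)
n_features = train_data.shape[1]
# The target is the full next step, so output_dim must equal n_features
model = build_transformer_model((10, n_features), output_dim=n_features)
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)
```

The code above shows how to quickly build a basic Transformer for trajectory prediction with TensorFlow/Keras. To improve performance further, one option is to add components such as TCN or LSTM units and form a more complex hybrid architecture.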