A Survey of Long Sequence Time-Series Forecasting with Deep Learning Algorithms

Abstract

Many real-world applications require the prediction of long sequence time series. Long sequence time-series forecasting (LSTF) demands a high prediction capacity from the model, that is, the ability to efficiently capture precise long-range dependency coupling between output and input. The Transformer, a simple network architecture based solely on attention mechanisms that dispenses with recurrence and convolutions entirely, has achieved superior performance in many natural language processing and computer vision tasks, which has also triggered great interest in the time series community. Among the multiple advantages of the Transformer, its ability to capture long-range dependencies and interactions is especially attractive for time series modeling, leading to exciting progress in various time series applications. In this paper, we systematically review Transformers for time series modeling, highlighting their strengths as well as their limitations, and then introduce their main variants and the stronger models built on them.

Introduction

Time series forecasting has become increasingly ubiquitous in real-world applications such as weather forecasting, energy consumption planning, and financial risk assessment. Recently, Transformers[1] have shown great power in time series forecasting thanks to their global-range modeling ability and the remarkable architectural designs contributed by the time series forecasting community[2]. However, these models still exhibit problems. As the experiments accompanying LTSF-Linear[3] show, performance drops sharply when encountering non-stationary data, and the over-stationarization problem[4] means that models trained on stationarized series tend to generate indistinguishable attention maps and fail to capture eventful temporal dependencies. Another problem is that, although Transformer-based models have made progress in this field, they usually do not make full use of three features of multivariate time series: global information, local information, and the correlations between variables.
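
To make the over-stationarization issue concrete, the following is a minimal Python (NumPy) sketch of the per-window stationarization that such models typically apply before forecasting and undo afterwards; the function names and window sizes are illustrative, not taken from any of the cited papers. Normalizing every window by its own statistics is exactly the step that can wash out the distinguishable non-stationary patterns discussed in [4].

```python
import numpy as np

def stationarize(x, eps=1e-5):
    """x: (seq_len, n_vars) lookback window.
    Normalize each variable by the window's own mean and std (stationarization)."""
    mu = x.mean(axis=0, keepdims=True)
    sigma = x.std(axis=0, keepdims=True) + eps
    return (x - mu) / sigma, (mu, sigma)

def de_stationarize(y, stats):
    """Restore the original scale on the model's prediction y: (pred_len, n_vars)."""
    mu, sigma = stats
    return y * sigma + mu

# Toy usage: normalize a non-stationary window, forecast with any model,
# then map the prediction back to the original scale.
window = np.cumsum(np.random.randn(96, 7), axis=0)   # random-walk series, 7 variables
norm_window, stats = stationarize(window)
prediction = norm_window[-24:]                       # placeholder "forecast"
restored = de_stationarize(prediction, stats)
print(restored.shape)                                # (24, 7)
```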

The innovation of the Transformer in deep learning has attracted great interest recently due to its excellent performance in natural language processing[5] (NLP), computer vision (CV), and speech processing. Over the past few years, numerous Transformer variants have been proposed to significantly advance the state of the art on various tasks. There are quite a few literature reviews covering different aspects, such as NLP applications, CV applications, and efficient Transformers. Transformers have shown great modeling ability for long-range dependencies and interactions in sequential data and are thus appealing for time series modeling. Many variants of the Transformer have been proposed to address specific challenges in time series modeling and have been successfully applied to various time series tasks. As the Transformer for time series is an emerging subject in deep learning, a systematic and comprehensive survey of time series Transformers would greatly benefit the time series community.

In this paper, we aim to fill this gap by summarizing the main developments of time series Transformers. We first describe the Transformer itself, a model architecture that eschews recurrence and instead relies entirely on an attention mechanism to draw global dependencies between input and output. We then introduce the research results and optimized models built on the Transformer, covering multi-headed self-attention and multivariate time series prediction models.

Related Works

2.1 Multi-headed Self-attention for Time Series Forecasting

Self-attention, sometimes called intra-attention, is an attention mechanism that relates different positions of a single sequence in order to compute a representation of that sequence. The Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolutions. In addition to learning attention in the time domain, recent works also explore learning attention in the frequency domain. For time series forecasting, a newly proposed method called TDformer[6] first applies seasonal-trend decomposition and then additively combines an MLP, which predicts the trend component, with Fourier attention, which predicts the seasonal component, to obtain the final prediction. Extensive experiments on benchmark time-series forecasting datasets demonstrate that TDformer achieves state-of-the-art performance against existing attention-based models. A related study is Triformer[7], which uses a triangular, variable-specific attention designed to ensure both high efficiency and high accuracy, and which is more accurate and efficient than state-of-the-art methods.
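
As a reference point for the attention mechanisms discussed in this subsection, the following is a minimal NumPy sketch of scaled dot-product self-attention over an embedded time series window, in the spirit of the canonical Transformer[1]; the random projection matrices stand in for learned weights, and multi-headed attention simply runs several such attentions in parallel on split channels. This is an illustration, not code from any of the surveyed models.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model). Every time step attends to every other time step."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (seq_len, seq_len) pairwise affinities
    weights = softmax(scores, axis=-1)        # attention distribution per query step
    return weights @ v                        # weighted combination of value vectors

seq_len, d_model = 96, 64
x = np.random.randn(seq_len, d_model)                 # embedded time series window
w_q, w_k, w_v = (0.02 * np.random.randn(d_model, d_model) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)                                      # (96, 64)
```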

2.2 Deep Models for Time Series Forecasting

Soon afterward, the Transformer emerged and showed great power in sequence modeling. To overcome the quadratic growth of computation with sequence length, subsequent works aim to reduce the complexity of self-attention. In time series forecasting in particular, Informer[8] extends self-attention with a KL-divergence criterion to select dominant queries. Reformer[9] introduces locality-sensitive hashing (LSH) to approximate attention by grouping similar queries. Seformer[10] designs a binary position encoding mechanism to solve the time-cost problem and the baseline-drift problem arising from neural ODE-style position encoding on LSTF tasks; its limitation is that the input sequence chunking of the current Seformer model is not flexible enough, and fixed-length chunking may destroy the local dependency information of some sequences. Sepformer[11] comprises Sepformer, SWformer, and Mini-SWformer, each designed to enhance prediction capacity for the LSTF problem. Apart from improvements through reduced complexity, the following models further develop delicate building blocks for time series forecasting. Autoformer[12] fuses decomposition blocks into a canonical structure and develops Auto-Correlation to discover series-wise connections. Pyraformer[13] designs a pyramid attention module (PAM) to capture temporal dependencies at different hierarchies. These various Transformer-based models have demonstrated excellent performance, reflecting the superiority of the Transformer once again.
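
Several of the models above build on seasonal-trend decomposition blocks. As a hedged illustration, the sketch below implements a simple moving-average decomposition of the kind used by Autoformer-style architectures[12]; the odd kernel size is an illustrative choice, not any paper's exact setting.

```python
import numpy as np

def series_decomp(x, kernel_size=25):
    """x: (seq_len,) series -> (seasonal, trend), both of length seq_len.
    The trend is a sliding average; the seasonal part is the residual.
    kernel_size is assumed odd so the output length matches the input."""
    pad = kernel_size // 2
    padded = np.concatenate([np.repeat(x[:1], pad), x, np.repeat(x[-1:], pad)])
    kernel = np.ones(kernel_size) / kernel_size
    trend = np.convolve(padded, kernel, mode="valid")  # same length as x
    seasonal = x - trend
    return seasonal, trend

t = np.arange(200)
series = 0.05 * t + np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(200)
seasonal, trend = series_decomp(series)
print(seasonal.shape, trend.shape)                     # (200,) (200,)
```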

2.3 Further Optimization of the Model

Based on the existing Transformer, researchers are committed to developing more efficient architectures. For multivariate time series in particular, a novel framework built on the Transformer encoder architecture has been proposed[14]. It includes an unsupervised pre-training scheme, which can offer substantial performance benefits over fully supervised learning on downstream tasks, both with and even without leveraging additional unlabeled data, i.e., simply by reusing the existing data samples; the results show that it surpasses all current state-of-the-art supervised methods. Another model is the double sampling transformer, DSformer[15], which consists of a double sampling (DS) block and a temporal variable attention (TVA) block and is meant to effectively mine the three features mentioned above and establish a high-precision prediction model. Foreformer[16] is likewise an enhanced Transformer-based framework for multivariate time series forecasting (MTSF): it achieves state-of-the-art forecasting performance compared with other deep learning methods, the validity of each essential component is quantified through a detailed ablation study, and a hyperparameter sensitivity analysis is performed to demonstrate the rationality of the experiments.

Compared with linear forecasting models, the Transformer is challenged when forecasting series with larger lookback windows due to performance degradation and computation explosion. To address this, iTransformer[17] is proposed, which simply inverts the duties of the attention mechanism and the feed-forward network. This model achieves consistent state-of-the-art results on several real-world datasets, further empowering the Transformer family with improved performance, generalization across different variates, and better utilization of arbitrary lookback windows, making it an attractive fundamental backbone for time series forecasting. From another perspective, a novel attention-based architecture, the Temporal Fusion Transformer[18] (TFT), is proposed, which combines high-performance multi-horizon forecasting with interpretable insights into temporal dynamics. To learn temporal relationships at different scales, TFT uses recurrent layers for local processing and interpretable self-attention layers for long-term dependencies; it also utilizes specialized components to select relevant features and a series of gating layers to suppress unnecessary components, enabling high performance in a wide range of scenarios. Finally, by combining the Transformer with the seasonal-trend decomposition method, in which the decomposition captures the global profile of the time series while the Transformer captures more detailed structures, and by exploiting the fact that most time series tend to have a sparse representation in a well-known basis such as the Fourier transform, the Frequency Enhanced Decomposed Transformer[19] (FEDformer) obtains a frequency-enhanced Transformer for long-term prediction. Besides being more effective, FEDformer is more efficient than the standard Transformer, with complexity linear in the sequence length.
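
To illustrate the inversion idea behind iTransformer[17], the sketch below contrasts the usual per-time-step tokenization with a per-variate tokenization; the random linear projections are placeholders for learned embeddings, and the shapes are illustrative only, not the paper's implementation.

```python
import numpy as np

lookback, n_vars, d_model = 96, 7, 64
x = np.random.randn(lookback, n_vars)          # multivariate lookback window

# Vanilla tokenization: one token per time step -> (lookback, d_model) tokens,
# so attention mixes time steps.
w_time = 0.02 * np.random.randn(n_vars, d_model)
time_tokens = x @ w_time

# Inverted tokenization: one token per variate -> (n_vars, d_model) tokens,
# so attention mixes variables, while a per-token feed-forward network
# models each series' temporal pattern.
w_var = 0.02 * np.random.randn(lookback, d_model)
variate_tokens = x.T @ w_var

print(time_tokens.shape, variate_tokens.shape)  # (96, 64) (7, 64)
```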

2.4 Other Methods for Long-term Time Series Forecasting

There are also many other methods for LSTF. For instance, the Time-Frequency Enhanced Decomposed Network[20] (TFDNet) captures both the long-term underlying patterns and the temporal periodicity from the time-frequency domain by devising a multi-scale time-frequency enhanced encoder backbone and developing two separate trend and seasonal time-frequency blocks, with learned kernel-operation strategies, to capture the distinct patterns within the decomposed trend and seasonal components at multiple resolutions. PatchMixer[21] is proposed to tackle the Transformer's problem that the permutation-invariant self-attention mechanism leads to a loss of temporal information: it introduces a permutation-variant convolutional structure to preserve temporal information. For long time series forecasting, the proposed TimesNet[22] achieves consistent state-of-the-art results across five mainstream time series analysis tasks, including short- and long-term forecasting, imputation, classification, and anomaly detection.
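
As a small illustration of the patch-based, permutation-variant processing that PatchMixer[21] builds on, the sketch below cuts a lookback window into ordered, overlapping patches; the patch length and stride are illustrative choices, not the paper's.

```python
import numpy as np

def make_patches(x, patch_len=16, stride=8):
    """x: (seq_len,) univariate series -> (n_patches, patch_len) ordered patches."""
    n_patches = (len(x) - patch_len) // stride + 1
    return np.stack([x[i * stride: i * stride + patch_len] for i in range(n_patches)])

series = np.sin(np.arange(96) * 2 * np.pi / 24)   # toy daily-periodic series
patches = make_patches(series)
print(patches.shape)                              # (11, 16): overlapping, order preserved
# A permutation-variant mixer (e.g. a 1D convolution along the patch axis) can then
# be applied to these ordered patches without discarding temporal order.
```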

Conclusion

In summary, the Transformer is the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. While employing positional encoding and using tokens to embed sub-series helps Transformers preserve some ordering information, the nature of the permutation-invariant self-attention mechanism inevitably results in temporal information loss. Although the Transformer has spawned many variants and achieved impressive results, the tests conducted with the LTSF-Linear model[3] found that its advantage is not significant.

References

  1. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, Illia Polosukhin. (2023). Attention Is All You Need. arXiv.
  2. Qingsong Wen, Tian Zhou, Chaoli Zhang, Weiqi Chen, Ziqing Ma, Junchi Yan, Liang Sun. (2023). Transformers in Time Series: A Survey. arXiv.
  3. Ailing Zeng, Muxi Chen, Lei Zhang, Qiang Xu. (2022). Are Transformers Effective for Time Series Forecasting? arXiv.
  4. Yong Liu, Haixu Wu, Jianmin Wang, Mingsheng Long. (2022). Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting. arXiv.
  5. Elov B B, Khamroeva Sh M, Xusainova Z Y. (2023). The pipeline processing of NLP. E3S Web of Conferences, 413.
  6. Xiyuan Zhang, Xiaoyong Jin, Karthick Gopalswamy, Gaurav Gupta, Youngsuk Park, Xingjian Shi, Hao Wang, Danielle C Maddix, Yuyang Wang. (2022). First De-Trend then Attend: Rethinking Attention for Time-Series Forecasting. arXiv.
  7. Razvan-Gabriel Cirstea, Chenjuan Guo, Bin Yang, Tung Kieu, Xuanyi Dong, Shirui Pan. (2022). Triformer: Triangular, Variable-Specific Attentions for Long Sequence Multivariate Time Series Forecasting-Full Version. arXiv.
  8. Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang. (2021). Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. arXiv.
  9. Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya. (2020). Reformer: The Efficient Transformer. In ICLR.
  10. Zeng, P., Hu, G., Zhou, X., et al. (2023). Seformer: a long sequence time-series forecasting model based on binary position encoding and information transfer regularization. Applied Intelligence, 53, 15747–15771.
  11. Jin Fan, Zehao Wang, Danfeng Sun, Huifeng Wu. (2023). Sepformer-based Models: More Efficient Models for Long Sequence Time-Series Forecasting. IEEE Transactions on Emerging Topics in Computing.
  12. Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long. (2021). Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. In NeurIPS.
  13. Shizhan Liu, Hang Yu, Cong Liao, Jianguo Li, Weiyao Lin, Alex X Liu, Schahram Dustdar. (2022). Pyraformer: Low-Complexity Pyramidal Attention for Long-Range Time Series Modeling and Forecasting. In ICLR.
  14. George Zerveas, Srideepika Jayaraman, Dhaval Patel, Anuradha Bhamidipaty, Carsten Eickhoff. (2020). A Transformer-based Framework for Multivariate Time Series Representation Learning. arXiv.
  15. Chengqing Yu, Fei Wang, Zezhi Shao, Tao Sun, Lin Wu, Yongjun Xu. (2023). DSformer: A Double Sampling Transformer for Multivariate Time Series Long-term Prediction. arXiv.
  16. Ye Yang, Jiangang Lu. (2022). Foreformer: an enhanced transformer-based framework for multivariate time series forecasting. In Nature, 2022.
  17. Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, Mingsheng Long. (2023). iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. arXiv.
  18. Bryan Lim, Sercan O. Arik, Nicolas Loeff, Tomas Pfister. (2021). Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting. arXiv.
  19. Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, Rong Jin. (2022). FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting. arXiv.
  20. Yuxiao Luo, Ziyu Lyu, Xingyu Huang. (2023). TFDNet: Time-Frequency Enhanced Decomposed Network for Long-term Time Series Forecasting. arXiv.
  21. Zeying Gong, Yujin Tang, Junwei Liang. (2023). PatchMixer: A Patch-Mixing Architecture for Long-Term Time Series Forecasting. arXiv.
  22. Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, Mingsheng Long. (2023). TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. arXiv.