FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting [still learning...]

思考实践

Last modified on 2022-07-08 19:28:18

First published on 2022-06-09 17:14:10

Article link: https://blog.csdn.net/weixin_43332715/article/details/125131211

First reading // The original paper is 18 pages, with the main content in 8 of them; it reads well.

1. Introduction

To further enhance the performance of Transformer for long-term prediction, we exploit the fact that most time series tend to have a sparse representation in well-known basis such as Fourier transform, and develop a frequency enhanced Transformer. Besides being more effective, the proposed method, termed as Frequency Enhanced Decomposed Transformer (FEDformer), is more efficient than standard Transformer with a linear complexity to the sequence length. // In short: most time series have a sparse representation in the Fourier basis, so the model strengthens its feature representation in that basis. Compared with the vanilla Transformer, it not only represents features better but is also cheaper to compute, with time complexity linear in the sequence length.
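To make the sparsity claim concrete, here is a small NumPy sketch (my own illustration, not from the paper): it measures what fraction of the spectral energy sits in the k largest rFFT coefficients. The function name `energy_fraction_top_k` and the toy series are assumptions chosen for the demo. A series built from two seasonal cycles puts essentially all its energy into two coefficients, while white noise spreads it across the whole spectrum.

```python
import numpy as np

def energy_fraction_top_k(x, k):
    """Fraction of spectral energy held by the k largest rFFT coefficients (illustrative helper)."""
    power = np.abs(np.fft.rfft(x)) ** 2
    top = np.sort(power)[::-1][:k]
    return top.sum() / power.sum()

L = 96
t = np.arange(L)
# Toy seasonal series: periods 24 and 12 fit exactly 4 and 8 cycles into 96 steps
seasonal = np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 12)

rng = np.random.default_rng(0)
noise = rng.standard_normal(L)

print(energy_fraction_top_k(seasonal, 2))                      # close to 1.0: sparse spectrum
print(energy_fraction_top_k(seasonal + 0.1 * noise, 2))        # still dominated by 2 modes
print(energy_fraction_top_k(noise, 2))                         # small: energy spread out
```

Real data is noisier than this toy, but the same measurement is a quick way to check whether a dataset is "sparse in the Fourier basis" before relying on a handful of modes.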

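To the best of our knowledge, our proposed method is the first work to achieve fast attention mechanism through low rank approximated transformation in frequency domain for time series forecasting. // That is, the claimed novelty is a fast attention mechanism obtained from a low-rank approximate transformation in the frequency domain, applied to time series forecasting.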

It is clear that the predicted time series shared a different distribution from that of ground truth. #Identifying the problem
The discrepancy between ground truth and prediction could be explained by the point-wise attention and prediction in Transformer. Since prediction for each timestep is made individually and independently, it is likely that the model fails to maintain the global property and statistics of time series as a whole. #Analyzing the problem
To address this problem, we exploit two ideas in this work. The first idea is to incorporate a seasonal-trend decomposition approach (Cleveland et al., 1990; Wen et al., 2019), which is widely used in time series analysis, into the Transformer-based method. Although this idea has been exploited before (Oreshkin et al., 2019; Wu et al., 2021), we present a special design of network that is effective in bringing the distribution of prediction close to that of ground truth, according to Kolmogorov-Smirnov distribution test. Our second idea is to combine Fourier analysis with the Transformer-based method. Instead of applying Transformer to the time domain, we apply it to the frequency domain which helps Transformer better capture global properties of time series. Combining both ideas, we propose a Frequency Enhanced Decomposition Transformer, or, FEDformer for short, for long-term time series forecasting. #Solving the problem
#Summary of the novelty: a new decomposition module, plus running the Transformer in the frequency domain to extract global information from the time series
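The Kolmogorov-Smirnov test mentioned above compares two empirical distributions by the largest gap between their cumulative distribution functions. A minimal NumPy sketch of the two-sample KS statistic, assuming you want to compare the values of a prediction against the ground truth (the helper name `ks_statistic` is my own; `scipy.stats.ks_2samp` returns the same statistic together with a p-value):

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)| over all sample points."""
    a, b = np.sort(np.asarray(a, dtype=float)), np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])                               # ECDFs only jump at sample points
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)     # fraction of a <= x
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)     # fraction of b <= x
    return np.max(np.abs(cdf_a - cdf_b))

truth = np.arange(100)
shifted_pred = np.arange(100) + 50   # a prediction whose distribution is shifted
print(ks_statistic(truth, truth))         # 0.0, identical distributions
print(ks_statistic(truth, shifted_pred))  # 0.5, half the mass is displaced
```

A smaller statistic (and a larger p-value from `ks_2samp`) means the predicted values are distributed more like the ground truth, which is the property the paper's decomposition design aims for.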

2. Method

Compact Representation of Time Series in Frequency Domain

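Time series data can be modeled in either the time domain or the frequency domain. Unlike other methods, this paper applies its neural network operations in the frequency domain. Fourier analysis is the standard tool for working in the frequency domain, and the key question is how to represent a time series appropriately with Fourier components.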

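Keeping all frequency components gives a poor representation, because many high-frequency variations in a time series are noise. Keeping only the low frequencies is not suitable either, because some trend changes correspond to important events. The paper keeps a compact representation of the time series with a small number of selected Fourier components, which makes the Transformer computation efficient: it represents the series by randomly selecting a constant number of Fourier components, covering both high and low frequencies. The paper then argues theoretically why random selection is reasonable, and the experiments support this.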

Here is how I understand this part: $X_1(t), \dots, X_m(t)$ are like $m$ variables, each of which is a sequence over time $t$.

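Applying the Fourier transform to the time series of each variable gives a vector $a_i = (a_{i,1}, \dots, a_{i,d})^T \in R^{d}$, where each entry is one frequency component. Stacking all of these Fourier vectors gives the matrix $A = (a_1, a_2, \dots, a_m)^T \in R^{m\times d}$: each row corresponds to the time series of one variable (following the analogy above), and each column to one Fourier component (a complex number). Using all Fourier components preserves as much of the history in the time series as possible, but it overfits the historical data and leads to poor forecasts. We therefore need a subset of the Fourier components that is small enough to avoid overfitting, yet preserves as much historical information as possible. The paper proposes to select $s$ components uniformly at random from the $d$ Fourier components. Specifically, let $i_1 < i_2 < \dots < i_s$ denote the randomly selected components, and construct the matrix $S \in \{0,1\}^{s\times d}$ with $S_{k,i} = 1$ if $i = i_k$ and $S_{k,i} = 0$ otherwise.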

The representation of our multivariate time series then becomes $A' = AS^{T} \in R^{m\times s}$.
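A minimal NumPy sketch of this selection, assuming the real FFT (`rfft`) along the time axis so that a length-$L$ series has $d = L/2 + 1$ components. The helper names `select_modes` and `selection_matrix` are my own, not from the paper or the official repo. The sketch builds $S$ exactly as defined above and checks that $AS^T$ equals simply taking the selected columns of $A$.

```python
import numpy as np

def select_modes(d, s, rng):
    """Uniformly sample s distinct component indices out of d, sorted so i_1 < ... < i_s."""
    return np.sort(rng.choice(d, size=s, replace=False))

def selection_matrix(idx, d):
    """S in {0,1}^{s x d} with S[k, i] = 1 iff i == idx[k]."""
    S = np.zeros((len(idx), d))
    S[np.arange(len(idx)), idx] = 1.0
    return S

rng = np.random.default_rng(0)
m, L, s = 3, 96, 8                   # m variables, series length L, keep s components
X = rng.standard_normal((m, L))      # toy multivariate series, one row per variable
A = np.fft.rfft(X, axis=-1)          # A: (m, d) complex Fourier coefficients, d = L // 2 + 1
d = A.shape[1]

idx = select_modes(d, s, rng)
S = selection_matrix(idx, d)
A_prime = A @ S.T                    # A' = A S^T, shape (m, s)

print(idx, A_prime.shape)
```

In practice nobody materializes $S$; indexing `A[:, idx]` gives the same $A'$ in $O(ms)$ instead of a matrix product, which is part of why keeping a constant number $s$ of modes makes the cost linear in the sequence length.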

Figure 1. Overall structure of FEDformer

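1 Model architecture: FEDformer consists of N encoder layers and M decoder layers. The overall progressive-decomposition architecture is clearly built on Autoformer, with Autoformer's series-level attention (Auto-Correlation) replaced by frequency-domain attention.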
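The progressive-decomposition idea of an encoder layer can be sketched as: attention block plus residual, decompose and keep only the seasonal part; feed-forward plus residual, decompose again and keep the seasonal part. This is a simplified NumPy sketch under stated assumptions: `block` and `ffn` are hypothetical callables standing in for FEB and the feed-forward network, and a single replicate-padded moving average (`series_decomp`, my own helper, Autoformer-style) stands in for MOEDecomp.

```python
import numpy as np

def series_decomp(x, k=25):
    """Stand-in for MOEDecomp: trend = moving average over time (replicate padding), seasonal = x - trend.
    x: (L, D)."""
    pad_f, pad_b = (k - 1) // 2, k - 1 - (k - 1) // 2
    xp = np.concatenate([np.repeat(x[:1], pad_f, axis=0), x, np.repeat(x[-1:], pad_b, axis=0)])
    kernel = np.ones(k) / k
    trend = np.stack([np.convolve(xp[:, j], kernel, mode="valid") for j in range(x.shape[1])], axis=1)
    return x - trend, trend

def encoder_layer(x, block, ffn, decomp=series_decomp):
    """One encoder layer: the trend parts are discarded, only the seasonal part is passed on."""
    s, _ = decomp(block(x) + x)   # block: hypothetical stand-in for FEB (frequency-domain self-attention)
    s, _ = decomp(ffn(s) + s)     # ffn: hypothetical stand-in for the feed-forward network
    return s

rng = np.random.default_rng(0)
x = rng.standard_normal((96, 8))          # (L, D) embedded input
identity_block = lambda z: z              # trivial stand-ins just to exercise the data flow
out = encoder_layer(x, identity_block, identity_block)
print(out.shape)                          # (96, 8)
```

The decoder follows the same pattern but also accumulates the extracted trend parts; that branch is omitted here.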

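2.1 Attention improvements (Fourier version): on top of the decomposition blocks, FEDformer adds the Frequency Enhanced Block (FEB, green, playing the role of self-attention) and the Frequency Enhanced Attention (FEA, red, playing the role of cross-attention) to perform representation learning in the frequency domain. FEB and FEA each come in two versions (-f with a Fourier basis, -w with a wavelet basis). The Mixture Of Experts Decomposition block (MOEDecomp, yellow) extracts the seasonal and trend parts of the input. Below is the first version, implemented with the Fourier transform: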
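MOEDecomp replaces a single moving-average trend filter with several average filters of different window sizes and mixes their outputs with softmax weights. A minimal NumPy sketch, assuming a 1-D series; the kernel sizes and the `logits` array are illustrative assumptions: in the model the mixing weights are data-dependent and produced by a learned layer, whereas here they are passed in directly.

```python
import numpy as np

def moving_avg(x, k):
    """Moving average with replicate padding so the output keeps length len(x)."""
    pad_f, pad_b = (k - 1) // 2, k - 1 - (k - 1) // 2
    xp = np.concatenate([np.full(pad_f, x[0]), x, np.full(pad_b, x[-1])])
    return np.convolve(xp, np.ones(k) / k, mode="valid")

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_decomp(x, kernel_sizes, logits):
    """Mixture-of-experts decomposition sketch.
    kernel_sizes: one average filter (expert) per window size.
    logits: (n_experts, L) mixing scores; hypothetical here, learned from x in the model."""
    trends = np.stack([moving_avg(x, k) for k in kernel_sizes])   # (n_experts, L)
    weights = softmax(logits, axis=0)                             # per-time-step weights over experts
    trend = (weights * trends).sum(axis=0)
    return x - trend, trend                                       # (seasonal, trend)

t = np.arange(96)
x = np.sin(2 * np.pi * t / 24) + 0.05 * t                         # seasonal cycle + linear trend
seasonal, trend = moe_decomp(x, kernel_sizes=(7, 13, 25), logits=np.zeros((3, 96)))
print(seasonal.shape, trend.shape)
```

With all-zero logits the experts are averaged uniformly; learning the logits lets the block pick a longer or shorter window at each time step.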

Figure 2. Frequency Enhanced Block (FEB)
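The FEB-f pipeline in the figure is: linear projection, FFT, keep the selected modes, multiply each kept mode by a learnable complex kernel, zero-pad the rest, inverse FFT. A single-sample NumPy sketch under my own assumptions: `W` and `R` are random or hand-set here rather than learned, `R` holds one $D \times D$ complex matrix per selected mode, and the function name `feb_f` is hypothetical (it is not the official repo's class).

```python
import numpy as np

def feb_f(x, W, R, modes):
    """FEB-f sketch for one sample.
    x: (L, D) input; W: (D, D) projection; R: (s, D, D) complex kernel, one D x D map per mode;
    modes: (s,) selected indices into the rFFT spectrum of length L // 2 + 1."""
    q = x @ W                                      # linear projection, (L, D)
    Q = np.fft.rfft(q, axis=0)                     # to the frequency domain, (L // 2 + 1, D)
    Y_sel = np.einsum("sd,sde->se", Q[modes], R)   # mode-wise complex linear map, (s, D)
    Y = np.zeros_like(Q)                           # unselected modes are zero-padded
    Y[modes] = Y_sel
    return np.fft.irfft(Y, n=x.shape[0], axis=0)   # back to the time domain, (L, D)

rng = np.random.default_rng(0)
L, D, s = 16, 4, 3
x = rng.standard_normal((L, D))
W = rng.standard_normal((D, D))
modes = np.sort(rng.choice(L // 2 + 1, size=s, replace=False))
R = rng.standard_normal((s, D, D)) + 1j * rng.standard_normal((s, D, D))
out = feb_f(x, W, R, modes)
print(out.shape)   # (16, 4)
```

Because only `s` modes are multiplied by the kernel, the cost of the frequency-domain step does not grow with `L` beyond the FFTs themselves.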

Figure 3. Frequency Enhanced Attention (FEA)
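FEA-f follows the same pattern for cross-attention: queries come from the decoder, keys and values from the encoder; all three are moved to the frequency domain, a subset of modes is kept, an attention score $\sigma(\tilde{Q}\tilde{K}^T)$ is computed between the kept modes, and the result is zero-padded and transformed back. A single-head NumPy sketch under my own assumptions: the name `fea_f` is hypothetical, `tanh` is used as the activation (softmax is the other option), and a plain (non-conjugate) transpose is used for $\tilde{K}^T$.

```python
import numpy as np

def fea_f(q, k, v, modes_q, modes_kv, activation=np.tanh):
    """FEA-f sketch for one sample and one head.
    q: (L_q, D) decoder-side queries; k, v: (L_kv, D) encoder-side keys/values;
    modes_q / modes_kv: selected rFFT mode indices for queries and keys/values."""
    L_q, D = q.shape
    Qs = np.fft.rfft(q, axis=0)[modes_q]     # (s_q, D)
    Ks = np.fft.rfft(k, axis=0)[modes_kv]    # (s_kv, D)
    Vs = np.fft.rfft(v, axis=0)[modes_kv]    # (s_kv, D)
    scores = Qs @ Ks.T                        # (s_q, s_kv) complex scores between kept modes
    Y_sel = activation(scores) @ Vs           # attention over the kept modes, (s_q, D)
    Y = np.zeros((L_q // 2 + 1, D), dtype=complex)
    Y[modes_q] = Y_sel                        # zero-pad back to the full query spectrum
    return np.fft.irfft(Y, n=L_q, axis=0)     # back to the time domain, (L_q, D)

rng = np.random.default_rng(0)
q = 0.1 * rng.standard_normal((24, 4))        # decoder length 24
k = 0.1 * rng.standard_normal((48, 4))        # encoder length 48
v = rng.standard_normal((48, 4))
out = fea_f(q, k, v, modes_q=np.array([0, 2, 5]), modes_kv=np.array([1, 3, 7, 10]))
print(out.shape)   # (24, 4)
```

The attention matrix is only `s_q x s_kv`, independent of the sequence lengths, which is where the efficiency gain over time-domain cross-attention comes from.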

2.2 Attention improvements (wavelet version):

Figure 5. Top Left: Wavelet frequency enhanced block decomposition stage. Top Right: Wavelet block reconstruction stage shared by FEB-w and FEA-w. Bottom: Wavelet frequency enhanced cross attention decomposition stage.

Official code

https://github.com/MAZiqing/FEDformer

@inproceedings{zhou2022fedformer,
  title={{FEDformer}: Frequency enhanced decomposed transformer for long-term series forecasting},
  author={Zhou, Tian and Ma, Ziqing and Wen, Qingsong and Wang, Xue and Sun, Liang and Jin, Rong},
  booktitle={Proc. 39th International Conference on Machine Learning (ICML 2022)},
  location = {Baltimore, Maryland},
  pages={},
  year={2022}
}

This appears to be work from Alibaba DAMO Academy.