Paper notes: Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

Latest recommended article published on 2025-03-01 16:14:24

UQI-LIUWJ

Article link: https://blog.csdn.net/qq_40206371/article/details/129148987

NeurIPS 2021

1 Intro

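For long-term time series forecasting, many models adopt self-attention and achieve good results.
- However, forecasting in the long-term setting remains challenging:
  - It is hard to discover accurate, reliable temporal dependencies directly from a long series, because different temporal patterns (trend, seasonality, etc.) can be entangled.
  - A canonical Transformer with self-attention is too expensive for long series: the cost is $O(L^2)$ in the series length $L$.
    - Transformer-based models therefore resort to sparse attention to cope with the quadratic complexity (e.g. LogTrans, Informer).
    - But the sparse point-wise connections create an information-utilization bottleneck.
    - This in turn is a bottleneck for long-term forecasting.
This paper proposes Autoformer:
- It breaks with the convention of using series decomposition only as preprocessing and proposes a deep decomposition architecture that can extract more predictable components from complex temporal patterns.
- It proposes the Auto-Correlation mechanism to replace point-wise attention, achieving series-wise connections with $O(L \log L)$ complexity and breaking the information-utilization bottleneck.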

2 Autoformer

2.1 Deep Decomposition Architecture

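During forecasting, the model progressively separates the trend and seasonal components from the hidden variables. This progressive decomposition lets decomposition and forecast refinement alternate and reinforce each other.

An input series is decomposed into a trend part and a seasonal part as follows:
- $\begin{aligned} & \mathcal{X}_{\mathrm{t}}=\operatorname{AvgPool}(\operatorname{Padding}(\mathcal{X})) \\ & \mathcal{X}_{\mathrm{s}}=\mathcal{X}-\mathcal{X}_{\mathrm{t}} \end{aligned}$
- AvgPool is a moving average.
- Padding keeps the series length unchanged after the moving average.
The rest of the paper writes $\mathcal{X}_{\mathrm{s}}, \mathcal{X}_{\mathrm{t}}=\operatorname{SeriesDecomp}(\mathcal{X})$ as shorthand for the two equations above.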
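
The decomposition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `series_decomp`, the kernel size of 25, and padding by repeating the first and last values are assumptions made here (the notes only say that padding keeps the length unchanged).

```python
import numpy as np

def series_decomp(x: np.ndarray, kernel_size: int = 25):
    """Split a (length, channels) series into (seasonal, trend) parts."""
    # Assumed padding scheme: repeat the edge values so that the moving
    # average returns exactly as many time steps as the input has.
    front = np.repeat(x[:1], (kernel_size - 1) // 2, axis=0)
    back = np.repeat(x[-1:], kernel_size // 2, axis=0)
    padded = np.concatenate([front, x, back], axis=0)
    # Moving average along time via a cumulative sum.
    zero = np.zeros_like(padded[:1])
    csum = np.cumsum(np.concatenate([zero, padded], axis=0), axis=0)
    trend = (csum[kernel_size:] - csum[:-kernel_size]) / kernel_size
    seasonal = x - trend
    return seasonal, trend

# Example: a linear trend plus a daily cycle sampled hourly.
t = np.arange(96, dtype=float)
x = (0.1 * t + np.sin(2 * np.pi * t / 24))[:, None]
seasonal, trend = series_decomp(x, kernel_size=25)
```

By construction `seasonal + trend` reproduces `x`, and both parts have the same shape as the input.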

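2.2 Model Input

Encoder input: $\mathcal{X}_{\mathrm{en}} \in \mathbb{R}^{I \times d}$ (the past $I$ time steps)
Decoder input: a seasonal part $\mathcal{X}_{\mathrm{des}}$ and a trend part $\mathcal{X}_{\mathrm{det}}$, both in $\mathbb{R}^{\left(\frac{I}{2}+O\right) \times d}$:
- $\begin{aligned} & \mathcal{X}_{\mathrm{ens}}, \mathcal{X}_{\mathrm{ent}}=\operatorname{SeriesDecomp}\left(\mathcal{X}_{\mathrm{en}\, \frac{I}{2}: I}\right) \\ & \mathcal{X}_{\mathrm{des}}=\operatorname{Concat}\left(\mathcal{X}_{\mathrm{ens}}, \mathcal{X}_0\right) \\ & \mathcal{X}_{\mathrm{det}}=\operatorname{Concat}\left(\mathcal{X}_{\mathrm{ent}}, \mathcal{X}_{\mathrm{Mean}}\right) \end{aligned}$
- where:
  - $\mathcal{X}_0 \in \mathbb{R}^{O \times d}$ is a placeholder of $O$ zeros ($O$ is the forecast length)
  - $\mathcal{X}_{\mathrm{Mean}} \in \mathbb{R}^{O \times d}$ is a placeholder of length $O$ filled with the mean of $\mathcal{X}_{\mathrm{en}}$
  - $\mathcal{X}_{\mathrm{en}\, \frac{I}{2}: I}$ is the latter half of the encoder input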
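
A minimal NumPy sketch of how the two decoder inputs could be assembled. The names `series_decomp` and `decoder_inputs`, the kernel size, and the edge-repeating padding are assumptions for illustration. The mean placeholder here is taken over the whole encoder input, which is how the paper words it; swap in `x_en[I // 2:]` to average over the latter half only.

```python
import numpy as np

def series_decomp(x, kernel_size=25):
    """Moving-average decomposition; edge-repeat padding is an assumption."""
    front = np.repeat(x[:1], (kernel_size - 1) // 2, axis=0)
    back = np.repeat(x[-1:], kernel_size // 2, axis=0)
    padded = np.concatenate([front, x, back], axis=0)
    trend = np.stack([padded[i:i + kernel_size].mean(axis=0)
                      for i in range(len(x))])
    return x - trend, trend

def decoder_inputs(x_en, pred_len, kernel_size=25):
    """Build the seasonal and trend decoder inputs from x_en of shape (I, d)."""
    I, d = x_en.shape
    recent = x_en[I // 2:]                        # latter half of the encoder input
    seasonal, trend = series_decomp(recent, kernel_size)
    x_0 = np.zeros((pred_len, d))                 # zero placeholder for the seasonal part
    # Mean placeholder for the trend part (mean over the full encoder input here).
    x_mean = np.repeat(x_en.mean(axis=0, keepdims=True), pred_len, axis=0)
    x_des = np.concatenate([seasonal, x_0], axis=0)
    x_det = np.concatenate([trend, x_mean], axis=0)
    return x_des, x_det

rng = np.random.default_rng(0)
x_en = rng.normal(size=(96, 3))                   # I = 96, d = 3
x_des, x_det = decoder_inputs(x_en, pred_len=48)  # O = 48
```

Both outputs have `I // 2 + pred_len` rows, so the decoder sees the recent history followed by placeholders for the steps to be forecast.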

2.3 Encoder

The encoder progressively eliminates the trend part (the trend is instead obtained in the decoder by accumulation).

2.4 Decoder

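The decoder predicts the trend part and the seasonal part separately.

For the seasonal part, it uses the Auto-Correlation mechanism, which discovers dependencies based on the periodicity of the series and aggregates sub-series that follow similar processes.
For the trend part, it uses accumulation, progressively extracting trend information from the predicted hidden variables.

Note: in every decoder block, the second Auto-Correlation module always attends to the output of the last encoder layer.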

2.5 Auto-Correlation

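Auto-Correlation aggregates similar sub-processes from different periods.

For a time series $\{\mathcal{X}_t\}$, the auto-correlation is defined as:
- $\mathcal{R}_{\mathcal{X} \mathcal{X}}(\tau)=\lim _{L \rightarrow \infty} \frac{1}{L} \sum_{t=1}^L \mathcal{X}_t \mathcal{X}_{t-\tau}$
- It measures the similarity between $\mathcal{X}_t$ and its copy delayed by $\tau$, $\mathcal{X}_{t-\tau}$.
Select the $k$ most likely delays $\tau$ (those with the largest auto-correlation), where $k=\lfloor c \times \log L\rfloor$ and $c$ is a hyperparameter:
- $\tau_1, \cdots, \tau_k=\underset{\tau \in\{1, \cdots, L\}}{\arg \operatorname{Topk}}\left(\mathcal{R}_{\mathcal{Q}, \mathcal{K}}(\tau)\right)$
- These delays act as the estimated period lengths.
Apply a softmax to the auto-correlation values of the selected delays:
- $\widehat{\mathcal{R}}_{\mathcal{Q}, \mathcal{K}}\left(\tau_1\right), \cdots, \widehat{\mathcal{R}}_{\mathcal{Q}, \mathcal{K}}\left(\tau_k\right)=\operatorname{SoftMax}\left(\mathcal{R}_{\mathcal{Q}, \mathcal{K}}\left(\tau_1\right), \cdots, \mathcal{R}_{\mathcal{Q}, \mathcal{K}}\left(\tau_k\right)\right)$

$\operatorname{Roll}(\mathcal{V}, \tau)$ shifts the series by the delay $\tau$: the elements shifted out past the first position are appended at the end of the series.
Multiply each rolled series by its softmax weight and sum the results:
- $\text { Auto-Correlation }(\mathcal{Q}, \mathcal{K}, \mathcal{V})=\sum_{i=1}^k \operatorname{Roll}\left(\mathcal{V}, \tau_i\right) \widehat{\mathcal{R}}_{\mathcal{Q}, \mathcal{K}}\left(\tau_i\right)$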
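
The three steps (top-k delay selection, softmax, weighted sum of rolled series) can be sketched as follows. This is a simplified single-head illustration under stated assumptions: the function name `auto_correlation` and the default `c = 1.0` are made up here, the correlation is computed directly in $O(L^2)$ with circular shifts and averaged over channels, and delays are indexed $0, \dots, L-1$ (delay $L$ coincides with delay $0$ under circular shifting).

```python
import math
import numpy as np

def auto_correlation(q, k, v, c=1.0):
    """q, k, v: arrays of shape (L, d). Returns (output, delays, weights)."""
    L = q.shape[0]
    # R(tau) = mean over t and channels of q[t] * k[t - tau] (circular shift).
    r = np.array([np.mean(q * np.roll(k, tau, axis=0)) for tau in range(L)])
    # Keep the k = floor(c * log L) delays with the largest correlation.
    top_k = max(1, int(c * math.log(L)))
    delays = np.argsort(r)[-top_k:]
    # Softmax over the selected correlation values only.
    weights = np.exp(r[delays] - r[delays].max())
    weights /= weights.sum()
    # Roll(V, tau): shift left by tau; the first tau steps wrap to the end.
    out = sum(w * np.roll(v, -int(tau), axis=0) for tau, w in zip(delays, weights))
    return out, delays, weights

# Example: a pure sinusoid with period 24 over 96 steps.
t = np.arange(96)
v = np.sin(2 * np.pi * t / 24)[:, None]
out, delays, weights = auto_correlation(v, v, v)
```

For this periodic input the selected delays are multiples of the period, so every rolled copy lines up with the original series and the aggregate reproduces it.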

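2.6 Efficient Computation

By the Wiener–Khinchin theorem, the auto-correlation can be obtained from the Fourier transform of the series, so the values for all delays can be computed at once with the FFT in $O(L \log L)$:

$\begin{aligned} & \mathcal{S}_{\mathcal{X} \mathcal{X}}(f)=\mathcal{F}\left(\mathcal{X}_t\right) \mathcal{F}^*\left(\mathcal{X}_t\right)=\int_{-\infty}^{\infty} \mathcal{X}_t e^{-i 2 \pi t f} \mathrm{~d} t \, \overline{\int_{-\infty}^{\infty} \mathcal{X}_t e^{-i 2 \pi t f} \mathrm{~d} t} \\ & \mathcal{R}_{\mathcal{X} \mathcal{X}}(\tau)=\mathcal{F}^{-1}\left(\mathcal{S}_{\mathcal{X} \mathcal{X}}(f)\right)=\int_{-\infty}^{\infty} \mathcal{S}_{\mathcal{X} \mathcal{X}}(f) e^{i 2 \pi f \tau} \mathrm{d} f \end{aligned}$

Here $\mathcal{F}$ is the Fourier transform, $\mathcal{F}^{-1}$ its inverse, and $\mathcal{F}^*$ (the overline) the complex conjugate.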
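
As a quick check of this identity on a finite series, the sketch below compares a direct $O(L^2)$ computation with the FFT route. The function names are illustrative only, and for a finite-length series the FFT yields the circular (wrap-around) correlation, which is the assumption made in both functions.

```python
import numpy as np

def autocorr_direct(q, k):
    """R(tau) = (1/L) * sum_t q[t] * k[t - tau], circular, O(L^2)."""
    L = len(q)
    return np.array([np.sum(q * np.roll(k, tau)) for tau in range(L)]) / L

def autocorr_fft(q, k):
    """Same quantity via FFT: inverse transform of F(q) * conj(F(k)), O(L log L)."""
    L = len(q)
    spectrum = np.fft.rfft(q) * np.conj(np.fft.rfft(k))
    return np.fft.irfft(spectrum, n=L) / L

rng = np.random.default_rng(0)
q = rng.normal(size=64)
k = rng.normal(size=64)
r_direct = autocorr_direct(q, k)
r_fft = autocorr_fft(q, k)
```

The two results agree up to floating-point error, while the FFT version avoids the explicit loop over all delays.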