Huang S, Wang D, Wu X, et al. DSANet: Dual Self-Attention Network for Multivariate Time Series Forecasting[C]//Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019: 2129-2132.
Paper link:
https://doi.org/10.1145/3357384.3358132
Code link (PyTorch):
https://github.com/bighuang624/DSANet
Motivation
- Traditional methods fail to capture complicated nonlinear dependencies between time steps and between multiple time series.
- Recurrent neural networks and attention mechanisms have been used to model periodic temporal patterns across multiple time steps. However, these models do not fit well to time series with dynamic-period or nonperiodic patterns.
Dual Self-Attention Network (DSANet):
Highly efficient multivariate time series forecasting, especially for dynamic-period or nonperiodic series.
Model
- Global Temporal Convolution:
Extracts time-invariant patterns across all time steps of each univariate series.
- Local Temporal Convolution:
Time steps with a shorter relative distance have a larger impact on each other.
Focuses on modeling such local temporal patterns.
- Self-Attention Module:
Exploits the strong feature-extraction capability of self-attention networks.
Captures the dependencies between different series.
Scaled dot-product self-attention (as in the Transformer):
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
Position-wise feed-forward:
$$\mathrm{FFN}(x) = \max(0,\ xW_1 + b_1)W_2 + b_2$$
- Autoregressive Component:
Because both the convolutional and self-attention components are nonlinear, the scale of the network output is not sensitive to the scale of the input.
The classical AR model is therefore added as a linear component.
- Generation of Prediction:
A dense layer first combines the outputs of the two self-attention modules;
the final prediction is then obtained by summing this self-attention-based prediction and the AR prediction (a PyTorch sketch of the full forward pass follows this list).
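To make the data flow concrete, here is a minimal PyTorch sketch of the architecture described above. It is a simplification under assumptions, not the authors' implementation: single-head, single-layer self-attention and a single output step, and the names (`DSANetSketch`, `SelfAttentionBlock`) and sizes (`n_kernels=32`, `local_kernel=3`, `d_ff=128`) are illustrative choices; see the linked repository for the real code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttentionBlock(nn.Module):
    """Scaled dot-product self-attention across series + position-wise FFN."""
    def __init__(self, d_model, d_ff=128, dropout=0.1):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                        # x: (batch, n_series, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        attn = torch.softmax(scores, dim=-1) @ v
        x = self.norm1(x + self.drop(attn))      # residual + layer norm
        return self.norm2(x + self.drop(self.ffn(x)))


class DSANetSketch(nn.Module):
    """Predicts x_{t+h} for all series; the lead time h only affects
    how input/target windows are built, not the architecture."""
    def __init__(self, n_series, window, n_kernels=32, local_kernel=3):
        super().__init__()
        # Global branch: kernel spans the whole window -> one value per filter per series.
        self.global_conv = nn.Conv2d(1, n_kernels, (window, 1))
        # Local branch: short kernel, then max-pool over the time axis.
        self.local_conv = nn.Conv2d(1, n_kernels, (local_kernel, 1))
        self.attn_g = SelfAttentionBlock(n_kernels)
        self.attn_l = SelfAttentionBlock(n_kernels)
        self.dense = nn.Linear(2 * n_kernels, 1)  # combine both attention outputs
        self.ar = nn.Linear(window, 1)            # linear AR component

    def forward(self, x):                         # x: (batch, window, n_series)
        inp = x.unsqueeze(1)                                 # (b, 1, T, D)
        g = F.relu(self.global_conv(inp)).squeeze(2)         # (b, K, D)
        l = F.relu(self.local_conv(inp)).max(dim=2).values   # (b, K, D)
        g = self.attn_g(g.transpose(1, 2))                   # (b, D, K)
        l = self.attn_l(l.transpose(1, 2))                   # (b, D, K)
        nonlinear = self.dense(torch.cat([g, l], dim=-1))    # (b, D, 1)
        linear = self.ar(x.transpose(1, 2))                  # (b, D, 1)
        return (nonlinear + linear).squeeze(-1)              # (b, D)
```

A tensor of shape (batch, window, n_series) goes in; a (batch, n_series) prediction at lead time h comes out, with the nonlinear dual-branch output and the linear AR output summed at the end.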
Experiment
Dataset: daily revenue of five gas stations of a gas station service company, from 2015/12/1 to 2018/12/1.
The stations are geographically close, so a complex mix of mutual promotion and competition exists among their revenues.
Split: training (60%), validation (20%), and test (20%).
Optimization: mini-batch stochastic gradient descent (SGD) with the Adam optimizer; the loss function is MSE.
Dropout rate: 0.1.
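A minimal training-loop sketch matching the stated setup (mini-batch gradient descent via Adam, MSE loss; dropout 0.1 lives inside the model). It reuses the hypothetical `DSANetSketch` from above; the batch size, learning rate, epoch count, window size, and random stand-in data are all assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = DSANetSketch(n_series=5, window=32)           # sizes are assumptions
opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # lr is an assumption
loss_fn = torch.nn.MSELoss()

# Random stand-in for windowed (input, target-at-horizon-h) pairs.
X = torch.randn(512, 32, 5)
Y = torch.randn(512, 5)
loader = DataLoader(TensorDataset(X, Y), batch_size=64, shuffle=True)

for epoch in range(10):
    for xb, yb in loader:
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()
```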
Metrics: root relative squared error (RRSE), mean absolute error (MAE), and empirical correlation coefficient (CORR).
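Since RRSE and CORR are less standard than MAE, here is a NumPy sketch of all three, assuming `y_true` and `y_pred` are arrays of shape (n_samples, n_series); the exact reduction follows the usual LSTNet-style definitions (an assumption about the paper's precise computation).

```python
import numpy as np

def rrse(y_true, y_pred):
    # Root relative squared error: squared error normalized by the deviation
    # of the ground truth from its mean (a scale-free variant of RMSE).
    num = np.sum((y_pred - y_true) ** 2)
    den = np.sum((y_true - y_true.mean()) ** 2)
    return np.sqrt(num / den)

def mae(y_true, y_pred):
    return np.abs(y_pred - y_true).mean()

def corr(y_true, y_pred):
    # Empirical correlation coefficient: Pearson correlation per series
    # (across samples), averaged over series.
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0)) + 1e-12
    return (num / den).mean()
```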
(1) The best result on each window-horizon pair is obtained by the complete DSANet, showing that every component contributes to the effectiveness and robustness of the whole model;
(2) The performance of DSAwoAR drops significantly, showing that the AR component plays a crucial role; the reason is that AR is generally robust to scale changes in the data, as noted in [10];
(3) DSAwoGlobal and DSAwoLocal also suffer a performance loss, but a smaller one than removing the AR component. This is because the features learned by the two convolutional branches overlap: when one branch is removed, some of the lost features can be recovered from the other branch.
A detailed walkthrough of the PyTorch implementation (in Chinese) is available at: https://blog.csdn.net/itnerd/article/details/106266829