State-of-the-Art ST-GNN Models

Diffusion Convolutional Recurrent Neural Network (DCRNN)

[1] Li, Yaguang, et al. “Diffusion convolutional recurrent neural network: Data-driven traffic forecasting.” arXiv preprint arXiv:1707.01926 (2017).

provided code: https://github.com/liyaguang/DCRNN

Spatial Dependency Modeling

[Figure: spatial dependency modeling (diffusion convolution)]

  • $W\in R^{N\times N}$: weighted adjacency matrix representing node proximity

[Figure]
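
DCRNN models spatial dependency as a bidirectional diffusion process with forward transition matrix $D_O^{-1}W$ and backward transition matrix $D_I^{-1}W^T$. Below is a minimal PyTorch sketch of this diffusion convolution; the scalar per-step weights `theta_fwd`/`theta_bwd` are a simplification (the model learns them per channel pair):

```python
import torch

def diffusion_conv(X, W, theta_fwd, theta_bwd):
    """Bidirectional diffusion convolution over a directed graph (a sketch).
    X: [N, C] node signals; W: [N, N] weighted adjacency matrix.
    theta_fwd / theta_bwd: K scalar weights per diffusion direction
    (a simplification; the paper learns weights per channel pair)."""
    # forward transition D_O^{-1} W (rows normalized by out-degree)
    P_fwd = W / W.sum(dim=1, keepdim=True).clamp(min=1e-8)
    # backward transition D_I^{-1} W^T (rows normalized by in-degree)
    P_bwd = W.T / W.T.sum(dim=1, keepdim=True).clamp(min=1e-8)
    out = torch.zeros_like(X)
    h_f = h_b = X
    for tf, tb in zip(theta_fwd, theta_bwd):
        out = out + tf * h_f + tb * h_b
        h_f, h_b = P_fwd @ h_f, P_bwd @ h_b  # one more diffusion step
    return out
```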

Temporal Dynamic Modeling: Diffusion Convolutional Gated Recurrent Unit

[Figure: diffusion convolutional gated recurrent unit (DCGRU) equations]

Comparison with the classical GRU, where $r_t$ denotes the reset gate and $z_t$ the update gate.
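
One common form of the classical GRU recurrences is shown below; in the DCGRU, the matrix multiplications in these equations are replaced by the diffusion convolution:

$$
\begin{aligned}
r_t &= \sigma(W_r[x_t, h_{t-1}] + b_r)\\
z_t &= \sigma(W_z[x_t, h_{t-1}] + b_z)\\
\tilde{h}_t &= \tanh(W_h[x_t, r_t \odot h_{t-1}] + b_h)\\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
$$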

System architecture of DCRNN

  • The historical time series are fed into an encoder whose final states are used to initialize the decoder.

  • To mitigate the discrepancy between the input distributions at training and testing time, scheduled sampling (reference blog: http://www.manongjc.com/article/92160.html) is integrated into the model.

    • Scheduled sampling:
      • At the $i$-th iteration, the model is fed either the ground-truth observation with probability $\epsilon_i$ or the model's own prediction with probability $1-\epsilon_i$. During training, $\epsilon_i$ gradually decreases to 0 so that the model learns the testing distribution.
      • The decoder makes predictions based on either the previous ground truth or the model output (a sketch follows this list).
  • Defects: computational complexity and gradient explosion.
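
A minimal sketch of scheduled sampling in the decoder loop; the inverse-sigmoid decay schedule and the constant `k` are illustrative assumptions:

```python
import math
import random

def eps(i, k=2000.0):
    """Probability of feeding the ground truth at training iteration i;
    decays from ~1 toward 0 (inverse sigmoid decay, k controls the speed)."""
    return k / (k + math.exp(i / k))

def decoder_step(cell, state, prev_truth, prev_pred, iteration):
    """Feed the ground truth with probability eps_i, otherwise feed the
    model's own previous prediction."""
    inp = prev_truth if random.random() < eps(iteration) else prev_pred
    return cell(inp, state)
```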

Spatio-Temporal Graph Convolutional Networks (STGCN)

[2] Yu, Bing, Haoteng Yin, and Zhanxing Zhu. “Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting.” arXiv preprint arXiv:1709.04875 (2017).

provided code: https://github.com/VeritasYin/STGCN_IJCAI-18

Graph CNNs for Extracting Spatial Features

[Figure: graph CNN for extracting spatial features]

  • Employs a spectral-based graph CNN that operates on graph-structured data directly to extract highly meaningful patterns and features in the spatial domain.

  • Considers two types of approximation to reduce computational complexity (a sketch follows this list):

    • Chebyshev polynomial approximation
    • $1^{st}$-order approximation
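
A minimal sketch of the Chebyshev-polynomial graph convolution via the standard recurrence (shapes and the rescaled Laplacian `L_tilde` are assumptions); the $1^{st}$-order approximation corresponds to truncating the expansion at $K=2$ with further simplifications:

```python
import torch

def cheb_graph_conv(x, L_tilde, theta):
    """Chebyshev-polynomial graph convolution (a sketch).
    x: [N, C_in]; L_tilde: [N, N] rescaled Laplacian 2L/lambda_max - I;
    theta: list of K weight matrices [C_in, C_out] (assumed shapes).
    Chebyshev recurrence: T_0 = x, T_1 = L~ x, T_k = 2 L~ T_{k-1} - T_{k-2}."""
    K = len(theta)
    t_prev, t_curr = x, L_tilde @ x
    out = t_prev @ theta[0]
    if K > 1:
        out = out + t_curr @ theta[1]
    for k in range(2, K):
        t_prev, t_curr = t_curr, 2 * (L_tilde @ t_curr) - t_prev
        out = out + t_curr @ theta[k]
    return out
```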

Gated CNNs for Extracting Temporal Features

Drawback of RNN-based models: recurrent networks for traffic prediction still suffer from time-consuming iterations, complex gate mechanisms, and slow response to dynamic changes.

Benefits of CNN-based models: CNNs offer fast training, simple structures, and no dependency constraints on previous steps.

[Figure: gated CNN for extracting temporal features]

  • Every node has an input sequence of length $M$.

  • Input: a length-$M$ sequence with $C_i$ input channels, $X\in R^{M\times C_i}$

  • Kernel: $\Gamma\in R^{K_t\times C_i\times 2C_o}$

  • 1-D Conv: $Y=\Gamma *X\in R^{(M-K_t+1)\times 2C_o}=[P \ Q]$, with $P,Q\in R^{(M-K_t+1)\times C_o}$

  • GLU: $Z=P\odot \sigma(Q)\in R^{(M-K_t+1)\times C_o}$

  • The temporal gated convolution is defined as:

[Figure: temporal gated convolution]

  • The sigmoid gate $\sigma(Q)$ controls which inputs $P$ of the current states are relevant for discovering compositional structure and dynamic variances in the time series (see the sketch below).
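
A minimal PyTorch sketch of the temporal gated convolution above: a single 1-D convolution produces $2C_o$ channels, which are split into $[P\ Q]$ and combined by the GLU:

```python
import torch
import torch.nn as nn

class GatedTemporalConv(nn.Module):
    """Temporal gated convolution (a sketch): maps [batch, C_i, M] to
    [batch, C_o, M - K_t + 1] via a 1-D conv followed by a GLU."""
    def __init__(self, c_in, c_out, k_t):
        super().__init__()
        # one conv produces 2*C_o channels, later split into [P  Q]
        self.conv = nn.Conv1d(c_in, 2 * c_out, kernel_size=k_t)

    def forward(self, x):
        p, q = self.conv(x).chunk(2, dim=1)
        return p * torch.sigmoid(q)  # GLU: Z = P ⊙ σ(Q)
```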

Spatio-temporal Convolution Block

[Figure: spatio-temporal convolution block]

Overall framework

[Figure: overall STGCN framework]

Graph WaveNet

[3] Wu, Zonghan, et al. “Graph wavenet for deep spatial-temporal graph modeling.” arXiv preprint arXiv:1906.00121 (2019).

provided code: https://github.com/nnzhan/Graph-WaveNet

Background

Shortcomings of current GNN studies:

  • They assume the structure of the data reflects the genuine dependency relationships among nodes --> addressed by the self-adaptive adjacency matrix
  • They lack a computation-efficient way of learning long-term dependencies --> addressed by dilated causal convolution:
    • RNN: suffers from time-consuming iterative propagation and gradient explosion/vanishing when capturing long-range sequences
    • CNN: needs many layers to capture very long sequences, because standard 1-D convolution has a receptive field that grows only linearly with the number of hidden layers

Framework

[Figure: Graph WaveNet framework]

  • TCN: uses dilated causal convolution as the temporal convolution layer (TCN).

    • Dilated causal convolution networks allow an exponentially large receptive field by increasing the layer depth. As opposed to RNN-based approaches, dilated causal convolution networks can handle long-range sequences in a non-recursive manner, which facilitates parallel computation and alleviates the gradient explosion problem.

    • Filter $f\in R^K$; $d$: dilation factor

    • $x*f(t)=\sum_{s=0}^{K-1}f(s)\,x(t-d\times s)$

      [Figure: dilated causal convolution]

  • Gated TCN: a gate controls the ratio of information passed to the next layer (see the sketch below).

    • $h=g(\Theta_1 *X+b)\odot \sigma(\Theta_2*X+c)$

    • g(·) is an activation function of the outputs, and σ(·) is the sigmoid function which determines the ratio of information passed to the next layer.
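
A minimal sketch of the gated TCN, with $g(\cdot)=\tanh$ (one common choice) and two dilated causal convolutions; stacking such layers with dilations $1, 2, 4, \dots$ grows the receptive field exponentially:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedDilatedTCN(nn.Module):
    """Gated TCN built from two dilated causal 1-D convolutions (a sketch):
    h = tanh(filter(x)) ⊙ sigmoid(gate(x))."""
    def __init__(self, channels, kernel_size=2, dilation=1):
        super().__init__()
        # left padding keeps the convolution causal (no future leakage)
        self.pad = (kernel_size - 1) * dilation
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):  # x: [batch, channels, T]
        x = F.pad(x, (self.pad, 0))
        return torch.tanh(self.filter_conv(x)) * torch.sigmoid(self.gate_conv(x))
```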

  • GCN: proposes a self-adaptive adjacency matrix that does not require any prior knowledge and is learned end-to-end through stochastic gradient descent (a sketch follows this block).

    • $\tilde{A}_{adp}=\mathrm{SoftMax}(\mathrm{ReLU}(E_1E_2^T))$

    (shortcomings: not sparse; not uni-directional)

    • If there is no prior knowledge of the graph: $Z=\sum_{k=0}^K\tilde{A}_{adp}^{k}XW_k$

      • Otherwise, combine it with the diffusion convolution layer ($P$: $A$ multiplied by the inverse of the diagonal degree matrix)

      [Figure: diffusion convolution with the self-adaptive adjacency matrix]

    (shortcomings: hard to select the parameter $K$, considering the over-smoothing problem in deep GCNs)
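
A minimal sketch of graph convolution with the self-adaptive adjacency matrix, assuming no prior graph is available ($E_1$, $E_2$ are learnable node embeddings, and the power series over $\tilde{A}_{adp}$ provides multi-hop propagation):

```python
import torch
import torch.nn as nn

class AdaptiveGCN(nn.Module):
    """Graph convolution driven by a self-adaptive adjacency matrix (a sketch),
    learned end-to-end from two node-embedding tables E1, E2."""
    def __init__(self, num_nodes, emb_dim, c_in, c_out, K=2):
        super().__init__()
        self.E1 = nn.Parameter(torch.randn(num_nodes, emb_dim))
        self.E2 = nn.Parameter(torch.randn(num_nodes, emb_dim))
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.randn(c_in, c_out) * 0.01) for _ in range(K + 1)])
        self.K = K

    def forward(self, x):  # x: [num_nodes, c_in]
        # A_adp = SoftMax(ReLU(E1 E2^T)), a row-normalized transition matrix
        A = torch.softmax(torch.relu(self.E1 @ self.E2.T), dim=1)
        out, h = x @ self.weights[0], x
        for k in range(1, self.K + 1):  # Z = sum_k A_adp^k X W_k
            h = A @ h
            out = out + h @ self.weights[k]
        return out
```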

  • skip connections: enable the Graph WaveNet to handle spatial dependencies at different temporal levels. For example, at the bottom layer, GCN receives short-term temporal information while at the top layer GCN tackles long-term temporal information.

MTGNN

[4] Wu, Zonghan, et al. “Connecting the dots: Multivariate time series forecasting with graph neural networks.” Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining. 2020.

provided code: https://github.com/nnzhan/MTGNN

Graph Learning Layer

  • A change in one node's condition is expected to cause a change in another node's condition, so the learned relation is supposed to be uni-directional.

[Figure: graph learning layer]

  • $\mathrm{ReLU}$ + subtraction --> if $A_{vu}$ is positive, its diagonal counterpart $A_{uv}$ will be zero --> uni-directional
  • For each node, select the top-$k$ closest nodes as its neighbors --> ensures sparsity of $A$
  • If external features on the nodes exist, set $E_1=E_2=Z$, where $Z$ is a static node feature matrix.
  • Why not capture dynamic spatial dependencies by dynamically adjusting the weight between two connected nodes based on temporal inputs? --> hard to converge when the graph structure has to be learned at the same time (a sketch of the layer follows this list)
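
A minimal sketch of the graph learning layer based on the paper's formulation ($\alpha$, the embedding dimension, and the top-$k$ value are hyperparameters):

```python
import torch
import torch.nn as nn

class GraphLearner(nn.Module):
    """Uni-directional graph learning layer (a sketch)."""
    def __init__(self, num_nodes, emb_dim, k=20, alpha=3.0):
        super().__init__()
        self.E1 = nn.Parameter(torch.randn(num_nodes, emb_dim))
        self.E2 = nn.Parameter(torch.randn(num_nodes, emb_dim))
        self.theta1 = nn.Linear(emb_dim, emb_dim, bias=False)
        self.theta2 = nn.Linear(emb_dim, emb_dim, bias=False)
        self.k, self.alpha = k, alpha

    def forward(self):
        M1 = torch.tanh(self.alpha * self.theta1(self.E1))
        M2 = torch.tanh(self.alpha * self.theta2(self.E2))
        # ReLU + subtraction: if A[v, u] > 0 then A[u, v] == 0
        A = torch.relu(torch.tanh(self.alpha * (M1 @ M2.T - M2 @ M1.T)))
        # keep only the top-k entries per row --> sparsity
        mask = torch.zeros_like(A)
        _, idx = A.topk(self.k, dim=1)
        mask.scatter_(1, idx, 1.0)
        return A * mask
```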

Graph Convolution Module

[Figure: graph convolution module]

  • Two mix-hop propagation layers handle the inflow and outflow information passing through each node separately.
    • Mix-hop propagation layer: consists of an information propagation step and an information selection step
      • Information propagation step: $H^{(k)}=\beta H_{in}+(1-\beta)\tilde{A}H^{(k-1)}$, where $\beta$ is a hyperparameter controlling the ratio of the root node's original states that is retained, and $\tilde{A}=\tilde{D}^{-1}(A+I)$
      • Information selection step: $H_{out}=\sum_{k=0}^{K}H^{(k)}W^{(k)}$, where $W^{(k)}$ acts as a feature selector --> filters the important information produced at each hop
      • Intuition behind the information propagation step: it counters the over-smoothing problem of deep GCNs, in which node hidden states converge to a single point as the number of graph convolution layers goes to infinity (a sketch follows the figure below)

[Figure: mix-hop propagation layer]
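
A minimal sketch of one mix-hop propagation layer, combining the two steps above (shapes are assumptions):

```python
import torch

def mixhop_propagation(H_in, A_tilde, weights, beta=0.05):
    """One mix-hop propagation layer (a sketch).
    H_in: [N, C] input (root) node states; A_tilde: [N, N] = D~^{-1}(A+I);
    weights: K+1 matrices [C, C_out], the per-hop feature selectors;
    beta: ratio of the root states retained at every hop."""
    H = H_in
    H_out = H_in @ weights[0]
    for k in range(1, len(weights)):
        # propagation step: mix neighbor information with the root states
        H = beta * H_in + (1 - beta) * (A_tilde @ H)
        # selection step: accumulate hop-k features chosen by W^(k)
        H_out = H_out + H @ weights[k]
    return H_out
```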

Temporal Convolution Module

[Figure: temporal convolution module]

  • dilated inception layers:
    • discover temporal patterns with various ranges and handle very long sequences.
    • It is hard to choose the right kernel size --> too large to represent short-term patterns, or too small to discover long-term signal patterns
      • Inception strategy: since temporal signals tend to have several inherent periods such as 7, 12, 24, 28, and 60, the temporal inception layer consists of four filter sizes, viz. $1\times 2$, $1\times 3$, $1\times 6$, and $1\times 7$.
    • Dilated convolution is used to process very long sequences (a sketch follows the figure below).
      [Figure: dilated inception layer]
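
A minimal PyTorch sketch of a dilated inception layer with the four filter sizes above; the parallel outputs are truncated to the shortest length (set by the largest kernel) before concatenation:

```python
import torch
import torch.nn as nn

class DilatedInception(nn.Module):
    """Dilated inception layer (a sketch): four parallel dilated 1-D
    convolutions with kernel sizes 2, 3, 6, 7; outputs are truncated to a
    common length and concatenated along the channel dimension."""
    def __init__(self, c_in, c_out, dilation=1):
        super().__init__()
        kernels = (2, 3, 6, 7)
        assert c_out % len(kernels) == 0
        self.convs = nn.ModuleList(
            [nn.Conv1d(c_in, c_out // len(kernels), k, dilation=dilation)
             for k in kernels])

    def forward(self, x):  # x: [batch, c_in, T]
        outs = [conv(x) for conv in self.convs]
        min_len = outs[-1].size(-1)  # largest kernel --> shortest output
        return torch.cat([o[..., -min_len:] for o in outs], dim=1)
```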

Overall framework

[Figure: overall MTGNN framework]

Models Comparison

[5] Bui, Khac-Hoai Nam, Jiho Cho, and Hongsuk Yi. “Spatial-temporal graph neural network for traffic forecasting: An overview and open research issues.” Applied Intelligence 52.3 (2022): 2763-2774.

– Currently, no single model achieves the best results for all cases of multi-step prediction. For instance, STGCN, by relying entirely on convolutions to extract both spatial and temporal features, significantly improves computational time; however, its accuracy is generally lower than that of the other approaches.

– For extracting spatial features, spatial-based methods (DCRNN, GraphWN) are slightly better than spectral-based methods (STGCN), especially in the case of large road networks (many nodes/sensors). Moreover, RNN-based methods are more effective than 1-D CNNs at extracting temporal features, especially for short-term prediction; however, this can be improved by using dilated convolution with multiple filter sizes (GraphWN, MTGNN).

– GAT-based methods (GMAN) can outperform GCN-based methods (STGCN, DCRNN, GraphWN) in extracting spatial correlations, especially for long-term prediction. The main reason is that the attention mechanism can capture the complex dynamics of traffic flow, which change significantly over time. The drawback of GMAN, however, is the high computational time for training, which requires a large number of hyperparameters.

– The comparable results of the adaptive-learning-based approach (MTGNN, an improved version of GraphWN with a self-adaptive graph learning layer) and the GAT-based approach (GMAN) indicate the importance of learning the dynamics of the traffic data for improving performance, which remains a promising research issue for GNN-based forecasting.
