State-of-the-Art ST-GNN Models

Diffusion Convolutional Recurrent Neural Network (DCRNN)

[1] Li, Yaguang, et al. “Diffusion convolutional recurrent neural network: Data-driven traffic forecasting.” arXiv preprint arXiv:1707.01926 (2017).

provided code: https://github.com/liyaguang/DCRNN

Spatial Dependency Modeling

[Figure: spatial dependency modeling (diffusion convolution)]

  • $W\in R^{N\times N}$: weighted adjacency matrix representing node proximity

[Figure]
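
DCRNN models spatial dependency as a bidirectional diffusion process with forward transition matrix $D_O^{-1}W$ and backward transition matrix $D_I^{-1}W^T$. Below is a minimal PyTorch sketch of this diffusion convolution; the scalar per-step weights `theta_fwd`/`theta_bwd` are a simplification (the model learns them per channel pair):

```python
import torch

def diffusion_conv(X, W, theta_fwd, theta_bwd):
    """Bidirectional diffusion convolution over a directed graph (a sketch).
    X: [N, C] node signals; W: [N, N] weighted adjacency matrix.
    theta_fwd / theta_bwd: K scalar weights per diffusion direction
    (a simplification; the paper learns weights per channel pair)."""
    # forward transition D_O^{-1} W (rows normalized by out-degree)
    P_fwd = W / W.sum(dim=1, keepdim=True).clamp(min=1e-8)
    # backward transition D_I^{-1} W^T (rows normalized by in-degree)
    P_bwd = W.T / W.T.sum(dim=1, keepdim=True).clamp(min=1e-8)
    out = torch.zeros_like(X)
    h_f = h_b = X
    for tf, tb in zip(theta_fwd, theta_bwd):
        out = out + tf * h_f + tb * h_b
        h_f, h_b = P_fwd @ h_f, P_bwd @ h_b  # one more diffusion step
    return out
```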

Temporal Dynamic Modeling: Diffusion Convolutional Gated Recurrent Unit

[Figure: diffusion convolutional gated recurrent unit (DCGRU) equations]

Comparison with the classical GRU, where $r_t$ denotes the reset gate and $z_t$ the update gate.
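
One common form of the classical GRU recurrences is shown below; in the DCGRU, the matrix multiplications in these equations are replaced by the diffusion convolution:

$$
\begin{aligned}
r_t &= \sigma(W_r[x_t, h_{t-1}] + b_r)\\
z_t &= \sigma(W_z[x_t, h_{t-1}] + b_z)\\
\tilde{h}_t &= \tanh(W_h[x_t, r_t \odot h_{t-1}] + b_h)\\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
$$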

System architecture of DCRNN

  • The historical time series are fed into an encoder whose final states are used to initialize the decoder.

  • To mitigate the discrepancy between the input distributions at training and testing time, scheduled sampling (reference blog: http://www.manongjc.com/article/92160.html) is integrated into the model.

    • Scheduled sampling:
      • At the $i$-th iteration, the model is fed either the ground-truth observation with probability $\epsilon_i$ or the model's own prediction with probability $1-\epsilon_i$. During training, $\epsilon_i$ gradually decreases to 0 so that the model learns the testing distribution.
      • The decoder makes predictions based on either the previous ground truth or the model output (a sketch follows this list).
  • Defects: computational complexity and gradient explosion.
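
A minimal sketch of scheduled sampling in the decoder loop; the inverse-sigmoid decay schedule and the constant `k` are illustrative assumptions:

```python
import math
import random

def eps(i, k=2000.0):
    """Probability of feeding the ground truth at training iteration i;
    decays from ~1 toward 0 (inverse sigmoid decay, k controls the speed)."""
    return k / (k + math.exp(i / k))

def decoder_step(cell, state, prev_truth, prev_pred, iteration):
    """Feed the ground truth with probability eps_i, otherwise feed the
    model's own previous prediction."""
    inp = prev_truth if random.random() < eps(iteration) else prev_pred
    return cell(inp, state)
```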

Spatio-Temporal Graph Convolutional Networks (STGCN)

[2] Yu, Bing, Haoteng Yin, and Zhanxing Zhu. “Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting.” arXiv preprint arXiv:1709.04875 (2017).

provided code: https://github.com/VeritasYin/STGCN_IJCAI-18

Graph CNNs for Extracting Spatial Features

[Figure: graph CNN for extracting spatial features]

  • Employs a spectral-based graph CNN that operates on graph-structured data directly to extract highly meaningful patterns and features in the spatial domain.

  • Considers two types of approximation to reduce computational complexity (a sketch follows this list):

    • Chebyshev polynomial approximation
    • $1^{st}$-order approximation
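
A minimal sketch of the Chebyshev-polynomial graph convolution via the standard recurrence (shapes and the rescaled Laplacian `L_tilde` are assumptions); the $1^{st}$-order approximation corresponds to truncating the expansion at $K=2$ with further simplifications:

```python
import torch

def cheb_graph_conv(x, L_tilde, theta):
    """Chebyshev-polynomial graph convolution (a sketch).
    x: [N, C_in]; L_tilde: [N, N] rescaled Laplacian 2L/lambda_max - I;
    theta: list of K weight matrices [C_in, C_out] (assumed shapes).
    Chebyshev recurrence: T_0 = x, T_1 = L~ x, T_k = 2 L~ T_{k-1} - T_{k-2}."""
    K = len(theta)
    t_prev, t_curr = x, L_tilde @ x
    out = t_prev @ theta[0]
    if K > 1:
        out = out + t_curr @ theta[1]
    for k in range(2, K):
        t_prev, t_curr = t_curr, 2 * (L_tilde @ t_curr) - t_prev
        out = out + t_curr @ theta[k]
    return out
```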

Gated CNNs for Extracting Temporal Features

Drawback of RNN-based models: recurrent networks for traffic prediction still suffer from time-consuming iterations, complex gate mechanisms, and slow response to dynamic changes.

Benefits of CNN-based models: CNNs offer fast training, simple structures, and no dependency constraints on previous steps.

[Figure: gated CNN for extracting temporal features]

  • Every node has an input sequence of length $M$.

  • Input: a length-$M$ sequence with $C_i$ input channels, $X\in R^{M\times C_i}$

  • Kernel: $\Gamma\in R^{K_t\times C_i\times 2C_o}$

  • 1-D Conv: $Y=\Gamma *X\in R^{(M-K_t+1)\times 2C_o}=[P \ Q]$, with $P,Q\in R^{(M-K_t+1)\times C_o}$

  • GLU: $Z=P\odot \sigma(Q)\in R^{(M-K_t+1)\times C_o}$

  • The temporal gated convolution is defined as:

[Figure: temporal gated convolution]

  • The sigmoid gate $\sigma(Q)$ controls which inputs $P$ of the current states are relevant for discovering compositional structure and dynamic variances in the time series (see the sketch below).
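
A minimal PyTorch sketch of the temporal gated convolution above: a single 1-D convolution produces $2C_o$ channels, which are split into $[P\ Q]$ and combined by the GLU:

```python
import torch
import torch.nn as nn

class GatedTemporalConv(nn.Module):
    """Temporal gated convolution (a sketch): maps [batch, C_i, M] to
    [batch, C_o, M - K_t + 1] via a 1-D conv followed by a GLU."""
    def __init__(self, c_in, c_out, k_t):
        super().__init__()
        # one conv produces 2*C_o channels, later split into [P  Q]
        self.conv = nn.Conv1d(c_in, 2 * c_out, kernel_size=k_t)

    def forward(self, x):
        p, q = self.conv(x).chunk(2, dim=1)
        return p * torch.sigmoid(q)  # GLU: Z = P ⊙ σ(Q)
```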

Spatio-temporal Convolution Block

[Figure: spatio-temporal convolution block]

Overall framework

[Figure: overall STGCN framework]

Graph WaveNet

[3] Wu, Zonghan, et al. “Graph wavenet for deep spatial-temporal graph modeling.” arXiv preprint arXiv:1906.00121 (2019).

provided code: https://github.com/nnzhan/Graph-WaveNet

Background

Shortcomings of current GNN studies:

  • They assume the structure of the data reflects the genuine dependency relationships among nodes --> addressed by the self-adaptive adjacency matrix
  • They lack a computation-efficient way of learning long-term dependencies --> addressed by dilated causal convolution:
    • RNN: suffers from time-consuming iterative propagation and gradient explosion/vanishing when capturing long-range sequences
    • CNN: needs many layers to capture very long sequences, because standard 1-D convolution has a receptive field that grows only linearly with the number of hidden layers

Framework

[Figure: Graph WaveNet framework]

  • TCN: uses dilated causal convolution as the temporal convolution layer (TCN).

    • Dilated causal convolution networks allow an exponentially large receptive field by increasing the layer depth. As opposed to RNN-based approaches, dilated causal convolution networks can handle long-range sequences in a non-recursive manner, which facilitates parallel computation and alleviates the gradient explosion problem.

    • Filter $f\in R^K$; $d$: dilation factor

    • $x*f(t)=\sum_{s=0}^{K-1}f(s)\,x(t-d\times s)$

      [Figure: dilated causal convolution]

  • Gated TCN: a gate controls the ratio of information passed to the next layer (see the sketch below).

    • $h=g(\Theta_1 *X+b)\odot \sigma(\Theta_2*X+c)$

    • g(·) is an activation function of the outputs, and σ(·) is the sigmoid function which determines the ratio of information passed to the next layer.
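
A minimal sketch of the gated TCN, with $g(\cdot)=\tanh$ (one common choice) and two dilated causal convolutions; stacking such layers with dilations $1, 2, 4, \dots$ grows the receptive field exponentially:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedDilatedTCN(nn.Module):
    """Gated TCN built from two dilated causal 1-D convolutions (a sketch):
    h = tanh(filter(x)) ⊙ sigmoid(gate(x))."""
    def __init__(self, channels, kernel_size=2, dilation=1):
        super().__init__()
        # left padding keeps the convolution causal (no future leakage)
        self.pad = (kernel_size - 1) * dilation
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):  # x: [batch, channels, T]
        x = F.pad(x, (self.pad, 0))
        return torch.tanh(self.filter_conv(x)) * torch.sigmoid(self.gate_conv(x))
```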

  • GCN: proposes a self-adaptive adjacency matrix that does not require any prior knowledge and is learned end-to-end through stochastic gradient descent (a sketch follows this block).

    • $\tilde{A}_{adp}=\mathrm{SoftMax}(\mathrm{ReLU}(E_1E_2^T))$

    (shortcomings: not sparse; not uni-directional)

    • If there is no prior knowledge of the graph: $Z=\sum_{k=0}^K\tilde{A}_{adp}^{k}XW_k$

      • Otherwise, combine it with the diffusion convolution layer ($P$: $A$ multiplied by the inverse of the diagonal degree matrix)

      [Figure: diffusion convolution with the self-adaptive adjacency matrix]

    (shortcomings: hard to select the parameter $K$, considering the over-smoothing problem in deep GCNs)
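
A minimal sketch of graph convolution with the self-adaptive adjacency matrix, assuming no prior graph is available ($E_1$, $E_2$ are learnable node embeddings, and the power series over $\tilde{A}_{adp}$ provides multi-hop propagation):

```python
import torch
import torch.nn as nn

class AdaptiveGCN(nn.Module):
    """Graph convolution driven by a self-adaptive adjacency matrix (a sketch),
    learned end-to-end from two node-embedding tables E1, E2."""
    def __init__(self, num_nodes, emb_dim, c_in, c_out, K=2):
        super().__init__()
        self.E1 = nn.Parameter(torch.randn(num_nodes, emb_dim))
        self.E2 = nn.Parameter(torch.randn(num_nodes, emb_dim))
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.randn(c_in, c_out) * 0.01) for _ in range(K + 1)])
        self.K = K

    def forward(self, x):  # x: [num_nodes, c_in]
        # A_adp = SoftMax(ReLU(E1 E2^T)), a row-normalized transition matrix
        A = torch.softmax(torch.relu(self.E1 @ self.E2.T), dim=1)
        out, h = x @ self.weights[0], x
        for k in range(1, self.K + 1):  # Z = sum_k A_adp^k X W_k
            h = A @ h
            out = out + h @ self.weights[k]
        return out
```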

  • skip connections: enable the Graph WaveNet to handle spatial dependencies at different temporal levels. For example, at the bottom layer, GCN receives short-term temporal information while at the top layer GCN tackles long-term temporal information.

MTGNN

[4] Wu, Zonghan, et al. “Connecting the dots: Multivariate time series forecasting with graph neural networks.” Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining. 2020.

provided code: https://github.com/nnzhan/MTGNN

Graph Learning Layer

  • A change in one node's condition is expected to cause a change in another node's condition, so the learned relation is supposed to be uni-directional.

[Figure: graph learning layer]

  • $\mathrm{ReLU}$ + subtraction --> if $A_{vu}$ is positive, its diagonal counterpart $A_{uv}$ will be zero --> uni-directional
  • For each node, select the top-$k$ closest nodes as its neighbors --> ensures sparsity of $A$
  • If external features on the nodes exist, set $E_1=E_2=Z$, where $Z$ is a static node feature matrix.
  • Why not capture dynamic spatial dependencies by dynamically adjusting the weight between two connected nodes based on temporal inputs? --> hard to converge when the graph structure has to be learned at the same time (a sketch of the layer follows this list)
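
A minimal sketch of the graph learning layer based on the paper's formulation ($\alpha$, the embedding dimension, and the top-$k$ value are hyperparameters):

```python
import torch
import torch.nn as nn

class GraphLearner(nn.Module):
    """Uni-directional graph learning layer (a sketch)."""
    def __init__(self, num_nodes, emb_dim, k=20, alpha=3.0):
        super().__init__()
        self.E1 = nn.Parameter(torch.randn(num_nodes, emb_dim))
        self.E2 = nn.Parameter(torch.randn(num_nodes, emb_dim))
        self.theta1 = nn.Linear(emb_dim, emb_dim, bias=False)
        self.theta2 = nn.Linear(emb_dim, emb_dim, bias=False)
        self.k, self.alpha = k, alpha

    def forward(self):
        M1 = torch.tanh(self.alpha * self.theta1(self.E1))
        M2 = torch.tanh(self.alpha * self.theta2(self.E2))
        # ReLU + subtraction: if A[v, u] > 0 then A[u, v] == 0
        A = torch.relu(torch.tanh(self.alpha * (M1 @ M2.T - M2 @ M1.T)))
        # keep only the top-k entries per row --> sparsity
        mask = torch.zeros_like(A)
        _, idx = A.topk(self.k, dim=1)
        mask.scatter_(1, idx, 1.0)
        return A * mask
```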

Graph Convolution Module

[Figure: graph convolution module]

  • Two mix-hop propagation layers handle the inflow and outflow information passing through each node separately.
    • Mix-hop propagation layer: consists of an information propagation step and an information selection step
      • Information propagation step: $H^{(k)}=\beta H_{in}+(1-\beta)\tilde{A}H^{(k-1)}$, where $\beta$ is a hyperparameter controlling the ratio of the root node's original states that is retained, and $\tilde{A}=\tilde{D}^{-1}(A+I)$
      • Information selection step: $H_{out}=\sum_{k=0}^{K}H^{(k)}W^{(k)}$, where $W^{(k)}$ acts as a feature selector --> filters the important information produced at each hop
      • Intuition behind the information propagation step: it counters the over-smoothing problem of deep GCNs, in which node hidden states converge to a single point as the number of graph convolution layers goes to infinity (a sketch follows the figure below)

[Figure: mix-hop propagation layer]
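
A minimal sketch of one mix-hop propagation layer, combining the two steps above (shapes are assumptions):

```python
import torch

def mixhop_propagation(H_in, A_tilde, weights, beta=0.05):
    """One mix-hop propagation layer (a sketch).
    H_in: [N, C] input (root) node states; A_tilde: [N, N] = D~^{-1}(A+I);
    weights: K+1 matrices [C, C_out], the per-hop feature selectors;
    beta: ratio of the root states retained at every hop."""
    H = H_in
    H_out = H_in @ weights[0]
    for k in range(1, len(weights)):
        # propagation step: mix neighbor information with the root states
        H = beta * H_in + (1 - beta) * (A_tilde @ H)
        # selection step: accumulate hop-k features chosen by W^(k)
        H_out = H_out + H @ weights[k]
    return H_out
```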

Temporal Convolution Module

[Figure: temporal convolution module]

  • dilated inception layers:
    • discover temporal patterns with various ranges and handle very long sequences.
    • It is hard to choose the right kernel size --> too large to represent short-term patterns, or too small to discover long-term signal patterns
      • Inception strategy: since temporal signals tend to have several inherent periods such as 7, 12, 24, 28, and 60, the temporal inception layer consists of four filter sizes, viz. $1\times 2$, $1\times 3$, $1\times 6$, and $1\times 7$.
    • Dilated convolution is used to process very long sequences (a sketch follows the figure below).
      [Figure: dilated inception layer]
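
A minimal PyTorch sketch of a dilated inception layer with the four filter sizes above; the parallel outputs are truncated to the shortest length (set by the largest kernel) before concatenation:

```python
import torch
import torch.nn as nn

class DilatedInception(nn.Module):
    """Dilated inception layer (a sketch): four parallel dilated 1-D
    convolutions with kernel sizes 2, 3, 6, 7; outputs are truncated to a
    common length and concatenated along the channel dimension."""
    def __init__(self, c_in, c_out, dilation=1):
        super().__init__()
        kernels = (2, 3, 6, 7)
        assert c_out % len(kernels) == 0
        self.convs = nn.ModuleList(
            [nn.Conv1d(c_in, c_out // len(kernels), k, dilation=dilation)
             for k in kernels])

    def forward(self, x):  # x: [batch, c_in, T]
        outs = [conv(x) for conv in self.convs]
        min_len = outs[-1].size(-1)  # largest kernel --> shortest output
        return torch.cat([o[..., -min_len:] for o in outs], dim=1)
```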

Overall framework

[Figure: overall MTGNN framework]

Models Comparison

[5] Bui, Khac-Hoai Nam, Jiho Cho, and Hongsuk Yi. “Spatial-temporal graph neural network for traffic forecasting: An overview and open research issues.” Applied Intelligence 52.3 (2022): 2763-2774.

– Currently, no single model achieves the best results for all cases of multi-step prediction. For instance, STGCN, by relying entirely on convolutions to extract both spatial and temporal features, significantly improves computational time; however, its accuracy is generally lower than that of the other approaches.

– For extracting spatial features, spatial-based methods (DCRNN, GraphWN) are slightly better than spectral-based methods (STGCN), especially in the case of large road networks (many nodes/sensors). Moreover, RNN-based methods are more effective than 1-D CNNs at extracting temporal features, especially for short-term prediction; however, this can be improved by using dilated convolution with multiple filter sizes (GraphWN, MTGNN).

– GAT-based methods (GMAN) can outperform GCN-based methods (STGCN, DCRNN, GraphWN) in extracting spatial correlations, especially for long-term prediction. The main reason is that the attention mechanism can capture the complex dynamics of traffic flow, which change significantly over time. The drawback of GMAN, however, is the high computational time for training, which requires a large number of hyperparameters.

– The comparable results of the adaptive-learning-based approach (MTGNN, an improved version of GraphWN with a self-adaptive graph learning layer) and the GAT-based approach (GMAN) indicate the importance of learning the dynamics of the traffic data for improving performance, which remains a promising research issue for GNN-based forecasting.
