[GNN4TRAFFIC] 2018 - T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction
Background
Traffic prediction has to account for spatial dependency on a non-Euclidean structure (the road network).
Datasets:
- Shenzhen dataset: the taxi speed data of the Luohu District in Shenzhen
- Los-loop dataset
Graph construction:
-
The road graph $G$: an unweighted graph $G = (V, E)$, where $V = \{v_1, v_2, \cdots, v_N\}$ is the set of road nodes, $N$ is the number of nodes, and $E$ is the set of edges.
-
The adjacency matrix $A$: represents the connections between roads (entries are 0 or 1).
-
The feature matrix $X \in R^{N \times P}$: $P$ is the number of node attribute features (the length of the historical time series), and $X_t \in R^{N \times i}$ represents the speed on each road at time $i$.
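The graph construction above can be sketched with NumPy. This is only an illustration: the four-node road network, the edge list, and the random speed values are made-up assumptions, not data from the paper.

```python
import numpy as np

# Hypothetical toy road network: 4 road nodes joined in a chain (assumption).
edges = [(0, 1), (1, 2), (2, 3)]
N = 4  # number of road nodes
P = 6  # length of the historical time series

# Unweighted, symmetric 0/1 adjacency matrix A.
A = np.zeros((N, N), dtype=np.float32)
for i, j in edges:
    A[i, j] = 1.0
    A[j, i] = 1.0

# Feature matrix X: one row per road, one column per time step.
# The speeds are random placeholders, not real measurements.
rng = np.random.default_rng(0)
X = rng.uniform(20.0, 60.0, size=(N, P)).astype(np.float32)

print(A.shape, X.shape)
```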
Problem Definition:
$[X_{t+1}, \cdots, X_{t+T}] = f(G; (X_{t-n}, \cdots, X_{t-1}, X_t))$
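One way to turn this definition into training samples is a sliding window over the speed series. The sketch below is an assumption about data preparation, not the authors' code; `make_windows`, `n_in`, and `n_out` are hypothetical names, with `n_in` playing the role of the $n + 1$ input steps and `n_out` the role of $T$.

```python
import numpy as np

def make_windows(series, n_in, n_out):
    """Split a (time, N) series into (inputs, targets) pairs.

    inputs[k]  has shape (n_in, N):  the history fed to the model
    targets[k] has shape (n_out, N): the steps to be predicted
    """
    inputs, targets = [], []
    for start in range(series.shape[0] - n_in - n_out + 1):
        inputs.append(series[start:start + n_in])
        targets.append(series[start + n_in:start + n_in + n_out])
    return np.stack(inputs), np.stack(targets)

# Toy series: 10 time steps, 2 roads (placeholder values).
series = np.arange(20, dtype=np.float32).reshape(10, 2)
x, y = make_windows(series, n_in=4, n_out=2)
print(x.shape, y.shape)  # (5, 4, 2) (5, 2, 2)
```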
Model:
-
Static spatial features -> 2-layer GCN:
$f(X, A) = \sigma(\hat{A}\,\mathrm{ReLU}(\hat{A} X W_0) W_1)$
where $\hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ is a preprocessing step, $\tilde{A} = A + I_N$ is the adjacency matrix with self-connections, $\tilde{D}$ is the degree matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, $W_0$ and $W_1$ are the weight matrices of the first and second layers, and $\sigma(\cdot)$ and $\mathrm{ReLU}(\cdot)$ are the activation functions.
-
GRU (temporal features), where $f(A, X_t)$ is the GCN output at time $t$:
$u_t = \sigma(W_u[f(A, X_t), h_{t-1}] + b_u)$
$r_t = \sigma(W_r[f(A, X_t), h_{t-1}] + b_r)$
$c_t = \tanh(W_c[f(A, X_t), (r_t * h_{t-1})] + b_c)$
$h_t = u_t * h_{t-1} + (1 - u_t) * c_t$
-
Loss function:
$loss = \lVert Y_t - \hat{Y}_t \rVert + \lambda L_{reg}$
where $L_{reg}$ is an $L2$ regularization term that helps to avoid overfitting and $\lambda$ is a hyperparameter.
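The three model pieces above (2-layer GCN, GRU gates, loss) can be put together in a short NumPy forward pass. This is a sketch that follows the equations as written in these notes, not the authors' implementation. The sigmoid chosen for the outer GCN activation, all shapes, the parameter dictionary `p`, and the function names are assumptions made for illustration. The weights are applied by right-multiplication on row-wise concatenated node features, which is the per-node equivalent of $W[\cdot, \cdot]$.

```python
import numpy as np

def normalize_adjacency(A):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gcn(A_hat, X, W0, W1):
    """sigma(A_hat ReLU(A_hat X W0) W1); sigma = sigmoid is an assumption."""
    hidden = np.maximum(A_hat @ X @ W0, 0.0)
    return sigmoid(A_hat @ hidden @ W1)

def tgcn_cell(A_hat, x_t, h_prev, p):
    """One step: GCN on x_t, then GRU gates on [f(A, X_t), h_{t-1}]."""
    g = gcn(A_hat, x_t, p["W0"], p["W1"])               # (N, F)
    gh = np.concatenate([g, h_prev], axis=1)             # (N, F + H)
    u = sigmoid(gh @ p["Wu"] + p["bu"])                  # update gate
    r = sigmoid(gh @ p["Wr"] + p["br"])                  # reset gate
    grh = np.concatenate([g, r * h_prev], axis=1)
    c = np.tanh(grh @ p["Wc"] + p["bc"])                 # candidate state
    return u * h_prev + (1.0 - u) * c

def loss_fn(y, y_hat, weights, lam):
    """||Y - Y_hat|| + lambda * L_reg, with L_reg the sum of squared weights."""
    l_reg = sum(np.sum(w ** 2) for w in weights)
    return np.linalg.norm(y - y_hat) + lam * l_reg

# Toy usage with made-up sizes: N roads, F GCN outputs, H hidden units.
rng = np.random.default_rng(0)
N, G, F, H = 4, 8, 8, 16
A = np.zeros((N, N))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
A_hat = normalize_adjacency(A)

p = {
    "W0": rng.normal(scale=0.1, size=(1, G)),
    "W1": rng.normal(scale=0.1, size=(G, F)),
    "Wu": rng.normal(scale=0.1, size=(F + H, H)), "bu": np.zeros(H),
    "Wr": rng.normal(scale=0.1, size=(F + H, H)), "br": np.zeros(H),
    "Wc": rng.normal(scale=0.1, size=(F + H, H)), "bc": np.zeros(H),
}

h = np.zeros((N, H))
for t in range(5):
    x_t = rng.uniform(size=(N, 1))  # speed of every road at step t (placeholder)
    h = tgcn_cell(A_hat, x_t, h, p)
print(h.shape)
```

The final hidden state `h` would still need an output layer to produce $\hat{Y}_t$; that part is left out because the notes do not describe it.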
Model Parameters:
- Learning rate: 0.001
- Batch size: 64
- Training epochs: 3000
- Optimizer: Adam
- Number of hidden units: chosen from [8, 16, 32, 64, 100, 128] by comparing the evaluation metrics obtained with each setting and keeping the best one
Perturbation Analysis and Robustness:
- Add two common types of random noise to the data during the experiment
- The random noise follows the Gaussian distribution $N(0, \sigma^2)$ with $\sigma \in \{0.2, 0.4, 0.8, 1, 2\}$ or the Poisson distribution $P(\lambda)$ with $\lambda \in \{1, 2, 4, 8, 16\}$; the values of the noise matrices are then normalized to [0, 1]
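The noise setup described above might be generated as follows. This is a sketch under stated assumptions: min-max scaling is one plausible reading of "normalized to [0, 1]", adding the noise element-wise to the input is also an assumption, and `min_max` and the matrix shape are hypothetical.

```python
import numpy as np

def min_max(m):
    """Rescale a matrix to [0, 1]; a constant matrix maps to all zeros."""
    span = m.max() - m.min()
    if span == 0:
        return np.zeros_like(m, dtype=float)
    return (m - m.min()) / span

rng = np.random.default_rng(0)
shape = (4, 6)  # hypothetical (roads, time steps)

# One normalized noise matrix per sigma / lambda value listed in the notes.
gaussian = {s: min_max(rng.normal(0.0, s, size=shape))
            for s in (0.2, 0.4, 0.8, 1, 2)}
poisson = {lam: min_max(rng.poisson(lam, size=shape).astype(float))
           for lam in (1, 2, 4, 8, 16)}

# Assumption: noise is added element-wise to already normalized speeds.
X = rng.uniform(size=shape)
X_noisy = X + gaussian[0.2]
print(X_noisy.shape)
```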
Further discussion:
- The model predicts poorly at peaks
- Some of the error between the real traffic information and the prediction results comes from the data: nothing is recorded when there are no taxis on a road
- Even so, it can detect the start and end of the rush hour and produces predictions whose pattern is similar to the real traffic speed