[Paper Reading] A Hybrid Model Integrating Local and Global...Traffic Prediction (3)

炎武丶航

Last modified: 2022-09-14 21:01:56


First published: 2022-09-14 21:01:16

Original paper: https://ieeexplore.ieee.org/abstract/document/9667362


[Paper Reading] A Hybrid Model Integrating Local and Global Spatial Correlation for Traffic Prediction (3)


3. Methodology

A. Problem Definition

The relevant definitions used in this paper are as follows:

Definition 1: Geographic adjacency matrix $A∈R^{N×N}$, where $A$ is composed of the entries $a_{ij}$ and $N$ is the number of sensors. The value of $a_{ij}$ is computed from the distance between sensors using the thresholded Gaussian kernel [46]:
$a_{ij}=\begin{cases}e^{-\frac{{dist(i,j)}^2}{σ^2}},&dist(i,j)<λ\\0,&dist(i,j)≥λ\end{cases} \tag{1}$
where $a_{ij}$ is the adjacency weight between sensor $i$ and sensor $j$, $dist(i, j)$ is the distance between sensor $i$ and sensor $j$, $σ^2$ is the variance, and $λ$ is the threshold.
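Eq. (1) is straightforward to evaluate on a precomputed distance matrix. The following is a minimal NumPy sketch, not the authors' code: the function name `gaussian_kernel_adjacency`, the toy distances, and the values of `sigma` and `lam` are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel_adjacency(dist, sigma, lam):
    """Thresholded Gaussian kernel of Eq. (1).

    dist  : (N, N) array of pairwise sensor distances (hypothetical input)
    sigma : kernel width, so sigma**2 plays the role of the variance
    lam   : distance threshold (lambda in Eq. (1))
    """
    dist = np.asarray(dist, dtype=float)
    weights = np.exp(-(dist ** 2) / sigma ** 2)
    # Pairs at or beyond the threshold get zero weight.
    return np.where(dist < lam, weights, 0.0)

# Toy distances between three sensors (made-up values).
dist = np.array([[0.0, 1.0, 5.0],
                 [1.0, 0.0, 2.0],
                 [5.0, 2.0, 0.0]])
A = gaussian_kernel_adjacency(dist, sigma=2.0, lam=3.0)
```

Applied literally, Eq. (1) gives every sensor a weight of 1 to itself, because $dist(i,i)=0$ is below the threshold. Whether the diagonal is kept or zeroed before use is not stated in this section.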

Definition 2: Global correlation matrix $C∈R^{N×N}$, where $C_{ij}$ is the spatial influence of sensor $i$ on sensor $j$. This matrix describes the correlation between sensors within the studied network. A nonzero entry gives the degree of spatial correlation of sensor $i$ on sensor $j$.

Definition 3: Spatial-temporal feature matrix $X_T^S∈R^{T×S}$, where $T$ is the total number of time steps and $S$ is the total number of sensors. The matrix is built from the full set of traffic flow feature data in the studied network. Each column corresponds to one sensor in the road network, and each row holds the values of all sensors at one time slice. The structure is as follows:
$X_T^S=\begin{bmatrix}x_1^1&x_1^2&⋯&x_1^s\\x_2^1&x_2^2&⋯&x_2^s\\⋮&⋮&⋯&⋮\\x_t^1&x_t^2&⋯&x_t^s\end{bmatrix} \tag{2}$
where $S=\{1,2,...,s\}$ is the set of sensors and $T=\{1,2,...,t\}$ is the set of time steps. $X_T^S$ is the spatial-temporal matrix, which contains $s$ sensors over $t$ time steps, and $x_t^s$ is the value of sensor $s$ at time $t$.

Based on the above definitions, and assuming the current moment is $t$, the prediction task is to forecast the traffic flow in the next period (i.e., $y_{t+1}$) from historical traffic flow data (i.e., $X_T^S$), the geographic adjacency matrix (i.e., $A$), and the global correlation matrix (i.e., $C$):
$y_{t+1}=f(X_{t-T}^S,…,X_{t-1}^S,X_t^S;A;C) \tag{3}$
where $T$ is the number of input time steps and $f$ is the model proposed in the paper.
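Eq. (3) implies a sliding-window layout of the data: each training sample pairs the rows $t-T,…,t$ with the row at $t+1$. The sketch below shows one way to build such pairs with NumPy. The helper name `make_windows` and the toy matrix are assumptions, and reading the window as $T+1$ rows is a literal interpretation of the indices in Eq. (3).

```python
import numpy as np

def make_windows(X, T):
    """Build (input, target) pairs following the indices of Eq. (3).

    X : (num_steps, num_sensors) spatial-temporal matrix
    T : input time-step parameter; each input spans rows t-T .. t (T+1 rows)
    """
    X = np.asarray(X, dtype=float)
    inputs, targets = [], []
    for t in range(T, X.shape[0] - 1):
        inputs.append(X[t - T:t + 1])   # history up to and including step t
        targets.append(X[t + 1])        # next step to predict
    return np.stack(inputs), np.stack(targets)

# Toy data: 6 time steps, 2 sensors (made-up values).
X = np.arange(12, dtype=float).reshape(6, 2)
inputs, targets = make_windows(X, T=2)
```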

B. Model Overview

Fig. 1 shows the structure of the proposed model. It has two components, one modeling the local spatial-temporal correlation and one modeling the global spatial-temporal correlation. The global spatial-temporal component consists of a global graph convolution and a GRU network. The global correlation matrix is built first, and the global graph convolution uses it to extract global spatial features. A GRU then captures the global spatial-temporal correlation of the traffic flow. The local spatial-temporal component stacks a fully connected layer (FCL), a GCN, and a GRU. The FCL extracts each node's own features, and the GCN takes that output to extract the local spatial correlation. The outputs of these two layers are fused and fed to the GRU to obtain the local spatial-temporal correlation of the traffic flow. Finally, the outputs of the two components are summed, and a dense layer controls the number of output steps to produce the prediction. Each module is described in the following subsections.

FIGURE 1. The T-LGGCN structure, composed of two parts: the global spatial-temporal component and the local spatial-temporal component. The feature matrix is used to calculate the correlation matrix, and the error between the input feature matrix and the output feature matrix is adjusted by the loss function.

C. Spatial Correlation

GCN is a popular neural network for modeling spatial correlation. The topology of the road network is represented as an undirected graph $G=(V, E)$, where $V$ is the set of nodes representing the sensors and $E$ is the set of edges denoting adjacency between sensors. The adjacency of graph $G$ is converted into an adjacency matrix by the method of Definition 1, and the propagation rule of GCN is then:
$H^{(l+1)}=σ(\hat{A}H^{(l)}W^{(l)}) \tag{4}$
where $\hat A=\tilde D^{-1/2}\tilde A\tilde D^{-1/2}$, $\tilde A=I+A$, $I∈R^{N×N}$ is the identity matrix, $A∈R^{N×N}$ is the adjacency matrix of graph $G$, $\tilde D_{i,i}=∑_j\tilde A_{i,j}$, $H^{(l)}$ is the output of the $l$-th layer with $H^{(0)}=X_T^S$, $W^{(l)}$ is the weight matrix of the $l$-th layer, and $σ(\cdot)$ here denotes a nonlinear activation function (not the kernel width of Eq. 1).
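The normalization $\hat A=\tilde D^{-1/2}\tilde A\tilde D^{-1/2}$ and one propagation step of Eq. (4) can be checked on a two-node graph. This is an illustrative sketch only: the function names are hypothetical, and ReLU is assumed for the activation, which this section does not specify.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a non-negative adjacency matrix."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    deg = A_tilde.sum(axis=1)                 # D~_{ii} = sum_j A~_{ij}, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """One step of Eq. (4); ReLU stands in for the activation (assumption)."""
    return np.maximum(A_hat @ H @ W, 0.0)

# Two sensors connected to each other (toy graph).
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
A_hat = normalize_adjacency(A)

H0 = np.array([[1.0, 2.0],
               [3.0, 4.0]])                   # node features, one row per node
H1 = gcn_layer(A_hat, H0, np.eye(2))          # identity weights for readability
```

With identity weights, each node ends up with the average of its own and its neighbor's features, which makes the smoothing effect of $\hat A$ easy to see.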

The adjacency matrix represents the geographical structure of the real road network. Equation 4 uses $\hat A$ to fuse the features of adjacent sensors so that each sensor obtains a new feature representation. However, given the geographic conditions of a highway, such as freeway hubs or ramps, the spatial correlation between some neighboring sensors can be weaker than that between non-adjacent sensors. The adjacency matrix shows which sensors are adjacent, but it does not express their internal spatial influence. The spatial correlation is therefore analyzed from two perspectives: global spatial correlation and local spatial correlation.

1) Local Spatial Correlation Component

In the convolution of Equation 4, the more neighbors a node has, the smaller the weight given to the node's own features during aggregation. To preserve the spatial characteristics of the nodes themselves, the paper uses the Approximate Personalized Propagation of Neural Predictions (APPNP) model to mine the local spatial correlation of traffic flow. APPNP [47] propagates node features with personalized PageRank, which encodes features with respect to each root node and increases the chance of propagating back to that root node. In this way, the model balances retaining local features against mining neighborhood features. The calculation rules are as follows:
$\begin{alignedat}{2}Z^{(0)}&=H=f_θ(X)\\Z^{(k+1)}&=(1-α)\hat AZ^{(k)}+αH\end{alignedat}\tag{5}$

where $X$ is the input of the nodes and $f_θ$ denotes a neural network. $f_θ$ extracts each sensor's own features, and $α$ is the proportion of those self-features that is retained. As shown in Fig. 2, the spatial correlation of the nodes and their first-order neighbors is mined, so $k$ is set to $1$. The local spatial correlation formulas are as follows:
$\begin{alignedat}{2}Z^{(0)}&=W_L^{(1)}X+b_L^{(1)}\\LGCN(X,A)&=σ((1-α)\hat AZ^{(0)}+αZ^{(0)})\end{alignedat}\tag{6}$

The FCL extracts the node features. $W_L^{(1)}$ is the weight matrix of the FCL, and $b_L^{(1)}$ is its bias. $LGCN(\cdot)$ denotes the output of the local spatial correlation component.
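Eq. (6) amounts to a fully connected layer followed by a single propagation step that mixes neighbor features with each node's own features. The NumPy sketch below is illustrative: the function name `lgcn`, the row-per-node layout (so the FCL is written `X @ W + b`), ReLU as the activation, and the toy numbers are all assumptions.

```python
import numpy as np

def lgcn(X, A_hat, W, b, alpha):
    """Local spatial correlation of Eq. (6).

    X     : (N, F) node inputs, one row per sensor (layout is an assumption)
    A_hat : (N, N) normalized adjacency matrix from Eq. (4)
    W, b  : weights and bias of the fully connected layer
    alpha : share of each node's own features that is retained
    """
    Z0 = X @ W + b                                   # FCL: node self-features
    mixed = (1.0 - alpha) * (A_hat @ Z0) + alpha * Z0
    return np.maximum(mixed, 0.0)                    # ReLU as activation (assumption)

A_hat = np.full((2, 2), 0.5)      # normalized adjacency of a two-node graph
X = np.array([[1.0],
              [3.0]])
out = lgcn(X, A_hat, W=np.array([[1.0]]), b=0.0, alpha=0.5)
```

With `alpha=0.5`, each output is halfway between the neighborhood average (2.0) and the node's own value, which is the balance between local and neighborhood features that the APPNP-style update is meant to provide.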

FIGURE 2. Local spatial process. It consists of a fully connected layer and a GCN layer and is used to capture the local spatial correlation.

2) Global Spatial Correlation Component

Spatial correlation does not exist only between neighboring sensors. Across the whole road network, it also exists between sensors separated by long distances, so the global spatial correlation of the road network is expressed implicitly. The Pearson correlation coefficient is used to analyze the correlation between the data of the sensors in the studied network. A correlation threshold $k$ selects highly correlated sensors: if the correlation is greater than $k$, the value is kept; otherwise, it is set to $0$. This yields the correlation matrix $C$, which is then used to aggregate the features of highly correlated sensors through GCN-style convolution.

The correlation between two sensors is computed with the Pearson correlation coefficient:
$C_{ij}=\frac{∑_{t=1}^T(x_t^i-\bar X_i)(x_t^j-\bar X_j)}{\sqrt{∑_{t=1}^T(x_t^i-\bar X_i)^2}\sqrt{∑_{t=1}^T(x_t^j-\bar X_j)^2}} \tag{7}$
where $X_i=(x_1^i,x_2^i,…,x_t^i)$ is the traffic flow feature series of sensor $i$ and $\bar X_i$ is the mean of $X_i$. Similarly, $X_j=(x_1^j,x_2^j,…,x_t^j)$ is the traffic flow feature series of sensor $j$ and $\bar X_j$ is the mean of $X_j$.
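Eq. (7) together with the threshold rule described above can be sketched with `numpy.corrcoef`. The function name `correlation_matrix`, the toy series, and the threshold value are assumptions made for illustration.

```python
import numpy as np

def correlation_matrix(X, k):
    """Thresholded Pearson correlation matrix.

    X : (num_steps, num_sensors) matrix, one column per sensor
    k : correlation threshold; entries not greater than k are set to 0
    """
    C = np.corrcoef(X, rowvar=False)   # columns are treated as variables
    return np.where(C > k, C, 0.0)

# Three toy sensors: the second rises with the first, the third falls.
X = np.array([[1.0, 2.0, 4.0],
              [2.0, 4.0, 3.0],
              [3.0, 6.0, 2.0],
              [4.0, 8.0, 1.0]])
C = correlation_matrix(X, k=0.8)
```

Strongly negative correlations are discarded by the rule as stated, since only values greater than $k$ are kept.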

The node relationships described by the correlation matrix form a directed weighted graph, as shown in Fig. 3. The connections between nodes carry the influence weights, and the directions indicate the influence relationships. Convolving the correlation matrix with the feature matrix aggregates the features of highly correlated nodes, which mines the global spatial correlation. The calculation rule of the global graph convolutional network based on the correlation matrix is therefore:
$GGCN(X,C)=σ(CXW_G^{(1)}) \tag{8}$
where $W_G^{(1)}$ is the weight matrix of the global graph convolutional network and $GGCN(\cdot)$ is the output of the global spatial correlation component.
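Eq. (8) has the same shape as a GCN layer, with the correlation matrix $C$ used directly in place of the normalized adjacency. A minimal sketch follows. The function name `ggcn`, ReLU as the activation, and the toy matrices are assumptions.

```python
import numpy as np

def ggcn(X, C, W):
    """Global graph convolution of Eq. (8): activation(C X W)."""
    return np.maximum(C @ X @ W, 0.0)   # ReLU as activation (assumption)

# Toy correlation matrix: row i holds the weights used to build node i's output.
C = np.array([[1.0, 0.0],
              [0.5, 1.0]])
X = np.array([[2.0],
              [4.0]])                    # one feature per node
out = ggcn(X, C, W=np.array([[1.0]]))
```

Here the first node keeps only its own feature, while the second also receives half of the first node's feature, regardless of how far apart the two sensors are geographically.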

FIGURE 3. Global spatial process. It models the global spatial correlation between distant roads. The process includes the construction of the correlation matrix and a GCN layer.

D. Temporal Correlation

GRU is a mainstream neural network for time series prediction. It mitigates the exploding and vanishing gradient problems of plain RNNs. A GRU network contains three parts: the input layer, the hidden layer, and the output layer. The core of the algorithm is the computation inside the unit block of the hidden layer, as shown in Fig. 4.

FIGURE 4. Gated Recurrent Unit network. The unit combines the values at the current moment with the output of the previous moment to capture the temporal correlation.

The local and global spatial correlation outputs are fed into separate GRUs. Taking the local spatial correlation output as an example, the GRU calculation rules are as follows:
$\begin{alignedat}{4}&r_t^l=σ(W_r^l [LGCN(X,A),h_{t-1}^l ]+b_r^l) &(9)\\&z_t^l=σ(W_z^l [LGCN(X,A),h_{t-1}^l ]+b_z^l) &(10)\\ &\tilde h_t^l=\tanh(W_{\tilde h}^l [LGCN(X,A),(r_t^l*h_{t-1}^l)]+b_{\tilde h}^l) &(11)\\&h_t^l=z_t^l*h_{t-1}^l+(1-z_t^l )*\tilde h_t^l &(12)\end{alignedat}$

where $r_t^l$ is the reset gate at time $t$, $z_t^l$ is the update gate, and $W_r^l$ and $b_r^l$ are the weight matrix and bias of the reset gate, respectively. $h_{t-1}^l$ is the output of the hidden layer at the previous moment.

For a given time slice, the unit first concatenates the hidden-layer output $h_{t-1}^l$ of the previous moment with the input $LGCN(X, A)$ of the current moment. The sigmoid function then maps the result into $[0, 1]$, giving the gate signals $r_t^l$ and $z_t^l$. The network uses these gate signals to selectively forget and retain the information in $h_{t-1}^l$ and $LGCN(X, A)$. In this way, the GRU keeps the traffic information of the previous moment while combining it with the traffic context of the current moment, and thereby captures the temporal correlation.
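Equations (9) to (12) can be traced in a few lines of NumPy. The sketch below is illustrative only: the function name `gru_step`, the parameter tuple layout, and the row-vector convention (`[x, h] @ W`) are assumptions made here, not details from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU update following Eqs. (9)-(12).

    x_t    : (N, F) input at the current step, e.g. the LGCN output
    h_prev : (N, H) hidden state from the previous step
    params : (W_r, b_r, W_z, b_z, W_h, b_h), each W of shape (F + H, H)
    """
    W_r, b_r, W_z, b_z, W_h, b_h = params
    concat = np.concatenate([x_t, h_prev], axis=-1)
    r = sigmoid(concat @ W_r + b_r)                       # reset gate, Eq. (9)
    z = sigmoid(concat @ W_z + b_z)                       # update gate, Eq. (10)
    cand_in = np.concatenate([x_t, r * h_prev], axis=-1)
    h_tilde = np.tanh(cand_in @ W_h + b_h)                # candidate state, Eq. (11)
    return z * h_prev + (1.0 - z) * h_tilde               # new state, Eq. (12)

# All-zero parameters make the result easy to verify by hand.
F, H = 3, 4
zeros_W, zeros_b = np.zeros((F + H, H)), np.zeros(H)
params = (zeros_W, zeros_b, zeros_W, zeros_b, zeros_W, zeros_b)
h = gru_step(np.ones((2, F)), np.ones((2, H)), params)
```

With zero weights both gates equal 0.5 and the candidate state is 0, so the new hidden state is exactly half of the previous one. In Eq. (12) as written, a larger $z_t^l$ keeps more of $h_{t-1}^l$.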

E. T-LGGCN Model

To capture the spatial-temporal correlation of traffic flow, and the spatial correlation in particular, a global spatial-temporal component and a local spatial-temporal component are constructed, as shown in Fig. 1.

For the global spatial-temporal component, the global correlation matrix $C$ is first calculated from the full set of feature data. The input $X_T^S$ and $C$ are then fed into the global spatial correlation component to obtain $GGCN(\cdot)$, which is passed to the GRU to extract the temporal correlation and produce the output $h_t^g$ of this component.

Next, the local spatial-temporal component is calculated. The fully connected layer extracts the node features of the input $X_T^S$, and the GCN aggregates and propagates the spatial features. The output $LGCN(\cdot)$ is fed directly into the GRU to obtain the output $h_t^l$ of this component.

Equation 13 sums the outputs of the global and local spatial-temporal components and feeds the sum into the dense layer to produce the prediction:
$y_{pre}=Dense(h_t^l+h_t^g) \tag{13}$
where $h_t^l$ is the output of the local spatial-temporal component and $h_t^g$ is the output of the global spatial-temporal component.
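The fusion in Eq. (13) is an element-wise sum followed by a linear map. In the sketch below, the function name `fuse_and_predict`, the absence of an output activation, and the toy values are assumptions.

```python
import numpy as np

def fuse_and_predict(h_local, h_global, W_dense, b_dense):
    """Eq. (13): sum the two component outputs, then apply a dense layer.

    h_local, h_global : (N, H) hidden states from the two GRUs
    W_dense, b_dense  : dense-layer parameters; W_dense has shape (H, steps),
                        where steps is the number of predicted time steps
    """
    return (h_local + h_global) @ W_dense + b_dense

h_local = np.array([[1.0, 2.0]])
h_global = np.array([[3.0, 4.0]])
y_pre = fuse_and_predict(h_local, h_global,
                         W_dense=np.array([[1.0], [1.0]]), b_dense=0.0)
```

Summation requires both components to produce hidden states of the same shape, so the two GRUs would need the same hidden size under this reading.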

During training, the loss function of the model is defined as follows:
$Loss=\Vert y_{pre}-y_{true}\Vert+βL_{reg} \tag{14}$
where $L_{reg}$ is an $\text{L2}$ regularization term introduced to avoid overfitting, $\beta$ is a hyperparameter, and $y_{pre}$ and $y_{true}$ are the predicted value of the model and the true value of the traffic flow, respectively.
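Eq. (14) can be evaluated as in the sketch below. This section does not say which norm is used for the error term or exactly how $L_{reg}$ is formed, so the Euclidean norm and a sum of squared weights are assumptions, as are the function name `loss_fn` and the toy numbers.

```python
import numpy as np

def loss_fn(y_pre, y_true, weights, beta):
    """Eq. (14): prediction error plus a weighted L2 penalty.

    weights : list of parameter arrays included in the penalty (assumption)
    beta    : regularization strength
    """
    error = np.linalg.norm(y_pre - y_true)            # Euclidean norm (assumption)
    l_reg = sum(np.sum(w ** 2) for w in weights)      # sum of squared weights
    return error + beta * l_reg

loss = loss_fn(y_pre=np.array([3.0, 4.0]),
               y_true=np.array([0.0, 0.0]),
               weights=[np.array([1.0, 2.0])],
               beta=0.1)
```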

Algorithm 1 outlines the training process of the T-LGGCN model.

References

[46] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, May 2013, doi: 10.1109/MSP.2012.2235192.

[47] J. Klicpera, A. Bojchevski, and S. Günnemann, "Predict then propagate: Graph neural networks meet personalized PageRank," in Proc. 7th Int. Conf. Learn. Represent. (ICLR), 2019, pp. 1–15.