[Paper Reading] Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting (4)

Original paper: https://ojs.aaai.org/index.php/AAAI/article/view/3881
Original code: https://github.com/wanhuaiyu/ASTGCN


5. Experiments

In order to evaluate the performance of our model, we carried out comparative experiments on two real-world highway traffic datasets.

Datasets

We validate our model on two highway traffic datasets from California, PeMSD4 and PeMSD8. The datasets are collected by the Caltrans Performance Measurement System (PeMS) (Chen et al. 2001) in real time every 30 seconds, and the raw data are aggregated into 5-minute intervals. The system has more than 39,000 detectors deployed on the highways in the major metropolitan areas of California. Geographic information about the sensor stations is recorded in the datasets. Three kinds of traffic measurements are considered in our experiments: total flow, average speed, and average occupancy.

PeMSD4 refers to the traffic data in the San Francisco Bay Area, containing 3848 detectors on 29 roads. The time span of this dataset is from January to February 2018. We use the data from the first 50 days as the training set and the remainder as the test set.

PeMSD8 is the traffic data in San Bernardino from July to August 2016, containing 1979 detectors on 8 roads. The data from the first 50 days are used as the training set and the data from the last 12 days as the test set.
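As a rough illustration of this day-based split, the sketch below cuts a detector-by-time array into training and test sets by whole days. The helper `split_by_days`, the array layout, and the use of 288 samples per day (see Preprocessing below) are assumptions for illustration, not the authors' released code.

```python
import numpy as np

def split_by_days(data, points_per_day=288, train_days=50):
    """Split a (num_timesteps, num_detectors, num_features) array into
    train/test sets by whole days, keeping the first `train_days` for training.
    Assumes 5-minute aggregation, i.e. 288 samples per detector per day."""
    cut = train_days * points_per_day
    return data[:cut], data[cut:]

# Example with random data shaped like PeMSD8 (62 days, 170 detectors, 3 measurements)
data = np.random.rand(62 * 288, 170, 3)
train, test = split_by_days(data)
print(train.shape, test.shape)  # (14400, 170, 3) (3456, 170, 3)
```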

Preprocessing

We remove some redundant detectors to ensure that the distance between any pair of adjacent detectors is longer than 3.5 miles. This leaves 307 detectors in PeMSD4 and 170 detectors in PeMSD8. The traffic data are aggregated every 5 minutes, so each detector contributes 288 data points per day. Missing values are filled by linear interpolation. In addition, the data are transformed by zero-mean normalization, $x' = x - \mathrm{mean}(x)$, so that the average is 0.
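A minimal sketch of these two preprocessing steps, linear interpolation of missing values followed by zero-mean normalization; the pandas-based `preprocess` helper is illustrative and not taken from the released code.

```python
import numpy as np
import pandas as pd

def preprocess(series):
    """Fill missing values by linear interpolation, then subtract the mean
    so that the resulting series is zero-centered: x' = x - mean(x)."""
    s = pd.Series(series).interpolate(method="linear", limit_direction="both")
    return (s - s.mean()).to_numpy()

raw = np.array([10.0, np.nan, 14.0, 16.0, np.nan, 20.0])
print(preprocess(raw))  # gaps filled, and the mean of the result is 0
```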

Settings

We implemented the ASTGCN model based on the MXNet framework (https://mxnet.apache.org/). According to Kipf and Welling (2017), we test the number of terms of the Chebyshev polynomial $K \in \{1, 2, 3\}$. As $K$ becomes larger, the forecasting performance improves slightly, and the same holds for the kernel size in the temporal dimension. Considering the trade-off between computational efficiency and the degree of improvement, we set $K = 3$ and the kernel size along the temporal dimension to 3. In our model, all the graph convolution layers use 64 convolution kernels, and all the temporal convolution layers use 64 convolution kernels; the time span of the data is adjusted by controlling the stride of the temporal convolutions. The lengths of the three segments are set to $T_h = 24$, $T_d = 12$, $T_w = 24$, and the size of the prediction window is $T_p = 12$; that is, we aim to predict the traffic flow over the next hour. The mean square error (MSE) between the predicted value and the ground truth is used as the loss function and minimized by back-propagation. During the training phase, the batch size is 64 and the learning rate is 0.0001. In addition, to verify the impact of the spatial-temporal attention mechanism proposed here, we also design a degraded version of ASTGCN, named Multi-Component Spatial-Temporal Graph Convolution Networks (MSTGCN), which removes the spatial-temporal attention. The settings of MSTGCN are otherwise identical to those of ASTGCN.
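To make the Chebyshev setting concrete, here is a sketch of how the first $K = 3$ Chebyshev terms of the scaled graph Laplacian can be computed with the recurrence $T_0 = I$, $T_1 = \tilde{L}$, $T_k = 2\tilde{L}T_{k-1} - T_{k-2}$. The function names are mine, and the scaling $\tilde{L} = 2L/\lambda_{max} - I$ is the standard choice, not necessarily the exact code in the released implementation.

```python
import numpy as np

def scaled_laplacian(W):
    """Scaled graph Laplacian L~ = 2 L / lambda_max - I from an adjacency matrix W."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    lambda_max = np.linalg.eigvalsh(L).max()
    return 2 * L / lambda_max - np.identity(W.shape[0])

def cheb_polynomials(L_tilde, K=3):
    """First K Chebyshev terms: T_0 = I, T_1 = L~, T_k = 2 L~ T_{k-1} - T_{k-2}."""
    N = L_tilde.shape[0]
    terms = [np.identity(N), L_tilde]
    for _ in range(2, K):
        terms.append(2 * L_tilde @ terms[-1] - terms[-2])
    return terms[:K]

# Toy 4-node road graph
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
T = cheb_polynomials(scaled_laplacian(W), K=3)
print(len(T), T[2].shape)  # 3 terms, each of shape (4, 4)
```

Each of the $K$ terms multiplies the graph signal in a Chebyshev graph convolution, so a larger $K$ enlarges the spatial receptive field at a higher computational cost, which is the trade-off behind fixing $K = 3$ here.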

Baselines

We compare our model with the following eight baselines:

  • HA: Historical Average method. Here, we use the average value of the last 12 time slices to predict the next value (a minimal sketch of this baseline is given after the list).
  • ARIMA (Williams and Hoel 2003): Auto-Regressive Integrated Moving Average, a well-known time series analysis method for predicting future values.
  • VAR (Zivot and Wang 2006): Vector Auto-Regression, a more advanced time series model, which can capture the pairwise relationships among all traffic flow series.
  • LSTM (Hochreiter and Schmidhuber 1997): Long Short-Term Memory network, a special RNN model.
  • GRU (Chung et al. 2014): Gated Recurrent Unit network, a special RNN model.
  • STGCN (Li et al. 2018): A spatial-temporal graph convolution model based on the spatial method.
  • GLU-STGCN (Yu, Yin, and Zhu 2018): A graph convolution network with a gating mechanism, specially designed for traffic forecasting.
  • GeoMAN (Liang et al. 2018): A multi-level attention-based recurrent neural network model proposed for the geo-sensory time series prediction problem.
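As referenced in the HA item above, a minimal sketch of that baseline: the prediction for the next time slice is simply the mean of the previous 12 slices. The `historical_average` helper is an illustrative implementation, not the authors' code.

```python
import numpy as np

def historical_average(history, window=12):
    """Predict the next value of each series as the mean of the last `window` time slices.
    history: array of shape (num_timesteps, num_detectors)
    returns: array of shape (num_detectors,)"""
    return history[-window:].mean(axis=0)

history = np.random.rand(288, 170)       # one day of 5-minute flow for 170 detectors
next_step = historical_average(history)  # one-step-ahead HA forecast
print(next_step.shape)                   # (170,)
```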

Root mean square error (RMSE) and mean absolute error (MAE) are used as the evaluation metrics.
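A small sketch of these two metrics, assuming `y_true` and `y_pred` are arrays of observed and predicted flow values:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([120.0, 98.0, 110.0, 135.0])
y_pred = np.array([115.0, 102.0, 108.0, 140.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```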

Comparison and Result Analysis

We compare our models with the eight baseline methods on PeMSD4 and PeMSD8. Table 1 shows the average traffic flow prediction performance over the next hour.

It can be seen from Table 1 that our ASTGCN achieves the best performance on both datasets in terms of all evaluation metrics. The prediction results of the traditional time series analysis methods are usually not ideal, demonstrating their limited ability to model nonlinear and complex traffic data. By comparison, methods based on deep learning generally obtain better prediction results. Among them, the models that simultaneously take both the temporal and spatial correlations into account, including STGCN, GLU-STGCN, GeoMAN, and the two versions of our model, are superior to traditional deep learning models such as LSTM and GRU. Besides, GeoMAN performs better than STGCN and GLU-STGCN, indicating that the multi-level attention mechanisms applied in GeoMAN are effective in capturing the dynamic changes of traffic data. Our MSTGCN, without any attention mechanism, achieves better results than the previous state-of-the-art models, proving the advantage of our model in describing the spatial-temporal features of highway traffic data. Combined with the spatial-temporal attention mechanism, our ASTGCN further reduces the forecasting errors.

Table 1: Average performance comparison of different approaches on PeMSD4 and PeMSD8.



Figure 6: Performance changes of different methods as the forecasting interval increases.


Fig. 6 shows how the prediction performance of the various methods changes as the prediction interval increases. Overall, as the prediction interval becomes longer, prediction becomes harder and the errors grow. As can be seen from the figure, the methods that only take the temporal correlation into account, such as HA, ARIMA, LSTM, and GRU, can achieve good results in short-term prediction, but their accuracy drops sharply as the prediction interval increases. By comparison, the performance of VAR degrades more slowly, mainly because VAR can simultaneously consider the spatial-temporal correlations, which are more important in long-term prediction. However, when the scale of the traffic network becomes larger, i.e., more time series are considered in the model, the prediction error of VAR increases; as shown in Fig. 6, its performance on PeMSD4 is worse than on PeMSD8. The errors of the deep learning methods increase slowly as the prediction interval grows, and their overall performance is good. Our ASTGCN model achieves the best prediction performance almost all the time. Especially for long-term prediction, the gap between ASTGCN and the other baselines is more significant, showing that combining the attention mechanism with graph convolution can better mine the dynamic spatial-temporal patterns of traffic data.

Figure 7: The attention matrix obtained from the spatial attention mechanism.


To investigate the role of the attention mechanisms in our model intuitively, we perform a case study: we pick out a sub-graph with 10 detectors from PeMSD8 and show the average spatial attention matrix among the detectors over the training set. As shown on the right side of Fig. 7, in the spatial attention matrix the $i$-th row represents the correlation strength between each detector and the $i$-th detector. For instance, looking at the last row, we can see that the traffic flows at the 9th detector are closely related to those at the 3rd and 8th detectors. This is reasonable, since these three detectors are spatially close on the real traffic network, as shown on the left side of Fig. 7. Hence, our model not only achieves the best forecasting performance but also offers an interpretability advantage.
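As a hedged illustration of how such a spatial attention matrix can be read, the sketch below averages per-batch attention matrices and lists the detectors most strongly related to a chosen one; `attn_batches` and the detector indexing are hypothetical stand-ins, not the case-study data itself.

```python
import numpy as np

# Hypothetical stack of spatial attention matrices, one per training batch: (num_batches, N, N)
N = 10
attn_batches = np.random.rand(100, N, N)

S_avg = attn_batches.mean(axis=0)        # average spatial attention over the training set
row = S_avg[9]                           # row 9: strength between each detector and detector 9
top_related = np.argsort(row)[::-1][:3]  # indices of the most strongly related detectors
print(top_related)
```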

6. Conclusion and Future Work

In this paper, a novel attention-based spatial-temporal graph convolution model called ASTGCN is proposed and successfully applied to traffic flow forecasting. The model combines the spatial-temporal attention mechanism with spatial-temporal convolution, namely graph convolutions in the spatial dimension and standard convolutions in the temporal dimension, to simultaneously capture the dynamic spatial-temporal characteristics of traffic data. Experiments on two real-world datasets show that the forecasting accuracy of the proposed model is superior to that of existing models. The code has been released at https://github.com/wanhuaiyu/ASTGCN.

In reality, highway traffic flow is affected by many external factors, such as weather and social events. In the future, we will take some external influencing factors into account to further improve the forecasting accuracy. Since ASTGCN is a general spatial-temporal forecasting framework for graph-structured data, it can also be applied to other practical applications, such as estimating time of arrival.

References

Chen, C.; Petty, K.; Skabardonis, A.; Varaiya, P.; and Jia, Z. 2001. Freeway performance measurement system: mining loop detector data. Transportation Research Record: Journal of the Transportation Research Board (1748):96–102.

Kipf, T. N., and Welling, M. 2017. Semi-supervised classification with graph convolutional networks. International Conference on Learning Representations.

Williams, B. M., and Hoel, L. A. 2003. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. Journal of transportation engineering 129(6):664–672.

Zivot, E., and Wang, J. 2006. Vector autoregressive models for multivariate time series. Modeling Financial Time Series with S-PLUS® 385–429.

Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.

Chung, J.; Gulcehre, C.; Cho, K.; and Bengio, Y. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In NIPS 2014 Workshop on Deep Learning.

Li, C.; Cui, Z.; Zheng, W.; Xu, C.; and Yang, J. 2018. Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition. In AAAI Conference on Artificial Intelligence, 3482–3489.

Yu, B.; Yin, H.; and Zhu, Z. 2018. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In International Joint Conference on Artificial Intelligence, 3634–3640.

Liang, Y.; Ke, S.; Zhang, J.; Yi, X.; and Zheng, Y. 2018. GeoMAN: Multi-level Attention Networks for Geo-sensory Time Series Prediction. In International Joint Conference on Artificial Intelligence, 3428–3434.
