Social LSTM: Human Trajectory Prediction in Crowded Spaces
published in: CVPR 2016
Why
The problem of trajectory prediction can be viewed as a sequence generation task, where we are interested in predicting the future trajectory of people based on their past positions.
Pioneering works have demonstrated that inferred knowledge about the semantics of the static environment (e.g., the location of sidewalks) helps predict the future trajectories of pedestrians more accurately, and that modeling human-human interactions increases robustness and accuracy in multi-target tracking problems.
However, most existing works are limited by the following two assumptions:
- They use hand-crafted functions to model “interactions” for specific settings rather than inferring them in a data-driven fashion.
- They focus on modeling interactions among people in close proximity to each other.
How
This work extends Long Short-Term Memory networks (LSTMs) to human trajectory prediction. Because independent LSTMs do not capture dependencies between multiple correlated sequences, the paper proposes a “Social” pooling layer that allows the LSTMs of spatially proximal sequences to share their hidden states with each other.
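The social pooling step can be sketched as follows. For each person i, a grid-shaped tensor is built whose cell (m, n) holds the sum of the previous hidden states of every neighbor j whose position relative to i falls in that cell. This minimal pure-Python sketch assumes a `grid_size` x `grid_size` grid spanning a square neighborhood of side `neighborhood_size` centered on person i; the function and parameter names, and the convention that `m` indexes the x offset and `n` the y offset, are illustrative assumptions, and the embedding layer that consumes the tensor in the full model is omitted.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float]


def social_pooling(
    i: int,
    positions: Sequence[Position],
    hidden_states: Sequence[Sequence[float]],
    neighborhood_size: float,
    grid_size: int,
) -> List[List[List[float]]]:
    """Return a grid_size x grid_size x hidden_dim social tensor for person i.

    Cell [m][n] sums the hidden states of every other person whose offset
    from person i lands in that cell (m from the x offset, n from the y offset).
    """
    hidden_dim = len(hidden_states[i])
    cell = neighborhood_size / grid_size
    half = neighborhood_size / 2.0
    tensor = [[[0.0] * hidden_dim for _ in range(grid_size)] for _ in range(grid_size)]
    xi, yi = positions[i]
    for j, (xj, yj) in enumerate(positions):
        if j == i:
            continue  # a person is not their own neighbor
        m = math.floor((xj - xi + half) / cell)
        n = math.floor((yj - yi + half) / cell)
        if 0 <= m < grid_size and 0 <= n < grid_size:
            # Hidden states that share a cell are summed, not averaged.
            for d in range(hidden_dim):
                tensor[m][n][d] += hidden_states[j][d]
    return tensor


if __name__ == "__main__":
    pos = [(0.0, 0.0), (1.0, 1.0), (1.2, 1.2), (10.0, 10.0)]
    hid = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
    t = social_pooling(0, pos, hid, neighborhood_size=4.0, grid_size=4)
    print(t[3][3])  # [1.0, 2.0]; person 3 lies outside the neighborhood
```

Persons 1 and 2 fall in the same cell, so their states are added together; person 3 is too far away and contributes nothing.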
Model
Experiments
Datasets: ETH (contains ETH and Hotel) and UCY (contains ZARA-01, ZARA-02 and UCY)
Evaluation metrics:
- Average displacement error: the mean squared error (MSE) over all estimated points of a trajectory and the corresponding true points.
- Final displacement error: the distance between the predicted final destination and the true final destination at the end of the prediction period.
- Average non-linear displacement error: the MSE over the non-linear regions of a trajectory.
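A minimal sketch of the three metrics for a single predicted trajectory. Although the metrics above are described in terms of MSE, many public implementations average the Euclidean (L2) distance between predicted and true points instead; this sketch follows the L2 convention, which is an assumption. The boolean `nonlinear_mask` marking non-linear regions is also assumed to be supplied by the caller, since how those regions are detected is not covered here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _dist(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ade(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Average displacement error: mean L2 distance over all predicted steps."""
    return sum(_dist(p, q) for p, q in zip(pred, truth)) / len(truth)


def fde(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Final displacement error: L2 distance at the last predicted step."""
    return _dist(pred[-1], truth[-1])


def nonlinear_ade(
    pred: Sequence[Point], truth: Sequence[Point], nonlinear_mask: Sequence[bool]
) -> float:
    """ADE restricted to steps flagged as non-linear (caller-provided mask)."""
    errs = [_dist(p, q) for p, q, nl in zip(pred, truth, nonlinear_mask) if nl]
    return sum(errs) / len(errs) if errs else 0.0
```

For example, a prediction that is exact except for a 5-unit miss at the last of three steps gives FDE = 5 and ADE = 5/3.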
Compared models and results
Convolutional Social Pooling for Vehicle Trajectory Prediction
published in: CVPR Workshops 2018
Why
Driver behavior is inherently multi-modal: under the same traffic circumstances, a driver could make any one of several decisions.
All previous instances of social pooling apply a fully connected layer to the social tensor. This is inefficient because it breaks up the spatial structure of the social tensor: in a fully connected layer, cells adjacent to each other in space are treated no differently from cells far away from each other. This can hurt generalization to a test set, especially when agents can appear in many different spatial configurations.
How
Convolutional social pooling: This paper applies convolutional and max-pooling layers, instead of a fully connected layer, to the social tensor of LSTM states that encode the past motion of neighboring vehicles. The equivariance of the convolutional layers can be expected to help learn locally useful features within the spatial grid of the social tensor, and the max-pooling layer can be expected to add local translational invariance.
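A minimal pure-Python sketch of the idea: neighbor LSTM states are placed in a spatial grid (the social tensor), and a convolution followed by max pooling replaces the fully connected layer, so nearby cells are combined locally while distant cells are not mixed in a single step. The 13 x 3 grid, the 2-channel states, the single all-ones 3 x 3 kernel, the ReLU, and the 2 x 1 pooling window used below are illustrative assumptions, not the paper's trained layers; a real model would use a deep-learning framework with many learned filters.

```python
from typing import Dict, List, Tuple

Grid = List[List[List[float]]]  # [rows][cols][channels]


def build_social_tensor(
    neighbors: Dict[Tuple[int, int], List[float]], rows: int, cols: int, channels: int
) -> Grid:
    """Place each neighbor's LSTM state in its grid cell; empty cells stay zero."""
    grid = [[[0.0] * channels for _ in range(cols)] for _ in range(rows)]
    for (r, c), h in neighbors.items():
        grid[r][c] = list(h)
    return grid


def conv2d_relu(grid: Grid, kernel: Grid) -> List[List[float]]:
    """Single-output-channel 'valid' convolution (cross-correlation, as in
    deep-learning libraries) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(grid) - kh + 1):
        row = []
        for j in range(len(grid[0]) - kw + 1):
            s = sum(
                w * x
                for di in range(kh)
                for dj in range(kw)
                for w, x in zip(kernel[di][dj], grid[i + di][j + dj])
            )
            row.append(max(s, 0.0))
        out.append(row)
    return out


def max_pool(fmap: List[List[float]], ph: int, pw: int) -> List[List[float]]:
    """Non-overlapping max pooling; trailing rows/cols that don't fill a window are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(ph) for b in range(pw))
            for j in range(0, len(fmap[0]) - pw + 1, pw)
        ]
        for i in range(0, len(fmap) - ph + 1, ph)
    ]


if __name__ == "__main__":
    grid = build_social_tensor({(6, 1): [1.0, 2.0]}, rows=13, cols=3, channels=2)
    kernel = [[[1.0, 1.0] for _ in range(3)] for _ in range(3)]
    print(max_pool(conv2d_relu(grid, kernel), 2, 1))
    # [[0.0], [0.0], [3.0], [3.0], [0.0]]
```

A single occupied cell only activates the convolution windows that cover it, so the response stays localized in the grid instead of being spread over every output as a fully connected layer would allow.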
Maneuver-based decoder: The LSTM decoder used in this paper generates a probability distribution over future motion for each of six maneuver classes and assigns a probability to each maneuver class. This accounts for the multi-modal nature of vehicle motion.
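The decoder's output can be read as a mixture: the probability of a future trajectory Y is the sum over maneuver classes m_i of P(m_i | X) · P(Y | m_i, X). The sketch below assumes, for illustration, that the six classes are the combinations of 3 lateral and 2 longitudinal maneuvers whose probabilities come from two independent softmaxes, and that each class predicts a bivariate Gaussian (means, standard deviations, correlation) per future step; the function names and parameter layout are hypothetical. The negative log-likelihood it computes is the kind of quantity reported as NLL.

```python
import math
from typing import List, Sequence, Tuple

# (mu_x, mu_y, sigma_x, sigma_y, rho) for one future time step
Gauss2D = Tuple[float, float, float, float, float]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def maneuver_probs(lat_logits: Sequence[float], lon_logits: Sequence[float]) -> List[float]:
    """Assumed factorization P(lat, lon) = P(lat) * P(lon): 3 x 2 = 6 classes."""
    return [pl * po for pl in softmax(lat_logits) for po in softmax(lon_logits)]


def gauss2d_logpdf(x: float, y: float, g: Gauss2D) -> float:
    """Log density of a bivariate Gaussian with correlation rho."""
    mx, my, sx, sy, rho = g
    dx, dy = (x - mx) / sx, (y - my) / sy
    one_m_r2 = 1.0 - rho * rho
    z = dx * dx + dy * dy - 2.0 * rho * dx * dy
    return -math.log(2.0 * math.pi * sx * sy * math.sqrt(one_m_r2)) - z / (2.0 * one_m_r2)


def mixture_nll(
    traj: Sequence[Tuple[float, float]],
    params: Sequence[Sequence[Gauss2D]],
    probs: Sequence[float],
) -> float:
    """Negative log-likelihood of a future trajectory under the maneuver mixture.

    params[i][t] is the Gaussian for maneuver i at step t; steps are treated as
    independent given the maneuver, and classes are combined with log-sum-exp.
    """
    logs = [
        math.log(p) + sum(gauss2d_logpdf(x, y, g) for (x, y), g in zip(traj, comp))
        for p, comp in zip(probs, params)
        if p > 0.0
    ]
    top = max(logs)
    return -(top + math.log(sum(math.exp(v - top) for v in logs)))
```

Combining classes in log space with log-sum-exp avoids underflow when a trajectory has many steps and each per-step density is small.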
Model
Experiments
Datasets: I-80 and US-101 data of the Next Generation Simulation (NGSIM)
Evaluation metrics: root mean square error (RMSE) and negative log-likelihood (NLL)
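On NGSIM, RMSE is usually reported per prediction horizon (e.g., 1 s to 5 s ahead) rather than as one number per trajectory. A minimal sketch, assuming predictions and ground truth are given as lists of equally long (x, y) trajectories and that the error at each step is the Euclidean distance; the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rmse_per_horizon(
    preds: Sequence[Sequence[Point]], truths: Sequence[Sequence[Point]]
) -> List[float]:
    """RMSE at each future step t, taken over all trajectories in the set."""
    horizon = len(truths[0])
    out = []
    for t in range(horizon):
        # Squared Euclidean error of every trajectory at step t.
        sq = [
            (p[t][0] - g[t][0]) ** 2 + (p[t][1] - g[t][1]) ** 2
            for p, g in zip(preds, truths)
        ]
        out.append(math.sqrt(sum(sq) / len(sq)))
    return out
```

Averaging squared errors before taking the square root means a few large misses at a given horizon raise that horizon's RMSE more than they would raise a mean distance.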
Compared models and results:
A Dynamic and Static Context-Aware Attention Network for Trajectory Prediction
published in: ISPRS International Journal of Geo-Information. 2021, 10(5)
Why
Traditional models treat trajectory prediction as a simple sequence prediction task. Ignoring inter-vehicle interactions and the influence of the environment degrades the performance of these models on real-world datasets.
How
This paper proposes a novel Dynamic and Static Context-aware Attention Network, named DSCAN, which uses an attention mechanism to dynamically decide which surrounding vehicles are more important at each moment. In addition, the method is equipped with a constraint network that takes static environment information into account.
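The core idea of weighting surrounding vehicles by importance can be sketched with scaled dot-product attention: each neighbor's key is scored against the target vehicle's encoding, the scores are normalized with a softmax into importance weights, and the neighbors' features are combined with those weights. The scoring function, vector sizes, and names here are assumptions for illustration; DSCAN's exact attention formulation and its constraint network are not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def attend(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return (weights, context) for one target vehicle.

    query: target vehicle encoding; keys/values: one row per surrounding vehicle.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [
        sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))
    ]
    return weights, context
```

A neighbor whose key aligns with the query receives the larger weight; recomputing the weights at every time step is what makes the importance assignment dynamic.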
Model
Experiments
Datasets: I-80 and US-101 data of the Next Generation Simulation (NGSIM)
Evaluation metrics: root mean square error (RMSE)
Compared models and results
Vehicle Trajectory Prediction Using Hierarchical Graph Neural Network for Considering Interaction among Multimodal Maneuvers
published in: Sensors 2021, 21(16)
Why
Vehicle trajectory prediction is a challenging problem for the following reasons:
- The subsequent maneuver of a vehicle is uncertain because the driver’s destination is unknown. Therefore, a trajectory prediction algorithm must be capable of predicting all possible maneuvers.
- Vehicle movements are interdependent. Therefore, a trajectory prediction algorithm should consider the interactions among the multiple maneuvers of vehicles.
However, previous studies have considered only the interactions among observed trajectories, because subsequent maneuvers are unobservable and the number of possible maneuver combinations is large.
How
This paper proposes a hierarchical graph neural network that approximately predicts the multiple maneuvers of vehicles and considers the interactions among those maneuvers by representing their relationships in a graph structure. The proposed method consists of two hierarchical stages. The first stage approximately predicts the multiple trajectories of surrounding vehicles based on multimodal maneuvers, together with the probability of each maneuver. The second stage predicts the trajectory by considering the interactions among the predicted multiple trajectories of vehicles.
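The second-stage idea, reasoning over interactions among the predicted maneuvers of different vehicles, can be illustrated with a toy graph in which every (vehicle, maneuver) pair produced by the first stage is a node. This sketch is an assumption, not the paper's network: it connects each node to all maneuver nodes of the other vehicles, weights each edge by the first-stage probability of the neighbor's maneuver, and runs one round of message passing that concatenates a node's own feature with the probability-weighted mean of its neighbors' features. The node features stand in for encodings of the first-stage predicted trajectories; all names are hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (vehicle_id, maneuver_id)


def build_maneuver_graph(probs: Dict[int, List[float]]) -> Dict[Node, List[Tuple[Node, float]]]:
    """Link each (vehicle, maneuver) node to every maneuver node of every other
    vehicle; the edge weight is that neighbor maneuver's first-stage probability."""
    nodes = [(v, m) for v, ps in probs.items() for m in range(len(ps))]
    return {
        (v, m): [((u, k), probs[u][k]) for (u, k) in nodes if u != v]
        for (v, m) in nodes
    }


def message_pass(
    graph: Dict[Node, List[Tuple[Node, float]]], features: Dict[Node, List[float]]
) -> Dict[Node, List[float]]:
    """One round: own feature concatenated with the probability-weighted
    mean of neighbor features (zeros if a node has no neighbors)."""
    out = {}
    for node, edges in graph.items():
        own = features[node]
        total = sum(w for _, w in edges)
        if total == 0.0:
            agg = [0.0] * len(own)
        else:
            agg = [sum(w * features[n][d] for n, w in edges) / total for d in range(len(own))]
        out[node] = own + agg
    return out


if __name__ == "__main__":
    probs = {0: [0.7, 0.3], 1: [0.4, 0.6]}
    feats = {(0, 0): [1.0], (0, 1): [2.0], (1, 0): [10.0], (1, 1): [20.0]}
    print(message_pass(build_maneuver_graph(probs), feats)[(0, 0)])  # [1.0, 16.0]
```

Weighting edges by maneuver probability lets unlikely maneuvers of other vehicles contribute less to a node's interaction context, while still keeping every maneuver combination in the graph.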
Model
The proposed model consists of two modules: a maneuver-based multi-modal trajectory prediction network and an interaction-aware trajectory prediction network.
Experiments
Datasets: I-80 and US-101 data of the Next Generation Simulation (NGSIM)
Evaluation metrics: root mean square error (RMSE)
Compared models and results