Motivation
Can we learn a meaningful context representation directly from the structured HD maps?
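【CC】The paper opens with its central question: can an informative context representation (one that also covers the dynamic object list) be learned directly from structured HD map data?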
This paper focuses on behavior prediction in complex multi-agent systems, such as self-driving vehicles.
The core interest is to find a unified representation which integrates the agent dynamics, acquired by perception systems such as object detection and tracking, with the scene context, provided as prior knowledge often in the form of High Definition (HD) maps.
【CC】The goal is a representation that unifies structured HD map data with the dynamic object list produced by perception; trajectory prediction is then performed on top of this unified representation.
This paper introduces VectorNet, a hierarchical graph neural network that first exploits the spatial locality of individual road components represented by vectors and then models the high-order interactions among all components.
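【CC】Road structure (the static environment) and dynamic vehicles are both expressed as vectors; a GNN built on this representation models the interactions among the elements.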
We avoid lossy rendering and computationally intensive ConvNet encoding steps. To further boost VectorNet’s capability in learning context features, we propose a novel auxiliary task to recover the randomly masked out map entities and agent trajectories based on their context.
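【CC】Rendering the map to an image loses information, and ConvNet encoding is computationally expensive. To strengthen representation learning, the paper adds a masked-recovery auxiliary task, similar in spirit to MAE.
Design process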
For example, a lane boundary contains multiple control points that build a spline; a crosswalk is a polygon defined by several points; a stop sign is represented by a single point. All these geographic entities can be closely approximated as polylines defined by multiple control points, along with their attributes. Similarly, the dynamics of moving agents can also be approximated by polylines based on their motion trajectories. All these polylines can then be represented as sets of vectors.
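【CC】Geometrically, a lane boundary consists of multiple control points, a crosswalk is a polygon with several vertices, and a stop sign is a single point; all of these can be approximated by polylines with multiple vertices. The trajectory of a dynamic object can be approximated by a polyline in the same way. Every such polyline can be expressed as a set of vectors, which is the underlying logic of the whole vector representation.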
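To make the polyline-to-vector idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the `Vector` fields (`start`, `end`, `kind`, `polyline_id`), the helper names, and the choice to encode a single-point entity as one zero-length vector are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Vector:
    """One segment of a polyline (field layout is an assumption)."""
    start: Point
    end: Point
    kind: str         # hypothetical attribute label, e.g. "lane" or "agent"
    polyline_id: int  # index of the polyline this vector belongs to


def polyline_to_vectors(points: List[Point], kind: str, polyline_id: int) -> List[Vector]:
    """Turn consecutive control points into a list of vectors."""
    if not points:
        return []
    if len(points) == 1:
        # Assumption: a single-point entity becomes one zero-length vector.
        return [Vector(points[0], points[0], kind, polyline_id)]
    return [Vector(a, b, kind, polyline_id) for a, b in zip(points, points[1:])]


def close_polygon(points: List[Point]) -> List[Point]:
    """A polygon is a closed polyline: repeat the first vertex at the end."""
    return points + points[:1]


# Map entities and an agent trajectory all go through the same function.
lane = polyline_to_vectors([(0.0, 0.0), (5.0, 0.1), (10.0, 0.4)], "lane", 0)
crosswalk = polyline_to_vectors(
    close_polygon([(2.0, 0.0), (4.0, 0.0), (4.0, 3.0), (2.0, 3.0)]), "crosswalk", 1
)
stop_sign = polyline_to_vectors([(6.0, 1.0)], "stop_sign", 2)
agent = polyline_to_vectors([(1.0, -1.0), (1.5, -1.0), (2.1, -0.9)], "agent", 3)
```

The point of the sketch is that static map elements and moving agents end up in one homogeneous list of vectors, grouped by `polyline_id`, which is the kind of input a graph-based encoder can consume without rasterizing anything.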
Figure 1. Illustration of the rasterized rendering (left) and vectorized approach (right) to represent high-definition map and agent trajectories.
We treat each vector as a node in the graph, and