Motivation
In this paper we develop a one-stage detector and forecaster that exploits 3D point clouds produced by a LiDAR sensor together with dynamic HD maps containing semantic elements such as lanes, intersections, and traffic lights.
Our IntentNet is a fully-convolutional neural network that outputs three types of variables in a single forward pass: detection scores for the vehicle and background classes, high-level action probabilities corresponding to discrete intentions, and bounding box regressions for the current and future time steps that represent the intended trajectory.
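The three output heads can be sketched as 1x1 convolutions applied to a shared BEV feature map, so all three are produced in one forward pass. This is a minimal numpy sketch; the sizes (H, W, C, K, T) and the tiny `conv1x1` helper are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Hypothetical sizes (assumptions, not from the paper): BEV grid H x W,
# C backbone channels, K discrete intentions, T future time steps.
H, W, C, K, T = 64, 64, 32, 8, 10

rng = np.random.default_rng(0)
features = rng.standard_normal((C, H, W))  # shared backbone feature map

def conv1x1(x, out_ch, rng):
    """A 1x1 convolution: a per-cell linear map over the channel dimension."""
    w = rng.standard_normal((out_ch, x.shape[0])) * 0.01
    return np.einsum('oc,chw->ohw', w, x)

# Three heads computed over the same features in a single pass:
det_scores = conv1x1(features, 2, rng)             # vehicle vs. background logits per cell
intent_logits = conv1x1(features, K, rng)          # discrete-intention logits per cell
box_regression = conv1x1(features, (T + 1) * 4, rng)  # box params for current + T future steps
```

Each BEV cell thus carries a detection score, an intention distribution, and a trajectory of boxes, which is what makes the single-pass formulation possible.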
IntentNet is inspired by FaF [4], which performs joint detection and future prediction.
Advantages
(i) a more suitable architecture based on an early fusion of a larger number of previous LiDAR sweeps,
(ii) a parametrization of the map that allows our model to understand traffic constraints for all vehicles at once and
(iii) an improved loss function that includes a temporal discount factor to
account for the inherent ambiguity of the future.
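Point (iii) can be sketched as a per-step weighting of the future regression loss. This is an illustrative assumption: the discount `gamma` and the L1 base loss here are placeholders, not the paper's exact formulation.

```python
import numpy as np

def discounted_future_loss(pred, target, gamma=0.97):
    """pred, target: (T, D) arrays of box regressions over T future steps.
    Step t is weighted by gamma**t, so near-future errors count more than
    far-future ones, reflecting the growing ambiguity of the future."""
    T = pred.shape[0]
    weights = gamma ** np.arange(T)
    per_step = np.abs(pred - target).reshape(T, -1).mean(axis=1)  # L1 per step
    return float((weights * per_step).sum() / weights.sum())

# With a uniform per-step error of 1, the weighted average is also 1:
loss = discounted_future_loss(np.zeros((5, 4)), np.ones((5, 4)))
```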
Input parametrization
3D point cloud: we represent point clouds in bird’s eye view (BEV) as a 3D tensor, treating height as our channel dimension.
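A minimal sketch of this BEV parametrization: discretize the point cloud onto a ground-plane grid and bin height into channels, giving a 3D occupancy tensor. The ranges and resolutions below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def points_to_bev(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0),
                  z_range=(-3.0, 1.0), res=0.2, z_res=0.5):
    """points: (N, 3) array of x, y, z. Returns a (Z, H, W) binary occupancy
    tensor in bird's eye view, with height bins as the channel dimension."""
    H = int((x_range[1] - x_range[0]) / res)
    W = int((y_range[1] - y_range[0]) / res)
    Z = int((z_range[1] - z_range[0]) / z_res)
    bev = np.zeros((Z, H, W), dtype=np.float32)
    xi = ((points[:, 0] - x_range[0]) / res).astype(int)
    yi = ((points[:, 1] - y_range[0]) / res).astype(int)
    zi = ((points[:, 2] - z_range[0]) / z_res).astype(int)
    # Keep only points that fall inside the grid:
    valid = (xi >= 0) & (xi < H) & (yi >= 0) & (yi < W) & (zi >= 0) & (zi < Z)
    bev[zi[valid], xi[valid], yi[valid]] = 1.0
    return bev

pts = np.array([[10.0, 0.0, -1.0],    # inside the grid
                [200.0, 0.0, 0.0]])   # out of range, dropped
bev = points_to_bev(pts)
```

Sweeps from previous time steps can be voxelized the same way and concatenated along the channel dimension for the early fusion mentioned above.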
Dynamic maps:
We form a BEV representation of our maps by rasterization.
We represent each semantic component in the map with a binary mask (values +1 or -1). Roads and intersections are represented as filled polygons covering the whole drivable surface. Lane boundaries are parametrized as poly-lines representing the left and right boundaries of lane segments.
In total, there are 17 binary masks used as map features, resulting in a 3D tensor that represents the map.
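Rasterizing one such element can be sketched as follows: a filled polygon (e.g. drivable surface) is turned into a {+1, -1} mask on the BEV grid. The grid size and the even-odd point-in-polygon test are illustrative; the paper does not specify its rasterizer.

```python
import numpy as np

def rasterize_polygon(poly, H=100, W=100):
    """poly: list of (x, y) vertices in grid coordinates. Cells whose centers
    lie inside the polygon get +1, all others -1 (the binary parametrization
    described above). Uses the even-odd crossing rule."""
    mask = -np.ones((H, W), dtype=np.float32)
    n = len(poly)
    for r in range(H):
        for c in range(W):
            x, y = c + 0.5, r + 0.5  # cell center
            inside = False
            for i in range(n):
                x1, y1 = poly[i]
                x2, y2 = poly[(i + 1) % n]
                # Does a horizontal ray from (x, y) cross this edge?
                if (y1 > y) != (y2 > y):
                    if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                        inside = not inside
            if inside:
                mask[r, c] = 1.0
    return mask

road = [(10, 10), (90, 10), (90, 90), (10, 90)]  # a filled drivable-area polygon
mask = rasterize_polygon(road)
```

Stacking 17 such masks (roads, intersections, lane boundaries, traffic lights, etc.) along the channel dimension yields the 3D map tensor, which can be concatenated with the LiDAR BEV tensor as network input.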