Introduction
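Motivation: HD maps are not always updated in time, and localization errors can make the HD map unreliable, so a LiDAR-based solution that does not depend on an HD map is needed.
Input: raw LiDAR data (a time series of point clouds) plus a routing command (e.g., turn right).
Key feature: interpretability, achieved through interpretable intermediate representations.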
Backbone structure
Concepts
Drivable area: Road surface (or pavement) where vehicles are allowed to drive, bounded by the curb.
Intersection: Drivable area portion where traffic is controlled via traffic lights or traffic signs. Reasoning about this is important to handle stop/yield signs and traffic lights.
Reachable lanes: Lane center lines (or motion paths) are defined as the canonical paths vehicles travel on, typically in the middle of 2 lane markers. We define the reachable lanes as the subset of motion paths the SDV can get to without breaking any traffic rules. When planning a trajectory, we would like the SDV to stay close to these reachable lanes and drive aligned to their direction.
Initial Occupancy: a BEV grid cell is active (occupied) if its center falls in the interior of a polygon given by an object shape and its current pose.
Temporal Motion Field: defined for the occupied pixels at a particular time into the future. Each occupied pixel motion is represented with a 2D BEV velocity vector (in m/s). We discretize this motion field into T = 11 time steps into the future (up to 5s, every 0.5s).
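The Initial Occupancy rule above (a cell is active when its center lies inside an object's polygon) can be sketched as follows. This is only an illustration, not the paper's implementation: the helper names, the rectangular object shape, the grid origin at (0, 0), and the use of a 0.4 m/pixel grid are assumptions made for the example.

```python
import math

def box_corners(cx, cy, heading, length, width):
    """Corners (in metres) of an oriented rectangle, counter-clockwise."""
    c, s = math.cos(heading), math.sin(heading)
    half = [(length / 2, width / 2), (-length / 2, width / 2),
            (-length / 2, -width / 2), (length / 2, -width / 2)]
    return [(cx + c * dx - s * dy, cy + s * dx + c * dy) for dx, dy in half]

def point_in_polygon(px, py, polygon):
    """Even-odd ray casting: count edge crossings to the right of the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def rasterize_occupancy(polygons, rows, cols, resolution=0.4):
    """Mark a cell as 1 when its center falls inside any object polygon."""
    grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            x = (c + 0.5) * resolution  # cell center, assuming origin at (0, 0)
            y = (r + 0.5) * resolution
            if any(point_in_polygon(x, y, p) for p in polygons):
                grid[r][c] = 1
    return grid

# Hypothetical object: a 2.4 m x 1.0 m box centered at (2.0, 1.2), heading 0 rad.
car = box_corners(cx=2.0, cy=1.2, heading=0.0, length=2.4, width=1.0)
occupancy = rasterize_occupancy([car], rows=6, cols=10)
```

Testing only the cell center keeps the rule simple, at the cost of ignoring cells that an object merely clips at the edge.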
In this paper we propose an occupancy flow parameterized by the occupancy of the dynamic objects at the current state of the world and a temporal motion field into the future that describes how objects move (and in turn their future occupancies), both discretized into a spatial grid on BEV with a resolution of 0.4 m/pixel, as depicted in Fig. 4.
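To make the parameterization concrete, here is a rough sketch of how an initial occupancy grid and a per-step motion field could yield future occupancies. The time step (0.5 s), horizon (5 s), and resolution (0.4 m/pixel) come from the notes above; counting t = 0 as one of the steps is an assumption that happens to reproduce T = 11. The forward shift of occupied cells is a simplification chosen for illustration and is not necessarily how the paper computes future occupancy. All function names are hypothetical.

```python
DT = 0.5          # seconds between steps (from the text)
HORIZON = 5.0     # prediction horizon in seconds (from the text)
RESOLUTION = 0.4  # metres per pixel (from the text)

NUM_STEPS = int(round(HORIZON / DT)) + 1  # assumes t = 0 is counted as a step
TIMESTAMPS = [k * DT for k in range(NUM_STEPS)]

def advect(occupancy, velocity):
    """Shift every occupied cell by velocity * DT, rounded to whole cells."""
    rows, cols = len(occupancy), len(occupancy[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not occupancy[r][c]:
                continue  # the motion field is only read at occupied cells
            vx, vy = velocity[r][c]  # metres per second
            nc = c + int(round(vx * DT / RESOLUTION))
            nr = r + int(round(vy * DT / RESOLUTION))
            if 0 <= nr < rows and 0 <= nc < cols:
                nxt[nr][nc] = 1
    return nxt

def roll_out(occupancy, motion_field):
    """Occupancy at every timestamp, starting with the initial occupancy."""
    frames = [occupancy]
    for velocity in motion_field[:-1]:  # field k moves step k to step k + 1
        frames.append(advect(frames[-1], velocity))
    return frames

# Hypothetical scene: two occupied cells moving at 1.6 m/s along +x.
rows, cols = 3, 30
initial = [[0] * cols for _ in range(rows)]
initial[1][1] = initial[1][2] = 1
uniform = [[(1.6, 0.0)] * cols for _ in range(rows)]
frames = roll_out(initial, [uniform] * NUM_STEPS)
```

At 1.6 m/s an object covers 0.8 m per step, which is two cells on a 0.4 m grid, so the occupied cells move 20 columns over the ten steps.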
Backbone network:
Extracts geometric and semantic information from the past LiDAR sweeps.
The scene context features after each residual block are C1x, C2x, C4x, C8x, where the subscript indicates the downsampling factor from the input.
Since the LiDAR input is voxelized at 0.2 meters per pixel, C has a resolution of 0.8 meters per pixel.
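The resolution arithmetic can be checked with a few lines: the metric size of one feature pixel is the input voxel size multiplied by the downsampling factor. The helper name below is hypothetical; the 0.2 m voxel size and the factors 1, 2, 4, 8 are taken from the notes.

```python
VOXEL_SIZE = 0.2  # metres per pixel of the voxelized LiDAR input (from the text)

def feature_resolution(downsampling_factor, voxel_size=VOXEL_SIZE):
    """Metres covered by one pixel of a feature map downsampled by the factor."""
    return voxel_size * downsampling_factor

# Resolution of each intermediate feature map named in the notes.
resolutions = {f"C{k}x": feature_resolution(k) for k in (1, 2, 4, 8)}

# Downsampling factor implied by a 0.8 m/pixel feature map.
factor_for_c = round(0.8 / VOXEL_SIZE)
```

Under this arithmetic, a 0.8 m/pixel resolution for C corresponds to a downsampling factor of 4, the same factor as C4x.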