MP3: A Unified Model to Map, Perceive, Predict and Plan

Category: End to End. Tags: autonomous driving

Original link: https://blog.csdn.net/weixin_56836871/article/details/120404032

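The paper proposes an autonomous driving solution that does not need an HD map. It uses LiDAR data and routing commands, and performs mapping, perception and prediction, and path planning through interpretable intermediate layers. The model consists of a backbone network, a mapping architecture, a perception and prediction architecture, and a routing network. It aims to handle the occupancy of dynamic objects and the uncertainty of their future behavior in order to drive safely.

(Summary generated automatically by CSDN.)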

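Introduction

Motivation: HD maps are not updated in time, and localization errors can make them unreliable, so a LiDAR-based solution that does not depend on an HD map is needed.
Input: raw LiDAR data (a time series of point clouds) plus a routing command (for example, "turn right").
Key feature: interpretability, achieved through interpretable intermediate layers.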
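
To make the input more concrete, here is a minimal sketch of how a single LiDAR sweep could be rasterized into a bird's-eye-view (BEV) grid at 0.2 meters per pixel, the input resolution quoted later in these notes. The function name `voxelize_bev`, the region of interest, and the binary occupancy encoding are assumptions made for illustration; they are not taken from the paper.

```python
import math

def voxelize_bev(points, x_range=(0.0, 4.0), y_range=(-2.0, 2.0), resolution=0.2):
    """Rasterize (x, y, z) LiDAR points into a binary BEV grid.

    x_range and y_range (meters) are hypothetical; resolution is meters per pixel.
    Returns grid[row][col], where rows index x and columns index y.
    """
    rows = round((x_range[1] - x_range[0]) / resolution)
    cols = round((y_range[1] - y_range[0]) / resolution)
    grid = [[0] * cols for _ in range(rows)]
    for x, y, _z in points:
        row = math.floor((x - x_range[0]) / resolution)
        col = math.floor((y - y_range[0]) / resolution)
        # Points outside the region of interest are dropped.
        if 0 <= row < rows and 0 <= col < cols:
            grid[row][col] = 1
    return grid

# Three example points; the last one lies outside the region of interest.
sweep = [(0.05, -1.95, 0.3), (1.1, 0.1, 0.1), (10.0, 0.0, 0.2)]
grid = voxelize_bev(sweep)
occupied = sum(map(sum, grid))
```

Since the input is a time series of point clouds, one such grid would be built per past sweep and the grids stacked; how the stacking is done is left open here.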

Main structure


Concepts

Drivable area: Road surface (or pavement) where vehicles are allowed to drive, bounded by the curb.

Intersection: Drivable area portion where traffic is controlled via traffic lights or traffic signs. Reasoning about this is important to handle stop/yield signs and traffic lights.

Reachable lanes: Lane center lines (or motion paths) are defined as the canonical paths vehicles travel on, typically midway between two lane markers. We define the reachable lanes as the subset of motion paths the SDV can get to without breaking any traffic rules. When planning a trajectory, we would like the SDV to stay close to these reachable lanes and drive aligned with their direction.
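
The idea of staying close to a reachable lane and driving aligned with its direction can be sketched as a simple cost. The function `lane_alignment_cost`, its weights, and the representation of a lane as a list of waypoints are all hypothetical; this is not the cost function used in the paper, only an illustration of the two terms the definition mentions.

```python
import math

def lane_alignment_cost(position, heading, lane, w_dist=1.0, w_heading=1.0):
    """Hypothetical cost: distance to the nearest lane segment plus heading mismatch.

    position: (x, y) in meters; heading: radians; lane: list of (x, y) waypoints.
    """
    best = None
    px, py = position
    for (ax, ay), (bx, by) in zip(lane, lane[1:]):
        dx, dy = bx - ax, by - ay
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0.0:
            continue  # skip degenerate segments
        # Project the point onto the segment, clamped to its endpoints.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
        dist = math.hypot(px - (ax + t * dx), py - (ay + t * dy))
        if best is None or dist < best[0]:
            best = (dist, math.atan2(dy, dx))
    if best is None:
        raise ValueError("lane needs at least one non-degenerate segment")
    dist, lane_heading = best
    # Wrap the heading difference into [-pi, pi] before taking its magnitude.
    delta = heading - lane_heading
    diff = math.atan2(math.sin(delta), math.cos(delta))
    return w_dist * dist + w_heading * abs(diff)

lane = [(0.0, 0.0), (10.0, 0.0)]
on_lane = lane_alignment_cost((5.0, 0.0), 0.0, lane)            # on the lane, aligned
off_lane = lane_alignment_cost((5.0, 2.0), math.pi / 2, lane)   # 2 m off, perpendicular
```

A pose on the center line and pointing along it costs nothing, while a pose 2 m to the side and perpendicular to the lane pays for both terms.
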
Initial Occupancy: a BEV grid cell is active (occupied) if its center falls in the interior of a polygon given by an object shape and its current pose.
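
The center-in-polygon rule above can be sketched as follows. The sketch assumes the object polygon is already expressed in grid coordinates (meters, origin at the grid corner), uses the 0.4 m/pixel grid mentioned further down, and relies on a standard even-odd ray casting test; the helper names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test of a point against a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize_occupancy(polygon, rows, cols, resolution=0.4):
    """Mark a cell as occupied when its center lies inside the object polygon."""
    grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cx = (c + 0.5) * resolution  # cell center, x in meters
            cy = (r + 0.5) * resolution  # cell center, y in meters
            if point_in_polygon(cx, cy, polygon):
                grid[r][c] = 1
    return grid

# A 1.6 m x 0.8 m axis-aligned box standing in for an object footprint.
vehicle = [(0.5, 0.5), (2.1, 0.5), (2.1, 1.3), (0.5, 1.3)]
occupancy = rasterize_occupancy(vehicle, rows=5, cols=6)
```

Cells that the box only partially overlaps stay empty unless their center is inside it, which is exactly what the definition asks for.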

Temporal Motion Field: defined for the occupied pixels at a particular time into the future. The motion of each occupied pixel is represented by a 2D BEV velocity vector (in m/s). We discretize this motion field into T = 11 time steps into the future (up to 5 s, every 0.5 s).
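
The numbers in the temporal discretization can be checked with a few lines. Reading T = 11 as including the current time t = 0 is an inference from the arithmetic (5 s / 0.5 s gives 10 intervals, hence 11 time steps); the `displacement` helper and its piecewise-constant velocity assumption are illustrative only.

```python
HORIZON_S = 5.0  # prediction horizon from the text
STEP_S = 0.5     # spacing between time steps from the text

num_steps = int(round(HORIZON_S / STEP_S)) + 1  # counts t = 0 as a step
timestamps = [i * STEP_S for i in range(num_steps)]

def displacement(velocities, dt=STEP_S):
    """Integrate per-interval BEV velocities (m/s) into a total (dx, dy) in meters.

    Assumes each velocity holds for one interval of length dt.
    """
    dx = sum(vx for vx, _ in velocities) * dt
    dy = sum(vy for _, vy in velocities) * dt
    return dx, dy

# A pixel moving at a constant 2 m/s along x over the ten intervals.
total = displacement([(2.0, 0.0)] * (num_steps - 1))
```
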
The paper proposes an occupancy flow, parameterized by the occupancy of the dynamic objects at the current state of the world and by a temporal motion field into the future that describes how objects move (and in turn their future occupancies). Both are discretized into a spatial grid in BEV with a resolution of 0.4 m/pixel, as depicted in Fig. 4 of the paper:
[Fig. 4 of the paper: occupancy flow]
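
As a rough illustration of how an initial occupancy and a motion field together determine future occupancy, the sketch below moves each occupied cell by its velocity for one time step. This is a deliberately simplified, deterministic forward warp with nearest-cell rounding; it is not claimed to be the paper's formulation, and `warp_occupancy` and its data layout are hypothetical. The 0.4 m/pixel resolution and 0.5 s step come from the text.

```python
import math

RESOLUTION = 0.4  # meters per pixel, from the text
DT = 0.5          # seconds between time steps, from the text

def warp_occupancy(occupied, motion_field, rows, cols):
    """Move each occupied cell by its velocity for one time step.

    occupied: set of (row, col) cells.
    motion_field: dict mapping (row, col) to (v_row, v_col) in m/s.
    Cells that leave the grid are dropped; cells without a velocity stay put.
    """
    moved = set()
    for (r, c) in occupied:
        v_r, v_c = motion_field.get((r, c), (0.0, 0.0))
        # Convert meters travelled into a whole number of cells (round to nearest).
        new_r = r + math.floor(v_r * DT / RESOLUTION + 0.5)
        new_c = c + math.floor(v_c * DT / RESOLUTION + 0.5)
        if 0 <= new_r < rows and 0 <= new_c < cols:
            moved.add((new_r, new_c))
    return moved

# Two cells moving at 1.6 m/s along the column axis: 0.8 m per step, i.e. 2 cells.
occupied_now = {(2, 1), (2, 2)}
field = {(2, 1): (0.0, 1.6), (2, 2): (0.0, 1.6)}
occupied_next = warp_occupancy(occupied_now, field, rows=5, cols=8)
```

Applying the same step repeatedly, each time with the motion field for that time step, would roll the occupancy forward over the whole horizon.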

Backbone network:

Extracts geometric and semantic information from the past LiDAR sweeps.
The scene context features after each residual block are C1x, C2x, C4x and C8x, where the subscript indicates the downsampling factor relative to the input.

Since the LiDAR input is voxelized at 0.2 meters per pixel, the scene context C has a resolution of 0.8 meters per pixel, which corresponds to a 4x downsampling of the input.
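
The resolution of each feature level follows directly from the input resolution and the downsampling factor. The short sketch below just spells out that arithmetic; the helper name `feature_resolution` is made up for the example.

```python
INPUT_RESOLUTION_M = 0.2  # meters per pixel of the voxelized LiDAR input

def feature_resolution(downsampling_factor, input_resolution=INPUT_RESOLUTION_M):
    """Meters per pixel of a feature map downsampled by the given factor."""
    return input_resolution * downsampling_factor

# Rounded to 3 decimals to hide floating-point noise.
resolutions = {f"C{k}x": round(feature_resolution(k), 3) for k in (1, 2, 4, 8)}
```

The 0.8 m/pixel quoted for C matches the C4x entry, since 0.2 × 4 = 0.8.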