Monocular Bird's-Eye-View Semantic Segmentation for Autonomous Driving

This post focuses on the application of monocular perception in autonomous driving, in particular semantic segmentation in the bird's-eye view. With this technique, the environment around the vehicle can be parsed in detail, providing key perception information for autonomous driving.


Autonomous driving requires an accurate representation of the environment around the ego vehicle. The environment includes static elements such as road layout and lane structures, as well as dynamic elements such as other cars, pedestrians, and other types of road users. The static elements can be captured by an HD map containing lane-level information.


There are two types of mapping methods: offline and online. For offline mapping and the application of deep learning to it, please refer to my previous post. In places where there is no map support or that the autonomous vehicle has never visited, online mapping is useful. For online mapping, one conventional method is SLAM (simultaneous localization and mapping), which relies on detecting and matching geometric features across a sequence of images, sometimes with a twist that adds the notion of objects.


This post will focus on another way to do online mapping: bird's-eye-view (BEV) semantic segmentation. Compared with SLAM, which requires a sequence of images from the same moving camera over time, BEV semantic segmentation is based on images captured at the same time by multiple cameras looking in different directions around the vehicle. It is therefore able to generate more useful information from a one-shot collection of data than SLAM. In addition, BEV semantic segmentation still works when the ego car is stationary or moving slowly, whereas SLAM performs poorly or fails.


Why BEV semantic maps?

In a typical autonomous driving stack, Behavior Prediction and Planning are generally done in a top-down view (or bird's-eye view, BEV), as height information is less important and most of the information an autonomous vehicle needs can be conveniently represented in BEV. This BEV space can be loosely referred to as the 3D space. (For example, object detection in BEV space is typically referred to as 3D localization, to distinguish it from full-blown 3D object detection.)


It is therefore standard practice to rasterize HD maps into a BEV image and combine it with dynamic object detections for behavior prediction and planning. Recent research exploring this strategy includes IntentNet (Uber ATG, 2018), ChauffeurNet (Waymo, 2019), Rules of the Road (Zoox, 2019), and the Lyft Prediction Dataset (Lyft, 2020), among many others.
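To make the rasterization concrete, here is a minimal sketch (not taken from any of the cited works) that draws a hypothetical lane centerline and one detected vehicle footprint onto a BEV grid with OpenCV; the grid resolution, coordinates, and colors are made-up assumptions.

```python
import numpy as np
import cv2

RES = 0.1     # meters per pixel (assumption)
SIZE = 400    # 40 m x 40 m raster, ego vehicle at the bottom center

def to_px(pts_m):
    """Ego-frame metric points (x lateral, y forward) -> BEV pixel coordinates."""
    px = pts_m[:, 0] / RES + SIZE / 2    # lateral offset -> column
    py = SIZE - pts_m[:, 1] / RES        # forward distance -> up in the image
    return np.stack([px, py], axis=1).astype(np.int32).reshape(-1, 1, 2)

bev = np.zeros((SIZE, SIZE, 3), np.uint8)

# Hypothetical lane centerline and one detected vehicle footprint (meters).
lane = np.array([[-1.5, 0.0], [-1.5, 15.0], [0.0, 30.0]])
car = np.array([[1.0, 8.0], [3.0, 8.0], [3.0, 12.5], [1.0, 12.5]])

cv2.polylines(bev, [to_px(lane)], isClosed=False, color=(255, 255, 255), thickness=2)
cv2.fillPoly(bev, [to_px(car)], color=(0, 0, 255))
```

In practice, papers like ChauffeurNet render many such layers (lanes, crosswalks, traffic lights, agent histories) as separate channels of one BEV tensor, but the coordinate mapping is the same idea.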


(Image compiled by author; sources from publications in references)

Traditional computer vision tasks such as object detection and semantic segmentation involve making estimates in the same coordinate frame as the input image. As a consequence, the Perception stack of an autonomous vehicle typically operates in the same space as the onboard camera images: the perspective view space.


(Perception happens in perspective view space (left: SegNet), while Planning happens in BEV space (right: NMP); source)

The gap between the representation used in perception and that used in downstream tasks such as prediction and planning is typically bridged in the Sensor Fusion stack, which lifts 2D observations in perspective space to 3D or BEV, usually with the help of active sensors such as radar or lidar. That said, it is beneficial for perception across modalities to use the BEV representation. First of all, it is interpretable and facilitates debugging of the inherent failure modes of each sensing modality. It is also easily extensible to new modalities and simplifies the task of late fusion. In addition, as mentioned above, perception results in this representation can be readily consumed by the prediction and planning stacks.


Lifting Perspective RGB images to BEV

The data from active sensors such as radar or lidar lend themselves to the BEV representation, as the measurements are inherently metric in 3D. However, due to the ubiquity and low cost of surround-view camera sensors, the generation of BEV images with semantic meaning has attracted a lot of attention recently.
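As a minimal illustration of why such data needs no learned view transform (the grid resolution and axis conventions below are assumptions, not from the post), lidar points in the ego frame can be dropped into a BEV occupancy grid with simple coordinate binning:

```python
import numpy as np

RES, SIZE = 0.1, 400   # assumed: 0.1 m/px over a 40 m x 40 m grid

def lidar_to_bev_occupancy(points):
    """Bin lidar points (N, 3) in the ego frame into a BEV occupancy grid.

    Because the points are already metric in 3D, "lifting" them to BEV
    is just dropping the z coordinate and discretizing x and y.
    """
    cols = (points[:, 0] / RES + SIZE / 2).astype(int)   # x lateral -> column
    rows = (SIZE - points[:, 1] / RES).astype(int)       # y forward -> up
    keep = (cols >= 0) & (cols < SIZE) & (rows >= 0) & (rows < SIZE)
    bev = np.zeros((SIZE, SIZE), np.uint8)
    bev[rows[keep], cols[keep]] = 1
    return bev
```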


In the title of this post, “monocular” refers to the fact that the inputs of the pipeline are images obtained from monocular RGB cameras, without explicit depth information. Monocular RGB images captured onboard autonomous vehicles are perspective projections of the 3D space, and the inverse problem of lifting 2D perspective observations into 3D is inherently ill-posed.
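To see why the problem is ill-posed, note that in a pinhole camera model, scaling a 3D point along its viewing ray leaves the pixel coordinates unchanged, so depth cannot be recovered from a single image without additional priors. A minimal sketch with hypothetical intrinsics:

```python
import numpy as np

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])   # hypothetical camera intrinsics

def project(X_cam):
    """Pinhole projection of a 3D point (camera frame, meters) to pixel coordinates."""
    p = K @ X_cam
    return p[:2] / p[2]

X = np.array([2.0, 1.0, 10.0])    # a point 10 m in front of the camera
print(project(X))                  # [840. 460.]
print(project(2.5 * X))            # same pixel: depth is unobservable from one view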


Challenges, IPM and Beyond

One obvious challenge for BEV semantic segmentation is the view transformation. In order to properly restore the BEV representation of the 3D space, the algorithm has to leverage hard (but potentially noisy) geometric priors such as the camera intrinsics and extrinsics, as well as soft priors such as knowledge of typical road layouts and common sense (e.g., cars do not overlap in BEV). Conventionally, inverse perspective mapping (IPM) has been the go-to method for this task, assuming flat ground and fixed camera extrinsics. But this approach does not work well on non-flat surfaces or on bumpy roads where the camera extrinsics vary.
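For reference, here is a minimal sketch of classical IPM: under the flat-ground assumption, the ground plane and the image plane are related by a 3x3 homography built from the camera intrinsics K and extrinsics (R, t), and the image can then be warped onto a metric BEV grid. The calibration values, BEV extent, and axis conventions are illustrative assumptions.

```python
import numpy as np
import cv2

def ipm_homography(K, R, t, bev_size=(400, 400), bev_meters=(40.0, 40.0)):
    """Image-to-BEV homography under the flat-ground assumption.

    K: 3x3 intrinsics; R, t: world-to-camera extrinsics, with the world
    frame chosen so that the ground plane is z = 0 (x lateral, y forward).
    """
    # Ground plane (z = 0) to image: p ~ K [r1 r2 t] [x, y, 1]^T
    H_ground_to_img = K @ np.column_stack([R[:, 0], R[:, 1], t])

    # Ground metric coordinates to BEV pixel coordinates (assumed conventions:
    # ego-centered laterally, forward pointing up in the BEV image).
    w_px, h_px = bev_size
    w_m, h_m = bev_meters
    M = np.array([[w_px / w_m, 0.0, w_px / 2.0],
                  [0.0, -h_px / h_m, float(h_px)],
                  [0.0, 0.0, 1.0]])

    return M @ np.linalg.inv(H_ground_to_img)

# Usage with hypothetical calibration (K, R, t) and a front-camera frame `img`:
# H = ipm_homography(K, R, t)
# bev = cv2.warpPerspective(img, H, (400, 400))
```

When the flat-ground assumption breaks (hills, pitch changes from braking), the homography no longer matches the true geometry, which is exactly the failure mode described above.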

