BEV-LaneDet: A Summary of the Main Points

DouDouWuH

Last modified 2023-07-24 14:41:02


Tags: algorithms

First published 2023-07-20 17:10:11

Article link: https://blog.csdn.net/wuhongwuyan/article/details/131810348


Abstract

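The paper proposes BEV-LaneDet, which has three parts. First, it introduces a virtual camera that unifies the intrinsic and extrinsic parameters of cameras mounted on different vehicles, so that the spatial relationship between cameras stays consistent. Working in this unified visual space makes learning easier. Second, it proposes a simple yet effective 3D lane representation called the Key-Points Representation, which is better suited to complex and diverse 3D lane structures. Finally, it proposes a lightweight, deployment-friendly spatial transformation module, the Spatial Transformation Pyramid, which converts multi-scale front-view features into BEV features.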

Introduction

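The main contributions fall into three areas:
(1) Virtual Camera: a preprocessing module that unifies camera intrinsics and extrinsics so that the data distribution stays consistent.
(2) Key-Points Representation: a simple and effective representation of 3D lane structure.
(3) Spatial Transformation Pyramid: a lightweight, easy-to-deploy MLP-based structure that transforms multi-scale front-view features into BEV features.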

Methodology

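[Figure: overall architecture]
The overall architecture consists of five parts:
1 Virtual Camera: a preprocessing method that unifies camera intrinsics and extrinsics
2 Front-View Backbone: a front-view feature extractor
3 Spatial Transformation Pyramid: projects front-view features to BEV features
4 Key-Points Representation: a key-point-based 3D detection head
5 Front-View Head: a 2D lane detection head that provides auxiliary supervision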

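Virtual Camera

A preprocessing method that unifies camera intrinsics and extrinsics. It relies on the fact that points lying on a common plane are related by a homography: the homography matrix $H_{i,j}$ projects the current camera's image into the view of the virtual camera, which makes the spatial relationship of different cameras consistent. The intrinsics and extrinsics of the virtual camera are fixed, and they are computed as the mean of the intrinsics and extrinsics over the training dataset.
Input image camera: intrinsics and extrinsics $R_i, T_i$
Virtual camera: intrinsics and extrinsics $R_j, T_j$
Select four points $x^k = (x^k, y^k, 0)^T$, $k = 1, 2, 3, 4$, on the BEV plane $P_{road}$. Project them onto the input image and onto the virtual camera image to obtain $u_i^k = (u_i^k, v_i^k, 1)^T$ and $u_j^k = (u_j^k, v_j^k, 1)^T$, then solve for the homography matrix $H_{i,j}$ by least squares:
$H_{i,j} u_i^k = u_j^k$
(At inference time, passing the original camera image and $H_{i,j}$ to OpenCV's warpPerspective yields the image in the virtual camera view.)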
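The four-point procedure above can be sketched with NumPy. This is a minimal illustration, not the authors' code: the axis conventions, the `make_camera` helper, and every camera value are assumptions made up for the example, and the least-squares solve uses a normalized DLT with SVD as one common way to fit a homography (the relation $H_{i,j} u_i^k = u_j^k$ holds up to scale).

```python
import numpy as np

# Assumed axes: road frame x forward, y left, z up; camera frame x right, y down, z forward.
ROAD_TO_CAM = np.array([[0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]])

def make_camera(focal, cx, cy, height, pitch_deg):
    """Build hypothetical (K, R, T) for a camera `height` metres above the road origin."""
    K = np.array([[focal, 0.0, cx], [0.0, focal, cy], [0.0, 0.0, 1.0]])
    a = np.deg2rad(pitch_deg)
    pitch = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(a), -np.sin(a)],
                      [0.0, np.sin(a), np.cos(a)]])
    R = pitch @ ROAD_TO_CAM
    T = -R @ np.array([0.0, 0.0, height])  # so that x_cam = R @ x_road + T
    return K, R, T

def project(camera, pts_road):
    """Project (N, 3) road-frame points to (N, 2) pixel coordinates."""
    K, R, T = camera
    uvw = (pts_road @ R.T + T) @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def apply_h(H, pts):
    """Apply a 3x3 homography to (N, 2) pixel coordinates."""
    pts = np.asarray(pts, dtype=float)
    uvw = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return uvw[:, :2] / uvw[:, 2:3]

def normalizer(pts):
    """Similarity transform that centres pts and scales their mean distance to sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.linalg.norm(pts - c, axis=1).mean()
    return np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])

def fit_homography(src, dst):
    """Least-squares H with H @ src ~ dst (up to scale), via normalized DLT and SVD."""
    src, dst = np.asarray(src, dtype=float), np.asarray(dst, dtype=float)
    t_src, t_dst = normalizer(src), normalizer(dst)
    rows = []
    for (u, v), (x, y) in zip(apply_h(t_src, src), apply_h(t_dst, dst)):
        rows.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
        rows.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
    _, _, vt = np.linalg.svd(np.array(rows))
    h_norm = vt[-1].reshape(3, 3)  # singular vector of the smallest singular value
    return np.linalg.inv(t_dst) @ h_norm @ t_src

# Made-up cameras: cam_i plays the input camera, cam_j the virtual camera.
cam_i = make_camera(1000.0, 640.0, 360.0, height=1.5, pitch_deg=2.0)
cam_j = make_camera(1100.0, 660.0, 350.0, height=1.8, pitch_deg=0.0)
ground = np.array([[10.0, -3.0, 0.0], [10.0, 3.0, 0.0],
                   [40.0, -3.0, 0.0], [40.0, 3.0, 0.0]])  # four points with z = 0
H = fit_homography(project(cam_i, ground), project(cam_j, ground))

probe = np.array([[25.0, 1.0, 0.0]])  # a fifth road point that was not used in the fit
mapped = apply_h(H, project(cam_i, probe))
expected = project(cam_j, probe)
```

Because all road points share the plane z = 0, `mapped` and `expected` should agree to numerical precision. With OpenCV installed, `cv2.warpPerspective(image, H, (width, height))` would then resample the input image into the virtual view.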

MLP Based Spatial Transformation Pyramid

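A lightweight, easy-to-deploy module: the MLP-based View Relation Module (VRM).
The module learns the relationship between any two pixel positions of the flattened front-view features and the flattened BEV features. VRM is sensitive to which front-view feature layer it is applied to. The authors analyzed how front-view features at different scales behave in the VRM, and the experiments show that low-resolution features are better suited to spatial transformation in the VRM. Low-resolution features contain more global information, and because the MLP-based spatial transformation is a fixed mapping, low-resolution features need fewer mapping parameters and are easier to learn. The red box in Figure 2 marks the spatial transformation pyramid built on VRM. In the experiments, the 1/64-resolution feature S64 and the 1/32-resolution feature S32 of the input image are each transformed, and the two results are concatenated.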
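A rough NumPy sketch of the idea follows: an MLP acts on flattened spatial positions, it is applied to S32 and S64 separately, and the results are concatenated along the channel axis. The layer count, the ReLU activations, the 576x1024 input size, the channel count, and the 25x5 BEV feature size are all assumptions chosen for illustration; the source does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_vrm(n_in, n_out):
    """Random weights for a hypothetical two-layer MLP over flattened spatial positions."""
    return {
        "w1": rng.normal(0.0, 0.02, (n_in, n_out)), "b1": np.zeros(n_out),
        "w2": rng.normal(0.0, 0.02, (n_out, n_out)), "b2": np.zeros(n_out),
    }

def vrm(feat, params, bev_hw):
    """Map (C, H, W) front-view features to (C, Hb, Wb) BEV features."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1)                                  # (C, H*W)
    hidden = np.maximum(flat @ params["w1"] + params["b1"], 0)  # ReLU
    out = np.maximum(hidden @ params["w2"] + params["b2"], 0)   # (C, Hb*Wb)
    return out.reshape(c, *bev_hw)

def n_params(params):
    """Total number of scalars in the MLP."""
    return sum(v.size for v in params.values())

C, BEV_HW = 8, (25, 5)               # hypothetical channel count and BEV feature size
s32 = rng.normal(size=(C, 18, 32))   # 1/32 of an assumed 576x1024 input
s64 = rng.normal(size=(C, 9, 16))    # 1/64 of the same input
p32 = init_vrm(18 * 32, BEV_HW[0] * BEV_HW[1])
p64 = init_vrm(9 * 16, BEV_HW[0] * BEV_HW[1])

# Pyramid: transform each scale separately, then concatenate along channels.
bev = np.concatenate([vrm(s32, p32, BEV_HW), vrm(s64, p64, BEV_HW)], axis=0)
```

Comparing `n_params(p64)` with `n_params(p32)` illustrates the point made in the text: the first layer's size grows with the number of front-view positions, so the lower-resolution input needs fewer mapping parameters.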

Key-Points Representation

[Figure: key-points representation]
The BEV plane $P_{road}$ (in coordinates $C_{road} = (x, y, z)$ with $z = 0$) is divided into $s_1 \times s_2$ cells, each covering $x \times x$ (x defaults to 0.5 m). Four heads are predicted at the same resolution: the confidence, the embedding used for clustering, the offset from the cell center to the lane in the y direction, and the average height of each cell. The grid cell size affects 3D lane prediction; based on experiments, it is set to $0.5 \times 0.5\ m^2$. During both training and inference, lanes are predicted within (-10 m, 10 m) in the y direction and (3 m, 103 m) in the x direction of the road ground coordinates $C_{road} = (x, y, z)$. The 3D lane detection head therefore outputs four tensors at 200x40 resolution: confidence, embedding, offset, and height. The confidence, embedding, and offset branches are merged to obtain instance-level lanes in BEV, as shown in Figure 4.
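The 200x40 resolution follows from the ranges and the cell size: (103 - 3) / 0.5 = 200 cells along x and (10 - (-10)) / 0.5 = 40 cells along y. A small sketch of that bookkeeping is given here; the row/column orientation and the function names are assumptions for illustration, and an actual implementation may flip or reorder the axes.

```python
X_RANGE = (3.0, 103.0)    # metres, forward direction
Y_RANGE = (-10.0, 10.0)   # metres, lateral direction
CELL = 0.5                # metres per cell side

def grid_shape():
    """Number of cells along x (rows) and y (columns)."""
    rows = round((X_RANGE[1] - X_RANGE[0]) / CELL)
    cols = round((Y_RANGE[1] - Y_RANGE[0]) / CELL)
    return rows, cols

def point_to_cell(x, y):
    """Return the (row, col) of the cell containing (x, y), or None if out of range."""
    if not (X_RANGE[0] <= x < X_RANGE[1] and Y_RANGE[0] <= y < Y_RANGE[1]):
        return None
    return int((x - X_RANGE[0]) // CELL), int((y - Y_RANGE[0]) // CELL)

def cell_center(row, col):
    """Road-plane coordinates (x, y) of a cell's center."""
    return X_RANGE[0] + (row + 0.5) * CELL, Y_RANGE[0] + (col + 0.5) * CELL
```

For example, `grid_shape()` gives `(200, 40)`, and the cell at row 0, column 0 is centered at x = 3.25 m, y = -9.75 m.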

Confidence

As in YOLO, the lane confidence is a binary classification branch. Each pixel represents the confidence of one cell: if a lane passes through the cell, its confidence target is set to 1, otherwise to 0. The confidence loss is the Binary Cross Entropy loss.
[Equation image: confidence loss]
$p_i$ is the confidence predicted by the model, and $\tilde{p}_i$ is the ground-truth confidence.
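For reference, binary cross-entropy over a list of cells can be written in a few lines of plain Python. This is the textbook form, with `pred` playing the role of $p_i$ and `target` the role of the ground truth. The mean reduction and the clamping constant `eps` are assumptions, since the original equation image is not available here.

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted and ground-truth confidences."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)
```

Predictions of 0.5 everywhere give a loss of ln 2 regardless of the targets, and the loss falls as predictions move toward the ground truth.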

Offset

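Because the confidence branch cannot represent lane positions precisely, the offset branch predicts the offset from the cell center to the lane in the y direction of the road ground coordinates $C_{road} = (x, y, z)$. As Figure 4 shows, the model predicts a y-axis offset for each cell. The offset is normalized with a sigmoid and then shifted by subtracting 0.5, so its range is (-0.5, 0.5). The offset loss is an MSE loss, computed only for grid cells whose ground-truth confidence is positive.
[Equation image: offset loss]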
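The sigmoid shift and the masked MSE can be sketched as follows. The function names are hypothetical, and averaging over the positive cells is an assumption; the source only states that non-positive cells are excluded.

```python
import math

def decode_offset(raw):
    """Squash a raw network output into (-0.5, 0.5): sigmoid, then subtract 0.5."""
    return 1.0 / (1.0 + math.exp(-raw)) - 0.5

def masked_mse(pred_offsets, gt_offsets, gt_conf):
    """MSE over cells whose ground-truth confidence is positive; 0.0 if there are none."""
    terms = [(p - g) ** 2
             for p, g, c in zip(pred_offsets, gt_offsets, gt_conf) if c > 0]
    return sum(terms) / len(terms) if terms else 0.0
```

In `masked_mse([0.1, 0.4], [0.0, 0.0], [1, 0])`, only the first cell contributes, so the second cell's error of 0.4 is ignored.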

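Embedding

During training, the distance between embeddings of cells on the same lane is minimized, and the distance between embeddings of cells on different lanes is maximized. At inference, a fast unsupervised clustering post-process is used to predict a variable number of lanes. Unlike front-view lanes, which usually converge at the vanishing point, 3D lanes are better suited to an embedding clustering loss.
[Equation image: embedding loss]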
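The source does not say which clustering algorithm is used at inference. As a stand-in, here is a simple greedy scheme that yields a variable number of clusters: each embedding joins the nearest existing cluster center if it lies within a distance threshold, and otherwise starts a new cluster. The function name and the threshold value are assumptions for illustration only.

```python
import math

def cluster_embeddings(embeddings, threshold=1.0):
    """Greedy clustering; returns one integer lane label per embedding."""
    centers, counts, labels = [], [], []
    for e in embeddings:
        best, best_d = None, threshold
        for k, c in enumerate(centers):
            d = math.dist(e, c)
            if d < best_d:
                best, best_d = k, d
        if best is None:
            # No center is close enough: open a new cluster.
            centers.append(list(e))
            counts.append(1)
            labels.append(len(centers) - 1)
        else:
            # Update the running mean of the matched cluster.
            counts[best] += 1
            n = counts[best]
            centers[best] = [ci + (ei - ci) / n for ci, ei in zip(centers[best], e)]
            labels.append(best)
    return labels

labels = cluster_embeddings([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.0, 0.1)])
```

Here the five embeddings fall into two groups, so `labels` is `[0, 0, 1, 1, 0]`. A scheme like this only works when training has already pulled same-lane embeddings together and pushed different lanes apart, which is what the loss described above aims for.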

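Lane Height

The confidence, offset, and embedding branches can only predict the x and y of key points in the road ground coordinate system $C_{road} = (x, y, z)$, so a height branch is added to predict the z of each key point. During training, the average height within a cell is used as the ground truth, and only cells with a positive ground truth are counted in the loss.
[Equation image: height loss]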
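To show how the branches fit together, here is a sketch that turns per-cell confidence, offset, and height into 3D key points. Several details are assumptions rather than facts from the source: the offset is treated as a fraction of the cell size (suggested by its (-0.5, 0.5) range), rows index x and columns index y, and 0.5 is used as the confidence threshold.

```python
def decode_keypoints(conf, offset, height, conf_thresh=0.5,
                     x_min=3.0, y_min=-10.0, cell=0.5):
    """Return (x, y, z) road-frame points for cells whose confidence exceeds the threshold."""
    points = []
    for r, row in enumerate(conf):
        for c, p in enumerate(row):
            if p > conf_thresh:
                x = x_min + (r + 0.5) * cell                 # cell center along x
                y = y_min + (c + 0.5 + offset[r][c]) * cell  # center shifted by the offset
                z = height[r][c]                             # predicted cell height
                points.append((x, y, z))
    return points

# A toy 2x2 grid with two confident cells.
conf = [[0.9, 0.1], [0.2, 0.8]]
offset = [[0.2, 0.0], [0.0, -0.4]]
height = [[0.05, 0.0], [0.0, 0.1]]
pts = decode_keypoints(conf, offset, height)
```

Grouping these points by their clustered embedding label would then give one point set per lane instance.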

Total Loss

The total loss combines the 3D lane loss and the front-view lane loss. The front-view lane loss consists of a lane segmentation loss and a lane embedding loss.
[Equation image: total loss]
