Ultra-Fast-Lane-Detection-v2 Explained

v2 vs. v1

v1 uses row anchors, whereas v2 uses hybrid anchors. The paper's figure motivates the change:
(a) shows the definition of lanes by the CULane dataset.
(b) is the lane-wise accuracy with the row anchor system.
(c) illustrates the lane-wise accuracy with the column anchor system.
We can see that for ego lanes, the row anchor system gains better performance, while the column anchor system gains better performance for side lanes.
See Section 3.1 of the paper for a more detailed explanation.

Overall Design of the v2 Algorithm

Illustration of the network architecture. The input image is first sent to a backbone network to get the deep feature. Then the deep feature is flattened and fed into a classifier, which has two output branches. The first localization branch is to learn the coordinates on the hybrid anchors with classification-based representation. The second existence branch is to predict the existence of each coordinate on the hybrid anchors. After obtaining the localization output, we use expectation instead of argmax to get the coordinates of lanes.
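The difference between argmax and expectation decoding can be illustrated with a small sketch. It assumes 0-based class indices and a plain softmax over all classes; the function names are hypothetical and this is not the repository's decoding code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_argmax(logits):
    """Index of the highest-scoring class (integer resolution only)."""
    return max(range(len(logits)), key=lambda k: logits[k])

def decode_expectation(logits):
    """Probability-weighted mean of the class indices (sub-class resolution)."""
    probs = softmax(logits)
    return sum(k * p for k, p in enumerate(probs))

# Two neighbouring classes are equally likely: argmax has to pick one of them,
# while the expectation lands halfway between the two.
logits = [0.0, 2.0, 2.0, 0.0]
print(decode_argmax(logits))       # 1
print(decode_expectation(logits))  # approximately 1.5
```

Because the classes are ordered positions along an anchor, the expectation can express a location between two cells, which argmax cannot.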

Anchor-Driven Network Design

Glossary of terms
Label generation

If the lane has no intersection between certain anchors, the coordinates will be set to -1. Suppose the number of lanes assigned to row anchors is Nrlane and the one for column anchors is Nclane. The lanes in an image can be represented by a fixed-size target T where every element is either the coordinate of lane or -1, and its length is Nrow × Nrlane + Ncol × Nclane. T can be divided into two parts Tr and Tc, which correspond to the parts on row and column anchors, and the sizes are Nrow × Nrlane and Ncol × Nclane respectively.
With the help of lane representation with hybrid anchor, our goal of designing networks is to learn the fixed-size targets Tr and Tc with classification. To learn Tr and Tc with classification, we map different coordinates in Tr and Tc to distinct classes. Suppose Tr and Tc are normalized (the elements of Tr and Tc range from 0 to 1 or equal -1, i.e., the "no lane" case), and the numbers of classes are Nrdim and Ncdim. The mapping can be written as:
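The formula image at this point did not survive. Based on the definitions that follow (a floor operation applied to normalized coordinates, with Nrdim and Ncdim classes), the mapping presumably has this form; treat it as a reconstruction, not a verbatim copy of the paper's equation:

```latex
T^{r}_{cls\_i,j} = \left\lfloor T^{r}_{i,j} \cdot N^{r}_{dim} \right\rfloor,
\qquad
T^{c}_{cls\_m,n} = \left\lfloor T^{c}_{m,n} \cdot N^{c}_{dim} \right\rfloor
```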

Note: [in which Trcls and Tccls are the mapped class labels of the coordinates, [·] is the floor operation, and Trcls_i,j is the element in the i-th row, j-th column of Trcls. In this way, we could convert the learning of coordinate on the hybrid anchor to two classification problems with dimensions of Nrdim and Ncdim, respectively. For the no lane case, i.e., Tr_i,j or Tc_m,n equals -1, we use an additional two-way classification to indicate:]
(Put differently, the floor operation turns each normalized coordinate into a class index along the Nrdim or Ncdim dimension.)
For the concrete processing code, see the inference_culane_tusimple function under utils/common.

Note: [in which Trext is the class label of the coordinates' existence, and Trext_i,j is the element in the i-th row, j-th column of Trext. The existence targets for column anchor Tcext is similar:]
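The two label conversions described so far (coordinate to class index, and coordinate to existence flag) can be sketched as follows. The helper names, the sample values, and the clamp for a coordinate of exactly 1.0 are assumptions made for illustration; the existence rule (1 when the coordinate is not -1, otherwise 0) is inferred from the surrounding text because the formula image is missing.

```python
import math

NO_LANE = -1  # marker used in the targets when a lane does not cross an anchor

def to_class_label(coord, n_dim):
    """Map a normalized coordinate in [0, 1] to a class index by flooring."""
    if coord == NO_LANE:
        return NO_LANE
    # Clamp so that coord == 1.0 falls into the last class instead of index
    # n_dim (the clamp is an assumption of this sketch).
    return min(math.floor(coord * n_dim), n_dim - 1)

def to_existence_label(coord):
    """Two-way label: 1 if the lane crosses this anchor, 0 otherwise."""
    return 0 if coord == NO_LANE else 1

# Toy row-anchor target Tr with 3 row anchors and 2 lanes (made-up values).
t_r = [[0.105, -1], [0.125, 0.555], [-1, 0.999]]
n_r_dim = 100

t_r_cls = [[to_class_label(c, n_r_dim) for c in row] for row in t_r]
t_r_ext = [[to_existence_label(c) for c in row] for row in t_r]

print(t_r_cls)  # [[10, -1], [12, 55], [-1, 99]]
print(t_r_ext)  # [[1, 0], [1, 1], [0, 1]]
```

The column-anchor targets Tccls and Tcext would be produced the same way, with Ncdim in place of Nrdim.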

Note: [With the above derivation, the whole network is to learn the Trcls, Tccls, Trext and Tcext with two branches, which are localization and existence branches. Suppose the deep feature of an input image is X, the network can be written as:]
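The formula image is missing here as well. From the description that follows (P and E are the two branch outputs, f is the classifier, and flatten(·) is the flatten operation), it can be reconstructed as:

```latex
P,\ E = f\big(\mathrm{flatten}(X)\big)
```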

Note: [in which P and E are the localization and existence branches, f is the classifier, and flatten(·) is the flatten operation. The outputs of P and E are all composed of two parts (Pr, Pc, Er and Ec), which correspond to the row and column anchors respectively. The sizes of Pr and Pc are Nrlane × Nrow × Nrdim and Nclane × Ncol × Ncdim respectively, in which Nrdim and Ncdim are the mapped classification dimensions for row and column anchors. The sizes of Er and Ec are Nrlane × Nrow × 2 and Nclane × Ncol × 2 respectively]
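To get a feel for how large the classifier output becomes, the sizes above can be tallied with a short sketch. The function name and the numbers passed in are illustrative assumptions, not values taken from the repository's configuration files.

```python
def head_output_sizes(n_row, n_col, n_r_lane, n_c_lane, n_r_dim, n_c_dim):
    """Return the shape of each output part and the total number of logits."""
    sizes = {
        "Pr": (n_r_lane, n_row, n_r_dim),  # localization, row anchors
        "Pc": (n_c_lane, n_col, n_c_dim),  # localization, column anchors
        "Er": (n_r_lane, n_row, 2),        # existence, row anchors
        "Ec": (n_c_lane, n_col, 2),        # existence, column anchors
    }
    total = sum(a * b * c for a, b, c in sizes.values())
    return sizes, total

# Illustrative settings only.
sizes, total = head_output_sizes(
    n_row=72, n_col=81, n_r_lane=2, n_c_lane=2, n_r_dim=200, n_c_dim=100
)
print(sizes["Pr"])  # (2, 72, 200)
print(total)        # 45612
```

The localization parts dominate the total, since each anchor position carries Nrdim or Ncdim logits while the existence parts carry only two.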

Note: [we directly flatten the deep features from the backbone and feed them to the classifier. In comparison, conventional classification networks [54], [55], [56], [57] use global average pooling (GAP). The reason why we use flatten instead of GAP is that we find the spatial information is crucial for the classification-based lane detection network. Using GAP would eliminate the spatial information and result in poor performance]
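Why GAP discards what a lane detector needs can be seen on a toy single-channel feature map. This is a minimal sketch with hypothetical helper names; real backbone features have many channels, but the argument is the same per channel.

```python
def global_average_pool(feature_map):
    """Average all spatial positions of one channel into a single number."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

def flatten(feature_map):
    """Concatenate the rows, keeping one entry per spatial position."""
    return [v for row in feature_map for v in row]

# The same activation placed at two different spatial positions.
lane_on_left = [[1.0, 0.0],
                [0.0, 0.0]]
lane_on_right = [[0.0, 1.0],
                 [0.0, 0.0]]

# GAP cannot tell the two apart ...
print(global_average_pool(lane_on_left), global_average_pool(lane_on_right))  # 0.25 0.25
# ... while the flattened vectors still encode where the activation was.
print(flatten(lane_on_left))   # [1.0, 0.0, 0.0, 0.0]
print(flatten(lane_on_right))  # [0.0, 1.0, 0.0, 0.0]
```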

Ordinal Classification Loss

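As the formulas above show, an essential property of this classification network is that its classes have an ordinal relationship: unlike in conventional classification, adjacent classes are defined to be closely ordered. To make better use of this prior knowledge, we propose using a base classification loss together with an expectation loss.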

The base classification loss is defined as follows:
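The formula image is missing. A reconstruction consistent with the description (a cross-entropy term for each lane and row anchor, plus the analogous column-anchor term) would be the following; the summation ranges are an assumption:

```latex
L_{cls} = \sum_{i=1}^{N^{r}_{lane}} \sum_{j=1}^{N_{row}}
          L_{CE}\big(P^{r}_{i,j},\ T^{r}_{cls\_i,j}\big)
        + \sum_{m=1}^{N^{c}_{lane}} \sum_{n=1}^{N_{col}}
          L_{CE}\big(P^{c}_{m,n},\ T^{c}_{cls\_m,n}\big)
```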

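Note: [In the formula above, LCE(·) is the cross-entropy loss, Pr_i,j is the predicted localization result for the i-th lane at the j-th row anchor, and Trcls_i,j is the corresponding ground-truth label. The column-anchor loss is defined in the same way as the row-anchor loss.]
The labels here can contain -1, which a conventional cross-entropy cannot handle directly. For the loss actually used in the source code, see the soft_nll function in utils/loss.py.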
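One common way to cope with -1 labels is to leave those anchors out of the cross-entropy altogether. The sketch below shows that masking idea in plain Python; it is an illustration under that assumption and is not claimed to match what soft_nll does in the repository.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def masked_cross_entropy(logits_per_anchor, labels, ignore_label=-1):
    """Mean cross-entropy over the anchors whose label is not ignore_label."""
    losses = [
        -log_softmax(logits)[label]
        for logits, label in zip(logits_per_anchor, labels)
        if label != ignore_label
    ]
    # With no valid anchors there is nothing to penalize.
    return sum(losses) / len(losses) if losses else 0.0

# Two anchors with two classes each; the second anchor has no lane (-1).
logits_per_anchor = [[0.0, 0.0], [5.0, 0.0]]
labels = [1, -1]
loss = masked_cross_entropy(logits_per_anchor, labels)
print(loss)  # log(2), roughly 0.693: only the first anchor contributes
```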

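Because the classes are ordered, the expectation of the prediction can be viewed as an average voting result. For convenience, we denote the expectation as follows: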

Note: [In the formula above, [·] denotes the indexing operation, and Prob is defined as follows:]
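Both formula images referenced here (the expectation, then Prob) are missing. Assuming Prob is the softmax of the localization output and that the class index k runs from 1 to Nrdim, they can be reconstructed as:

```latex
Exp^{r}_{i,j} = \sum_{k=1}^{N^{r}_{dim}} Prob^{r}_{i,j}[k] \cdot k,
\qquad
Prob^{r}_{i,j} = \mathrm{softmax}\big(P^{r}_{i,j}\big)
```

The column-anchor versions would replace the superscript r with c and Nrdim with Ncdim.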

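In this way, we can constrain the expectation of the prediction to stay close to the ground truth. This gives the following expectation loss: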

Note: [L1 in the formula above is the smooth L1 loss.]
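A single-anchor version of the expectation loss can be sketched as follows. It assumes 0-based class indices, a target expressed in class-index units, and the usual smooth L1 definition with threshold beta; the function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def smooth_l1(pred, target, beta=1.0):
    """Quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta

def expectation_loss(logits, target_index):
    """Smooth L1 distance between the expected class index and the target."""
    probs = softmax(logits)
    expectation = sum(k * p for k, p in enumerate(probs))
    return smooth_l1(expectation, target_index)

logits = [0.0, 2.0, 2.0, 0.0]          # expectation is about 1.5
print(expectation_loss(logits, 1.5))   # close to 0: expectation matches the target
print(expectation_loss(logits, 3.5))   # about 1.5: error of 2 falls in the linear regime
```

Unlike the cross-entropy term, this loss depends on how far the predicted distribution's mass sits from the target, not only on the probability assigned to the correct class.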

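The expectation loss is illustrated below. It pushes the mathematical expectation of the predicted distribution toward the ground truth, which benefits lane localization: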

In addition, the loss for the existence branch is defined as follows:

Finally, the total loss can be written as follows:

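Note: [α and β are the loss coefficients. In addition, the loss actually used in the source code includes not only the losses above but also the loss functions from v1.]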
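The formula image for the total loss is missing. With L_cls, L_exp and L_ext standing for the classification, expectation and existence losses, a form consistent with the note above would be the following; which coefficient weights which term is an assumption here:

```latex
L = L_{cls} + \alpha L_{exp} + \beta L_{ext}
```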
