EPNet

xinxiang7

Category: paper reading. Tags: 3D object detection

Article link: https://blog.csdn.net/xinxiang7/article/details/113757218

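EPNet is a 3D object detection method that enhances point features with image semantics. It addresses two problems: multi-sensor fusion, and the inconsistency between localization and classification confidence. Instead of generating BEV data, it uses a simpler pipeline to build a precise point-wise correspondence between LiDAR and camera data, which loses no information and preserves the geometric structure. The consistency enforcing loss encourages boxes with high classification confidence to overlap strongly with the ground truth; it is easy to implement and adds no extra computation. The method consists of a two-stream RPN (an image stream and a geometric stream) and the LI-Fusion module, which obtains point-wise image features by bilinear interpolation and uses a LiDAR-guided fusion layer to handle interfering information.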

EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection

Abstract:

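The paper studies two problems:

Multi-sensor fusion (LiDAR and camera images)
Inconsistency between localization confidence and classification confidence

Multi-sensor fusion: point features are enhanced with semantic image features in a point-wise manner, without any image annotations.

Inconsistency between localization and classification confidence: addressed with a loss function (the consistency enforcing loss).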

Introduction

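There are two ways to exploit multiple sensors:

Cascade: use different sensors at different stages;
Fusion: reason jointly over the inputs of multiple sensors.

Drawback of cascade methods:

The sensors cannot complement each other, and overall performance is bounded by each stage.

Drawbacks of fusion methods:

Generating bird's-eye-view (BEV) data through perspective projection and voxelization inevitably loses information.

In addition, they only establish a relatively coarse correspondence between voxel features and semantic image features.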

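Advantages of the LI-Fusion method:

It builds a finer, point-wise correspondence between LiDAR and camera image data through a simpler pipeline, without complex BEV data generation;
It preserves the original geometric structure with no information loss;
It handles the interfering information introduced by the camera image;
Unlike previous methods, it uses no image annotations such as 2D bounding boxes.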

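Consistency enforcing loss

The paper proposes a consistency enforcing (CE) loss, which encourages boxes with high classification confidence to have a large overlap with the ground truth, and vice versa.

It has two advantages:

It is easy to implement and requires no change to the structure of the detection network;
It adds no extra parameters and no extra inference time.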

Method

Two-Stream RPN

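[Figure: EPNet network structure]
Image Stream:

Structure of the 2D convolution block:

conv 3x3;
conv 3x3, stride = 2;
BN;
ReLU;

F_i (i = 1, 2, 3, 4) is the output of the i-th convolution block;
F_U is the result of combining the different F_i.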
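As a quick check of how the stride-2 convolution shrinks the feature maps, the sketch below computes the spatial size of each F_i. It assumes a 384 x 1280 input image (the size used in the code excerpt later in this post) and a padding of 1 for the 3x3 convolutions. The padding value and the helper names `conv_out_size` and `image_stream_sizes` are assumptions made for illustration, not taken from the paper.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one axis (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def image_stream_sizes(height, width, num_blocks=4):
    """Spatial size of F_1..F_4 when each block is conv3x3 + conv3x3 (stride 2)."""
    sizes = []
    for _ in range(num_blocks):
        # First 3x3 convolution keeps the resolution (stride 1, assumed padding 1).
        height, width = conv_out_size(height), conv_out_size(width)
        # Second 3x3 convolution halves the resolution (stride 2).
        height = conv_out_size(height, stride=2)
        width = conv_out_size(width, stride=2)
        sizes.append((height, width))
    return sizes


sizes = image_stream_sizes(384, 1280)
print(sizes)
```

Under these assumptions the four maps are 192 x 640, 96 x 320, 48 x 160 and 24 x 80, which agrees with the image feature shapes noted in the code comments further down.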

Geometric Stream:

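It consists of four Set Abstraction layers (S_i) and four Feature Propagation layers (P_i).

Each S_i is fused with the corresponding F_i through an LI-Fusion module, and P_4 is fused with F_U, which yields discriminative features. These features are then fed to the detection head for point segmentation and 3D proposal generation.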

LI-Fusion Module:

[Figure: structure of the LI-Fusion module]

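The grid generator first establishes the correspondence between the LiDAR point cloud and the image feature map. For example:

Given a LiDAR point p(x, y, z) and its corresponding image position p'(x', y'), the mapping is:
$p' = M \times p$
where M is the projection matrix of size 3 × 4.
Applying the grid generator to every point gives the correspondence between the LiDAR point cloud and the image features.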
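The projection step can be sketched in a few lines of plain Python. This is only an illustration: the use of homogeneous coordinates with a perspective division is an assumption about how the mapping with a 3 × 4 matrix is evaluated in practice, the function name `project_point` is hypothetical, and the matrix values are made up rather than real calibration data.

```python
def project_point(M, p):
    """Project a 3D point p = (x, y, z) to a pixel (x', y') with a 3x4 matrix M."""
    x, y, z = p
    homogeneous = (x, y, z, 1.0)
    # Matrix-vector product: one output value per row of M.
    u, v, w = (sum(m * c for m, c in zip(row, homogeneous)) for row in M)
    if w <= 0:
        raise ValueError("point is not in front of the camera")
    # Perspective division turns homogeneous image coordinates into pixels.
    return (u / w, v / w)


# Made-up projection matrix, for illustration only.
M = [
    [2.0, 0.0, 1.0, 0.0],
    [0.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
pixel = project_point(M, (1.0, 2.0, 4.0))
print(pixel)
```

The resulting pixel position is generally not an integer, which is why the next step interpolates between neighboring pixels instead of reading a single one.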

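The image sampler then obtains a semantic feature for each point:
$V^{(p)} = K(F^{(N(p'))})$
The image sampler takes the sampling position p' and the image feature map F as input and produces a point-wise image feature representation V for each sampling position. Here K denotes the bilinear interpolation function, and

$F^{(N(p'))}$ denotes the image features of the pixels neighboring the sampling position p'.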
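A minimal sketch of the bilinear interpolation K on a single-channel feature map, written in plain Python for readability. The function name `bilinear_sample` and the clamping of positions to the image border are assumptions of this sketch; the original code relies on `grid_sample` instead.

```python
import math


def bilinear_sample(feature, x, y):
    """Interpolate a 2D feature map (list of rows) at a fractional pixel position (x, y)."""
    height, width = len(feature), len(feature[0])
    # Clamp to the valid range (an assumption; other border policies are possible).
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    ax, ay = x - x0, y - y0
    # Interpolate along x on the two neighboring rows, then along y.
    top = (1.0 - ax) * feature[y0][x0] + ax * feature[y0][x1]
    bottom = (1.0 - ax) * feature[y1][x0] + ax * feature[y1][x1]
    return (1.0 - ay) * top + ay * bottom


feature = [
    [0.0, 1.0],
    [2.0, 3.0],
]
center = bilinear_sample(feature, 0.5, 0.5)
print(center)
```

Sampling at the center of the four pixels returns their average; sampling exactly at a pixel returns that pixel's value. For a multi-channel map the same weights are applied to every channel.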

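To deal with interfering information in the point-wise image features, a LiDAR-guided fusion layer is introduced.

The LiDAR feature F_P and the image feature F_I are each passed through a fully connected layer that maps them to the same number of channels. The results are added to form a compact feature representation, which is then compressed into a weight map w:
$w = \sigma(W \tanh(U F_P + V F_I))$
where W, U and V are learnable parameters and $\sigma$ is the sigmoid function.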

F_I is then multiplied by the weight w and concatenated with F_P to give F_LI:
$F_{LI} = F_P \,\|\, w F_I$
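The two formulas of the LiDAR-guided fusion layer can be traced for a single point with plain Python lists. This sketch follows the formulas literally and is not the repository's implementation; the helper names and the toy weights are assumptions chosen so that the result is easy to verify by hand.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def li_fusion_point(f_p, f_i, U, V, W):
    """Fuse one point: returns (w, F_LI) with F_LI = F_P || w * F_I."""
    # Project both features to the same number of channels and add them.
    hidden = [math.tanh(a + b) for a, b in zip(matvec(U, f_p), matvec(V, f_i))]
    # Compress to a single scalar weight in (0, 1).
    w = sigmoid(sum(w_k * h_k for w_k, h_k in zip(W, hidden)))
    # Scale the image feature and concatenate it with the point feature.
    return w, list(f_p) + [w * x for x in f_i]


f_p = [1.0, 2.0]                          # point feature, 2 channels
f_i = [4.0, -2.0, 0.5]                    # image feature, 3 channels
U = [[0.0, 0.0], [0.0, 0.0]]              # hidden x point channels (toy values)
V = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # hidden x image channels (toy values)
W = [1.0, 1.0]                            # hidden -> scalar weight
w, f_li = li_fusion_point(f_p, f_i, U, V, W)
print(w, f_li)
```

With all-zero projections the weight is exactly 0.5, so the image feature is halved before concatenation. With learned weights, w moves toward 0 for points whose image feature is unreliable and toward 1 otherwise.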

The code implementation is as follows:
[Figure: LI-Fusion implementation]

    # Excerpt of the backbone's forward pass (a method of the network class).
    # Assumes `torch` is imported and `cfg` is the project's global config object.
    def forward(self, pointcloud: torch.cuda.FloatTensor, image=None, xy=None):
        # Split the input into coordinates and optional extra features.
        xyz, features = self._break_up_pc(pointcloud)  # xyz: (B, 16384, 3), features: None

        l_xyz, l_features = [xyz], [features]

        if cfg.LI_FUSION.ENABLED:
            # Normalize the pixel coordinates xy (B, 16384, 2) to [-1, 1] for grid_sample.
            size_range = [1280.0, 384.0]  # image width, image height
            xy[:, :, 0] = xy[:, :, 0] / (size_range[0] - 1.0) * 2.0 - 1.0
            xy[:, :, 1] = xy[:, :, 1] / (size_range[1] - 1.0) * 2.0 - 1.0
            l_xy_cor = [xy]
            img = [image]

        for i in range(len(self.SA_modules)):
            # Set Abstraction: downsample the points and extract point features.
            # li_xyz:      SA1 (1, 4096, 3), SA2 (1, 1024, 3), SA3 (1, 256, 3), SA4 (1, 64, 3)
            # li_features: SA1 (1, 96, 4096), SA2 (1, 256, 1024), SA3 (1, 512, 256), SA4 (1, 1024, 64)
            li_xyz, li_features, li_index = self.SA_modules[i](l_xyz[i], l_features[i])

            if cfg.LI_FUSION.ENABLED:
                # Keep only the pixel coordinates of the points that survived the sampling.
                # li_index:  SA1 (1, 4096, 2), SA2 (1, 1024, 2), SA3 (1, 256, 2), SA4 (1, 64, 2)
                # l_xy_cor[i]: SA1 (1, 16384, 2), SA2 (1, 4096, 2), SA3 (1, 1024, 2), SA4 (1, 256, 2)
                li_index = li_index.long().unsqueeze(-1).repeat(1, 1, 2)
                li_xy_cor = torch.gather(l_xy_cor[i], 1, li_index)
                # Image stream: apply the next convolution block.
                # image: SA1 (1, 64, 192, 640), SA2 (1, 128, 96, 320), SA3 (1, 256, 48, 160), SA4 (1, 512, 24, 80)
                image = self.Img_Block[i](img[i])
                # Sample one image feature per remaining point.
                img_gather_feature = Feature_Gather(image, li_xy_cor)  # (B, C, N)

                # LI-Fusion: fuse point features with the sampled image features.
                li_features = self.Fusion_Conv[i](li_features, img_gather_feature)
                l_xy_cor.append(li_xy_cor)
                img.append(image)

            l_xyz.append(li_xyz)
            l_features.append(li_features)
            
            
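from torch.nn.functional import grid_sample


def Feature_Gather(feature_map, xy):
    """
    Sample one image feature vector per point by bilinear interpolation.

    :param feature_map: (B, C, H, W) image feature map
    :param xy: (B, N, 2) pixel coordinates normalized to [-1, 1]
    :return: (B, C, N) point-wise image features
    """
    # grid_sample expects a (B, H_out, W_out, 2) grid, so treat the N points as a 1 x N grid:
    # xy (B, N, 2) -> (B, 1, N, 2)
    xy = xy.unsqueeze(1)

    # Bilinear interpolation is grid_sample's default mode.
    # Note: the x / (W - 1) * 2 - 1 normalization matches align_corners=True,
    # while PyTorch >= 1.3 defaults to align_corners=False.
    interpolate_feature = grid_sample(feature_map, xy)  # (B, C, 1, N)

    return interpolate_feature.squeeze(2)  # (B, C, N)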

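## LI-Fusion Layer
# Excerpt from the network's __init__: one fusion module is appended per stage.
# `i` is the stage index; the enclosing loop is not shown in this excerpt.
self.Fusion_Conv = nn.ModuleList()
self.Fusion_Conv.append(
    Atten_Fusion_Conv(cfg.LI_FUSION.IMG_CHANNELS[i + 1],  # image feature channels
                      cfg.LI_FUSION.POINT_CHANNELS[i],    # point feature channels
                      cfg.LI_FUSION.POINT_CHANNELS[i]))   # output channels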

import torch
import torch.nn as nn
import torch.nn.functional as F


class Atten_Fusion_Conv(nn.Module):
    def __init__(self, inplanes_I, inplanes_P, outplanes):
        super(Atten_Fusion_Conv, self).__init__()

        # IA_Layer is defined elsewhere in the project (not shown in this excerpt);
        # it computes the weight w and returns the weighted image features.
        self.IA_Layer = IA_Layer(channels=[inplanes_I, inplanes_P])
        # The concatenated input has inplanes_P + inplanes_P channels, which implies
        # that IA_Layer returns image features with inplanes_P channels.
        self.conv1 = torch.nn.Conv1d(inplanes_P + inplanes_P, outplanes, 1)
        self.bn1 = torch.nn.BatchNorm1d(outplanes)

    def forward(self, point_features, img_features):
        # point_features: (B, C_P, N), img_features: (B, C_I, N)
        img_features = self.IA_Layer(img_features, point_features)  # w * F_I

        # F_LI = F_P || w F_I, followed by a 1x1 convolution, BN and ReLU.
        fusion_features = torch.cat([point_features, img_features], dim=1)
        fusion_features = F.relu(self.bn1(self.conv1(fusion_features)))

        return fusion_features
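
The coordinate handling in the forward pass above (normalizing pixel coordinates to [-1, 1], then keeping only the coordinates of the sampled points) can be reproduced on plain Python lists. The helper names `normalize_xy` and `gather_rows` are hypothetical; `gather_rows` only mimics, for a single sample, what the `torch.gather` call with the repeated index does.

```python
def normalize_xy(xy, width=1280.0, height=384.0):
    """Map pixel coordinates (x in [0, W-1], y in [0, H-1]) to [-1, 1]."""
    return [
        (x / (width - 1.0) * 2.0 - 1.0, y / (height - 1.0) * 2.0 - 1.0)
        for x, y in xy
    ]


def gather_rows(rows, index):
    """Select whole (x, y) rows by index, as the gather with a repeated index does."""
    return [rows[i] for i in index]


xy = [(0.0, 0.0), (639.5, 191.5), (1279.0, 383.0)]
normalized = normalize_xy(xy)
kept = gather_rows(normalized, [2, 0])  # indices of the points kept by one SA layer
print(normalized, kept)
```

The top-left pixel maps to (-1, -1), the image center to (0, 0) and the bottom-right pixel to (1, 1). Because each Set Abstraction layer returns indices into its own input, the coordinate list has to be gathered again at every stage.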

Refinement Network

Consistency Enforcing Loss

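$L_{ce} = -\log\left(c \times \frac{Area(D \cap G)}{Area(D \cup G)}\right)$
where D is the predicted bounding box, G is the ground-truth box, and c is the classification confidence of D.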
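
A small numeric sketch of the consistency enforcing loss. To keep it short it uses axis-aligned 2D boxes given as (x1, y1, x2, y2), whereas the detector works with 3D boxes; the box format, the function names and the `eps` clamp that avoids log(0) are all assumptions of this sketch.

```python
import math


def box_area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(d, g):
    """Area(D ∩ G) / Area(D ∪ G) for two axis-aligned boxes."""
    inter_box = (max(d[0], g[0]), max(d[1], g[1]), min(d[2], g[2]), min(d[3], g[3]))
    inter = box_area(inter_box)
    union = box_area(d) + box_area(g) - inter
    return inter / union if union > 0 else 0.0


def ce_loss(confidence, d, g, eps=1e-6):
    """L_ce = -log(c * IoU(D, G)), clamped to avoid log(0)."""
    return -math.log(max(confidence * iou(d, g), eps))


gt = (0.0, 0.0, 1.0, 1.0)
perfect = ce_loss(1.0, gt, gt)                    # confident and well localized
loose = ce_loss(0.5, (0.0, 0.0, 2.0, 1.0), gt)    # IoU = 0.5, confidence = 0.5
print(perfect, loose)
```

The loss is zero only when both the confidence and the overlap are 1. Since the two quantities are multiplied, a high confidence on a poorly localized box is penalized just like a low confidence on a well localized one, which is what pushes them to agree.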

Overall Loss Function

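$L_{total} = L_{rpn} + L_{rcnn}$
$L_{rpn} = L_{cls} + L_{reg} + \lambda L_{cf}$
$L_{cls} = -\alpha(1-c_t)^{\gamma}\log{c_t}$
$L_{reg} = \sum_{\mu \in \{x, z, \theta\}} E(b_{\mu}, \hat b_{\mu}) + \sum_{\mu \in \{x, y, z, h, w, l, \theta\}} S(r_{\mu}, \hat r_{\mu})$
Here $L_{cls}$ is the focal loss on the point classification confidence $c_t$, E is the cross-entropy loss on the bins b, and S is the smooth L1 loss on the residuals r. $L_{cf}$ is presumably the consistency enforcing loss written as $L_{ce}$ above; the notation differs between the two formulas.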
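
The scalar building blocks of the RPN loss can be sketched as follows. The values `alpha=0.25` and `gamma=2.0` are common focal-loss defaults and `lam=1.0` is a placeholder; none of them is stated in this post, so treat them as assumptions. The smooth L1 form with threshold `beta` is the standard definition, and the function names are hypothetical.

```python
import math


def focal_loss(c_t, alpha=0.25, gamma=2.0):
    """L_cls = -alpha * (1 - c_t)^gamma * log(c_t) for one point."""
    return -alpha * (1.0 - c_t) ** gamma * math.log(c_t)


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss S for one residual."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta


def rpn_loss(l_cls, l_reg, l_cf, lam=1.0):
    """L_rpn = L_cls + L_reg + lambda * L_cf."""
    return l_cls + l_reg + lam * l_cf


easy = focal_loss(0.9)   # well-classified point, strongly down-weighted
hard = focal_loss(0.5)   # uncertain point
total = rpn_loss(hard, smooth_l1(0.5, 0.0), 0.2)
print(easy, hard, total)
```

The factor (1 - c_t)^gamma shrinks the contribution of points that are already classified confidently, so training concentrates on the hard ones.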