Point Cloud 3D Detection, Part 3: SECOND

Latest recommended article published on 2025-04-07 19:29:03

hunter@@

Article tags: deep learning, artificial intelligence

Article link: https://blog.csdn.net/weixin_50206562/article/details/140275772

Paper: SECOND: Sparsely Embedded Convolutional Detection

Code: GitHub - traveller59/second.pytorch: SECOND for KITTI/NuScenes object detection

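1. Introduction

SECOND (Sparsely Embedded Convolutional Detection) is another important work on voxel-based point cloud detection. Most 3D detectors before 2017 converted the point cloud into a 2D BEV or front-view representation and lost a large amount of spatial information. SECOND instead keeps the voxel-based architecture of VoxelNet and, to address VoxelNet's slow inference and poor orientation estimation, adds a series of improvements on top of it.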

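The main contributions are:

(1) An improved sparse 3D convolution method that runs faster.

(2) A novel angle loss regression approach that gives better orientation regression performance than other methods.

(3) A novel data augmentation method for LiDAR-only learning problems that greatly improves convergence speed and performance.

Note: SECOND makes three classic changes on top of VoxelNet: an improved convolution structure, improved data augmentation, and an additional loss function.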
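
To make contribution (2) concrete: as I understand the paper, the angle loss applies SmoothL1 to sin(θp − θt) rather than to the raw angle difference. Below is a minimal sketch under that assumption; the function names are my own and not taken from second.pytorch. Because the sine cannot tell a heading from its opposite, a separate direction classifier is needed, which is what the use_direction_classifier option in the RPN code is for.

```python
import math

def smooth_l1(x, beta=1.0):
    """Smooth L1 penalty on a scalar residual."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

def sine_error_loss(theta_pred, theta_gt):
    """Angle loss on sin(theta_pred - theta_gt) instead of the raw difference."""
    return smooth_l1(math.sin(theta_pred - theta_gt))

# A box rotated by pi has the same footprint, and the loss is (numerically) zero.
flipped = sine_error_loss(0.3 + math.pi, 0.3)
# A quarter-turn error gives the largest sine value and is penalized heavily.
quarter = sine_error_loss(0.3 + math.pi / 2, 0.3)
```

The flipped case shows why the loss alone cannot recover the heading direction.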

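Processing steps:

(1) The voxelization module Point_to_voxel voxelizes (grids) the raw input point cloud and outputs voxels.

(2) A voxel feature encoding (VFE) layer extracts voxel features.

(3) The improved sparse 3D convolution is applied to the voxel features extracted in (2).

(4) The convolved voxel features from (3) are passed to the RPN, which produces the cls-head and reg-head outputs.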

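2. Pipeline

2.1 Voxelwise feature extractor: voxel generation and voxel feature extraction

2.1.1 Principle:

The points_to_voxel function converts point cloud data into voxels. It first allocates buffers, i.e. zero-initialized tensors of fixed size, according to the maximum number of voxels preset in the cfg. It then iterates over the points, computes which voxel each point belongs to, and records the coordinates of that voxel and the number of points in each voxel.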
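
The loop described above can be sketched in plain NumPy. This is only an illustration of the idea, not the spconv implementation: the function name voxelize and the Python dict used as the coordinate lookup are my own choices (the real code uses a dense coor_to_voxelidx array and runs in C++).

```python
import numpy as np

def voxelize(points, voxel_size, coors_range, max_points=35, max_voxels=20000):
    """Group points into voxels; returns (voxels, coordinates, num_points_per_voxel)."""
    voxel_size = np.asarray(voxel_size, dtype=np.float32)
    coors_range = np.asarray(coors_range, dtype=np.float32)
    grid_size = np.round((coors_range[3:] - coors_range[:3]) / voxel_size).astype(np.int32)

    # Fixed-size, zero-initialized buffers.
    voxels = np.zeros((max_voxels, max_points, points.shape[1]), dtype=points.dtype)
    coors = np.zeros((max_voxels, 3), dtype=np.int32)
    num_points = np.zeros((max_voxels,), dtype=np.int32)
    coor_to_idx = {}  # voxel coordinate -> row in the buffers

    for p in points:
        c = np.floor((p[:3] - coors_range[:3]) / voxel_size).astype(np.int32)
        if np.any(c < 0) or np.any(c >= grid_size):
            continue  # outside the detection range
        key = tuple(c[::-1].tolist())  # stored in (z, y, x) order
        idx = coor_to_idx.get(key)
        if idx is None:
            if len(coor_to_idx) >= max_voxels:
                continue  # voxel budget exhausted, drop the point
            idx = len(coor_to_idx)
            coor_to_idx[key] = idx
            coors[idx] = key
        if num_points[idx] < max_points:
            voxels[idx, num_points[idx]] = p
            num_points[idx] += 1

    n = len(coor_to_idx)
    return voxels[:n], coors[:n], num_points[:n]

pts = np.array([[0.05, -39.9, -2.9, 0.5],
                [0.10, -39.95, -2.0, 0.7],
                [10.05, 0.1, 0.0, 0.1],
                [100.0, 0.0, 0.0, 0.9]], dtype=np.float32)  # last point is out of range
voxels, coors, num_points = voxelize(pts, (0.2, 0.2, 4), (0, -40, -3, 70.4, 40, 1))
```

The first two points fall into the same voxel, the third opens a second voxel, and the fourth is discarded because it lies outside the range.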

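A voxel feature encoding (VFE) layer takes all points in the same voxel as input and uses a fully connected network (FCN) consisting of a linear layer, a batch normalization (BatchNorm) layer and an activation layer (ReLU) to extract point-wise features. It then applies element-wise max pooling over the voxel to obtain a locally aggregated feature for each voxel. Finally, it tiles this voxel-level feature and concatenates it with each point-wise feature. An FCN(cout) layer, which is also a Linear-BatchNorm-ReLU layer, then transforms these point-wise features into cout-dimensional output features. Overall, the voxelwise feature extractor consists of several VFE layers and one FCN layer.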
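
A single VFE layer as described above can be sketched in NumPy. This is a simplified illustration under my own assumptions: BatchNorm is omitted, the name vfe_layer and its arguments are hypothetical, and the explicit mask for padded points is something I added so that empty slots do not contribute.

```python
import numpy as np

def vfe_layer(voxels, mask, weight, bias):
    """One VFE layer without BatchNorm.

    voxels: [M, T, C_in] padded points, mask: [M, T] (1 for real points),
    weight: [C_in, C_out // 2], bias: [C_out // 2].
    """
    pointwise = np.maximum(voxels @ weight + bias, 0.0)      # Linear + ReLU, [M, T, C_out/2]
    pointwise = pointwise * mask[..., None]                  # zero out padded slots
    aggregated = pointwise.max(axis=1, keepdims=True)        # voxel-wise max pool, [M, 1, C_out/2]
    tiled = np.repeat(aggregated, voxels.shape[1], axis=1)   # tile back to every point
    out = np.concatenate([pointwise, tiled], axis=2)         # [M, T, C_out]
    return out * mask[..., None]

rng = np.random.default_rng(0)
mask = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
voxels = rng.normal(size=(2, 3, 4)) * mask[..., None]
out = vfe_layer(voxels, mask, rng.normal(size=(4, 4)), np.zeros(4))
```

The second half of each output vector is the tiled voxel-level feature, so every real point in a voxel shares it.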

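Note: the voxelization itself is implemented in C++ and comes from the points_to_voxel_3d function in the spconv library. For the detailed principle and implementation, see my VoxelNet article.

SECOND's code does not actually use the VFE layer directly. It uses SimpleVoxel instead, which the author considers to play the same role as the VFE layer: it simply takes the mean of the points in each voxel.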
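
A small NumPy example of what that averaging does, using made-up values. It relies on the padded slots being zero, which holds for the default full_mean=False path of points_to_voxel because the buffers are zero-initialized.

```python
import numpy as np

# Two voxels with room for 5 points each; unused slots stay zero.
voxels = np.zeros((2, 5, 4), dtype=np.float32)
voxels[0, :2] = [[1.0, 2.0, 3.0, 0.5], [3.0, 4.0, 5.0, 0.1]]
voxels[1, :1] = [[10.0, 0.0, -1.0, 0.9]]
num_points = np.array([2, 1], dtype=np.int32)

# Sum over the point axis and divide by the real point count of each voxel.
points_mean = voxels.sum(axis=1) / num_points.astype(voxels.dtype).reshape(-1, 1)
```

Each voxel is reduced to one 4-dimensional feature vector (x, y, z, reflectivity), which is the input to the sparse convolution layers.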

2.1.2 Code:

import numpy as np
from torch import nn

class SimpleVoxel(nn.Module):
    def __init__(self,
                 num_input_features=4,
                 use_norm=True,
                 num_filters=[32, 128],
                 with_distance=False,
                 voxel_size=(0.2, 0.2, 4),
                 pc_range=(0, -40, -3, 70.4, 40, 1),
                 name='VoxelFeatureExtractor'):
        super(SimpleVoxel, self).__init__()
        self.name = name
        self.num_input_features = num_input_features

    def forward(self, features, num_voxels, coors):
        # features: [total_voxels_in_batch, max_points_per_voxel, 3(4)], e.g. [128108, 5, 4]
        # num_voxels: [total_voxels_in_batch], number of real points in each voxel
        # Padded slots are zero-filled, so sum / count is the mean of the real points.
        points_mean = features[:, :, :self.num_input_features].sum(dim=1, keepdim=False) / num_voxels.type_as(features).view(-1, 1)
        return points_mean.contiguous()     # [128108, 4]

def points_to_voxel(points,
                    voxel_size,
                    coors_range,
                    coor_to_voxelidx,
                    max_points=35,
                    max_voxels=20000,
                    full_mean=False,
                    block_filtering=True,
                    block_factor=1,
                    block_size=8,
                    height_threshold=0.2,
                    height_high_threshold=3.0,
                    pad_output=False):
    """convert 3d points(N, >=3) to voxels. This version calculate
    everything in one loop. now it takes only 0.8ms(~6k voxels) 
    with c++ and 3.2ghz cpu.

    Args:
        points: [N, ndim] float tensor. points[:, :3] contain xyz points and
            points[:, 3:] contain other information such as reflectivity.
        voxel_size: [3] list/tuple or array, float. xyz, indicate voxel size
        coors_range: [6] list/tuple or array, float. indicate voxel range.
            format: xyzxyz, minmax
        coor_to_voxelidx: int array. used as a dense map.
        max_points: int. indicate maximum points contained in a voxel.
        max_voxels: int. indicate maximum voxels this function create.
            for voxelnet, 20000 is a good choice. you should shuffle points
            before call this function because max_voxels may drop some points.
        full_mean: bool. if true, all empty points in voxel will be filled with mean
            of exist points.
        block_filtering: filter voxels by height. used for lidar point cloud.
            use some visualization tool to see filtered result.
    Returns:
        voxels: [M, max_points, ndim] float tensor. only contain points.
        coordinates: [M, 3] int32 tensor. zyx format.
        num_points_per_voxel: [M] int32 tensor.
    """
    if full_mean:
        assert block_filtering is False
    if not isinstance(voxel_size, np.ndarray):
        voxel_size = np.array(voxel_size, dtype=points.dtype)
    if not isinstance(coors_range, np.ndarray):
        coors_range = np.array(coors_range, dtype=points.dtype)
    voxelmap_shape = (coors_range[3:] - coors_range[:3]) / voxel_size
    voxelmap_shape = tuple(np.round(voxelmap_shape).astype(np.int32).tolist())
    voxelmap_shape = voxelmap_shape[::-1]
    num_points_per_voxel = np.zeros(shape=(max_voxels, ), dtype=np.int32)
    voxels = np.zeros(shape=(max_voxels, max_points, points.shape[-1]),dtype=points.dtype)
    voxel_point_mask = np.zeros(shape=(max_voxels, max_points),dtype=points.dtype)
    coors = np.zeros(shape=(max_voxels, 3), dtype=np.int32)
    res = {
        "voxels": voxels,
        "coordinates": coors,
        "num_points_per_voxel": num_points_per_voxel,
        "voxel_point_mask": voxel_point_mask,
    }
    if full_mean:
        means = np.zeros(shape=(max_voxels, points.shape[-1]),dtype=points.dtype)
        voxel_num = points_to_voxel_3d_np_mean(points, voxels,
                                               voxel_point_mask, means, coors,
                                               num_points_per_voxel,
                                               coor_to_voxelidx,
                                               voxel_size.tolist(),
                                               coors_range.tolist(),
                                               max_points, max_voxels)
    else:
        if block_filtering:
            block_shape = [*voxelmap_shape[1:]]
            block_shape = [b // block_factor for b in block_shape]
            mins = np.full(block_shape, 99999999, dtype=points.dtype)
            maxs = np.full(block_shape, -99999999, dtype=points.dtype)
            voxel_mask = np.zeros((max_voxels, ), dtype=np.int32)
            voxel_num = points_to_voxel_3d_with_filtering(
                points, voxels, voxel_point_mask, voxel_mask, mins, maxs,
                coors, num_points_per_voxel, coor_to_voxelidx,
                voxel_size.tolist(), coors_range.tolist(), max_points,
                max_voxels, block_factor, block_size, height_threshold,
                height_high_threshold)
            voxel_mask = voxel_mask.astype(np.bool_)
            coors_ = coors[voxel_mask]
            if pad_output:
                res["coordinates"][:voxel_num] = coors_
                res["voxels"][:voxel_num] = voxels[voxel_mask]
                res["voxel_point_mask"][:voxel_num] = voxel_point_mask[
                    voxel_mask]

                res["num_points_per_voxel"][:voxel_num] = num_points_per_voxel[
                    voxel_mask]
                res["coordinates"][voxel_num:] = 0
                res["voxels"][voxel_num:] = 0
                res["num_points_per_voxel"][voxel_num:] = 0
                res["voxel_point_mask"][voxel_num:] = 0
            else:
                res["coordinates"] = coors_
                res["voxels"] = voxels[voxel_mask]
                res["num_points_per_voxel"] = num_points_per_voxel[voxel_mask]
                res["voxel_point_mask"] = voxel_point_mask[voxel_mask]
            voxel_num = coors_.shape[0]
        else:
            voxel_num = points_to_voxel_3d_np(points, voxels, voxel_point_mask,
                                              coors, num_points_per_voxel,
                                              coor_to_voxelidx,
                                              voxel_size.tolist(),
                                              coors_range.tolist(), max_points,
                                              max_voxels)
    res["voxel_num"] = voxel_num
    res["voxel_point_mask"] = res["voxel_point_mask"].reshape(
        -1, max_points, 1)
    return res

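2.2 Sparse convolutional middle layer: sparse 3D convolution

2.2.1 Principle:

2.2.1.1 Problem definition

Unlike conventional image data, point cloud data is highly sparse, so a standard convolutional network cannot extract features from it efficiently, as shown in Figure 2. The same holds in 2D: if only a subset of the pixels needs to be processed, standard convolution is also a poor fit and 2D sparse convolution is needed. This subsection therefore explains sparse convolution starting from the 2D case. The extension to 3D sparse convolution simply adds a depth dimension D.

We use a simple 2D sparse convolution problem as the running example.

Input data: a 3-channel 5 × 5 image. All pixels are (0, 0, 0) except at the two positions P1 and P2. The input tensor has shape [1,3,5,5] in [N,C,H,W] order. In sparse form, the data list for [P1, P2] is [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]] and the index list, in [y, x] order, is [[1,2], [2,3]], as shown on the left of Figure 3.

Kernel: a 3×3 convolution kernel with stride 1 and padding 0, as shown on the right of Figure 3.

Output data: the output of sparse convolution differs considerably from that of conventional convolution, and there are two output definitions. The first is the sparse output definition: as in ordinary convolution, an output site is computed whenever the kernel covers at least one input site. The second is the submanifold output definition: an output is computed only when the kernel center covers an input site.

With a 5×5 input image, a 3×3 kernel, stride=1 and padding=0, the output tensor is 3×3. The result under the sparse output definition is shown on the left of Figure 4. For example, position (0,0) is labeled A1, meaning its value depends only on P1 in the input, while position (0,1) is labeled A1A2, meaning its value depends on both P1 and P2. The result under the submanifold output definition is shown on the right of Figure 4, where only the A1 and A2 sites respond.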
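
The two output definitions can be checked with a few lines of pure Python. This is my own sketch (output_sites is a hypothetical helper), and it assumes all coordinates are written as (row, column), so P1 sits at (1, 2) and P2 at (2, 3).

```python
def output_sites(active, in_size=5, kernel=3):
    """Return (sparse_sites, submanifold_sites) for stride 1, padding 0.

    active: dict mapping a label to its (row, col) input site. Each result maps
    an output (row, col) to the labels of the inputs that contribute to it.
    """
    out_size = in_size - kernel + 1
    half = kernel // 2
    sparse, subm = {}, {}
    for r in range(out_size):
        for c in range(out_size):
            # Sparse definition: any active input inside the kernel window.
            covered = sorted(name for name, (pr, pc) in active.items()
                             if r <= pr < r + kernel and c <= pc < c + kernel)
            if covered:
                sparse[(r, c)] = covered
            # Submanifold definition: an active input exactly under the kernel center.
            centered = [name for name, site in active.items()
                        if site == (r + half, c + half)]
            if centered:
                subm[(r, c)] = centered
    return sparse, subm

active = {"A1": (1, 2), "A2": (2, 3)}   # P1 and P2 from the index list
sparse, subm = output_sites(active)
```

Under these assumptions the sparse definition activates 8 of the 9 output sites, while the submanifold definition activates only 2.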

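2.2.1.2 Computation

2.2.1.2.1 Building the hash tables

Step 1: build index-coordinate hash tables from the input tensor and the output tensor (using the sparse output definition as the example).

First build the input hash table $Hash_{in}$, where $key_{in}$ is the coordinate of an input pixel and $v_{in}$ is its index; each row represents one active input site. For P1 the row is $v_{in}$ = 0, $key_{in}$ = (2,1), which is its [1,2] entry from the index list written in (x, y) order. Six output pixels, the A1 positions, depend on P1; denote their coordinates as $P_{out}$. From $P_{out}$ build the hash table $Hash_{out}$, where $key_{out}$ is a coordinate in the output tensor and $v_{out}$ is again an index.

P2 is handled the same way: its row is $v_{in}$ = 1, $key_{in}$ = (3,2), and six output pixels, the A2 positions, depend on it. When these are added to $P_{out}$, some coordinates are already present. Duplicates are skipped and only the new ones are appended, namely {6, (1,2)} and {7, (2,2)}.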
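
Here is a minimal sketch of the two hash tables using Python dicts. The helper name and the row-by-row iteration order are my assumptions; I chose the order so that the indices agree with the ones quoted above. Coordinates are in (x, y) order.

```python
def build_hash_tables(active_xy, in_size=5, kernel=3):
    """Build Hash_in and Hash_out (coordinate -> index) for stride 1, padding 0."""
    out_size = in_size - kernel + 1
    hash_in = {xy: i for i, xy in enumerate(active_xy)}
    hash_out = {}
    for (ix, iy) in active_xy:
        for oy in range(out_size):
            for ox in range(out_size):
                # The kernel window at output (ox, oy) spans inputs ox..ox+k-1, oy..oy+k-1.
                covers = ox <= ix < ox + kernel and oy <= iy < oy + kernel
                if covers and (ox, oy) not in hash_out:
                    hash_out[(ox, oy)] = len(hash_out)  # duplicates are skipped
    return hash_in, hash_out

hash_in, hash_out = build_hash_tables([(2, 1), (3, 2)])
```

P1 fills output indices 0 to 5, and P2 only adds the two coordinates that were not already present.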

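2.2.1.2.2 Building the Rulebook

Rulebook definition

What is a Rulebook? In essence it is a table. Section 2.2.1.2.1 built the input and output hash tables, which map input and output tensor coordinates to indices. The Rulebook links the indices in the input hash table to the indices in the output hash table. With that link in place the sparse convolution can essentially be carried out, so the Rulebook is the key to the implementation.

How the Rulebook is built

(1) Overall flow

(2) From $P_{out}$ to GetOffset()

As shown in the figure below, the 5×5 input image passes through the 3×3 kernel to give a 3×3 output. Take output position (0,0) as an example: its value comes from convolving the 3×3 orange window at the top-left of the input. Within this window only P1, on the right side, is nonzero; every other position is zero. This convolution therefore only needs the kernel weight at that position and the input value there. P1 corresponds to position (1,0) in the kernel, an offset relative to the kernel center, so (1,0) is stored in the GetOffset() result.

Note: the formula above assumes stride=1 and padding=0. In short, GetOffset() finds which kernel weight is needed to compute a given output position.

(3) From GetOffset() to the Rulebook

The previous step recorded the kernel weight positions. To complete the convolution, this step also records the corresponding input values and where each result is written. As shown in Figure 9, in the Rulebook the red boxes are the kernel weight positions recorded in the previous step, the orange boxes are the input indices of the input pixel values, and the green boxes are the output indices of the convolution results.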
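
Putting GetOffset() and the two index columns together, a Rulebook for the running example can be sketched as a dict from kernel offset to (input index, output index) pairs. This is my own simplified layout, not the data structure spconv uses; offsets are measured from the kernel center in (x, y) order.

```python
def build_rulebook(active_xy, in_size=5, kernel=3):
    """Return (rulebook, hash_out); rulebook maps offset -> [(in_idx, out_idx), ...]."""
    out_size = in_size - kernel + 1
    half = kernel // 2
    hash_out, rulebook = {}, {}
    for in_idx, (ix, iy) in enumerate(active_xy):
        for oy in range(out_size):
            for ox in range(out_size):
                kx, ky = ix - ox, iy - oy          # position inside the kernel window
                if not (0 <= kx < kernel and 0 <= ky < kernel):
                    continue                       # this output does not see the input
                out_idx = hash_out.setdefault((ox, oy), len(hash_out))
                offset = (kx - half, ky - half)    # GetOffset(): relative to kernel center
                rulebook.setdefault(offset, []).append((in_idx, out_idx))
    return rulebook, hash_out

rulebook, hash_out = build_rulebook([(2, 1), (3, 2)])
```

Each of the two inputs contributes to six outputs, so the Rulebook holds twelve entries in total. Output (0,0) uses P1 with offset (1,0), as in the GetOffset() example.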

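2.2.1.2.3 Computing the sparse convolution

Sparse convolution is computed by looking up the Rulebook. Because the lookups can be executed in parallel on the GPU, it is efficient.

Take the red box in the first row of the Rulebook as an example. First, the offset (-1,-1) selects the kernel weight F0. Next, the input index is looked up in the input hash table to fetch the corresponding feature vector (0.1, 0.1, 0.1).

Then note that, in the figure below, the red box and the dark blue box both have output index 5. Their results land at the same output position and must be accumulated. In Figure 9 the Output Sparse Tensor has size 9×2: 9 corresponds to the 3×3 output map, and 2 is the number of output channels.

Finally, once the computation is done, the output indices are mapped back to row and column coordinates and the results are written to the corresponding positions in the output tensor.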
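
To check that the Rulebook procedure really reproduces an ordinary convolution, here is a small NumPy sketch that runs both on the example input and compares them. The function sparse_conv2d and the random kernel weights are my own; only the input values, coordinates and the 3-in, 2-out channel sizes come from the example.

```python
import numpy as np

def sparse_conv2d(features, coords_xy, weight, in_size=5):
    """Rulebook-driven 2D sparse convolution (stride 1, padding 0).

    features: [n_active, C_in]; coords_xy: list of (x, y);
    weight: [k, k, C_in, C_out] indexed as weight[ky, kx].
    Returns a dense [out, out, C_out] array indexed [y, x].
    """
    k = weight.shape[0]
    out_size = in_size - k + 1
    hash_out, rules = {}, []
    for in_idx, (ix, iy) in enumerate(coords_xy):
        for oy in range(out_size):
            for ox in range(out_size):
                kx, ky = ix - ox, iy - oy
                if 0 <= kx < k and 0 <= ky < k:
                    out_idx = hash_out.setdefault((ox, oy), len(hash_out))
                    rules.append((kx, ky, in_idx, out_idx))
    out_feats = np.zeros((len(hash_out), weight.shape[3]))
    for kx, ky, in_idx, out_idx in rules:
        out_feats[out_idx] += features[in_idx] @ weight[ky, kx]   # accumulate shared outputs
    dense = np.zeros((out_size, out_size, weight.shape[3]))
    for (ox, oy), out_idx in hash_out.items():
        dense[oy, ox] = out_feats[out_idx]                        # scatter back by coordinate
    return dense

rng = np.random.default_rng(0)
weight = rng.normal(size=(3, 3, 3, 2))
features = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]])
coords_xy = [(2, 1), (3, 2)]
sparse_out = sparse_conv2d(features, coords_xy, weight)

# Reference: ordinary dense cross-correlation over the full 5x5 image.
image = np.zeros((5, 5, 3))
for (x, y), f in zip(coords_xy, features):
    image[y, x] = f
dense_out = np.zeros((3, 3, 2))
for oy in range(3):
    for ox in range(3):
        window = image[oy:oy + 3, ox:ox + 3]                      # [3, 3, C_in]
        dense_out[oy, ox] = np.einsum("yxc,yxco->o", window, weight)
```

The sparse version only performs the twelve multiply-accumulate steps listed in the Rulebook, while the dense reference visits all 81 window positions.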

2.2.2 Code:

The spconv.SparseConv3d and spconv.SubMConv3d layers used here encapsulate the computation described in 2.2.1.

class SpMiddleFHD(nn.Module):
    def __init__(self,
                 output_shape,
                 use_norm=True,
                 num_input_features=128,
                 num_filters_down1=[64],
                 num_filters_down2=[64, 64],
                 name='SpMiddleFHD'):
        super(SpMiddleFHD, self).__init__()
        self.name = name
        if use_norm:
            BatchNorm2d = change_default_args(eps=1e-3, momentum=0.01)(nn.BatchNorm2d)
            BatchNorm1d = change_default_args(eps=1e-3, momentum=0.01)(nn.BatchNorm1d)
            Conv2d = change_default_args(bias=False)(nn.Conv2d)
            SpConv3d = change_default_args(bias=False)(spconv.SparseConv3d)
            SubMConv3d = change_default_args(bias=False)(spconv.SubMConv3d)
            ConvTranspose2d = change_default_args(bias=False)(nn.ConvTranspose2d)
        else:
            BatchNorm2d = Empty
            BatchNorm1d = Empty
            Conv2d = change_default_args(bias=True)(nn.Conv2d)
            SpConv3d = change_default_args(bias=True)(spconv.SparseConv3d)
            SubMConv3d = change_default_args(bias=True)(spconv.SubMConv3d)
            ConvTranspose2d = change_default_args(bias=True)(nn.ConvTranspose2d)
        sparse_shape = np.array(output_shape[1:4]) + [1, 0, 0]
        # sparse_shape[0] = 11
        print(sparse_shape)
        self.sparse_shape = sparse_shape
        self.voxel_output_shape = output_shape
        # input: # [1600, 1200, 41]
        self.middle_conv = spconv.SparseSequential(
            SubMConv3d(num_input_features, 16, 3, indice_key="subm0"),
            BatchNorm1d(16),
            nn.ReLU(),
            SubMConv3d(16, 16, 3, indice_key="subm0"),
            BatchNorm1d(16),
            nn.ReLU(),
            SpConv3d(16, 32, 3, 2, padding=1),  # [1600, 1200, 41] -> [800, 600, 21]
            BatchNorm1d(32),
            nn.ReLU(),
            SubMConv3d(32, 32, 3, indice_key="subm1"),
            BatchNorm1d(32),
            nn.ReLU(),
            SubMConv3d(32, 32, 3, indice_key="subm1"),
            BatchNorm1d(32),
            nn.ReLU(),
            SpConv3d(32, 64, 3, 2,padding=1),  # [800, 600, 21] -> [400, 300, 11]
            BatchNorm1d(64),
            nn.ReLU(),
            SubMConv3d(64, 64, 3, indice_key="subm2"),
            BatchNorm1d(64),
            nn.ReLU(),
            SubMConv3d(64, 64, 3, indice_key="subm2"),
            BatchNorm1d(64),
            nn.ReLU(),
            SubMConv3d(64, 64, 3, indice_key="subm2"),
            BatchNorm1d(64),
            nn.ReLU(),
            SpConv3d(64, 64, 3, 2, padding=[0, 1, 1]),  # [400, 300, 11] -> [200, 150, 5]
            BatchNorm1d(64),
            nn.ReLU(),
            SubMConv3d(64, 64, 3, indice_key="subm3"),
            BatchNorm1d(64),
            nn.ReLU(),
            SubMConv3d(64, 64, 3, indice_key="subm3"),
            BatchNorm1d(64),
            nn.ReLU(),
            SubMConv3d(64, 64, 3, indice_key="subm3"),
            BatchNorm1d(64),
            nn.ReLU(),
            SpConv3d(64, 64, (3, 1, 1), (2, 1, 1)),  # [200, 150, 5] -> [200, 150, 2]
            BatchNorm1d(64),
            nn.ReLU(),
        )
        self.max_batch_size = 6
        # self.grid = torch.full([self.max_batch_size, *sparse_shape], -1, dtype=torch.int32).cuda()

    def forward(self, voxel_features, coors, batch_size):
        # coors[:, 1] += 1
        coors = coors.int()     # [123862,4]
        ret = spconv.SparseConvTensor(voxel_features, coors, self.sparse_shape, batch_size)       # [123862,4]
        # t = time.time()
        # torch.cuda.synchronize()
        ret = self.middle_conv(ret)         # [57238,64]
        # torch.cuda.synchronize()
        # print("spconv forward time", time.time() - t)
        ret = ret.dense()                   # [8,64,2,200,176]

        N, C, D, H, W = ret.shape
        ret = ret.view(N, C * D, H, W)      # [8,128,200,176]
        return ret
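
The shape comments in SpMiddleFHD can be reproduced with the usual convolution output-size formula. The sketch below assumes that SparseConv3d follows that formula and that the spatial axes are ordered [D, H, W], with the padding list [0, 1, 1] applying to depth first; the helper conv_out_shape is my own.

```python
def conv_out_shape(shape, kernel, stride, padding):
    """Per-axis output size: floor((n + 2p - k) / s) + 1."""
    return [(n + 2 * p - k) // s + 1
            for n, k, s, p in zip(shape, kernel, stride, padding)]

# Same sizes as the code comments, written here with depth first.
s0 = [41, 1600, 1200]
s1 = conv_out_shape(s0, (3, 3, 3), (2, 2, 2), (1, 1, 1))
s2 = conv_out_shape(s1, (3, 3, 3), (2, 2, 2), (1, 1, 1))
s3 = conv_out_shape(s2, (3, 3, 3), (2, 2, 2), (0, 1, 1))
s4 = conv_out_shape(s3, (3, 1, 1), (2, 1, 1), (0, 0, 0))

# forward() then folds depth into channels: C * D with C = 64.
bev_channels = 64 * s4[0]
```

The submanifold layers in between leave the shape unchanged, so only the four SpConv3d layers appear here. The final depth of 2 times 64 channels gives the 128 input channels that the RPN expects.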

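2.3 Rule Generation Algorithm

2.3.1 Principle:

SECOND improves how the rules for the sparse convolution in 2.2 are generated. The existing rule generation algorithm runs on the CPU and uses a hash table; it is slow and requires transferring data between the CPU and the GPU.

The authors therefore designed a GPU-based rule generation algorithm that runs faster. First, the input indices and their associated spatial indices are collected, rather than the output indices (the first loop in 2.3.2). Duplicate output positions are obtained at this stage. Then a parallel unique (deduplication) algorithm is run on the spatial index data to obtain the output indices and their associated spatial indices. From this result, a buffer with the same spatial dimensions as the sparse data is generated for table lookup in the next step (the second loop in 2.3.2). Finally, the rules are iterated over and the stored spatial indices are used to fetch the output index for each input index (the third loop in 2.3.2).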
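
The three stages can be illustrated on the 2D example from 2.2 with vectorized NumPy. This is my own sketch of the idea, not the algorithm from the paper: np.unique stands in for the parallel unique step on the GPU, the function name and array layout are assumptions, and output indices follow raster order rather than the hash-table order used earlier.

```python
import numpy as np

def generate_rules(coords_xy, in_size=5, kernel=3):
    """Three-stage rule generation for stride 1, padding 0.

    Returns rules[k, i] (output index for kernel position k and input i, or -1)
    and the flattened positions of the active output sites.
    """
    out_size = in_size - kernel + 1
    coords = np.asarray(coords_xy)                               # [n_in, 2] as (x, y)
    offsets = np.array([(kx, ky) for ky in range(kernel) for kx in range(kernel)])

    # Stage 1: for every (kernel position, input) pair store the flattened
    # output position; duplicates across pairs are expected here.
    out_xy = coords[None, :, :] - offsets[:, None, :]            # [k*k, n_in, 2]
    valid = np.all((out_xy >= 0) & (out_xy < out_size), axis=2)
    flat = np.where(valid, out_xy[..., 1] * out_size + out_xy[..., 0], -1)

    # Stage 2: deduplicate the positions and fill a dense lookup grid
    # with the same spatial size as the output.
    unique_flat = np.unique(flat[valid])
    grid = np.full(out_size * out_size, -1, dtype=np.int64)
    grid[unique_flat] = np.arange(unique_flat.size)

    # Stage 3: replace each stored position with its output index via the grid.
    rules = np.where(valid, grid[np.where(valid, flat, 0)], -1)
    return rules, unique_flat

rules, unique_flat = generate_rules([(2, 1), (3, 2)])
```

Every stage is an element-wise operation, a deduplication or a table lookup, which is what makes the scheme suitable for a GPU.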

2.3.2 Pseudocode:

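2.4 Region Proposal Network

2.4.1 Principle:

The RPN was first proposed in Faster R-CNN. Put simply, it is where the anchors in SECOND come in: a predefined box (anchor, or candidate region) is generated in advance at every grid cell or voxel, the network output is interpreted as an offset and scaling relative to that anchor, and the anchors combined with the network output are filtered by a threshold to give the final detections of the whole network.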
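
To show what "offset and scaling relative to the anchor" means, here is a sketch of a residual box encoding and its inverse. I am assuming the VoxelNet-style encoding with 7 values per box, which matches box_code_size=7 in the code; the (x, y, z, w, l, h, theta) ordering and the function names are my own choices for illustration.

```python
import math

def encode(gt, anchor):
    """Residual targets for a box (x, y, z, w, l, h, theta) relative to an anchor."""
    xg, yg, zg, wg, lg, hg, rg = gt
    xa, ya, za, wa, la, ha, ra = anchor
    diag = math.sqrt(la ** 2 + wa ** 2)        # diagonal of the anchor's base
    return ((xg - xa) / diag, (yg - ya) / diag, (zg - za) / ha,
            math.log(wg / wa), math.log(lg / la), math.log(hg / ha), rg - ra)

def decode(residual, anchor):
    """Invert encode(): apply predicted residuals to the anchor."""
    xt, yt, zt, wt, lt, ht, rt = residual
    xa, ya, za, wa, la, ha, ra = anchor
    diag = math.sqrt(la ** 2 + wa ** 2)
    return (xt * diag + xa, yt * diag + ya, zt * ha + za,
            math.exp(wt) * wa, math.exp(lt) * la, math.exp(ht) * ha, rt + ra)

anchor = (10.0, 2.0, -1.0, 1.6, 3.9, 1.56, 0.0)   # an illustrative car-sized anchor
gt = (10.5, 1.8, -0.9, 1.7, 4.2, 1.5, 0.1)
residual = encode(gt, anchor)
restored = decode(residual, anchor)
```

During training the regression head is fitted to the encoded residuals, and at inference decode() turns the head output back into boxes.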

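Structurally the RPN is very simple. It is built entirely from 2D convolutions; the outputs of the different stages are concatenated, and a cls-head and a reg-head are attached for classification and regression.

Note: unlike VoxelNet, where the RPN contains both 2D and 3D convolutions, SECOND's RPN separates the 2D and 3D convolutions and keeps only the 2D part.

2.4.2 Code: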

class RPNBase(RPNNoHeadBase):
    def __init__(self,
                 use_norm=True,
                 num_class=2,
                 layer_nums=(3, 5, 5),
                 layer_strides=(2, 2, 2),
                 num_filters=(128, 128, 256),
                 upsample_strides=(1, 2, 4),
                 num_upsample_filters=(256, 256, 256),
                 num_input_features=128,
                 num_anchor_per_loc=2,
                 encode_background_as_zeros=True,
                 use_direction_classifier=True,
                 use_groupnorm=False,
                 num_groups=32,
                 box_code_size=7,
                 num_direction_bins=2,
                 name='rpn'):
        """upsample_strides support float: [0.25, 0.5, 1]
        if upsample_strides < 1, conv2d will be used instead of convtranspose2d.
        """
        super(RPNBase, self).__init__(
            use_norm=use_norm,
            num_class=num_class,
            layer_nums=layer_nums,
            layer_strides=layer_strides,
            num_filters=num_filters,
            upsample_strides=upsample_strides,
            num_upsample_filters=num_upsample_filters,
            num_input_features=num_input_features,
            num_anchor_per_loc=num_anchor_per_loc,
            encode_background_as_zeros=encode_background_as_zeros,
            use_direction_classifier=use_direction_classifier,
            use_groupnorm=use_groupnorm,
            num_groups=num_groups,
            box_code_size=box_code_size,
            num_direction_bins=num_direction_bins,
            name=name)

        self._num_anchor_per_loc = num_anchor_per_loc
        self._num_direction_bins = num_direction_bins
        self._num_class = num_class
        self._use_direction_classifier = use_direction_classifier
        self._box_code_size = box_code_size

        if encode_background_as_zeros:
            num_cls = num_anchor_per_loc * num_class
        else:
            num_cls = num_anchor_per_loc * (num_class + 1)
        if len(num_upsample_filters) == 0:
            final_num_filters = self._num_out_filters
        else:
            final_num_filters = sum(num_upsample_filters)
        self.conv_cls = nn.Conv2d(final_num_filters, num_cls, 1)
        self.conv_box = nn.Conv2d(final_num_filters, num_anchor_per_loc * box_code_size, 1)
        if use_direction_classifier:
            self.conv_dir_cls = nn.Conv2d(final_num_filters, num_anchor_per_loc * num_direction_bins, 1)

    def forward(self, x):
        res = super().forward(x)        # [8,128,200,176]
        x = res["out"]                  # [8,128,200,176]
        box_preds = self.conv_box(x)        # [8,14,200,176]
        cls_preds = self.conv_cls(x)        # [8,2, 200,176]
        # [N, C, y(H), x(W)]
        C, H, W = box_preds.shape[1:]   # 14,200,176
        box_preds = box_preds.view(-1, self._num_anchor_per_loc,self._box_code_size, H, W).permute(0, 1, 3, 4, 2).contiguous()  # [8,2,200,176,7]
        cls_preds = cls_preds.view(-1, self._num_anchor_per_loc,self._num_class, H, W).permute(0, 1, 3, 4, 2).contiguous()      # [8,2,200,176,1]
        # box_preds = box_preds.permute(0, 2, 3, 1).contiguous()
        # cls_preds = cls_preds.permute(0, 2, 3, 1).contiguous()

        ret_dict = {"box_preds": box_preds,"cls_preds": cls_preds,}
        if self._use_direction_classifier:
            dir_cls_preds = self.conv_dir_cls(x)
            dir_cls_preds = dir_cls_preds.view(-1, self._num_anchor_per_loc, self._num_direction_bins, H, W).permute(0, 1, 3, 4, 2).contiguous()
            # dir_cls_preds = dir_cls_preds.permute(0, 2, 3, 1).contiguous()
            ret_dict["dir_cls_preds"] = dir_cls_preds
        return ret_dict
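
The view and permute calls in forward() regroup the channel axis into (anchor, box code) and move the box code to the last axis. The same regrouping in NumPy, with small made-up sizes, looks like this:

```python
import numpy as np

N, A, code, H, W = 2, 2, 7, 4, 3           # batch, anchors per location, box code size, map size
raw = np.arange(N * A * code * H * W, dtype=np.float32).reshape(N, A * code, H, W)

# Split channels into (anchor, code), then move code to the end: [N, A, H, W, code].
box_preds = raw.reshape(N, A, code, H, W).transpose(0, 1, 3, 4, 2)

# The 7 values of anchor 1 at location (y=2, x=1) are channels 7..13 of the raw map.
one_box = box_preds[0, 1, 2, 1]
```

After this step every anchor at every location has its own contiguous 7-value vector, which is convenient for decoding and for the loss.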

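2.5 Data Augmentation

SECOND's data augmentation scheme is quite interesting. Besides the random rotation and scaling of point clouds inherited from VoxelNet, it also uses a copy-paste scheme that is common in industry. I could not find the corresponding code in the source, so what follows is my own understanding.

In essence SECOND's augmentation is 3D copy-paste. It tackles the same missing-sample problem seen in 2D detection, where cut-and-paste is commonly used to reduce false positives and missed detections. As a simple example, suppose I need to train a fire and smoke detector. Under normal conditions no fire occurs, so the video frames from the target scene contain no flame images. A common industrial solution is to collect flame images from the web, or generate flame samples with a generator such as a GAN, paste them into the scene frames with an image-editing tool, and annotate the result with a standard labeling tool such as labelimg. This yields data showing fire in the current scene, which can then be used to train a 2D detector.

3D detection suffers from the same missing-sample problem, so SECOND adopts the same idea. In the paper's words, a database is generated from the training set containing the labels of all ground truths and their associated point cloud data (the points inside each ground-truth 3D bounding box). During training, several ground truths are randomly selected from this database and introduced into the current training point cloud by concatenation, which simulates objects appearing in different environments.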
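
A rough sketch of this sampling step in pure Python follows. Everything here is illustrative: the database layout, the function names and the data are made up, and the axis-aligned overlap check is a simplified stand-in for the collision test that I understand the paper performs on the pasted objects.

```python
import random

def bev_overlap(a, b):
    """Axis-aligned overlap test on (x_min, y_min, x_max, y_max) footprints."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def sample_ground_truths(scene_points, scene_boxes, database, num_samples, rng):
    """Paste up to num_samples database objects into the scene by concatenation."""
    points, boxes = list(scene_points), list(scene_boxes)
    for entry in rng.sample(database, min(num_samples, len(database))):
        if any(bev_overlap(entry["box"], b) for b in boxes):
            continue  # skip objects that would collide with an existing one
        points.extend(entry["points"])
        boxes.append(entry["box"])
    return points, boxes

database = [
    {"box": (10.0, 0.0, 14.0, 2.0), "points": [(11.0, 1.0, -1.0), (12.0, 1.5, -0.8)]},
    {"box": (0.5, 0.5, 4.0, 2.0), "points": [(1.0, 1.0, -1.0)]},   # overlaps the scene box
    {"box": (20.0, -5.0, 24.0, -3.0), "points": [(21.0, -4.0, -1.0)]},
]
scene_points = [(1.0, 1.0, -1.2), (2.0, 1.2, -1.0)]
scene_boxes = [(0.0, 0.0, 4.5, 2.0)]
points, boxes = sample_ground_truths(scene_points, scene_boxes, database, 3, random.Random(0))
```

The two non-colliding objects are appended to the scene together with their boxes, and the colliding one is dropped.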

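Note: copy-paste is really an industrial technique. Using it directly as an augmentation on a public benchmark feels like a bit of a shortcut, and the comparison experiments seem uninformative to me, because the results will almost certainly improve.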