VoteNet Study Notes

Article link: https://blog.csdn.net/Atriumm/article/details/117218701

Traditional 2D Hough voting

Offline

First, given a collection of images with annotated object bounding boxes, a codebook is constructed with stored mappings between image patches (or their features) and their offsets to the corresponding object centers.

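In short: a codebook is built from images and their ground-truth bounding boxes. Each codebook entry maps an image patch (or its feature) to the offset from that patch to the object center.

Online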

At inference time, interest points are selected from the image to extract patches around them. These patches are then compared against patches in the codebook to retrieve offsets and compute votes.

In short: at inference, patches are extracted around the keypoints of the image and matched against the patches in the codebook to retrieve offsets and compute votes.

As object patches will tend to vote in agreement, clusters will form near object centers. Finally, the object boundaries are retrieved by tracing cluster votes back to their generating patches.

In short: votes cluster around object centers. Object boundaries are then recovered by tracing the votes in each cluster back to the patches that generated them.
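The two steps above can be illustrated with a small NumPy sketch. It is only a toy: the codebook descriptors, the offsets, the grid size and the nearest-neighbor matching rule are all made-up assumptions, not taken from any particular detector.

```python
import numpy as np

# Toy codebook (assumed values): each entry pairs a patch descriptor with
# the offset (dx, dy) from that patch to the object center.
codebook_features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
codebook_offsets = np.array([[5, 0], [0, 5], [-5, -5]])

def cast_votes(keypoints, descriptors, grid_shape):
    """Accumulate one vote per keypoint into a 2D vote grid."""
    accumulator = np.zeros(grid_shape, dtype=int)
    for (x, y), desc in zip(keypoints, descriptors):
        # Assumption: match by nearest codebook entry in feature space.
        idx = np.argmin(np.linalg.norm(codebook_features - desc, axis=1))
        vx, vy = np.array([x, y]) + codebook_offsets[idx]
        if 0 <= vx < grid_shape[0] and 0 <= vy < grid_shape[1]:
            accumulator[vx, vy] += 1
    return accumulator

# Three patches on an object centered at (10, 10) and one background patch.
keypoints = np.array([[5, 10], [10, 5], [15, 15], [2, 3]])
descriptors = np.array([[0.9, 0.1], [0.1, 0.9], [1.0, 0.9], [0.9, 0.0]])

acc = cast_votes(keypoints, descriptors, (20, 20))
center = np.unravel_index(np.argmax(acc), acc.shape)  # peak of the vote grid
```

The three object patches agree on the same cell, so the peak of the accumulator lands on the object center, while the background patch casts an isolated vote elsewhere.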

-------------------------- The following is based on the paper Robust Object Detection with Interleaved Categorization and Segmentation (ISM, Implicit Shape Model)

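Overview:

1. Extract feature points

2. Extract a patch around each feature point

3. Extract the features of each patch

4. Cluster similar patches together

5. The cluster center is the mean image of the cluster; it is stored in the codebook

Training:

Shape Representation

ISM(C) = (C, P_C), where C is the class-specific codebook and P_C is a spatial probability distribution specifying where on the object each codebook entry may be found.

- The distribution of each entry is defined independently and depends only on the position relative to the object center

- The distribution is estimated non-parametrically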

Learning the Shape Model

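Training steps

1. Extract keypoints from every image, extract interest regions around the keypoints, and put these regions into a set F

2. Cluster the set F and take the cluster centers (mean images) to obtain the codebook

3. (c_x, c_y) is the object center at the reference scale. Each interest region is compared with every codebook entry; whenever the similarity exceeds a threshold, the offset from the interest region to the object center and the scale of the interest region are recorded in that entry's Occurrence set.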
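Step 3 of the training procedure can be sketched as follows. This is a minimal illustration under stated assumptions: regions and codebook entries are plain feature vectors, cosine similarity stands in for the similarity measure, and the function name `record_occurrences` is hypothetical.

```python
import numpy as np

def record_occurrences(regions, region_xy, region_scale, center_xy,
                       codebook, threshold):
    """For each codebook entry, store (dx, dy, scale) of every matching region."""
    # Assumption: cosine similarity is used as the similarity measure.
    rn = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    cn = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    similarity = rn @ cn.T                       # (num_regions, num_entries)

    occurrences = {k: [] for k in range(len(codebook))}
    # A region may activate several entries if more than one passes the threshold.
    for r, k in zip(*np.nonzero(similarity > threshold)):
        offset = center_xy - region_xy[r]        # region -> object center
        occurrences[int(k)].append(
            (float(offset[0]), float(offset[1]), float(region_scale[r])))
    return occurrences

codebook = np.array([[1.0, 0.0], [0.0, 1.0]])
regions = np.array([[1.0, 0.1], [0.1, 1.0], [1.0, 1.0]])
region_xy = np.array([[2.0, 3.0], [8.0, 3.0], [5.0, 5.0]])
region_scale = np.array([1.0, 2.0, 1.0])
occ = record_occurrences(regions, region_xy, region_scale,
                         np.array([5.0, 6.0]), codebook, threshold=0.9)
```

In this toy run the third region is equally similar to both entries but passes the threshold for neither, so it leaves no occurrence.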

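Inference steps

1. Extract keypoints and patch features

2. Compare them with the prototypes in the codebook

3. For all matches, apply the Hough transform: each activated entry votes for possible object center positions according to its learned distribution P_C.

4. Search for local maxima in the voting space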

----------------------------------------------------------

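Why use Hough voting for 3D object detection in point clouds

1. Compared with an RPN, voting-based methods are better suited to sparse sets. An RPN has to generate a proposal near the object center, which in a point cloud is likely to lie in empty space, causing extra computation.

2. Voting-based methods follow a bottom-up principle: small pieces of partial information are accumulated into a confident detection. Even though a neural network can potentially aggregate context from a large receptive field, aggregating in the vote space may still be beneficial.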

Pipeline

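1. Interest points are extracted by a neural network.

2. Votes are generated by a network instead of being looked up in a codebook.

3. Vote aggregation is also learned. By using the vote features, the network can filter out low-quality votes.

4. Object proposals (location, dimensions, orientation, and even semantic class) can be generated directly from the aggregated features, so there is no need to trace votes back to their origins.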

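VoteNet architecture

Two parts: one processes the existing points to generate votes; the other operates on the virtual points (the votes) to propose and classify objects.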

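Learning to vote in point clouds

Given an N x 3 point cloud, the goal is to generate M votes, each with a 3D coordinate and a high-dimensional feature. This takes two steps:

1. Learn point cloud features: PointNet++ extracts M seed points, each of dimension (3 + C), i.e. XYZ plus a C-dimensional feature, giving M x (3 + C) in total. Each seed generates one vote.

2. Perform Hough voting with a neural network.

In traditional Hough voting, the votes (offsets from local keypoints) are obtained by looking them up in a codebook.

The authors instead use a neural network to generate a position offset Δx_i in Euclidean space and a feature offset Δf_i. The vote v_i = [y_i; g_i] can then be expressed in terms of the seed s_i = [x_i; f_i]: y_i = x_i + Δx_i, g_i = f_i + Δf_i.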
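The relation between seeds and votes can be sketched in NumPy. This is not the VoteNet implementation: the two-layer MLP, the hidden width `H`, the toy sizes and the random weights are all assumptions used only to show the shapes and the residual form y_i = x_i + Δx_i, g_i = f_i + Δf_i.

```python
import numpy as np

rng = np.random.default_rng(0)
M, C, H = 8, 16, 32  # toy sizes (assumptions); the notes use C = 256

def generate_votes(seed_xyz, seed_feat, w1, w2):
    """Shared per-seed MLP that predicts position and feature offsets."""
    hidden = np.maximum(seed_feat @ w1, 0.0)   # (M, H), ReLU
    out = hidden @ w2                          # (M, 3 + C)
    delta_xyz, delta_feat = out[:, :3], out[:, 3:]
    vote_xyz = seed_xyz + delta_xyz            # y_i = x_i + delta x_i
    vote_feat = seed_feat + delta_feat         # g_i = f_i + delta f_i
    return vote_xyz, vote_feat, delta_xyz

seed_xyz = rng.normal(size=(M, 3))
seed_feat = rng.normal(size=(M, C))
# Random stand-ins for trained weights (assumption).
w1 = rng.normal(scale=0.1, size=(C, H))
w2 = rng.normal(scale=0.1, size=(H, 3 + C))

vote_xyz, vote_feat, delta_xyz = generate_votes(seed_xyz, seed_feat, w1, w2)
```

Because the network predicts offsets rather than absolute values, a network that outputs zeros leaves every vote exactly at its seed.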

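The learned offset Δx_i is supervised with a regression loss:

L_vote-reg = (1 / M_pos) Σ_i ‖Δx_i − Δx_i*‖ · 1[s_i on object]

The indicator 1[s_i on object] says whether seed s_i lies on an object; seeds that do not are excluded from the loss. M_pos is the number of seeds that lie on an object, and Δx_i* is the ground-truth displacement from the seed position x_i to the center of the object it belongs to.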
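A direct NumPy reading of this loss is sketched below. The Euclidean norm is an assumption about the distance used, and the function name is hypothetical.

```python
import numpy as np

def vote_regression_loss(delta_xyz, gt_delta_xyz, on_object):
    """Mean offset error over the seeds that lie on an object."""
    # Assumption: the distance is the Euclidean norm of the offset error.
    dist = np.linalg.norm(delta_xyz - gt_delta_xyz, axis=1)   # (M,)
    m_pos = max(int(on_object.sum()), 1)                      # avoid dividing by 0
    return float((dist * on_object).sum() / m_pos)

delta_xyz = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
gt_delta_xyz = np.zeros((3, 3))
on_object = np.array([True, True, False])   # the third seed is background

loss = vote_regression_loss(delta_xyz, gt_delta_xyz, on_object)
```

The third seed has a large error but is masked out, so only the first two seeds contribute and the loss is (1 + 0) / 2.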

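Votes generated from seeds on the same object are closer to each other than the seeds themselves are, which makes it easier to combine cues from different parts of the object. These semantically rich votes are then aggregated, via their vote features, to generate object proposals.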

Object proposal and classification from votes

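Votes create "meeting points" where context from different parts of an object can be aggregated. The votes are first clustered; proposals are then generated and classified.

Vote clustering through sampling and grouping

Uniform sampling plus grouping by spatial proximity. Given the votes v_i = [y_i; g_i] of shape (M, 3 + C), the authors use farthest point sampling (FPS) on the vote positions to pick K votes v_ik, then take these K votes as cluster centers and gather their neighboring votes, following the rule ||v_i - v_ik|| <= r, where the distance is measured between 3D vote positions.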
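The sampling and grouping step can be sketched with plain NumPy. This is a simple O(K·M) illustration, not the CUDA kernels typically used in practice; starting FPS from index 0 and the function names are assumptions.

```python
import numpy as np

def farthest_point_sampling(xyz, k):
    """Pick k indices so that each new point is farthest from those chosen."""
    chosen = [0]  # assumption: start from the first point
    min_dist = np.linalg.norm(xyz - xyz[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        # Distance of every point to its nearest chosen point.
        min_dist = np.minimum(min_dist, np.linalg.norm(xyz - xyz[nxt], axis=1))
    return np.array(chosen)

def group_votes(vote_xyz, center_idx, radius):
    """For each center, return indices of votes within `radius` of it."""
    clusters = []
    for c in center_idx:
        d = np.linalg.norm(vote_xyz - vote_xyz[c], axis=1)
        clusters.append(np.flatnonzero(d <= radius))
    return clusters

# Two tight groups of votes, far apart from each other.
vote_xyz = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
                     [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]])
centers = farthest_point_sampling(vote_xyz, k=2)
clusters = group_votes(vote_xyz, centers, radius=0.3)
```

With K = 2, FPS picks one vote from each group, and the radius query recovers the two groups as separate clusters.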

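Proposal and classification from vote clusters

A PointNet is used for vote aggregation and proposal generation within each cluster; in other words, PointNet extracts the feature of each vote cluster.

Given a vote cluster C = {w_i}, i = 1, ..., n, with cluster center w_j, each w_i = [z_i; h_i], where z_i is the vote position and h_i is the vote feature.

The vote positions are first normalized with respect to the cluster center, giving z_i' = (z_i - z_j) / r. The normalized positions and the features pass through an MLP, a max-pooling layer, and a second MLP to produce the proposal, which contains an objectness score, bounding box parameters (center coordinates, orientation, and size, parameterized as in F-PointNet), and semantic classification scores.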
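The MLP, max-pool, MLP structure can be sketched as below. Each MLP is reduced to a single linear layer and the weights are random, so this only illustrates the data flow and the permutation invariance of the pooled cluster feature; all sizes and names are assumptions.

```python
import numpy as np

def propose(cluster_xyz, cluster_feat, center_xyz, radius, w1, w2):
    """PointNet-style aggregation of one vote cluster into a proposal vector."""
    local_xyz = (cluster_xyz - center_xyz) / radius          # z_i' = (z_i - z_j) / r
    x = np.concatenate([local_xyz, cluster_feat], axis=1)    # (n, 3 + C)
    per_vote = np.maximum(x @ w1, 0.0)                       # MLP1 (one layer here)
    pooled = per_vote.max(axis=0)                            # channel-wise max pool
    return pooled @ w2                                       # MLP2 (one layer here)

rng = np.random.default_rng(0)
n, C, H, OUT = 6, 16, 32, 10   # toy sizes (assumptions)
cluster_xyz = rng.normal(size=(n, 3))
cluster_feat = rng.normal(size=(n, C))
w1 = rng.normal(size=(3 + C, H))   # random stand-ins for trained weights
w2 = rng.normal(size=(H, OUT))

proposal = propose(cluster_xyz, cluster_feat, cluster_xyz[0], 0.3, w1, w2)

# Reordering the votes in the cluster does not change the proposal.
perm = rng.permutation(n)
proposal_shuffled = propose(cluster_xyz[perm], cluster_feat[perm],
                            cluster_xyz[0], 0.3, w1, w2)
```

The max pool is what makes the result independent of the order in which the votes of a cluster are listed.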

Loss Function

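The loss consists of objectness, bounding box estimation, and semantic classification terms.

The authors supervise the objectness score of votes that are either close to a ground-truth object center (within 0.3 m in the paper) or far from every center (more than 0.6 m). Proposals generated from these votes are treated as positive and negative, respectively; all other proposals are ignored. The objectness loss is a cross-entropy loss normalized by the number of non-ignored proposals in the batch. For positive proposals, the authors also supervise the bounding box estimation and the class prediction. Classification uses cross-entropy; regression uses smooth-L1.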
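The positive / negative / ignored assignment and the normalized objectness loss can be sketched as follows. The near and far thresholds are passed in as parameters (the defaults are assumptions), label -1 is an arbitrary encoding for "ignored", and the function names are hypothetical.

```python
import numpy as np

def objectness_labels(vote_xyz, gt_centers, near=0.3, far=0.6):
    """1 = positive, 0 = negative, -1 = ignored (thresholds are assumptions)."""
    diff = vote_xyz[:, None, :] - gt_centers[None, :, :]
    d = np.linalg.norm(diff, axis=2).min(axis=1)   # distance to nearest center
    labels = np.full(len(vote_xyz), -1)
    labels[d < near] = 1
    labels[d > far] = 0
    return labels

def objectness_loss(logits, labels):
    """Cross-entropy averaged over the non-ignored proposals only."""
    keep = labels >= 0
    z = logits[keep]
    z = z - z.max(axis=1, keepdims=True)           # numerical stability
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    picked = log_prob[np.arange(int(keep.sum())), labels[keep]]
    return float(-picked.sum() / max(int(keep.sum()), 1))

gt_centers = np.array([[0.0, 0.0, 0.0]])
vote_xyz = np.array([[0.1, 0.0, 0.0], [0.45, 0.0, 0.0], [2.0, 0.0, 0.0]])
labels = objectness_labels(vote_xyz, gt_centers)
loss = objectness_loss(np.zeros((3, 2)), labels)
```

The middle vote falls between the two thresholds, so it is ignored and does not count toward the normalization.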

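Details

In addition to the XYZ coordinates, the authors add a height feature to each point: its height above the floor. The floor height is estimated as the 1st percentile of the heights of all points.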
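A minimal sketch of this height feature, assuming the third column of the point array is the up axis (the function name is hypothetical):

```python
import numpy as np

def add_height_feature(points):
    """Append height-above-floor as a fourth column to an (N, 3) point array."""
    # Assumption: column 2 (z) is the vertical axis.
    floor_z = np.percentile(points[:, 2], 1)   # 1st percentile of all heights
    height = points[:, 2] - floor_z
    return np.concatenate([points, height[:, None]], axis=1)

# 101 points stacked vertically at z = 0, 1, ..., 100.
pts = np.zeros((101, 3))
pts[:, 2] = np.arange(101)
pts_with_height = add_height_feature(pts)
```

Using a low percentile rather than the minimum keeps a few outlier points below the floor from shifting the estimate.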

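The last FC layer of the voting layer outputs 259 channels, i.e. 3 + 256 (the XYZ offset plus the feature offset). The last layer of MLP2 outputs 5 + 2NH + 4NS + NC channels: 2 objectness scores, 3 center regression values, 2NH heading values (a classification score and a regression offset for each of the NH heading bins), 4NS box size values (a classification score and 3 regression offsets for each of the NS size templates), and NC semantic class scores.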
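The channel count can be checked by slicing a proposal vector into its parts. The channel ordering and the example values of NH, NS and NC below are assumptions for illustration; only the total 5 + 2NH + 4NS + NC comes from the notes.

```python
import numpy as np

def split_proposal(vec, nh, ns, nc):
    """Slice a (5 + 2*nh + 4*ns + nc,) vector into named parts."""
    assert vec.shape[-1] == 5 + 2 * nh + 4 * ns + nc
    # Assumption: the parts are laid out in this order.
    layout = [("objectness", 2), ("center", 3),
              ("heading_scores", nh), ("heading_residuals", nh),
              ("size_scores", ns), ("size_residuals", 3 * ns),
              ("class_scores", nc)]
    parts, start = {}, 0
    for name, width in layout:
        parts[name] = vec[..., start:start + width]
        start += width
    return parts

NH, NS, NC = 12, 10, 10              # hypothetical example values
total = 5 + 2 * NH + 4 * NS + NC     # 5 + 24 + 40 + 10
parts = split_proposal(np.arange(total, dtype=float), NH, NS, NC)
```

The same arithmetic gives the voting layer's 3 + 256 = 259 output channels.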