Thesis Preparation: Reading a Paper in Depth

fff_pragrammer

Article link: https://blog.csdn.net/fff_pragrammer/article/details/107095546

RefinedBox: Refining for fewer and high-quality object proposals

Original paper: https://www.sciencedirect.com/science/article/pii/S0925231220305816?dgcid=coauthor

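The paper is read from the following angles:

What task does the paper solve?
What model is used?
What are the main innovations?
Which datasets are used in the experiments?
What is it compared against?
Conclusion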

What problem does the paper solve?

We are motivated by the fact that many traditional proposal methods generate dense proposals to cover as many objects as possible but that
i) they usually fail to rank these proposals properly and
ii) the number of proposals is very large.

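Weaknesses of traditional object proposal methods

They cannot rank the proposal boxes correctly
They need a very large number of proposals to keep detection accuracy high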

Hence the sub-optimal proposal sampling strategies make
them difficult to fully leverage the powerful capability of CNNs. As
a result, the number of true objects (e.g. usually less than 10) in
an image is still much smaller than the number of proposals generated by these deep-based methods (e.g. usually a few hundred)

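Weaknesses of deep-based methods
Sub-optimal proposal sampling strategies keep them from fully exploiting the power of CNNs, and the number of proposals they generate is far larger than the number of actual objects in the image.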

What model is used?

related work

We broadly divide the related research into four parts: segmentation-based proposal generation methods, edge-based methods, CNN-based methods, and proposal post-processing methods.

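segmentation-based proposal generation methods
These take image segmentations as input and try to find the right combinations of segments to cover all complete objects.
They usually combine low-level features (e.g. saliency, color, SIFT) to score the bounding boxes and then keep the highest-scoring proposals.
Selective Search, the most popular proposal method, combines the strengths of exhaustive search and segmentation, obtaining high-quality proposals by hierarchically merging superpixels.
MCG introduces a high-performance image segmentation algorithm that makes effective use of multi-scale information. It explores the combinatorial space to assemble the resulting multi-scale region hierarchy into proposals.
edge-based methods
These exploit the observation that complete objects in natural images usually have well-defined closed boundaries. Several efficient algorithms based on edge features have been proposed in recent years.
Cascaded ranking SVM (CSVM) obtains proposals from gradient features.
BING quantizes CSVM into a few binary operations and runs at 300 fps.
A closed-contour measure has been proposed based on the closed path integral; Edge Boxes computes the objectness score from the number of contours wholly enclosed in each bounding box.
CNN-based methods
These use a CNN to generate object proposals directly, for example RPN, DeepMask, and SharpMask.
RPN simultaneously predicts object bounds and objectness scores at each position of the full-image convolutional features.
DeepMask is trained jointly with two objectives: given an image patch, the system first outputs a class-agnostic segmentation mask and then outputs the likelihood that the patch is centered on a full object.
SharpMask augments feed-forward networks for object segmentation with a novel top-down refinement approach. The resulting bottom-up/top-down architecture can efficiently generate high-fidelity object masks.
proposal post-processing methods
These aim to localize the objects in an image accurately. The paper builds on a previously proposed small neural network named DeepBox and improves on it.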

Further reading and a summary of the object proposal (OP) methods above:
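A study from the Max Planck Institute, "What makes for effective detection proposals?", analyzes the performance of the various OP methods comprehensively.
On repeatability, the authors find BING and Edge Boxes to be the best, possibly because both algorithms use an SVM. They also argue that the sensitivity of superpixels to image perturbations is the main reason repeatability drops for some OP algorithms.
On recall (recall is the fraction of the correct results that were actually found; see the article on "Recall" and "Precision" in object detection),
the conclusions are:
MCG, Edge Boxes, Selective Search, Rigor, and Geodesic all perform well across different numbers of proposals
With fewer than 1000 proposals, MCG, Endres, and CPMC perform best
Methods that do not localize the boxes well to begin with lose recall quickly as the IoU criterion becomes stricter; these include BING, Rahtu, Objectness, and Edge Boxes, with BING dropping most sharply.
Under the AR metric, MCG is stable; Endres and Edge Boxes do well with few proposals, while Rigor and Selective Search outperform the others when more proposals are allowed.
On PASCAL and ImageNet the OP methods behave similarly, which suggests that they generalize well.
Adapted from the blog post "Object Proposal (OP) survey"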
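
The recall notion used in the comparison above can be made concrete with a small sketch. It assumes boxes are given as (x1, y1, x2, y2) corner tuples and that a ground-truth object counts as found when at least one proposal overlaps it with IoU at or above a threshold; the function names are illustrative and do not come from the paper or the cited study.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(gt_boxes, proposals, threshold=0.5):
    """Fraction of ground-truth boxes matched by at least one proposal."""
    if not gt_boxes:
        return 0.0
    hits = sum(
        1 for g in gt_boxes
        if any(iou(g, p) >= threshold for p in proposals)
    )
    return hits / len(gt_boxes)


# Two objects, two proposals: only the first object is covered (IoU 0.81).
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
props = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(recall_at_iou(gt, props, 0.5))   # one of two objects found
print(recall_at_iou(gt, props, 0.9))   # stricter IoU: none found
```

Raising the threshold from 0.5 to 0.9 in this toy case removes the only match, which is the same effect the summary describes for methods with loose localization.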

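In this paper, by contrast, the authors build a refinement network that refines existing bounding boxes; RefinedBox achieves state-of-the-art performance in both object proposal evaluation and object detection evaluation.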
RefinedBox

Our method takes the object proposals produced by other proposal generation methods as input and then tries to refine them. The refinement is twofold:
re-ranking and box regression.
To rerank the existing boxes, we recompute the objectness score for each box using the semantic information in the deep neural network.
To obtain the box regression, the network is designed to learn the regressions of the center coordinates, width, and height for each box

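The network is built on VGG16 (13 convolutional layers and 3 fully connected layers) and takes as input a natural image together with the initial boxes produced by another object proposal method.
It applies two main refinements:

re-ranking
box regression: re-adjusts the shape and position of each proposal so that it covers the real object more tightly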

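Layer-by-layer analysis

Convolutional layer: a convolutional layer with a 3x3 kernel is attached after the 13th convolutional layer, reducing the number of channels from 512 to 128
ROI Pooling layer: resamples each initial box region to a fixed feature map size of 7×7. ROI pooling divides the input feature region into a grid of cells with equal width and height and applies max pooling inside each cell
FC (fully connected) layer: 512 output neurons. Every convolutional and fully connected layer is followed by a ReLU layer (ReLU activation)
Finally, two branches, ranking and box regression, recompute the objectness score and predict the position offsets of each initial box
ranking branch: a fully connected layer with two output neurons, giving the probability that the box is or is not an object
box reg branch: outputs the coordinate offsets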
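
The ROI pooling step described above can be sketched in plain Python for a single channel. This is only an illustration under stated assumptions: the feature map is a list of rows, the box is given in feature-map coordinates as (x1, y1, x2, y2) with x2 and y2 exclusive, and the bin boundaries use a simple floor/ceil split. The name `roi_max_pool` is hypothetical, and a real implementation may round bin edges differently.

```python
def roi_max_pool(feature, box, out_size=7):
    """Max-pool the region `box` of a 2-D feature map into an
    out_size x out_size grid (single channel).

    feature: list of rows (H x W) of numbers.
    box: (x1, y1, x2, y2) in feature-map cells, x2/y2 exclusive,
         assumed non-empty and inside the feature map.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    pooled = []
    for i in range(out_size):
        # Row range of bin i: floor of the start, ceil of the end,
        # and at least one row so that small boxes still fill the grid.
        r0 = y1 + (i * h) // out_size
        r1 = max(r0 + 1, y1 + ((i + 1) * h + out_size - 1) // out_size)
        row = []
        for j in range(out_size):
            c0 = x1 + (j * w) // out_size
            c1 = max(c0 + 1, x1 + ((j + 1) * w + out_size - 1) // out_size)
            row.append(max(feature[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# A 14x14 map whose value grows with position, pooled to 7x7:
# every bin is a 2x2 block and keeps its bottom-right value.
fmap = [[r * 14 + c for c in range(14)] for r in range(14)]
out = roi_max_pool(fmap, (0, 0, 14, 14))
print(len(out), len(out[0]), out[0][0], out[6][6])
```

Whatever the size of the initial box, the output grid has the same shape, which is what lets the following fully connected layer accept boxes of any size.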

The loss functions used in the implementation:

In the training of RefinedBox, each initial box is assigned a binary class label of being an object or not. The loss function can be written as
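
The formula is not reproduced in this text. Assuming the standard softmax log loss used by Fast R-CNN, with $p = (p_0, p_1)$ the two softmax outputs and the name $L_{obj}$ chosen here for illustration, it would take the form:

$$
L_{obj}(p, u) = -\log p_u
$$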

where p is computed by a softmax over the two outputs of a fully connected layer and u is the label of this box (1 or 0). The box regression layer is a fully connected layer which is designed to learn the coordinate offsets. We perform the parameterizations of four coordinates as following:
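
The parameterization itself is also missing here. Assuming it follows the Faster R-CNN convention, with the input box playing the role of the anchor, it would read:

$$
\begin{aligned}
t_x &= (x - x_{in}) / w_{in}, & t_y &= (y - y_{in}) / h_{in}, \\
t_w &= \log(w / w_{in}), & t_h &= \log(h / h_{in}), \\
v_x &= (x^* - x_{in}) / w_{in}, & v_y &= (y^* - y_{in}) / h_{in}, \\
v_w &= \log(w^* / w_{in}), & v_h &= \log(h^* / h_{in}),
\end{aligned}
$$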

where x, y, w, and h represent the coordinates of the box center, width, and height, respectively. Variables x, x_in, and x* are for the predicted box, input box, and ground truth box, respectively; similar definitions hold for y, w, and h. Hence variable v is the regression target and t is the predicted tuple.
The box regression loss is defined as
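
This definition is missing as well. Assuming the smooth L1 loss of Fast R-CNN applied to the predicted tuple t and the target v (the name $L_{reg}$ is chosen here for illustration), it would be:

$$
L_{reg}(t, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t_i - v_i),
\qquad
\mathrm{smooth}_{L_1}(z) =
\begin{cases}
0.5\, z^2 & \text{if } |z| < 1, \\
|z| - 0.5 & \text{otherwise.}
\end{cases}
$$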

Thus the joint loss function can be written as
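
The joint formula is not reproduced either. Writing $L_{obj}$ for the objectness loss and $L_{reg}$ for the box regression loss, and assuming (as in Fast R-CNN) that the regression term only counts for boxes labeled as objects, a consistent form is:

$$
L(p, u, t, v) = L_{obj}(p, u) + \lambda \, u \, L_{reg}(t, v)
$$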

in which the parameter λ is a balance parameter, and we set it as 1 in this paper.
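
To make the training objective concrete, here is a minimal per-box sketch in plain Python. It assumes the Faster R-CNN style offsets (center shifts scaled by the input box size, log ratios for width and height), a smooth L1 regression loss, and a regression term that is only counted when u = 1; these are assumptions in the spirit of Fast R-CNN, not forms confirmed by this text, and all function names are illustrative.

```python
import math


def encode(box, ref):
    """Offsets of `box` relative to the input box `ref`.

    Both boxes are (cx, cy, w, h). Assumed Faster R-CNN style encoding.
    """
    x, y, w, h = box
    xr, yr, wr, hr = ref
    return ((x - xr) / wr, (y - yr) / hr,
            math.log(w / wr), math.log(h / hr))


def smooth_l1(z):
    """Quadratic near zero, linear for |z| >= 1."""
    return 0.5 * z * z if abs(z) < 1.0 else abs(z) - 0.5


def joint_loss(logits, u, t, v, lam=1.0):
    """Objectness log loss plus lam * box regression loss for one box.

    logits: two raw scores (not-object, object); u: label 0 or 1;
    t: predicted offsets; v: target offsets; lam: balance weight.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(s - m) for s in logits))
    l_obj = log_z - logits[u]            # equals -log softmax(logits)[u]
    l_reg = sum(smooth_l1(ti - vi) for ti, vi in zip(t, v))
    # Assumption: the regression term is only counted for object boxes.
    return l_obj + lam * u * l_reg


# Target offsets for a ground-truth box relative to an input box.
v = encode((12.0, 10.0, 20.0, 10.0), (10.0, 10.0, 10.0, 10.0))
print(v)
print(joint_loss([0.0, 0.0], 1, v, v))                      # perfect regression
print(joint_loss([0.0, 0.0], 0, (5.0, 5.0, 5.0, 5.0), v))   # background box
```

With λ = 1, as set in the paper, the two terms are simply added; under the assumption above, a background box contributes only its objectness loss regardless of what the regression branch outputs.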

joint training

The branch of RefinedBox is designed to refine the initial boxes, then the refined boxes are inputted into the branch of Fast R-CNN for classification.
…
we connect the well-known detection framework, Fast R-CNN, after the convolutional layers as a parallel branch to RefinedBox. The refined proposals produced by the RefinedBox branch are inputted into Fast R-CNN.
…
Fast R-CNN, to evaluate the quality of proposals in object detection. Our experiments demonstrate that our method can generate high-quality proposals for object detection with good efficiency.

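Fast R-CNN is attached after the shared convolutional layers as a branch parallel to RefinedBox. The proposals refined by RefinedBox are fed into Fast R-CNN, which is used to evaluate the quality of the proposals for object detection.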

What are the main innovations?

To significantly reduce the number of proposals, we design a computationally lightweight neural network to refine the initial object proposals. The refinement consists of two parallel processes, re-ranking and box regression.
…
To combine the superiority of traditional proposal methods and the powerful representation capability of CNNs, we propose a novel method to re-rank and align existing proposal boxes in a single inference of a neural network.
…
The proposed network can share convolutional features with other high-level tasks by joint training, so the proposal refinement can be very fast.

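Goal: we focus on minimizing the number of proposals while obtaining high detection recall, that is, reducing the number of proposals while keeping detection recall high

Solution: a lightweight neural network that refines the initial set of object proposals, with a fast refinement step

Re-ranking: rank the proposals by how tightly they cover complete objects
Box regression: fine-tune the position and shape of each proposal so that it covers the real object more tightly

Through joint training, the network can share convolutional features with other high-level tasks, which speeds up proposal refinement.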
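
The two refinements described above, applied at inference time, can be sketched as follows. The sketch assumes boxes in (cx, cy, w, h) form, Faster R-CNN style offsets as the output of the box regression branch, and a plain top-k cut after sorting by the recomputed objectness score; `decode` and `refine` are hypothetical names, and any further post-processing the paper may apply is left out.

```python
import math


def decode(ref, t):
    """Apply predicted offsets t = (tx, ty, tw, th) to an input box.

    ref is (cx, cy, w, h). Assumed inverse of a Faster R-CNN style encoding.
    """
    xr, yr, wr, hr = ref
    tx, ty, tw, th = t
    return (xr + tx * wr, yr + ty * hr,
            wr * math.exp(tw), hr * math.exp(th))


def refine(initial_boxes, scores, offsets, top_k=10):
    """Regress every initial box, then re-rank by objectness score
    and keep only the top_k refined boxes."""
    refined = [decode(b, t) for b, t in zip(initial_boxes, offsets)]
    order = sorted(range(len(refined)),
                   key=lambda i: scores[i], reverse=True)
    return [(refined[i], scores[i]) for i in order[:top_k]]


boxes = [(10.0, 10.0, 4.0, 4.0), (20.0, 20.0, 8.0, 8.0), (5.0, 5.0, 2.0, 2.0)]
scores = [0.2, 0.9, 0.5]                      # recomputed objectness
offsets = [(0.0, 0.0, 0.0, 0.0),              # unchanged
           (0.5, 0.0, 0.0, 0.0),              # shift right by half a width
           (0.0, 0.0, math.log(2.0), 0.0)]    # double the width
for box, score in refine(boxes, scores, offsets, top_k=2):
    print(score, box)
```

Keeping only a handful of well-ranked, well-aligned boxes is what allows the number of proposals passed to the detector to shrink.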

Which datasets are used in the experiments?

We evaluate the proposed method on two widely used object detection datasets, including PASCAL VOC2007 and MS COCO.

PASCAL VOC2007
Consists of a trainval set (2501 training images and 2510 validation images) and a test set (4952 test images), covering 20 categories:

aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor

RefinedBox is trained on the VOC2007 trainval set and tested on the VOC2007 test set

MS COCO
Contains 82,783 training images and 40,504 validation images in 91 categories
RefinedBox is trained on its training set, and the proposals are evaluated on its validation set

Comparison of the two datasets
[Figure: comparison of the two datasets]

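What is it compared against?

It is compared with the mainstream existing proposal methods.

Methods not based on deep learning
BING, CSVM, Edge Boxes, Endres, GoP, LPO, MCG, Objectness, Rahtu, RandomPrim, Rantalankila, Selective Search
Methods based on deep learning
RPN, DeepBox, DeepMaskZoom, SharpMaskZoom

Evaluation metrics: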

To evaluate the proposals, we adopt the metrics of object detection recall (DR), mean average best overlap (MABO), and average recall (AR)

DR (object detection recall), MABO (mean of the average best overlap over all classes), AR (average recall)
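
MABO and AR can be sketched as below. The sketch assumes corner-format boxes (x1, y1, x2, y2), takes the average best overlap (ABO) of a class as the mean, over its ground-truth objects, of the best IoU any proposal reaches, and approximates AR by averaging recall over the IoU thresholds 0.50, 0.55, ..., 0.95; the paper's exact protocol may differ, and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def best_overlaps(gt_boxes, proposals):
    """Best IoU reached by any proposal, for each ground-truth box."""
    return [max((iou(g, p) for p in proposals), default=0.0)
            for g in gt_boxes]


def mabo(gt_by_class, proposals):
    """Mean over classes of the average best overlap (ABO)."""
    abos = [sum(best_overlaps(boxes, proposals)) / len(boxes)
            for boxes in gt_by_class.values() if boxes]
    return sum(abos) / len(abos) if abos else 0.0


def average_recall(gt_boxes, proposals, thresholds=None):
    """Recall averaged over IoU thresholds (assumed 0.50 to 0.95, step 0.05)."""
    if thresholds is None:
        thresholds = [(50 + 5 * k) / 100 for k in range(10)]
    best = best_overlaps(gt_boxes, proposals)
    if not best:
        return 0.0
    recalls = [sum(b >= t for b in best) / len(best) for t in thresholds]
    return sum(recalls) / len(recalls)


gt_by_class = {"cat": [(0, 0, 10, 10)], "dog": [(20, 20, 30, 30)]}
proposals = [(1, 1, 10, 10)]          # overlaps the cat with IoU 0.81
all_gt = [b for boxes in gt_by_class.values() for b in boxes]
print(mabo(gt_by_class, proposals))
print(average_recall(all_gt, proposals))
```

Both metrics reward tight localization rather than a mere hit at IoU 0.5, which is why they are informative for a method whose point is box regression.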

Comparison results:

Evaluation of different refinement algorithms.
(proposals generated by Edge Boxes as input)
Evaluation results on the PASCAL VOC2007 test set.
(proposals generated by Edge Boxes as input)
Evaluation results on the MS COCO validation set.
(RefinedBox uses RPN as inputs)
Qualitative comparison for object detection using only top 10 proposals.
(proposals generated by Edge Boxes as input)

Compared with other proposal generation methods, RefinedBox can also achieve much higher detection performance. These evaluation results demonstrate that RefinedBox can generate a small amount of proposals with significantly high quality.

Conclusion

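RefinedBox effectively refines a given set of initial proposals, and because the refinement network is easy to optimize, it can be trained jointly with downstream applications
limitation:
Efficiency drops when there are too many initial proposals (the efficiency of RefinedBox is negatively correlated with the number of initial proposals); images containing too many small objects hurt the performance of RefinedBox
future work:
Minimize the number of proposals on unlabeled datasets, and apply RefinedBox to more high-level applications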