RefineDet Paper Notes

JimmyHHua

Category: CV-Detection. Tags: object detection, paper notes, deep learning

Article link: https://blog.csdn.net/hua2599313/article/details/88380502

@Jimmy 2019-03-10 15:21:35

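Contents

1. Basic Information
2. Background
3. Contributions
4. Experimental Results
5. Conclusions and Thoughts
- 5.1 Authors' Conclusions
- 5.2 Thoughts
References: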

1. Basic Information

Title: Single-Shot Refinement Neural Network for Object Detection

Year: 2018

Venue: CVPR 2018

Field: Object Detection

Main links:

homepage: None
arXiv (Paper): https://arxiv.org/pdf/1711.06897.pdf
github (Official): https://github.com/sfzhang15/RefineDet
github (Pytorch): https://github.com/lzx1413/PytorchSSD

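2. Background

Current object detection networks fall into two main categories:

single-stage: SSD, YOLO, YOLO9000
two-stage: Faster R-CNN, R-FCN, Mask R-CNN

Single-stage methods detect objects by regular, dense sampling over locations, scales, and aspect ratios. Two-stage methods first select candidate object regions and then classify them. Single-stage methods are fast, but their detection accuracy is lower than that of two-stage methods.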
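To make the "regular, dense sampling" idea concrete, here is a minimal sketch of tiling anchors over a square feature map. The grid size, scale, and aspect ratios are illustrative assumptions, not values taken from the paper, and `tile_anchors` is a hypothetical helper name.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to [0, 1]

def tile_anchors(grid: int, scale: float,
                 ratios: Tuple[float, ...] = (1.0, 2.0, 0.5)) -> List[Box]:
    """Place one anchor per aspect ratio at the center of every grid cell."""
    anchors = []
    for row in range(grid):
        for col in range(grid):
            cx = (col + 0.5) / grid  # cell center, x
            cy = (row + 0.5) / grid  # cell center, y
            for r in ratios:
                # r is the width/height ratio; the area stays scale * scale.
                w = scale * math.sqrt(r)
                h = scale / math.sqrt(r)
                anchors.append((cx, cy, w, h))
    return anchors

# Assumed toy setting: a 4x4 feature map with 3 anchors per cell.
anchors = tile_anchors(grid=4, scale=0.2)
print(len(anchors))  # 48
```

Every location gets the same set of boxes whether or not an object is there, which is why most anchors end up as background.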

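Q: What advantages do current two-stage methods (Faster R-CNN, R-FCN, FPN, etc.) have over one-stage methods?

A: 1. the two-stage structure handles the class imbalance problem; 2. the object box parameters are regressed with a two-step cascade; 3. two-stage features are used to describe the objects. In the paper's words:

(1) using two-stage structure with sampling heuristics to handle class imbalance; (2) using two-step cascade to
regress the object box parameters; (3) using two-stage features to describe the objects

The authors therefore propose RefineDet, which inherits the advantages of both two-stage and single-stage methods. It consists of two modules: the anchor refinement module (ARM) and the object detection module (ODM).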

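3. Contributions

3.1 Overview

Broadly speaking, the network takes SSD, a one-stage model, and adds a top-down pathway so that it behaves like a two-stage model. RefineDet consists of two interconnected modules, the ARM and the ODM, which are linked by transfer connection blocks (TCBs).
The TCB converts ARM features at each level into ODM features. It also passes information back from higher levels: the higher-level feature map is enlarged by deconvolution (in practice a transposed convolution) so that the feature map sizes match, and is then added to the lower-level features.

For small objects in particular, the authors use a two-step cascaded regression. The ARM first adjusts the locations and sizes of the anchors, these coarsely refined anchors serve as the input to the ODM, and the ODM then detects and classifies the objects further, which gives more precise detections.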

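3.2 Details

3.2.1 ARM

This module follows the same idea as an RPN: it performs a first round of classification and regression, and its classification is class-agnostic (foreground vs. background only). Its main roles are:

Filter out some of the negative anchors to reduce the search space. This gives a fairly clear improvement; the authors argue that a large part of why two-stage methods work well is that the RPN filters out extreme samples, which addresses the imbalance problem and yields a better set of samples.
Give the ODM a rough initial estimate for its further classification and regression.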

the ARM is designed to (1)
identify and remove negative anchors to reduce search space
for the classifier, and (2) coarsely adjust the locations and
sizes of anchors to provide better initialization for the subsequent regressor.

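Most of the mechanisms resemble SSD, for example anchor matching with an overlap threshold of 0.5. As the architecture diagram shows, this is also a multi-scale detection framework like SSD. On the feature map at each scale, the authors obtain n refined anchor boxes per cell and pass them on to the next module. Before they are passed on, they are filtered: if a refined anchor box's negative confidence, i.e. its predicted background score, is larger than a preset threshold (e.g. 0.99), it is discarded and not used to train the ODM. This is the negative anchor filtering the authors describe. Likewise, at inference time, a refined anchor box whose negative confidence is larger than the threshold is also discarded in the ODM.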
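The filtering rule above is simple enough to sketch in a few lines. This is a minimal illustration, not the official implementation; `filter_negative_anchors` and the example scores are hypothetical, and only the threshold value 0.99 comes from the text.

```python
from typing import List, Sequence

def filter_negative_anchors(bg_scores: Sequence[float],
                            theta: float = 0.99) -> List[int]:
    """Return the indices of refined anchors that are passed on to the ODM.

    An anchor is dropped when its ARM background (negative) confidence
    is larger than theta; all other anchors are kept.
    """
    return [i for i, score in enumerate(bg_scores) if score <= theta]

# Hypothetical ARM background scores for five refined anchors.
bg_scores = [0.999, 0.42, 0.995, 0.10, 0.99]
print(filter_negative_anchors(bg_scores))  # [1, 3, 4]
```

The same function would apply at training and at inference time, since the text describes the same threshold test in both cases.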

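One point worth noting is that the authors believe the two-stage approach greatly improves detection, especially for small objects, although they do not discuss the reason in detail.

3.2.2 ODM and TCB

The ODM has already been introduced. It receives the ARM's output after hard negative mining (even though the ARM has already rejected some anchors, this step is still carried out, with a positive-to-negative ratio of 1:3). There is not much else to say about the ODM itself, since it is all familiar machinery, so the focus here is the TCB, the block that converts ARM features into ODM features. The TCB is implemented as shown in the figure below:

[Figure: TCB structure]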

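It is straightforward enough to understand at a glance. What is worth noting is that the TCB also fuses feature maps from different scales, somewhat like FPN, except that it enlarges the smaller, semantically stronger feature map with a 3x3 deconv rather than with upsampling.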
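For the element-wise sum to work, the deconv has to bring the smaller map up to exactly the size of the larger one. The sketch below checks that with the standard transposed-convolution output-size formula. The kernel size 3 follows the text above; the stride, padding, and output padding, as well as the feature map sizes 40/20/10/5, are my assumptions for illustration.

```python
from typing import List

def deconv_output_size(size: int, kernel: int = 3, stride: int = 2,
                       padding: int = 1, output_padding: int = 1) -> int:
    """Spatial output size of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def elementwise_sum(low: List[List[float]],
                    high: List[List[float]]) -> List[List[float]]:
    """Add two single-channel feature maps of identical spatial size."""
    if len(low) != len(high) or any(len(a) != len(b) for a, b in zip(low, high)):
        raise ValueError("feature maps must have the same spatial size")
    return [[a + b for a, b in zip(row_l, row_h)]
            for row_l, row_h in zip(low, high)]

# Assumed feature map sizes at four scales, largest first.
sizes = [40, 20, 10, 5]
# Each higher-level map, once enlarged, must match the level below it.
print([deconv_output_size(s) for s in sizes[1:]])  # [40, 20, 10]

# Toy fusion of a 2x2 low-level map with an already enlarged high-level map.
print(elementwise_sum([[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]))
```

With these assumed settings each deconv exactly doubles the spatial size, so the enlarged map can be added to the lower-level map without cropping or padding.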

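3.2.3 Two-Step Cascaded Regression

We use the ARM to first adjust the locations and sizes of the anchors, so as to provide a better initialization for the regression in the ODM. Specifically, we associate n anchor boxes with each regularly divided cell on a given feature map. The initial position of each anchor box relative to its cell is fixed. For each feature map cell, we predict four offsets of the refined anchor boxes relative to the original tiled anchors, plus two confidence scores indicating whether a foreground object is present in those boxes. We can thus generate n refined anchor boxes at each feature map cell.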
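The two-step cascade can be sketched as decoding offsets twice: once against the original tiled anchor (ARM), then against the refined anchor (ODM). The sketch assumes an SSD-style center/size offset encoding with variances (0.1, 0.2); the text does not specify the encoding, so treat `decode`, the variances, and the offset values as illustrative assumptions.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)

def decode(anchor: Box, offsets: Box,
           variances: Tuple[float, float] = (0.1, 0.2)) -> Box:
    """Apply predicted (dx, dy, dw, dh) offsets to an anchor box."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (cx + dx * variances[0] * w,   # shift center by a fraction of the width
            cy + dy * variances[0] * h,   # shift center by a fraction of the height
            w * math.exp(dw * variances[1]),
            h * math.exp(dh * variances[1]))

anchor = (0.5, 0.5, 0.2, 0.2)          # original tiled anchor
arm_offsets = (1.0, -1.0, 0.0, 0.0)    # hypothetical ARM prediction
odm_offsets = (0.5, 0.0, 0.0, 0.0)     # hypothetical ODM prediction

refined = decode(anchor, arm_offsets)  # step 1: coarse refinement
final = decode(refined, odm_offsets)   # step 2: regress from the refined anchor
print(refined, final)
```

The point of the design is visible in the second call: the ODM regresses from `refined` rather than from `anchor`, so it starts closer to the object.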

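The loss is also conventional: softmax for classification and smooth L1 for bounding box regression. The symbols used in the loss function are:

N_arm and N_odm are the numbers of positive anchors in the ARM and the ODM, respectively
p_i is the predicted confidence that anchor i is an object
x_i is the refined coordinates of anchor i predicted by the ARM
c_i is the object class of the bounding box predicted by the ODM
t_i is the coordinates of the bounding box predicted by the ODM
l_i* is the ground-truth class label of anchor i
g_i* is the ground-truth location and size of anchor i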
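For reference, the formula these symbols belong to can be written as below. This is a reconstruction from the symbol definitions above and the loss as given in the paper, so check it against the paper before relying on it. The names L_b (binary object/background classification loss in the ARM), L_m (multi-class softmax loss in the ODM), L_r (smooth L1 regression loss), and the bracket [l_i* >= 1] (1 when anchor i is positive, 0 otherwise) are not defined in this post.

```latex
\mathcal{L}(\{p_i\},\{x_i\},\{c_i\},\{t_i\}) =
\frac{1}{N_{\mathrm{arm}}}\Big(\sum_i \mathcal{L}_b\big(p_i,[l_i^* \ge 1]\big)
+ \sum_i [l_i^* \ge 1]\,\mathcal{L}_r(x_i, g_i^*)\Big)
+ \frac{1}{N_{\mathrm{odm}}}\Big(\sum_i \mathcal{L}_m(c_i, l_i^*)
+ \sum_i [l_i^* \ge 1]\,\mathcal{L}_r(t_i, g_i^*)\Big)
```

The first group of terms is the ARM loss and the second is the ODM loss. In both, the regression term only counts positive anchors, which is why each group is normalized by its own number of positives.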

4. Experimental Results

5. Conclusions and Thoughts

5.1 Authors' Conclusions

In this paper, we present a single-shot refinement neural network based detector, which consists of two interconnected modules, i.e., the ARM and the ODM. The ARM
aims to filter out the negative anchors to reduce search space
for the classifier and also coarsely adjust the locations and
sizes of anchors to provide better initialization for the subsequent regressor, while the ODM takes the refined anchors as
the input from the former ARM to regress the accurate object locations and sizes and predict the corresponding multiclass labels. The whole network is trained in an end-to-end
fashion with the multi-task loss.

In the future, we plan to employ RefineDet to detect some
other specific kinds of objects, e.g., pedestrian, vehicle, and
face, and introduce the attention mechanism in RefineDet to
further improve the performance.