【TridentNet】《Scale-Aware Trident Networks for Object Detection》

Latest recommended article published on 2021-08-20 16:17:42

bryant_meng

Category: CNN / Transformer. Tags: image pyramid, feature pyramid, TridentNet, receptive field

Article link: https://blog.csdn.net/bryant_meng/article/details/79865508

ICCV-2019

Code: https://github.com/TuSimple/simpledet

Commentary by author Naiyan Wang: https://zhuanlan.zhihu.com/p/54334986

1 Background and Motivation

CNNs have achieved strong results on object detection, with detectors falling into one-stage and two-stage families, but scale variation remains a central issue.

Existing solutions:

Multi-scale image pyramids (which increase inference time)
Feature pyramids (an unsatisfactory alternative to image pyramids, because the representational power for objects of different scales still differs, since their features are extracted on different layers in FPN)

Both families of methods share the same motivation: models should have different receptive fields for objects of different scales.

Starting from the receptive field perspective, the authors propose TridentNet.

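The following is excerpted from the author's own summary, "TridentNet：处理目标检测中尺度变化新思路" (TridentNet: a new approach to handling scale variation in object detection).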

Consider which properties of the backbone affect the performance of a detector. They come down to three:

network depth (structure)
downsample rate
receptive field

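For the first two, the effect is generally clear: a deeper network (that is, one with more representational power) gives better results, and too much downsampling hurts small objects. But no prior work had isolated the receptive field, holding the other variables fixed, to verify its effect on detector performance.

So we ran a validation experiment using ResNet50 and ResNet101 as backbones, changing the dilation rate of every 3×3 conv in the last stage. This way the network structure, the parameter count and the downsample rate all stay fixed, and only the receptive field of the network changes.

We were surprised to find that detection performance on objects of different scales is tied to the dilation rate: a larger receptive field works better for large objects, and a smaller receptive field is friendlier to small objects. The question then becomes whether the advantages of different receptive fields can be combined.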

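Our TridentNet makes three changes to the original backbone:

First, it builds parallel multi-branch blocks with different receptive fields
Second, the weights of the branches in each trident block are shared
Third, each branch is responsible only for samples within a certain scale range, in both training and testing, which is what scale-aware means

All three are very easy to implement in any deep learning framework.

As an aside, the introduction of the paper is very well written and worth learning from.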

2 Advantages / Contributions

First to design controlled experiments exploring the effect of the receptive field on the object detection task (a larger receptive field is better for large objects, and vice versa)
Proposes the Trident Network to deal with the scale variation problem
Weight-sharing trident-block design
Validates effectiveness on the COCO benchmark, reaching 48.4 mAP using a single model with ResNet-101 as backbone

3 Related Work

Deep object detectors (two-stage, one-stage)
Methods for handling scale variation (multi-scale image pyramid, multi-level features)
Dilated convolution, also known as atrous convolution

4 Method

4.1 Investigation of Receptive Field

A dilated 3×3 convolution has a receptive field of $3 + 2(d_s - 1)$, so $d_s = 1$ gives the standard convolution.

Suppose the feature map has been downsampled by a factor of $s$. Modifying $n$ conv layers to use dilation rate $d_s$ increases the receptive field by $2(d_s - 1)sn$, where

$d_s$ is the dilation rate
$s$ is the downsample rate of the current feature map
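
The two formulas above can be checked with a few lines of Python. This is only a small sketch: the function names are made up for illustration, and the stride of 16 and the 6 modified layers in the usage loop are assumed example values, not numbers taken from the paper.

```python
def dilated_kernel_extent(dilation: int, kernel_size: int = 3) -> int:
    """Input positions spanned by one dilated kernel.

    For kernel_size = 3 this is 3 + 2 * (d_s - 1).
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field_increase(dilation: int, stride: int, num_layers: int) -> int:
    """Extra receptive field in input-image pixels: 2 * (d_s - 1) * s * n."""
    return 2 * (dilation - 1) * stride * num_layers


if __name__ == "__main__":
    # Assumed example values: feature map stride 16, 6 modified 3x3 conv layers.
    for d in (1, 2, 3):
        print(d, dilated_kernel_extent(d), receptive_field_increase(d, stride=16, num_layers=6))
```

With $d_s = 1$ the increase is zero, which matches the statement that dilation 1 is the standard convolution.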

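The results show that:

Detection performance on objects of different scales is influenced by the receptive field of the network
Although ResNet-101's receptive field is already large enough, raising the dilation rate still improves large-object detection significantly, which reflects that the effective receptive field is smaller than the theoretical receptive field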

4.2 Trident Network

4.2.1 weight sharing trident blocks

We replace the blocks in the last stage of backbone with trident blocks since larger strides lead to larger difference in receptive fields as needed.
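
To make the weight-sharing idea concrete, here is a toy 1-D sketch in plain Python. It is not the SimpleDet implementation: the function names, the 1-D setting and the dilation set (1, 2, 3) are illustrative assumptions. What it shows is that every branch applies the same kernel and only the dilation differs, so extra branches add no parameters.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-size 1-D dilated convolution (cross-correlation) with zero padding."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * dilation  # taps spread out as dilation grows
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def trident_block_1d(signal: Sequence[float],
                     shared_kernel: Sequence[float],
                     dilations: Sequence[int] = (1, 2, 3)) -> List[List[float]]:
    """Run each branch with the SAME kernel; only the dilation differs."""
    return [dilated_conv1d(signal, shared_kernel, d) for d in dilations]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    for branch_out in trident_block_1d(impulse, shared_kernel=[1, 2, 3]):
        print(branch_out)
```

Feeding an impulse through the block shows the same three weights landing further apart in each branch, which is the widening receptive field.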

4.2.2 Scale-aware Training Scheme

Table 1 shows that scale mismatching degrades performance; for example, small objects train poorly on a branch with a large dilation. Thus, it is natural to detect objects of different scales on different branches. (This motivation comes from SNIP.)

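The authors design a scale-aware training scheme to avoid training objects of extreme scales on mismatched branches.
Concretely, a valid range $[l_i, u_i]$ is defined for each branch $i$, and each branch trains only on objects whose scale falls within its range.

e.g. $[0, 90]$, $[30, 160]$ and $[90, \infty)$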
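
The range check can be sketched as below. It assumes, following the paper, that the scale of a box of width $w$ and height $h$ is $\sqrt{wh}$ and that both range ends are inclusive. The function name and the `VALID_RANGES` constant are hypothetical; the range values are the example ones quoted above.

```python
import math
from typing import List, Sequence, Tuple

# Example valid ranges from the text, one (l_i, u_i) pair per branch.
VALID_RANGES: List[Tuple[float, float]] = [(0.0, 90.0), (30.0, 160.0), (90.0, math.inf)]


def valid_branches(width: float, height: float,
                   ranges: Sequence[Tuple[float, float]] = VALID_RANGES) -> List[int]:
    """Indices of branches whose range contains the box scale sqrt(w * h)."""
    scale = math.sqrt(width * height)
    return [i for i, (lo, hi) in enumerate(ranges) if lo <= scale <= hi]


if __name__ == "__main__":
    for w, h in [(20, 20), (50, 50), (100, 100), (200, 200)]:
        print((w, h), valid_branches(w, h))
```

Because the example ranges overlap, a medium-sized object can be a valid training sample for two branches at once.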

4.3 Inference and Approximation

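All three branches produce predictions, which are then combined with NMS or soft-NMS.

Fast TridentNet uses only the middle branch at inference, and performance drops only slightly.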
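
A minimal sketch of the merge step, assuming plain greedy NMS over the pooled detections of all branches. The box format `(x1, y1, x2, y2, score)`, the function names and the 0.5 IoU threshold are assumptions for illustration; soft-NMS and any per-branch scale filtering are left out.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_branches_nms(branch_dets: List[List[Box]], iou_thresh: float = 0.5) -> List[Box]:
    """Pool detections from every branch, then apply greedy NMS."""
    # Highest-scoring candidates are considered first.
    candidates = sorted((d for dets in branch_dets for d in dets),
                        key=lambda d: d[4], reverse=True)
    kept: List[Box] = []
    for det in candidates:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(det, k) <= iou_thresh for k in kept):
            kept.append(det)
    return kept
```

Duplicates of the same object predicted by neighbouring branches overlap heavily, so only the highest-scoring copy survives.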

5 Experiments

5.1 Datasets

COCO (for details, see the post "Introduction to the COCO dataset")

train: trainval 35k (union of 80k images from train and a random 35k subset of images from the 40k image val split)
val: minival split (the remaining 5k images from val)
test: 20k test images (test-dev)

$AP_s$: objects smaller than 32 × 32
$AP_m$: objects between 32 × 32 and 96 × 96
$AP_l$: objects larger than 96 × 96
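
The size buckets behind $AP_s$, $AP_m$ and $AP_l$ amount to two area thresholds. The helper below is a hypothetical illustration, and putting areas of exactly 32 × 32 and 96 × 96 into the medium bucket is an assumption of this sketch.

```python
def coco_size_bucket(area: float) -> str:
    """Map an object area in pixels to the small / medium / large bucket."""
    if area < 32 ** 2:
        return "small"
    if area <= 96 ** 2:
        return "medium"
    return "large"


if __name__ == "__main__":
    for side in (16, 64, 128):
        print(side, coco_size_bucket(side * side))
```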

The input images are resized to have a short side of 800 pixels.

5.2 Ablation Studies

5.2.1 Components of TridentNet

It looks as if (c) and (d) may have been swapped.

1) Multi-branch
See (b) in Figure 2: every metric is higher than the baseline (a).

2) Scale-aware
Compared with (b), (d) improves $AP_s$ but $AP_l$ actually drops.
We conjecture that the scale-aware training design prevents each branch from training objects of extreme scales, but may also bring about over-fitting problem in each branch caused by reduced effective samples. (That is, over-fitting: fine on the training set, worse on the test set.)

3) Weight-sharing
Both (c) and (e) use weight sharing, and both improve over (b). Weight sharing alleviates the over-fitting seen in (d), so (e) improves over (c).

5.2.2 Number of branches

The scale-aware training scheme is not used here, to avoid tuning the range hyperparameters of each branch. Every multi-branch setting beats a single branch, and three branches work best.

5.2.3 Stage of Trident blocks

Placing trident blocks in stage 2 or stage 3 brings little improvement. This is because the strides in conv2 and conv3 feature maps are not large enough to widen the discrepancy of receptive fields between three branches.