1. Paper Overview
The paper's innovations are concentrated in the backbone.
At the macro level, as shown in Figure 1, the authors design a Recursive Feature Pyramid (RFP) that feeds the features extracted by the backbone and FPN back through the backbone for a second round of refinement.
Figure 1: RFP macro architecture
The unrolled details are shown in Figure 2: features are extracted from the image twice, and the first-pass features are fused into every stage of the second pass as well as into the final output. Before this fusion, the first-pass features go through an ASPP module and are then merged with the second-pass feature maps.
Figure 2: Unrolled RFP details
When aggregating the final outputs, as shown in Figure 3, a gate-like mechanism is introduced via a sigmoid, letting the network decide the fusion weights on its own.
Figure 3: Output feature fusion
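The gating above can be sketched as follows. This is a minimal sketch: `rfp_weight` stands in for the 1x1 convolution that produces the fusion weight, and the zero initialization is an assumption made here so the behavior is easy to check (sigmoid(0) = 0.5, i.e. an even blend at the start).

```python
import torch
import torch.nn as nn

# A gate-like fusion: a 1x1 conv on the second-pass features produces a
# per-pixel weight in (0, 1) via sigmoid, which blends the two passes.
rfp_weight = nn.Conv2d(256, 1, kernel_size=1)
nn.init.zeros_(rfp_weight.weight)  # zero init => sigmoid(0) = 0.5 everywhere
nn.init.zeros_(rfp_weight.bias)

x_first = torch.randn(1, 256, 8, 8)   # first-pass (FPN) features
x_second = torch.randn(1, 256, 8, 8)  # second-pass features

w = torch.sigmoid(rfp_weight(x_second))        # learned gate, shape (1, 1, 8, 8)
fused = w * x_second + (1 - w) * x_first       # broadcast over channels
```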
At the micro level, the paper proposes SAC (Switchable Atrous Convolution) to replace the 3x3 convolution in each Bottleneck block. As shown in Figure 4, a 1x1 convolution in the middle generates a switch weight S, allowing the network to adaptively balance the ordinary convolution and the atrous convolution; a global-context feature is also injected both before and after the convolution.
Figure 4: SAC structure
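The switch can be sketched as below. This is a simplified sketch: the real SAC also attaches global-context modules before and after, derives S from the input via average pooling plus a 1x1 convolution, and can use deformable convolution; here S is just a random per-pixel gate.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 8, 8)
w = torch.randn(16, 16, 3, 3)     # the shared 3x3 weight
dw = torch.zeros_like(w)          # delta-weight of the atrous branch, zero-init

out_std = F.conv2d(x, w, padding=1, dilation=1)       # ordinary 3x3 conv
out_atr = F.conv2d(x, w + dw, padding=3, dilation=3)  # atrous conv, rate 3
s = torch.sigmoid(torch.randn(1, 1, 8, 8))            # switch S (stand-in for the 1x1 conv output)
out = s * out_std + (1 - s) * out_atr                 # S * conv(w) + (1 - S) * atrous(w + dw)
```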
In addition, the authors add a weight-locking mechanism to the two atrous convolutions, shown in Figure 5: the atrous convolution's weight is w + Δw, where w comes from the pretrained model and Δw is a newly added weight initialized to zero. This is needed because the 3x3 convolution in the pretrained Bottleneck block is an ordinary convolution, so its weight cannot be loaded into an atrous convolution directly; it has to be taken out and added onto the atrous convolution's weight. If Δw is frozen at 0, AP drops by 0.1%; if w is not loaded at all, the atrous weights are entirely randomly initialized and a large part of the network has to train from scratch, which lowers AP considerably.
Figure 5: Atrous convolution weight lock
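The lock amounts to the following (a sketch; `pretrained_w` stands in for the 3x3 weight loaded from torchvision://resnet50):

```python
import torch
import torch.nn as nn

pretrained_w = torch.randn(64, 64, 3, 3)     # stand-in for the pretrained 3x3 weight

w = nn.Parameter(pretrained_w.clone())       # loaded pretrained weight, shared by both branches
delta_w = nn.Parameter(torch.zeros_like(w))  # new trainable weight for the atrous branch

# The atrous branch convolves with w + delta_w; at load time this equals w
# exactly, so the pretrained weight is reused instead of training from scratch.
atrous_w = w + delta_w
```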
2. Code Walkthrough
This section walks through the implementation in OpenMMLab's mmdetection, using detectors_cascade-rcnn_r50_1x_coco.py as the example.
First, look at the config file:
_base_ = [
'../_base_/models/cascade-rcnn_r50_fpn.py',
'../_base_/datasets/coco_detection.py',
'../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
]
model = dict(
backbone=dict(
type='DetectoRS_ResNet',
        conv_cfg=dict(type='ConvAWS'),  # replace Conv with ConvAWS to avoid NaN during training
        sac=dict(type='SAC', use_deform=True),  # use SAC with DCN enabled
        stage_with_sac=(False, True, True, True),  # the first stage does not use SAC
output_img=True),
neck=dict(
type='RFP',
        rfp_steps=2,  # features are recursively refined once (two passes in total)
        aspp_out_channels=64,  # ASPP output channels
        aspp_dilations=(1, 3, 6, 1),  # dilation rates of the four ASPP conv branches
        # same settings as the backbone above
rfp_backbone=dict(
rfp_inplanes=256,
type='DetectoRS_ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
conv_cfg=dict(type='ConvAWS'),
sac=dict(type='SAC', use_deform=True),
stage_with_sac=(False, True, True, True),
pretrained='torchvision://resnet50',
style='pytorch')))
The overall detector is CascadeRCNN, which inherits from TwoStageDetector (models/detectors/two_stage.py). During training, TwoStageDetector calls loss, which first extracts the image features with x = self.extract_feat(batch_inputs), then passes the features to the RPN to generate proposals and compute rpn_loss, and finally passes the features and proposals to roi_head, which refines the proposals, classifies them, and computes roi_losses:
...
def loss(self, batch_inputs: Tensor,
batch_data_samples: SampleList) -> dict:
x = self.extract_feat(batch_inputs)
losses = dict()
# RPN forward and loss
if self.with_rpn:
proposal_cfg = self.train_cfg.get('rpn_proposal',
self.test_cfg.rpn)
rpn_data_samples = copy.deepcopy(batch_data_samples)
# set cat_id of gt_labels to 0 in RPN
for data_sample in rpn_data_samples:
data_sample.gt_instances.labels = \
torch.zeros_like(data_sample.gt_instances.labels)
rpn_losses, rpn_results_list = self.rpn_head.loss_and_predict(
x, rpn_data_samples, proposal_cfg=proposal_cfg)
# avoid get same name with roi_head loss
keys = rpn_losses.keys()
for key in list(keys):
if 'loss' in key and 'rpn' not in key:
rpn_losses[f'rpn_{key}'] = rpn_losses.pop(key)
losses.update(rpn_losses)
else:
assert batch_data_samples[0].get('proposals', None) is not None
# use pre-defined proposals in InstanceData for the second stage
# to extract ROI features.
rpn_results_list = [
data_sample.proposals for data_sample in batch_data_samples
]
roi_losses = self.roi_head.loss(x, rpn_results_list,
batch_data_samples)
losses.update(roi_losses)
return losses
...
DetectoRS's core changes live in self.extract_feat.
First, the backbone: ResNet is rewritten in mmdet/models/backbones/detectors_resnet.py.
Change one: SAC is inserted when constructing the stages, and DCN is used inside SAC:
...
for i, num_blocks in enumerate(self.stage_blocks):
stride = self.strides[i]
dilation = self.dilations[i]
dcn = self.dcn if self.stage_with_dcn[i] else None
sac = self.sac if self.stage_with_sac[i] else None
if self.plugins is not None:
stage_plugins = self.make_stage_plugins(self.plugins, i)
else:
stage_plugins = None
planes = self.base_channels * 2**i
res_layer = self.make_res_layer(
block=self.block,
inplanes=self.inplanes,
planes=planes,
num_blocks=num_blocks,
stride=stride,
dilation=dilation,
style=self.style,
avg_down=self.avg_down,
with_cp=self.with_cp,
conv_cfg=self.conv_cfg,
norm_cfg=self.norm_cfg,
dcn=dcn,
sac=sac,
rfp_inplanes=rfp_inplanes if i > 0 else None,
plugins=stage_plugins)
self.inplanes = planes * self.block.expansion
layer_name = f'layer{i + 1}'
self.add_module(layer_name, res_layer)
self.res_layers.append(layer_name)
...
Change two: in forward, besides the extracted features, the original image is also passed on to the neck:
...
def forward(self, x):
"""Forward function."""
outs = list(super(DetectoRS_ResNet, self).forward(x))
if self.output_img:
outs.insert(0, x)
return tuple(outs)
Change three: an extra rfp_forward is added, which the neck calls to run the backbone a second time for feature extraction:
def rfp_forward(self, x, rfp_feats):
"""Forward function for RFP."""
if self.deep_stem:
x = self.stem(x)
else:
x = self.conv1(x)
x = self.norm1(x)
x = self.relu(x)
x = self.maxpool(x)
outs = []
for i, layer_name in enumerate(self.res_layers):
res_layer = getattr(self, layer_name)
rfp_feat = rfp_feats[i] if i > 0 else None
for layer in res_layer:
x = layer.rfp_forward(x, rfp_feat)
if i in self.out_indices:
outs.append(x)
return tuple(outs)
Bottleneck's rfp_forward follows the logic in Figure 6: the second-pass features go through the residual branch, and the RFP features, after a 1x1 conv, are added onto the result:
Figure 6: Bottleneck's rfp_forward
...
def rfp_forward(self, x, rfp_feat):
"""The forward function that also takes the RFP features as input."""
def _inner_forward(x):
identity = x
out = self.conv1(x)
out = self.norm1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.norm2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.norm3(out)
if self.downsample is not None:
identity = self.downsample(x)
out += identity
return out
out = _inner_forward(x)
if self.rfp_inplanes:
rfp_feat = self.rfp_conv(rfp_feat)
out = out + rfp_feat
out = self.relu(out)
return out
...
After the FPN, the first-pass features go through an ASPP module before being fused with the second-pass features.
The ASPP is shown in Figure 7: the input goes through several parallel branches whose outputs are concatenated. Unlike the original ASPP, this one is not used directly for dense prediction, so the 1x1 convolution after the concat is removed.
Figure 7: ASPP structure
class ASPP(BaseModule):
"""ASPP (Atrous Spatial Pyramid Pooling)
This is an implementation of the ASPP module used in DetectoRS
(https://arxiv.org/pdf/2006.02334.pdf)"""
def __init__(self,
in_channels,
out_channels,
dilations=(1, 3, 6, 1),
init_cfg=dict(type='Kaiming', layer='Conv2d')):
super().__init__(init_cfg)
assert dilations[-1] == 1
self.aspp = nn.ModuleList()
for dilation in dilations:
kernel_size = 3 if dilation > 1 else 1
padding = dilation if dilation > 1 else 0
conv = nn.Conv2d(
in_channels,
out_channels,
kernel_size=kernel_size,
stride=1,
dilation=dilation,
padding=padding,
bias=True)
self.aspp.append(conv)
self.gap = nn.AdaptiveAvgPool2d(1)
def forward(self, x):
avg_x = self.gap(x)
out = []
for aspp_idx in range(len(self.aspp)):
inp = avg_x if (aspp_idx == len(self.aspp) - 1) else x
out.append(F.relu_(self.aspp[aspp_idx](inp)))
out[-1] = out[-1].expand_as(out[-2])
out = torch.cat(out, dim=1)
return out
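One detail worth noting: the last branch (forced to dilation 1 by the assert) runs on the 1x1 globally pooled map, so its output must be broadcast back to feature-map size before the concat; that is what `out[-1] = out[-1].expand_as(out[-2])` does. A minimal illustration of that step:

```python
import torch

gap_out = torch.randn(1, 64, 1, 1)      # output of the global-average-pool branch
conv_out = torch.randn(1, 64, 32, 32)   # output of a spatial ASPP branch

expanded = gap_out.expand_as(conv_out)  # replicate the 1x1 value over all positions
cat = torch.cat([conv_out, expanded], dim=1)
```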
Finally, the neck: RFP (mmdet/models/necks/rfp.py), which inherits from FPN, fuses the first- and second-pass features and passes the result to the RPN for the downstream computation; from there on, everything proceeds like an ordinary two-stage detector.
def forward(self, inputs):
inputs = list(inputs)
assert len(inputs) == len(self.in_channels) + 1 # +1 for input image
img = inputs.pop(0)
# FPN forward
x = super().forward(tuple(inputs))
for rfp_idx in range(self.rfp_steps - 1):
        # run ASPP on the FPN-fused features
rfp_feats = [x[0]] + list(
self.rfp_aspp(x[i]) for i in range(1, len(x)))
        # fuse the RFP features into the second resnet's forward pass
x_idx = self.rfp_modules[rfp_idx].rfp_forward(img, rfp_feats)
# FPN forward
x_idx = super().forward(x_idx)
        # fuse first-pass and second-pass features
x_new = []
for ft_idx in range(len(x_idx)):
add_weight = torch.sigmoid(self.rfp_weight(x_idx[ft_idx]))
x_new.append(add_weight * x_idx[ft_idx] +
(1 - add_weight) * x[ft_idx])
x = x_new
return x