Implementation of faster_rcnn in mmdetection
Prerequisite:
Class registration in mmdetection (@x.register_module())
Contents:
faster_rcnn
backbone
neck
rpn_head
faster_rcnn
@DETECTORS.register_module()
class FasterRCNN(TwoStageDetector):
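In the code, FasterRCNN inherits from TwoStageDetector, and TwoStageDetector inherits from BaseDetector. BaseDetector implements forward, which dispatches to forward_train during training and to forward_test during testing. The part we care about here is forward_train().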
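The dispatch described above can be sketched with a few toy classes. This is a minimal illustration, not the library's code: the Toy* class names and the returned dicts are hypothetical, and the sketch assumes a boolean flag named return_loss selects the branch, which is how mmdetection 2.x's BaseDetector.forward is understood to work.

```python
class ToyBaseDetector:
    """Hypothetical stand-in for BaseDetector: only the dispatch logic."""

    def forward(self, img, img_metas, return_loss=True, **kwargs):
        # Single entry point; the flag decides which branch runs.
        if return_loss:
            return self.forward_train(img, img_metas, **kwargs)
        return self.forward_test(img, img_metas, **kwargs)

    def forward_train(self, img, img_metas, **kwargs):
        raise NotImplementedError

    def forward_test(self, img, img_metas, **kwargs):
        raise NotImplementedError


class ToyTwoStageDetector(ToyBaseDetector):
    """The two-stage detector fills in the two branches."""

    def forward_train(self, img, img_metas, **kwargs):
        return {"mode": "train"}

    def forward_test(self, img, img_metas, **kwargs):
        return {"mode": "test"}


class ToyFasterRCNN(ToyTwoStageDetector):
    """Adds nothing of its own, it only inherits."""


detector = ToyFasterRCNN()
train_out = detector.forward(img=None, img_metas=[], return_loss=True)
test_out = detector.forward(img=None, img_metas=[], return_loss=False)
```

Because the subclass adds no methods, every call resolves through the parent classes, which is why reading forward_train in TwoStageDetector is enough to understand the training path.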
forward_train calls the backbone, the neck, the RPN head, and the RoI head in turn, and collects the losses returned by the two heads:
def forward_train(self,
                  img,
                  img_metas,
                  gt_bboxes,
                  gt_labels,
                  gt_bboxes_ignore=None,
                  gt_masks=None,
                  proposals=None,
                  **kwargs):
    x = self.extract_feat(img)  # feature extraction + FPN
    losses = dict()
    # RPN stage
    if self.with_rpn:
        proposal_cfg = self.train_cfg.get('rpn_proposal',
                                          self.test_cfg.rpn)
        rpn_losses, proposal_list = self.rpn_head.forward_train(
            x,
            img_metas,
            gt_bboxes,
            gt_labels=None,
            gt_bboxes_ignore=gt_bboxes_ignore,
            proposal_cfg=proposal_cfg,
            **kwargs)
        losses.update(rpn_losses)
    else:
        proposal_list = proposals
    # RoI stage
    roi_losses = self.roi_head.forward_train(x, img_metas, proposal_list,
                                             gt_bboxes, gt_labels,
                                             gt_bboxes_ignore, gt_masks,
                                             **kwargs)
    losses.update(roi_losses)
    return losses
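To make the control flow easier to see, here is a stripped-down sketch of how the two loss dicts end up in one. The function name toy_forward_train, the loss keys, and the numeric values are illustrative assumptions; the real heads compute these values from features and ground-truth boxes.

```python
def toy_forward_train(with_rpn, proposals=None):
    """Mimic the branching and loss merging of forward_train with dummy values."""
    losses = dict()
    if with_rpn:
        # Stand-in for self.rpn_head.forward_train(...): losses plus proposals.
        rpn_losses = {"loss_rpn_cls": 0.3, "loss_rpn_bbox": 0.1}
        proposal_list = ["proposals generated by the RPN"]
        losses.update(rpn_losses)
    else:
        # Without an RPN, precomputed proposals must be supplied by the caller.
        proposal_list = proposals
    # Stand-in for self.roi_head.forward_train(...), which consumes the proposals.
    roi_losses = {"loss_cls": 0.5, "loss_bbox": 0.2}
    losses.update(roi_losses)
    return losses, proposal_list


losses_with_rpn, _ = toy_forward_train(with_rpn=True)
losses_without_rpn, given = toy_forward_train(with_rpn=False, proposals=["external"])
```

With the RPN enabled, the returned dict holds entries from both stages. Without it, only the RoI losses remain and the proposals are whatever was passed in.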
backbone
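The backbone is called in:
x = self.extract_feat(img)  # feature extraction + FPN
The image is passed through the backbone, and the backbone output is then passed directly to the neck (if there is one):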
def extract_feat(self, img):
    """Directly extract features from the backbone+neck."""
    x = self.backbone(img)
    if self.with_neck:
        x = self.neck(x)
    return x
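The backbone config we use is:
backbone=dict(
    type='ResNet',
    depth=50,  # use ResNet-50
    num_stages=4,  # 4 stages
    out_indices=(0, 1, 2, 3),  # output the features of all 4 stages, i.e. C2, C3, C4, C5
    frozen_stages=1,  # freeze the stem and the first stage so their parameters are not updated; -1 means nothing is frozen
    norm_cfg=dict(type='BN', requires_grad=True),  # use batch norm
    norm_eval=True,  # keep the BN layers in eval mode during training, so their running statistics are not updated
    style='pytorch',
    init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),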
For how a config dict is turned into a class instance, see [Class registration in mmdetection (@x.register_module())].
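As a quick reminder of the idea, the sketch below shows a much-simplified registry. It is not mmcv's implementation: ToyRegistry, TOY_BACKBONES, and ToyResNet are hypothetical names, and the real registry handles many more cases (custom names, scopes, default arguments).

```python
class ToyRegistry:
    """Simplified registry: maps a class name to the class itself."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self):
        # Returns a decorator that stores the class under its own name.
        def _register(cls):
            self._module_dict[cls.__name__] = cls
            return cls
        return _register

    def build(self, cfg):
        args = dict(cfg)  # copy so the caller's config is left untouched
        cls = self._module_dict[args.pop("type")]
        return cls(**args)  # remaining keys become constructor arguments


TOY_BACKBONES = ToyRegistry("backbone")


@TOY_BACKBONES.register_module()
class ToyResNet:
    def __init__(self, depth, num_stages=4, out_indices=(0, 1, 2, 3)):
        self.depth = depth
        self.num_stages = num_stages
        self.out_indices = out_indices


cfg = dict(type="ToyResNet", depth=50, out_indices=(0, 1, 2, 3))
backbone = TOY_BACKBONES.build(cfg)
```

The type key selects the class and every other key is forwarded to its constructor, which is why the config above reads like a list of constructor arguments.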
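Because out_indices=(0, 1, 2, 3) selects 4 stages, the backbone outputs 4 feature maps. With a batch size of 16 and 800*800 input images, their shapes are:
(16, 256, 200, 200), (16, 512, 100, 100), (16, 1024, 50, 50), (16, 2048, 25, 25)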
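These shapes follow from the channel counts and downsampling strides of the four ResNet-50 stages. The helper below is a small sketch of that arithmetic. The function name is hypothetical, and it assumes strides of 4, 8, 16, 32 and an input size divisible by 32 (for other sizes the real convolutions round differently).

```python
def resnet50_feature_shapes(batch_size, height, width, out_indices=(0, 1, 2, 3)):
    """Return the (N, C, H, W) shape of each selected ResNet-50 stage output."""
    channels = (256, 512, 1024, 2048)  # output channels of stages C2..C5
    strides = (4, 8, 16, 32)           # total downsampling factor of each stage
    return [
        (batch_size, channels[i], height // strides[i], width // strides[i])
        for i in out_indices
    ]


shapes = resnet50_feature_shapes(batch_size=16, height=800, width=800)
```

Each stage halves the spatial size and doubles the channel count relative to the previous one.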
neck
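The neck gets its name from its position: it sits between the feature extractor (the backbone) and the head, and processes the features passed from one to the other.
Taking FPN as an example: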
neck=dict(
    type='FPN',
    in_channels=[256, 512, 1024, 2048],
    out_channels=256,
    num_outs=5),
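in_channels describes the 4 input levels, whose channel counts are 256, 512, 1024, and 2048. These match the number of output levels and the channel counts of the ResNet-50 backbone above.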
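The effect of out_channels=256 and num_outs=5 on the output shapes can be sketched as follows. The function name is hypothetical, and the sketch assumes the extra fifth level is produced by subsampling the last level with stride 2 (the behavior understood to apply when no extra convolutions are configured), so an odd size such as 25 becomes 13.

```python
def fpn_output_shapes(input_shapes, out_channels=256, num_outs=5):
    """Return the (N, C, H, W) shapes an FPN-style neck would output."""
    # Every input level keeps its spatial size but gets out_channels channels.
    outs = [(n, out_channels, h, w) for (n, _, h, w) in input_shapes]
    # Extra levels: stride-2 subsampling of the last level (assumed behavior).
    while len(outs) < num_outs:
        n, c, h, w = outs[-1]
        outs.append((n, c, (h - 1) // 2 + 1, (w - 1) // 2 + 1))
    return outs


backbone_shapes = [
    (16, 256, 200, 200),
    (16, 512, 100, 100),
    (16, 1024, 50, 50),
    (16, 2048, 25, 25),
]
fpn_shapes = fpn_output_shapes(backbone_shapes)
```

After the neck, every level has the same channel count, so the heads that follow can share one set of weights across all levels.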
The detailed processing is shown in the figure below:
def forward(self, inputs):
    """Forward function."""
    assert len(inputs) == len(self.in_channels)
    # lateral connections: the 1x1 convolutions
    laterals = [
        lateral_conv(inputs[i + self.start_level])
        for i, lateral_conv in enumerate(self.lateral_convs)
    ]
    # top-down path: upsample the higher level with nearest-neighbor interpolation and add it to the level below, giving M
    used_backbone_levels = len(laterals)
    for i in range(used_backbone_levels - 1, 0, -1):
        # In some cases, fixing `scale factor` (e.g. 2) is preferred, but
        #  it cannot co-exist with `size` in `F.interpolate`.
        if 'scale_factor' in self.upsample_cfg:
            laterals[i - 1] += F.interpolate(laterals[i]