The main goal of this article is to understand Faster R-CNN by reading through the mmdetection source code, and at the same time to gain a deeper understanding of mmdetection itself, which makes it easier to modify models and implement your own algorithms later. Before reading, try to get some familiarity with mmdetection and make sure you can run the code.
This post mainly covers the following model-building modules and how they are assembled together:
1. backbone
2. neck
3. rpn_head
4. bbox_roi_extractor
5. bbox_head
Source code analysis:
1. Overall Faster R-CNN pipeline
Here we take an input of size (800, 800) with batch_size=2 as an example.
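Before walking through forward_train, it is worth sanity-checking where the per-level feature-map sizes in its comments come from. A minimal sketch, assuming the default FPN strides of 4, 8, 16, 32, 64 for P2–P6 (P6 comes from pooling P5, hence the ceiling division):

```python
import math

def fpn_feature_sizes(img_size, strides=(4, 8, 16, 32, 64)):
    """Spatial size of each FPN level (P2-P6) for a square input of img_size."""
    return [math.ceil(img_size / s) for s in strides]

print(fpn_feature_sizes(800))  # [200, 100, 50, 25, 13]
```

These are exactly the 200, 100, 50, 25, 13 resolutions that appear in the shape comments of forward_train below.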
def forward_train(self,
                  img,
                  img_metas,
                  gt_bboxes,
                  gt_labels,
                  gt_bboxes_ignore=None,
                  gt_masks=None,
                  proposals=None):
    # Backbone feature extraction + FPN.
    # img: tensor(2, 3, 800, 800)
    x = self.extract_feat(img)  # see 2.1 and 2.2 below for details
    # After the backbone and FPN, x is a tuple:
    # tuple(tensor(2,256,200,200), tensor(2,256,100,100), tensor(2,256,50,50),
    #       tensor(2,256,25,25), tensor(2,256,13,13))
    losses = dict()
    # RPN stage, see 2.3; P2-P6 are the 5 FPN levels
    if self.with_rpn:
        # rpn_outs structure: tuple(list of 5 rpn_cls_score for P2-P6,
        #                           list of 5 rpn_bbox_pred for P2-P6)
        # P2 rpn_cls_score: tensor(2, num_anchors*1 = 3, 200, 200)
        # P2 rpn_bbox_pred: tensor(2, num_anchors*4 = 12, 200, 200)
        # A shared 3x3 conv is applied first, then two separate 1x1 convs
        # produce rpn_outs
        rpn_outs = self.rpn_head(x)
        # gt_bboxes: list of length batch_size, the ground truth of each
        #            image; each entry is tensor(n, 4), n = number of bboxes
        # img_metas: list of length batch_size, per-image info; each dict
        #            holds the image filename, shape, scale factor, etc.
        # self.train_cfg.rpn is the train_cfg.rpn dict from the config file
        rpn_loss_inputs = rpn_outs + (gt_bboxes, img_metas,
                                      self.train_cfg.rpn)
        # After concatenation, rpn_loss_inputs is a tuple of length 5:
        # list(rpn_cls_score), list(rpn_bbox_pred), list(gt_bboxes),
        # list(img_metas), dict(train_cfg.rpn)
        # Loss computation; see 2.3.2 for details on rpn_head's loss
        rpn_losses = self.rpn_head.loss(
            *rpn_loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
        # Update the RPN losses. The loss is computed on the 256 bboxes
        # obtained by matching anchor boxes against the ground truth
        losses.update(rpn_losses)
        # After the RPN stage, keep the 2000 entries of rpn_outs with the
        # highest scores as the proposal input to the RCNN stage;
        # see 2.3.3 for details on rpn_head's get_bboxes
        proposal_cfg = self.train_cfg.get('rpn_proposal',
                                          self.test_cfg.rpn)
        proposal_inputs = rpn_outs + (img_metas, proposal_cfg)
        proposal_list = self.rpn_head.get_bboxes(*proposal_inputs)
    else:
        proposal_list = proposals
    # proposal_list shape: list(tensor(2000, 5), ...)
    if self.with_bbox or self.with_mask:
        # Build the assigner and sampler objects, which assign positive and
        # negative labels and randomly sample 512 proposals per image;
        # see 2.3.4 for details
        bbox_assigner = build_assigner(self.train_cfg.rcnn.assigner)
        bbox_sampler = build_sampler(
            self.train_cfg.rcnn.sampler, context=self)
        num_imgs = img.size(0)
        if gt_bboxes_ignore is None:
            gt_bboxes_ignore = [None for _ in range(num_imgs)]
        sampling_results = []
        for i in range(num_imgs):
            assign_result = bbox_assigner.assign(proposal_list[i],
                                                 gt_bboxes[i],
                                                 gt_bboxes_ignore[i],
                                                 gt_labels[i])
            sampling_result = bbox_sampler.sample(
                assign_result,
                proposal_list[i],
                gt_bboxes[i],
                gt_labels[i],
                feats=[lvl_feat[i][None] for lvl_feat in x])
            sampling_results.append(sampling_result)
        # sampling_results: 512 sampled proposals per image, collected in a list
    # bbox head forward and loss
    if self.with_bbox:
        # Convert list(tensor(512, 4), ...) into one tensor of shape (n, 5)
        # with rows [batch_ind, x1, y1, x2, y2], where batch_ind is the index
        # of the image within the current batch
        rois = bbox2roi([res.bboxes for res in sampling_results])
        # RoIAlign; see 2.4 for details
        bbox_feats = self.bbox_roi_extractor(
            x[:self.bbox_roi_extractor.num_inputs], rois)
        if self.with_shared_head:
            bbox_feats = self.shared_head(bbox_feats)
        # bbox_head forward; see 2.5 for details
        cls_score, bbox_pred = self.bbox_head(bbox_feats)
        # Match positives/negatives from the gt and sampling_results,
        # used for the final loss computation
        bbox_targets = self.bbox_head.get_target(sampling_results,
                                                 gt_bboxes, gt_labels,
                                                 self.train_cfg.rcnn)
        loss_bbox = self.bbox_head.loss(cls_score, bbox_pred,
                                        *bbox_targets)
        losses.update(loss_bbox)
    ......
    return losses
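The bbox2roi step above is small enough to sketch directly. A minimal NumPy re-implementation (the real mmdetection version operates on torch tensors, but the logic is the same): prepend each box with the index of its image in the batch, then concatenate everything into one (n, 5) array.

```python
import numpy as np

def bbox2roi(bbox_list):
    """NumPy sketch of mmdetection's bbox2roi: turn a per-image list of
    (k, 4) box arrays into one (n, 5) array of [batch_ind, x1, y1, x2, y2]."""
    rois = []
    for img_id, bboxes in enumerate(bbox_list):
        # Column of the image's batch index, one row per box
        inds = np.full((bboxes.shape[0], 1), img_id, dtype=bboxes.dtype)
        rois.append(np.hstack([inds, bboxes[:, :4]]))
    return np.vstack(rois)

# Two images with 2 and 1 sampled proposals each (x1, y1, x2, y2)
boxes0 = np.array([[0., 0., 10., 10.], [5., 5., 20., 20.]])
boxes1 = np.array([[2., 2., 8., 8.]])
rois = bbox2roi([boxes0, boxes1])
print(rois.shape)  # (3, 5); column 0 is the batch index
```

The batch-index column is what lets the RoI extractor run a single RoIAlign call over the whole batch: each roi row carries enough information to find its own image in the feature map.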
2.1 Backbone
The backbone is configured as follows:
backbone=dict(
    type='ResNet',
    depth=50,
    num_stages=4,
    out_indices=(0, 1, 2, 3),
    frozen_stages=1,
    style='pytorch'),
First, take note of the structure of the ResNet network.
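With out_indices=(0, 1, 2, 3), the backbone returns the outputs of all four ResNet-50 stages, which the FPN neck then maps down to 256 channels each. A small sketch of the per-stage output shapes, assuming the standard ResNet-50 layout (stem with cumulative stride 4, bottleneck channel expansion of 4):

```python
def resnet50_stage_shapes(img_size):
    """(channels, spatial size) of each ResNet-50 stage output
    for a square input of img_size."""
    channels = [256, 512, 1024, 2048]  # bottleneck expansion factor = 4
    strides = [4, 8, 16, 32]           # cumulative stride at each stage
    return [(c, img_size // s) for c, s in zip(channels, strides)]

print(resnet50_stage_shapes(800))
# [(256, 200), (512, 100), (1024, 50), (2048, 25)]
```

These four feature maps are exactly the inputs the FPN consumes to build P2-P5 (P6 is pooled from P5), matching the tuple of shapes seen after extract_feat in forward_train.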