1. Code of the RPN Part
We start at the forward_train method that drives the RPN, located in two_stage.py.

After the ResNet50 backbone and the FPN neck, we obtain the following feature tensors:

Next, the RPN is applied to these feature tensors.
if self.with_rpn:
    proposal_cfg = self.train_cfg.get('rpn_proposal',
                                      self.test_cfg.rpn)
    rpn_losses, proposal_list = self.rpn_head.forward_train(
        x,
        img_metas,
        gt_bboxes,
        gt_labels=None,
        gt_bboxes_ignore=gt_bboxes_ignore,
        proposal_cfg=proposal_cfg,
        **kwargs)
    losses.update(rpn_losses)
else:
    proposal_list = proposals
1.1 Code of rpn_head.forward_train (base_dense_head.py)
Most of the work happens inside self.rpn_head.forward_train. Its definition is in base_dense_head.py:

def forward_train(self,
                  x,
                  img_metas,
                  gt_bboxes,
                  gt_labels=None,
                  gt_bboxes_ignore=None,
                  proposal_cfg=None,
                  **kwargs):
    """
    Args:
        x (list[Tensor]): Features from FPN.
        img_metas (list[dict]): Meta information of each image, e.g.,
            image size, scaling factor, etc.
        gt_bboxes (Tensor): Ground truth bboxes of the image,
            shape (num_gts, 4).
        gt_labels (Tensor): Ground truth labels of each box,
            shape (num_gts,).
        gt_bboxes_ignore (Tensor): Ground truth bboxes to be
            ignored, shape (num_ignored_gts, 4).
        proposal_cfg (mmcv.Config): Test / postprocessing configuration,
            if None, test_cfg would be used
    Returns:
        tuple:
            losses: (dict[str, Tensor]): A dictionary of loss components.
            proposal_list (list[Tensor]): Proposals of each image.
    """
The first statement of the function is the following call:
outs = self(x)
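Here self is the RPN head itself, so self(x) runs the head's forward pass. Its layers and per-level forward pass are defined in rotated_rpn_head.py: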

def _init_layers(self):
    """Initialize layers of the head."""
    self.rpn_conv = nn.Conv2d(
        self.in_channels, self.feat_channels, 3, padding=1)
    self.rpn_cls = nn.Conv2d(self.feat_channels,
                             self.num_anchors * self.cls_out_channels, 1)
    self.rpn_reg = nn.Conv2d(self.feat_channels, self.num_anchors * 4, 1)

def forward_single(self, x):
    """Forward feature map of a single scale level."""
    x = self.rpn_conv(x)
    x = F.relu(x, inplace=True)
    rpn_cls_score = self.rpn_cls(x)
    rpn_bbox_pred = self.rpn_reg(x)
    return rpn_cls_score, rpn_bbox_pred
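These convolutions are applied to each of the 5 feature maps produced by the FPN. The result, outs, is a tuple holding the classification scores and the box predictions, whose channel dimensions are 3 and 12 respectively. It is later combined with gt_bboxes and img_metas into a single tuple.
The 3 comes from each feature-map location producing 3 anchors, with one binary (0/1) classification per anchor, so self.num_anchors * self.cls_out_channels = 3 * 1 = 3.
The 12 comes from each feature-map location producing 3 anchors, with 4 coordinates per anchor, so 3 * 4 = 12.
gt_bboxes holds the ground-truth boxes of each image, so its first dimension is the number of ground truths in that image.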
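The channel arithmetic above can be checked with a small sketch. It only reproduces the shape bookkeeping, not the convolutions, and the feature-map sizes are hypothetical (a 1024x1024 input with FPN strides 4, 8, 16, 32, 64 is assumed). The helper name rpn_output_shapes is made up for this illustration.

```python
def rpn_output_shapes(featmap_sizes, batch_size=1, num_anchors=3,
                      cls_out_channels=1, box_dim=4):
    """Return (cls_shape, reg_shape) for every feature level.

    Mirrors the output channels of rpn_cls and rpn_reg shown above:
    num_anchors * cls_out_channels and num_anchors * box_dim.
    """
    shapes = []
    for h, w in featmap_sizes:
        cls_shape = (batch_size, num_anchors * cls_out_channels, h, w)
        reg_shape = (batch_size, num_anchors * box_dim, h, w)
        shapes.append((cls_shape, reg_shape))
    return shapes


# Hypothetical sizes: 1024x1024 input, strides 4, 8, 16, 32, 64.
featmap_sizes = [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]
shapes = rpn_output_shapes(featmap_sizes)
for cls_shape, reg_shape in shapes:
    print(cls_shape, reg_shape)
```

Every level has 3 classification channels and 12 regression channels; only the spatial size changes from level to level.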

After these steps, the function passes the assembled inputs to self.loss to compute the loss (see 1.2):
if gt_labels is None:
    loss_inputs = outs + (gt_bboxes, img_metas)
else:
    loss_inputs = outs + (gt_bboxes, gt_labels, img_metas)
losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)

if proposal_cfg is None:
    return losses
else:
    proposal_list = self.get_bboxes(
        *outs, img_metas=img_metas, cfg=proposal_cfg)
    return losses, proposal_list
If proposal_cfg is None, only losses is returned.
Otherwise, the proposal list proposal_list is also generated (see 1.3) and returned together with losses.
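How the tuple concatenation lines up with the parameters of loss can be sketched with placeholder values. The strings below stand in for tensors, and build_loss_inputs is a hypothetical helper written only for this illustration.

```python
def build_loss_inputs(outs, gt_bboxes, img_metas, gt_labels=None):
    """Append the targets to the head outputs, as forward_train does."""
    if gt_labels is None:
        return outs + (gt_bboxes, img_metas)
    return outs + (gt_bboxes, gt_labels, img_metas)


# Placeholders: outs is (list of cls scores, list of bbox preds).
outs = (["cls_level0", "cls_level1"], ["reg_level0", "reg_level1"])
loss_inputs = build_loss_inputs(
    outs, gt_bboxes=["gt_img0"], img_metas=[{"pad_shape": (1024, 1024, 3)}])

# Unpacking with * maps the tuple onto loss(cls_scores, bbox_preds,
# gt_bboxes, img_metas) in this order.
cls_scores, bbox_preds, gt_bboxes, img_metas = loss_inputs
print(cls_scores, bbox_preds, gt_bboxes, img_metas)
```

Because the RPN call passes gt_labels=None, the tuple has four elements, which matches the four positional parameters of the loss method in 1.2.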
1.2 Code of self.loss (rotated_rpn_head.py)
def loss(self,
         cls_scores,
         bbox_preds,
         gt_bboxes,
         img_metas,
         gt_bboxes_ignore=None):
    """Compute losses of the head.
    Args:
        cls_scores (list[Tensor]): Box scores for each scale level
            Has shape (N, num_anchors * num_classes, H, W)
        bbox_preds (list[Tensor]): Box energies / deltas for each scale
            level with shape (N, num_anchors * 5, H, W)
        gt_bboxes (list[Tensor]): Ground truth bboxes for each image with
            shape (num_gts, 5) in [cx, cy, w, h, a] format.
        gt_labels (list[Tensor]): class indices corresponding to each box
        img_metas (list[dict]): Meta information of each image, e.g.,
            image size, scaling factor, etc.
        gt_bboxes_ignore (None | list[Tensor]): specify which bounding
            boxes can be ignored when computing the loss. Default: None
    Returns:
        dict[str, Tensor]: A dictionary of loss components.
    """
Now for the actual implementation:
featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores]
assert len(featmap_sizes) == self.anchor_generator.num_levels
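This collects the spatial size of every feature map in cls_scores and checks that the number of feature maps equals the number of levels of the anchor generator.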

The next lines call get_anchors, which is defined in anchor_head.py (see section 1.2.1):
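# device is taken from the predictions so anchors live on the same device
device = cls_scores[0].device
anchor_list, valid_flag_list = self.get_anchors(
    featmap_sizes, img_metas, device=device)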
get_anchors returns the anchor list anchor_list and the valid-flag list valid_flag_list.


label_channels = self.cls_out_channels if self.use_sigmoid_cls else 1
If the model uses sigmoid as the classification activation, label_channels equals self.cls_out_channels.
If the model uses softmax as the classification activation, label_channels is 1.
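The rule for label_channels can be written as a tiny standalone sketch. It assumes, as I understand the anchor head's convention but which is not shown in this section, that cls_out_channels equals num_classes with sigmoid and num_classes + 1 with softmax (the extra channel being background). Both function names are hypothetical.

```python
def cls_out_channels(num_classes, use_sigmoid_cls):
    # Assumption: softmax needs one extra background channel.
    return num_classes if use_sigmoid_cls else num_classes + 1


def label_channels(num_classes, use_sigmoid_cls):
    # Same expression as in loss(): per-class targets for sigmoid,
    # a single class-index channel for softmax.
    if use_sigmoid_cls:
        return cls_out_channels(num_classes, use_sigmoid_cls)
    return 1


# The RPN only separates foreground from background: num_classes = 1.
rpn_sigmoid = label_channels(num_classes=1, use_sigmoid_cls=True)
rpn_softmax = label_channels(num_classes=1, use_sigmoid_cls=False)
print(rpn_sigmoid, rpn_softmax)
```

For the RPN both settings give 1, which is consistent with the 3 classification channels per location computed in 1.1 for the sigmoid case.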
cls_reg_targets = self.get_targets(
    anchor_list,
    valid_flag_list,
    gt_bboxes,
    img_metas,
    gt_bboxes_ignore_list=gt_bboxes_ignore,
    gt_labels_list=None,
    label_channels=label_channels)
This computes the classification and regression targets from the anchors, valid flags, ground-truth boxes and related inputs (see 1.2.2).

(labels_list, label_weights_list, bbox_targets_list, bbox_weights_list,
 num_total_pos, num_total_neg) = cls_reg_targets
num_total_samples = (
    num_total_pos + num_total_neg if self.sampling else num_total_pos)
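The elements of cls_reg_targets are unpacked into labels_list, label_weights_list, bbox_targets_list, bbox_weights_list, num_total_pos and num_total_neg. num_total_samples is the number of positives plus negatives when sampling is enabled, and the number of positives alone otherwise.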
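A minimal sketch of the num_total_samples rule follows. The counts are hypothetical (40 positive and 472 negative anchors across the batch), and the function name is made up for this illustration.

```python
def num_total_samples(num_total_pos, num_total_neg, sampling):
    """Same conditional expression as in loss()."""
    return num_total_pos + num_total_neg if sampling else num_total_pos


# Hypothetical counts summed over all images in the batch.
with_sampling = num_total_samples(40, 472, sampling=True)
without_sampling = num_total_samples(40, 472, sampling=False)
print(with_sampling, without_sampling)
```

With sampling the count covers every sampled anchor, while without sampling only the positives are counted.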

num_level_anchors = [anchors.size(0) for anchors in anchor_list[0]]
# concat all level anchors and flags to a single tensor
concat_anchor_list = []
for i, _ in enumerate(anchor_list):
    concat_anchor_list.append(torch.cat(anchor_list[i]))
all_anchor_list = images_to_levels(concat_anchor_list,
                                   num_level_anchors)
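First, the number of anchors on each level is computed.
Then the anchors of each image are concatenated into a single tensor and stored in concat_anchor_list.
Finally, images_to_levels converts concat_anchor_list into a list of anchors grouped by level.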
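The regrouping from "per image" to "per level" can be sketched with plain lists. This is an illustration of the idea under the assumption that images_to_levels slices the concatenated anchors level by level; the real helper works on tensors, and images_to_levels_sketch is a hypothetical stand-in.

```python
def images_to_levels_sketch(per_image, num_level_anchors):
    """Regroup [image][anchor] data into [level][image][anchor]."""
    level_targets = []
    start = 0
    for n in num_level_anchors:
        end = start + n
        # Take the same slice from every image for this level.
        level_targets.append([img[start:end] for img in per_image])
        start = end
    return level_targets


# Two images, two levels with 3 and 2 anchors (strings stand in for boxes).
anchor_list = [
    [["a0", "a1", "a2"], ["b0", "b1"]],
    [["a0", "a1", "a2"], ["b0", "b1"]],
]
num_level_anchors = [len(level) for level in anchor_list[0]]
# Counterpart of torch.cat: flatten the levels of each image.
concat_anchor_list = [sum(levels, []) for levels in anchor_list]
all_anchor_list = images_to_levels_sketch(concat_anchor_list,
                                          num_level_anchors)
print(all_anchor_list)
```

After the regrouping, the outer index is the feature level, which matches the per-level layout of cls_scores and bbox_preds.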


losses_cls, losses_bbox = multi_apply(
    self.loss_single,
    cls_scores,
    bbox_preds,
    all_anchor_list,
    labels_list,
    label_weights_list,
    bbox_targets_list,
    bbox_weights_list,
    num_total_samples=num_total_samples)
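loss_single is applied to each level's classification scores, box predictions, anchors, labels, label weights, box targets and box weights to compute the classification loss and the box loss.
See 1.2.3 for details.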
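The per-level mapping performed by multi_apply can be sketched in a few lines. This is a simplified stand-in based on my understanding of the helper (map a function over parallel lists, then regroup the results); toy_loss_single and its numbers are hypothetical and do not compute a real loss.

```python
from functools import partial


def multi_apply_sketch(func, *args, **kwargs):
    """Call func once per level and transpose the results."""
    pfunc = partial(func, **kwargs) if kwargs else func
    map_results = map(pfunc, *args)
    # [(cls0, box0), (cls1, box1)] -> ([cls0, cls1], [box0, box1])
    return tuple(map(list, zip(*map_results)))


def toy_loss_single(cls_score, bbox_pred, num_total_samples):
    # Stand-in for loss_single: one (loss_cls, loss_bbox) pair per level.
    return cls_score / num_total_samples, bbox_pred / num_total_samples


losses_cls, losses_bbox = multi_apply_sketch(
    toy_loss_single, [2.0, 4.0], [8.0, 16.0], num_total_samples=2)
print(losses_cls, losses_bbox)
```

Positional lists are iterated in parallel, one element per level, while keyword arguments such as num_total_samples are shared by every call.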
1.2.1 Code of get_anchors (anchor_head.py)
def get_anchors(self, featmap_sizes, img_metas, device='cuda'):
    """Get anchors according to feature map sizes.
    Args:
        featmap_sizes (list[tuple]): Multi-level feature map sizes.
        img_metas (list[dict]): Image meta info.
        device (torch.device | str): Device for returned tensors
    Returns:
        tuple:
            anchor_list (list[Tensor]): Anchors of each image.
            valid_flag_list (list[Tensor]): Valid flags of each image.
    """
    num_imgs = len(img_metas)
    # since feature map sizes of all images are the same, we only compute
    # anchors for one time
    multi_level_anchors = self.prior_generator.grid_priors(
        featmap_sizes, device=device)
    anchor_list = [multi_level_anchors for _ in range(num_imgs)]
    # for each image, we compute valid flags of multi level anchors
    valid_flag_list = []
    for img_id, img_meta in enumerate(img_metas):
        multi_level_flags = self.prior_generator.valid_flags(
            featmap_sizes, img_meta['pad_shape'], device)
        valid_flag_list.append(multi_level_flags)
    return anchor_list, valid_flag_list
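First, the number of input images is obtained.
Then self.prior_generator.grid_priors (see 1.2.1.1) generates the multi-level anchors, which are shared by all images.
Next, self.prior_generator.valid_flags computes the valid flags of the multi-level anchors for each image.
Finally, anchor_list and valid_flag_list are returned.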

Here is the resulting anchor_list:

We will not go through the valid_flag_list code in detail. Its main purpose is to decide which anchors are valid. Here is the result:
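Since the valid-flag code is skipped here, a rough sketch of the idea may help. It assumes that a feature-map location counts as valid when it falls inside the image's pad_shape after dividing by the stride, and that all base anchors at a location share the same flag. The function name and the sizes are hypothetical, and this is not the library's implementation.

```python
import math


def valid_flags_sketch(featmap_size, pad_shape, stride, num_base_anchors):
    """Return one boolean per anchor, in row-major location order."""
    feat_h, feat_w = featmap_size
    h, w = pad_shape[:2]
    # Number of rows and columns of the grid covered by the padded image.
    valid_h = min(math.ceil(h / stride), feat_h)
    valid_w = min(math.ceil(w / stride), feat_w)
    flags = []
    for y in range(feat_h):
        for x in range(feat_w):
            inside = y < valid_h and x < valid_w
            flags.extend([inside] * num_base_anchors)
    return flags


# Hypothetical case: 4x4 feature map at stride 8, image padded to 24x32.
flags = valid_flags_sketch((4, 4), (24, 32, 3), stride=8, num_base_anchors=3)
print(sum(flags), len(flags))
```

In this toy case the last row of the feature map lies outside the padded image, so its 12 anchors are flagged invalid and 36 of the 48 anchors remain.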

1.2.1.1 Code of grid_priors (anchor_generator.py)
def grid_priors(self, featmap_sizes, dtype=torch.float32, device='cuda'):
    """Generate grid anchors in multiple feature levels.
    Args:
        featmap_sizes (list[tuple]): List of feature map sizes in
            multiple feature levels.
        dtype (:obj:`torch.dtype`): Dtype of priors.
            Default: torch.float32.
        device (str): The device where the anchors will be put on.
    Return:
        list[torch.Tensor]: Anchors in multiple feature levels. \
            The sizes of each tensor should be [N, 4], where \
            N = width * height * num_base_anchors, width and height \
            are the sizes of the corresponding feature level, \
            num_base_anchors is the number of anchors for that level.
    """
    assert self.num_levels == len(featmap_sizes)
    multi_level_anchors = []
    for i in range(self.num_levels):
        anchors = self.single_level_grid_priors(
            featmap_sizes[i], level_idx=i, dtype=dtype, device=device)
        multi_level_anchors.append(anchors)
    return multi_level_anchors
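This code defines the grid_priors method, which generates grid anchors on multiple feature levels.
The main work is done in single_level_grid_priors (see 1.2.1.2).
grid_priors belongs to the anchor generator class in anchor_generator.py, which is where the anchors are initialized. Here is the basic structure of this class: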
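To make the N = width * height * num_base_anchors relation from the docstring concrete, here is a plain-Python sketch of the usual grid-anchor scheme: every base anchor is shifted to every feature-map location, scaled by the stride. The base anchors, sizes, strides and function names are all hypothetical, and the ordering (locations row by row, base anchors innermost) is an assumption rather than a statement about the library's code.

```python
def single_level_grid_sketch(base_anchors, featmap_size, stride):
    """Shift each (x1, y1, x2, y2) base anchor to every grid location."""
    feat_h, feat_w = featmap_size
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            shift_x, shift_y = x * stride, y * stride
            for x1, y1, x2, y2 in base_anchors:
                anchors.append((x1 + shift_x, y1 + shift_y,
                                x2 + shift_x, y2 + shift_y))
    return anchors


def grid_priors_sketch(base_anchors_per_level, featmap_sizes, strides):
    """Loop over levels, mirroring the structure of grid_priors."""
    assert len(base_anchors_per_level) == len(featmap_sizes) == len(strides)
    return [
        single_level_grid_sketch(base, size, stride)
        for base, size, stride in zip(base_anchors_per_level,
                                      featmap_sizes, strides)
    ]


# Hypothetical setup: 2 levels, 3 base anchors each.
level0 = [(-8, -4, 8, 4), (-6, -6, 6, 6), (-4, -8, 4, 8)]
level1 = [(-16, -8, 16, 8), (-12, -12, 12, 12), (-8, -16, 8, 16)]
multi_level_anchors = grid_priors_sketch(
    [level0, level1], featmap_sizes=[(2, 3), (1, 2)], strides=[4, 8])
print([len(a) for a in multi_level_anchors])
```

The first level yields 2 * 3 * 3 = 18 anchors and the second 1 * 2 * 3 = 6, in line with the formula for N.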

1.2.1.2 Code of single_level_grid_priors (anchor_generator.py)
def single_level_grid_priors(self,
featmap_size,
