A Code Walkthrough of torchvision's Faster R-CNN

Reference blog posts:

  1. https://zhuanlan.zhihu.com/p/31426458
  2. https://zhuanlan.zhihu.com/p/145842317

Reference implementation: https://github.com/supernotman/Faster-RCNN-with-torchvision

Usage

import torch
from torchvision import transforms
import torchvision
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# For training
images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
# boxes are xyxy, so make sure x1 < x2 and y1 < y2
boxes[:, :, 2:4] = boxes[:, :, 0:2] + boxes[:, :, 2:4]
labels = torch.randint(1, 91, (4, 11))
images = list(image for image in images)
targets = []
for i in range(len(images)):
    d = {}
    d['boxes'] = boxes[i]
    d['labels'] = labels[i]
    targets.append(d)
output = model(images, targets)

'''
output:
    {loss_classifier: *, loss_box_reg: *, loss_objectness: *, loss_rpn_box_reg: *}
'''

# For inference
model.eval()
img1 = transforms.ToTensor()(Image.open('./imgs/000000000036.jpg').convert('RGB'))
img2 = transforms.ToTensor()(Image.open('./imgs/000000000042.jpg').convert('RGB'))
x = [img1, img2]
predictions = model(x)

'''
predictions:
    [{'boxes': Tensor[num_boxes, 4] (xyxy), 'labels': Tensor[num_boxes], 'scores': Tensor[num_boxes]},
     {'boxes': Tensor[num_boxes, 4] (xyxy), 'labels': Tensor[num_boxes], 'scores': Tensor[num_boxes]}]
'''

fasterrcnn_resnet50_fpn creates a FasterRCNN object, using ResNet-50 as the backbone by default. FasterRCNN builds the corresponding submodules (rpn, roi_heads, transform) and then calls its base class GeneralizedRCNN to finish constructing the model.

GeneralizedRCNN main logic

# record the original sizes first so predictions can be mapped back later
original_image_sizes = [img.shape[-2:] for img in images]
images, targets = self.transform(images, targets)
features = self.backbone(images.tensors)
if isinstance(features, torch.Tensor):
    features = OrderedDict([('0', features)])
proposals, proposal_losses = self.rpn(images, features, targets)
detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)

losses = {}
losses.update(detector_losses)
losses.update(proposal_losses)

FasterRCNN submodule construction

BackboneWithFPN (Backbone)

class BackboneWithFPN(nn.Module):
    """
    Adds a FPN on top of a model.
    Internally, it uses torchvision.models._utils.IntermediateLayerGetter to
    extract a submodel that returns the feature maps specified in return_layers.
    The same limitations of IntermediateLayerGetter apply here.
    Arguments:
        backbone (nn.Module)
        return_layers (Dict[name, new_name]): a dict containing the names
            of the modules for which the activations will be returned as
            the key of the dict, and the value of the dict is the name
            of the returned activation (which the user can specify).
        in_channels_list (List[int]): number of channels for each feature map
            that is returned, in the order they are present in the OrderedDict
        out_channels (int): number of channels in the FPN.
    Attributes:
        out_channels (int): the number of channels in the FPN
    """
    def __init__(self, backbone, return_layers, in_channels_list, out_channels):
        super(BackboneWithFPN, self).__init__()
        self.body = IntermediateLayerGetter(backbone, return_layers=return_layers)
        self.fpn = FeaturePyramidNetwork(
            in_channels_list=in_channels_list,
            out_channels=out_channels,
            extra_blocks=LastLevelMaxPool(),
        )
        self.out_channels = out_channels

    def forward(self, x):
        x = self.body(x)
        x = self.fpn(x)
        return x

Parameter notes:

  • return_layers: which backbone conv layers to return, and the new keys under which they appear in the returned OrderedDict
  • in_channels_list: the channel count of each feature map fed into the FPN
  • out_channels: the channel count of every feature map the FPN outputs, unified to a single number

The ResNet-50 backbone used in the example returns feature maps whose spatial sizes shrink to 1/4, 1/8, 1/16 and 1/32 of the input, with channel counts of 256, 512, 1024 and 2048 respectively. After the FPN, all returned feature maps share a unified channel count of 256, as the sketch below shows.
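A quick way to see this is to build the backbone directly (a sketch; resnet_fpn_backbone's exact signature and the output key names vary slightly across torchvision versions):

import torch
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# build the same ResNet-50 + FPN backbone fasterrcnn_resnet50_fpn uses
backbone = resnet_fpn_backbone('resnet50', pretrained=False)

features = backbone(torch.rand(1, 3, 224, 224))
for name, feat in features.items():
    print(name, feat.shape)
# 0    torch.Size([1, 256, 56, 56])   stride 4
# 1    torch.Size([1, 256, 28, 28])   stride 8
# 2    torch.Size([1, 256, 14, 14])   stride 16
# 3    torch.Size([1, 256, 7, 7])     stride 32
# pool torch.Size([1, 256, 4, 4])     extra LastLevelMaxPool level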

RegionProposalNetwork (RPN)

class RegionProposalNetwork(torch.nn.Module):
    """
    Implements Region Proposal Network (RPN).

    Arguments:
        anchor_generator (AnchorGenerator): module that generates the anchors for a set of feature
            maps.
        head (nn.Module): module that computes the objectness and regression deltas
        fg_iou_thresh (float): minimum IoU between the anchor and the GT box so that they can be
            considered as positive during training of the RPN.
        bg_iou_thresh (float): maximum IoU between the anchor and the GT box so that they can be
            considered as negative during training of the RPN.
        batch_size_per_image (int): number of anchors that are sampled during training of the RPN
            for computing the loss
        positive_fraction (float): proportion of positive anchors in a mini-batch during training
            of the RPN
        pre_nms_top_n (Dict[int]): number of proposals to keep before applying NMS. It should
            contain two fields: training and testing, to allow for different values depending
            on training or evaluation
        post_nms_top_n (Dict[int]): number of proposals to keep after applying NMS. It should
            contain two fields: training and testing, to allow for different values depending
            on training or evaluation
        nms_thresh (float): NMS threshold used for postprocessing the RPN proposals

    """
        # RPN uses all feature maps that are available
        features = list(features.values())
        objectness, pred_bbox_deltas = self.head(features)
        anchors = self.anchor_generator(images, features)

        num_images = len(anchors)
        num_anchors_per_level_shape_tensors = [o[0].shape for o in objectness]
        num_anchors_per_level = [s[0] * s[1] * s[2] for s in num_anchors_per_level_shape_tensors]
        objectness, pred_bbox_deltas = \
            concat_box_prediction_layers(objectness, pred_bbox_deltas)
        # apply pred_bbox_deltas to anchors to obtain the decoded proposals
        # note that we detach the deltas because Faster R-CNN does not backprop through
        # the proposals
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
        proposals = proposals.view(num_images, -1, 4)
        boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)

        losses = {}
        if self.training:
            assert targets is not None
            labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
            regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
            loss_objectness, loss_rpn_box_reg = self.compute_loss(
                objectness, pred_bbox_deltas, labels, regression_targets)
            losses = {
                "loss_objectness": loss_objectness,
                "loss_rpn_box_reg": loss_rpn_box_reg,
            }
        return boxes, losses

Processing flow:

1. Regress proposals and their foreground probabilities: head

On each feature map produced by the backbone, a Conv2d head regresses, for every anchor at every spatial location, an objectness score (foreground probability) and the deltas between the predicted box and the anchor. The spatial size of the outputs is unchanged; only the channel count becomes num_anchors for the scores and 4 * num_anchors for the deltas.
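A minimal sketch of this head (it mirrors torchvision.models.detection.rpn.RPNHead; the class name here is ours):

import torch
from torch import nn
import torch.nn.functional as F

class RPNHeadSketch(nn.Module):
    def __init__(self, in_channels, num_anchors):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1)
        # one objectness logit per anchor per location
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1)
        # four box deltas per anchor per location
        self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)

    def forward(self, features):  # features: List[Tensor], one per FPN level
        logits, bbox_reg = [], []
        for feat in features:
            t = F.relu(self.conv(feat))
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg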

2. Generate anchors: anchor_generator

Anchors are generated from the anchor sizes and aspect ratios passed in: every feature map is tiled with anchors at each spatial location, scaled according to that map's stride relative to the input image. The default configuration is sketched below.
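These are the defaults used by fasterrcnn_resnet50_fpn (AnchorGenerator is importable from torchvision.models.detection.rpn; in newer releases it also lives in torchvision.models.detection.anchor_utils):

from torchvision.models.detection.rpn import AnchorGenerator

# one anchor size per FPN level, three aspect ratios per size,
# so 3 anchors per spatial location on every level
anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)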

3. Combine the anchors and the regressed anchor deltas into proposals

The decode method of det_utils.BoxCoder applies the predicted deltas to the anchors to obtain the proposals.
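For instance (a sketch; det_utils here is torchvision's internal torchvision.models.detection._utils module):

import torch
from torchvision.models.detection import _utils as det_utils

# the RPN constructs its coder with weights=(1.0, 1.0, 1.0, 1.0)
box_coder = det_utils.BoxCoder(weights=(1.0, 1.0, 1.0, 1.0))

anchors = [torch.tensor([[0., 0., 100., 100.]])]   # xyxy
deltas = torch.tensor([[0.1, 0.1, 0.2, 0.2]])      # (t_x, t_y, t_w, t_h)
proposals = box_coder.decode(deltas, anchors)      # decoded xyxy boxes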

4. Filter the proposals: pre_nms_top_n, nms_thresh, post_nms_top_n

  • per feature level, keep the pre_nms_top_n anchors with the highest foreground scores
  • remove proposals that are too small after clipping to the visible image area
  • suppress overlapping proposals with NMS at nms_thresh
  • keep the post_nms_top_n best proposals overall (see the sketch after this list)
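A simplified per-image version of this filtering (the real filter_proposals also performs the per-level top-k selection first; the function name here is ours):

import torch
from torchvision.ops import clip_boxes_to_image, remove_small_boxes, nms

def filter_proposals_sketch(boxes, scores, image_size,
                            nms_thresh=0.7, post_nms_top_n=1000, min_size=1e-3):
    boxes = clip_boxes_to_image(boxes, image_size)   # clip to the image
    keep = remove_small_boxes(boxes, min_size)       # drop tiny boxes
    boxes, scores = boxes[keep], scores[keep]
    keep = nms(boxes, scores, nms_thresh)            # suppress overlaps
    keep = keep[:post_nms_top_n]                     # global top-k
    return boxes[keep], scores[keep]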

5. (training) Encode anchor offsets: fg_iou_thresh, bg_iou_thresh

  • match anchors to ground-truth boxes with det_utils.Matcher (see the example after this list)
  • the encode method of det_utils.BoxCoder yields each anchor's offsets against its matched ground truth
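Roughly (the thresholds shown are the RPN defaults; gt_boxes and anchors are made-up inputs):

import torch
from torchvision.ops import box_iou
from torchvision.models.detection import _utils as det_utils

matcher = det_utils.Matcher(high_threshold=0.7, low_threshold=0.3,
                            allow_low_quality_matches=True)

gt_boxes = torch.tensor([[10., 10., 50., 50.]])
anchors = torch.tensor([[0., 0., 40., 40.], [100., 100., 140., 140.]])

iou = box_iou(gt_boxes, anchors)   # [num_gt, num_anchors]
matched_idxs = matcher(iou)        # index of matched GT; -1 = background, -2 = ignored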

6. (training) Compute the losses: batch_size_per_image, positive_fraction

  • sample a balanced set of foreground and background anchors: BalancedPositiveNegativeSampler (see the sketch after this list)

  • the box regression loss is a smooth L1 loss; the objectness loss is a binary cross-entropy loss
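The sampler with its RPN defaults (in its input, 1 marks a foreground anchor, 0 background, -1 ignored):

import torch
from torchvision.models.detection import _utils as det_utils

sampler = det_utils.BalancedPositiveNegativeSampler(
    batch_size_per_image=256, positive_fraction=0.5)

labels = [torch.tensor([1., 0., 0., -1., 1., 0.])]   # one tensor per image
pos_masks, neg_masks = sampler(labels)               # binary masks of sampled anchors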

RoIHeads

class RoIHeads(torch.nn.Module):
    def __init__(self,
                 box_roi_pool,
                 box_head,
                 box_predictor,
                 # Faster R-CNN training
                 fg_iou_thresh, bg_iou_thresh,
                 batch_size_per_image, positive_fraction,
                 bbox_reg_weights,
                 # Faster R-CNN inference
                 score_thresh,
                 nms_thresh,
                 detections_per_img,
                 ):
        ...

    def forward(self, features, proposals, image_shapes, targets=None):
        # (excerpt from forward)
        if self.training:
            proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
            
        else:
            labels = None
            regression_targets = None
            matched_idxs = None

        box_features = self.box_roi_pool(features, proposals, image_shapes)
        box_features = self.box_head(box_features)
        class_logits, box_regression = self.box_predictor(box_features)

        result = torch.jit.annotate(List[Dict[str, torch.Tensor]], [])
        losses = {}
        if self.training:
            assert labels is not None and regression_targets is not None
            loss_classifier, loss_box_reg = fastrcnn_loss(
                class_logits, box_regression, labels, regression_targets)
            losses = {
                "loss_classifier": loss_classifier,
                "loss_box_reg": loss_box_reg
            }
        else:
            boxes, scores, labels = self.postprocess_detections(class_logits, box_regression, proposals, image_shapes)
            num_images = len(boxes)
            for i in range(num_images):
                result.append(
                    {
                        "boxes": boxes[i],
                        "labels": labels[i],
                        "scores": scores[i],
                    }
                )

Processing flow:

1. (training) Select training samples: fg_iou_thresh, bg_iou_thresh, batch_size_per_image, positive_fraction

  • match proposals to ground-truth boxes
  • sample a balanced mix of foreground and background proposals, as in the RPN
  • encode the offsets of the sampled proposals against their matched ground truth

2. RoI align: box_roi_pool

MultiScaleRoIAlign crops each proposal out of the appropriate feature-map level and resizes it to a fixed spatial size.
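The default used by fasterrcnn_resnet50_fpn (in very old releases featmap_names were ints rather than strings):

from torchvision.ops import MultiScaleRoIAlign

box_roi_pool = MultiScaleRoIAlign(
    featmap_names=['0', '1', '2', '3'],  # which FPN levels to pool from
    output_size=7,                       # every proposal becomes a 7x7 feature
    sampling_ratio=2)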

3. Feature extraction: box_head

The pooled, size-normalized features are flattened and passed through a two-layer MLP.
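The default box_head is TwoMLPHead; with the FPN's 256 output channels and the 7x7 RoI output above, it is built as:

from torchvision.models.detection.faster_rcnn import TwoMLPHead

box_head = TwoMLPHead(in_channels=256 * 7 * 7, representation_size=1024)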

4. Regress per-class scores and box offsets: box_predictor

For every proposal, predict a probability for each class and a box offset for each class.
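The default box_predictor is FastRCNNPredictor, a pair of linear layers on top of the 1024-d box_head features (91 classes for COCO, with background as class 0):

from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

box_predictor = FastRCNNPredictor(in_channels=1024, num_classes=91)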

5. (training) Compute the losses

  • classification loss: cross-entropy
  • box offset loss: smooth L1, computed on foreground proposals only (see the sketch after this list)
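A simplified sketch of fastrcnn_loss (the function name and exact reduction details are paraphrased from torchvision's implementation):

import torch
import torch.nn.functional as F

def fastrcnn_loss_sketch(class_logits, box_regression, labels, regression_targets):
    loss_classifier = F.cross_entropy(class_logits, labels)
    # regression loss only on foreground proposals (label > 0),
    # taken from the box column of the matched class
    pos = torch.where(labels > 0)[0]
    N = class_logits.shape[0]
    box_regression = box_regression.reshape(N, -1, 4)
    loss_box_reg = F.smooth_l1_loss(
        box_regression[pos, labels[pos]],
        regression_targets[pos],
        reduction='sum') / labels.numel()
    return loss_classifier, loss_box_reg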

6. Decode the offsets and post-process the boxes: score_thresh, nms_thresh, detections_per_img

  • decode the offsets into predicted boxes
  • softmax the logits into class scores
  • drop predictions whose score is below score_thresh
  • drop predictions that are too small after clipping to the image
  • run per-class NMS at nms_thresh
  • keep the detections_per_img highest-scoring predictions (see the sketch after this list)
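A simplified per-image version (the real postprocess_detections also expands the boxes per class and drops the background column first; the function name here is ours):

import torch
from torchvision.ops import batched_nms, clip_boxes_to_image, remove_small_boxes

def postprocess_sketch(boxes, scores, labels, image_size,
                       score_thresh=0.05, nms_thresh=0.5, detections_per_img=100):
    boxes = clip_boxes_to_image(boxes, image_size)
    keep = torch.where(scores > score_thresh)[0]           # score filtering
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    keep = remove_small_boxes(boxes, min_size=1e-2)        # drop tiny boxes
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    keep = batched_nms(boxes, scores, labels, nms_thresh)  # per-class NMS
    keep = keep[:detections_per_img]                       # top-k
    return boxes[keep], scores[keep], labels[keep]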

GeneralizedRCNNTransform (transform)

class GeneralizedRCNNTransform(nn.Module):
    """
    Performs input / target transformation before feeding the data to a GeneralizedRCNN
    model.

    The transformations it performs are:
        - input normalization (mean subtraction and std division)
        - input / target resizing to match min_size / max_size

    It returns an ImageList for the inputs, and a List[Dict[Tensor]] for the targets
    """
  • Before feeding the network: normalize the images (subtract mean, divide by std) and resize them (defaults shown below).
  • After the network: resize the predicted boxes back to the original image size.
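The defaults used by fasterrcnn_resnet50_fpn (ImageNet mean/std):

from torchvision.models.detection.transform import GeneralizedRCNNTransform

transform = GeneralizedRCNNTransform(
    min_size=800, max_size=1333,        # resize bounds
    image_mean=[0.485, 0.456, 0.406],
    image_std=[0.229, 0.224, 0.225])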

Miscellaneous

Box encoding

Here x, y, w, h are the box center coordinates and the box width and height, and x, x_a, x^* denote the predicted box, the anchor box, and the ground-truth box respectively (likewise for y, w, h). The encoded offsets are:

t_x = (x - x_a) / w_a        t_y = (y - y_a) / h_a
t_w = log(w / w_a)           t_h = log(h / h_a)

t_x^* = (x^* - x_a) / w_a    t_y^* = (y^* - y_a) / h_a
t_w^* = log(w^* / w_a)       t_h^* = log(h^* / h_a)

Hence:

  • t_x: the horizontal offset of the center, normalized by the anchor width
  • t_y: the vertical offset of the center, normalized by the anchor height
  • t_w: the log of the width ratio
  • t_h: the log of the height ratio
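In torchvision this encoding is implemented by det_utils.BoxCoder (the offsets are additionally multiplied by the coder's weights; the boxes below are made-up inputs):

import torch
from torchvision.models.detection import _utils as det_utils

box_coder = det_utils.BoxCoder(weights=(1.0, 1.0, 1.0, 1.0))

# encode expects xyxy boxes and converts to center/size internally
gt = [torch.tensor([[10., 10., 50., 50.]])]
anchors = [torch.tensor([[0., 0., 40., 40.]])]
offsets = box_coder.encode(gt, anchors)   # the (t_x, t_y, t_w, t_h) above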
