Building Faster-rcnn with Pytorch torchvision (Part 4) ---- ROIHead

叫我西瓜超人

Article link: https://blog.csdn.net/watermelon1123/article/details/99942646

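After the RPN, we have its classification/regression losses and the proposal regions. The next stage processes the proposals further: RoI pooling (RoIAlign) on each proposal, a finer-grained classification of every proposal, and a second round of box regression.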

RoIHeads

select_training_samples

ROIAlign pooling

box_head and box_predictor

fastrcnn_loss

postprocess_detections

Summary

RoIHead

The RPN gives us a certain number of proposals. The next step is to match these proposals with the ground-truth boxes and generate the targets used for training, which is what select_training_samples does.

select_training_samples

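select_training_samples does the following:

Match the proposals generated by the RPN against the ground truth. After matching, proposals whose IoU is above high_threshold become positive proposals, those below low_threshold become negative proposals, and those in between are ignored and do not contribute to the loss. (In torchvision's Faster R-CNN both thresholds default to 0.5, so by default no proposal falls in between.)
Sample positive and negative proposals while bounding both their ratio and their total: by default at most 25% of the samples are positive (positive:negative = 1:3) and at most 512 proposals are sampled per image; if there are too few positives, negatives fill the remaining slots.
Compute the regression targets, i.e. the deltas (dx, dy, dw, dh) between each sampled proposal and its matched ground-truth box (only positive proposals contribute to the box-regression loss).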
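The matching and sampling steps listed above can be sketched in plain Python to see the labeling rules in isolation. This is a minimal sketch, not torchvision's code: iou, assign_labels and subsample are hypothetical helpers working on lists of (x1, y1, x2, y2) tuples, and the defaults (0.5/0.5 thresholds, 512 samples, 25% positives) are assumed to mirror torchvision's Faster R-CNN defaults.

```python
import random


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def assign_labels(proposals, gt_boxes, gt_labels, high_thresh=0.5, low_thresh=0.5):
    """Hypothetical helper: match every proposal to its best-overlapping gt box.

    labels[i] is the gt class when IoU >= high_thresh, 0 (background) when
    IoU < low_thresh and -1 (ignored) in between. Every proposal, matched or
    not, keeps the index of its best gt box, so unmatched ones point at 0 here.
    """
    matched_idxs, labels = [], []
    for p in proposals:
        ious = [iou(p, g) for g in gt_boxes]
        best = max(range(len(gt_boxes)), key=ious.__getitem__)
        if ious[best] >= high_thresh:
            labels.append(gt_labels[best])
        elif ious[best] < low_thresh:
            labels.append(0)
        else:
            labels.append(-1)
        matched_idxs.append(best)
    return matched_idxs, labels


def subsample(labels, batch_size_per_image=512, positive_fraction=0.25, seed=0):
    """Hypothetical helper: sorted indices of sampled positives (>0) and negatives (==0)."""
    rng = random.Random(seed)
    pos = [i for i, l in enumerate(labels) if l > 0]
    neg = [i for i, l in enumerate(labels) if l == 0]
    # at most positive_fraction of the batch may be positive ...
    num_pos = min(len(pos), int(batch_size_per_image * positive_fraction))
    # ... and negatives fill the remaining slots
    num_neg = min(len(neg), batch_size_per_image - num_pos)
    return sorted(rng.sample(pos, num_pos) + rng.sample(neg, num_neg))


gt = [(0, 0, 10, 10)]
props = [(0, 0, 10, 10), (0, 0, 10, 8), (20, 20, 30, 30), (0, 0, 10, 4)]
# IoUs are 1.0, 0.8, 0.0 and 0.4; thresholds 0.7/0.3 leave the last one ignored
print(assign_labels(props, gt, [3], high_thresh=0.7, low_thresh=0.3))
# ([0, 0, 0, 0], [3, 3, 0, -1])
```

With the assumed 0.5/0.5 defaults the 0.4-IoU proposal becomes a negative instead of being ignored, which is why the ignore band is empty by default.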

Code:

    def select_training_samples(self, proposals, targets):
        self.check_targets(targets)
        gt_boxes = [t["boxes"] for t in targets]
        gt_labels = [t["labels"] for t in targets]

        # append ground-truth bboxes to proposals
        proposals = self.add_gt_proposals(proposals, gt_boxes)

        # Same matching as in the RPN: compute the IoU between each proposal and the ground truth
        # matched_idxs: index of the matched ground-truth box (unmatched proposals default to 0)
        # labels: class of each proposal; background is 0 and ignored proposals are -1
        matched_idxs, labels = self.assign_targets_to_proposals(proposals, gt_boxes, gt_labels)
        # subsample picks the positive and negative proposals,
        # bounding the positive ratio and the total number of proposals used for training
        sampled_inds = self.subsample(labels)
        matched_gt_boxes = []
        num_images = len(proposals)
        # keep only the sampled proposals (and their labels / matches) for each image
        for img_id in range(num_images):
            img_sampled_inds = sampled_inds[img_id]
            proposals[img_id] = proposals[img_id][img_sampled_inds]
            labels[img_id] = labels[img_id][img_sampled_inds]
            matched_idxs[img_id] = matched_idxs[img_id][img_sampled_inds]
            matched_gt_boxes.append(gt_boxes[img_id][matched_idxs[img_id]])
        # compute the deltas (dx, dy, dw, dh) between the matched ground-truth boxes and the proposals
        regression_targets = self.box_coder.encode(matched_gt_boxes, proposals)
        return proposals, matched_idxs, labels, regression_targets
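To see what self.box_coder.encode produces, here is a plain-Python sketch of the standard Faster R-CNN box parameterization for a single box. It assumes torchvision's RoI-head defaults: weights (10, 10, 5, 5), widths computed as x2 - x1, and dw/dh clamped at log(1000/16) when decoding. The function names encode, decode and _center_size are illustrative, not torchvision's API, which works on batched tensors.

```python
import math

# Assumed torchvision RoI-head box-coder weights (wx, wy, ww, wh) and dw/dh clamp
WEIGHTS = (10.0, 10.0, 5.0, 5.0)
BBOX_XFORM_CLIP = math.log(1000.0 / 16)


def _center_size(box):
    """(x1, y1, x2, y2) -> (cx, cy, w, h)."""
    w, h = box[2] - box[0], box[3] - box[1]
    return box[0] + 0.5 * w, box[1] + 0.5 * h, w, h


def encode(gt, proposal, weights=WEIGHTS):
    """Regression target (dx, dy, dw, dh) that moves `proposal` onto `gt`."""
    wx, wy, ww, wh = weights
    px, py, pw, ph = _center_size(proposal)
    gx, gy, gw, gh = _center_size(gt)
    return (wx * (gx - px) / pw,
            wy * (gy - py) / ph,
            ww * math.log(gw / pw),
            wh * math.log(gh / ph))


def decode(deltas, proposal, weights=WEIGHTS):
    """Apply (dx, dy, dw, dh) to `proposal` and return the predicted (x1, y1, x2, y2)."""
    wx, wy, ww, wh = weights
    px, py, pw, ph = _center_size(proposal)
    dx, dy = deltas[0] / wx, deltas[1] / wy
    # clamp dw, dh so that exp() cannot blow up on a bad prediction
    dw = min(deltas[2] / ww, BBOX_XFORM_CLIP)
    dh = min(deltas[3] / wh, BBOX_XFORM_CLIP)
    cx, cy = px + dx * pw, py + dy * ph
    w, h = pw * math.exp(dw), ph * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


proposal, gt = (0.0, 0.0, 10.0, 10.0), (2.0, 2.0, 12.0, 12.0)
print(encode(gt, proposal))  # (2.0, 2.0, 0.0, 0.0)
print(decode(encode(gt, proposal), proposal))  # recovers gt up to float rounding
```

The center offsets are normalized by the proposal size and the size change is taken in log space, so the targets do not depend on the absolute scale of the box; postprocess_detections later uses the inverse (decode) on the network's predictions.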

ROIAlign pooling

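Once the proposals have been selected, they go through RoIAlign. The features fed into the RoI head are full-size feature maps, so for each proposal the corresponding region has to be cropped from the feature map and resampled to a fixed size; this is exactly what RoIPooling and RoIAlign do. torchvision's Faster R-CNN uses the more accurate RoIAlign, implemented as MultiScaleRoIAlign in torchvision/ops/poolers.py.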

Parameters:

featmap_names: the forward input of MultiScaleRoIAlign is an OrderedDict of feature maps, and featmap_names specifies which of those feature maps are used for RoIAlign.

output_size: the spatial size of the features output by RoIAlign.

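sampling_ratio: the number of sampling points per output bin used for bilinear interpolation. If > 0, exactly sampling_ratio × sampling_ratio points are sampled in each bin; if <= 0, the number adapts to the RoI size.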
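To make output_size and sampling_ratio concrete, here is a simplified single-channel RoIAlign in plain Python. It is a sketch modeled on the aligned=False behaviour of torchvision's roi_align (no half-pixel offset), not the real C++/CUDA kernel, which runs on batched multi-channel tensors; bilinear, roi_align_single and the spatial_scale default are illustrative assumptions. Each output bin averages sampling_ratio × sampling_ratio bilinearly interpolated points, and sampling_ratio <= 0 uses ceil(roi size / output_size) points per axis.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly sample the 2-D list `feat` at (y, x); points far outside give 0."""
    h, w = len(feat), len(feat[0])
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0
    y, x = max(y, 0.0), max(x, 0.0)
    y0, x0 = int(y), int(x)
    if y0 >= h - 1:
        y0 = y1 = h - 1
        y = float(y0)
    else:
        y1 = y0 + 1
    if x0 >= w - 1:
        x0 = x1 = w - 1
        x = float(x0)
    else:
        x1 = x0 + 1
    ly, lx = y - y0, x - x0
    hy, hx = 1.0 - ly, 1.0 - lx
    return (hy * hx * feat[y0][x0] + hy * lx * feat[y0][x1]
            + ly * hx * feat[y1][x0] + ly * lx * feat[y1][x1])


def roi_align_single(feat, box, output_size, sampling_ratio, spatial_scale=1.0):
    """Pool box (x1, y1, x2, y2, image coords) from `feat` into output_size x output_size."""
    # map image coordinates onto the feature map (e.g. spatial_scale = 1/8 for stride 8)
    x1, y1, x2, y2 = (c * spatial_scale for c in box)
    roi_w, roi_h = max(x2 - x1, 1.0), max(y2 - y1, 1.0)
    bin_w, bin_h = roi_w / output_size, roi_h / output_size
    # sampling points per bin along each axis; adaptive when sampling_ratio <= 0
    gw = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_w / output_size)
    gh = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_h / output_size)
    out = []
    for ph in range(output_size):
        row = []
        for pw in range(output_size):
            total = 0.0
            for iy in range(gh):
                y = y1 + ph * bin_h + (iy + 0.5) * bin_h / gh
                for ix in range(gw):
                    x = x1 + pw * bin_w + (ix + 0.5) * bin_w / gw
                    total += bilinear(feat, y, x)
            row.append(total / (gh * gw))  # average of the sampled points
        out.append(row)
    return out


# feature map whose value equals the x coordinate
feat = [[float(x) for x in range(8)] for _ in range(8)]
print(roi_align_single(feat, (1, 1, 5, 5), output_size=2, sampling_ratio=2))
# [[2.0, 4.0], [2.0, 4.0]]
```

Unlike RoIPooling, no coordinate is rounded to the pixel grid: on a feature map that varies linearly, each bin returns exactly the value at its center, which is where RoIAlign's extra accuracy comes from.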

Here is the official usage example of MultiScaleRoIAlign (comments translated and corrected):

    Examples::
        >>> import torch, torchvision
        >>> from collections import OrderedDict
        >>> # create the RoIAlign module; ['feat1', 'feat3'] selects the feature maps used for RoIAlign
        >>> m = torchvision.ops.MultiScaleRoIAlign(['feat1', 'feat3'], 3, 2)
        >>> # i is the OrderedDict of input feature maps fed into RoIAlign
        >>> i = OrderedDict()
        >>> i['feat1'] = torch.rand(1, 5, 64, 64)
        >>> i['feat2'] = torch.rand(1, 5, 32, 32)  # this feature won't be used in the pooling
        >>> i['feat3'] = torch.rand(1, 5, 16, 16)
        >>> # create 6 random boxes as the forward input
        >>> boxes = torch.rand(6, 4) * 256; boxes[:, 2:] += boxes[:, :2]
        >>> image_sizes = [(512, 512)]  # image_sizes: size of the input image
        >>> output = m(i, [boxes], image_sizes)
        >>> # after RoIAlign, each of the 6 boxes yields a 5-channel 3×3 feature cropped from the feature maps
        >>> print(output.shape)
        torch.Size([6, 5, 3, 3])

box_head and box_predictor

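box_head and box_predictor take the features produced by RoIAlign and pass them through fully connected layers to obtain the classification and regression results.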

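    # crop and pool a fixed-size feature for every proposal (MultiScaleRoIAlign)
    box_features = self.box_roi_pool(features, proposals, image_shapes)
    # box_head: flatten + two fully connected layers (TwoMLPHead by default)
    box_features = self.box_head(box_features)
    # box_predictor: per-class scores and per-class box deltas (FastRCNNPredictor by default)
    class_logits, box_regression = self.box_predictor(box_features)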
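A minimal PyTorch sketch of what box_head and box_predictor compute. TwoFCHead and BoxPredictor are hypothetical names written to mirror torchvision's TwoMLPHead and FastRCNNPredictor; the sizes (256 channels, 7×7 pooled output, 1024 hidden units, 91 classes including background) are assumptions chosen to resemble torchvision's usual defaults.

```python
import torch
from torch import nn
import torch.nn.functional as F


class TwoFCHead(nn.Module):
    """Hypothetical stand-in for torchvision's TwoMLPHead: flatten + fc6 + fc7."""

    def __init__(self, in_channels, representation_size):
        super().__init__()
        self.fc6 = nn.Linear(in_channels, representation_size)
        self.fc7 = nn.Linear(representation_size, representation_size)

    def forward(self, x):
        x = x.flatten(start_dim=1)  # (N, C, H, W) -> (N, C*H*W)
        x = F.relu(self.fc6(x))
        return F.relu(self.fc7(x))


class BoxPredictor(nn.Module):
    """Hypothetical stand-in for torchvision's FastRCNNPredictor."""

    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.cls_score = nn.Linear(in_channels, num_classes)      # one logit per class
        self.bbox_pred = nn.Linear(in_channels, num_classes * 4)  # (dx, dy, dw, dh) per class

    def forward(self, x):
        return self.cls_score(x), self.bbox_pred(x)


# 6 pooled RoIs with 256 channels and a 7x7 RoIAlign output
box_features = torch.rand(6, 256, 7, 7)
box_head = TwoFCHead(256 * 7 * 7, 1024)
box_predictor = BoxPredictor(1024, 91)
class_logits, box_regression = box_predictor(box_head(box_features))
print(class_logits.shape, box_regression.shape)
# torch.Size([6, 91]) torch.Size([6, 364])
```

The predictor regresses a separate box for every class, which is why box_regression has num_classes × 4 columns and why postprocess_detections later keeps one box per (proposal, class) pair.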

fastrcnn_loss

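During training the loss has to be computed. As in the RPN, it has a classification part and a regression part: classification uses (multi-class) cross-entropy and box regression uses Smooth L1 loss.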
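A sketch of this loss in PyTorch, modeled on torchvision's fastrcnn_loss as I understand it: cross-entropy over all sampled proposals, Smooth L1 (beta = 1/9, summed) only on positive proposals using the regression output of each proposal's ground-truth class, normalized by the total number of sampled proposals. fastrcnn_loss_sketch is a hypothetical name, and unlike torchvision it takes already-concatenated tensors instead of per-image lists.

```python
import torch
import torch.nn.functional as F


def fastrcnn_loss_sketch(class_logits, box_regression, labels, regression_targets):
    """class_logits (N, C), box_regression (N, C*4), labels (N,), regression_targets (N, 4)."""
    # classification: every sampled proposal, background (label 0) included
    classification_loss = F.cross_entropy(class_logits, labels)

    # regression: only positive proposals, using the box predicted for their gt class
    pos = torch.where(labels > 0)[0]
    labels_pos = labels[pos]
    N = class_logits.shape[0]
    box_regression = box_regression.reshape(N, -1, 4)  # (N, C, 4)
    box_loss = F.smooth_l1_loss(
        box_regression[pos, labels_pos],
        regression_targets[pos],
        beta=1 / 9,
        reduction="sum",
    )
    # normalize by the number of sampled proposals, not just the positives
    box_loss = box_loss / labels.numel()
    return classification_loss, box_loss


labels = torch.tensor([1, 0, 2, 0])
targets = torch.randn(4, 4)
reg = torch.zeros(4, 3, 4)
reg[0, 1] = targets[0]  # perfect prediction for the class-1 positive
reg[2, 2] = targets[2]  # perfect prediction for the class-2 positive
cls_loss, box_loss = fastrcnn_loss_sketch(torch.randn(4, 3), reg.reshape(4, 12), labels, targets)
print(cls_loss.item(), box_loss.item())  # box_loss is 0.0 here
```

Negatives still have regression outputs for every class, but they never reach the Smooth L1 term; they only teach the classifier what background looks like.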

postprocess_detections

At inference time, the results must be post-processed. This is done in postprocess_detections in torchvision/models/detection/roi_heads.py:

    def postprocess_detections(self, class_logits, box_regression, proposals, image_shapes):
        device = class_logits.device
        num_classes = class_logits.shape[-1]

        # proposals are the proposal regions produced by the RPN,
        # passed in as a list with one element per image in the batch
        # boxes_in_image: P_num × 4, where P_num is the number of proposals in that image
        # boxes_per_image holds the number of proposals in each image
        boxes_per_image = [len(boxes_in_image) for boxes_in_image in proposals]
        # decode the network's box_regression against the proposals to get the final bbox coordinates
        # (one box per class: total proposals × num_classes × 4)
        pred_boxes = self.box_coder.decode(box_regression, proposals)

        # pred_scores: per-class classification scores
        pred_scores = F.softmax(class_logits, -1)

        # split boxes and scores per image
        pred_boxes = pred_boxes.split(boxes_per_image, 0)
        pred_scores = pred_scores.split(boxes_per_image, 0)

        all_boxes = []
        all_scores = []
        all_labels = []
        for boxes, scores, image_shape in zip(pred_boxes, pred_scores, image_shapes):
            boxes = box_ops.clip_boxes_to_image(boxes, image_shape)

            # labels size: num_proposals × num_classes
            # each row of labels holds the class ids 0 .. num_classes - 1
            labels = torch.arange(num_classes, device=device)
            labels = labels.view(1, -1).expand_as(scores)

            # id 0 is the background class; exclude it at inference time
            boxes = boxes[:, 1:]
            scores = scores[:, 1:]
            labels = labels[:, 1:]

            # reshape boxes, scores and labels to N×4, N and N,
            # where N = num_proposals × (num_classes - 1),
            # so that every (proposal, class) pair is handled as a separate detection
            boxes = boxes.reshape(-1, 4)
            scores = scores.flatten()
            labels = labels.flatten()

            # keep only results whose score exceeds the score threshold
            inds = torch.nonzero(scores > self.score_thresh).squeeze(1)
            boxes, scores, labels = boxes[inds], scores[inds], labels[inds]

            # remove boxes whose width or height is smaller than 0.01
            keep = box_ops.remove_small_boxes(boxes, min_size=1e-2)
            boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

            # NMS on the boxes; since labels are passed in, NMS is done separately for each class
            keep = box_ops.batched_nms(boxes, scores, labels, self.nms_thresh)
            # keep only the top detections_per_img results (100 by default)
            keep = keep[:self.detections_per_img]
            boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

            all_boxes.append(boxes)
            all_scores.append(scores)
            all_labels.append(labels)

        return all_boxes, all_scores, all_labels
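The per-class behaviour of batched_nms can be illustrated in plain Python. As far as I know, torchvision implements it (for large inputs) with a coordinate-offset trick: every box is shifted by label × (max coordinate + 1), so boxes of different classes can never overlap and a single ordinary NMS pass effectively runs per class. The functions below (iou, nms, batched_nms) are a hypothetical plain-Python sketch of that idea, not torchvision's implementation.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_thresh):
    """Greedy NMS: indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # a box survives only if it overlaps no already-kept box by more than iou_thresh
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


def batched_nms(boxes, scores, labels, iou_thresh):
    """Per-class NMS: shift each class into its own region, then run plain NMS once."""
    if not boxes:
        return []
    max_coord = max(max(b) for b in boxes)
    shifted = [tuple(c + lab * (max_coord + 1) for c in b) for b, lab in zip(boxes, labels)]
    return nms(shifted, scores, iou_thresh)


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7]
# box 1 overlaps box 0 (IoU 0.81) in the same class and is suppressed;
# box 2 is identical to box 0 but belongs to another class, so it survives
print(batched_nms(boxes, scores, [1, 1, 2], 0.5))  # [0, 2]
```

Because the kept indices come out sorted by score, truncating them to detections_per_img (as the code above does with keep[:self.detections_per_img]) keeps the highest-scoring detections.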