Mask R-CNN (maskrcnn-benchmark) Code Understanding

References:
1. https://blog.csdn.net/leijieZhang/article/details/91431846?spm=1001.2014.3001.5501 (CSDN blogger "leijieZhang")
2. https://blog.csdn.net/leijieZhang/article/details/90903462
3. https://blog.csdn.net/xiangxianghehe/article/details/88793670
4. https://blog.csdn.net/ChuiGeDaQiQiu/article/details/83868512
5. https://blog.csdn.net/lt1103725556/article/details/115406360 (dataloader)

Prepare our own dataset

https://blog.csdn.net/ChuiGeDaQiQiu/article/details/83868512
Prepare our own dataset by following the data format used there.

https://blog.csdn.net/azhegps/article/details/100066131 (copy all contents of one folder into another)

Folder 1: test1/

Folder 2: test2/

Goal: copy all files and directories under test1/ into test2/

Correct command:

cp -rf test1/. test2/

The two folders can also live under different directories:

cp -rf /media/mmlab/dataset/mengya/Dataset_Reproduce_OtherPapers/Scene-Graph-Benchmark_Dataset/vg/VG_100K/. /media/mmlab/data/mengya/causal/Scene-Graph-Benchmark/datasets/vg/VG_100K/

Note that test1/ is followed by a dot.

Prepare the path for our own dataset

https://blog.csdn.net/xiangxianghehe/article/details/88793670
1. Copy the dataset into datasets/ under the repository root (at the same level as demo/), e.g.

maskrcnn-benchmark/datasets/jinnan/jinnan2_round1_train_20190305

2. Modify paths_catalog.py
The file is maskrcnn-benchmark/maskrcnn_benchmark/config/paths_catalog.py

2.1 Add the paths you need to the DATASETS dict in paths_catalog.py, e.g.

"jinnan_train": {
"img_dir": "jinnan2_round1_train_20190305",
"ann_file": "jinnan2_round1_train_20190305/train_no_poly.json"
},

2.2 Modify the static method get(name) in paths_catalog.py.
Add an if/else branch with the entries for your dataset, e.g.

elif "jinnan" in name:  # name对应yaml文件传过来的数据集名字
    data_dir = DatasetCatalog.DATA_DIR
    attrs = DatasetCatalog.DATASETS[name]
    args = dict(
        root=os.path.join(data_dir, attrs["img_dir"]),  # img_dir就是2.1 步骤里面的内容
        ann_file=os.path.join(data_dir, attrs["ann_file"]),  # ann_file就是2.1 步骤里面的内容
    )
    return dict(
        factory="MyDataset",  # 这个与MyDataset对应,是我们自己定义的数据类的名字
        args=args,
    )

Explanation of the arguments above (mainly MyDataset):

1) MyDataset is the class you write yourself; its __getitem__ returns image, boxlist, idx. See the official GitHub README for a reference implementation (it is straightforward).

2) Say the MyDataset class is implemented and the .py file is named jinnan.py.

3) Put that file under maskrcnn-benchmark/maskrcnn_benchmark/data/datasets.

4) Then edit the __init__.py in that directory; the fourth import line and the last element of __all__ are the ones we add:

from .coco import COCODataset
from .voc import PascalVOCDataset
from .concat_dataset import ConcatDataset
from .jinnan import MyDataset

__all__ = ["COCODataset", "ConcatDataset", "PascalVOCDataset", "MyDataset"]

5) Note that MyDataset must implement __len__, __getitem__, get_img_info and __init__. __init__ receives the attrs prepared in step 2.1; a reference signature is

def __init__(self, ann_file=None, root=None, remove_images_without_annotations=None, transforms=None)

The meaning of these parameters can be found in maskrcnn-benchmark/maskrcnn_benchmark/data/build.py.
3. Modify the yaml file
The path is configs/xxx.yaml

The top-level configs/xxx.yaml file has a DATASETS section; this data-loading part is what gets modified:

MODEL:
  MASK_ON: False
DATASETS:
  TRAIN: ("jinnan_train", "jinnan_val")
  TEST: ("jinnan_test",)

All three dataset names above are set by us; only jinnan_train is actually used here. First of all, make sure MASK_ON is set to False.

4. My own (somewhat messy) reference implementation of the dataset loading:
maskrcnn-benchmark/maskrcnn_benchmark/data/datasets/jinnan.py

from maskrcnn_benchmark.structures.bounding_box import BoxList
from PIL import Image
import os
import json
import torch

class MyDataset(object):
    def __init__(self,ann_file=None, root=None, remove_images_without_annotations=None, transforms=None):
        # as you would do normally

        self.transforms = transforms

        self.train_path = root
        with open(ann_file, 'r') as f:
            self.data = json.load(f)

        self.idxs = list(range(len(self.data['images'])))  # store the image ids in a list for convenient random access
        self.bbox_label = {}
        for anno in self.data['annotations']:
            bbox = anno['bbox']
            bbox[2] += bbox[0]  # convert COCO-style (x, y, w, h) boxes to (x1, y1, x2, y2)
            bbox[3] += bbox[1]
            cate = anno['category_id']
            image_id = anno['image_id']
            if not image_id in self.bbox_label:
                self.bbox_label[image_id] = [[bbox], [cate]]
            else:
                self.bbox_label[image_id][0].append(bbox)
                self.bbox_label[image_id][1].append(cate)

    def __getitem__(self, idx):
        # load the image as a PIL Image
        idx = self.idxs[idx % len(self.data['images'])]
        if idx not in self.bbox_label:  # 210, 262, 690, 855 have no bbox
            idx += 1
        path = self.data['images'][idx]['file_name']

        folder = 'restricted' if idx < 981 else 'normal'

        image = Image.open(os.path.join(self.train_path, folder, path)).convert('RGB')
        # load the bounding boxes as a list of list of boxes
        # in this case, for illustrative purposes, we use
        # x1, y1, x2, y2 order.
        # boxes = [[0, 0, 10, 10], [10, 20, 50, 50]]
        boxes = self.bbox_label[idx][0]
        category = self.bbox_label[idx][-1]

        # and labels
        labels = torch.tensor(category)

        # create a BoxList from the boxes
        boxlist = BoxList(boxes, image.size, mode="xyxy")
        # add the labels to the boxlist
        boxlist.add_field("labels", labels)

        if self.transforms:
            image, boxlist = self.transforms(image, boxlist)

        # return the image, the boxlist and the idx in your dataset
        return image, boxlist, idx
    def __len__(self):
        return len(self.data['images'])

    def get_img_info(self, idx):
        idx = self.idxs[idx % len(self.data['images'])]
        height = self.data['images'][idx]['height']
        width = self.data['images'][idx]['width']
        # get img_height and img_width. This is used if
        # we want to split the batches according to the aspect ratio
        # of the image, as it can be more efficient than loading the
        # image from disk
        return {"height": height, "width": width}

Adjust the following two parameters in the .yaml file according to your own dataset (a hedged example follows below):
MODEL.ROI_BOX_HEAD.NUM_CLASSES
MODEL.ROI_RELATION_HEAD.NUM_CLASSES
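
For example, assuming a dataset with 150 object classes and 50 relation classes (placeholder numbers, not taken from these notes), the counts normally include the background class:

MODEL:
  ROI_BOX_HEAD:
    NUM_CLASSES: 151        # 150 object classes + 1 background (placeholder value)
  ROI_RELATION_HEAD:
    NUM_CLASSES: 51         # 50 relation classes + 1 background (placeholder value)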

From the GitHub README
This implementation adds support for COCO-style datasets. But adding support for training on a new dataset can be done as follows:

from maskrcnn_benchmark.structures.bounding_box import BoxList

class MyDataset(object):
    def __init__(self, ...):
        # as you would do normally

    def __getitem__(self, idx):
        # load the image as a PIL Image
        image = ...

        # load the bounding boxes as a list of list of boxes
        # in this case, for illustrative purposes, we use
        # x1, y1, x2, y2 order.
        boxes = [[0, 0, 10, 10], [10, 20, 50, 50]]
        # and labels
        labels = torch.tensor([10, 20])

        # create a BoxList from the boxes
        boxlist = BoxList(boxes, image.size, mode="xyxy")
        # add the labels to the boxlist
        boxlist.add_field("labels", labels)

        if self.transforms:
            image, boxlist = self.transforms(image, boxlist)

        # return the image, the boxlist and the idx in your dataset
        return image, boxlist, idx

    def get_img_info(self, idx):
        # get img_height and img_width. This is used if
        # we want to split the batches according to the aspect ratio
        # of the image, as it can be more efficient than loading the
        # image from disk
        return {"height": img_height, "width": img_width}

Dataloader

https://blog.csdn.net/lt1103725556/article/details/115406360

Again, start from train_net.py.
from maskrcnn_benchmark.data import make_data_loader: to see what this imports, first look at __init__.py in the data directory.
__init__.py
maskrcnn_benchmark/data/build.py, def make_data_loader
The last two relevant calls are build_transforms and build_dataset.

  1. from .transforms import build_transforms
     See def build_transforms under the data/transforms folder.
     build_transforms adds the various transforms for the dataloader according to the boolean flags in cfg.
     build_transforms returns a T.Compose object that bundles the image-processing operations.
  2. datasets = build_dataset(dataset_list, transforms, DatasetCatalog, is_train)
     dataset_list is a list of strings containing the names of the datasets used for training.
     build_dataset returns a single dataset object: it merges the datasets in the list into one and applies the transforms (a rough sketch follows below).
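
A minimal sketch of what build_dataset does, based on the description above (the real code in maskrcnn_benchmark/data/build.py has more logic, e.g. filtering images without annotations; the helper below is only illustrative):

# Illustrative sketch only; the real implementation lives in maskrcnn_benchmark/data/build.py.
from maskrcnn_benchmark.data import datasets as D

def build_dataset(dataset_list, transforms, dataset_catalog, is_train=True):
    dsets = []
    for dataset_name in dataset_list:
        data = dataset_catalog.get(dataset_name)      # looks up paths_catalog.py (step 2.2 above)
        factory = getattr(D, data["factory"])         # e.g. the MyDataset class exported in __init__.py
        args = data["args"]
        args["transforms"] = transforms               # the T.Compose object from build_transforms
        dsets.append(factory(**args))
    if not is_train:
        return dsets                                  # keep test datasets separate
    if len(dsets) > 1:
        return [D.ConcatDataset(dsets)]               # merge the training datasets into one
    return dsets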

resnet.py

https://blog.csdn.net/leijieZhang/article/details/90730922
The following code builds the different ResNet variants by controlling how many blocks each stage has:

# -----------------------------------------------------------------------------
# Standard ResNet models
# -----------------------------------------------------------------------------
# ResNet-50 (including all stages)
# ResNet has 5 stages; the first stage (the stem) is identical across variants, so the indices below start from the second stage. block_count is the number of residual blocks in that stage.
ResNet50StagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, False), (2, 4, False), (3, 6, False), (4, 3, True))
)
# ResNet-50 up to stage 4 (excludes stage 5)
ResNet50StagesTo4 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, False), (2, 4, False), (3, 6, True))
)
# ResNet-101 (including all stages)
ResNet101StagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, False), (2, 4, False), (3, 23, False), (4, 3, True))
)
# ResNet-101 up to stage 4 (excludes stage 5)
ResNet101StagesTo4 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, False), (2, 4, False), (3, 23, True))
)
# ResNet-50-FPN (including all stages)
ResNet50FPNStagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, True), (2, 4, True), (3, 6, True), (4, 3, True))
)
# ResNet-101-FPN (including all stages)
ResNet101FPNStagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, True), (2, 4, True), (3, 23, True), (4, 3, True))
)
# ResNet-152-FPN (including all stages)
ResNet152FPNStagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, True), (2, 8, True), (3, 36, True), (4, 3, True))
)

Based on the different stage combinations above, maskrcnn-benchmark can assemble different backbones:

def _make_stage(
    transformation_module,
    in_channels,
    bottleneck_channels,
    out_channels,
    block_count,
    num_groups,
    stride_in_1x1,
    first_stride,
    dilation=1,
    dcn_config={}
):
    blocks = []
    stride = first_stride
    # build the residual blocks of this stage according to the configuration
    for _ in range(block_count):
        blocks.append(
            transformation_module(
                in_channels,
                bottleneck_channels,
                out_channels,
                num_groups,
                stride_in_1x1,
                stride,
                dilation=dilation,
                dcn_config=dcn_config
            )
        )
        stride = 1
        in_channels = out_channels
    return nn.Sequential(*blocks)

These different backbone specifications are then collected into a single registry so they can be looked up by name:

_STAGE_SPECS = Registry({
    "R-50-C4": ResNet50StagesTo4,
    "R-50-C5": ResNet50StagesTo5,
    "R-101-C4": ResNet101StagesTo4,
    "R-101-C5": ResNet101StagesTo5,
    "R-50-FPN": ResNet50FPNStagesTo5,
    "R-50-FPN-RETINANET": ResNet50FPNStagesTo5,
    "R-101-FPN": ResNet101FPNStagesTo5,
    "R-101-FPN-RETINANET": ResNet101FPNStagesTo5,
    "R-152-FPN": ResNet152FPNStagesTo5,
})

Registry.py

https://blog.csdn.net/leijieZhang/article/details/90747741

The registry mainly serves as a model dictionary that stores the different network structures that get built. It is defined in two places: the Registry in utils and the registry module in modeling. The former defines the dictionary data type; the latter is a collection of instances of that type, providing separate model dictionaries for the different kinds of network structures (a simplified sketch follows below).
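
A simplified sketch of the idea (the real class lives in maskrcnn_benchmark/utils/registry.py and additionally asserts that a name is not registered twice):

class Registry(dict):
    """A dict that also works as a decorator factory, mapping names to builder functions."""

    def register(self, module_name, module=None):
        # used as a plain function call: registry.register("name", fn)
        if module is not None:
            self[module_name] = module
            return module
        # used as a decorator: @registry.register("name")
        def register_fn(fn):
            self[module_name] = fn
            return fn
        return register_fn

# modeling/registry.py then simply instantiates one Registry per kind of component, e.g.
BACKBONES = Registry()
ROI_BOX_PREDICTOR = Registry()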

backbone.py

https://blog.csdn.net/leijieZhang/article/details/90748788

backbone.py defines the various backbone structures and uses the Registry decorator to register the functions that build them, so a backbone can be selected by name. It is worth noting that backbone.py assembles the modules into backbones with different capabilities, providing a suitable feature-extraction network for box prediction and the other heads.

The stages of ResNet from the second one onward matter most here: the first stage is only a coarse feature extraction of the raw image, while from the second stage onward every stage halves the spatial size of the feature map and doubles the input/output channel count, the classic trade of spatial resolution for channels. A hedged sketch of how backbone.py registers and selects these builders follows.
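
A hedged sketch of how backbone.py uses the registry (trimmed from the real builders, which also handle the FPN variants; the channel/stride numbers in the comment are the standard ResNet-50 values):

from collections import OrderedDict
from torch import nn
from maskrcnn_benchmark.modeling import registry
from maskrcnn_benchmark.modeling.backbone import resnet

@registry.BACKBONES.register("R-50-C4")
def build_resnet_backbone(cfg):
    # plain ResNet body; for R-50 the stage outputs C2..C5 have strides 4/8/16/32
    # and channels 256/512/1024/2048 (each stage halves the map and doubles the channels)
    body = resnet.ResNet(cfg)
    model = nn.Sequential(OrderedDict([("body", body)]))
    model.out_channels = cfg.MODEL.RESNETS.BACKBONE_OUT_CHANNELS
    return model

def build_backbone(cfg):
    # cfg.MODEL.BACKBONE.CONV_BODY (e.g. "R-50-C4") selects the registered builder
    return registry.BACKBONES[cfg.MODEL.BACKBONE.CONV_BODY](cfg)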

fpn.py

https://blog.csdn.net/leijieZhang/article/details/90749819
FPN is mainly used for multi-level feature extraction: detection is performed on feature maps of several scales, exploiting the fact that different feature levels respond to objects of different sizes, which makes all of them useful for detection. A minimal sketch of the top-down pathway follows.
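
A minimal, self-contained sketch of the FPN top-down pathway (1x1 lateral convs, nearest-neighbour upsampling, 3x3 output convs); the real fpn.py additionally supports an extra top block (e.g. a max-pool P6 level) and builds its convs through make_layers.py:

import torch.nn.functional as F
from torch import nn

class TinyFPN(nn.Module):
    def __init__(self, in_channels_list, out_channels):
        super().__init__()
        # 1x1 lateral convs bring every backbone level to the same channel count
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels_list])
        # 3x3 convs smooth the merged feature maps
        self.output = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels_list])

    def forward(self, feats):              # feats: [C2, C3, C4, C5], low to high level
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        results = [None] * len(feats)
        last = laterals[-1]
        results[-1] = self.output[-1](last)
        for i in range(len(feats) - 2, -1, -1):
            # upsample the higher-level map and add it to the lateral features
            top_down = F.interpolate(last, size=laterals[i].shape[-2:], mode="nearest")
            last = laterals[i] + top_down
            results[i] = self.output[i](last)
        return results                     # [P2, P3, P4, P5]

# usage with the ResNet-50 stage channels: fpn = TinyFPN([256, 512, 1024, 2048], 256)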

image_list.py

https://blog.csdn.net/leijieZhang/article/details/90898439
image_list.py mainly defines the data type used to hold a list of images, plus a helper that converts other image-batch representations into the ImageList type that maskrcnn-benchmark expects. An ImageList holds a single padded tensor containing all the images together with a list of the original image sizes; its code is:

class ImageList(object):
    """
    Structure that holds a list of images (of possibly
    varying sizes) as a single tensor.
    This works by padding the images to the same size,
    and storing in a field the original sizes of each image
    用于保存图片(大小可以不同)列表并使之成为一个单一张量的数据结构,
    通过扩充图像到相同的大小来保存原图像的大小
    """
 
    def __init__(self, tensors, image_sizes):
        """
        Arguments:
            tensors (tensor)
            image_sizes (list[tuple[int, int]])
        """
        # two attributes: the padded image tensor and the original per-image sizes
        self.tensors = tensors
        self.image_sizes = image_sizes
 
    # cast/move the underlying tensor, e.g. to a device or dtype, keeping the sizes
    def to(self, *args, **kwargs):
        cast_tensor = self.tensors.to(*args, **kwargs)
        return ImageList(cast_tensor, self.image_sizes)

The file also provides a helper that converts other representations of an image batch into ImageList. The input can be an ImageList, a torch.Tensor, or an iterable of tensors, but not a numpy array. When the input is an iterable of tensors, the images are zero-padded to a common size. Its code is:

def to_image_list(tensors, size_divisible=0):
    """
    tensors can be an ImageList, a torch.Tensor or
    an iterable of Tensors. It can't be a numpy array.
    When tensors is an iterable of Tensors, it pads
    the Tensors with zeros so that they have the same
    shape
    参数tensors可以是一个ImageList,一个torch.Tensor或者一个可以迭代的张量
    但是不能是一个numpy数列。当tensors是一个可以迭代的张量时,它将通过填充0将图片
    扩充成同样的大小
    """
    # 如果是torch里的张亮类型,图片大小一样
    if isinstance(tensors, torch.Tensor) and size_divisible > 0:
        tensors = [tensors]
 
    # already an ImageList: return it as-is
    if isinstance(tensors, ImageList):
        return tensors

    elif isinstance(tensors, torch.Tensor):
        # single tensor shape can be inferred
        # a 3-D tensor means a single image, so add a batch dimension
        if tensors.dim() == 3:
            tensors = tensors[None]
        # otherwise expect a batched 4-D tensor
        assert tensors.dim() == 4
        # the last two dimensions are the image height and width
        image_sizes = [tensor.shape[-2:] for tensor in tensors]
        # wrap the tensor together with the per-image sizes
        return ImageList(tensors, image_sizes)
    # an iterable of tensors (images of possibly different sizes)
    elif isinstance(tensors, (tuple, list)):
        # find the maximum size over all images, per dimension
        max_size = tuple(max(s) for s in zip(*[img.shape for img in tensors]))
 
        # TODO Ideally, just remove this and let the model handle arbitrary input sizes.
        # size_divisible pads the common size up to a multiple of the given stride
        # (needed e.g. when an FPN expects feature-map sizes that divide evenly)
        if size_divisible > 0:
            import math
 
            stride = size_divisible
            max_size = list(max_size)
            max_size[1] = int(math.ceil(max_size[1] / stride) * stride)
            max_size[2] = int(math.ceil(max_size[2] / stride) * stride)
            max_size = tuple(max_size)
        # allocate a zero-filled batch tensor and copy each image into its top-left corner
        batch_shape = (len(tensors),) + max_size
        batched_imgs = tensors[0].new(*batch_shape).zero_()
        for img, pad_img in zip(tensors, batched_imgs):
            pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
 
        image_sizes = [im.shape[-2:] for im in tensors]
 
        return ImageList(batched_imgs, image_sizes)
    else:
        raise TypeError("Unsupported type for to_image_list: {}".format(type(tensors)))

box_coder.py

https://blog.csdn.net/leijieZhang/article/details/90900906

In a convolutional neural network, mappings between data are realized by convolutional or fully connected layers; in other words, the bounding-box regression mapping can be fitted by the network itself with a single fully connected or convolutional layer.

class BoxCoder(object):
    """
    This class encodes and decodes a set of bounding boxes into
    the representation used for training the regressors.
    It implements the bounding-box regression described in the R-CNN paper:
    it encodes ground-truth boxes relative to proposals, and decodes predicted
    offsets back into boxes.
    """

    def __init__(self, weights, bbox_xform_clip=math.log(1000. / 16)):
        """
        Arguments:
            weights (4-element tuple): the weights applied to x, y, w, h in the encoding
            bbox_xform_clip (float): upper bound on the predicted log width/height
        """
        self.weights = weights
        # upper bound for dw/dh, to keep torch.exp() from overflowing
        self.bbox_xform_clip = bbox_xform_clip

    def encode(self, reference_boxes, proposals):
        """
        Encode a set of proposals with respect to some
        reference boxes.
        Computes the regression targets t* from the R-CNN paper, i.e. the offsets
        between each proposal and its matched ground-truth box.
        Arguments:
            reference_boxes (Tensor): reference boxes (the ground-truth box with the
                highest overlap for each proposal)
            proposals (Tensor): boxes to be encoded (the candidate boxes)
        """

        # the true extent of a box is x2 - x1 + 1
        TO_REMOVE = 1  # TODO remove
        # width of each proposal
        ex_widths = proposals[:, 2] - proposals[:, 0] + TO_REMOVE
        # height of each proposal
        ex_heights = proposals[:, 3] - proposals[:, 1] + TO_REMOVE
        # x coordinate of each proposal center
        ex_ctr_x = proposals[:, 0] + 0.5 * ex_widths
        # y coordinate of each proposal center
        ex_ctr_y = proposals[:, 1] + 0.5 * ex_heights

        # width of each ground-truth box
        gt_widths = reference_boxes[:, 2] - reference_boxes[:, 0] + TO_REMOVE
        # height of each ground-truth box
        gt_heights = reference_boxes[:, 3] - reference_boxes[:, 1] + TO_REMOVE
        # x coordinate of each ground-truth center
        gt_ctr_x = reference_boxes[:, 0] + 0.5 * gt_widths
        # y coordinate of each ground-truth center
        gt_ctr_y = reference_boxes[:, 1] + 0.5 * gt_heights

        # per-component weights for the regression targets
        wx, wy, ww, wh = self.weights
        # weighted regression targets
        targets_dx = wx * (gt_ctr_x - ex_ctr_x) / ex_widths
        targets_dy = wy * (gt_ctr_y - ex_ctr_y) / ex_heights
        targets_dw = ww * torch.log(gt_widths / ex_widths)
        targets_dh = wh * torch.log(gt_heights / ex_heights)

        # stack the four components into an N x 4 tensor
        targets = torch.stack((targets_dx, targets_dy, targets_dw, targets_dh), dim=1)
        return targets

    def decode(self, rel_codes, boxes):
        """
        From a set of original boxes and encoded relative box offsets,
        get the decoded boxes.
        Applies the predicted center/size offsets to the reference boxes to obtain
        the predicted boxes.
        Arguments:
            rel_codes (Tensor): encoded boxes (the predicted dx, dy, dw, dh offsets)
            boxes (Tensor): reference boxes (the proposals/anchors)
        """
        # cast the boxes to the same dtype as the offsets
        boxes = boxes.to(rel_codes.dtype)

        # the true extent of a box is x2 - x1 + 1
        TO_REMOVE = 1  # TODO remove
        # width of each reference box
        widths = boxes[:, 2] - boxes[:, 0] + TO_REMOVE
        # height of each reference box
        heights = boxes[:, 3] - boxes[:, 1] + TO_REMOVE
        # x coordinate of each reference box center
        ctr_x = boxes[:, 0] + 0.5 * widths
        # y coordinate of each reference box center
        ctr_y = boxes[:, 1] + 0.5 * heights

        # per-component weights
        wx, wy, ww, wh = self.weights
        # undo the weighting applied in encode()
        dx = rel_codes[:, 0::4] / wx
        dy = rel_codes[:, 1::4] / wy
        dw = rel_codes[:, 2::4] / ww
        dh = rel_codes[:, 3::4] / wh

        # Prevent sending too large values into torch.exp()
        dw = torch.clamp(dw, max=self.bbox_xform_clip)
        dh = torch.clamp(dh, max=self.bbox_xform_clip)

        # predicted box center x
        pred_ctr_x = dx * widths[:, None] + ctr_x[:, None]
        # predicted box center y
        pred_ctr_y = dy * heights[:, None] + ctr_y[:, None]
        # predicted box width
        pred_w = torch.exp(dw) * widths[:, None]
        # predicted box height
        pred_h = torch.exp(dh) * heights[:, None]

        # convert the predicted boxes back to x1, y1, x2, y2 format
        pred_boxes = torch.zeros_like(rel_codes)
        # x1
        pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
        # y1
        pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
        # x2 (note: "- 1" is correct; don't be fooled by the asymmetry)
        pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w - 1
        # y2 (note: "- 1" is correct; don't be fooled by the asymmetry)
        pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h - 1

        return pred_boxes
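
A quick sanity check of encode/decode (a hedged usage sketch; the weights (10., 10., 5., 5.) are just commonly used defaults, not values taken from these notes):

import torch
from maskrcnn_benchmark.modeling.box_coder import BoxCoder

coder = BoxCoder(weights=(10., 10., 5., 5.))
proposals = torch.tensor([[10., 10., 50., 60.]])    # candidate box in x1, y1, x2, y2
gt_boxes = torch.tensor([[12., 14., 48., 64.]])     # its matched ground-truth box

targets = coder.encode(gt_boxes, proposals)         # offsets of the gt relative to the proposal
decoded = coder.decode(targets, proposals)          # applying the offsets recovers the gt box
print(decoded)                                      # tensor([[12., 14., 48., 64.]])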

make_layers.py

https://blog.csdn.net/leijieZhang/article/details/90907322

The layers in maskrcnn-benchmark are all wrapped: the GN layer checks whether the channel count and group count are consistent, group normalization may normalize all channels or only a fraction of them, convolution layers are combined with their initialization into a single component, and the convolution type (e.g. dilated convolution) is configurable.

When other modules use the components defined in make_layers.py, each convolution or fully connected layer comes as one unit that already includes GN and the activation layer, which makes them convenient to call: the caller does not need to worry about the initialization and normalization details. The full code is:

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
"""
Miscellaneous utility functions
"""
 
import torch
from torch import nn
from torch.nn import functional as F
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.layers import Conv2d
from maskrcnn_benchmark.modeling.poolers import Pooler
 
 
# get the number of groups for group normalization from the channel count
def get_group_gn(dim, dim_per_gp, num_groups):
    """get number of groups used by GroupNorm, based on number of channels."""
    # only one of "channels per group" and "number of groups" may be specified
    assert dim_per_gp == -1 or num_groups == -1, \
        "GroupNorm: can only specify G or C/G."

    # if channels-per-group is given, derive the group count from it
    if dim_per_gp > 0:
        # the total channel count must be a multiple of the channels per group
        assert dim % dim_per_gp == 0, \
            "dim: {}, dim_per_gp: {}".format(dim, dim_per_gp)
        # number of groups = total channels / channels per group
        group_gn = dim // dim_per_gp
    else:
        # otherwise the total channel count must be a multiple of the group count
        assert dim % num_groups == 0, \
            "dim: {}, num_groups: {}".format(dim, num_groups)
        # use the configured number of groups directly
        group_gn = num_groups

    return group_gn


# group normalization wrapper
def group_norm(out_channels, affine=True, divisor=1):
    # divisor is normally 1; when it is larger, only a fraction of the channels takes part in GN
    out_channels = out_channels // divisor
    dim_per_gp = cfg.MODEL.GROUP_NORM.DIM_PER_GP // divisor
    num_groups = cfg.MODEL.GROUP_NORM.NUM_GROUPS // divisor
    # epsilon keeps the standard-deviation computation numerically safe
    eps = cfg.MODEL.GROUP_NORM.EPSILON # default: 1e-5
    return torch.nn.GroupNorm(
        get_group_gn(out_channels, dim_per_gp, num_groups),
        out_channels,
        eps,
        affine
    )


# combine a 3x3 convolution with its initialization, optional GN and optional ReLU into one module
def make_conv3x3(
    in_channels,
    out_channels,
    dilation=1,
    stride=1,
    use_gn=False,
    use_relu=False,
    kaiming_init=True
):
    # the convolution layer (the bias is dropped when GN follows)
    conv = Conv2d(
        in_channels,
        out_channels,
        kernel_size=3,
        stride=stride,
        padding=dilation,
        dilation=dilation,
        bias=False if use_gn else True
    )
    # weight initialization
    if kaiming_init:
        nn.init.kaiming_normal_(
            conv.weight, mode="fan_out", nonlinearity="relu"
        )
    else:
        torch.nn.init.normal_(conv.weight, std=0.01)
    # the bias exists (and is zeroed) only when GN is not used
    if not use_gn:
        nn.init.constant_(conv.bias, 0)
    module = [conv,]
    if use_gn:
        module.append(group_norm(out_channels))
    # optional activation layer
    if use_relu:
        module.append(nn.ReLU(inplace=True))
    if len(module) > 1:
        return nn.Sequential(*module)
    return conv


# combine a fully connected layer with its initialization and optional GN into one module
def make_fc(dim_in, hidden_dim, use_gn=False):
    '''
        Caffe2 implementation uses XavierFill, which in fact
        corresponds to kaiming_uniform_ in PyTorch
    '''
    # with GN
    if use_gn:
        # fully connected layer without bias
        fc = nn.Linear(dim_in, hidden_dim, bias=False)
        # initialization
        nn.init.kaiming_uniform_(fc.weight, a=1)
        # return the initialized fc followed by GN
        return nn.Sequential(fc, group_norm(hidden_dim))
    # plain fully connected layer
    fc = nn.Linear(dim_in, hidden_dim)
    # initialization
    nn.init.kaiming_uniform_(fc.weight, a=1)
    nn.init.constant_(fc.bias, 0)
    # return the fully connected layer directly
    return fc


# factory combining a convolution with Kaiming-uniform initialization, optional GN and optional ReLU
def conv_with_kaiming_uniform(use_gn=False, use_relu=False):
    def make_conv(
        in_channels, out_channels, kernel_size, stride=1, dilation=1
    ):
        # the convolution layer; the padding accounts for dilation
        conv = Conv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=dilation * (kernel_size - 1) // 2,
            dilation=dilation,
            bias=False if use_gn else True
        )
        # Caffe2 implementation uses XavierFill, which in fact
        # corresponds to kaiming_uniform_ in PyTorch
        # Kaiming-uniform weight initialization
        nn.init.kaiming_uniform_(conv.weight, a=1)
        if not use_gn:
            nn.init.constant_(conv.bias, 0)
        module = [conv,]
        # optional GN
        if use_gn:
            module.append(group_norm(out_channels))
        # optional activation layer
        if use_relu:
            module.append(nn.ReLU(inplace=True))
        if len(module) > 1:
            return nn.Sequential(*module)
        return conv

    return make_conv

modeling/utils.py

https://blog.csdn.net/leijieZhang/article/details/90909781

def cat(tensors, dim=0):
    """
    Efficient version of torch.cat that avoids a copy if there is only a single element in a list.
    (A list containing a single tensor is returned directly; only lists with more than
    one element are actually concatenated.)
    """
    assert isinstance(tensors, (list, tuple))
    if len(tensors) == 1:
        return tensors[0]
    return torch.cat(tensors, dim)
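
A tiny usage example (illustrative only):

import torch
from maskrcnn_benchmark.modeling.utils import cat

a = torch.ones(2, 3)
print(cat([a]).shape)      # torch.Size([2, 3]): a single element is returned without copying
print(cat([a, a]).shape)   # torch.Size([4, 3]): otherwise it falls back to torch.cat along dim 0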

anchor_generator.py

https://blog.csdn.net/leijieZhang/article/details/91359429?spm=1001.2014.3001.5501

The RPN needs anchors with different strides, different sizes and different aspect ratios. In maskrcnn_benchmark this is done by anchor_generator.py, which generates the anchors for each feature map. Note that when the backbone uses an FPN, the number of strides, the number of sizes and the number of FPN levels must match. A small sketch of base-anchor generation follows.
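
A self-contained sketch of how base anchors for one feature-map cell can be generated from sizes and aspect ratios (this mirrors the usual scheme, it is not the exact code from anchor_generator.py):

import torch

def make_base_anchors(sizes=(32, 64, 128), aspect_ratios=(0.5, 1.0, 2.0)):
    """Return len(sizes) * len(aspect_ratios) anchors centred at (0, 0) in x1, y1, x2, y2 form."""
    anchors = []
    for size in sizes:
        area = float(size) ** 2
        for ratio in aspect_ratios:
            # choose w, h so that w * h == area and h / w == ratio
            w = (area / ratio) ** 0.5
            h = w * ratio
            anchors.append([-w / 2.0, -h / 2.0, w / 2.0, h / 2.0])
    return torch.tensor(anchors)

# shifting these base anchors by stride * (x, y) for every cell of a feature map
# gives the full anchor grid for that level
print(make_base_anchors().shape)   # torch.Size([9, 4])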

balanced_positive_negative_sampler.py

In a two-stage detector, the proposals kept after the RPN contain both background boxes and boxes containing objects. Their numbers need to be balanced so that training works well; the usual positive-to-negative ratio is about 1:1, though it varies with the task. In maskrcnn-benchmark this is implemented in balanced_positive_negative_sampler.py:

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import torch
 
 
class BalancedPositiveNegativeSampler(object):
    """
    This class samples batches, ensuring that they contain a fixed proportion of positives.
    (It balances the ratio of positive to negative samples among the selected boxes.)
    """

    def __init__(self, batch_size_per_image, positive_fraction):
        """
        Arguments:
            batch_size_per_image (int): number of elements to be selected per image
            positive_fraction (float): percentage of positive elements per batch
        """
        self.batch_size_per_image = batch_size_per_image
        self.positive_fraction = positive_fraction

    def __call__(self, matched_idxs):
        """
        Arguments:
            matched idxs: list of tensors containing -1, 0 or positive values.
                Each tensor corresponds to a specific image.
                -1 values are ignored, 0 are considered as negatives and > 0 as
                positives.
                (Typically -1 marks boxes to be ignored, e.g. 0.3 < IoU < 0.7;
                0 marks background boxes, e.g. IoU < 0.3; values > 0 mark object boxes.)
        Returns:
            pos_idx (list[tensor]): masks of the selected positive (object) boxes
            neg_idx (list[tensor]): masks of the selected negative (background) boxes
        Returns two lists of binary masks for each image.
        The first list contains the positive elements that were selected,
        and the second list the negative example.
        """
        # initialize the lists of positive and negative masks
        pos_idx = []
        neg_idx = []
        # process the images one by one
        for matched_idxs_per_image in matched_idxs:
            # indices of object boxes and background boxes in this image
            positive = torch.nonzero(matched_idxs_per_image >= 1).squeeze(1)
            negative = torch.nonzero(matched_idxs_per_image == 0).squeeze(1)

            # number of positives requested by the configuration
            num_pos = int(self.batch_size_per_image * self.positive_fraction)
            # protect against not enough positive examples
            num_pos = min(positive.numel(), num_pos)
            # fill the rest of the per-image budget with negatives
            num_neg = self.batch_size_per_image - num_pos
            # protect against not enough negative examples
            num_neg = min(negative.numel(), num_neg)

            # randomly select positive and negative examples
            perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
            perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]

            pos_idx_per_image = positive[perm1]
            neg_idx_per_image = negative[perm2]

            # create binary mask from indices
            pos_idx_per_image_mask = torch.zeros_like(
                matched_idxs_per_image, dtype=torch.uint8
            )
            neg_idx_per_image_mask = torch.zeros_like(
                matched_idxs_per_image, dtype=torch.uint8
            )
            pos_idx_per_image_mask[pos_idx_per_image] = 1
            neg_idx_per_image_mask[neg_idx_per_image] = 1

            pos_idx.append(pos_idx_per_image_mask)
            neg_idx.append(neg_idx_per_image_mask)

        return pos_idx, neg_idx
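
A small usage example of the sampler (illustrative values: a budget of 8 samples per image with half of them positive):

import torch
from maskrcnn_benchmark.modeling.balanced_positive_negative_sampler import BalancedPositiveNegativeSampler

sampler = BalancedPositiveNegativeSampler(batch_size_per_image=8, positive_fraction=0.5)

# fake labels for a single image: -1 = ignore, 0 = background, > 0 = object
matched_idxs = [torch.tensor([1, 1, 0, 0, 0, 0, -1, 2, 0, 0])]
pos_masks, neg_masks = sampler(matched_idxs)

print(int(pos_masks[0].sum()))   # 3: only 3 positives exist, so num_pos is capped at 3
print(int(neg_masks[0].sum()))   # 5: the rest of the 8-sample budget is filled with negatives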

https://pytorch.org/docs/stable/generated/torch.nonzero.html
torch.nonzero(input, *, out=None, as_tuple=False) → LongTensor or tuple of LongTensors
torch.nonzero(…, as_tuple=False) (default) returns a 2-D tensor where each row is the index for a nonzero value.

https://pytorch.org/docs/stable/generated/torch.squeeze.html
torch.squeeze removes the dimensions of size 1 from a tensor.
Example:

>>> x = torch.zeros(2, 1, 2, 1, 2)
>>> x.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])
>>> y = torch.squeeze(x, 0)
>>> y.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x, 1)
>>> y.size()
torch.Size([2, 2, 1, 2])

https://pytorch.org/docs/stable/generated/torch.numel.html
torch.numel returns the total number of elements in the input tensor.

TORCH.RANDPERM
https://pytorch.org/docs/stable/generated/torch.randperm.html

Returns a random permutation of integers from 0 to n - 1.

>>> torch.randperm(4)
tensor([2, 1, 0, 3])

File "./maskrcnn_benchmark/modeling/rpn/loss.py", line 106, in __call__
    sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
File "./maskrcnn_benchmark/modeling/balanced_positive_negative_sampler.py", line 53, in __call__
    neg_idx_per_image = negative[perm2]
RuntimeError: CUDA error: device-side assert triggered

Removing the "device=negative.device" part solves the issue.

Below is a record of how I found this error and worked out the fix.

class BalancedPositiveNegativeSampler(object):
    """
    This class samples batches, ensuring that they contain a fixed proportion of positives
    """

    def __init__(self, batch_size_per_image, positive_fraction):
        """
        Arguments:
            batch_size_per_image (int): number of elements to be selected per image
            positive_fraction (float): percentace of positive elements per batch
        """
        self.batch_size_per_image = batch_size_per_image
        self.positive_fraction = positive_fraction

    def __call__(self, matched_idxs):
        """
        Arguments:
            matched idxs: list of tensors containing -1, 0 or positive values.
                Each tensor corresponds to a specific image.
                -1 values are ignored, 0 are considered as negatives and > 0 as
                positives.

        Returns:
            pos_idx (list[tensor])
            neg_idx (list[tensor])

        Returns two lists of binary masks for each image.
        The first list contains the positive elements that were selected,
        and the second list the negative example.
        """
        pos_idx = []
        neg_idx = []
        for matched_idxs_per_image in matched_idxs: # matched_idxs_per_image is the tensor with torch.Size([187940]) which contains -1, 0 or positive values
            
            # torch.nonzero: Returns a 2-D tensor where each row is the index for a nonzero value;
            # .squeeze(1): Returns a tensor with all the dimensions of input of size 1 removed.
            positive = torch.nonzero(matched_idxs_per_image >= 1).squeeze(1) 
            negative = torch.nonzero(matched_idxs_per_image == 0).squeeze(1)

            num_pos = int(self.batch_size_per_image * self.positive_fraction)
            # protect against not enough positive examples
            num_pos = min(positive.numel(), num_pos)
            num_neg = self.batch_size_per_image - num_pos
            # protect against not enough negative examples
            num_neg = min(negative.numel(), num_neg)

            # randomly select positive and negative examples
            # torch.randperm: Returns a random permutation of integers from 0 to n - 1.
            # .numel(): Returns the total number of elements in the input tensor
            
            #perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
            #perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg] ################# the elements inside perm2 were too large

            perm1 = torch.randperm(positive.numel())[:num_pos]
            perm2 = torch.randperm(negative.numel())[:num_neg]
            perm1 = perm1.cuda()   # .cuda() is not in-place, so assign the result back
            perm2 = perm2.cuda()

            # print(negative.numel())
            # a = torch.randperm(negative.numel(), device=negative.device)[:num_neg]
            # print(a.shape)
            # print(a)

            # print('mengyaaaaaaaaaaaaaaaaaaaaaaaa')
            # print(num_neg) # 252
            
            # # print('positive', positive) # positive tensor([186182, 186258, 186334, 187722], device='cuda:1')
            # # print('negative',negative) # negative torch.Size([154822]), negative tensor([  1248,   1252,   1256,  ..., 187072, 187076, 187594], device='cuda:1')

            # # print('perm1',perm1) # perm1 tensor([1, 2, 3, 0], device='cuda:1') index smallers
            # print('perm2', perm2) # perm2 torch.Size([252])
            # print('perm2', perm2[251]) # tensor(4602819557773017088, device='cuda:1')
            
            pos_idx_per_image = positive[perm1] # List[index], re-arrange the tensor 
            # print('pos_idx_per_image',pos_idx_per_image) # tensor([186258, 186334, 187722, 186182], device='cuda:1')
            neg_idx_per_image = negative[perm2]  ################### original one
            
            
            # print('neg_idx_per_image',neg_idx_per_image)

            # create binary mask from indices
            pos_idx_per_image_mask = torch.zeros_like(
                matched_idxs_per_image, dtype=torch.uint8
            )
            neg_idx_per_image_mask = torch.zeros_like(
                matched_idxs_per_image, dtype=torch.uint8
            )
            pos_idx_per_image_mask[pos_idx_per_image] = 1
            neg_idx_per_image_mask[neg_idx_per_image] = 1

            pos_idx.append(pos_idx_per_image_mask)
            neg_idx.append(neg_idx_per_image_mask)

        return pos_idx, neg_idx

modeling/rpn/loss.py

https://blog.csdn.net/leijieZhang/article/details/91588292
In the RPN of maskrcnn_benchmark, the process of selecting proposals and the process of computing the objectness and box-regression losses are not synchronized, and the two procedures also differ.

Proposal selection picks, from each feature level, a number of boxes with high objectness scores and then chooses the final proposals from that pool (or, with a single feature level and no FPN, it simply keeps the boxes with the highest objectness scores).

The loss computation works differently. First the IoU between every anchor and every ground-truth box is computed, and from it the ground-truth box matched to each anchor is determined.

Once every anchor has its matched ground-truth box, suitable anchors can be selected for the loss. First all anchors are labelled: anchors to be discarded get -1, background anchors get 0, and anchors containing an object get 1. From the anchors labelled 0 and 1, a fixed number and ratio of background and object anchors are sampled at random, and the loss is computed on those.

The loss has two parts. The first is the objectness loss: a label of 1 means the anchor contains an object with probability 1, so the objectness score predicted by the network is compared against the anchor's label. The second is the box-regression loss: the box-regression output for an anchor is compared against the offsets actually computed between that anchor and its matched ground-truth box.

# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
"""
This file contains specific functions for computing losses on the RPN
file
"""
 
import torch
from torch.nn import functional as F
 
from .utils import concat_box_prediction_layers
 
from ..balanced_positive_negative_sampler import BalancedPositiveNegativeSampler
from ..utils import cat
 
from maskrcnn_benchmark.layers import smooth_l1_loss
from maskrcnn_benchmark.modeling.matcher import Matcher
from maskrcnn_benchmark.structures.boxlist_ops import boxlist_iou
from maskrcnn_benchmark.structures.boxlist_ops import cat_boxlist
 
 
class RPNLossComputation(object):
    """
    This class computes the RPN loss.
    """
 
    def __init__(self, proposal_matcher, fg_bg_sampler, box_coder,
                 generate_labels_func):
        """
        Arguments:
            proposal_matcher (Matcher)
            fg_bg_sampler (BalancedPositiveNegativeSampler)
            box_coder (BoxCoder)
        """
        # self.target_preparator = target_preparator
        # the box matcher, used to find the ground-truth box matched to each anchor
        self.proposal_matcher = proposal_matcher
        # the foreground/background sampler, used to pick a fixed number and ratio of background and object boxes
        self.fg_bg_sampler = fg_bg_sampler
        # the box coder, used to encode/decode the box regression
        self.box_coder = box_coder
        # fields that need to be copied from the targets
        self.copied_fields = []
        # the label-generation function, producing labels from the matched ground-truth boxes
        self.generate_labels_func = generate_labels_func
        # which kinds of anchors to discard
        self.discard_cases = ['not_visibility', 'between_thresholds']

    # find the ground-truth box matched to every anchor
    def match_targets_to_anchors(self, anchor, target, copied_fields=[]):
        # IoU between every ground-truth box and every anchor
        match_quality_matrix = boxlist_iou(target, anchor)
        # index of the matched ground-truth box for every anchor (negative values mark background etc.)
        matched_idxs = self.proposal_matcher(match_quality_matrix)
        # RPN doesn't need any fields from target
        # for creating the labels, so clear them all
        target = target.copy_with_fields(copied_fields)
        # get the targets corresponding GT for each anchor
        # NB: need to clamp the indices because we can have a single
        # GT in the image, and matched_idxs can be -2, which goes
        # out of bounds
        # anchors without a valid match (negative index) are clamped onto the first ground-truth box
        matched_targets = target[matched_idxs.clamp(min=0)]
        # keep the raw matched indices alongside the matched boxes
        matched_targets.add_field("matched_idxs", matched_idxs)
        return matched_targets

    # produce the anchor labels (-1 = discard, 0 = background, 1 = object) and the
    # regression targets (offsets between each anchor and its matched ground-truth box)
    def prepare_targets(self, anchors, targets):
        # labels for all anchors
        labels = []
        # offsets between the anchors and their matched ground-truth boxes
        regression_targets = []
        # process one image at a time
        for anchors_per_image, targets_per_image in zip(anchors, targets):
            # ground-truth boxes matched to the anchors of this image
            matched_targets = self.match_targets_to_anchors(
                anchors_per_image, targets_per_image, self.copied_fields
            )
            # matched ground-truth indices of the anchors
            matched_idxs = matched_targets.get_field("matched_idxs")
            # initial per-anchor labels from the label-generation function
            labels_per_image = self.generate_labels_func(matched_targets)
            labels_per_image = labels_per_image.to(dtype=torch.float32)

            # Background (negative examples): mark the background anchors with label 0
            bg_indices = matched_idxs == Matcher.BELOW_LOW_THRESHOLD
            labels_per_image[bg_indices] = 0

            # discard anchors that go out of the boundaries of the image
            if "not_visibility" in self.discard_cases:
                labels_per_image[~anchors_per_image.get_field("visibility")] = -1

            # discard indices that are between thresholds
            # (IoU between the background and the foreground thresholds)
            if "between_thresholds" in self.discard_cases:
                inds_to_discard = matched_idxs == Matcher.BETWEEN_THRESHOLDS
                labels_per_image[inds_to_discard] = -1

            # compute regression targets: offsets between each anchor and its matched ground-truth box
            regression_targets_per_image = self.box_coder.encode(
                matched_targets.bbox, anchors_per_image.bbox
            )
            # store the labels and the regression targets
            labels.append(labels_per_image)
            regression_targets.append(regression_targets_per_image)

        return labels, regression_targets


    def __call__(self, anchors, objectness, box_regression, targets):
        """
        Arguments:
            anchors (list[BoxList]): all generated anchors
            objectness (list[Tensor]): per-level objectness score maps from the RPN head
            box_regression (list[Tensor]): per-level box-regression maps from the RPN head
            targets (list[BoxList]): the ground-truth boxes of each image
        Returns:
            objectness_loss (Tensor)
            box_loss (Tensor)
        """
        # merge the anchors of the different FPN levels for each image
        anchors = [cat_boxlist(anchors_per_image) for anchors_per_image in anchors]
        # per-image labels and regression targets for all anchors
        labels, regression_targets = self.prepare_targets(anchors, targets)
        # sample background and object anchors; in the returned masks 0 = not selected, 1 = selected
        sampled_pos_inds, sampled_neg_inds = self.fg_bg_sampler(labels)
        # turn the masks into index tensors and drop the extra dimension
        sampled_pos_inds = torch.nonzero(torch.cat(sampled_pos_inds, dim=0)).squeeze(1)
        sampled_neg_inds = torch.nonzero(torch.cat(sampled_neg_inds, dim=0)).squeeze(1)
        # concatenate the selected positive and negative anchor indices
        sampled_inds = torch.cat([sampled_pos_inds, sampled_neg_inds], dim=0)

        # merge the per-level objectness and box-regression outputs of the RPN head
        # into single tensors of classification and regression predictions
        objectness, box_regression = \
                concat_box_prediction_layers(objectness, box_regression)
        # drop the extra dimension of the objectness tensor
        objectness = objectness.squeeze()
        # concatenate the labels of all images
        labels = torch.cat(labels, dim=0)
        # concatenate the regression targets of all images
        regression_targets = torch.cat(regression_targets, dim=0)
        # box-regression loss, computed only on the sampled positive anchors
        box_loss = smooth_l1_loss(
            box_regression[sampled_pos_inds],
            regression_targets[sampled_pos_inds],
            beta=1.0 / 9,
            size_average=False,
        ) / (sampled_inds.numel())
        # objectness loss, computed on both the sampled background and object anchors
        objectness_loss = F.binary_cross_entropy_with_logits(
            objectness[sampled_inds], labels[sampled_inds]
        )
        # return the objectness loss and the box-regression loss
        return objectness_loss, box_loss


# This function should be overwritten in RetinaNet
# generate anchor labels: 1 for anchors matched to an object, 0 otherwise
def generate_rpn_labels(matched_targets):
    # index of the matched ground-truth box for every anchor
    matched_idxs = matched_targets.get_field("matched_idxs")
    # anchors with a valid match get label 1
    labels_per_image = matched_idxs >= 0
    return labels_per_image


def make_rpn_loss_evaluator(cfg, box_coder):
    # the box matcher, configured with the foreground and background IoU thresholds
    matcher = Matcher(
        cfg.MODEL.RPN.FG_IOU_THRESHOLD,
        cfg.MODEL.RPN.BG_IOU_THRESHOLD,
        allow_low_quality_matches=True,
    )
    # the foreground/background sampler, configured with the per-image batch size and the positive fraction
    fg_bg_sampler = BalancedPositiveNegativeSampler(
        cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE, cfg.MODEL.RPN.POSITIVE_FRACTION
    )
    # the RPN loss computation object
    loss_evaluator = RPNLossComputation(
        matcher,
        fg_bg_sampler,
        box_coder,
        generate_rpn_labels
    )
    return loss_evaluator

Error:
https://blog.csdn.net/wz22881916/article/details/112691232
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Possible causes:

cuDNN is not installed.
The PyTorch and CUDA versions do not match; specifically, the CUDA version PyTorch was compiled against differs from the CUDA version in the environment.
The GPU is incompatible with the installed CUDA/cuDNN versions; for example, an RTX 2080 needs at least CUDA 9.2 to run properly.
Not enough host memory because the dataloader processes too much data per step.
Not enough GPU memory (OOM); sometimes when cuDNN is called without enough GPU memory, a cuDNN error is reported instead of an OOM.
In my experience, if the cuDNN error appears right at the start of a run, it is usually one of the first three causes.
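
A quick way to check the first few causes (standard PyTorch calls, listed here only as a debugging aid):

import torch

print(torch.__version__)                    # PyTorch version
print(torch.version.cuda)                   # CUDA version PyTorch was built against
print(torch.backends.cudnn.version())       # cuDNN version, None if cuDNN is missing
print(torch.cuda.is_available())            # whether a usable GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))    # GPU model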
