Faster R-CNN Code Analysis (Part 2)

深度菜鸟

Article link: https://blog.csdn.net/m0_51619560/article/details/135277709

Project source: https://github.com/chenyuntc/simple-faster-rcnn-pytorch

An annotated copy of the source (referred to below as Code 2):

simple-faster-rcnn-pytorch-master https://www.alipan.com/s/faJGPR261aG (extraction code: ry92)
Code 2 runs as soon as the VOCdevkit dataset is placed under the dataset directory.

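This project implements object detection with the Faster R-CNN model. Its main parts are:

utils/config.py: the project's configuration options, such as the dataset path, learning rate, optimizer type and the path to the pretrained model.
data/dataset.py: the classes that handle the PASCAL VOC dataset, namely Dataset and TestDataset. They load the dataset (by calling VOCBboxDataset in data/voc_dataset.py), preprocess images and labels, and provide the interface used to train and test the model.
model/faster_rcnn.py: the implementation of the Faster R-CNN model. The model has three main parts: a feature extractor, the RPN and a head network. The extractor computes features from the image, the RPN proposes candidate object regions, and the head classifies those regions and regresses their boxes.
model/faster_rcnn_vgg16.py: the VGG-16 based model FasterRCNNVGG16, which inherits from FasterRCNN in model/faster_rcnn.py. It plugs in a VGG-16 feature extractor, the RPN and the VGG16RoIHead head network, which play the three roles described above.
model/region_proposal_network.py: the implementation of the Region Proposal Network (RPN), the key Faster R-CNN component that generates candidate object regions.
model/utils/creator_tool.py: utility classes that generate the targets needed to train Faster R-CNN, such as AnchorTargetCreator and ProposalTargetCreator.
trainer.py: the FasterRCNNTrainer class used to train the model. It provides methods such as train_step (runs one training step), save and load (save and load the model), and update_meters and reset_meters (update and reset the loss meters).
train.py: the main program. It loads the dataset, creates the Faster R-CNN model and the trainer, then enters a loop in which each iteration is one training epoch. Within each epoch it iterates over all images in the dataset and calls the trainer's train_step method to update the model parameters.

These files are tied together by the flow of data and models. train.py loads the dataset from data/dataset.py, builds the model defined in model/faster_rcnn.py, and trains it with the trainer from trainer.py. During training, the utility classes in model/utils/creator_tool.py generate the training targets, and the RPN in model/region_proposal_network.py generates the candidate regions.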

1.train.py

The main function calls train(), which proceeds in the following steps:

(1) Load the datasets

dataset = Dataset(opt)
print('load data')
dataloader = data_.DataLoader(dataset,
                              batch_size=1,
                              shuffle=True,
                              # pin_memory=True,
                              num_workers=opt.num_workers)
testset = TestDataset(opt)
test_dataloader = data_.DataLoader(testset,
                                   batch_size=1,
                                   num_workers=opt.test_num_workers,
                                   shuffle=False,
                                   pin_memory=True)

(2) Create the Faster R-CNN model and its trainer

faster_rcnn = FasterRCNNVGG16()  # build the Faster R-CNN model
print('model construct completed')
trainer = FasterRCNNTrainer(faster_rcnn).cuda()  # build the trainer and move it to the GPU
if opt.load_path:  # a checkpoint path was given
    trainer.load(opt.load_path)  # load the pretrained model
    print('load pretrained model from %s' % opt.load_path)

(3) Start training

best_map = 0  # best mAP so far
lr_ = opt.lr  # initial learning rate, lr = 0.001
for epoch in range(opt.epoch):
    trainer.reset_meters()  # reset the trainer's loss meters
    # Iterate over the batches; ii is the batch index.
    # Example shapes: img == tensor(1,3,800,600), bbox_ == tensor(1,1,4), label_ == tensor(1,1), scale == (1,)
    for ii, (img, bbox_, label_, scale) in tqdm(enumerate(dataloader)):
        scale = at.scalar(scale)  # scale factor as a Python scalar
        img, bbox, label = img.cuda().float(), bbox_.cuda(), label_.cuda()  # move the image, ground-truth boxes and labels to the GPU; cast the image to float
        trainer.train_step(img, bbox, label, scale)  # run one training step

(4) Evaluate the result and visualize it in visdom

# The code below runs at the end of every epoch, inside the `for epoch` loop above.
eval_result = eval(test_dataloader, faster_rcnn, test_num=opt.test_num)  # evaluate on the test set
trainer.vis.plot('test_map', eval_result['map'])  # plot the mAP in visdom
lr_ = trainer.faster_rcnn.optimizer.param_groups[0]['lr']  # current learning rate
log_info = 'lr:{}, map:{},loss:{}'.format(str(lr_),
                                          str(eval_result['map']),
                                          str(trainer.get_meter_data()))  # build the log message
trainer.vis.log(log_info)  # show the log message in visdom

if eval_result['map'] > best_map:  # the current mAP beats the best one so far
    best_map = eval_result['map']  # update the best mAP
    best_path = trainer.save(best_map=best_map)  # save the model and remember its path
if epoch == 9:  # end of the 10th epoch
    trainer.load(best_path)  # reload the best model
    trainer.faster_rcnn.scale_lr(opt.lr_decay)  # decay the learning rate
    lr_ = lr_ * opt.lr_decay  # keep the local copy in sync

if epoch == 13:  # end of the 14th epoch
    break  # stop training
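The epoch-level control flow (checkpoint on improvement, one learning-rate decay after the 10th epoch, stop after the 14th) can be replayed without a model. The sketch below is only an illustration: simulate_schedule is a hypothetical helper, the mAP values are made up, and lr_decay=0.1 is an assumed value for opt.lr_decay.

```python
def simulate_schedule(maps, lr=1e-3, lr_decay=0.1, decay_epoch=9, last_epoch=13):
    """Replay the per-epoch bookkeeping on a list of (made-up) mAP values."""
    best_map, best_epoch, history = 0.0, None, []
    for epoch, current_map in enumerate(maps):
        if current_map > best_map:   # a new best model would be saved here
            best_map, best_epoch = current_map, epoch
        if epoch == decay_epoch:     # one-off learning-rate decay
            lr = lr * lr_decay
        history.append((epoch, lr, best_epoch))
        if epoch == last_epoch:      # training stops here
            break
    return history


maps = [0.30 + 0.02 * e for e in range(20)]  # hypothetical, steadily improving mAP
history = simulate_schedule(maps)
for epoch, lr, best_epoch in history:
    print(epoch, lr, best_epoch)
```

With these inputs the loop runs for 14 epochs in total, and the learning rate changes exactly once.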

2.trainer.py

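Once train() in train.py starts training, trainer.train_step(img, bbox, label, scale) runs a single training step: it enters train_step() in trainer.py, which runs the forward pass through losses = self.forward(imgs, bboxes, labels, scale). The forward pass consists of the following steps:

(1) Extract features with the Faster R-CNN feature extractor

features = self.faster_rcnn.extractor(imgs)  # feature map produced by the extractor

(2) Generate RoIs with the RPN (after an initial filtering, roughly 2000 RoIs per image)

# RPN
# rpn_locs: predicted location offsets for every shifted anchor, shape (1, hh*ww*9, 4)
# rpn_scores: predicted background/foreground scores for every shifted anchor, shape (1, hh*ww*9, 2)
# rois: the proposed RoIs (regions of interest), shape roughly (2000, 4)
# roi_indices: the image index of each RoI, shape roughly (2000,); with batch size 1 every RoI comes from image 0
# anchor: all shifted anchors, shape (hh*ww*9, 4)
rpn_locs, rpn_scores, rois, roi_indices, anchor = \
    self.faster_rcnn.rpn(features, img_size, scale)  # run the RPN to generate RoIs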

class RegionProposalNetwork(nn.Module):

    def __init__(
            self, in_channels=512, mid_channels=512, ratios=[0.5, 1, 2],
            anchor_scales=[8, 16, 32], feat_stride=16,
            proposal_creator_params=dict(),
    ):  # arguments: input channels, intermediate channels, aspect ratios, anchor scales, feature stride, ProposalCreator parameters
        super(RegionProposalNetwork, self).__init__()
        self.anchor_base = generate_anchor_base(
            anchor_scales=anchor_scales, ratios=ratios)  # base (reference) anchors, shape (9, 4)
        self.feat_stride = feat_stride  # stride of the feature map relative to the input image
        self.proposal_layer = ProposalCreator(self, **proposal_creator_params)  # turns RPN outputs into RoIs
        n_anchor = self.anchor_base.shape[0]  # anchors per feature-map location, n_anchor == 9
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 3, 1, 1)  # 3x3 conv, 512 -> 512 channels, stride 1, padding 1
        self.score = nn.Conv2d(mid_channels, n_anchor * 2, 1, 1, 0)  # 1x1 conv, 512 -> 9*2 channels: background/foreground scores
        self.loc = nn.Conv2d(mid_channels, n_anchor * 4, 1, 1, 0)  # 1x1 conv, 512 -> 9*4 channels: location offsets
        normal_init(self.conv1, 0, 0.01)  # initialize the conv1 weights
        normal_init(self.score, 0, 0.01)  # initialize the score weights
        normal_init(self.loc, 0, 0.01)  # initialize the loc weights

    def forward(self, x, img_size, scale=1.):  # x: feature map from the Faster R-CNN (VGG16) extractor; example values: img_size == (800, 600), scale == 1

        n, _, hh, ww = x.shape  # shape of the input feature map
        anchor = _enumerate_shifted_anchor(
            np.array(self.anchor_base),
            self.feat_stride, hh, ww)  # enumerate all shifted anchors: (feature-map locations) x (anchors per location) = (hh*ww)*9

        n_anchor = anchor.shape[0] // (hh * ww)  # anchors per location
        h = F.relu(self.conv1(x))  # 3x3 conv followed by ReLU

        rpn_locs = self.loc(h)  # predicted location offsets
        # UNNOTE: check whether need contiguous
        # A: Yes
        rpn_locs = rpn_locs.permute(0, 2, 3, 1).contiguous().view(n, -1, 4)  # reshape to (1, hh*ww*9, 4)
        rpn_scores = self.score(h)  # raw background/foreground scores for every anchor
        rpn_scores = rpn_scores.permute(0, 2, 3, 1).contiguous()  # move channels last: (1, hh, ww, 9*2)
        rpn_softmax_scores = F.softmax(rpn_scores.view(n, hh, ww, n_anchor, 2), dim=4)  # softmax over background/foreground, shape (1, hh, ww, 9, 2)
        rpn_fg_scores = rpn_softmax_scores[:, :, :, :, 1].contiguous()  # foreground probability
        rpn_fg_scores = rpn_fg_scores.view(n, -1)  # flatten to (1, hh*ww*9)
        rpn_scores = rpn_scores.view(n, -1, 2)  # reshape to (1, hh*ww*9, 2)

        rois = list()  # RoIs of every image
        roi_indices = list()  # image index of every RoI
        for i in range(n):  # loop over the images in the batch
            roi = self.proposal_layer(
                rpn_locs[i].cpu().data.numpy(),
                rpn_fg_scores[i].cpu().data.numpy(),
                anchor, img_size,
                scale=scale)  # ProposalCreator turns offsets and scores into RoIs, e.g. roi.shape == (1944, 4)
            batch_index = i * np.ones((len(roi),), dtype=np.int32)  # image index i, repeated once per RoI
            rois.append(roi)  # collect the RoIs
            roi_indices.append(batch_index)  # collect the image indices

        rois = np.concatenate(rois, axis=0)  # concatenate the RoIs of all images
        roi_indices = np.concatenate(roi_indices, axis=0)  # concatenate the image indices of all RoIs
        return rpn_locs, rpn_scores, rois, roi_indices, anchor
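The permute(0, 2, 3, 1) followed by view(n, -1, 4) in forward is what lines the convolution output up with the anchor ordering. A small NumPy sketch of that bookkeeping follows, with transpose/reshape standing in for permute/view. The toy sizes and the random scores are assumptions chosen only for illustration.

```python
import numpy as np

n, A, hh, ww = 1, 9, 2, 3  # toy sizes (assumed)

# Stand-in for the output of self.loc: shape (n, A*4, hh, ww), channel c = a*4 + k
loc_map = np.arange(n * A * 4 * hh * ww, dtype=np.float32).reshape(n, A * 4, hh, ww)

# NumPy equivalent of permute(0, 2, 3, 1).contiguous().view(n, -1, 4)
rpn_locs = loc_map.transpose(0, 2, 3, 1).reshape(n, -1, 4)

# Row index = (y * ww + x) * A + a, i.e. location-major, anchor-minor
y, x, a = 1, 2, 5
row = (y * ww + x) * A + a
same = np.array_equal(rpn_locs[0, row], loc_map[0, a * 4:(a + 1) * 4, y, x])

# Foreground probability: softmax over the last axis (size 2), keep index 1
score_map = np.random.default_rng(0).normal(size=(n, A * 2, hh, ww))
scores = score_map.transpose(0, 2, 3, 1).reshape(n, hh, ww, A, 2)
exp = np.exp(scores - scores.max(axis=4, keepdims=True))
fg_scores = (exp / exp.sum(axis=4, keepdims=True))[..., 1].reshape(n, -1)

print(rpn_locs.shape, fg_scores.shape, same)
```

Putting the channel axis last before flattening keeps the four offsets of one anchor contiguous, so row i of rpn_locs refers to the same anchor as row i of the anchor array.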


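# Purpose: enumerate the shifted anchors.
# Faster R-CNN first places a set of anchors (also called anchor boxes or reference boxes) on the image.
# The 9 base anchors (3 scales x 3 aspect ratios, A = 9 in the code below) are defined once, around a single reference window.
# "Shifting" here means translating these base anchors to every feature-map location in steps of feat_stride pixels. It is not the learned regression offset that the RPN later predicts to fit anchors to ground-truth boxes.

# Why shift?
# We want a set of anchors at every feature-map location, so that the anchors cover all possible objects in the image.
# With the base anchors alone, anchors would exist only at the reference window, and objects elsewhere could be missed.
# Shifting places anchors at every feature-map location, so objects anywhere in the image can be matched.
def _enumerate_shifted_anchor(anchor_base, feat_stride, height, width):
    import numpy as xp
    shift_y = xp.arange(0, height * feat_stride, feat_stride)  # all shifts along y
    shift_x = xp.arange(0, width * feat_stride, feat_stride)  # all shifts along x
    shift_x, shift_y = xp.meshgrid(shift_x, shift_y)  # grid of shift coordinates
    shift = xp.stack((shift_y.ravel(), shift_x.ravel(),
                      shift_y.ravel(), shift_x.ravel()), axis=1)  # one (y, x, y, x) shift per location

    A = anchor_base.shape[0]  # anchors per location (A = 9)
    K = shift.shape[0]  # number of feature-map locations (K = hh*ww); total number of anchors = K*A
    anchor = anchor_base.reshape((1, A, 4)) + \
             shift.reshape((1, K, 4)).transpose((1, 0, 2))  # add every shift to every base anchor, shape (K, A, 4)
    anchor = anchor.reshape((K * A, 4)).astype(np.float32)  # flatten to (K*A, 4) and cast to float32
    return anchor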
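To see the shifting in numbers, the sketch below builds 9 base anchors and tiles them over a 2x3 feature map. make_anchor_base is a hypothetical stand-in for generate_anchor_base (whose body is not shown in this post), written under the assumption that boxes are (y1, x1, y2, x2) centred on a 16x16 reference window; shift_anchors repeats the logic of the listing above with explicit broadcasting.

```python
import numpy as np


def make_anchor_base(base_size=16, ratios=(0.5, 1, 2), anchor_scales=(8, 16, 32)):
    """Hypothetical stand-in for generate_anchor_base: 9 boxes as (y1, x1, y2, x2)."""
    cy = cx = base_size / 2.0
    boxes = []
    for r in ratios:
        for s in anchor_scales:
            h = base_size * s * np.sqrt(r)
            w = base_size * s * np.sqrt(1.0 / r)
            boxes.append([cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2])
    return np.array(boxes, dtype=np.float32)


def shift_anchors(anchor_base, feat_stride, height, width):
    """Tile the base anchors over every feature-map location."""
    ys = np.arange(0, height * feat_stride, feat_stride)
    xs = np.arange(0, width * feat_stride, feat_stride)
    xs, ys = np.meshgrid(xs, ys)
    shift = np.stack((ys.ravel(), xs.ravel(), ys.ravel(), xs.ravel()), axis=1)  # (K, 4)
    anchors = anchor_base[None, :, :] + shift[:, None, :]  # (K, A, 4)
    return anchors.reshape(-1, 4).astype(np.float32)


anchor_base = make_anchor_base()
anchors = shift_anchors(anchor_base, feat_stride=16, height=2, width=3)
print(anchor_base.shape, anchors.shape)
print(anchors[9] - anchors[0])   # one step along x
print(anchors[27] - anchors[0])  # one step along y
```

The first 9 rows are the base anchors themselves; each later group of 9 is the same set moved by a multiple of feat_stride.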

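(3) Generate the targets needed to train the Faster R-CNN head network (the RoI network). Filtering by IoU threshold leaves 128 RoIs (sample_roi). ProposalTargetCreator assigns each of them a ground-truth regression target gt_roi_loc and a ground-truth label gt_roi_label. The targets are the offsets and scales between each RoI and its ground-truth box (for box regression) and the class label of that ground-truth box (for classification).

# sample_roi: the RoIs kept after IoU-threshold sampling, a mix of foreground and background RoIs, shape (128, 4)
# gt_roi_loc: offsets and scales between each sample_roi and its ground-truth box (the one with the highest IoU), shape (128, 4)
# gt_roi_label: ground-truth label of each sample_roi (the label of the ground-truth box with the highest IoU), shape (128,)
sample_roi, gt_roi_loc, gt_roi_label = self.proposal_target_creator(
            roi,
            at.tonumpy(bbox),
            at.tonumpy(label),
            self.loc_normalize_mean,
            self.loc_normalize_std)  # generate the training targets with ProposalTargetCreator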
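The "offsets and scales" target is the usual Faster R-CNN box encoding. The sketch below is a minimal NumPy version of it, assuming boxes are given as (y1, x1, y2, x2); encode_boxes is a hypothetical name used here in place of the project's bbox2loc, and the two boxes are made up. The normalization uses the default mean and std that appear in the ProposalTargetCreator signature.

```python
import numpy as np


def encode_boxes(src, dst):
    """Encode dst boxes relative to src boxes as (dy, dx, dh, dw)."""
    h = src[:, 2] - src[:, 0]
    w = src[:, 3] - src[:, 1]
    cy = src[:, 0] + 0.5 * h
    cx = src[:, 1] + 0.5 * w
    gh = dst[:, 2] - dst[:, 0]
    gw = dst[:, 3] - dst[:, 1]
    gcy = dst[:, 0] + 0.5 * gh
    gcx = dst[:, 1] + 0.5 * gw
    eps = np.finfo(h.dtype).eps  # guard against zero-sized source boxes
    h = np.maximum(h, eps)
    w = np.maximum(w, eps)
    return np.stack(((gcy - cy) / h, (gcx - cx) / w,
                     np.log(gh / h), np.log(gw / w)), axis=1)


roi = np.array([[0.0, 0.0, 10.0, 10.0]])  # made-up RoI
gt = np.array([[5.0, 5.0, 25.0, 25.0]])   # made-up ground-truth box
loc = encode_boxes(roi, gt)                # centre moves by one box size, size doubles

mean = np.array((0.0, 0.0, 0.0, 0.0))
std = np.array((0.1, 0.1, 0.2, 0.2))
normalized = (loc - mean) / std
print(loc, normalized)
```

Dividing by the small std values enlarges the regression targets, which is presumably why the normalization is applied before the loss is computed.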

The ProposalTargetCreator class in model/utils/creator_tool.py is shown below.

# Purpose: assign ground-truth boxes to the given RoIs
# 1. Compute the IoU between the RoIs and the ground-truth boxes
# 2. Find, for each RoI, the ground-truth box with the highest IoU
# 3. Assign each RoI to that ground-truth box
# 4. Compute the offsets and scales between each RoI and its assigned box
# 5. Normalize the offsets and scales
class ProposalTargetCreator(object):

    def __init__(self,
                 n_sample=128,
                 pos_ratio=0.25, pos_iou_thresh=0.5,
                 neg_iou_thresh_hi=0.5, neg_iou_thresh_lo=0.0
                 ):  # arguments: number of sampled regions, foreground ratio, foreground IoU threshold, background IoU range
        self.n_sample = n_sample  # number of regions to sample
        self.pos_ratio = pos_ratio  # fraction of foreground regions
        self.pos_iou_thresh = pos_iou_thresh  # IoU threshold for foreground
        self.neg_iou_thresh_hi = neg_iou_thresh_hi  # upper IoU bound for background
        self.neg_iou_thresh_lo = neg_iou_thresh_lo  # lower IoU bound for background NOTE:default 0.1 in py-faster-rcnn

    def __call__(self, roi, bbox, label,
        loc_normalize_mean=(0., 0., 0., 0.),
        loc_normalize_std=(0.1, 0.1, 0.2, 0.2)):  # arguments: RoIs, ground-truth boxes, labels, mean and std for location normalization
        n_bbox, _ = bbox.shape  # number of ground-truth boxes

        roi = np.concatenate((roi, bbox), axis=0)  # add the ground-truth boxes themselves to the RoI pool

        pos_roi_per_image = np.round(self.n_sample * self.pos_ratio)  # maximum number of foreground RoIs per image
        iou = bbox_iou(roi, bbox)  # IoU between every RoI and every ground-truth box
        gt_assignment = iou.argmax(axis=1)  # index of the ground-truth box with the highest IoU for each RoI
        max_iou = iou.max(axis=1)  # highest IoU of each RoI
        # Offset range of classes from [0, n_fg_class - 1] to [1, n_fg_class].
        # The label with value 0 is the background.
        gt_roi_label = label[gt_assignment] + 1  # label of the assigned ground-truth box, shifted by 1

        # Select foreground RoIs as those with >= pos_iou_thresh IoU.
        pos_index = np.where(max_iou >= self.pos_iou_thresh)[0]  # RoIs whose IoU reaches the foreground threshold
        pos_roi_per_this_image = int(min(pos_roi_per_image, pos_index.size))  # number of foreground RoIs for this image
        if pos_index.size > 0:  # there is at least one foreground RoI
            pos_index = np.random.choice(
                pos_index, size=pos_roi_per_this_image, replace=False)

        # Select background RoIs as those within
        # [neg_iou_thresh_lo, neg_iou_thresh_hi).
        neg_index = np.where((max_iou < self.neg_iou_thresh_hi) &
                             (max_iou >= self.neg_iou_thresh_lo))[0]  # RoIs whose IoU falls in the background range
        neg_roi_per_this_image = self.n_sample - pos_roi_per_this_image  # background RoIs needed to fill the sample
        neg_roi_per_this_image = int(min(neg_roi_per_this_image,
                                         neg_index.size))  # capped by the number available
        if neg_index.size > 0:  # there is at least one background RoI
            neg_index = np.random.choice(
                neg_index, size=neg_roi_per_this_image, replace=False)

        # The indices that we're selecting (both positive and negative).
        keep_index = np.append(pos_index, neg_index)  # foreground indices first, then background indices
        gt_roi_label = gt_roi_label[keep_index]  # labels of the kept RoIs
        gt_roi_label[pos_roi_per_this_image:] = 0  # negative labels --> 0
        sample_roi = roi[keep_index]  # the kept RoIs

        # Compute offsets and scales to match sampled RoIs to the GTs.
        gt_roi_loc = bbox2loc(sample_roi, bbox[gt_assignment[keep_index]])  # offsets and scales to the assigned boxes
        gt_roi_loc = ((gt_roi_loc - np.array(loc_normalize_mean, np.float32)
                       ) / np.array(loc_normalize_std, np.float32))  # normalize the offsets and scales

        return sample_roi, gt_roi_loc, gt_roi_label
    # Returned shapes: sample_roi (128, 4), gt_roi_loc (128, 4), gt_roi_label (128,)
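The listing relies on bbox_iou, whose body is not shown in this post. The sketch below is one way to compute a pairwise IoU matrix in NumPy and then replay the assignment and thresholding on three made-up RoIs. pairwise_iou is a hypothetical name, boxes are assumed to be (y1, x1, y2, x2), and the class ids are invented; the random subsampling step is left out.

```python
import numpy as np


def pairwise_iou(boxes_a, boxes_b):
    """IoU between every box in boxes_a (N, 4) and every box in boxes_b (M, 4)."""
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bottom_right = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    size = np.clip(bottom_right - top_left, 0, None)  # zero when boxes do not overlap
    inter = size[..., 0] * size[..., 1]
    area_a = np.prod(boxes_a[:, 2:] - boxes_a[:, :2], axis=1)
    area_b = np.prod(boxes_b[:, 2:] - boxes_b[:, :2], axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)


rois = np.array([[0, 0, 10, 10], [0, 0, 10, 4], [20, 20, 30, 30]], dtype=np.float32)
gt = np.array([[0, 0, 10, 10], [18, 18, 30, 30]], dtype=np.float32)
label = np.array([6, 14])  # hypothetical zero-based class ids

iou = pairwise_iou(rois, gt)
gt_assignment = iou.argmax(axis=1)       # best ground-truth box per RoI
max_iou = iou.max(axis=1)
gt_roi_label = label[gt_assignment] + 1  # shift so that 0 can mean background

pos_index = np.where(max_iou >= 0.5)[0]
neg_index = np.where((max_iou < 0.5) & (max_iou >= 0.0))[0]
gt_roi_label[neg_index] = 0              # background RoIs get label 0
print(iou, pos_index, neg_index, gt_roi_label)
```

The second RoI overlaps its best box with IoU 0.4, so it ends up as background even though it has an assigned ground-truth box.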

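(4) Generate the targets needed to train the RPN. The targets are the offsets and scales between each anchor and its ground-truth box (for box regression) and a label saying whether the anchor contains an object (for foreground/background classification).

# rpn_loc: predicted location offsets for every anchor
# gt_rpn_loc: offsets and scales between each anchor and its assigned ground-truth box, shape (hh*ww*9, 4)
# gt_rpn_label: label of each anchor (1 foreground, 0 background, -1 ignored), shape (hh*ww*9,)
gt_rpn_loc, gt_rpn_label = self.anchor_target_creator(
            at.tonumpy(bbox),
            anchor,
            img_size)  # generate the training targets with AnchorTargetCreator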
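The three-way anchor labeling (foreground, background, ignored) can be checked on a tiny IoU matrix. The sketch below applies the thresholds 0.7 and 0.3 used by AnchorTargetCreator to four made-up anchors and two ground-truth boxes; label_anchors is a hypothetical helper, and the random subsampling to 256 anchors is omitted.

```python
import numpy as np


def label_anchors(ious, pos_iou_thresh=0.7, neg_iou_thresh=0.3):
    """Label anchors from an (n_anchor, n_gt) IoU matrix: 1 fg, 0 bg, -1 ignored."""
    label = np.full(ious.shape[0], -1, dtype=np.int32)
    max_ious = ious.max(axis=1)
    label[max_ious < neg_iou_thresh] = 0        # negatives first
    gt_max = ious.max(axis=0)                   # best IoU achieved for each gt box
    label[np.where(ious == gt_max)[0]] = 1      # best anchor(s) of every gt box
    label[max_ious >= pos_iou_thresh] = 1       # anchors above the positive threshold
    return label


ious = np.array([
    [0.80, 0.10],  # above the positive threshold
    [0.50, 0.20],  # between the thresholds, ignored
    [0.10, 0.25],  # below 0.3, but the best match for the second gt box
    [0.05, 0.00],  # background
])
label = label_anchors(ious)
print(label)
```

The third anchor shows why negatives are assigned first: it would be background by threshold alone, but it is promoted to foreground so that every ground-truth box has at least one positive anchor.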

The AnchorTargetCreator class in model/utils/creator_tool.py is shown below.

class AnchorTargetCreator(object):

    def __init__(self,
                 n_sample=256,
                 pos_iou_thresh=0.7, neg_iou_thresh=0.3,
                 pos_ratio=0.5):  # arguments: number of sampled anchors, foreground IoU threshold, background IoU threshold, foreground ratio
        self.n_sample = n_sample  # number of anchors to sample
        self.pos_iou_thresh = pos_iou_thresh  # IoU threshold for foreground
        self.neg_iou_thresh = neg_iou_thresh  # IoU threshold for background
        self.pos_ratio = pos_ratio  # fraction of foreground anchors

    def __call__(self, bbox, anchor, img_size):  # arguments: ground-truth boxes, anchors, image size

        img_H, img_W = img_size  # image height and width

        n_anchor = len(anchor)  # total number of anchors
        inside_index = _get_inside_index(anchor, img_H, img_W)  # indices of anchors that lie inside the image
        anchor = anchor[inside_index]  # keep only those anchors
        argmax_ious, label = self._create_label(
            inside_index, anchor, bbox)  # create the labels

        # compute bounding box regression targets
        loc = bbox2loc(anchor, bbox[argmax_ious])  # offsets and scales between each anchor and its ground-truth box

        # map up to original set of anchors
        label = _unmap(label, n_anchor, inside_index, fill=-1)  # map the labels back to the full anchor set
        loc = _unmap(loc, n_anchor, inside_index, fill=0)  # map the locations back to the full anchor set

        return loc, label  # regression targets and labels

    def _create_label(self, inside_index, anchor, bbox):
        # label: 1 is positive, 0 is negative, -1 is dont care
        label = np.empty((len(inside_index),), dtype=np.int32)  # uninitialized label array
        label.fill(-1)  # start with every anchor ignored

        argmax_ious, max_ious, gt_argmax_ious = \
            self._calc_ious(anchor, bbox, inside_index)  # compute the IoUs

        # assign negative labels first so that positive labels can clobber them
        label[max_ious < self.neg_iou_thresh] = 0  # anchors below the background threshold become background

        # positive label: for each gt, anchor with highest iou
        label[gt_argmax_ious] = 1  # the best anchor for each ground-truth box becomes foreground

        # positive label: above threshold IOU
        label[max_ious >= self.pos_iou_thresh] = 1  # anchors at or above the foreground threshold become foreground

        # subsample positive labels if we have too many
        n_pos = int(self.pos_ratio * self.n_sample)  # maximum number of foreground anchors
        pos_index = np.where(label == 1)[0]  # indices of foreground anchors
        if len(pos_index) > n_pos:  # too many foreground anchors
            disable_index = np.random.choice(
                pos_index, size=(len(pos_index) - n_pos), replace=False)  # randomly pick the surplus
            label[disable_index] = -1  # and mark it as ignored

        # subsample negative labels if we have too many
        n_neg = self.n_sample - np.sum(label == 1)  # number of background anchors needed
        neg_index = np.where(label == 0)[0]  # indices of background anchors
        if len(neg_index) > n_neg:  # too many background anchors
            disable_index = np.random.choice(
                neg_index, size=(len(neg_index) - n_neg), replace=False)  # randomly pick the surplus
            label[disable_index] = -1  # and mark it as ignored

        return argmax_ious, label  # best ground-truth index per anchor, and the labels

    def _calc_ious(self, anchor, bbox, inside_index):
        # ious between the anchors and the gt boxes
        ious = bbox_iou(anchor, bbox)  # IoU between every anchor and every ground-truth box
        argmax_ious = ious.argmax(axis=1)  # index of the best ground-truth box for each anchor
        max_ious = ious[np.arange(len(inside_index)), argmax_ious]  # highest IoU of each anchor
        gt_argmax_ious = ious.argmax(axis=0)  # index of the best anchor for each ground-truth box
        gt_max_ious = ious[gt_argmax_ious, np.arange(ious.shape[1])]  # highest IoU of each ground-truth box
        gt_argmax_ious = np.where(ious == gt_max_ious)[0]  # all anchors that tie for the highest IoU with some ground-truth box

        return argmax_ious, max_ious, gt_argmax_ious  # best gt index per anchor, best IoU per anchor, best anchors per gt box
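The class calls two helpers, _get_inside_index and _unmap, that are not listed in this post. The sketch below shows what they plausibly do, judging only from how they are called above: filter anchors that fit inside the image, then scatter per-anchor results back to the full anchor set. The function bodies and the toy anchors are assumptions, and the names drop the leading underscore to mark them as illustrative.

```python
import numpy as np


def get_inside_index(anchor, img_h, img_w):
    """Indices of anchors (y1, x1, y2, x2) that lie fully inside the image."""
    return np.where(
        (anchor[:, 0] >= 0) & (anchor[:, 1] >= 0) &
        (anchor[:, 2] <= img_h) & (anchor[:, 3] <= img_w)
    )[0]


def unmap(data, count, index, fill=0):
    """Scatter data (one row per kept anchor) into an array with count rows."""
    out = np.full((count,) + data.shape[1:], fill, dtype=data.dtype)
    out[index] = data
    return out


anchor = np.array([[-8, -8, 8, 8],      # crosses the top-left border
                   [0, 0, 16, 16],      # inside
                   [10, 10, 40, 40]],   # crosses the bottom-right border
                  dtype=np.float32)
inside = get_inside_index(anchor, img_h=32, img_w=32)

label = unmap(np.array([1], dtype=np.int32), len(anchor), inside, fill=-1)
loc = unmap(np.ones((1, 4), dtype=np.float32), len(anchor), inside, fill=0)
print(inside, label, loc)
```

Anchors that cross the image border therefore come back with label -1, so they are ignored by the RPN classification loss.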

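(5) Run the Faster R-CNN head network (the RoI network) forward on sample_roi; it returns roi_cls_loc and roi_score

# sample_roi: the RoIs kept after IoU-threshold sampling, a mix of foreground and background RoIs, shape (128, 4)
# sample_roi_index: an all-zero array holding the image index of each RoI (the batch size is 1)
# roi_cls_loc: box offsets predicted by the head for each sample_roi and each of the 21 classes, shape (128, 21*4); 128 is the number of RoIs sampled by ProposalTargetCreator
# roi_score: scores predicted by the head for each sample_roi and each of the 21 classes, shape (128, 21)
roi_cls_loc, roi_score = self.faster_rcnn.head(
    features,
    sample_roi,
    sample_roi_index)  # forward pass through the Faster R-CNN head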

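(6) Compute the RPN losses

RPN localization loss: a smooth L1 loss between the offsets predicted for the anchors and the target offsets to the ground-truth boxes

# ------------------ RPN losses -------------------#
# rpn_loc: predicted location offsets for every anchor, shape (hh*ww*9, 4)
# gt_rpn_loc: offsets and scales between each anchor and its assigned ground-truth box, shape (hh*ww*9, 4)
# gt_rpn_label: label of each anchor (1 foreground, 0 background, -1 ignored), shape (hh*ww*9,)
rpn_loc_loss = _fast_rcnn_loc_loss(
    rpn_loc,
    gt_rpn_loc,
    gt_rpn_label.data,
    self.rpn_sigma)  # smooth L1 loss between predicted and target offsets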
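The body of _fast_rcnn_loc_loss is not shown in this post. The NumPy sketch below is one plausible reading of a sigma-parameterized smooth L1 localization loss: only rows with a positive label contribute, and the sum is divided by the number of non-ignored rows. The function names, the value sigma=3, the guard against an empty batch and the toy arrays are all assumptions.

```python
import numpy as np


def smooth_l1(x, t, in_weight, sigma):
    """Sum of smooth L1 terms; quadratic below 1/sigma^2, linear above."""
    sigma2 = sigma ** 2
    diff = in_weight * (x - t)
    abs_diff = np.abs(diff)
    quadratic = abs_diff < 1.0 / sigma2
    y = np.where(quadratic, 0.5 * sigma2 * diff ** 2, abs_diff - 0.5 / sigma2)
    return y.sum()


def loc_loss(pred_loc, gt_loc, gt_label, sigma):
    """Localization loss over positive rows, averaged over non-ignored rows."""
    in_weight = np.zeros_like(gt_loc)
    in_weight[gt_label > 0] = 1.0              # only positives contribute
    total = smooth_l1(pred_loc, gt_loc, in_weight, sigma)
    return total / max((gt_label >= 0).sum(), 1)  # guard is an addition of this sketch


pred = np.array([[0.1, 0, 0, 0],
                 [1.0, 0, 0, 0],
                 [2.0, 2, 2, 2],
                 [5.0, 5, 5, 5]])
target = np.zeros_like(pred)
labels = np.array([1, 1, 0, -1])  # two positives, one negative, one ignored
loss = loc_loss(pred, target, labels, sigma=3.0)
print(loss)
```

The background and ignored rows have large errors here, yet they add nothing to the loss, because a regression target is meaningful only for anchors matched to an object.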

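RPN classification loss

# rpn_score: predicted background/foreground scores for every anchor, shape (hh*ww*9, 2)
rpn_cls_loss = F.cross_entropy(rpn_score, gt_rpn_label.cuda(), ignore_index=-1)  # RPN classification loss; anchors labeled -1 are ignored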
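The effect of ignore_index=-1 is that anchors labeled -1 are dropped before averaging. The sketch below reproduces that behaviour in NumPy on three made-up anchors; cross_entropy_ignore is a hypothetical helper, not a PyTorch function.

```python
import numpy as np


def cross_entropy_ignore(scores, labels, ignore_index=-1):
    """Mean softmax cross-entropy over the rows whose label is not ignore_index."""
    keep = labels != ignore_index
    s = scores[keep]
    s = s - s.max(axis=1, keepdims=True)  # numerical stability
    log_prob = s - np.log(np.exp(s).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(keep.sum()), labels[keep]].mean()


scores = np.array([[2.0, 0.0],    # background anchor, predicted background
                   [0.0, 2.0],    # foreground anchor, predicted foreground
                   [9.0, -9.0]])  # ignored anchor
labels = np.array([0, 1, -1])
loss = cross_entropy_ignore(scores, labels)

# Changing the scores of the ignored anchor does not change the loss
scores_changed = scores.copy()
scores_changed[2] = [-9.0, 9.0]
loss_changed = cross_entropy_ignore(scores_changed, labels)
print(loss, loss_changed)
```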

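(7) Compute the ROI losses (fast rcnn loss)

ROI localization loss

# ------------------ ROI losses (fast rcnn loss) -------------------#
# roi_cls_loc: box offsets predicted by the head for each sample_roi and each of the 21 classes, viewed as (128, 21, 4); 128 is the number of RoIs sampled by ProposalTargetCreator
# roi_score: scores predicted by the head for each sample_roi and each of the 21 classes, shape (128, 21)
# gt_roi_loc: offsets and scales between each sample_roi and its ground-truth box (the one with the highest IoU), shape (128, 4)
# gt_roi_label: ground-truth label of each sample_roi (the label of the ground-truth box with the highest IoU), shape (128,)
# roi_loc: the (128, 4) slice of roi_cls_loc that belongs to each RoI's ground-truth class (the selection code is not part of this excerpt)
roi_loc_loss = _fast_rcnn_loc_loss(
    roi_loc.contiguous(),
    gt_roi_loc,
    gt_roi_label.data,
    self.roi_sigma)  # RoI localization loss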
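The excerpt uses roi_loc without showing where it comes from. Given the shapes in the comments, it must hold one 4-vector per RoI picked out of the 21 per-class predictions. The sketch below shows one way to do that selection with NumPy fancy indexing; the toy sizes, the random values and the labels are assumptions.

```python
import numpy as np

n_sample, n_class = 4, 21  # 4 RoIs instead of 128, to keep the example small
rng = np.random.default_rng(0)
roi_cls_loc = rng.normal(size=(n_sample, n_class * 4))  # stand-in for the head output
gt_roi_label = np.array([7, 0, 15, 0])                   # 0 is background

# View the flat (n_sample, 84) output as (n_sample, 21, 4) ...
per_class = roi_cls_loc.reshape(n_sample, n_class, 4)
# ... and pick, for every RoI, the offsets predicted for its ground-truth class
roi_loc = per_class[np.arange(n_sample), gt_roi_label]
print(roi_loc.shape)
```

Only the prediction for the ground-truth class is compared with gt_roi_loc; the offsets predicted for the other 20 classes receive no gradient from that RoI.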

ROI classification loss
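The post gives no code for this step. By analogy with the RPN classification loss, it is presumably a cross-entropy between roi_score and gt_roi_label over the 21 classes, this time without an ignored label because every sampled RoI is either foreground or background. The NumPy sketch below illustrates that assumption; softmax_cross_entropy and the toy inputs are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(scores, labels):
    """Mean softmax cross-entropy over all rows."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()


gt_roi_label = np.array([7, 0, 15])  # made-up labels; 0 is background

# With uniform scores every class has probability 1/21
uniform_loss = softmax_cross_entropy(np.zeros((3, 21)), gt_roi_label)

# Raising the score of the correct class lowers the loss
roi_score = np.zeros((3, 21))
roi_score[np.arange(3), gt_roi_label] = 5.0
confident_loss = softmax_cross_entropy(roi_score, gt_roi_label)
print(uniform_loss, confident_loss)
```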

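(8) Compute the total loss

# total loss = RPN localization loss + RPN classification loss + RoI localization loss + RoI classification loss
losses = [rpn_loc_loss, rpn_cls_loss, roi_loc_loss, roi_cls_loss]  # collect the four losses in a list
losses = losses + [sum(losses)]  # append their sum as the total loss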
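train_step later reads losses.total_loss, so the five-element list has to reach it as an object with named fields. One way to do that, shown here as an assumption about how forward finishes, is to wrap the list in a namedtuple. The field names other than total_loss and the placeholder values are illustrative.

```python
from collections import namedtuple

LossTuple = namedtuple('LossTuple', ['rpn_loc_loss',
                                     'rpn_cls_loss',
                                     'roi_loc_loss',
                                     'roi_cls_loss',
                                     'total_loss'])

losses = [0.10, 0.20, 0.30, 0.40]  # placeholder values instead of tensors
losses = losses + [sum(losses)]    # append the total, as in the listing above
result = LossTuple(*losses)
print(result.total_loss)
```

Named fields also make it easy to log each component separately, which fits the per-loss meters that the trainer updates.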

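(9) Once the forward pass has returned the total loss, run back-propagation and update the parameters

def train_step(self, imgs, bboxes, labels, scale):  # arguments: images, ground-truth boxes, labels, scale factor
    self.optimizer.zero_grad()  # clear the gradients held by the optimizer
    losses = self.forward(imgs, bboxes, labels, scale)  # forward pass; example shapes: imgs == tensor(1,3,800,600), bboxes == tensor(1,1,4), labels == tensor(1,1), scale == (1,)
    losses.total_loss.backward()  # back-propagate the total loss
    self.optimizer.step()  # one optimization step (parameter update)
    self.update_meters(losses)  # update the loss meters
    return losses  # return the losses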