


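I based this on a YoloV1 loss function implemented by a Japanese developer. The implementation is elegant: wherever possible, the inner for loops are replaced with tensor operations. I tidied up that code a little so it is easier to follow, and this post walks through it.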


\begin{equation}
\begin{aligned}
\text{loss} &= \lambda_{\textbf{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left( x_i - \hat{x}_i \right)^2 + \left( y_i - \hat{y}_i \right)^2 \right] \\
&+ \lambda_{\textbf{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left( \sqrt{w_i} - \sqrt{\hat{w}_i} \right)^2 + \left( \sqrt{h_i} - \sqrt{\hat{h}_i} \right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\textbf{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \textbf{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
\end{equation}

The expression above is the YoloV1 loss function. It is made up of three major parts:

  • Geometric loss of the bounding boxes:
    • Center position: $\lambda_{\textbf{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left( x_i - \hat{x}_i \right)^2 + \left( y_i - \hat{y}_i \right)^2 \right]$
    • Width and height: $\lambda_{\textbf{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left( \sqrt{w_i} - \sqrt{\hat{w}_i} \right)^2 + \left( \sqrt{h_i} - \sqrt{\hat{h}_i} \right)^2 \right]$
  • Confidence loss of the bounding boxes:
    • Containing an object: $\sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2$
    • Not containing an object: $\lambda_{\textbf{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2$
  • Classification loss of the grid cells: $\sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \textbf{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2$

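As the list shows, the loss has three parts: two of them relate to bounding boxes and one relates to grid cells. Why split it this finely? Mostly because of how the code is implemented. I also have some doubts about parts of this implementation, which I raise later.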


from torch.nn import Module
class Yolov1Loss(Module):
	def __init__(self, num_grids, num_bboxes, num_classes, lambda_coord, lambda_noobj):
		super(Yolov1Loss, self).__init__()
		self.S = num_grids
		self.B = num_bboxes
		self.C = num_classes
		self.lambda_coord = lambda_coord
		self.lambda_noobj = lambda_noobj
		self.N = 5 * num_bboxes + num_classes
  • num_grids: the image is divided into an $S \times S$ grid
  • num_bboxes: each grid cell uses $B$ bounding boxes for prediction
  • num_classes: the number of object classes to predict
  • lambda_coord: $\lambda_{\textbf{coord}}$ in the formula
  • lambda_noobj: $\lambda_{\textbf{noobj}}$ in the formula
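
To make the role of `self.N` concrete, here is a minimal sketch of how one cell's output vector can be sliced. It assumes S=7, B=2, C=20 (the values used at the end of this post) and the `[x, y, w, h, conf] x B` followed by class scores ordering that the rest of the code relies on. The helper name `split_cell_vector` is made up for illustration and is not part of the original implementation.

```python
import torch

S, B, C = 7, 2, 20
N = 5 * B + C  # 30 values per grid cell


def split_cell_vector(tensor, num_bboxes=B):
    """Split [..., 5*B + C] into boxes [..., B, 5] and class scores [..., C]."""
    boxes = tensor[..., :5 * num_bboxes]
    boxes = boxes.reshape(*tensor.shape[:-1], num_bboxes, 5)
    classes = tensor[..., 5 * num_bboxes:]
    return boxes, classes


pred = torch.rand(4, S, S, N)  # a fake batch of 4 predictions
boxes, classes = split_cell_vector(pred)
print(boxes.shape)    # torch.Size([4, 7, 7, 2, 5])
print(classes.shape)  # torch.Size([4, 7, 7, 20])
```

With this layout, the conf of the first box sits at index 4 and the conf of the second box at index 9, which is why those two indices keep showing up in the code.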



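Suppose a grid cell is known to contain an object. The cell has several predicted bounding boxes, but only one of them is responsible for the object. We compute the IOU between each predicted box and the ground truth box, and the box with the highest IOU is the one responsible for the object.

This IOU computation is done with tensor operations.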

import torch

def iou_compute(bbox1, bbox2):
    # bbox1: [N, 4], bbox2: [M, 4], both in (x1, y1, x2, y2) form
    N, M = bbox1.size(0), bbox2.size(0)

    left_top = torch.max(
        bbox1[:, :2].unsqueeze(1).expand(N, M, 2),  # [N, 2] -> [N, 1, 2] -> [N, M, 2]
        bbox2[:, :2].unsqueeze(0).expand(N, M, 2),  # [M, 2] -> [1, M, 2] -> [N, M, 2]
    )
    right_bottom = torch.min(
        bbox1[:, 2:].unsqueeze(1).expand(N, M, 2),  # [N, 2] -> [N, 1, 2] -> [N, M, 2]
        bbox2[:, 2:].unsqueeze(0).expand(N, M, 2),  # [M, 2] -> [1, M, 2] -> [N, M, 2]
    )

    # Compute area of the intersections from the coordinates
    wh = (right_bottom - left_top).clamp(min=0)  # width and height of the intersection, clipped at 0, [N, M, 2]
    inter = wh[:, :, 0] * wh[:, :, 1]  # [N, M]

    # Compute area of the bboxes
    area1 = (bbox1[:, 2] - bbox1[:, 0]) * (bbox1[:, 3] - bbox1[:, 1])  # [N, ]
    area2 = (bbox2[:, 2] - bbox2[:, 0]) * (bbox2[:, 3] - bbox2[:, 1])  # [M, ]
    area1 = area1.unsqueeze(1).expand_as(inter)  # [N, ] -> [N, 1] -> [N, M]
    area2 = area2.unsqueeze(0).expand_as(inter)  # [M, ] -> [1, M] -> [N, M]

    # Compute IoU from the areas
    union = area1 + area2 - inter  # [N, M]
    iou = inter / union            # [N, M]
    return iou
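
To check the broadcasting logic by hand, here is a small self-contained sketch with boxes whose IOU is easy to compute on paper. The function `pairwise_iou` is my own compact rewrite for illustration (it relies on implicit broadcasting instead of `expand`); the box values are made up.

```python
import torch


def pairwise_iou(boxes_a, boxes_b):
    """IOU between every box in boxes_a [N, 4] and boxes_b [M, 4], (x1, y1, x2, y2)."""
    # Broadcasting [N, 1, 2] against [1, M, 2] yields [N, M, 2]
    left_top = torch.max(boxes_a[:, None, :2], boxes_b[None, :, :2])
    right_bottom = torch.min(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = (right_bottom - left_top).clamp(min=0)  # no overlap -> 0
    inter = wh[..., 0] * wh[..., 1]              # [N, M]
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / union


preds = torch.tensor([[0.0, 0.0, 2.0, 2.0],   # overlaps the target in a 1x1 square
                      [1.0, 1.0, 3.0, 3.0],   # identical to the target
                      [5.0, 5.0, 6.0, 6.0]])  # disjoint from the target
target = torch.tensor([[1.0, 1.0, 3.0, 3.0]])

iou = pairwise_iou(preds, target)  # shape [3, 1]; values are 1/7, 1 and 0
print(iou)
```

For the first box the intersection is 1 and the union is 4 + 4 - 1 = 7, so the IOU is 1/7.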


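As we know, Yolo annotations use the format: label center_x center_y w h

Apart from label, all the position parameters are normalized by the image width and height, so before computing the IOU we have to de-normalize the box parameters. In the code below this amounts to dividing the center by S (which suggests the center is stored relative to its grid cell) and converting (center_x, center_y, w, h) into corner form (x1, y1, x2, y2).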

def denormalize(self, xywh):
    # Convert (cx, cy, w, h, ...) to (x1, y1, x2, y2); the center is rescaled by 1 / S
    xyxy = torch.zeros_like(xywh[:, :4])  # same dtype and device as the input
    xyxy[:, :2] = xywh[:, :2] / float(self.S) - 0.5 * xywh[:, 2:4]
    xyxy[:, 2:4] = xywh[:, :2] / float(self.S) + 0.5 * xywh[:, 2:4]
    return xyxy
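
As a sanity check on the conversion, here is a standalone sketch of the same arithmetic. `xywh_to_xyxy` is a hypothetical free function that mirrors the method above, S=7 is assumed, and the box values are made up.

```python
import torch

S = 7  # grid size, as used in this post


def xywh_to_xyxy(xywh, num_grids=S):
    """(cx, cy, w, h, ...) -> (x1, y1, x2, y2), with the center rescaled by 1 / S."""
    center = xywh[:, :2] / float(num_grids)
    half = 0.5 * xywh[:, 2:4]
    return torch.cat([center - half, center + half], dim=1)


box = torch.tensor([[0.5, 0.5, 0.2, 0.4, 1.0]])  # (cx, cy, w, h, conf)
xyxy = xywh_to_xyxy(box)

# The corners must reproduce the original width and height
width = xyxy[:, 2] - xyxy[:, 0]
height = xyxy[:, 3] - xyxy[:, 1]
print(xyxy, width, height)
```

Note that the cell's own offset inside the image is not added back. The predicted boxes and the ground truth box of one cell share that offset, so leaving it out should not change their IOU.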

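Computing the indicators: $\mathbb{1}_{i}^{\text{obj}}$, $\mathbb{1}_{ij}^{\text{obj}}$, $\mathbb{1}_{ij}^{\text{noobj}}$

All three are really just flags:

  • $\mathbb{1}_{i}^{\text{obj}}$: 1 if grid cell i contains an object, 0 otherwise.
  • $\mathbb{1}_{ij}^{\text{obj}}$: 1 if bounding box j of grid cell i is responsible for the object, 0 otherwise. In the confidence loss for boxes that contain an object, the value used for the responsible box is its IOU rather than 1.
  • $\mathbb{1}_{ij}^{\text{noobj}}$: 1 if bounding box j of grid cell i is not responsible for an object, 0 if it is.

So in effect there are 4 parameters, because $\mathbb{1}_{ij}^{\text{obj}}$ is used in two different ways.

In the actual code almost everything is done with tensor operations, so the functions below use mask tensors as equivalents of these indicators: after a mask is applied, only the entries where the indicator is 1 remain.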

Getting $\mathbb{1}_{i}^{\text{obj}}$

This indicator is needed for the classification loss, and its meaning is simple: an image has $S \times S$ grid cells, and if cell i contains an object then $\mathbb{1}_{i}^{\text{obj}}=1$, otherwise it is 0.

def get_lambda_i_obj(self, pred_tensor, target_tensor):
    coord_mask = target_tensor[..., 4] > 0  # [batchsize, S, S], bool
    coord_mask = coord_mask.unsqueeze(-1).expand_as(target_tensor)  # add a trailing dim and expand to the tensor's shape

    # At this point we know which cells contain an object (I_i^{obj}),
    # but not yet which bbox in each cell is responsible for it
    coord_pred = pred_tensor[coord_mask].view(-1, self.N)
    coord_target = target_tensor[coord_mask].view(-1, self.N)
    return coord_pred, coord_target


  • coord_pred: the predicted values for the cells that contain an object
  • coord_target: the ground truth values for the cells that contain an object
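
The masking step is easier to see on a tiny tensor. The sketch below assumes a 2x2 grid with B=2 and C=3 (values chosen only to keep the tensors small) and marks two cells as containing an object.

```python
import torch

S, B, C = 2, 2, 3
N = 5 * B + C

target = torch.zeros(1, S, S, N)
target[0, 0, 1, 4] = 1.0  # cell (row 0, col 1) contains an object
target[0, 1, 0, 4] = 1.0  # cell (row 1, col 0) contains an object
pred = torch.rand(1, S, S, N)

cell_mask = target[..., 4] > 0                          # [1, S, S]
full_mask = cell_mask.unsqueeze(-1).expand_as(target)   # [1, S, S, N]
coord_pred = pred[full_mask].view(-1, N)                # [2, N], one row per object cell

print(coord_pred.shape)  # torch.Size([2, 13])
```

Indexing with the un-expanded mask, `pred[cell_mask]`, returns the same `[2, N]` rows directly, so the `expand_as` plus `view` pair is a matter of style.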

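I did have a question here: the line coord_mask = target_tensor[..., 4] > 0 only checks whether the conf of the first bounding box is greater than 0 and never looks at the conf of the remaining boxes, which puzzled me. Can anyone explain?

Update 2023/08/16: After thinking it over, target_tensor is simply the ground truth box data. Each object cell has only one ground truth box, and when it is passed to the loss function for comparison with the predicted boxes it is converted into a $7\times7\times30$ tensor. During that conversion, to keep the dimensions consistent, the single ground truth box is copied B times (matching the number of predicted boxes). In other words, in the last dimension of size 30, entries 0~4 and 5~9 are identical (for $B=2$). Writing coord_mask = target_tensor[..., 9] > 0 would work just as well.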
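
The data loading code is not shown in this post, so the following is only a sketch of how such a target tensor could be built under the assumptions described in the update: boxes arrive as (cx, cy, w, h) normalized to [0, 1], the center is stored relative to its cell, conf is set to 1, and the same box is written B times. The name `encode_target` and every detail of it are hypothetical.

```python
import torch


def encode_target(boxes, labels, S=7, B=2, C=20):
    """boxes: [n, 4] as (cx, cy, w, h) normalized to [0, 1]; labels: [n] class ids."""
    target = torch.zeros(S, S, 5 * B + C)
    for (cx, cy, w, h), label in zip(boxes.tolist(), labels.tolist()):
        col = min(int(cx * S), S - 1)  # grid column that holds the center
        row = min(int(cy * S), S - 1)  # grid row that holds the center
        # Center expressed relative to the top-left corner of its cell
        x_cell, y_cell = cx * S - col, cy * S - row
        for b in range(B):  # the same ground truth box is copied B times
            target[row, col, 5 * b: 5 * b + 5] = torch.tensor([x_cell, y_cell, w, h, 1.0])
        target[row, col, 5 * B + label] = 1.0  # one-hot class
    return target


boxes = torch.tensor([[0.5, 0.5, 0.2, 0.4]])
labels = torch.tensor([3])
target = encode_target(boxes, labels)

# Entries 0~4 and 5~9 are identical, so checking index 4 or index 9 is equivalent
print(torch.equal(target[..., 0:5], target[..., 5:10]))  # True
```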


Getting $\mathbb{1}_{ij}^{\text{obj}}$

In the loss formula above, $\mathbb{1}_{ij}^{\text{obj}}$ is the indicator that appears most often. In practice it corresponds to the conf value, and computing it is the core of the loss function.


Otherwise we want the confidence score to equal the intersection over union (IOU) between the predicted box and the ground truth.

  • In the geometric loss of the bounding boxes: if grid cell i contains an object and bounding box j is responsible for it, $\mathbb{1}_{ij}^{\text{obj}} = 1$; otherwise $\mathbb{1}_{ij}^{\text{obj}} = 0$.
  • In the confidence loss of the bounding boxes (containing an object): if grid cell i contains an object and bounding box j is responsible for it, $\mathbb{1}_{ij}^{\text{obj}} = \text{IOU(bbox, ground truth box)}$; otherwise $\mathbb{1}_{ij}^{\text{obj}} = 0$ as before. The difference is that for a responsible box the value is the actual IOU between the two boxes rather than 1.
def get_lambda_ij_obj(self, bbox_pred, bbox_target):
    # Use the IOU to decide which bbox in each cell is responsible for the object
    device = bbox_target.device
    # buffers
    bbox_with_obj_mask = torch.zeros(bbox_target.size(0), dtype=torch.bool, device=device)
    bbox_without_obj_mask = torch.ones(bbox_target.size(0), dtype=torch.bool, device=device)
    bbox_target_iou = torch.zeros_like(bbox_target)
    # Walk through the bboxes one cell (B rows) at a time
    for i in range(0, bbox_target.size(0), self.B):
        # Convert predictions and ground truth to (x1, y1, x2, y2)
        pred_xyxy = self.denormalize(bbox_pred[i: i + self.B])
        target_xyxy = self.denormalize(bbox_target[i].view(-1, 5))
        # max iou (ground truth box and bbox)
        iou = iou_compute(pred_xyxy, target_xyxy)  # [B, 1]
        max_iou, max_index = iou.max(0)
        max_index = max_index.to(device)
        bbox_with_obj_mask[i + max_index] = True  # this is 1_{ij}^{obj}
        bbox_without_obj_mask[i + max_index] = False  # does not seem to be used
        bbox_target_iou[i + max_index, 4] = max_iou.detach().to(device)  # fill only the conf slot
    return bbox_with_obj_mask, bbox_target_iou


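  • bbox_with_obj_mask: a mask. Applying it to the set of bounding boxes keeps only the boxes that are responsible for an object, which is the equivalent of the first kind of $\mathbb{1}_{ij}^{\text{obj}}$.
  • bbox_target_iou: has shape [n_coord x B, 5], where the 5th column is conf. If a bounding box is responsible for an object, that entry is set to the actual IOU. It is later indexed with bbox_with_obj_mask so that only the responsible boxes remain. This is the second kind of $\mathbb{1}_{ij}^{\text{obj}}$ described above.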
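
Here is the same selection for a single cell, stripped of the batching, as a rough sketch. The helper `iou_one_to_many` and the box values are made up for illustration; boxes are assumed to be in (x1, y1, x2, y2) form already.

```python
import torch


def iou_one_to_many(boxes, target):
    """IOU of each row in boxes [B, 4] with one target box [4], (x1, y1, x2, y2)."""
    left_top = torch.max(boxes[:, :2], target[:2])
    right_bottom = torch.min(boxes[:, 2:], target[2:])
    wh = (right_bottom - left_top).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_target = (target[2] - target[0]) * (target[3] - target[1])
    return inter / (area_boxes + area_target - inter)


pred_boxes = torch.tensor([[0.0, 0.0, 2.0, 2.0],   # IOU 1/7 with the ground truth
                           [1.0, 1.0, 3.0, 4.0]])  # IOU 2/3 with the ground truth
gt_box = torch.tensor([1.0, 1.0, 3.0, 3.0])

iou = iou_one_to_many(pred_boxes, gt_box)      # one IOU per predicted box
responsible = torch.zeros(2, dtype=torch.bool)
responsible[iou.argmax()] = True               # plays the role of 1_ij^obj
conf_target = torch.zeros(2)
conf_target[responsible] = iou[responsible]    # the conf target is the IOU, not 1

print(responsible)  # tensor([False,  True])
print(conf_target)
```

The second box wins, so the geometric loss and the object confidence loss for this cell would be computed on that box only.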

Getting $\mathbb{1}_{ij}^{\text{noobj}}$

def get_lambda_ij_noobj(self, pred_tensor, target_tensor):

    noobj_mask = target_tensor[..., 4] == 0  # mask=[batchsize, S, S], bool
    noobj_mask = noobj_mask.unsqueeze(-1).expand_as(target_tensor)  # mask=[batchsize, S, S, N]

    # Cells that contain no object
    noobj_pred = pred_tensor[noobj_mask].view(-1, self.N)
    noobj_target = target_tensor[noobj_mask].view(-1, self.N)
    noobj_conf_mask = torch.zeros(noobj_pred.size(), dtype=torch.bool, device=noobj_pred.device)

    for b in range(self.B):
        noobj_conf_mask[:, 4 + b * 5] = True  # select the conf slot of every bbox in these cells
    noobj_pred_conf = noobj_pred[noobj_conf_mask]
    noobj_target_conf = noobj_target[noobj_conf_mask]
    return noobj_pred_conf, noobj_target_conf
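
The loop that builds `noobj_conf_mask` only picks out columns 4, 9, ... of each row. As a small sketch (with a random stand-in for `noobj_pred` and B=2, C=20 assumed), a strided slice selects the same values:

```python
import torch

B, C = 2, 20
N = 5 * B + C
rows = torch.rand(6, N)  # stand-in for noobj_pred: 6 cells without an object

# Mask-based selection, as in the function above
conf_mask = torch.zeros(rows.size(), dtype=torch.bool)
for b in range(B):
    conf_mask[:, 4 + b * 5] = True
by_mask = rows[conf_mask]        # flattened, shape [6 * B]

# Strided slice: columns 4, 9, ... below 5 * B
by_slice = rows[:, 4:5 * B:5]    # shape [6, B]

print(torch.equal(by_mask, by_slice.reshape(-1)))  # True
```

Both forms feed the same numbers into the summed MSE, so the choice between them does not affect the loss.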


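Bounding boxes fall into 3 types:

  • The cell contains an object, and the box is responsible for it
  • The cell contains an object, but the box is not responsible for it
  • The cell contains no object, so none of its boxes can be responsible for one

The function is supposed to compute $\mathbb{1}_{ij}^{\text{noobj}}$, yet the code seems to handle only the third type, that is, the boxes inside cells that contain no object. If that is the intent, wouldn't it be enough to write the indicator as $\mathbb{1}_{i}^{\text{noobj}}$? Like $\mathbb{1}_{i}^{\text{obj}}$ above, it is decided purely at the cell level, and boxes of the second type (the cell contains an object but the box is not responsible for it) are left out. Since the formula writes $\mathbb{1}_{ij}^{\text{noobj}}$, I would expect it to range over both cells and boxes.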
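
To make the difference concrete, here is a toy comparison of the two readings. This is only a sketch of one possible interpretation of the formula, not something taken from the original implementation; the confidence values and the `responsible` mask are made up, and the noobj confidence target is taken to be 0.

```python
import torch

# Hypothetical toy setup: 3 cells, B = 2 boxes per cell
pred_conf = torch.tensor([[0.9, 0.3],   # cell 0: has an object, box 0 responsible
                          [0.2, 0.1],   # cell 1: empty
                          [0.4, 0.8]])  # cell 2: has an object, box 1 responsible
responsible = torch.tensor([[True, False],
                            [False, False],
                            [False, True]])

# Cell-level reading (what the code in this post does): only empty cells count
cell_has_obj = responsible.any(dim=1)
noobj_cell_level = (~cell_has_obj).unsqueeze(1).expand_as(responsible)

# Box-level reading: every box that is not responsible counts
noobj_box_level = ~responsible

# Squared error against a confidence target of 0
loss_cell_level = (pred_conf[noobj_cell_level] ** 2).sum()  # 0.2^2 + 0.1^2 = 0.05
loss_box_level = (pred_conf[noobj_box_level] ** 2).sum()    # adds 0.3^2 and 0.4^2 -> 0.30

print(loss_cell_level, loss_box_level)
```

Under the box-level reading, the non-responsible boxes in cells 0 and 2 are also pushed toward a confidence of 0, which the cell-level code never does.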



def forward(self, pred_tensor, target_tensor):

    batch_size = pred_tensor.size(0)  # pred_tensor = [batchsize, S, S, N=Bx5+C]

    # Cells that contain an object: 1_i^{obj}
    coord_pred, coord_target = self.get_lambda_i_obj(pred_tensor, target_tensor)

    bbox_pred = coord_pred[:, :5 * self.B].contiguous().view(-1, 5)  # bboxes of the object cells, [n_coord x B, 5=(x, y, w, h, conf)]
    bbox_target = coord_target[:, :5 * self.B].contiguous().view(-1, 5)

    # Responsible bboxes: 1_{ij}^{obj}
    coord_response_mask, bbox_target_iou = self.get_lambda_ij_obj(bbox_pred, bbox_target)

    bbox_pred_response = bbox_pred[coord_response_mask].view(-1, 5)
    bbox_target_response = bbox_target[coord_response_mask].view(-1, 5)
    target_iou = bbox_target_iou[coord_response_mask].view(-1, 5)

    noobj_pred_conf, noobj_target_conf = self.get_lambda_ij_noobj(pred_tensor, target_tensor)

    # Width and height are compared through their square roots, as in the formula
    loss_wh = functional.mse_loss(torch.sqrt(bbox_pred_response[:, 2:4]),
                                  torch.sqrt(bbox_target_response[:, 2:4]), reduction='sum')
    loss_xy = functional.mse_loss(bbox_pred_response[:, :2], bbox_target_response[:, :2], reduction='sum')
    loss_obj = functional.mse_loss(bbox_pred_response[:, 4], target_iou[:, 4], reduction='sum')
    loss_noobj = functional.mse_loss(noobj_pred_conf, noobj_target_conf, reduction='sum')
    loss_class = functional.mse_loss(coord_pred[:, 5*self.B:], coord_target[:, 5*self.B:], reduction='sum')

    total_loss = self.lambda_coord * (loss_xy + loss_wh) + loss_obj + self.lambda_noobj * loss_noobj + loss_class
    total_loss = total_loss / float(batch_size)
    return total_loss
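
The square roots in the width and height term are there so that the same absolute error costs more on a small box than on a large one. A quick numeric sketch (the box sizes are made up, and `wh_error` is a throwaway helper, not part of the loss class):

```python
import torch


def wh_error(pred, target, use_sqrt):
    """Summed squared error on (w, h), optionally on their square roots."""
    if use_sqrt:
        pred, target = torch.sqrt(pred), torch.sqrt(target)
    return ((pred - target) ** 2).sum()


# Both predictions are off by 0.05 in width and height
small_pred, small_gt = torch.tensor([0.15, 0.15]), torch.tensor([0.10, 0.10])
large_pred, large_gt = torch.tensor([0.85, 0.85]), torch.tensor([0.80, 0.80])

plain_small = wh_error(small_pred, small_gt, use_sqrt=False)
plain_large = wh_error(large_pred, large_gt, use_sqrt=False)
sqrt_small = wh_error(small_pred, small_gt, use_sqrt=True)
sqrt_large = wh_error(large_pred, large_gt, use_sqrt=True)

print(plain_small, plain_large)  # the two errors are essentially identical
print(sqrt_small, sqrt_large)    # the small box is penalized several times more
```

Keep in mind that `torch.sqrt` returns NaN for negative inputs, so the network's w and h outputs need to stay non-negative (for example through a sigmoid on the output layer) for this term to be usable.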


import torch
from torch.nn import Module, functional
from torch.autograd import Variable

def iou_compute(bbox_1, bbox_2):
    N, M = bbox_1.size(0), bbox_2.size(0)  # [N, 4=(x1, y1, x2, y2)], [M, 4=(x1, y1, x2, y2)]

    left_top = torch.max(bbox_1[:, :2].unsqueeze(1).expand(N, M, 2),  # [N, 2] -> [N, 1, 2] -> [N, M, 2]
                         bbox_2[:, :2].unsqueeze(0).expand(N, M, 2))  # [M, 2] -> [1, M, 2] -> [N, M, 2]
    right_bottom = torch.min(bbox_1[:, 2:].unsqueeze(1).expand(N, M, 2),
                             bbox_2[:, 2:].unsqueeze(0).expand(N, M, 2))

    # w, h < 0 means the boxes do not intersect, so clip at 0
    wh = (right_bottom - left_top).clamp(min=0)  # [N, M, 2]

    # Areas: w * h
    inter = wh[:, :, 0] * wh[:, :, 1]  # [N, M]
    area_1 = (bbox_1[:, 2] - bbox_1[:, 0]) * (bbox_1[:, 3] - bbox_1[:, 1])
    area_2 = (bbox_2[:, 2] - bbox_2[:, 0]) * (bbox_2[:, 3] - bbox_2[:, 1])
    area_1 = area_1.unsqueeze(1).expand_as(inter)
    area_2 = area_2.unsqueeze(0).expand_as(inter)

    return inter / (area_1 + area_2 - inter)  # [N, M], iou

class Yolov1Loss(Module):
    def __init__(self, num_grids, num_bboxes, num_classes, lambda_coord, lambda_noobj):
        super(Yolov1Loss, self).__init__()

        self.S = num_grids
        self.B = num_bboxes
        self.C = num_classes
        self.lambda_coord = lambda_coord
        self.lambda_noobj = lambda_noobj
        self.N = 5 * num_bboxes + num_classes  # [x, y, w, h, conf] x num_bbox + num_class

    def forward(self, pred_tensor, target_tensor):

        batch_size = pred_tensor.size(0)  # pred_tensor = [batchsize, S, S, N=Bx5+C]

        # Cells that contain an object: 1_i^{obj}
        coord_pred, coord_target = self.get_lambda_i_obj(pred_tensor, target_tensor)

        bbox_pred = coord_pred[:, :5 * self.B].contiguous().view(-1, 5)  # bboxes of the object cells, [n_coord x B, 5=(x, y, w, h, conf)]
        bbox_target = coord_target[:, :5 * self.B].contiguous().view(-1, 5)

        # Responsible bboxes: 1_{ij}^{obj}
        coord_response_mask, bbox_target_iou = self.get_lambda_ij_obj(bbox_pred, bbox_target)
        bbox_pred_response = bbox_pred[coord_response_mask].view(-1, 5)
        bbox_target_response = bbox_target[coord_response_mask].view(-1, 5)
        target_iou = bbox_target_iou[coord_response_mask].view(-1, 5)

        noobj_pred_conf, noobj_target_conf = self.get_lambda_ij_noobj(pred_tensor, target_tensor)

        loss_wh = functional.mse_loss(torch.sqrt(bbox_pred_response[:, 2:4]),
                                      torch.sqrt(bbox_target_response[:, 2:4]), reduction='sum')
        loss_xy = functional.mse_loss(bbox_pred_response[:, :2],
                                      bbox_target_response[:, :2], reduction='sum')
        loss_obj = functional.mse_loss(bbox_pred_response[:, 4], target_iou[:, 4], reduction='sum')
        loss_noobj = functional.mse_loss(noobj_pred_conf, noobj_target_conf, reduction='sum')
        loss_class = functional.mse_loss(coord_pred[:, 5 * self.B:],
                                         coord_target[:, 5 * self.B:], reduction='sum')

        total_loss = (self.lambda_coord * (loss_xy + loss_wh) + loss_obj
                      + self.lambda_noobj * loss_noobj + loss_class)
        total_loss = total_loss / float(batch_size)

        return total_loss

    def get_lambda_ij_noobj(self, pred_tensor, target_tensor):
        noobj_mask = target_tensor[..., 4] == 0  # mask=[batchsize, S, S], bool
        noobj_mask = noobj_mask.unsqueeze(-1).expand_as(target_tensor)  # mask=[batchsize, S, S, N]

        # Cells that contain no object
        noobj_pred = pred_tensor[noobj_mask].view(-1, self.N)
        noobj_target = target_tensor[noobj_mask].view(-1, self.N)

        noobj_conf_mask = torch.zeros(noobj_pred.size(), dtype=torch.bool, device=noobj_pred.device)
        for b in range(self.B):
            noobj_conf_mask[:, 4 + b * 5] = True  # select the conf slot of every bbox in these cells
        noobj_pred_conf, noobj_target_conf = noobj_pred[noobj_conf_mask], noobj_target[noobj_conf_mask]

        return noobj_pred_conf, noobj_target_conf

    def get_lambda_i_obj(self, pred_tensor, target_tensor):
        # With 2 bounding boxes this checks the conf of the first bbox only;
        # the result drops the last dim: [batchsize, S, S]
        coord_mask = target_tensor[..., 4] > 0
        coord_mask = coord_mask.unsqueeze(-1).expand_as(target_tensor)  # add a trailing dim and expand to the tensor's shape

        # At this point we know which cells contain an object (I_i^{obj}),
        # but not yet which bbox in each cell is responsible for it
        coord_pred = pred_tensor[coord_mask].view(-1, self.N)
        coord_target = target_tensor[coord_mask].view(-1, self.N)
        return coord_pred, coord_target

    def get_lambda_ij_obj(self, bbox_pred, bbox_target):
        # Use the IOU to decide which bbox in each cell is responsible for the object
        device = bbox_target.device

        # buffers
        bbox_with_obj_mask = torch.zeros(bbox_target.size(0), dtype=torch.bool, device=device)
        bbox_without_obj_mask = torch.ones(bbox_target.size(0), dtype=torch.bool, device=device)
        bbox_target_iou = torch.zeros_like(bbox_target)
        # Walk through the bboxes one cell (B rows) at a time
        for i in range(0, bbox_target.size(0), self.B):
            # Convert predictions and ground truth to (x1, y1, x2, y2)
            pred_xyxy = self.denormalize(bbox_pred[i: i + self.B])
            target_xyxy = self.denormalize(bbox_target[i].view(-1, 5))
            # max iou (ground truth box and bbox)
            iou = iou_compute(pred_xyxy, target_xyxy)  # [B, 1]
            max_iou, max_index = iou.max(0)

            max_index = max_index.to(device)
            bbox_with_obj_mask[i + max_index] = True  # this is 1_{ij}^{obj}
            bbox_without_obj_mask[i + max_index] = False  # does not seem to be used

            bbox_target_iou[i + max_index, 4] = max_iou.detach().to(device)  # fill only the conf slot

        return bbox_with_obj_mask, bbox_target_iou

    def denormalize(self, xywh):
        # Convert xywh to xyxy; the center is rescaled by 1 / S in the process
        xyxy = torch.zeros_like(xywh[:, :4])  # same dtype and device as the input
        xyxy[:, :2] = xywh[:, :2] / float(self.S) - 0.5 * xywh[:, 2:4]
        xyxy[:, 2:4] = xywh[:, :2] / float(self.S) + 0.5 * xywh[:, 2:4]
        return xyxy

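if __name__ == '__main__':
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    loss = Yolov1Loss(7, 2, 20, 5.0, 0.5)  # lambda_coord=5, lambda_noobj=0.5 as in the YOLOv1 paper
    # torch.rand keeps w and h non-negative so that torch.sqrt stays finite
    a = torch.rand(32, 7, 7, 30, device=device)
    b = torch.rand(32, 7, 7, 30, device=device)
    b[:, :3] = 0  # make the first three grid rows empty so both obj and noobj cells exist
    j = loss(a, b)
    print(j)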

