[YoloV1] A loss function implementation that stays as close to the formula as possible, with a walkthrough (PyTorch)

Jiangnan_Cai

Last modified: 2023-08-17 11:34:25

First published: 2023-08-11 14:50:48

Article link: https://blog.csdn.net/Jiangnan_Cai/article/details/132192813

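I have recently been reproducing the YOLO family of networks from scratch.

Table of contents

Mathematical formula
Constructor
Helper functions
- IOU computation
- Denormalization
Computing the indicators: $\mathbb{1}_{i}^{\text{obj}}$, $\mathbb{1}_{ij}^{\text{obj}}$, $\mathbb{1}_{ij}^{\text{noobj}}$
Forward pass: computing the loss
Full code

I am starting with YoloV1.

My reference is a YoloV1 loss function written by a Japanese developer. It is a very elegant implementation: the inner for loops are replaced by matrix operations wherever possible. I have tidied that code up a little so that it is easier to follow, and this post walks through it.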

Mathematical formula

$\begin{aligned} \text{loss} &= \lambda_{\textbf{coord}}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\text{obj}} \left[ \left( x_i - \hat{x}_i \right)^2 + \left( y_i - \hat{y}_i \right)^2 \right] \\ &+ \lambda_{\textbf{coord}}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\text{obj}} \left[ \left( \sqrt{w_i} - \sqrt{\hat{w}_i} \right)^2 + \left( \sqrt{h_i} - \sqrt{\hat{h}_i} \right)^2 \right] \\ &+ \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\ &+ \lambda_{\textbf{noobj}}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\ &+ \sum_{i=0}^{S^2}\mathbb{1}_{i}^{\text{obj}} \sum_{c\in \textbf{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2 \end{aligned}$

The expression above is the YoloV1 loss function. It is made up of three main parts:

Geometric loss of the bounding box:
- Center position: $\lambda_{\textbf{coord}}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\text{obj}} \left[ \left( x_i - \hat{x}_i \right)^2 + \left( y_i - \hat{y}_i \right)^2 \right]$
- Width and height: $\lambda_{\textbf{coord}}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\text{obj}} \left[ \left( \sqrt{w_i} - \sqrt{\hat{w}_i} \right)^2 + \left( \sqrt{h_i} - \sqrt{\hat{h}_i} \right)^2 \right]$
Confidence loss of the bounding box:
- Object present: $\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2$
- No object: $\lambda_{\textbf{noobj}}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2$
Classification loss of the grid cell: $\sum_{i=0}^{S^2}\mathbb{1}_{i}^{\text{obj}} \sum_{c\in \textbf{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2$

So the loss has three parts: two of them (geometry and confidence) are computed per bounding box, and one (classification) is computed per grid cell. Why split it so finely? Mainly because of how the code is implemented. I also have a few doubts about this implementation, which I raise later.
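
One detail of the width and height term can be checked numerically: because the formula compares square roots, the same absolute error in w costs more for a small box than for a large one. The snippet below is only an illustration in plain Python; the helper name `sqrt_sq_err` and the sample values are made up for this example.

```python
import math

def sqrt_sq_err(w_true, w_pred):
    # Squared error measured in square-root space, as in the w/h term of the loss
    return (math.sqrt(w_true) - math.sqrt(w_pred)) ** 2

# The same absolute error (0.05) on a small box and on a large box
small = sqrt_sq_err(0.10, 0.15)
large = sqrt_sq_err(0.80, 0.85)

print(small > large)  # True: the small box is penalized more
```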

Constructor

from torch.nn import Module
class Yolov1Loss(Module):
	def __init__(self, num_grids, num_bboxes, num_classes, lambda_coord, lambda_noobj):
		super(Yolov1Loss, self).__init__()
		self.S = num_grids
		self.B = num_bboxes
		self.C = num_classes
		self.lambda_coord = lambda_coord
		self.lambda_noobj = lambda_noobj
		self.N = 5 * num_bboxes + num_classes

num_grids: the image is divided into an $S \times S$ grid
num_bboxes: each grid cell uses $B$ bounding boxes for prediction
num_classes: the number of object classes to predict
lambda_coord: $\lambda_{\textbf{coord}}$ in the formula
lambda_noobj: $\lambda_{\textbf{noobj}}$ in the formula
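
To make `self.N` concrete, here is a small sketch of how the last dimension is laid out, assuming the common setting S = 7, B = 2, C = 20 (the same values used in the test at the end of this post). The names `conf_slots` and `class_slots` are only for illustration.

```python
S, B, C = 7, 2, 20   # grid size, boxes per cell, number of classes
N = 5 * B + C        # length of the last dimension of pred/target

# Each bbox occupies 5 slots: x, y, w, h, conf
conf_slots = [4 + 5 * b for b in range(B)]   # index of conf for each bbox
class_slots = list(range(5 * B, N))          # indices of the class probabilities

print(N, conf_slots, class_slots[0], class_slots[-1])  # 30 [4, 9] 10 29
```

This is why the code later indexes `[..., 4]` for conf and slices `[:, 5 * self.B:]` for the class scores.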

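Helper functions

IOU computation

When a grid cell is known to contain an object, the cell has several bounding boxes for prediction, but only one of them is responsible for the object. We compute the IOU between each bounding box and the ground truth box, and the box with the largest IOU is the one responsible for the object.

The IOU computation uses matrix operations, so all N x M box pairs are handled at once.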

import torch

def iou_compute(bbox1, bbox2):
    # bbox1: [N, 4], bbox2: [M, 4], both in (x1, y1, x2, y2) format
    N, M = bbox1.size(0), bbox2.size(0)

    left_top = torch.max(
        bbox1[:, :2].unsqueeze(1).expand(N, M, 2), # [N, 2] -> [N, 1, 2] -> [N, M, 2]
        bbox2[:, :2].unsqueeze(0).expand(N, M, 2)  # [M, 2] -> [1, M, 2] -> [N, M, 2]
    )

    right_bottom = torch.min(
        bbox1[:, 2:].unsqueeze(1).expand(N, M, 2), # [N, 2] -> [N, 1, 2] -> [N, M, 2]
        bbox2[:, 2:].unsqueeze(0).expand(N, M, 2)  # [M, 2] -> [1, M, 2] -> [N, M, 2]
    )
    # Compute area of the intersections from the coordinates
    wh = right_bottom - left_top   # width and height of the intersection, [N, M, 2]
    wh[wh < 0] = 0 # clip at 0: a negative width or height means no overlap
    inter = wh[:, :, 0] * wh[:, :, 1] # [N, M]

    # Compute area of the bboxes
    area1 = (bbox1[:, 2] - bbox1[:, 0]) * (bbox1[:, 3] - bbox1[:, 1]) # [N, ]
    area2 = (bbox2[:, 2] - bbox2[:, 0]) * (bbox2[:, 3] - bbox2[:, 1]) # [M, ]
    area1 = area1.unsqueeze(1).expand_as(inter) # [N, ] -> [N, 1] -> [N, M]
    area2 = area2.unsqueeze(0).expand_as(inter) # [M, ] -> [1, M] -> [N, M]

    # Compute IoU from the areas
    union = area1 + area2 - inter # [N, M]
    iou = inter / union           # [N, M]
    return iou
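
To check the arithmetic by hand, the same computation can be written for a single pair of boxes in plain Python. This is only a sketch for intuition; the helper `iou_xyxy` and the sample boxes are made up here, and the nested list plays the role of the [N, M] result.

```python
def iou_xyxy(a, b):
    # a, b: boxes in (x1, y1, x2, y2) format
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width, clipped at 0
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height, clipped at 0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

preds = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 2.0, 3.0), (5.0, 5.0, 6.0, 6.0)]  # N = 3
truths = [(1.0, 1.0, 2.0, 2.0)]                                              # M = 1

iou = [[iou_xyxy(p, t) for t in truths] for p in preds]  # shape [N, M]
print(iou)  # [[0.25], [0.5], [0.0]]
```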

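Denormalization

As we all know, the Yolo annotation format is: label center_x center_y w h

Apart from label, all the position parameters are normalized by the image width and height, so before computing the IOU we need to denormalize the bounding box parameters. In the code below this means dividing the center (x, y) by S and converting the box from (x, y, w, h) to corner format (x1, y1, x2, y2).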

def denormalize(self, xywh):
    # Convert rows of (x, y, w, h, ...) to (x1, y1, x2, y2); x and y are divided by S first
    xyxy = torch.zeros_like(xywh)  # same dtype and device as the input
    xyxy[:, :2] = xywh[:, :2] / float(self.S) - 0.5 * xywh[:, 2:4]
    xyxy[:, 2:4] = xywh[:, :2] / float(self.S) + 0.5 * xywh[:, 2:4]
    return xyxy[:, :4]
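
The conversion is easy to verify on one box. The sketch below repeats the same arithmetic in plain Python; the function name `to_xyxy` and the sample values are assumptions for illustration only.

```python
def to_xyxy(x, y, w, h, S):
    # Divide the center by S, then move half a width/height to each side
    cx, cy = x / S, y / S
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

box = to_xyxy(3.5, 3.5, 0.25, 0.5, S=7)
print(box)  # (0.375, 0.25, 0.625, 0.75)
```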

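Computing the indicators: $\mathbb{1}_{i}^{\text{obj}}$, $\mathbb{1}_{ij}^{\text{obj}}$, $\mathbb{1}_{ij}^{\text{noobj}}$

All three are really just flags:

$\mathbb{1}_{i}^{\text{obj}}$: 1 if grid cell i contains an object, 0 otherwise.
$\mathbb{1}_{ij}^{\text{obj}}$: 1 if bounding box j of grid cell i is responsible for the object, 0 otherwise. When computing the confidence loss for boxes that contain an object, the value used for a responsible box is the IOU rather than 1.
$\mathbb{1}_{ij}^{\text{noobj}}$: 1 if bounding box j of grid cell i is not responsible for an object, 0 if it is.

So in effect there are four parameters.

In the actual code, since almost everything is done with matrix operations, the code below replaces these indicators with equivalent mask matrices. After a mask is applied, only the entries where the indicator is 1 remain.

Getting $\mathbb{1}_{i}^{\text{obj}}$

This indicator is needed for the classification loss, and its meaning is simple: an image has $S \times S$ grid cells, and if grid cell i contains an object then $\mathbb{1}_{i}^{\text{obj}}=1$, otherwise it is 0.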

def get_lambda_i_obj(self, pred_tensor, target_tensor):
    coord_mask = target_tensor[..., 4] > 0  # [batchsize, S, S], True where the cell contains an object
    coord_mask = coord_mask.unsqueeze(-1).expand_as(target_tensor)  # add a last dimension and expand to the tensor's shape

    # We now know which cells contain an object (I_i^{obj}), but not yet which bbox is responsible
    coord_pred = pred_tensor[coord_mask].view(-1, self.N)
    coord_target = target_tensor[coord_mask].view(-1, self.N)

    return coord_pred, coord_target

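The two return values:

coord_pred: the cells of the prediction that contain an object
coord_target: the cells of the ground truth that contain an object

I had a question here: the line coord_mask = target_tensor[..., 4] > 0 only checks whether the conf of the first bounding box is greater than 0 and never checks the conf of the later boxes, which puzzled me. Can anyone explain?

Update 2023/08/16: After thinking it over, target_tensor is simply the ground truth box data. There is only one ground truth box, and when it is passed to the loss function to be compared with the predicted boxes it is converted into a $7\times7\times30$ tensor. To keep the dimensions consistent, that single ground truth box is copied B times (matching the number of predicted boxes), so in the last dimension of size 30, entries 0~4 and 5~9 are identical (for $B = 2$). Writing coord_mask = target_tensor[..., 9] > 0 would work just as well.

Any cell that contains an object is kept. With this result, the classification loss is easy to compute.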
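
The effect of the mask can be seen on a toy target written with plain lists. This is a minimal sketch, assuming a 2 x 2 grid with B = 1 and C = 2 (so N = 7); the values are made up, and the list comprehensions stand in for the tensor indexing in get_lambda_i_obj.

```python
# Each cell holds [x, y, w, h, conf, p0, p1]
target = [
    [[0.5, 0.5, 0.2, 0.2, 1.0, 1.0, 0.0], [0.0] * 7],
    [[0.0] * 7, [0.3, 0.6, 0.4, 0.1, 1.0, 0.0, 1.0]],
]

# Counterpart of `target_tensor[..., 4] > 0`: one boolean per cell
cell_has_obj = [[cell[4] > 0 for cell in row] for row in target]

# Counterpart of `target_tensor[coord_mask].view(-1, N)`: keep only the object cells
coord_target = [cell for row in target for cell in row if cell[4] > 0]

print(cell_has_obj)       # [[True, False], [False, True]]
print(len(coord_target))  # 2
```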

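Getting $\mathbb{1}_{ij}^{\text{obj}}$

From the loss formula above we can see that $\mathbb{1}_{ij}^{\text{obj}}$ is the indicator that appears most often. It is closely tied to the conf value, and computing it is the key step of the loss function.

Although it appears in three places, there is a subtle difference between them, as the paper says:

Otherwise we want the confidence score to equal the intersection over union (IOU) between the predicted box and the ground truth.

In the geometric loss of the bounding box: if grid cell i contains an object and bounding box j is responsible for it, $\mathbb{1}_{ij}^{\text{obj}} = 1$, otherwise $\mathbb{1}_{ij}^{\text{obj}} = 0$
In the confidence loss of the bounding box (object present): if grid cell i contains an object and bounding box j is responsible for it, $\mathbb{1}_{ij}^{\text{obj}} = \text{IOU(bbox, ground truth box)}$, otherwise $\mathbb{1}_{ij}^{\text{obj}} = 0$. The difference is that for a responsible box the value is the actual IOU between the two boxes rather than 1. Strictly speaking, in the paper the indicator itself is still 1 and it is the target confidence $C_i$ that equals the IOU, which is what the code below does by writing the IOU into the conf slot of the target.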

def get_lambda_ij_obj(self, bbox_pred, bbox_target):
    # Use the IOU to decide which bbox of each object cell is responsible for the object
    device = bbox_target.device

    # buffer
    bbox_with_obj_mask = torch.zeros(bbox_target.size(0), dtype=torch.bool, device=device)
    bbox_without_obj_mask = torch.ones(bbox_target.size(0), dtype=torch.bool, device=device)
    bbox_target_iou = torch.zeros(bbox_target.size(), device=device)

    # Step through the cells; each cell owns B consecutive rows
    for i in range(0, bbox_target.size(0), self.B):

        # Convert predictions and ground truth from xywh to xyxy (undoing the normalization)
        pred_xyxy = self.denormalize(bbox_pred[i: i + self.B])
        target_xyxy = self.denormalize(bbox_target[i].view(-1, 5))

        # max iou (ground truth box and bbox)
        iou = iou_compute(pred_xyxy, target_xyxy)  # [B, 1]
        max_iou, max_index = iou.max(0)

        max_index = max_index.to(device)
        bbox_with_obj_mask[i + max_index] = True  # this is 1_{ij}^{obj}
        bbox_without_obj_mask[i + max_index] = False  # apparently never used
        # Fill only the conf slot; detach so the target carries no gradient
        bbox_target_iou[i + max_index, 4] = max_iou.detach().to(device)

    return bbox_with_obj_mask, bbox_target_iou

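The two return values above:

bbox_with_obj_mask: a mask. Applying it to the set of bounding boxes keeps only the boxes responsible for an object, which is the equivalent of the first kind of $\mathbb{1}_{ij}^{\text{obj}}$.
bbox_target_iou: shape [n_coord x B, 5], where the 5th column (index 4) is conf. If a bounding box is responsible for an object, that slot is set to the actual IOU value. It is later indexed with bbox_with_obj_mask so that only the responsible boxes remain. This is the second kind of $\mathbb{1}_{ij}^{\text{obj}}$ described above.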
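
The selection logic can be traced with a few numbers. The sketch below assumes two object cells with B = 2 and uses hypothetical IOU values; it reproduces, in plain Python, what the loop in get_lambda_ij_obj writes into the mask and into the conf slot of the target.

```python
B = 2
# Hypothetical IOU of each predicted bbox with the ground truth box of its cell
ious_per_cell = [[0.30, 0.65], [0.50, 0.10]]

bbox_with_obj_mask = []   # one flag per bbox, flattened over cells
bbox_target_conf = []     # target conf per bbox (the IOU for the responsible one)
for ious in ious_per_cell:
    best = max(range(B), key=lambda j: ious[j])  # index of the bbox with the largest IOU
    for j in range(B):
        responsible = (j == best)
        bbox_with_obj_mask.append(responsible)
        bbox_target_conf.append(ious[j] if responsible else 0.0)

print(bbox_with_obj_mask)  # [False, True, True, False]
print(bbox_target_conf)    # [0.0, 0.65, 0.5, 0.0]
```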

Getting $\mathbb{1}_{ij}^{\text{noobj}}$

def get_lambda_ij_noobj(self, pred_tensor, target_tensor):

    noobj_mask = target_tensor[..., 4] == 0  # mask=[batchsize, S, S], bool
    noobj_mask = noobj_mask.unsqueeze(-1).expand_as(target_tensor)  # mask=[batchsize, S, S, N]

    # Cells that contain no object
    noobj_pred = pred_tensor[noobj_mask].view(-1, self.N)
    noobj_target = target_tensor[noobj_mask].view(-1, self.N)
    noobj_conf_mask = torch.zeros(noobj_pred.size(), dtype=torch.bool, device=noobj_pred.device)

    for b in range(self.B):
        noobj_conf_mask[:, 4 + b * 5] = True  # select the conf slot of every bbox in these cells

    noobj_pred_conf = noobj_pred[noobj_conf_mask]
    noobj_target_conf = noobj_target[noobj_conf_mask]

    return noobj_pred_conf, noobj_target_conf

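The code itself is not hard to follow, but I do have some doubts about it.

We know there are 3 kinds of bounding boxes:

The cell contains an object, and the box is responsible for it
The cell contains an object, but the box is not responsible for it
The cell contains no object, so no box in it can be responsible for anything

The function is supposed to compute $\mathbb{1}_{ij}^{\text{noobj}}$, yet the code seems to consider only the third kind, that is, the boxes inside cells without an object. If that is the intent, the indicator could simply be written as $\mathbb{1}_{i}^{\text{noobj}}$, since, like $\mathbb{1}_{i}^{\text{obj}}$ above, it only makes a decision at the cell level. Boxes of the second kind (the cell contains an object but the box is not responsible for it) are not considered. Since the formula writes $\mathbb{1}_{ij}^{\text{noobj}}$, I would expect it to range over both cells and boxes.

I have also seen it said elsewhere that the loss for boxes of the second kind is already accounted for in the object-present confidence loss. I do not quite see how, so any explanation would be welcome.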
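
To make the question concrete, the sketch below labels a few boxes with the three kinds listed above and compares two selections for the no-object confidence term: the cell-level one that get_lambda_ij_noobj makes, and a per-box reading in which every non-responsible box counts. The per-box reading is only my interpretation of the subscript ij, not something confirmed by the reference code; the helper `box_kind` and the sample flags are made up.

```python
def box_kind(cell_has_obj, is_responsible):
    # 1: object cell, responsible box; 2: object cell, non-responsible box; 3: empty cell
    if cell_has_obj and is_responsible:
        return 1
    if cell_has_obj:
        return 2
    return 3

# (cell_has_obj, is_responsible) for two cells with B = 2 boxes each
boxes = [(True, True), (True, False), (False, False), (False, False)]
kinds = [box_kind(c, r) for c, r in boxes]

noobj_in_code = [k == 3 for k in kinds]       # what the code above selects
noobj_per_box = [k in (2, 3) for k in kinds]  # per-box reading of the indicator

print(kinds)          # [1, 2, 3, 3]
print(noobj_in_code)  # [False, False, True, True]
print(noobj_per_box)  # [False, True, True, True]
```

The two selections differ exactly on the second kind of box, which is the case the question is about.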

Forward pass: computing the loss

def forward(self, pred_tensor, target_tensor):

    batch_size = pred_tensor.size(0)  # pred_tensor = [batchsize, S, S, N=Bx5+C]

    coord_pred, coord_target = self.get_lambda_i_obj(pred_tensor, target_tensor)

    bbox_pred = coord_pred[:, :5 * self.B].contiguous().view(-1, 5)  # bboxes of the object cells, [n_coord x B, 5=(x, y, w, h, conf)]
    bbox_target = coord_target[:, :5 * self.B].contiguous().view(-1, 5)

    coord_response_mask, bbox_target_iou = self.get_lambda_ij_obj(bbox_pred, bbox_target)

    bbox_pred_response = bbox_pred[coord_response_mask].view(-1, 5)
    bbox_target_response = bbox_target[coord_response_mask].view(-1, 5)
    target_iou = bbox_target_iou[coord_response_mask].view(-1, 5)

    noobj_pred_conf, noobj_target_conf = self.get_lambda_ij_noobj(pred_tensor, target_tensor)

    # The w/h term compares square roots, as in the formula
    loss_wh = functional.mse_loss(torch.sqrt(bbox_pred_response[:, 2:4]), torch.sqrt(bbox_target_response[:, 2:4]), reduction='sum')
    loss_xy = functional.mse_loss(bbox_pred_response[:, :2], bbox_target_response[:, :2], reduction='sum')
    loss_obj = functional.mse_loss(bbox_pred_response[:, 4], target_iou[:, 4], reduction='sum')
    loss_noobj = functional.mse_loss(noobj_pred_conf, noobj_target_conf, reduction='sum')
    loss_class = functional.mse_loss(coord_pred[:, 5*self.B:], coord_target[:, 5*self.B:], reduction='sum')

    # Attribute names match the constructor shown earlier
    total_loss = self.lambda_coord * (loss_xy + loss_wh) + loss_obj + self.lambda_noobj * loss_noobj + loss_class
    total_loss = total_loss / float(batch_size)
    return total_loss
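
How the five terms combine can be checked with small numbers. The sketch below is plain Python with made-up values; `sse` is a hypothetical stand-in for `mse_loss(..., reduction='sum')` on flat lists, and the weights 5 and 0.5 are the values given in the YoloV1 paper.

```python
def sse(pred, target):
    # Sum of squared errors, the counterpart of mse_loss with reduction='sum'
    return sum((p - t) ** 2 for p, t in zip(pred, target))

lambda_coord, lambda_noobj = 5.0, 0.5
batch_size = 2

loss_xy = sse([0.5, 0.5], [1.0, 0.0])      # 0.5
loss_wh = sse([0.5, 1.0], [1.0, 1.0])      # 0.25, inputs are already square roots
loss_obj = sse([0.5], [1.0])               # 0.25, target is the IOU
loss_noobj = sse([0.5, 0.5], [0.0, 0.0])   # 0.5
loss_class = sse([1.0, 0.0], [0.0, 1.0])   # 2.0

total = lambda_coord * (loss_xy + loss_wh) + loss_obj + lambda_noobj * loss_noobj + loss_class
total = total / batch_size
print(total)  # 3.125
```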

Full code

import torch
from torch.nn import Module, functional
from torch.autograd import Variable


def iou_compute(bbox_1, bbox_2):
    N, M = bbox_1.size(0), bbox_2.size(0)  # [N, 4=(x1, y1, x2, y2)], [M, 4=(x1, y1, x2, y2)]

    left_top = torch.max(bbox_1[:, :2].unsqueeze(1).expand(N, M, 2),  # [N, 2] -> [N, 1, 2] -> [N, M, 2]
                         bbox_2[:, :2].unsqueeze(0).expand(N, M, 2))  # [M, 2] -> [1, M, 2] -> [N, M, 2]
    right_bottom = torch.min(bbox_1[:, 2:].unsqueeze(1).expand(N, M, 2),
                             bbox_2[:, 2:].unsqueeze(0).expand(N, M, 2))

    wh = right_bottom - left_top  # [N, M, 2]
    wh[wh < 0] = 0  # a negative w or h means the boxes do not overlap, so clip to 0

    # Areas: w * h
    inter = wh[:, :, 0] * wh[:, :, 1]  # w * h
    area_1 = (bbox_1[:, 2] - bbox_1[:, 0]) * (bbox_1[:, 3] - bbox_1[:, 1])
    area_2 = (bbox_2[:, 2] - bbox_2[:, 0]) * (bbox_2[:, 3] - bbox_2[:, 1])
    area_1 = area_1.unsqueeze(1).expand_as(inter)
    area_2 = area_2.unsqueeze(0).expand_as(inter)

    return inter / (area_1 + area_2 - inter)  # [N, M], iou


class Yolov1Loss(Module):
    def __init__(self, num_grids, num_bboxes, num_classes, i_coord, i_noobj):
        super(Yolov1Loss, self).__init__()

        self.S = num_grids
        self.B = num_bboxes
        self.C = num_classes
        self.i_coord = i_coord
        self.i_noobj = i_noobj
        self.N = 5 * num_bboxes + num_classes  # [x, y, w, h, conf] x num_bbox + num_class

    def forward(self, pred_tensor, target_tensor):

        batch_size = pred_tensor.size(0)  # pred_tensor = [batchsize, S, S, N=Bx5+C]

        coord_pred, coord_target = self.get_lambda_i_obj(pred_tensor, target_tensor)

        bbox_pred = coord_pred[:, :5 * self.B].contiguous().view(-1, 5)  # bboxes of the object cells, [n_coord x B, 5=(x, y, w, h, conf)]
        bbox_target = coord_target[:, :5 * self.B].contiguous().view(-1, 5)

        coord_response_mask, bbox_target_iou = self.get_lambda_ij_obj(bbox_pred, bbox_target)
        bbox_pred_response = bbox_pred[coord_response_mask].view(-1, 5)  # keep only the responsible bboxes
        bbox_target_response = bbox_target[coord_response_mask].view(-1, 5)
        target_iou = bbox_target_iou[coord_response_mask].view(-1, 5)

        noobj_pred_conf, noobj_target_conf = self.get_lambda_ij_noobj(pred_tensor, target_tensor)

        loss_wh = functional.mse_loss(torch.sqrt(bbox_pred_response[:, 2:4]), 
                                      torch.sqrt(bbox_target_response[:, 2:4]), reduction='sum')
        loss_xy = functional.mse_loss(bbox_pred_response[:, :2],
                                      bbox_target_response[:, :2], reduction='sum')
        loss_obj = functional.mse_loss(bbox_pred_response[:, 4], target_iou[:, 4], reduction='sum')
        loss_noobj = functional.mse_loss(noobj_pred_conf, noobj_target_conf, reduction='sum')
        loss_class = functional.mse_loss(coord_pred[:, 5 * self.B:],
                                         coord_target[:, 5 * self.B:], reduction='sum')

        total_loss = self.i_coord * (loss_xy + loss_wh) + loss_obj + self.i_noobj * loss_noobj + loss_class
        total_loss = total_loss / float(batch_size)

        return total_loss

    def get_lambda_ij_noobj(self, pred_tensor, target_tensor):
        noobj_mask = target_tensor[..., 4] == 0  # mask=[batchsize, S, S], bool
        noobj_mask = noobj_mask.unsqueeze(-1).expand_as(target_tensor)  # mask=[batchsize, S, S, N]

        # Cells that contain no object
        noobj_pred, noobj_target = pred_tensor[noobj_mask].view(-1, self.N), target_tensor[noobj_mask].view(-1, self.N)

        noobj_conf_mask = torch.zeros(noobj_pred.size(), dtype=torch.bool, device=noobj_pred.device)
        for b in range(self.B):
            noobj_conf_mask[:, 4 + b * 5] = True  # select the conf slot of every bbox in these cells
        noobj_pred_conf, noobj_target_conf = noobj_pred[noobj_conf_mask], noobj_target[noobj_conf_mask]

        return noobj_pred_conf, noobj_target_conf

    def get_lambda_i_obj(self, pred_tensor, target_tensor):
        coord_mask = target_tensor[..., 4] > 0  # checks the conf of the first bbox only; drops the last dimension, [batchsize, S, S]
        coord_mask = coord_mask.unsqueeze(-1).expand_as(target_tensor)  # add a last dimension and expand to the tensor's shape

        # We now know which cells contain an object (I_i^{obj}), but not yet which bbox is responsible
        coord_pred = pred_tensor[coord_mask].view(-1, self.N)
        coord_target = target_tensor[coord_mask].view(-1, self.N)
        return coord_pred, coord_target

    def get_lambda_ij_obj(self, bbox_pred, bbox_target):
        # Use the IOU to decide which bbox of each object cell is responsible for the object
        device = bbox_target.device

        # buffer
        bbox_with_obj_mask = torch.zeros(bbox_target.size(0), dtype=torch.bool, device=device)
        bbox_without_obj_mask = torch.ones(bbox_target.size(0), dtype=torch.bool, device=device)
        bbox_target_iou = torch.zeros(bbox_target.size(), device=device)
        # Step through the cells; each cell owns B consecutive rows
        for i in range(0, bbox_target.size(0), self.B):
            # Convert predictions and ground truth from xywh to xyxy (undoing the normalization)
            pred_xyxy = self.denormalize(bbox_pred[i: i + self.B])
            target_xyxy = self.denormalize(bbox_target[i].view(-1, 5))
            # max iou (ground truth box and bbox)
            iou = iou_compute(pred_xyxy, target_xyxy)  # [B, 1]
            max_iou, max_index = iou.max(0)

            max_index = max_index.to(device)
            bbox_with_obj_mask[i + max_index] = True  # this is 1_{ij}^{obj}
            bbox_without_obj_mask[i + max_index] = False  # apparently never used

            # Fill only the conf slot; detach so the target carries no gradient
            bbox_target_iou[i + max_index, 4] = max_iou.detach().to(device)

        return bbox_with_obj_mask, bbox_target_iou

    def denormalize(self, xywh):
        # Convert xywh to xyxy format and undo the normalization (x and y are divided by S first)
        xyxy = torch.zeros_like(xywh)  # same dtype and device as the input
        xyxy[:, :2] = xywh[:, :2] / float(self.S) - 0.5 * xywh[:, 2:4]
        xyxy[:, 2:4] = xywh[:, :2] / float(self.S) + 0.5 * xywh[:, 2:4]
        return xyxy[:, :4]


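if __name__ == '__main__':
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    loss = Yolov1Loss(7, 2, 20, 5, 0.5)  # lambda_coord = 5, lambda_noobj = 0.5, as in the paper
    # rand (not randn) keeps w and h non-negative, so torch.sqrt does not return NaN
    a = torch.rand(32, 7, 7, 30, device=device)
    b = torch.rand(32, 7, 7, 30, device=device)
    j = loss(a, b)
    print(j)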