YOLOX SimOTA: Code Walkthrough with Detailed Comments

刀么克瑟拉莫

Last modified 2023-02-17 15:11:48


Columns: deeplearning, pytorch · Tags: deep learning

First published 2023-02-17 15:05:04

Article link: https://blog.csdn.net/random_repick/article/details/129087342


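This article describes a key step in an object detection algorithm: selecting candidate foreground grid cells, then using IoU and a matching cost to assign predicted boxes to each ground truth (GT) as positive samples. By computing the IoU and cost between the candidate foreground predictions and each GT, it dynamically assigns at most 10 predicted boxes to each GT, so that positive samples are chosen sensibly.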

(Summary generated automatically by CSDN.)

1. Algorithm workflow

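1. Select the candidate foreground. It is the union of two sets: grid cells whose centers lie inside a GT box, and grid cells within a fixed range around the GT center.
2. For every candidate, compute the IoU and the matching cost between its predicted box and each GT.
3. Assign each GT an appropriate number of predicted boxes as positive samples:
- For each GT, take the (at most) 10 candidate predictions with the highest IoU.
- Sum those IoUs and round down to get that GT's number of positives k (clamped to at least 1 in the YOLOX source).
- Assign the k candidate predictions with the lowest cost to that GT.
- If a prediction ends up assigned to more than one GT, keep only the GT with the lowest cost.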
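Step 3 (dynamic-k matching) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the YOLOX code: it uses nested lists instead of tensors and takes the IoU and cost matrices, already restricted to the step-1 candidates, as inputs. The function is named after YOLOX's `dynamic_k_matching`, but its signature and the `n_candidate_k` parameter here are chosen for illustration. In YOLOX the cost itself is a weighted sum of a classification cost and -log(IoU); here it is simply given.

```python
from typing import List


def dynamic_k_matching(cost: List[List[float]], ious: List[List[float]],
                       n_candidate_k: int = 10) -> List[int]:
    """Illustrative step 3 of SimOTA on [num_gt][num_candidates] matrices.

    Returns, for every candidate prediction, the index of the GT it is
    matched to, or -1 if it stays a negative sample.
    """
    num_gt = len(cost)
    num_cand = len(cost[0]) if num_gt else 0
    matched = [[False] * num_cand for _ in range(num_gt)]

    for g in range(num_gt):
        # 3a: the (at most) n_candidate_k highest IoUs for this GT
        top_ious = sorted(ious[g], reverse=True)[:n_candidate_k]
        # 3b: k = floor(sum of those IoUs), clamped to at least 1
        k = max(int(sum(top_ious)), 1)
        # 3c: the k lowest-cost candidates become positives for this GT
        for j in sorted(range(num_cand), key=lambda c: cost[g][c])[:k]:
            matched[g][j] = True

    assigned = [-1] * num_cand
    for j in range(num_cand):
        owners = [g for g in range(num_gt) if matched[g][j]]
        if len(owners) == 1:
            assigned[j] = owners[0]
        elif len(owners) > 1:
            # 3d: conflict; like YOLOX, take the argmin of this candidate's
            # cost over all GTs (normally one of the owners)
            assigned[j] = min(range(num_gt), key=lambda o: cost[o][j])
    return assigned


if __name__ == "__main__":
    ious = [[0.9, 0.8, 0.6, 0.0],
            [0.0, 0.7, 0.6, 0.5]]
    cost = [[0.5, 1.0, 3.0, 9.0],
            [9.0, 0.4, 0.8, 1.2]]
    print(dynamic_k_matching(cost, ious))  # [0, 1, -1, -1]
```

In the example, GT 0 gets k = floor(2.3) = 2 and claims candidates 0 and 1; GT 1 gets k = floor(1.8) = 1 and also claims candidate 1, which then goes to GT 1 because its cost there (0.4) is lower.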

2. Code

Selecting the candidate foreground
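As a compact reference for step 1, here is a plain-Python sketch of the candidate test that the YOLOX method `get_in_boxes_info` vectorizes with tensors. It is an illustration under assumptions, not the YOLOX code: the helper names `anchor_centers` and `candidate_masks` are hypothetical, feature maps are assumed square, only one GT is handled at a time, and `center_radius=2.5` (in units of the cell's stride) is taken as an assumed default. The two returned masks correspond to "inside the box or the center region" (the candidate foreground) and "inside both".

```python
from typing import List, Sequence, Tuple

# Hypothetical types for this sketch: (cx, cy, stride) per grid cell,
# and a GT box as (cx, cy, w, h) in input-image pixels.
Center = Tuple[float, float, int]
Box = Tuple[float, float, float, float]


def anchor_centers(feat_sizes: Sequence[int], strides: Sequence[int]) -> List[Center]:
    """Grid-cell centers in image coordinates, flattened level by level and
    row-major within a level (the same order as the 8400 YOLOX points)."""
    centers = []
    for size, stride in zip(feat_sizes, strides):
        for y in range(size):
            for x in range(size):
                # (shift + 0.5) * stride, as in x/y_centers_per_image
                centers.append(((x + 0.5) * stride, (y + 0.5) * stride, stride))
    return centers


def candidate_masks(gt: Box, centers: List[Center], center_radius: float = 2.5):
    """Return (in_box_or_center, in_box_and_center) for one GT box."""
    cx, cy, w, h = gt
    l, r, t, b = cx - 0.5 * w, cx + 0.5 * w, cy - 0.5 * h, cy + 0.5 * h
    either, both = [], []
    for px, py, stride in centers:
        # strictly inside the GT box
        in_box = l < px < r and t < py < b
        # strictly inside a square of half-width center_radius * stride
        radius = center_radius * stride
        in_center = abs(px - cx) < radius and abs(py - cy) < radius
        either.append(in_box or in_center)
        both.append(in_box and in_center)
    return either, both


if __name__ == "__main__":
    centers = anchor_centers([8, 4, 2], [8, 16, 32])  # 64x64 input, 84 points
    either, both = candidate_masks((32.0, 32.0, 10.0, 10.0), centers)
    print(len(centers), sum(either), sum(both))  # 84 36 4
```

With several GTs, the candidate foreground is the union of the per-GT `either` masks, which is what the tensor code computes in one pass by broadcasting to `[n_gt, n_anchor]`.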

    def get_in_boxes_info(
        self,
        gt_bboxes_per_image,
        expanded_strides,
        x_shifts,
        y_shifts,
        total_num_anchors,
        num_gt,
    ):
        # Shapes for a 640x640 input (80*80 + 40*40 + 20*20 = 8400 grid points):
        # expanded_strides: torch.Size([1, 8400]), the stride of each point
        #   6400 x 8, then 1600 x 16, then 400 x 32
        #   tensor([[ 8.,  8.,  8.,  ..., 32., 32., 32.]])
        # x_shifts: torch.Size([1, 8400]), column index on each point's feature map
        #   0~79 (stride 8), 0~39 (stride 16), 0~19 (stride 32)
        #   tensor([[ 0.,  1.,  2.,  ..., 17., 18., 19.]])
        # y_shifts: torch.Size([1, 8400]), row index on each point's feature map
        #   each of 0~79 repeated 80 times, 0~39 repeated 40 times, 0~19 repeated 20 times
        #   tensor([[ 0.,  0.,  0.,  ..., 19., 19., 19.]])
        # Map output grid cells back to input-image coordinates:
        # index * stride gives each cell's top-left corner in pixels
        expanded_strides_per_image = expanded_strides[0]
        x_shifts_per_image = x_shifts[0] * expanded_strides_per_image
        y_shifts_per_image = y_shifts[0] * expanded_strides_per_image
        # Cell centers: top-left corner + 0.5 * stride, repeated for every GT
        x_centers_per_image = (
            (x_shifts_per_image + 0.5 * expanded_strides_per_image)
            .unsqueeze(0)
            .repeat(num_gt, 1)
        )  # [n_anchor] -> [n_gt, n_anchor]
        y_centers_per_image = (
            (y_shifts_per_image + 0.5 * expanded_strides_per_image)
            .unsqueeze(0)
            .repeat(num_gt, 1)
        )
        # GT boxes are (cx, cy, w, h); compute their left/right/top/bottom edges,
        # each broadcast to [n_gt, n_anchor]
        gt_bboxes_per_image_l = (
            (gt_bboxes_per_image[:, 0] - 0.5 * gt_bboxes_per_image[:, 2])
            .unsqueeze(1)
            .repeat(1, total_num_anchors)
        )
        gt_bboxes_per_image_r = (
            (gt_bboxes_per_image[:, 0] + 0.5 * gt_bboxes_per_image[:, 2])
            .unsqueeze(1)
            .repeat(1, total_num_anchors)
        )
        gt_bboxes_per_image_t = (
            (gt_bboxes_per_image[:, 1] - 0.5 * gt_bboxes_per_image[:, 3])
            .unsqueeze(1)
            .repeat(1, total_num_anchors)
        )
        gt_bboxes_per_image_b = (