How the output of YOLOv5's Detect layer is turned into predicted boxes drawn on the image

葬天机

Tags: YOLO, artificial intelligence, deep learning

Article link: https://blog.csdn.net/qq_46027463/article/details/134403951

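First, in val mode the Detect layer returns a tuple of the form (tensor(bs, 3*(20*20+40*40+80*80), 5+nc), list[3]). The feature-map sizes 20, 40 and 80 correspond to a 640×640 input with strides 32, 16 and 8. The relevant line in val.py is: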

        # Inference
        out, train_out = model(im) if training else model(im, augment=augment, val=True)  # inference, loss outputs

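out receives the prediction tensor(bs, 3*(20*20+40*40+80*80), 5+nc). The factor 3*(20*20+40*40+80*80) arises because every cell of the three detection-layer feature maps (20×20, 40×40 and 80×80) predicts three boxes, one for each of that layer's three anchors of different sizes and aspect ratios. The last dimension, 5+nc, holds x, y, w, h (box center and size, in pixels of the network input), the objectness confidence conf, and nc class scores. train_out receives a list that is not relevant to this article and is not discussed.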
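To make the 3*(20*20+40*40+80*80) = 25200 rows concrete, the sketch below maps a row index of out[b] back to its detection layer, anchor and grid cell. It assumes a 640×640 input, layers concatenated in the order stride 8, 16, 32 (feature maps 80, 40, 20), three anchors per layer, and each layer's (bs, 3, ny, nx, 5+nc) map flattened anchor first, then grid row, then grid column. `rows_per_layer` and `locate` are hypothetical helpers for illustration, not YOLOv5 functions.

```python
def rows_per_layer(sizes=(80, 40, 20), na=3):
    """Number of prediction rows each detection layer contributes to out."""
    return [na * s * s for s in sizes]


def locate(row, sizes=(80, 40, 20), na=3):
    """Map a row index of out[b] to (layer, anchor, gy, gx).

    Assumes each layer's (na, s, s) grid is flattened anchor first, then
    grid row gy, then grid column gx, and that layers are concatenated in order.
    """
    for layer, s in enumerate(sizes):
        n = na * s * s
        if row < n:
            anchor, rem = divmod(row, s * s)  # which of the na anchors
            gy, gx = divmod(rem, s)           # which grid cell
            return layer, anchor, gy, gx
        row -= n  # skip past this layer's rows
    raise IndexError("row index exceeds the number of predictions")


print(sum(rows_per_layer()))  # 25200
print(locate(81))             # (0, 0, 1, 1)
print(locate(19200))          # (1, 0, 0, 0)
```

Each located row then carries [x, y, w, h, conf, cls_0, ..., cls_{nc-1}] for that anchor at that grid cell.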

        # out: list of bs tensors, one per image, each of shape (n, x1y1x2y2 + conf + cls = 6)
        out = non_max_suppression(out, conf_thres, iou_thres, labels=lb, multi_label=True, agnostic=single_cls)

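NMS is then applied to the output. The result out is a list of bs tensors, one per image, each of shape (n, 6) with rows [x1, y1, x2, y2, conf, cls]. n can differ from image to image: it is what remains of the 3*(20*20+40*40+80*80) candidates after boxes below the confidence threshold are discarded and overlapping boxes are suppressed by the IoU threshold. Because multi_label=True, one box can be kept once for every class whose score exceeds the threshold, so n counts (box, class) pairs and can be larger than the number of distinct boxes in that image.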
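The multi-label effect can be illustrated with a simplified sketch of the candidate-selection step that runs before the IoU suppression. It assumes YOLOv5's scoring rule conf = obj_conf * cls_conf and leaves out the objectness pre-filter, the IoU step and max_det; `expand_multi_label` is a hypothetical helper, not the library's non_max_suppression.

```python
import numpy as np


def xywh2xyxy(b):
    """(cx, cy, w, h) -> (x1, y1, x2, y2), row-wise."""
    out = b.copy()
    out[:, 0] = b[:, 0] - b[:, 2] / 2
    out[:, 1] = b[:, 1] - b[:, 3] / 2
    out[:, 2] = b[:, 0] + b[:, 2] / 2
    out[:, 3] = b[:, 1] + b[:, 3] / 2
    return out


def expand_multi_label(pred, conf_thres=0.25):
    """pred: (N, 5+nc) rows of [cx, cy, w, h, obj, cls_0, ...].

    Returns (k, 6) rows of [x1, y1, x2, y2, conf, cls], one row per
    (box, class) pair whose conf = obj * cls_score exceeds conf_thres.
    """
    boxes = xywh2xyxy(pred[:, :4])
    scores = pred[:, 5:] * pred[:, 4:5]     # conf = obj_conf * cls_conf
    i, j = np.nonzero(scores > conf_thres)  # box index, class index
    return np.concatenate(
        [boxes[i], scores[i, j][:, None], j[:, None].astype(float)], axis=1
    )


# one predicted box whose class-0 and class-1 scores both pass the threshold
pred = np.array([[50.0, 50.0, 20.0, 10.0, 0.9, 0.8, 0.5, 0.1]])
rows = expand_multi_label(pred)
print(rows.shape)  # (2, 6): the same box kept once for class 0 and once for class 1
```

In the real function the surviving rows then go through NMS with boxes offset by class (unless agnostic is set), so the two rows for the same box do not suppress each other.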

        # Plot images: each Thread runs plot_images in a background daemon thread so plotting does not block validation
        if plots and batch_i < 3:
            f = save_dir / f'val_batch{batch_i}_labels.jpg'  # labels
            Thread(target=plot_images, args=(im, targets, paths, f, names), daemon=True).start()
            f = save_dir / f'val_batch{batch_i}_pred.jpg'  # predictions
            Thread(target=plot_images, args=(im, output_to_target(out), paths, f, names), daemon=True).start()

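The last line starts a new thread that runs plot_images with the arguments (im, output_to_target(out), paths, f, names). out is first passed through output_to_target(), which returns a NumPy array whose rows have the form [batch index, class, x, y, w, h, conf], where x, y is the box center and w, h its width and height. For example: [0, 0, x1, y1, w1, h1, 0.5], [0, 2, x1, y1, w1, h1, 0.3], [0, 0, x2, y2, w2, h2, 0.4], ..., where x1, y1, w1, h1 describe box 1; the first two rows are the same box kept for two different classes.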
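A simplified NumPy restatement of that conversion is sketched below, assuming each element of out is an (n_i, 6) array of [x1, y1, x2, y2, conf, cls] as described above. `output_to_target_sketch` is a hypothetical name; the real YOLOv5 helper works on torch tensors.

```python
import numpy as np


def xyxy2xywh(b):
    """(x1, y1, x2, y2) -> (cx, cy, w, h), row-wise."""
    out = np.empty_like(b)
    out[:, 0] = (b[:, 0] + b[:, 2]) / 2  # center x
    out[:, 1] = (b[:, 1] + b[:, 3]) / 2  # center y
    out[:, 2] = b[:, 2] - b[:, 0]        # width
    out[:, 3] = b[:, 3] - b[:, 1]        # height
    return out


def output_to_target_sketch(output):
    """List of per-image (n_i, 6) arrays -> one (sum n_i, 7) array of
    [batch_idx, cls, cx, cy, w, h, conf]."""
    rows = []
    for i, o in enumerate(output):
        box, conf, cls = o[:, :4], o[:, 4:5], o[:, 5:6]
        batch_idx = np.full((len(o), 1), i, dtype=o.dtype)  # image index in the batch
        rows.append(np.concatenate([batch_idx, cls, xyxy2xywh(box), conf], axis=1))
    return np.concatenate(rows, axis=0) if rows else np.zeros((0, 7))


out = [
    np.array([[10.0, 20.0, 30.0, 60.0, 0.5, 0.0],   # image 0, class 0
              [0.0, 0.0, 10.0, 10.0, 0.3, 2.0]]),   # image 0, class 2
    np.array([[5.0, 5.0, 15.0, 25.0, 0.4, 1.0]]),   # image 1, class 1
]
targets = output_to_target_sketch(out)
print(targets.shape)  # (3, 7)
```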

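import math
from pathlib import Path

import cv2
import numpy as np
import torch

# Annotator and colors are defined alongside this function in YOLOv5's utils/plots.py
from utils.general import xywh2xyxy


def plot_images(images, targets, paths=None, fname='images.jpg', names=None, max_size=1920, max_subplots=16):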
    # Plot image grid with labels
    if isinstance(images, torch.Tensor):
        images = images.cpu().float().numpy()
    if isinstance(targets, torch.Tensor):
        targets = targets.cpu().numpy()
    if np.max(images[0]) <= 1:
        images *= 255  # de-normalise (optional)
    bs, _, h, w = images.shape  # batch size, _, height, width
    bs = min(bs, max_subplots)  # limit plot images
    ns = np.ceil(bs ** 0.5)  # number of subplots (square)

    # Build Image
    mosaic = np.full((int(ns * h), int(ns * w), 3), 255, dtype=np.uint8)  # init
    for i, im in enumerate(images):
        if i == max_subplots:  # if last batch has fewer images than we expect
            break
        x, y = int(w * (i // ns)), int(h * (i % ns))  # block origin
        im = im.transpose(1, 2, 0)
        mosaic[y:y + h, x:x + w, :] = im

    # Resize (optional)
    scale = max_size / ns / max(h, w)
    if scale < 1:
        h = math.ceil(scale * h)
        w = math.ceil(scale * w)
        mosaic = cv2.resize(mosaic, tuple(int(x * ns) for x in (w, h)))

    # Annotate
    fs = int((h + w) * ns * 0.01)  # font size
    annotator = Annotator(mosaic, line_width=round(fs / 10), font_size=fs, pil=True, example=names)
    for i in range(i + 1):
        x, y = int(w * (i // ns)), int(h * (i % ns))  # block origin: top-left corner of image i's tile in the batch mosaic
        annotator.rectangle([x, y, x + w, y + h], None, (255, 255, 255), width=2)  # borders
        if paths:
            annotator.text((x + 5, y + 5 + h), text=Path(paths[i]).name[:40], txt_color=(220, 220, 220))  # filenames
        if len(targets) > 0:
            ti = targets[targets[:, 0] == i]  # image targets
            boxes = xywh2xyxy(ti[:, 2:6]).T
            classes = ti[:, 1].astype('int')
            labels = ti.shape[1] == 6  # labels if no conf column
            conf = None if labels else ti[:, 6]  # check for confidence presence (label vs pred)

            if boxes.shape[1]:
                if boxes.max() <= 1.01:  # if normalized with tolerance 0.01
                    boxes[[0, 2]] *= w  # scale to pixels
                    boxes[[1, 3]] *= h
                elif scale < 1:  # absolute coords need scale if image scales
                    boxes *= scale
            boxes[[0, 2]] += x
            boxes[[1, 3]] += y
            for j, box in enumerate(boxes.T.tolist()):
                cls = classes[j]
                color = colors(cls)
                cls = names[cls] if names else cls
                if labels or conf[j] > 0.25:  # 0.25 conf thresh
                    label = f'{cls}' if labels else f'{cls} {conf[j]:.1f}'
                    annotator.box_label(box, label, color=color)
    annotator.im.save(fname)  # save
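The grid arithmetic in plot_images is easy to misread: `i // ns` selects the column and `i % ns` the row, so tiles fill the mosaic top to bottom, column by column, and the block origins are recomputed in the Annotate loop because h and w may have been shrunk by the optional resize. The standalone sketch below repeats only that arithmetic; `mosaic_layout` is a hypothetical name, and it uses math.ceil where the original uses np.ceil, which yields the same grid size.

```python
import math


def mosaic_layout(bs, h, w, max_size=1920, max_subplots=16):
    """Mirror plot_images' grid arithmetic: returns (ns, scale, tile_h, tile_w, origins)."""
    bs = min(bs, max_subplots)         # at most max_subplots tiles
    ns = math.ceil(bs ** 0.5)          # tiles per side of the square grid
    scale = max_size / ns / max(h, w)  # < 1 means the mosaic is shrunk
    if scale < 1:
        h, w = math.ceil(scale * h), math.ceil(scale * w)
    # block origin: i // ns picks the column (x), i % ns picks the row (y)
    origins = [(int(w * (i // ns)), int(h * (i % ns))) for i in range(bs)]
    return ns, scale, h, w, origins


ns, scale, tile_h, tile_w, origins = mosaic_layout(32, 640, 640)
print(ns, scale, tile_h, tile_w)  # 4 0.75 480 480
print(origins[:5])  # [(0, 0), (0, 480), (0, 960), (0, 1440), (480, 0)]
```

With a validation batch of 32 at 640×640, only 16 images are drawn, in a 4×4 grid of 480×480 tiles.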

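Of the plot_images function, only the second statement of the excerpt below (the Annotator construction) and the loop after it matter for drawing the predicted boxes: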

    # Annotate
    fs = int((h + w) * ns * 0.01)  # font size
    annotator = Annotator(mosaic, line_width=round(fs / 10), font_size=fs, pil=True, example=names)
    for i in range(i + 1):
        x, y = int(w * (i // ns)), int(h * (i % ns))  # block origin
        annotator.rectangle([x, y, x + w, y + h], None, (255, 255, 255), width=2)  # borders: outline of each image tile
        if paths:
            annotator.text((x + 5, y + 5 + h), text=Path(paths[i]).name[:40], txt_color=(220, 220, 220))  # filenames
        if len(targets) > 0:
            ti = targets[targets[:, 0] == i]  # image targets: rows whose batch index equals i belong to image i and are kept in ti
            boxes = xywh2xyxy(ti[:, 2:6]).T
            classes = ti[:, 1].astype('int')
            labels = ti.shape[1] == 6  # labels if no conf column
            conf = None if labels else ti[:, 6]  # check for confidence presence (label vs pred)

            if boxes.shape[1]:
                if boxes.max() <= 1.01:  # if normalized with tolerance 0.01
                    boxes[[0, 2]] *= w  # scale to pixels: x1, x2 times this tile's width
                    boxes[[1, 3]] *= h
                elif scale < 1:  # absolute coords need scale if image scales
                    boxes *= scale
            boxes[[0, 2]] += x  # shift boxes to their position in the batch mosaic
            boxes[[1, 3]] += y
            for j, box in enumerate(boxes.T.tolist()):
                cls = classes[j]  # boxes and classes share the same index j
                color = colors(cls)
                cls = names[cls] if names else cls
                if labels or conf[j] > 0.25:  # 0.25 conf thresh
                    label = f'{cls}' if labels else f'{cls} {conf[j]:.1f}'
                    annotator.box_label(box, label, color=color)  # draw the box and its label
    annotator.im.save(fname)  # save
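The coordinate handling inside this loop can be isolated into a small function: select the rows of image i, convert center xywh to corners, bring them to the tile's pixel scale, then shift by the tile origin. This is a hedged NumPy restatement rather than YOLOv5 code; `boxes_on_mosaic` is a hypothetical name and `xywh2xyxy` is re-implemented here for NumPy input. In val.py the labels are converted to pixels before NMS and the predictions are already in pixels, so there the `elif scale < 1` branch is the one that normally applies; the normalized branch serves callers that pass coordinates in the 0 to 1 range.

```python
import numpy as np


def xywh2xyxy(b):
    """(cx, cy, w, h) -> (x1, y1, x2, y2), row-wise."""
    out = b.copy()
    out[:, 0] = b[:, 0] - b[:, 2] / 2
    out[:, 1] = b[:, 1] - b[:, 3] / 2
    out[:, 2] = b[:, 0] + b[:, 2] / 2
    out[:, 3] = b[:, 1] + b[:, 3] / 2
    return out


def boxes_on_mosaic(targets, i, x, y, w, h, scale):
    """Return (n, 4) corner boxes of image i in mosaic pixels, plus class ids."""
    ti = targets[targets[:, 0] == i]  # rows whose batch index is i
    boxes = xywh2xyxy(ti[:, 2:6]).T   # (4, n): x1, y1, x2, y2
    if boxes.shape[1]:
        if boxes.max() <= 1.01:       # normalized -> this tile's pixels
            boxes[[0, 2]] *= w
            boxes[[1, 3]] *= h
        elif scale < 1:               # input pixels -> shrunken tile
            boxes *= scale
    boxes[[0, 2]] += x                # shift into tile i of the mosaic
    boxes[[1, 3]] += y
    return boxes.T, ti[:, 1].astype(int)


# prediction rows [batch_idx, cls, cx, cy, w, h, conf] in 640x640 input pixels
targets = np.array([[0, 2, 320, 320, 64, 32, 0.9],
                    [1, 0, 100, 100, 20, 20, 0.5]], dtype=float)
# tile 1 of a 4x4 mosaic of 480x480 tiles starts at (0, 480)
boxes, classes = boxes_on_mosaic(targets, i=1, x=0, y=480, w=480, h=480, scale=0.75)
# expected: one box with corners (67.5, 547.5) and (82.5, 562.5), class 0
```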
