【YOLO Series】YOLOv3 Code Walkthrough (Part 5): the utils.py Script

江湖小张

Published 2023-12-24 21:54:45

Article link: https://blog.csdn.net/m0_46489757/article/details/135187731

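Preface

The following are notes I took while studying artificial intelligence. I have started by organizing my notes on the YOLO family of object detection algorithms and am sharing them here as a study reference.

This article only annotates the key parts of the YOLOv3 code; if you are not yet comfortable with the basic code, please look up the fundamentals on your own.

If anything here is wrong, corrections are very welcome.

Downloads

YOLOv3 paper: YOLOv3: An Incremental Improvement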

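Previous posts

YOLO V1: 【YOLO Series】The Ideas Behind the YOLO V1 Paper Explained

YOLO V2: 【YOLO Series】The Ideas Behind the YOLO V2 Paper Explained

YOLO V3: 【YOLO Series】The Ideas Behind the YOLOv3 Paper Explained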

Project links

YOLOV3 Keras version: download link

YOLOV3 TensorFlow version: download link

YOLOV3 PyTorch version: download link

Gitee repository

All YOLOV3 versions: yolov3 versions

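YOLO V3 code walkthrough series

YOLO V3 code walkthrough (1): 【YOLO Series】YOLOv3 Code Walkthrough (Part 1): the main script yolo_video.py

YOLO V3 code walkthrough (2): 【YOLO Series】YOLOv3 Code Walkthrough (Part 2): the detection script yolo.py

YOLO V3 code walkthrough (3): 【YOLO Series】YOLOv3 Code Walkthrough (Part 3): the training script train.py

YOLO V3 code walkthrough (4): 【YOLO Series】YOLOv3 Code Walkthrough (Part 4): the model script model.py

This article is based mainly on the Keras version.

Without further ado, let's go straight to the code.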

I. Code Walkthrough

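1. Defining the function-composition helper

compose() folds the functions in funcs (using functools.reduce, not recursion) into a single function that applies them from left to right, so compose(f, g)(x) equals g(f(x)).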

from functools import reduce


def compose(*funcs):
    """Compose arbitrarily many functions, evaluated left to right.

    Reference: https://mathieularose.com/function-composition-in-python/
    """
    # return lambda x: reduce(lambda v, f: f(v), funcs, x)
    # reduce folds funcs pairwise into one function applied left to right: compose(f, g, h)(x) == h(g(f(x)))
    if funcs:
        return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)
    else:
        raise ValueError('Composition of empty sequence not supported.')
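To confirm the left-to-right order, the sketch below re-declares compose with the same reduce logic and applies it to two toy functions; add_one and double are hypothetical helpers used only for illustration. In the Keras version, model.py (covered in Part 4) uses compose in the same spirit to chain layers such as a convolution, BatchNormalization and LeakyReLU into one block.

```python
from functools import reduce


def compose(*funcs):
    """Compose functions left to right: compose(f, g)(x) == g(f(x))."""
    if not funcs:
        raise ValueError('Composition of empty sequence not supported.')
    # reduce builds ((f1 then f2) then f3) ... one pair at a time
    return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs)


def add_one(x):
    return x + 1


def double(x):
    return x * 2


# add_one runs first, then double: (3 + 1) * 2
print(compose(add_one, double)(3))  # 8
# Swapping the order changes the result: 3 * 2 + 1
print(compose(double, add_one)(3))  # 7
```

Order matters in exactly the way it does for layers: the first function listed receives the raw input.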

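2. Resizing the input image (letterbox)

(1) Compute a single scale factor determined by the side that is relatively longest, then resize the image by that factor (resampling method: BICUBIC) so that its aspect ratio is preserved;

(2) Create a new 416x416 image filled with neutral gray (R128-G128-B128), paste the resized image in its center, and leave the areas it does not cover gray.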

from PIL import Image


def letterbox_image(image, size):
    """resize image with unchanged aspect ratio using padding"""
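    # Compute one scale factor from the relatively longest side and resize with BICUBIC resampling,
    # then paste the result onto a gray (128, 128, 128) canvas of the target size (e.g. 416x416); uncovered areas stay gray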
    iw, ih = image.size
    w, h = size
    # Take the smaller ratio so the whole image fits (the relatively longer side fills the target)
    scale = min(w/iw, h/ih)
    nw = int(iw*scale)
    nh = int(ih*scale)

    image = image.resize((nw, nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (128, 128, 128))
    new_image.paste(image, ((w-nw)//2, (h-nh)//2))
    return new_image
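The geometry behind letterbox_image, which the non-random branch of get_random_data repeats for the boxes, can be checked without loading any image. The sketch below is a minimal pure-Python version; letterbox_geometry and map_box are hypothetical helper names, and the 640x480 input is just an example.

```python
def letterbox_geometry(iw, ih, w, h):
    """Return (scale, nw, nh, dx, dy) the way letterbox_image computes them."""
    scale = min(w / iw, h / ih)        # smaller ratio, so the whole image fits
    nw, nh = int(iw * scale), int(ih * scale)
    dx, dy = (w - nw) // 2, (h - nh) // 2   # offsets that center the resized image
    return scale, nw, nh, dx, dy


def map_box(box, scale, dx, dy):
    """Map an (x_min, y_min, x_max, y_max) box from the original image onto the letterboxed canvas."""
    x1, y1, x2, y2 = box
    return (x1 * scale + dx, y1 * scale + dy, x2 * scale + dx, y2 * scale + dy)


scale, nw, nh, dx, dy = letterbox_geometry(640, 480, 416, 416)
print(scale, nw, nh, dx, dy)                              # 0.65 416 312 0 52
print(map_box((100, 50, 300, 250), scale, dx, dy))        # (65.0, 84.5, 195.0, 214.5)
```

For a 640x480 image, the width is the relatively longer side, so it fills all 416 pixels, and the 312-pixel-high result gets 52 gray rows above and below.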

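3. Defining the get_random_data() function

It performs data augmentation on the training images and adjusts their ground-truth boxes to match.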

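import numpy as np
from PIL import Image
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb


def rand(a=0, b=1):
    # Uniform random number in [a, b) (np.random.rand() samples from [0, 1))
    return np.random.rand()*(b-a) + a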


def get_random_data(annotation_line, input_shape, random=True, max_boxes=20, jitter=.3, hue=.1, sat=1.5, val=1.5, proc_img=True):
    """
    random preprocessing for real-time data augmentation
    Resize the image to (416, 416) with a random scale and aspect ratio, then randomly shift, flip and HSV color-jitter it for data augmentation
    """
    line = annotation_line.split()
    image = Image.open(line[0])   # open the training image
    iw, ih = image.size     # original width and height
    h, w = input_shape     # (416, 416)
    # Parse the ground-truth boxes: x_min,y_min,x_max,y_max,class_id
    box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]])

    if not random:
        # Resize the image to (416, 416) with letterbox padding, the same method as letterbox_image used in yolo.py
        scale = min(w/iw, h/ih)
        nw = int(iw*scale)
        nh = int(ih*scale)
        dx = (w-nw)//2
        dy = (h-nh)//2
        image_data = 0
        if proc_img:
            image = image.resize((nw, nh), Image.BICUBIC)
            new_image = Image.new('RGB', (w, h), (128, 128, 128))
            new_image.paste(image, (dx, dy))
            image_data = np.array(new_image)/255.

        # correct boxes
        # The image was resized and padded, so the ground-truth boxes must be scaled and shifted accordingly
        box_data = np.zeros((max_boxes, 5))
        if len(box) > 0:
            np.random.shuffle(box)
            if len(box) > max_boxes: box = box[:max_boxes]
            box[:, [0, 2]] = box[:, [0, 2]]*scale + dx
            box[:, [1, 3]] = box[:, [1, 3]]*scale + dy
            box_data[:len(box)] = box

        return image_data, box_data

    # resize image
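    # Randomly pick a new aspect ratio and scale for resizing the image
    # With the default jitter=.3: new_ar = w/h * rand(0.7, 1.3)/rand(0.7, 1.3), where rand(0.7, 1.3) = np.random.rand()*(1.3-0.7) + 0.7 lies in [0.7, 1.3)
    # new_ar > 1 means the resized image is wider than tall; new_ar < 1 means it is taller than wide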
    new_ar = w/h * rand(1-jitter, 1+jitter)/rand(1-jitter, 1+jitter)
    scale = rand(.25, 2)
    if new_ar < 1:
        nh = int(scale*h)
        nw = int(nh*new_ar)
    else:
        nw = int(scale*w)
        nh = int(nw/new_ar)
    image = image.resize((nw, nh), Image.BICUBIC)

    # place image
    # Create a gray-filled (416, 416) canvas and paste the resized image at a random position (a negative offset crops it)
    dx = int(rand(0, w-nw))
    dy = int(rand(0, h-nh))
    new_image = Image.new('RGB', (w, h), (128, 128, 128))
    new_image.paste(image, (dx, dy))
    image = new_image

    # flip image or not
    # Flip the image horizontally (left-right) with probability 0.5
    flip = rand() < .5
    if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT)

    # distort image
    # Randomly sample the HSV distortion parameters
    hue = rand(-hue, hue)
    sat = rand(1, sat) if rand() < .5 else 1/rand(1, sat)
    val = rand(1, val) if rand() < .5 else 1/rand(1, val)
    # Convert the RGB values (scaled to [0, 1]) to HSV
    x = rgb_to_hsv(np.array(image)/255.)
    # Shift the hue (wrapping around at 0 and 1) and scale the saturation and value
    x[..., 0] += hue
    x[..., 0][x[..., 0] > 1] -= 1
    x[..., 0][x[..., 0] < 0] += 1
    x[..., 1] *= sat
    x[..., 2] *= val
    # HSV values must lie in [0, 1], so clip anything outside that range
    x[x > 1] = 1
    x[x < 0] = 0
    # Convert HSV back to RGB
    image_data = hsv_to_rgb(x)  # numpy array, 0 to 1

    # correct boxes
    # The image was resized, shifted and possibly flipped, so the ground-truth boxes must be adjusted as well
    box_data = np.zeros((max_boxes, 5))
    if len(box) > 0:
        # Shuffle the ground-truth boxes, then scale and shift them
        np.random.shuffle(box)
        box[:, [0, 2]] = box[:, [0, 2]]*nw/iw + dx
        box[:, [1, 3]] = box[:, [1, 3]]*nh/ih + dy
        # If the image was flipped, mirror the x coordinates (the new x_min comes from the old x_max)
        if flip: box[:, [0, 2]] = w - box[:, [2, 0]]
        # Clip boxes that extend beyond the image borders
        box[:, 0:2][box[:, 0:2] < 0] = 0
        box[:, 2][box[:, 2] > w] = w
        box[:, 3][box[:, 3] > h] = h
        # Width and height of the adjusted boxes
        box_w = box[:, 2] - box[:, 0]
        box_h = box[:, 3] - box[:, 1]
        # Keep only boxes whose width and height are both greater than 1 pixel
        box = box[np.logical_and(box_w > 1, box_h > 1)]  # discard invalid box
        if len(box) > max_boxes: box = box[:max_boxes]
        box_data[:len(box)] = box

    return image_data, box_data
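The random resize step is easier to reason about when its output range is visible. With the default jitter=.3 the new aspect ratio lies between 0.7/1.3 ≈ 0.54 and 1.3/0.7 ≈ 1.86, and the longer side is 0.25 to 2 times 416 pixels, so the resized image is sometimes much smaller than the canvas (padded with gray) and sometimes larger (then dx or dy comes out negative and pasting crops it). The sketch below reproduces only that sampling; random_resize_shape and its min_scale/max_scale parameters are hypothetical names, and Python's standard random module is swapped in for np.random.

```python
import random


def rand(a=0.0, b=1.0):
    # Uniform sample in [a, b); stands in for np.random.rand()*(b-a) + a
    return random.random() * (b - a) + a


def random_resize_shape(w, h, jitter=0.3, min_scale=0.25, max_scale=2.0):
    """Sample the resized (nw, nh) and aspect ratio the way get_random_data does."""
    # Aspect-ratio jitter: with jitter=0.3 each factor lies in [0.7, 1.3)
    new_ar = w / h * rand(1 - jitter, 1 + jitter) / rand(1 - jitter, 1 + jitter)
    scale = rand(min_scale, max_scale)
    if new_ar < 1:
        # Taller than wide: the scale sets the height, new_ar sets the width
        nh = int(scale * h)
        nw = int(nh * new_ar)
    else:
        # Wider than tall: the scale sets the width, new_ar sets the height
        nw = int(scale * w)
        nh = int(nw / new_ar)
    return nw, nh, new_ar


random.seed(0)
for _ in range(3):
    nw, nh, ar = random_resize_shape(416, 416)
    print(nw, nh, round(ar, 3))
```

Because the scale always applies to the longer side, the image never exceeds about 832 pixels in either direction, regardless of the sampled aspect ratio.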
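The color distortion works in HSV space: the hue is shifted by up to ±hue (default 0.1) and wrapped around because hue is circular, while saturation and value are multiplied by a factor between roughly 0.67 and 1.5 (with probability 0.5 a boost from rand(1, 1.5), otherwise a reduction 1/rand(1, 1.5)) and then clipped to [0, 1]. The sketch below applies those steps to a single pixel using Python's standard colorsys module, which is swapped in here for matplotlib's vectorized rgb_to_hsv/hsv_to_rgb; distort_pixel is a hypothetical helper name.

```python
import colorsys


def distort_pixel(rgb, hue, sat, val):
    """Apply get_random_data-style HSV jitter to one RGB pixel with channels in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h += hue
    # Hue is circular: wrap values that leave [0, 1] back around
    if h > 1:
        h -= 1
    elif h < 0:
        h += 1
    # Saturation and value are scaled, then clipped back into [0, 1]
    s = min(max(s * sat, 0.0), 1.0)
    v = min(max(v * val, 0.0), 1.0)
    return colorsys.hsv_to_rgb(h, s, v)


# Pure red, hue shifted by -0.05 (wraps to 0.95), saturation boosted, value halved
print([round(c, 3) for c in distort_pixel((1.0, 0.0, 0.0), -0.05, 1.5, 0.5)])  # [0.5, 0.0, 0.15]
```

The wrap matters: without it, a red pixel shifted by a negative hue would be clipped to 0 and stay pure red instead of drifting toward magenta.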
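The box correction at the end of get_random_data is easiest to follow with concrete numbers. The sketch below repeats its scale, shift, flip, clip and filter steps on a small example, leaving out the shuffle and the max_boxes cap; correct_boxes is a hypothetical helper name. One difference is deliberate: the sketch converts to float first, whereas the original keeps the integer array produced by int(), so NumPy silently truncates each scaled coordinate when it is written back.

```python
import numpy as np


def correct_boxes(box, iw, ih, nw, nh, dx, dy, w, h, flip):
    """Map boxes [x_min, y_min, x_max, y_max, class_id] into the augmented image."""
    box = box.astype(np.float64)   # float copy, so the input array is untouched
    # Scale by the resize ratios, then shift by the paste offset
    box[:, [0, 2]] = box[:, [0, 2]] * nw / iw + dx
    box[:, [1, 3]] = box[:, [1, 3]] * nh / ih + dy
    if flip:
        # Mirroring swaps the roles of x_min and x_max
        box[:, [0, 2]] = w - box[:, [2, 0]]
    # Clip to the canvas (the slices are views, so this edits box in place)
    box[:, 0:2][box[:, 0:2] < 0] = 0
    box[:, 2][box[:, 2] > w] = w
    box[:, 3][box[:, 3] > h] = h
    # Drop boxes that became degenerate (width or height <= 1 pixel)
    box_w = box[:, 2] - box[:, 0]
    box_h = box[:, 3] - box[:, 1]
    return box[np.logical_and(box_w > 1, box_h > 1)]


# A 200x100 image resized to 300x150, pasted at (100, 300) on a 416x416 canvas, then flipped
boxes = np.array([[20, 10, 120, 60, 0],
                  [180, 40, 199, 90, 1],   # runs past the bottom edge after the shift
                  [0, 0, 0, 5, 2]])        # zero width, so it is discarded
out = correct_boxes(boxes, iw=200, ih=100, nw=300, nh=150, dx=100, dy=300,
                    w=416, h=416, flip=True)
print(out)
```

Here the first box lands at [136, 315, 286, 390], the second is clipped at the bottom to y_max = 416, and the zero-width third box is removed, which is exactly why get_random_data pads box_data with zeros up to max_boxes rather than assuming a fixed count.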