Object Detection Tutorial Series 16: YOLOv5 Source Code Walkthrough 6 (the mosaic data augmentation function load_mosaic)

Latest recommended article published on 2024-04-08 15:15:52

机器学习杨卓越

Category column: YOLO object detection series. Tags: YOLO, deep learning, artificial intelligence, computer vision, object detection, pytorch

Article link: https://blog.csdn.net/weixin_50592077/article/details/136255244

Questions are welcome in the comments below.
All code in this article was run in PyCharm.
The accompanying code for this article has been uploaded.

9. The load_mosaic function

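Mosaic data augmentation: four different images are stitched into one large image to increase the complexity and diversity of the scenes the model sees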

9.1 The load_mosaic function

def load_mosaic(self, index):
    labels4, segments4 = [], []
    s = self.img_size
    yc, xc = [int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border]  # mosaic center x, y
    indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices
    for i, index in enumerate(indices):
        img, _, (h, w) = load_image(self, index)
        if i == 0:  # top left
            img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
        elif i == 1:  # top right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
        elif i == 2:  # bottom left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
        elif i == 3:  # bottom right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
        img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]
        padw = x1a - x1b
        padh = y1a - y1b
        labels, segments = self.labels[index].copy(), self.segments[index].copy()
        if labels.size:
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)
            segments = [xyn2xy(x, w, h, padw, padh) for x in segments]
        labels4.append(labels)
        segments4.extend(segments)
    labels4 = np.concatenate(labels4, 0)
    for x in (labels4[:, 1:], *segments4):
        np.clip(x, 0, 2 * s, out=x)
    img4, labels4 = random_perspective(img4, labels4, segments4,
                                       degrees=self.hyp['degrees'],
                                       translate=self.hyp['translate'],
                                       scale=self.hyp['scale'],
                                       shear=self.hyp['shear'],
                                       perspective=self.hyp['perspective'],
                                       border=self.mosaic_border)
    return img4, labels4

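Define the function, which takes an image index as its argument
labels4, segments4: lists that collect the labels and segmentation polygons of the stitched image
s: the target size of a single image (self.img_size)
yc, xc: the coordinates of the mosaic center. Each coordinate is drawn uniformly from [-x, 2 * s + x] for x in self.mosaic_border, so the center is random but confined to a fixed range. The four images are placed in the four quadrants around this point, and whatever extends past the 2s × 2s canvas is cropped
indices: the current index plus three randomly chosen image indices, combined into one list
indices now holds four image indices, and the loop iterates over this list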
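
The sampling step can be tried in isolation. The sketch below is not part of the YOLOv5 source: the function name `sample_mosaic_layout` and the `rng` parameter are hypothetical, and it assumes `mosaic_border = [-img_size // 2, -img_size // 2]`, which is the value YOLOv5 sets in its dataset class. Under that assumption the center falls in [s/2, 3s/2] on the 2s × 2s canvas.

```python
import random


def sample_mosaic_layout(index, num_images, img_size, rng=random):
    """Sample a mosaic center and four image indices (illustrative helper)."""
    # Assumption: same border value as YOLOv5's dataset class.
    mosaic_border = [-img_size // 2, -img_size // 2]
    # uniform(-b, 2s + b) with b = -s/2 gives the range [s/2, 3s/2].
    yc, xc = (int(rng.uniform(-b, 2 * img_size + b)) for b in mosaic_border)
    # The requested image comes first, followed by 3 random ones
    # (sampling with replacement, so repeats are possible).
    indices = [index] + rng.choices(list(range(num_images)), k=3)
    return yc, xc, indices


if __name__ == "__main__":
    print(sample_mosaic_layout(index=7, num_images=100, img_size=640))
```

Because `random.choices` samples with replacement, the same image can appear more than once in a single mosaic.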

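The loop computes, for each of the four images, where it goes on the canvas and which part of it is copied, building up the large image: initialize the canvas, compute where the current image lands on the canvas, and compute which part of the current image is copied there. Where an image is too small to fill its region, the canvas keeps its fill value of 114. Image content that crosses the canvas boundary is discarded, and boxes that cross it are corrected afterwards.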

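img, _, (h, w): load the image for the current index with load_image, which returns the image, its original (height, width) and its resized (height, width); only the resized size is kept here
If this is the 1st image (top left):
Create a canvas of size (s * 2, s * 2) with the same number of channels as img and every pixel set to 114
Compute where the 1st image goes on the mosaic canvas (x1a, y1a, x2a, y2a)
Compute the region to crop from the 1st image (x1b, y1b, x2b, y2b)
If this is the 2nd image (top right):
Compute where the 2nd image goes on the mosaic canvas
Compute the region to crop from the 2nd image
If this is the 3rd image (bottom left):
Compute where the 3rd image goes on the mosaic canvas
Compute the region to crop from the 3rd image
If this is the 4th image (bottom right):
Compute where the 4th image goes on the mosaic canvas
Compute the region to crop from the 4th image
Copy the cropped region of the current image into the canvas
padw: the horizontal offset between canvas coordinates and image coordinates (x1a - x1b)
padh: the vertical offset between canvas coordinates and image coordinates (y1a - y1b)
Copy the labels and segments that belong to the current image index
If the current image has labels:
Convert the boxes from normalized xywh format to pixel xyxy format with xywhn2xyxy, shifting them by padw and padh
Apply the same conversion and shift to the segments with xyn2xy
Append the labels of the current image to the labels4 list
Extend the segments4 list with the segments of the current image
labels4: concatenate the labels of all four images into one ndarray
Iterate over the box coordinates and every segment
Use np.clip to clamp the coordinates in place to the canvas range [0, 2 * s]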
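
The four placement cases are easier to follow with concrete numbers. The sketch below copies the coordinate arithmetic from load_mosaic into a standalone helper; the name `tile_coords` and the example values (s = 640, a 640 × 480 image, center at xc = 500, yc = 700) are made up for illustration.

```python
def tile_coords(i, xc, yc, w, h, s):
    """Return (canvas_box, image_box) as (x1, y1, x2, y2) tuples for tile i.

    canvas_box is where the tile lands on the 2s x 2s canvas,
    image_box is the region cropped from the w x h source image.
    """
    if i == 0:  # top left
        a = max(xc - w, 0), max(yc - h, 0), xc, yc
        b = w - (a[2] - a[0]), h - (a[3] - a[1]), w, h
    elif i == 1:  # top right
        a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
        b = 0, h - (a[3] - a[1]), min(w, a[2] - a[0]), h
    elif i == 2:  # bottom left
        a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
        b = w - (a[2] - a[0]), 0, w, min(a[3] - a[1], h)
    elif i == 3:  # bottom right
        a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
        b = 0, 0, min(w, a[2] - a[0]), min(a[3] - a[1], h)
    else:
        raise ValueError("i must be 0, 1, 2 or 3")
    return a, b


if __name__ == "__main__":
    s, w, h, xc, yc = 640, 640, 480, 500, 700
    for i in range(4):
        a, b = tile_coords(i, xc, yc, w, h, s)
        padw, padh = a[0] - b[0], a[1] - b[1]  # label offset for this tile
        print(i, "canvas", a, "image", b, "padw", padw, "padh", padh)
```

For the top-left tile only 500 of the 640 columns fit to the left of the center, so the crop starts at x1b = 140 and padw is -140: a label at image x = 140 lands at canvas x = 0. In every case the canvas box and the image box have the same width and height, which is what makes the slice assignment `img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]` valid.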

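Once the canvas is assembled, further augmentation can be applied to it (here through a helper function). Another approach augments the individual images first and then stitches them into the large image.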

Apply a random perspective transform to the mosaic image and its labels with random_perspective as further augmentation
Return the mosaic image and its labels

9.2 The load_image function

def load_image(self, index):
    # loads 1 image from dataset, returns img, original hw, resized hw
    img = self.imgs[index]
    if img is None:  # not cached
        path = self.img_files[index]
        img = cv2.imread(path)  # BGR
        assert img is not None, 'Image Not Found ' + path
        h0, w0 = img.shape[:2]  # orig hw
        r = self.img_size / max(h0, w0)  # resize image to img_size
        if r != 1:  # always resize down, only resize up if training with augmentation
            interp = cv2.INTER_AREA if r < 1 and not self.augment else cv2.INTER_LINEAR
            img = cv2.resize(img, (int(w0 * r), int(h0 * r)), interpolation=interp)
        return img, (h0, w0), img.shape[:2]  # img, hw_original, hw_resized
    else:
        return self.imgs[index], self.img_hw0[index], self.img_hw[index]  # img, hw_original, hw_resized
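
The resize rule in load_image can be checked without OpenCV. The sketch below reproduces only the arithmetic: the helper name `plan_resize` is hypothetical, and the interpolation mode is returned as a plain string rather than a cv2 constant. Note that in the version shown above the image is resized whenever r != 1, in either direction; the augment flag only influences which interpolation is chosen.

```python
def plan_resize(h0, w0, img_size, augment):
    """Return (new_h, new_w, interpolation_name) following load_image's rule."""
    r = img_size / max(h0, w0)  # scale so the longer side equals img_size
    if r == 1:
        return h0, w0, None  # already the right size, no resize
    # INTER_AREA only when shrinking without augmentation, else INTER_LINEAR.
    interp = "INTER_AREA" if r < 1 and not augment else "INTER_LINEAR"
    return int(h0 * r), int(w0 * r), interp


if __name__ == "__main__":
    print(plan_resize(960, 1280, 640, augment=False))  # shrink, validation
    print(plan_resize(960, 1280, 640, augment=True))   # shrink, training
    print(plan_resize(240, 320, 640, augment=True))    # enlarge
```

The aspect ratio is preserved, so only the longer side reaches img_size; this is why load_mosaic has to handle images whose h or w is smaller than s.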

9.3 The xywhn2xyxy function

def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):
    # Convert nx4 boxes from [x, y, w, h] normalized to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top left x
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top left y
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom right x
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom right y
    return y
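
A single box worked by hand shows what the padding terms do. The sketch below is a scalar version of the same formula for one box; the name `xywhn_to_xyxy_single` and the sample values are illustrative and not part of YOLOv5.

```python
def xywhn_to_xyxy_single(box, w, h, padw=0, padh=0):
    """Convert one normalized (xc, yc, bw, bh) box to pixel (x1, y1, x2, y2)."""
    xc, yc, bw, bh = box
    x1 = w * (xc - bw / 2) + padw  # top left x
    y1 = h * (yc - bh / 2) + padh  # top left y
    x2 = w * (xc + bw / 2) + padw  # bottom right x
    y2 = h * (yc + bh / 2) + padh  # bottom right y
    return x1, y1, x2, y2


if __name__ == "__main__":
    # A centered box covering half the width and height of a 640 x 480 image,
    # shifted by padw=-140, padh=220 into mosaic canvas coordinates.
    print(xywhn_to_xyxy_single((0.5, 0.5, 0.5, 0.5), w=640, h=480, padw=-140, padh=220))
    # -> (20.0, 340.0, 340.0, 580.0)
```

Without padding the same box would be (160, 120, 480, 360) in image pixels; padw and padh move it to where that image was pasted on the canvas.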

9.4 The xyn2xy function

def xyn2xy(x, w=640, h=640, padw=0, padh=0):
    # Convert normalized segments into pixel segments, shape (n,2)
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 0] = w * x[:, 0] + padw  # x in pixels, shifted by padw
    y[:, 1] = h * x[:, 1] + padh  # y in pixels, shifted by padh
    return y
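
Segments go through the same scale-and-shift, applied point by point. The sketch below is a plain-Python equivalent for a list of (x, y) pairs; the helper name `xyn_to_xy_points` is hypothetical.

```python
def xyn_to_xy_points(points, w, h, padw=0, padh=0):
    """Convert normalized polygon points to pixel coordinates with an offset."""
    return [(w * x + padw, h * y + padh) for x, y in points]


if __name__ == "__main__":
    polygon = [(0.0, 0.0), (1.0, 0.5), (0.25, 1.0)]
    print(xyn_to_xy_points(polygon, w=640, h=480, padw=10, padh=20))
    # -> [(10.0, 20.0), (650.0, 260.0), (170.0, 500.0)]
```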

9.5 The random_perspective function

def random_perspective(img, targets=(), segments=(), degrees=10, translate=.1, scale=.1, shear=10, perspective=0.0,
                       border=(0, 0)):
    height = img.shape[0] + border[0] * 2  # shape(h,w,c)
    width = img.shape[1] + border[1] * 2
    C = np.eye(3)
    C[0, 2] = -img.shape[1] / 2  # x translation (pixels)
    C[1, 2] = -img.shape[0] / 2  # y translation (pixels)
    P = np.eye(3)
    P[2, 0] = random.uniform(-perspective, perspective)  # x perspective (about y)
    P[2, 1] = random.uniform(-perspective, perspective)  # y perspective (about x)
    R = np.eye(3)
    a = random.uniform(-degrees, degrees)
    s = random.uniform(1 - scale, 1 + scale)
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)
    S = np.eye(3)
    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)
    T = np.eye(3)
    T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width  # x translation (pixels)
    T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height  # y translation (pixels)
    M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT
    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed
        if perspective:
            img = cv2.warpPerspective(img, M, dsize=(width, height), borderValue=(114, 114, 114))
        else:  # affine
            img = cv2.warpAffine(img, M[:2], dsize=(width, height), borderValue=(114, 114, 114))
    n = len(targets)
    if n:
        use_segments = any(x.any() for x in segments)
        new = np.zeros((n, 4))
        if use_segments:  # warp segments
            segments = resample_segments(segments)  # upsample
            for i, segment in enumerate(segments):
                xy = np.ones((len(segment), 3))
                xy[:, :2] = segment
                xy = xy @ M.T  # transform
                xy = xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]  # perspective rescale or affine
                new[i] = segment2box(xy, width, height)
        else:  # warp boxes
            xy = np.ones((n * 4, 3))
            xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
            xy = xy @ M.T  # transform
            xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)  # perspective rescale or affine
            x = xy[:, [0, 2, 4, 6]]
            y = xy[:, [1, 3, 5, 7]]
            new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
            new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
            new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)
        i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)
        targets = targets[i]
        targets[:, 1:5] = new[i]
    return img, targets
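
To see what the matrix M does to a point, it helps to build it with fixed values instead of random ones. The sketch below is a simplified, pure-Python illustration and not the YOLOv5 implementation: the names `build_matrix` and `apply_affine` are hypothetical, the perspective matrix P is left out (it is the identity when perspective is 0), and R is written out by hand in the form OpenCV documents for `getRotationMatrix2D` with center (0, 0). The border value (-s // 2, -s // 2) is an assumption matching YOLOv5's mosaic_border.

```python
import math


def eye3():
    return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]


def matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def build_matrix(img_h, img_w, border, angle_deg=0.0, scale=1.0,
                 shear_x_deg=0.0, shear_y_deg=0.0, tx=0.5, ty=0.5):
    """Compose M = T @ S @ R @ C with fixed (non-random) parameters."""
    height = img_h + border[0] * 2  # output size after applying the border
    width = img_w + border[1] * 2
    C = eye3()  # move the image center to the origin
    C[0][2], C[1][2] = -img_w / 2, -img_h / 2
    R = eye3()  # rotation and scale about the origin
    a = math.radians(angle_deg)
    alpha, beta = scale * math.cos(a), scale * math.sin(a)
    R[0][0], R[0][1], R[1][0], R[1][1] = alpha, beta, -beta, alpha
    S = eye3()  # shear
    S[0][1] = math.tan(math.radians(shear_x_deg))
    S[1][0] = math.tan(math.radians(shear_y_deg))
    T = eye3()  # move the origin to (tx * width, ty * height) in the output
    T[0][2], T[1][2] = tx * width, ty * height
    return matmul3(T, matmul3(S, matmul3(R, C))), (height, width)


def apply_affine(M, x, y):
    """Map one point through the affine part of M."""
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])


if __name__ == "__main__":
    s = 640
    M, (out_h, out_w) = build_matrix(2 * s, 2 * s, border=(-s // 2, -s // 2))
    print(out_h, out_w)              # 640 640
    print(apply_affine(M, s, s))     # canvas center -> (320.0, 320.0)
```

With every random parameter at its neutral value, the transform reduces to a shift of -s/2 in both axes: the center of the 2s × 2s mosaic canvas maps to the center of the s × s output, so the call crops the central window. The random rotation, scale, shear and translation then perturb which part of the canvas ends up in that window.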