YOLOv5-7.0: How Mosaic Data Augmentation Works

码农市民小刘

Last modified: 2024-09-05 12:10:12


Tags: YOLO, python

First published: 2024-08-28 16:53:50

Article link: https://blog.csdn.net/weixin_47285222/article/details/141595765


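1. Introduction to mosaic4 augmentation

This section uses code and illustrations to explain the mosaic4 data augmentation in YOLOv5, aiming for a concrete and concise picture of how it is implemented.

In YOLOv5 the mosaic4 code lives in the load_mosaic function in utils/dataloaders.py.

1.1 load_mosaic

The load_mosaic function is explained step by step below.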

# create two empty lists
labels4, segments4 = [], []

# the configured image size s
s = self.img_size

# reference (center) point of the augmented image
yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)

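self.img_size is an input parameter (default 640). It is the target size that images are rescaled to, not the actual size of each image.

self.mosaic_border = [-s // 2, -s // 2], i.e. minus half the image size, [-320, -320] by default.

random.uniform(a, b) returns a random float in the interval [a, b].

(xc, yc) is the center point of the background canvas onto which the four images are stitched (details below). It is drawn at random, so it is not the geometric center: substituting x = -s/2 gives random.uniform(0.5*s, 1.5*s), which is [320, 960] by default.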
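As a quick check of that range, here is a minimal sketch of the center sampling. The helper name sample_mosaic_center and the optional seeded random.Random are assumptions made for illustration; YOLOv5 itself draws from the module-level random.

```python
import random


def sample_mosaic_center(s=640, rng=None):
    """Draw a mosaic center (xc, yc) the way the snippet above does.

    Hypothetical helper for illustration, not part of YOLOv5.
    """
    rng = rng or random.Random()
    mosaic_border = [-s // 2, -s // 2]  # [-320, -320] when s = 640
    # with x = -s/2, uniform(-x, 2*s + x) is uniform(s/2, 3*s/2)
    yc, xc = (int(rng.uniform(-x, 2 * s + x)) for x in mosaic_border)
    return xc, yc


rng = random.Random(0)
centers = [sample_mosaic_center(640, rng) for _ in range(1000)]
print(min(c[0] for c in centers), max(c[0] for c in centers))
```

Every sampled coordinate should land between 320 and 960 for s = 640.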

# the original index plus 3 randomly chosen indices, as a list
indices = [index] + random.choices(self.indices, k=3)

# shuffle the order of the list elements
random.shuffle(indices)

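self.indices = range(n), i.e. the integers 0, 1, ..., n-1, where n is the total number of training images. It is the master list of indices used to look up every image.

random.choices picks k elements at random from a sequence, with replacement, and returns them as a list. Here it picks 3 indices from the master list, which identify the other three images of the mosaic; together with the original index they form a sub-list of 4 indices. Because sampling is with replacement, the same image can appear more than once.

random.shuffle shuffles a sequence in place. Here it randomizes the order of the 4 indices.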
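The index selection can be sketched in isolation as follows. The function name pick_mosaic_indices and its parameters are hypothetical; only the two random calls mirror the snippet above.

```python
import random


def pick_mosaic_indices(index, n_images, k=3, rng=None):
    """Return the shuffled list of k + 1 image indices used for one mosaic.

    Hypothetical helper for illustration, not part of YOLOv5.
    """
    rng = rng or random.Random()
    all_indices = range(n_images)  # plays the role of self.indices
    indices = [index] + rng.choices(all_indices, k=k)  # sampled with replacement
    rng.shuffle(indices)  # so the original image is not always top-left
    return indices


print(pick_mosaic_indices(index=5, n_images=100, rng=random.Random(0)))
```

The original index is always present, but its position in the mosaic changes from call to call.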

for i, index in enumerate(indices):
    img, _, (h, w) = self.load_image(index)

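The for loop walks through the 4 indices and loads each image by its index; the code for load_image is in section 1.2.

Of the returned values, img is a BGR image read with cv2, and h and w are its height and width after resizing.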

    # place img in img4
    if i == 0:
        # create the large background canvas, filled with pixel value 114
        img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)

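On the first image, the background image (the canvas) is created first.

np.full(shape, fill_value) creates an array of the given shape filled with the given value. Here it initializes an array of size (2*s, 2*s, channels) filled with 114 to serve as the canvas.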
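A standalone version of the canvas creation, assuming the default s = 640 and a 3-channel image (both values are assumptions for the example):

```python
import numpy as np

s = 640        # assumed default img_size
channels = 3   # assumed 3-channel BGR image

# gray canvas, twice the target size along each side
canvas = np.full((s * 2, s * 2, channels), 114, dtype=np.uint8)
print(canvas.shape, canvas.dtype)
```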

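The center point splits the canvas into four regions, and the i-th image goes into the i-th region (top-left, top-right, bottom-left, bottom-right). Since the center is drawn at random from [0.5*s, 1.5*s], the four regions will almost never be equal in size.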

        # top left: where the image goes on the canvas
        x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc

        # which region of the source image is pasted onto the canvas
        x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h

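The first image goes in the top-left region, with its bottom-right corner at the center point.

x1a, y1a, x2a, y2a are the top-left and bottom-right corners, in canvas coordinates, of the rectangle where the image is pasted; the max(..., 0) keeps the top-left corner from leaving the canvas.

x1b, y1b, x2b, y2b are the corners, in source-image coordinates, of the region that gets pasted, since the image may not fit whole. If w <= xc and h <= yc, the whole image is pasted; otherwise the excess is cropped from the top-left side of the image and the remainder is pasted.

The four figures below show each case that can occur: w<xc and h<yc; w>xc and h>yc; only h>yc; only w>xc.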
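To make the two rectangles concrete, here is a small worked sketch of the top-left case. The function name place_top_left and the sample numbers are assumptions for illustration; the two formulas are copied from the snippet above.

```python
def place_top_left(xc, yc, w, h):
    """Return (canvas rectangle, source rectangle) for the top-left image.

    Hypothetical helper; each rectangle is (xmin, ymin, xmax, ymax).
    """
    # destination on the canvas, clamped so it never leaves the canvas
    x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
    # matching region of the source image (its bottom-right part)
    x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
    return (x1a, y1a, x2a, y2a), (x1b, y1b, x2b, y2b)


# image fits: center far enough from the canvas edges
print(place_top_left(xc=700, yc=600, w=640, h=480))
# image too wide: xc < w, so the left part of the image is cropped
print(place_top_left(xc=400, yc=500, w=640, h=480))
```

In the second call the source rectangle is (240, 0, 640, 480): the leftmost 240 columns of the image are dropped and the remaining 400 columns fill the canvas from x = 0 to x = 400.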

    # top right
    elif i == 1:
        x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
        x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h

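The second image goes in the top-right region, with its bottom-left corner at the center point.

As with the first image, if the image is larger than the top-right region, the excess is cropped from the image's top-right side. In the example figure below, the yellow x marks the cropped part; the area inside the red box is kept and pasted onto the canvas.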

    # bottom left
    elif i == 2:
        x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)

The third image goes in the bottom-left region, with its top-right corner at the center point.

    # bottom right
    elif i == 3:
        x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)

The fourth image goes in the bottom-right region, with its top-left corner at the center point.
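Putting the four cases together, the sketch below pastes four flat-colored dummy images onto a canvas. The names mosaic4_rects and paste_demo, the dummy image sizes and the fill values are all assumptions for illustration; the coordinate formulas are the ones shown above.

```python
import numpy as np


def mosaic4_rects(i, xc, yc, w, h, s):
    """Canvas and source rectangles for tile i (0..3). Hypothetical helper."""
    if i == 0:  # top left
        x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
        x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
    elif i == 1:  # top right
        x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
        x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
    elif i == 2:  # bottom left
        x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
    else:  # bottom right
        x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
    return (x1a, y1a, x2a, y2a), (x1b, y1b, x2b, y2b)


def paste_demo(s=640, xc=400, yc=500, h=480, w=640):
    """Paste four dummy images (values 50, 100, 150, 200) onto a 114 canvas."""
    canvas = np.full((s * 2, s * 2, 3), 114, dtype=np.uint8)
    for i in range(4):
        img = np.full((h, w, 3), 50 * (i + 1), dtype=np.uint8)
        (x1a, y1a, x2a, y2a), (x1b, y1b, x2b, y2b) = mosaic4_rects(i, xc, yc, w, h, s)
        canvas[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]
    return canvas


canvas = paste_demo()
# the four pixels around the center belong to the four different images
print(canvas[499, 399, 0], canvas[499, 400, 0], canvas[500, 399, 0], canvas[500, 400, 0])
```

For every tile the canvas rectangle and the source rectangle have the same width and height, which is what makes the slice assignment valid.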

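All four images are now in place. Not all four need cropping: each image was resized on loading so that its longest side is s, and although the center point is random, the canvas is (2s, 2s). As a result, for each side-by-side or stacked pair, if one image is cropped along that axis, the other cannot be.

Overall the procedure is highly random: the images to stitch are chosen at random and so is the center point, so which images end up cropped, and by how much, varies from sample to sample.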

    # paste the image region onto the canvas
    img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]

    # offset between the pasted region's top-left corner on the canvas and in the image
    padw = x1a - x1b
    padh = y1a - y1b

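img4 is the canvas; the (possibly cropped) image is pasted onto it.

padw and padh are the x and y offsets that map the source image's top-left corner into canvas coordinates.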
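A tiny sketch of what these offsets do to a single pixel. The function image_to_canvas is a hypothetical helper, and the numbers assume a 640x480 image placed top-left with xc = 400 and yc = 500, which gives x1a = 0, y1a = 20, x1b = 240, y1b = 0.

```python
def image_to_canvas(px, py, x1a, y1a, x1b, y1b):
    """Map an image pixel (px, py) to canvas coordinates. Hypothetical helper."""
    padw = x1a - x1b  # horizontal shift from image to canvas
    padh = y1a - y1b  # vertical shift from image to canvas
    return px + padw, py + padh


# first pixel that survives the crop lands on the pasted region's top-left corner
print(image_to_canvas(240, 0, x1a=0, y1a=20, x1b=240, y1b=0))
# the image's own top-left corner was cropped away, so it maps off-canvas
print(image_to_canvas(0, 0, x1a=0, y1a=20, x1b=240, y1b=0))
```

A negative result like the second one is exactly the out-of-bounds case that the label clipping step has to handle.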

    # load the labels
    labels, segments = self.labels[index].copy(), self.segments[index].copy()
    # if the image has labels
    if labels.size:
        # convert the source-image labels to canvas labels
        labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)
        segments = [xyn2xy(x, w, h, padw, padh) for x in segments]
    # the labels of every image are added to the lists
    labels4.append(labels)
    segments4.extend(segments)

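The labels are loaded and then updated by converting source-image coordinates to canvas coordinates; the code that does this is in section 1.3.

After conversion they are appended to the labels4 list.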

labels4 = np.concatenate(labels4, 0)
for x in (labels4[:, 1:], *segments4):
    np.clip(x, 0, 2 * s, out=x)

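np.concatenate is the NumPy function that joins multiple arrays along a given axis.

After the loop over the 4 images, the labels of all four are merged. np.clip then limits the label coordinates to [0, 2*s], so that no label extends beyond the canvas.

From the stitching point of view: when part of an image is cropped away and that part contains a foreground object, the converted canvas coordinates of that object are either negative or greater than 2s, i.e. out of bounds. Labels of out-of-bounds objects therefore have to be clipped too. An object entirely out of bounds has its box dropped; an object partly out of bounds keeps a box around the part that is inside the canvas, much like an occluded object where only the visible part is annotated.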
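The effect of the clipping can be seen on a few hand-made boxes. This is a minimal sketch: clip_boxes is a hypothetical helper and the box values are invented for the example. Note that clipping alone only collapses a fully out-of-bounds box to zero area; as far as I know, YOLOv5 discards such degenerate boxes later, during random_perspective.

```python
import numpy as np


def clip_boxes(labels, s):
    """Clip [cls, x1, y1, x2, y2] rows to the (2s, 2s) canvas. Hypothetical helper."""
    labels = labels.copy()
    # labels[:, 1:] is a view, so out= writes back into the copied array
    np.clip(labels[:, 1:], 0, 2 * s, out=labels[:, 1:])
    return labels


boxes = np.array([
    [0, -240.0, 20.0, 100.0, 300.0],       # partly outside on the left
    [1, 1200.0, 1250.0, 1400.0, 1300.0],   # partly outside on the right/bottom
    [2, -300.0, -200.0, -50.0, -10.0],     # entirely outside
])
print(clip_boxes(boxes, s=640))
```

The class column is left alone; only the four coordinate columns are clipped.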

img4, labels4 = random_perspective(img4,
                                   labels4,
                                   segments4,
                                   degrees=self.hyp['degrees'],
                                   translate=self.hyp['translate'],
                                   scale=self.hyp['scale'],
                                   shear=self.hyp['shear'],
                                   perspective=self.hyp['perspective'],
                                   border=self.mosaic_border)  # border to remove

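The stitched image (the filled canvas) and its labels are now ready. Next, the canvas is augmented with random_perspective; because border=self.mosaic_border is passed, its output is (s, s) rather than (2s, 2s).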

1.2 load_image

Loads an image by index and resizes it.

def load_image(self, i):
    im, f, fn = self.ims[i], self.im_files[i], self.npy_files[i],
    # not cached in RAM
    if im is None:
        # if the .npy file exists
        if fn.exists():
            # load it with numpy
            im = np.load(fn)
        else:
            # otherwise read the image directly from its path
            im = cv2.imread(f)
            assert im is not None, f'Image Not Found {f}'
        # original height and width
        h0, w0 = im.shape[:2]
        # ratio of the target size to the longer side of the original image
        r = self.img_size / max(h0, w0)
        # if the ratio is not 1
        if r != 1:
            # bilinear interpolation when augmenting or upscaling, otherwise INTER_AREA for downscaling
            interp = cv2.INTER_LINEAR if (self.augment or r > 1) else cv2.INTER_AREA
            # resize the original image
            im = cv2.resize(im, (int(w0 * r), int(h0 * r)), interpolation=interp)
        # returns the resized image, the original (h, w) and the resized (h, w)
        return im, (h0, w0), im.shape[:2]
    return self.ims[i], self.im_hw0[i], self.im_hw[i]

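im: the image; f: the image path; fn: the path of the .npy file. There are three ways to obtain the image (RAM cache, .npy file, image file); by default it is read from the image path.

How exactly is the original image resized? It is scaled proportionally, using the longest side as the reference:

r = min(640/h, 640/w): the scale ratio r, computed from the longest side of the rectangle.
new_h = r*h; new_w = r*w: the resolution after scaling; the aspect ratio is preserved and the longest side becomes 640 by default.

The benefit of proportional scaling is that objects are not distorted, which matters most for images with extreme aspect ratios. After this step the longest side of every image is 640 and the short side is scaled by the same ratio.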
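The size computation on its own, without touching any pixels, can be sketched like this. resized_shape is a hypothetical helper; the example sizes were picked so that the ratio is exact in floating point.

```python
def resized_shape(h0, w0, s=640):
    """Return the (h, w) an image of size (h0, w0) is resized to. Hypothetical helper."""
    r = s / max(h0, w0)  # same as min(s / h0, s / w0)
    if r != 1:
        return int(h0 * r), int(w0 * r)
    return h0, w0


print(resized_shape(960, 1280))  # downscale, r = 0.5
print(resized_shape(240, 320))   # upscale, r = 2
print(resized_shape(640, 400))   # longest side already 640, unchanged
```

The longest side always comes out as s, and the other side keeps the original aspect ratio (truncated to an integer).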

1.3 xywhn2xyxy

Converts labels from source-image coordinates to canvas coordinates.

def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):
    # Convert nx4 boxes from [x, y, w, h] normalized to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top left x
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top left y
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom right x
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom right y
    return y

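YOLO box labels are normalized by the image size and stored as (x_center, y_center, w, h), i.e. a rectangle given by its center point, width and height.

First, the formulas below convert the box to the two-point form (top-left and bottom-right corners).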

x1 = x_center - w/2

y1 = y_center - h/2

x2 = x_center + w/2

y2 = y_center + h/2

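Then multiply by the image width and height to go from normalized to pixel coordinates, and add the offsets to move into canvas coordinates (in the formulas below, w and h are the image width and height, not the box size).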

x1 = w * x1 + padw

y1 = h * y1 + padh

x2 = w * x2 + padw

y2 = h * y2 + padh
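A worked example with one box, using a NumPy-only copy of the function so that it runs without torch. The name xywhn2xyxy_np and the sample values (a 640x480 image pasted with padw = 100, padh = 50) are assumptions for illustration.

```python
import numpy as np


def xywhn2xyxy_np(x, w=640, h=640, padw=0, padh=0):
    """NumPy-only sketch of xywhn2xyxy: normalized xywh -> pixel xyxy on the canvas."""
    y = np.copy(x)
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top left x
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top left y
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom right x
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom right y
    return y


# one box centered in the image, half as wide and half as tall as the image
box = np.array([[0.5, 0.5, 0.5, 0.5]])
print(xywhn2xyxy_np(box, w=640, h=480, padw=100, padh=50))
```

Step by step: x1 = 640 * (0.5 - 0.25) + 100 = 260, y1 = 480 * (0.5 - 0.25) + 50 = 170, x2 = 640 * 0.75 + 100 = 580, y2 = 480 * 0.75 + 50 = 410.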

2. Introduction to mosaic9 augmentation

With the mosaic4 mechanism understood, much of mosaic9 is the same. This section focuses on how the 9 images are arranged on the canvas.

for i, index in enumerate(indices):
    img, _, (h, w) = self.load_image(index)
    if i == 0:  # center
        img9 = np.full((s * 3, s * 3, img.shape[2]), 114, dtype=np.uint8)  # base image with 9 tiles
        h0, w0 = h, w
        c = s, s, s + w, s + h  # xmin, ymin, xmax, ymax (base) coordinates
    elif i == 1:  # top
        c = s, s - h, s + w, s
    elif i == 2:  # top right
        c = s + wp, s - h, s + wp + w, s
    elif i == 3:  # right
        c = s + w0, s, s + w0 + w, s + h
    elif i == 4:  # bottom right
        c = s + w0, s + hp, s + w0 + w, s + hp + h
    elif i == 5:  # bottom
        c = s + w0 - w, s + h0, s + w0, s + h0 + h
    elif i == 6:  # bottom left
        c = s + w0 - wp - w, s + h0, s + w0 - wp, s + h0 + h
    elif i == 7:  # left
        c = s - w, s + h0 - h, s, s + h0
    elif i == 8:  # top left
        c = s - w, s + h0 - hp - h, s, s + h0 - hp
   
    padx, pady = c[:2]
    x1, y1, x2, y2 = (max(x, 0) for x in c)
    img9[y1:y2, x1:x2] = img[y1 - pady:, x1 - padx:]
    hp, wp = h, w

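On the first image, a canvas of size (3*s, 3*s, channels) filled with 114 is created.

The images are arranged as a 3x3 grid and stitched in the order center -> top -> top right -> right -> bottom right -> bottom -> bottom left -> left -> top left.

Below, wn and hn denote the width and height of the n-th image.

The first image goes in the middle, with top-left corner (s, s) and bottom-right corner (s+w1, s+h1). Images are again rescaled on loading so that the longest side is s and the short side is at most s; each grid cell is (s, s), so no cropping is needed here and the image is pasted whole.

The second image goes directly above: top-left (s, s-h2), bottom-right (s+w2, s).

The third image goes top right: top-left (s+w2, s-h3), bottom-right (s+w2+w3, s).

The fourth image goes to the right: top-left (s+w1, s), bottom-right (s+w1+w4, s+h4).

The rest follow the same pattern. The complete mosaic is shown below; the tiles join seamlessly.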
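The placement rules can be checked without any images by computing only the rectangles. In this sketch mosaic9_layout is a hypothetical helper and the image sizes are made up; the nine formulas are copied from the code above.

```python
def mosaic9_layout(sizes, s):
    """Return the (xmin, ymin, xmax, ymax) canvas rectangle of each tile.

    sizes: nine (h, w) pairs in stitching order. Hypothetical helper.
    """
    rects = []
    hp, wp = -1, -1  # height and width of the previous image
    for i, (h, w) in enumerate(sizes):
        if i == 0:  # center
            h0, w0 = h, w
            c = s, s, s + w, s + h
        elif i == 1:  # top
            c = s, s - h, s + w, s
        elif i == 2:  # top right
            c = s + wp, s - h, s + wp + w, s
        elif i == 3:  # right
            c = s + w0, s, s + w0 + w, s + h
        elif i == 4:  # bottom right
            c = s + w0, s + hp, s + w0 + w, s + hp + h
        elif i == 5:  # bottom
            c = s + w0 - w, s + h0, s + w0, s + h0 + h
        elif i == 6:  # bottom left
            c = s + w0 - wp - w, s + h0, s + w0 - wp, s + h0 + h
        elif i == 7:  # left
            c = s - w, s + h0 - h, s, s + h0
        else:  # top left
            c = s - w, s + h0 - hp - h, s, s + h0 - hp
        rects.append(c)
        hp, wp = h, w
    return rects


# nine square images tile the 3s x 3s canvas exactly
for rect in mosaic9_layout([(640, 640)] * 9, s=640):
    print(rect)
```

With nine s x s images the rectangles are exactly the nine grid cells. With smaller or non-square images the tiles still hang off the center image in the same order, and the rest of the canvas stays at the fill value 114.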

yc, xc = (int(random.uniform(0, s)) for _ in self.mosaic_border)
img9 = img9[yc:yc + 2 * s, xc:xc + 2 * s]

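That is not the end, though: the stitched canvas is processed further.

xc, yc is a randomly generated point (the code calls it the mosaic center) with each coordinate in [0, s), i.e. inside the top-left cell.

Starting from this random point in the first cell, a (2*s, 2*s) rectangle is taken toward the bottom right. That rectangle is the final image, so this is in effect another crop.

In the example figure, the red box is the img9 that is kept; its top-left corner is the randomly chosen point and its side length is 2s.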
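A minimal sketch of this final crop, including the matching shift of the labels that the demo code in section 3 applies. crop_mosaic9 is a hypothetical helper, and the tiny s = 4 canvas and box values are invented so the numbers are easy to follow.

```python
import numpy as np


def crop_mosaic9(img9, labels9, s, xc, yc):
    """Crop a (3s, 3s) canvas to (2s, 2s) and shift [cls, x1, y1, x2, y2] labels.

    Hypothetical helper for illustration.
    """
    img = img9[yc:yc + 2 * s, xc:xc + 2 * s]
    labels = labels9.copy()
    labels[:, [1, 3]] -= xc  # x coordinates move left by xc
    labels[:, [2, 4]] -= yc  # y coordinates move up by yc
    np.clip(labels[:, 1:], 0, 2 * s, out=labels[:, 1:])
    return img, labels


s = 4
img9 = np.zeros((3 * s, 3 * s, 3), dtype=np.uint8)
labels9 = np.array([[0, 5.0, 5.0, 9.0, 9.0],
                    [1, 0.0, 0.0, 1.0, 1.0]])
img, labels = crop_mosaic9(img9, labels9, s, xc=1, yc=2)
print(img.shape)
print(labels)
```

The second box sits in the strip that the crop removes, so after shifting and clipping it collapses to zero area.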

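3. Showing the mosaic augmentation

How do you run mosaic4 or mosaic9 and display the stitched image? The code is below; feel free to copy it.

1. Instantiating the mosaic class takes these parameters:

img_root: the directory that holds all the images, laid out as follows.

img_dir/
    img1.jpg
    img2.jpg
    .....

label_root: the directory that holds the labels. Mine are YOLO-format labels (txt files); the directory layout is the same as above.

img_size: the size each image is scaled to, 640 by default.

2. Calling the img_show method takes these parameters:

index: the index of the image to stitch; it must be less than the total number of images.

type: 4 or 9, for a four-image or nine-image mosaic.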

import argparse
import os
import random
import numpy as np
import cv2
import torch

class mosaic():
    def __init__(self,img_root,label_root,img_size):
        self.img_size = img_size
        self.label_root = label_root
        self.mosaic_border = [-self.img_size // 2, -self.img_size // 2]  # integer border, as in YOLOv5
        self.img_name = os.listdir(img_root)
        self.indices = range(len(self.img_name))
        self.im_files = [os.path.join(img_root,x) for x in self.img_name]
        self.labels = []
        self.gener_labels()

    def gener_labels(self):
        for n in self.img_name:
            label = []
            t = os.path.splitext(n)[0] + ".txt"  # label file shares the image's base name
            label_file = os.path.join(self.label_root,t)
            with open(label_file,"r") as f:
                for l in f:
                    l = l.split()
                    if l:  # skip blank lines
                        label.append([float(v) for v in l])
            # keep shape (n, 5) even when the file has no boxes
            self.labels.append(np.array(label, dtype=float).reshape(-1, 5))

    def load_image(self,index):
        img_file = self.im_files[index]
        im = cv2.imread(img_file)  # BGR
        assert im is not None, f'Image Not Found {img_file}'
        h0, w0 = im.shape[:2]
        r = self.img_size / max(h0, w0)
        if r != 1:
            interp = cv2.INTER_LINEAR if (r > 1) else cv2.INTER_AREA
            im = cv2.resize(im, (int(w0 * r), int(h0 * r)), interpolation=interp)
        return im, (h0, w0), im.shape[:2]  # im, hw_original, hw_resized

    def xywhn2xyxy(self,label, w, h, padw, padh):
        y = label.clone() if isinstance(label, torch.Tensor) else np.copy(label)
        y[:, 0] = w * (label[:, 0] - label[:, 2] / 2) + padw  # top left x
        y[:, 1] = h * (label[:, 1] - label[:, 3] / 2) + padh  # top left y
        y[:, 2] = w * (label[:, 0] + label[:, 2] / 2) + padw  # bottom right x
        y[:, 3] = h * (label[:, 1] + label[:, 3] / 2) + padh  # bottom right y
        return y

    def box_candidates(self,box1, box2, wh_thr=2, ar_thr=100, area_thr=0.1, eps=1e-16):
        w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
        w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
        ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # aspect ratio
        return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)  # candidates

    def load_mosaic4(self, index):
        labels4 = []
        s = self.img_size
        indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices
        random.shuffle(indices)
        yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)
        for i, index in enumerate(indices):
            img, _, (h, w) = self.load_image(index)
            if i == 0:  # top left
                img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
                x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
                x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
            elif i == 1:  # top right
                x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
                x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
            elif i == 2:  # bottom left
                x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
                x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
            elif i == 3:  # bottom right
                x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
                x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)

            img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]
            padw = x1a - x1b
            padh = y1a - y1b
            labels = self.labels[index].copy()
            if labels.size:
                labels[:, 1:] = self.xywhn2xyxy(labels[:, 1:], w, h, padw, padh)  # normalized xywh to pixel xyxy format
            labels4.append(labels)
        # Concat/clip labels
        labels4 = np.concatenate(labels4, 0)
        for x in (labels4[:, 1:]):
            np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()
        return img4, labels4

    def load_mosaic9(self,index):
        labels9 = []
        s = self.img_size
        indices = [index] + random.choices(self.indices, k=8)  # 8 additional image indices
        random.shuffle(indices)
        hp, wp = -1, -1
        for i, index in enumerate(indices):
            img, _, (h, w) = self.load_image(index)
            if i == 0:  # center
                img9 = np.full((s * 3, s * 3, img.shape[2]), 114, dtype=np.uint8)  # base image with 9 tiles
                h0, w0 = h, w
                c = s, s, s + w, s + h  # xmin, ymin, xmax, ymax (base) coordinates
            elif i == 1:  # top
                c = s, s - h, s + w, s
            elif i == 2:  # top right
                c = s + wp, s - h, s + wp + w, s
            elif i == 3:  # right
                c = s + w0, s, s + w0 + w, s + h
            elif i == 4:  # bottom right
                c = s + w0, s + hp, s + w0 + w, s + hp + h
            elif i == 5:  # bottom
                c = s + w0 - w, s + h0, s + w0, s + h0 + h
            elif i == 6:  # bottom left
                c = s + w0 - wp - w, s + h0, s + w0 - wp, s + h0 + h
            elif i == 7:  # left
                c = s - w, s + h0 - h, s, s + h0
            elif i == 8:  # top left
                c = s - w, s + h0 - hp - h, s, s + h0 - hp
            padx, pady = c[:2]
            x1, y1, x2, y2 = (max(x, 0) for x in c)
            labels = self.labels[index].copy()
            if labels.size:
                labels[:, 1:] = self.xywhn2xyxy(labels[:, 1:], w, h, padx, pady)
            labels9.append(labels)
            img9[y1:y2, x1:x2] = img[y1 - pady:, x1 - padx:]
            hp, wp = h, w
        yc, xc = (int(random.uniform(0, s)) for _ in self.mosaic_border)  # mosaic center x, y
        img9 = img9[yc:yc + 2 * s, xc:xc + 2 * s]
        labels9 = np.concatenate(labels9, 0)
        labels9[:, [1, 3]] -= xc
        labels9[:, [2, 4]] -= yc

        for x in (labels9[:, 1:]):
            np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()

        return img9, labels9

    def img_show(self,index,type):
        if type == 4:
            img,label = self.load_mosaic4(index)
        elif type == 9:
            img,label = self.load_mosaic9(index)
        else:
            raise ValueError("type must be 4 or 9")
        for box in label.tolist():
            x1,y1,x2,y2 = box[1:]
            img = cv2.rectangle(img,(int(x1),int(y1)),(int(x2),int(y2)),color=(0,0,255),thickness=2)
        cv2.imwrite("zhanshi.jpg",img)
        cv2.imshow("img",img.astype(np.uint8))
        cv2.waitKey(0)


def parse_opt():
    parser = argparse.ArgumentParser()
    # raw strings keep the Windows backslashes literal
    parser.add_argument('--img_root', type=str, default=r'F:\Amode\yolov5-7.0\data\mydata\images')
    parser.add_argument('--label_root', type=str, default=r'F:\Amode\yolov5-7.0\data\mydata\labels')
    parser.add_argument('--img_size', type=int, default=640, help='train, val image size (pixels)')
    return parser.parse_known_args()[0]

if __name__ == "__main__":
    opt = parse_opt()
    m = mosaic(img_root=opt.img_root,label_root=opt.label_root,img_size=opt.img_size)
    m.img_show(0,9)

The mosaic4 result is shown below. I drew the labels on the canvas to verify that the label conversion is correct.