Image Data Augmentation (with Automatically Adjusted Bounding Boxes)

Latest recommended article published on 2023-10-21 17:12:51

大大大管的笔记本



Tags: python, deep learning, object detection, image processing, object tracking

Article link: https://blog.csdn.net/cg1135217680/article/details/128982550


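I'm sure many of you, when training a convolutional neural network, feel that your dataset is too small. Managing to take a few hundred photos is usually already pretty good, so how do we enlarge the dataset? There are many ways. The most direct one is to go and take more photos (just kidding). Of course there are plenty of other options, such as web scraping or extracting images from other people's datasets. One of them is to apply data augmentation to the dataset you already have. To a neural network, the same picture with or without occlusion, with the object in the top-left or the top-right corner, sharper or blurrier, is in each case a new image that has to be learned.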

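Typical augmentations for an image include warping, flipping, color-space changes, added noise, shifting the image, and so on. For object detection this raises a small problem: after augmenting an image I do get a new picture, but the bounding boxes I labeled earlier no longer match it, so I have to label it again. If I only generate a few images, or a few dozen, that is a minor nuisance. But what if I want to double a dataset of several hundred images? That is far too much work. So is there a way to skip the relabeling, where you just run a program and, whoosh, out comes a pile of images together with their bounding boxes?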

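Of course there is. Among the augmentations we apply, only the geometric ones (cropping, shifting, scaling, warping, flipping) change the bounding boxes; noise and color changes leave them where they are. So all we need to do is transform the box coordinates at the same time as the image. This post presents one such augmentation method, which also generates the bounding box annotations (.xml files) automatically.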

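Before running the program you need to generate a 2007_train.txt file. For where it comes from, see any of the train-on-your-own-dataset object detection projects on the blog of our "mentor" Bubbliiiing (Bubbliiiing的博客_CSDN博客-神经网络学习小记录,睿智的目标检测,有趣的数据结构算法领域博主). It is usually obtained by running voc_annotation.py.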
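The parsing code in the full script further down implies that each line of 2007_train.txt holds an image path followed by space-separated boxes of the form x_min,y_min,x_max,y_max,class_id. A minimal sketch of reading one such line, assuming that format; the function name `parse_annotation_line` and the sample path are made up for illustration:

```python
def parse_annotation_line(line):
    """Split one annotation line into an image path and a list of boxes.

    Assumed format: "<image path> x_min,y_min,x_max,y_max,class_id ...",
    where the path contains no spaces.
    """
    parts = line.split()
    image_path = parts[0]
    boxes = [tuple(int(v) for v in item.split(",")) for item in parts[1:]]
    return image_path, boxes


# Hypothetical example line describing two boxes.
path, boxes = parse_annotation_line(
    "VOCdevkit/VOC2007/JPEGImages/0001.jpg 48,240,195,371,0 8,12,352,498,1"
)
print(path)
print(boxes)
```

Printing a parsed line like this is a quick way to check that the file has the expected layout before starting a long augmentation run.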

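Next come the augmentation steps used in my script. The pipeline is: crop -> add Gaussian noise -> random occlusion -> scale the image and distort its width and height -> shift the image -> flip the image -> color distortion. Every step is randomized. Here is the code for the random crop. (In the code below, iw and ih are the width and height of the original image, and nw and nh are the width and height of the new image.)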

# Random crop
image = np.asarray(image)
x_min, y_min = min(box[:, 0]), min(box[:, 1])  # smallest and largest x/y over all boxes
x_max, y_max = max(box[:, 2]), max(box[:, 3])
rx_min, ry_min = abs(int(rand(0, x_min))), abs(int(rand(0, y_min)))
# iw is the image width and ih the height, so x pairs with iw and y with ih
rx_max, ry_max = abs(int(rand(0, iw - x_max))), abs(int(rand(0, ih - y_max)))
image = image[ry_min:(y_max + ry_max), rx_min:(x_max + rx_max)]
image = Image.fromarray(image)

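The random crop has to stay outside all of the bounding boxes so that no box is cut. (Some of you may wonder whether cutting into a box is allowed. It is, but the amount is hard to control: crop too much and even a person can hardly recognize the object.)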

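After cropping, the box coordinates must be updated. This is easy: just subtract the randomly generated minimum x and y, which are the coordinates of the cropped image's top-left corner in the original image. Code: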

# After the random crop, update the values in box
box[:, 0] = box[:, 0] - rx_min
box[:, 1] = box[:, 1] - ry_min
box[:, 2] = box[:, 2] - rx_min
box[:, 3] = box[:, 3] - ry_min
iw, ih = image.size  # update the image size
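The crop and the box update can also be packaged as one function, which makes the invariant easy to check: every box must still lie inside the cropped image, with its width and height unchanged. The following is a rough sketch on NumPy arrays rather than the exact code above; `random_crop_keep_boxes` and its `rng` parameter are names introduced here for illustration.

```python
import numpy as np


def random_crop_keep_boxes(image, boxes, rng=None):
    """Crop an H x W (x C) array without cutting any box and shift the boxes.

    boxes: integer array of shape (N, 5), rows [x_min, y_min, x_max, y_max, class_id].
    """
    rng = np.random.default_rng() if rng is None else rng
    ih, iw = image.shape[:2]
    x_min, y_min = boxes[:, 0].min(), boxes[:, 1].min()
    x_max, y_max = boxes[:, 2].max(), boxes[:, 3].max()

    # The crop window must contain the union of all boxes.
    left = int(rng.integers(0, x_min + 1))
    top = int(rng.integers(0, y_min + 1))
    right = int(rng.integers(x_max, iw + 1))
    bottom = int(rng.integers(y_max, ih + 1))

    cropped = image[top:bottom, left:right]
    shifted = boxes.copy()
    shifted[:, [0, 2]] -= left  # the new origin is the crop's top-left corner
    shifted[:, [1, 3]] -= top
    return cropped, shifted


image = np.zeros((100, 200, 3), dtype=np.uint8)  # height 100, width 200
boxes = np.array([[50, 20, 120, 80, 0], [60, 30, 150, 90, 1]])
cropped, new_boxes = random_crop_keep_boxes(image, boxes)
print(cropped.shape, new_boxes)
```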

All of the following steps operate on the cropped image. The code below adds Gaussian noise; other kinds of noise could be used here as well. Code:

# Add Gaussian noise
image = np.asarray(image)/255  # scale pixel values to [0, 1]
mean = rand()*0.5   # noise mean, random in [0, 0.5)
sigma = rand()*0.5  # noise standard deviation, random in [0, 0.5)
noise = np.random.normal(mean, sigma, image.shape).astype(dtype=np.float32)  # generate the Gaussian noise
output = image + noise
output = np.clip(output, 0, 1)
image = np.uint8(output*255)
image = Image.fromarray(image)
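Because the mean above is drawn from [0, 0.5), the noise also brightens the image on average. If only graininess is wanted, a zero-mean variant is one option. The sketch below is such a variant and not the code used in this post; `add_gaussian_noise` and its default values are assumptions chosen for illustration.

```python
import numpy as np


def add_gaussian_noise(image, mean=0.0, sigma=0.05, rng=None):
    """Return a uint8 copy of `image` with Gaussian noise added.

    mean and sigma are expressed on the [0, 1] scale, as in the code above.
    """
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(image, dtype=np.float32) / 255.0
    noise = rng.normal(mean, sigma, scaled.shape)
    noisy = np.clip(scaled + noise, 0.0, 1.0)
    # Round before casting so that values are not truncated downward.
    return np.round(noisy * 255).astype(np.uint8)


image = np.full((4, 4, 3), 128, dtype=np.uint8)
print(add_gaussian_noise(image, sigma=0.1)[0, 0])
```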

Code for the random occlusion:

# Random occlusion
n = int(rand(0, 5))  # number of occluding blocks, 0 to 4
for num in range(n):
    n_image = image.copy()
    color_r = int(rand(0, 255))
    color_g = int(rand(0, 255))
    color_b = int(rand(0, 255))
    img = Image.new('RGB', (iw, ih), (color_r, color_g, color_b))
    # i and j are the width and height of the occluding block
    i = int(rand(0, 100))
    j = int(rand(0, 100))
    img2 = img.crop((0, 0, i, j))
    # top-left corner of the block, kept inside the image even when it is small
    x1 = int(rand(0, max(iw - i, 1)))
    y1 = int(rand(0, max(ih - j, 1)))
    n_image.paste(img2, (x1, y1))
    image = n_image
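The same idea can be written directly on a NumPy array, which avoids building a full-size color image for every block. This is only a sketch under the assumption of an H x W x 3 RGB array; `random_occlude`, `max_blocks` and `max_size` are illustrative names, not part of the script in this post.

```python
import numpy as np


def random_occlude(image, max_blocks=4, max_size=100, rng=None):
    """Paint up to `max_blocks` solid rectangles of random color onto a copy of `image`."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(image, copy=True)
    ih, iw = out.shape[:2]
    for _ in range(int(rng.integers(0, max_blocks + 1))):
        # Block size, never larger than the image itself.
        bw = int(rng.integers(1, min(max_size, iw) + 1))
        bh = int(rng.integers(1, min(max_size, ih) + 1))
        # Top-left corner chosen so the block stays fully inside the image.
        x0 = int(rng.integers(0, iw - bw + 1))
        y0 = int(rng.integers(0, ih - bh + 1))
        out[y0:y0 + bh, x0:x0 + bw] = rng.integers(0, 256, size=3, dtype=np.uint8)
    return out


image = np.zeros((120, 160, 3), dtype=np.uint8)
print(random_occlude(image).shape)
```

Occlusion does not move anything, so the bounding boxes need no update after this step.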

The next step scales the cropped image and distorts it along its width and height. Code:

# Scale the image and distort its width and height
new_ar = w / h * rand(1 - jitter, 1 + jitter) / rand(1 - jitter, 1 + jitter)
scale = rand(.2, 3)
if new_ar < 1:
    nh = int(scale * h)
    nw = int(nh * new_ar)
else:
    nw = int(scale * w)
    nh = int(nw / new_ar)
# Fit the distorted image inside the w x h target (w == h in the full script)
if nh >= nw:
    image = image.resize((int(nw * w / nh), h), Image.BICUBIC)  # full height, reduced width
else:
    image = image.resize((w, int(nh * w / nw)), Image.BICUBIC)  # full width, reduced height

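Below is the code that shifts the image. Note that after the shift, the smallest coordinate of the regenerated boxes must not be less than 0, and the largest must not exceed the target image size. The loop keeps drawing random offsets until that holds. Code: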

# Pad the unused part of the canvas with gray bars
while True:
    dx = int(rand(0, w - nw))
    dy = int(rand(0, h - nh))
    if nh >= nw:
        xmax = max(box[:, 2]) * int(nw * w / nh) / iw + dx  # largest x of the transformed boxes
        ymax = max(box[:, 3]) * h / ih + dy  # largest y of the transformed boxes
        xmin = min(box[:, 0]) * int(nw * w / nh) / iw + dx  # smallest x of the transformed boxes
        ymin = min(box[:, 1]) * h / ih + dy  # smallest y of the transformed boxes
    else:
        xmax = max(box[:, 2]) * w / iw + dx  # largest x of the transformed boxes
        ymax = max(box[:, 3]) * int(nh * w / nw) / ih + dy  # largest y of the transformed boxes
        xmin = min(box[:, 0]) * w / iw + dx  # smallest x of the transformed boxes
        ymin = min(box[:, 1]) * int(nh * w / nw) / ih + dy  # smallest y of the transformed boxes
    if xmax <= w and ymax <= h and xmin >= 0 and ymin >= 0:
        break
new_image = Image.new('RGB', (w, h), (128, 128, 128))
new_image.paste(image, (dx, dy))
image = new_image
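The box arithmetic in this loop is the same linear map in both branches: a coordinate x in the cropped image becomes x * rw / iw + dx on the canvas, and y becomes y * rh / ih + dy, where (rw, rh) is the size the image was resized to. A small sketch of that map together with the inside-the-canvas test follows; `transform_boxes`, `rw` and `rh` are names introduced here and do not appear in the script.

```python
import numpy as np


def transform_boxes(boxes, src_size, resized_size, offset, canvas_size):
    """Map boxes from the source image onto the padded canvas.

    src_size=(iw, ih), resized_size=(rw, rh), offset=(dx, dy), canvas_size=(w, h).
    Returns the transformed boxes (as floats) and whether all of them fit on the canvas.
    """
    iw, ih = src_size
    rw, rh = resized_size
    dx, dy = offset
    w, h = canvas_size
    out = boxes.astype(np.float64)
    out[:, [0, 2]] = out[:, [0, 2]] * rw / iw + dx  # scale, then shift along x
    out[:, [1, 3]] = out[:, [1, 3]] * rh / ih + dy  # scale, then shift along y
    inside = bool(
        out[:, 0].min() >= 0 and out[:, 1].min() >= 0
        and out[:, 2].max() <= w and out[:, 3].max() <= h
    )
    return out, inside


boxes = np.array([[10, 20, 110, 220, 0]])
moved, ok = transform_boxes(boxes, (200, 400), (208, 416), (100, 0), (416, 416))
print(moved, ok)
```

An offset is acceptable only when the second return value is true, which is the condition the while loop above waits for.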

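Flipping and color distortion are simpler, so I have put the two together. If the image was flipped, though, don't forget to update the box positions afterwards. Code: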

# Flip the image horizontally
flip = rand() < .5
if flip:
    image = image.transpose(Image.FLIP_LEFT_RIGHT)

# Color (HSV) distortion
hue = rand(-hue, hue)
sat = rand(1, sat) if rand() < .5 else 1 / rand(1, sat)
val = rand(1, val) if rand() < .5 else 1 / rand(1, val)
x = rgb_to_hsv(np.array(image) / 255.)
x[..., 0] += hue
x[..., 0][x[..., 0] > 1] -= 1
x[..., 0][x[..., 0] < 0] += 1
x[..., 1] *= sat
x[..., 2] *= val
x[x > 1] = 1
x[x < 0] = 0
image_data = hsv_to_rgb(x)  # numpy array, 0 to 1
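For the box update after a horizontal flip, the left and right edges swap roles: the new x_min is the image width minus the old x_max, and the new x_max is the width minus the old x_min. The y coordinates do not change. A minimal sketch, with `flip_boxes_horizontally` as an illustrative name:

```python
import numpy as np


def flip_boxes_horizontally(boxes, width):
    """Mirror boxes [x_min, y_min, x_max, y_max, ...] across the vertical center line."""
    flipped = boxes.copy()
    flipped[:, 0] = width - boxes[:, 2]  # old right edge becomes the new left edge
    flipped[:, 2] = width - boxes[:, 0]  # old left edge becomes the new right edge
    return flipped


boxes = np.array([[10, 20, 110, 220, 0]])
print(flip_boxes_horizontally(boxes, 416))  # [[306  20 406 220   0]]
```

Applying the function twice returns the original boxes, which is a handy sanity check.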

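Below is the complete code. A few paths need to be changed: where 2007_train.txt is opened, the path written into the generated .xml file, and where the augmented image is saved (lines 157, 175 and 210 of the author's original script; in the listing here they are marked with comments).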

from PIL import Image, ImageOps
import numpy as np
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb


def rand(a=0, b=1):  # return a random float in [a, b)
    return np.random.rand() * (b - a) + a


def get_random_data(annotation_line, input_shape, random=True, max_boxes=20, jitter=.5, hue=.1, sat=1.5, val=1.5,
                    proc_img=True):
    '''random preprocessing for real-time data augmentation'''
    line = annotation_line.split()
    image = Image.open(line[0])  # open the image
    image = ImageOps.exif_transpose(image)
    img_name = line[0].split('/')
    img_name = img_name[-1].split('.')
    img_name = img_name[0]  # image name without directory or extension
    iw, ih = image.size  # image width and height
    h, w = input_shape  # target image size
    box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]])  # all boxes in the image

    # Random crop
    image = np.asarray(image)
    x_min, y_min = min(box[:, 0]), min(box[:, 1])  # smallest and largest x/y over all boxes
    x_max, y_max = max(box[:, 2]), max(box[:, 3])
    rx_min, ry_min = abs(int(rand(0, x_min))), abs(int(rand(0, y_min)))
    # iw is the image width and ih the height, so x pairs with iw and y with ih
    rx_max, ry_max = abs(int(rand(0, iw - x_max))), abs(int(rand(0, ih - y_max)))
    image = image[ry_min:(y_max + ry_max), rx_min:(x_max + rx_max)]
    image = Image.fromarray(image)

    # After the random crop, update the values in box
    box[:, 0] = box[:, 0] - rx_min
    box[:, 1] = box[:, 1] - ry_min
    box[:, 2] = box[:, 2] - rx_min
    box[:, 3] = box[:, 3] - ry_min
    iw, ih = image.size  # update the image size

    # Add Gaussian noise
    image = np.asarray(image)/255  # scale pixel values to [0, 1]
    mean = rand()*0.5   # noise mean, random in [0, 0.5)
    sigma = rand()*0.5  # noise standard deviation, random in [0, 0.5)
    noise = np.random.normal(mean, sigma, image.shape).astype(dtype=np.float32)  # generate the Gaussian noise
    output = image + noise
    output = np.clip(output, 0, 1)
    image = np.uint8(output*255)
    image = Image.fromarray(image)

    # Random occlusion
    n = int(rand(0, 5))  # number of occluding blocks, 0 to 4
    for num in range(n):
        n_image = image.copy()
        color_r = int(rand(0, 255))
        color_g = int(rand(0, 255))
        color_b = int(rand(0, 255))
        img = Image.new('RGB', (iw, ih), (color_r, color_g, color_b))
        # i and j are the width and height of the occluding block
        i = int(rand(0, 100))
        j = int(rand(0, 100))
        img2 = img.crop((0, 0, i, j))
        # top-left corner of the block, kept inside the image even when it is small
        x1 = int(rand(0, max(iw - i, 1)))
        y1 = int(rand(0, max(ih - j, 1)))
        n_image.paste(img2, (x1, y1))
        image = n_image

    # Scale the image and distort its width and height
    new_ar = w / h * rand(1 - jitter, 1 + jitter) / rand(1 - jitter, 1 + jitter)
    scale = rand(.2, 3)
    if new_ar < 1:
        nh = int(scale * h)
        nw = int(nh * new_ar)
    else:
        nw = int(scale * w)
        nh = int(nw / new_ar)
    # Fit the distorted image inside the w x h target (w == h in the main block below)
    if nh >= nw:
        image = image.resize((int(nw * w / nh), h), Image.BICUBIC)  # full height, reduced width
    else:
        image = image.resize((w, int(nh * w / nw)), Image.BICUBIC)  # full width, reduced height

    # Pad the unused part of the canvas with gray bars
    while True:
        dx = int(rand(0, w - nw))
        dy = int(rand(0, h - nh))
        if nh >= nw:
            xmax = max(box[:, 2]) * int(nw * w / nh) / iw + dx  # largest x of the transformed boxes
            ymax = max(box[:, 3]) * h / ih + dy  # largest y of the transformed boxes
            xmin = min(box[:, 0]) * int(nw * w / nh) / iw + dx  # smallest x of the transformed boxes
            ymin = min(box[:, 1]) * h / ih + dy  # smallest y of the transformed boxes
        else:
            xmax = max(box[:, 2]) * w / iw + dx  # largest x of the transformed boxes
            ymax = max(box[:, 3]) * int(nh * w / nw) / ih + dy  # largest y of the transformed boxes
            xmin = min(box[:, 0]) * w / iw + dx  # smallest x of the transformed boxes
            ymin = min(box[:, 1]) * int(nh * w / nw) / ih + dy  # smallest y of the transformed boxes
        # Accept the offset only if every box stays inside the canvas
        if xmax <= w and ymax <= h and xmin >= 0 and ymin >= 0:
            break
    new_image = Image.new('RGB', (w, h), (128, 128, 128))
    new_image.paste(image, (dx, dy))
    image = new_image

    # Flip the image horizontally
    flip = rand() < .5
    if flip:
        image = image.transpose(Image.FLIP_LEFT_RIGHT)

    # Color (HSV) distortion
    hue = rand(-hue, hue)
    sat = rand(1, sat) if rand() < .5 else 1 / rand(1, sat)
    val = rand(1, val) if rand() < .5 else 1 / rand(1, val)
    x = rgb_to_hsv(np.array(image) / 255.)
    x[..., 0] += hue
    x[..., 0][x[..., 0] > 1] -= 1
    x[..., 0][x[..., 0] < 0] += 1
    x[..., 1] *= sat
    x[..., 2] *= val
    x[x > 1] = 1
    x[x < 0] = 0
    image_data = hsv_to_rgb(x)  # numpy array, 0 to 1

    # Apply the same scale, shift and flip to the boxes
    box_data = np.zeros((len(box), 5))
    if len(box) > 0:
        # Keep the boxes in file order: the xml writer matches them to <object> entries by position
        if nh >= nw:
            box[:, [0, 2]] = box[:, [0, 2]] * int(nw * w / nh) / iw + dx  # new x coordinates
            box[:, [1, 3]] = box[:, [1, 3]] * h / ih + dy  # new y coordinates
        else:
            box[:, [0, 2]] = box[:, [0, 2]] * w / iw + dx  # new x coordinates
            box[:, [1, 3]] = box[:, [1, 3]] * int(nh * w / nw) / ih + dy  # new y coordinates
        if flip:
            box[:, [0, 2]] = w - box[:, [2, 0]]  # after a flip, new x = total width - old x
        box[:, 0:2][box[:, 0:2] < 0] = 0
        box[:, 2][box[:, 2] > w] = w
        box[:, 3][box[:, 3] > h] = h  # clamp boxes that extend past the right or bottom edge
        box_w = box[:, 2] - box[:, 0]
        box_h = box[:, 3] - box[:, 1]
        box = box[np.logical_and(box_w > 1, box_h > 1)]  # drop invalid boxes no more than 1 pixel wide or tall
        if len(box) > max_boxes:
            box = box[:max_boxes]  # at most max_boxes (20 by default) boxes per image
        box_data[:len(box)] = box  # rows left at zero are boxes that were dropped

    return image_data, box_data, img_name


def normal_(annotation_line, input_shape):
    '''load an image and its boxes without any augmentation'''
    line = annotation_line.split()
    img_name = line[0].split('/')
    img_name = img_name[-1].split('.')[0]  # image name without directory or extension
    image = Image.open(line[0])
    box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]])
    return image, box, img_name  # image, ground-truth boxes, image name


if __name__ == "__main__":
    num_images = 1  # number of augmented images to generate
    img_shape = 416  # side length of the augmented image

    with open("./set/2007_train.txt") as f:  # change this path
        lines = f.readlines()

    for i in range(num_images):
        line = lines[np.random.randint(0, len(lines))]  # pick a random image

        # Augment the image, then build the matching xml file.
        image_data, box_data, img_name = get_random_data(line, [img_shape, img_shape])
        img = Image.fromarray((image_data * 255).astype(np.uint8))

        with open('./set/ann/%s.xml' % img_name) as img_xml:  # original xml file
            xml_lines = img_xml.readlines()

        # The fixed line indices below assume the layout of the original xml files.
        xml_lines[2] = '\t<filename>%s%s.jpg</filename>\n' % (i, img_name)
        # ------------- change this path --------------
        xml_lines[3] = '\t<path>E:/program/yolov4/VOCdevkit/VOC2007/JPEGImages/%s%s.jpg</path>\n' % (i, img_name)

        xml_lines[8] = '\t\t<width>%d</width>\n' % img_shape
        xml_lines[9] = '\t\t<height>%d</height>\n' % img_shape

        # Find the first line of every <bndbox> block.
        start_index = [k for k, text in enumerate(xml_lines) if text == '\t\t<bndbox>\n']

        # Overwrite each 6-line <bndbox> block with the transformed coordinates.
        for start, (left, top, right, bottom) in zip(start_index, box_data[:, 0:4]):
            xml_lines[start: start + 6] = [
                '\t\t<bndbox>\n',
                '\t\t\t<xmin>%s</xmin>\n' % int(left),
                '\t\t\t<ymin>%s</ymin>\n' % int(top),
                '\t\t\t<xmax>%s</xmax>\n' % int(right),
                '\t\t\t<ymax>%s</ymax>\n' % int(bottom),
                '\t\t</bndbox>\n',
            ]

        with open('./set/ann1/%s%s.xml' % (i, img_name), 'w') as save_xml:  # output xml file
            save_xml.writelines(xml_lines)

        # Save the augmented image
        img.save('./set/img1/%s%s.jpg' % (i, img_name))  # change this path
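The main block edits the .xml file by line position, which works as long as every annotation file has exactly the same line layout. A less layout-dependent alternative is to parse the file with the standard library's `xml.etree.ElementTree`. The sketch below is not the method used in this post; it assumes the usual Pascal VOC tag names (filename, size/width, size/height, object/bndbox), and `update_voc_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def update_voc_xml(xml_text, new_filename, size, boxes):
    """Return VOC xml text with filename, size and bndbox values replaced.

    size: (width, height) of the augmented image.
    boxes: one (x_min, y_min, x_max, y_max) tuple per <object>, in file order.
    """
    root = ET.fromstring(xml_text)
    root.find("filename").text = new_filename
    root.find("size/width").text = str(size[0])
    root.find("size/height").text = str(size[1])

    objects = root.findall("object")
    if len(objects) != len(boxes):
        raise ValueError("need exactly one box per <object>")
    for obj, coords in zip(objects, boxes):
        bndbox = obj.find("bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), coords):
            bndbox.find(tag).text = str(int(value))
    return ET.tostring(root, encoding="unicode")


# Made-up annotation with a single object.
sample = (
    "<annotation><filename>a.jpg</filename>"
    "<size><width>10</width><height>20</height><depth>3</depth></size>"
    "<object><name>cat</name><bndbox><xmin>1</xmin><ymin>2</ymin>"
    "<xmax>3</xmax><ymax>4</ymax></bndbox></object></annotation>"
)
print(update_voc_xml(sample, "0a.jpg", (416, 416), [(5, 6, 7, 8)]))
```

Because each box is written into its own object element, the class name stored in that element stays attached to the right coordinates as long as the boxes are passed in file order.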

Here are a few images for a quick comparison: