The previous two posts introduced commonly used data augmentation methods. How do we actually use those methods inside a dataset class? This post implements a concrete object-detection dataset class that applies data augmentation.
The usual way to design such a custom dataset class is: first read the images and labels separately, then build the simplest possible dataset class that can return a single sample, and finally add the augmentation methods, combining them into a sensible augmentation pipeline according to the characteristics of each method and how they interact with one another. Following these steps makes it fairly straightforward to implement your own dataset class.
The dataset class in this post follows the augmentation methods used by the datasets module of YOLOv5-5.0, applied in the same order, and implements a dataset class that reads the VOC dataset; a dataset class for the COCO dataset works the same way.
The basic flow is as follows:
- First decide whether to apply mosaic augmentation. If so, obtain the mosaic-augmented image and labels; if not, use the scale function to obtain an image and labels resized to the sample size.
- Then decide whether to apply the other augmentation methods. If so, apply them in turn; if not, return the image and labels directly.
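For reference, the labels of one image are stored throughout this pipeline as an (N, 5) NumPy array in absolute pixel coordinates, one row per object: [class_index, xmin, ymin, xmax, ymax]. A minimal illustration (the box values below are made up):
import numpy as np
# a person (class 14) and a dog (class 11); each row is [class_index, xmin, ymin, xmax, ymax]
labels = np.array([[14, 48.0, 240.0, 195.0, 371.0],
                   [11,  8.0,  12.0, 352.0, 498.0]], dtype=np.float32)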
The example code is as follows:
import os
import numpy as np
import xml.etree.ElementTree as ET
from torch.utils.data import Dataset
from whole_dataset.augment import *
import random
class_names = [ 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor' ]
# Load the VOC dataset; slightly modified from the version in the earlier post "How to Read Data for Object Detection, Supplement 1"
def load_data_from_txt(text, img_root, anno_root, remove_difficult=False, img_paths=None, anno_paths=None):
if img_paths is None and anno_paths is None:
        # Read the split file to get the list of image names
with open(text, 'r') as f:
img_names = f.readlines()
        # Build the annotation file paths; each image name reads like "2008_000013\n", so strip() removes the trailing newline
anno_paths = [os.path.join(anno_root, img_name.strip()+".xml") for img_name in img_names]
        # Build the image file paths
img_paths = [os.path.join(img_root, img_name.strip()+".jpg") for img_name in img_names]
all_labels = []
for anno_path in anno_paths:
target = ET.parse(anno_path)
root = target.getroot()
        # Get the image height and width
size = root.find("size")
h = int(size.find("height").text)
w = int(size.find("width").text)
        # Collect all labels in this image (class + ground-truth box)
labels = []
for object in root.iter("object"):
            # Read the difficult flag
difficult = int(object.find("difficult").text) == 1
            # Skip this object if it is difficult and remove_difficult is set
if difficult and remove_difficult:
continue
            # Get the class index
cls_name = object.find("name").text.strip()
cls_index = int(class_names.index(cls_name))
            # Read the ground-truth box coordinates
bndbox = object.find("bndbox")
bbox = []
points = ['xmin', 'ymin', 'xmax', 'ymax']
for point in points:
pt = float(bndbox.find(point).text)
bbox.append(pt)
            # Append the label
label = [cls_index] + bbox
labels.append(label)
        # Make sure every image has a label array; an image without objects gets one all-zero placeholder row (class 0, treated as background) so later coordinate transforms still work
if len(labels) == 0:
labels = np.zeros((1, 5))
else:
labels = np.array(labels, dtype=np.float32)
        # Collect this image's labels
all_labels.append(labels)
return img_paths, all_labels
# Custom dataset class
class DetectDataset(Dataset):
def __init__(self, img_paths, labels, augment=True, img_size=640):
        # Image paths
self.img_paths = img_paths
        # Labels
self.labels = labels
        # Number of images
self.num_imgs = len(img_paths)
self.indices = range(self.num_imgs)
        # Hyperparameter settings
        # Whether to apply data augmentation
self.augment = augment
self.img_size = img_size
        # Whether to use mosaic, and the probability threshold for applying it
self.use_mosaic = True
self.mosaic_value = 0.5
        # Probability threshold for applying mixup
self.mixup_value = 0.5
        # Letterbox scaling hyperparameter
self.scale_fill = False
        # Random perspective/affine hyperparameters
self.degrees = 0.373
self.translate = 0.245
self.scale = 0.898
self.shear = 0.602
self.perspective = 0.0
        # HSV hyperparameters
self.hsv_h = 0.5
self.hsv_s = 0.5
self.hsv_v = 0.5
        # Flip probabilities
self.flipupdown = 0.5
self.flipleftright = 0.5
def __len__(self):
return len(self.img_paths)
def __getitem__(self, idx):
img = None
labels = None
        # Decide whether to apply mosaic augmentation
mosaic_flag = self.use_mosaic and random.random() < self.mosaic_value
if mosaic_flag:
# mosaic
img, labels = mosaic(self.img_size, self.img_paths, self.labels, idx, self.indices)
# test
show_img_boxes("mosaic1 img", img, labels)
            # Random perspective/affine
img, labels = random_perspective(img, labels,
degrees=self.degrees,
translate=self.translate,
scale=self.scale,
shear=self.shear,
perspective=self.perspective,
border = (-self.img_size // 2, -self.img_size // 2))
# test
show_img_boxes("perspective1 img", img, labels)
            # Randomly apply mixup augmentation
if random.random() < self.mixup_value:
# mosaic
img2, labels2 = mosaic(self.img_size, self.img_paths, self.labels, random.randint(0, self.num_imgs-1), self.indices)
# test
show_img_boxes("mosaic2 img", img2, labels2)
                # Random perspective/affine
img2, labels2 = random_perspective(img2, labels2,
degrees=self.degrees,
translate=self.translate,
scale=self.scale,
shear=self.shear,
perspective=self.perspective,
border = (-self.img_size // 2, -self.img_size // 2))
# test
show_img_boxes("perspective2 img", img2, labels2)
# mixup
img, labels = mixup(img, labels, img2, labels2)
# test
show_img_boxes("mixup img", img, labels)
else:
            # Load the image
img, origin_h, origin_w, (scale_h, scale_w) = load_img(self.img_paths[idx], self.img_size)
            # Load this image's labels (copy them so the cached arrays are not modified in place across epochs)
            labels = self.labels[idx].copy()
            # load_img resized the image, so scale the box coordinates by the same ratio
            ratio = scale_h / origin_h
            labels[:, 1:5] = labels[:, 1:5] * ratio
            # Letterbox the image and labels to the sample size
img, labels = scale(img, labels, new_shape=(self.img_size, self.img_size), scaleFill=self.scale_fill)
# test
show_img_boxes("scale img", img, labels)
if self.augment:
            # Skip random perspective here if mosaic was applied (it already ran above)
if not mosaic_flag:
img, labels = random_perspective(img, labels,
degrees=self.degrees,
translate=self.translate,
scale=self.scale,
shear=self.shear,
perspective=self.perspective)
# test
show_img_boxes("perspective3 img", img, labels)
            # HSV color augmentation
augment_hsv(img, hgain=self.hsv_h, sgain=self.hsv_s, vgain=self.hsv_v)
# test
show_img_boxes("hsv img", img, labels)
            # Flip augmentations
            img, labels = horizontal_flip(img, labels, p=self.flipleftright)
            img, labels = vertical_flip(img, labels, p=self.flipupdown)
# test
show_img_boxes("flip img", img, labels)
        # Convert BGR to RGB, then HWC to CHW
img = img[:, :, ::-1].transpose(2, 0, 1)
img = np.ascontiguousarray(img)
return img, labels
if __name__ == "__main__":
text_path = r"G:\datasets\VOCdevkit\VOC2012\ImageSets\Main\train.txt"
img_root = r"G:\datasets\VOCdevkit\VOC2012\JPEGImages"
anno_root = r"G:\datasets\VOCdevkit\VOC2012\Annotations"
    # Remove images without any boxes; useful for models without a background class. For models that treat background as a class, set remove_flag to False
img_paths, anno_paths = remove_imgs(text_path, img_root, anno_root, remove_flag=True)
    # Load the image paths and the corresponding labels
img_paths, all_labels = load_data_from_txt(text_path, img_root, anno_root, remove_difficult=True, img_paths=img_paths, anno_paths=anno_paths)
print(f"图像总数: {len(img_paths)}")
print(f"标签总数: {len(all_labels)}")
train_dataset = DetectDataset(img_paths, all_labels, augment=True, img_size=640)
    # Show the first 2 samples
for index, data in enumerate(train_dataset):
img, label = data
img = img.transpose(1, 2, 0)
img = img[:, :, ::-1]
img = np.ascontiguousarray(img)
show_img_boxes(str(index), img, label)
if index == 1:
break
Case 1: mosaic off, augmentation off (self.use_mosaic = False, augment=False).
Running the program produces Figure 0 and Figure 1; both images are 640 × 640.
Case 2: mosaic off, augmentation on (self.use_mosaic = False, augment=True).
Running the program produces Figure 2 and Figure 3; both images are 640 × 640.
Case 3: mosaic on, augmentation off (self.use_mosaic = True, augment=False).
Running the program produces Figure 4 and Figure 5; both images are 640 × 640.
Case 4: mosaic on, augmentation on (self.use_mosaic = True, augment=True).
Running the program produces Figure 6 and Figure 7; both images are 640 × 640.
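Because different images contain different numbers of boxes, the label arrays returned by __getitem__ have varying lengths, so the default DataLoader collation cannot stack them. Below is a minimal sketch of a custom collate function for this dataset; the name detect_collate_fn and the extra image-index column are my own additions, not part of the code above:
import numpy as np
import torch

def detect_collate_fn(batch):
    # batch is a list of (img, labels) pairs returned by DetectDataset.__getitem__
    imgs, batch_labels = [], []
    for i, (img, labels) in enumerate(batch):
        imgs.append(torch.from_numpy(img))
        # prepend the image index so every box can be traced back to its image
        idx_col = np.full((labels.shape[0], 1), i, dtype=np.float32)
        batch_labels.append(np.concatenate((idx_col, labels), axis=1))
    # all samples are 640 x 640 after the pipeline above, so the images can be stacked
    imgs = torch.stack(imgs, 0).float() / 255.0
    labels = torch.from_numpy(np.concatenate(batch_labels, 0).astype(np.float32))
    return imgs, labels

# usage (assumed):
# from torch.utils.data import DataLoader
# train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, collate_fn=detect_collate_fn)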
The augment.py file is shown below:
import cv2
import matplotlib.pyplot as plt
import os
import numpy as np
import xml.etree.ElementTree as ET
import random
import math
class_names = [ 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog','horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor' ]
def load_box(anno_path):
target = ET.parse(anno_path)
root = target.getroot()
    # Get the image height and width
size = root.find("size")
h = int(size.find("height").text)
w = int(size.find("width").text)
    # Collect all labels in this image (class + ground-truth box)
labels = []
for object in root.iter("object"):
        # Read the difficult flag
difficult = int(object.find("difficult").text) == 1
        # Get the class index
cls_name = object.find("name").text.strip()
cls_index = int(class_names.index(cls_name))
        # Read the ground-truth box coordinates
bndbox = object.find("bndbox")
bbox = []
points = ['xmin', 'ymin', 'xmax', 'ymax']
for point in points:
pt = float(bndbox.find(point).text)
bbox.append(pt)
        # Append the label
label = [cls_index] + bbox
labels.append(label)
labels = np.array(labels, dtype=np.float32)
return labels
# Remove images that contain no ground-truth boxes
def remove_imgs(text, img_root, anno_root, remove_flag=True):
    # Read the split file to get the list of image names
with open(text, 'r') as f:
img_names = f.readlines()
    # Build the annotation file paths; each image name reads like "2008_000013\n", so strip() removes the trailing newline
anno_paths = [os.path.join(anno_root, img_name.strip() + ".xml") for img_name in img_names]
    # Build the image file paths
img_paths = [os.path.join(img_root, img_name.strip() + ".jpg") for img_name in img_names]
    # Build new lists instead of removing items while iterating, which would shift indices and skip entries
    keep_img_paths, keep_anno_paths = [], []
    for img_path, anno_path in zip(img_paths, anno_paths):
        boxes = load_box(anno_path)
        if remove_flag and boxes.size == 0:
            continue
        keep_img_paths.append(img_path)
        keep_anno_paths.append(anno_path)
    return keep_img_paths, keep_anno_paths
def generate_random_color():
color_list = []
for _ in range(20):
r = random.randint(0, 255)
g = random.randint(0, 255)
b = random.randint(0, 255)
color_list.append((r, g, b))
return color_list
def show_img_boxes(title, img, boxes):
img = img.astype(np.uint8)
color_list = generate_random_color()
for category_index, x1, y1, x2, y2 in boxes:
color = color_list[int(category_index)]
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), color, 2)  # draw the box
        cv2.putText(img, class_names[int(category_index)], (int(x1), int(y1) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)  # draw the class name
# cv2.imshow("image", img)
# cv2.waitKey(0)
cv2.imwrite(os.path.join("./imgs", title+".jpg"), img)
# Random crop
def random_crop(img, boxes, crop_size):
    # Compute the valid range for the crop's top-left corner
height, width,_ = img.shape
x_left = width - crop_size[1]
y_left = height - crop_size[0]
    # Randomly pick the top-left corner
x_random_left = random.randint(0, x_left)
y_random_left = random.randint(0, y_left)
    # Crop the image
img_cropped = img[y_random_left:y_random_left+crop_size[0], x_random_left:x_random_left+crop_size[1]]
    # Shift the box coordinates into the crop and clip them to its borders
boxes_cropped = boxes.copy()
boxes_cropped[:, 1] = np.maximum(boxes[:, 1] - x_random_left, 0)
boxes_cropped[:, 2] = np.maximum(boxes[:, 2] - y_random_left, 0)
boxes_cropped[:, 3] = np.minimum(boxes[:, 3] - x_random_left, crop_size[1])
boxes_cropped[:, 4] = np.minimum(boxes[:, 4] - y_random_left, crop_size[0])
return img_cropped, boxes_cropped
# Random horizontal flip
def horizontal_flip(img, boxes, p=0.5):
    height, width, _ = img.shape
    img_horizontal_flip = img.copy()
    boxes_horizontal_flip = boxes.copy()
    if random.random() <= p:
        img_horizontal_flip = np.fliplr(img)
        # mirror the x coordinates and swap xmin/xmax so that xmin stays smaller than xmax
        boxes_horizontal_flip[:, [1, 3]] = width - boxes[:, [3, 1]]
    return img_horizontal_flip, boxes_horizontal_flip
# Random vertical flip
def vertical_flip(img, boxes, p=0.5):
    height, width, _ = img.shape
    img_vertical_flip = img.copy()
    boxes_vertical_flip = boxes.copy()
    if random.random() <= p:
        img_vertical_flip = np.flipud(img)
        # mirror the y coordinates and swap ymin/ymax so that ymin stays smaller than ymax
        boxes_vertical_flip[:, [2, 4]] = height - boxes[:, [4, 2]]
    return img_vertical_flip, boxes_vertical_flip
# Letterbox scaling (adapted from https://github.com/ultralytics/yolov5/blob/master/utils/augmentations.py#L111); pairs with random crop: crop first, then scale back to the common sample size
def scale(img, boxes, new_shape=(640, 640), color=(114, 114, 114), scaleFill=False, scaleup=True):
shape = img.shape[:2]
    # Scaling ratio
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    # Only scale down, never up, for better evaluation accuracy
if not scaleup:
r = min(r, 1.0)
    # Height and width scaling ratios
ratio = r, r
    # Resized (unpadded) height and width
new_unpad = int(round(shape[0] * r)), int(round(shape[1] * r))
    # Gray-border padding needed in height and width
ph, pw = new_shape[0] - new_unpad[0], new_shape[1] - new_unpad[1]
    # Stretch directly to the target shape, no padding
if scaleFill:
ph, pw = 0.0, 0.0
new_unpad = (new_shape[0], new_shape[1])
ratio = new_shape[0] / shape[0], new_shape[1] / shape[1] # height, width ratios
    # The padding is split between the two opposite sides, so divide by 2
ph /= 2
pw /= 2
    # Resize the image
if shape != new_unpad:
img = cv2.resize(img, (new_unpad[1], new_unpad[0]), interpolation=cv2.INTER_LINEAR)
    # Add the gray border
top, bottom = int(round(ph - 0.1)), int(round(ph + 0.1))
left, right = int(round(pw - 0.1)), int(round(pw + 0.1))
img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border
    # Scale the boxes; padding is added on each side, hence x * r + pw and y * r + ph
boxes_scaled = boxes.copy()
boxes_scaled[:, 1] = boxes[:, 1] * ratio[1] + pw
boxes_scaled[:, 2] = boxes[:, 2] * ratio[0] + ph
boxes_scaled[:, 3] = boxes[:, 3] * ratio[1] + pw
boxes_scaled[:, 4] = boxes[:, 4] * ratio[0] + ph
return img, boxes_scaled
# Color (HSV) augmentation
def augment_hsv(img, hgain=0.5, sgain=0.5, vgain=0.5):
    # Random gains
r = np.random.uniform(-1, 1, 3) * [hgain, sgain, vgain] + 1
    # Convert the image from BGR to HSV and split into H (hue), S (saturation) and V (value) channels
hue, sat, val = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))
dtype = img.dtype # uint8
    # Build look-up tables (LUTs) for the augmentation
x = np.arange(0, 256, dtype=np.int16)
lut_hue = ((x * r[0]) % 180).astype(dtype)
lut_sat = np.clip(x * r[1], 0, 255).astype(dtype)
lut_val = np.clip(x * r[2], 0, 255).astype(dtype)
    # Apply the LUTs to the HSV channels
img_hsv = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val))).astype(dtype)
    # Convert the augmented image back from HSV to BGR (written into img in place)
cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img)
# Load an image and resize its longer side to img_size
def load_img(img_path, img_size):
img = cv2.imread(img_path)
h, w = img.shape[0], img.shape[1]
r = img_size / max(h, w)
if r !=1:
img = cv2.resize(img, (int(w*r), int(h*r)), interpolation=cv2.INTER_AREA)
return img, h, w, img.shape[:2]
# Mosaic (stitch 4 images onto one canvas)
def mosaic(img_size, img_paths, all_labels, index, indices):
mosaic_border = [-img_size // 2, -img_size // 2]
yc, xc = (int(random.uniform(-x, 2 * img_size + x)) for x in mosaic_border)
index4 = [index] + random.sample(indices, 3)
label4 = []
img4 = np.full((img_size * 2, img_size * 2, 3), 114, dtype=np.uint8)
for i, index in enumerate(index4):
img, origin_h, origin_w, (scale_h, scale_w) = load_img(img_paths[index], img_size)
labels = all_labels[index]
h, w = img.shape[0], img.shape[1]
if i == 0:
            # Tile 0: top-left quadrant; destination region on the canvas
x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
            # Source region cropped from the original image
x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
elif i == 1:
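            # Tile 1: top-right quadrant of the canvas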
x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, img_size * 2), yc
x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
elif i == 2:
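            # Tile 2: bottom-left quadrant of the canvas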
x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(yc + h, img_size * 2)
x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
elif i == 3:
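            # Tile 3: bottom-right quadrant of the canvas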
x1a, y1a, x2a, y2a = xc, yc, min(w + xc, img_size * 2), min(yc + h, img_size * 2)
x1b, y1b, x2b, y2b = 0, 0, min(x2a - x1a, w), min(y2a - y1a, h)
        # Paste the source crop onto the canvas (array indices are [y1:y2, x1:x2])
img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]
padw = x1a - x1b
padh = y1a - y1b
ratio = scale_h / origin_h
boxes_pad = labels.copy()
boxes_pad[:, 1] = labels[:, 1] * ratio + padw
boxes_pad[:, 2] = labels[:, 2] * ratio + padh
boxes_pad[:, 3] = labels[:, 3] * ratio + padw
boxes_pad[:, 4] = labels[:, 4] * ratio + padh
label4.append(boxes_pad)
label4 = np.concatenate(label4, 0)
    # Clip the box coordinates to the mosaic canvas
    for label in label4[:, 1:]:
        np.clip(label, 0, 2 * img_size, out=label)
return img4, label4
# mixup
def mixup(img1, label1, img2, label2):
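    # Mixing ratio drawn from Beta(32, 32); values concentrate around 0.5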
r = np.random.beta(32.0, 32.0)
img = (img1 * r + img2 * (1 - r)).astype(np.uint8)
labels = np.concatenate((label1, label2), 0)
return img, labels
# perspective is normally left at 0.0 (pure affine transform)
def random_perspective(img, targets=(), segments=(), degrees=10, translate=.1, scale=.1, shear=10, perspective=0.0,
border=(0, 0), use_segments=True):
    # Height and width of the output sample (including the border offset)
height = img.shape[0] + border[0] * 2
width = img.shape[1] + border[1] * 2
    # Center matrix C: moves the image center to the origin (0, 0)
C = np.eye(3)
C[0, 2] = -img.shape[1] / 2 # x translation (pixels)
C[1, 2] = -img.shape[0] / 2 # y translation (pixels)
    # Perspective matrix P: random projective distortion
P = np.eye(3)
P[2, 0] = random.uniform(-perspective, perspective) # x perspective (about y)
P[2, 1] = random.uniform(-perspective, perspective) # y perspective (about x)
    # Rotation and scale matrix R: angle a controls the rotation, scale s controls the zoom
R = np.eye(3)
a = random.uniform(-degrees, degrees)
# a += random.choice([-180, -90, 0, 90]) # add 90deg rotations to small rotations
s = random.uniform(1 - scale, 1 + scale)
# s = 2 ** random.uniform(-scale, scale)
R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)
    # Shear matrix S: random shear along x and y
S = np.eye(3)
S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180) # x shear (deg)
S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180) # y shear (deg)
    # Translation matrix T: random shift
T = np.eye(3)
T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width # x translation (pixels)
T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height # y translation (pixels)
    # Combined transform matrix
    M = T @ S @ R @ P @ C  # the matrices act from right to left (the order matters)
    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # the image actually changed
        # Perspective warp
if perspective:
img = cv2.warpPerspective(img, M, dsize=(width, height), borderValue=(114, 114, 114))
        # Affine warp
else:
img = cv2.warpAffine(img, M[:2], dsize=(width, height), borderValue=(114, 114, 114))
    # Transform the ground-truth box coordinates
n = len(targets)
if n:
new = np.zeros((n, 4))
xy = np.ones((n * 4, 3))
xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2) # x1y1, x2y2, x1y2, x2y1
        # Apply the same transform to the four corners of every box
xy = xy @ M.T
        # Perspective divide, or plain affine result
        xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)
        # Axis-aligned rectangle enclosing the transformed corners
x = xy[:, [0, 2, 4, 6]]
y = xy[:, [1, 3, 5, 7]]
new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
        # Clip the coordinates to the image
new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)
        # Filter out boxes that become too small or too distorted
i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)
targets = targets[i]
targets[:, 1:5] = new[i]
return img, targets
# Keep candidate boxes by thresholding width/height (pixels), aspect ratio, and the area ratio before/after the transform
# eps prevents division by zero
def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1, eps=1e-16): # box1(4,n), box2(4,n)
# Compute candidate boxes: box1 before augment, box2 after augment, wh_thr (pixels), aspect_ratio_thr, area_ratio
w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps)) # aspect ratio
return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr) # candidates
def bbox_ioa(box1, box2):
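    # Intersection over the area of box2 (not IoU); used by cutout to measure how much of each box is covered by the mask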
box2 = box2.transpose()
    # Top-left and bottom-right coordinates of box1 and box2
b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
    # Intersection area of box1 and box2
inter_area = (np.minimum(b1_x2, b2_x2) - np.maximum(b1_x1, b2_x1)).clip(0) * \
(np.minimum(b1_y2, b2_y2) - np.maximum(b1_y1, b2_y1)).clip(0)
    # Area of box2
box2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1) + 1e-16
    # Intersection over box2 area
return inter_area / box2_area
def cutout(image, labels):
h, w = image.shape[:2]
    # Mask scales (as a fraction of the image size) and how many masks of each scale to apply
scales = [0.5] * 1 + [0.25] * 2 + [0.125] * 4 + [0.0625] * 8 + [0.03125] * 16
for s in scales:
mask_h = random.randint(1, int(h * s))
mask_w = random.randint(1, int(w * s))
        # Mask region
xmin = max(0, random.randint(0, w) - mask_w // 2)
ymin = max(0, random.randint(0, h) - mask_h // 2)
xmax = min(w, xmin + mask_w)
ymax = min(h, ymin + mask_h)
        # Cover the region with a random gray-ish color
image[ymin:ymax, xmin:xmax] = [random.randint(64, 191) for _ in range(3)]
        # Keep only boxes that are not heavily occluded by the mask
if len(labels) and s > 0.03:
box = np.array([xmin, ymin, xmax, ymax], dtype=np.float32)
            # Overlap ratio between the mask and each box
ioa = bbox_ioa(box, labels[:, 1:5])
            # Keep boxes with less than 60% of their area covered
labels = labels[ioa < 0.60]
return labels
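random_crop and cutout are defined above but are not wired into DetectDataset.__getitem__. If you want to experiment with cutout, here is a minimal sketch of where it could go, assuming it runs right after the flip augmentations and assuming a new probability hyperparameter self.cutout_value (for example 0.5) added in __init__:
# inside DetectDataset.__getitem__, after the flip augmentations
if random.random() < self.cutout_value:
    # cutout draws the masks on img in place and returns only the boxes that survive
    labels = cutout(img, labels)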