The previous two posts introduced commonly used data augmentation methods. How do we actually use those methods inside a dataset class? This post implements a concrete object-detection dataset class that applies data augmentation.
The usual way to design such a custom dataset class is: first read the images and labels separately, then build the simplest possible dataset class that can return a single sample, and finally add the augmentation methods, combining them into a sensible augmentation pipeline according to the characteristics of each method and how they interact with one another. Following these steps makes it fairly straightforward to implement your own dataset class.
The dataset class in this post follows the augmentation methods used by the datasets module of YOLOv5-5.0, applied in the same order, and implements a dataset class that reads the VOC dataset; a dataset class for the COCO dataset works the same way.
The basic flow is as follows:
- First decide whether to apply mosaic augmentation. If so, obtain the mosaic-augmented image and labels; if not, use the scale function to obtain an image and labels resized to the sample size.
- Then decide whether to apply the other augmentation methods. If so, apply them in turn; if not, return the image and labels directly.
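For reference, the labels of one image are stored throughout this pipeline as an (N, 5) NumPy array in absolute pixel coordinates, one row per object: [class_index, xmin, ymin, xmax, ymax]. A minimal illustration (the box values below are made up):
import numpy as np
# a person (class 14) and a dog (class 11); each row is [class_index, xmin, ymin, xmax, ymax]
labels = np.array([[14, 48.0, 240.0, 195.0, 371.0],
                   [11,  8.0,  12.0, 352.0, 498.0]], dtype=np.float32)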
The example code is as follows:
import os
import numpy as np
import xml.etree.ElementTree as ET
from torch.utils.data import Dataset
from whole_dataset.augment import *
import random
class_names = [ 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor' ]
# Load the VOC dataset; slightly modified from the version in the earlier post "How to Read Data for Object Detection, Supplement 1"
def load_data_from_txt(text, img_root, anno_root, remove_difficult=False, img_paths=None, anno_paths=None):
if img_paths is None and anno_paths is None:
        # Read the split file to get the list of image names
with open(text, 'r') as f:
img_names = f.readlines()
        # Build the annotation file paths; each image name reads like "2008_000013\n", so strip() removes the trailing newline
anno_paths = [os.path.join(anno_root, img_name.strip()+".xml") for img_name in img_names]
        # Build the image file paths
img_paths = [os.path.join(img_root, img_name.strip()+".jpg") for img_name in img_names]
all_labels = []
for anno_path in anno_paths:
target = ET.parse(anno_path)
root = target.getroot()
        # Get the image height and width
size = root.find("size")
h = int(size.find("height").text)
w = int(size.find("width").text)
        # Collect all labels in this image (class + ground-truth box)
labels = []
for object in root.iter("object"):
            # Read the difficult flag
difficult = int(object.find("difficult").text) == 1
            # Skip this object if it is difficult and remove_difficult is set
if difficult and remove_difficult:
continue
            # Get the class index
cls_name = object.find("name").text.strip()
cls_index = int(class_names.index(cls_name))
            # Read the ground-truth box coordinates
bndbox = object.find("bndbox")
bbox = []
points = ['xmin', 'ymin', 'xmax', 'ymax']
for point in points:
pt = float(bndbox.find(point).text)
bbox.append(pt)
            # Append the label
label = [cls_index] + bbox
labels.append(label)
        # Make sure every image has a label array; an image without objects gets one all-zero placeholder row (class 0, treated as background) so later coordinate transforms still work
if len(labels) == 0:
labels = np.zeros((1, 5))
else:
labels = np.array(labels, dtype=np.float32)
        # Collect this image's labels
all_labels.append(labels)
return img_paths, all_labels
# Custom dataset class
class DetectDataset(Dataset):
def __init__(self, img_paths, labels, augment=True, img_size=640):
        # Image paths
self.img_paths = img_paths
        # Labels
self.labels = labels
        # Number of images
self.num_imgs = len(img_paths)
self.indices = range(self.num_imgs)
        # Hyperparameter settings
        # Whether to apply data augmentation
self.augment = augment
self.img_size = img_size
        # Whether to use mosaic, and the probability threshold for applying it
self.use_mosaic = True
self.mosaic_value = 0.5
        # Probability threshold for applying mixup
self.mixup_value = 0.5
        # Letterbox scaling hyperparameter
self.scale_fill = False
        # Random perspective/affine hyperparameters
self.degrees = 0.373
self.translate = 0.245
self.scale = 0.898
self.shear = 0.602
self.perspective = 0.0
        # HSV hyperparameters
self.hsv_h = 0.5
self.hsv_s = 0.5
self.hsv_v = 0.5
        # Flip probabilities
self.flipupdown = 0.5
self.flipleftright = 0.5
def __len__(self):
return len(self.img_paths)
def __getitem__(self, idx):
img = None
labels = None
        # Decide whether to apply mosaic augmentation
mosaic_flag = self.use_mosaic and random.random() < self.mosaic_value
if mosaic_flag:
# mosaic
img, labels = mosaic(self.img_size, self.img_paths, self.labels, idx, self.indices)
# test
show_img_boxes("mosaic1 img", img, labels)
            # Random perspective/affine
img, labels = random_perspective(img, labels,
degrees=self.degrees,
translate=self.translate,
scale=self.scale,
shear=self.shear,
perspective=self.perspective,
border = (-self.img_size // 2, -self.img_size // 2))
# test
show_img_boxes("perspective1 img", img, labels)
            # Randomly apply mixup augmentation
if random.random() < self.mixup_value:
# mosaic
img2, labels2 = mosaic(self.img_size, self.img_paths, self.labels, random.randint(0, self.num_imgs-1), self.indices)
# test
show_img_boxes("mosaic2 img", img2, labels2)
                # Random perspective/affine
img2, labels2 = random_perspective(img2, labels2,
degrees=self.degrees,
translate=self.translate,
scale=self.scale,
shear=self.shear,
perspective=self.perspective,
border = (-self.img_size // 2, -self.img_size // 2))
# test
show_img_boxes("perspective2 img", img2, labels2)
# mixup
img, labels = mixup(img, labels, img2, labels2)
# test
show_img_boxes("mixup img", img, labels)
else:
            # Load the image
img, origin_h, origin_w, (scale_h, scale_w) = load_img(self.img_paths[idx], self.img_size)
            # Load this image's labels (copy them so the cached arrays are not modified in place across epochs)
            labels = self.labels[idx].copy()
            # load_img resized the image, so scale the box coordinates by the same ratio
            ratio = scale_h / origin_h
            labels[:, 1:5] = labels[:, 1:5] * ratio
            # Letterbox the image and labels to the sample size
img, labels = scale(img, labels, new_shape=(self.img_size, self.img_size), scaleFill=self.scale_fill)
# test
show_img_boxes("scale img", img, labels)
if self.augment:
            # Skip random perspective here if mosaic was applied (it already ran above)
if not mosaic_flag:
img, labels = random_perspective(img, labels,
degrees=self.degrees,
translate=self.translate,
scale=self.scale,
shear=self.shear,
perspective=self.perspective)
# test
show_img_boxes("perspective3 img", img, labels)
            # HSV color augmentation
augment_hsv(img, hgain=self.hsv_h, sgain=self.hsv_s, vgain=self.hsv_v)
# test
show_img_boxes("hsv img", img, labels)
            # Flip augmentations
            img, labels = horizontal_flip(img, labels, p=self.flipleftright)
            img, labels = vertical_flip(img, labels, p=self.flipupdown)
# test
show_img_boxes("flip img", img, labels)
        # Convert BGR to RGB, then HWC to CHW
img = img[:, :, ::-1].transpose(2, 0, 1)
img = np.ascontiguousarray(img)
return img, labels
if __name__ == "__main__":
text_path = r"G:\datasets\VOCdevkit\VOC2012\ImageSets\Main\train.txt"
img_root = r"G:\datasets\VOCdevkit\VOC2012\JPEGImages"
anno_root = r"G:\datasets\VOCdevkit\VOC2012\Annotations"
    # Remove images without any boxes; useful for models without a background class. For models that treat background as a class, set remove_flag to False
img_paths, anno_paths = remove_imgs(text_path, img_root, anno_root, remove_flag=True)
    # Load the image paths and the corresponding labels
img_paths, all_labels = load_data_from_txt(text_path, img_root, anno_root, remove_difficult=True, img_paths=img_paths, anno_paths=anno_paths)
print(f"图像总数: {len(img_paths)}")
print(f"标签总数: {len(all_labels)}")
train_dataset = DetectDataset(img_paths, all_labels, augment=True, img_size=640)
    # Show the first 2 samples
for index, data in enumerate(train_dataset):
img, label = data
img = img.transpose(1, 2, 0)
img = img[:, :, ::-1]
img = np.ascontiguousarray(img)
show_img_boxes(str(index), img, label)
if index == 1:
break
Case 1: mosaic off, augmentation off (self.use_mosaic = False, augment=False).
Running the program produces Figure 0 and Figure 1; both images are 640 × 640.
Case 2: mosaic off, augmentation on (self.use_mosaic = False, augment=True).
Running the program produces Figure 2 and Figure 3; both images are 640 × 640.
Case 3: mosaic on, augmentation off (self.use_mosaic = True, augment=False).
Running the program produces Figure 4 and Figure 5; both images are 640 × 640.
Case 4: mosaic on, augmentation on (self.use_mosaic = True, augment=True).
Running the program produces Figure 6 and Figure 7; both images are 640 × 640.
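Because different images contain different numbers of boxes, the label arrays returned by __getitem__ have varying lengths, so the default DataLoader collation cannot stack them. Below is a minimal sketch of a custom collate function for this dataset; the name detect_collate_fn and the extra image-index column are my own additions, not part of the code above:
import numpy as np
import torch

def detect_collate_fn(batch):
    # batch is a list of (img, labels) pairs returned by DetectDataset.__getitem__
    imgs, batch_labels = [], []
    for i, (img, labels) in enumerate(batch):
        imgs.append(torch.from_numpy(img))
        # prepend the image index so every box can be traced back to its image
        idx_col = np.full((labels.shape[0], 1), i, dtype=np.float32)
        batch_labels.append(np.concatenate((idx_col, labels), axis=1))
    # all samples are 640 x 640 after the pipeline above, so the images can be stacked
    imgs = torch.stack(imgs, 0).float() / 255.0
    labels = torch.from_numpy(np.concatenate(batch_labels, 0).astype(np.float32))
    return imgs, labels

# usage (assumed):
# from torch.utils.data import DataLoader
# train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, collate_fn=detect_collate_fn)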
The augment.py file is shown below:
import cv2
import matplotlib.pyplot as plt
import os
import numpy as np
import xml.etree.ElementTree as ET
import random
import math
class_names = [ 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog','horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor' ]
def load_box(anno_path):
target = ET.parse(anno_path)
root = target.getroot()
    # Get the image height and width
size = root.find("size")
h = int(size.find("height").text)
w = int(size.find("width").text)
    # Collect all labels in this image (class + ground-truth box)
labels = []
for object in root.iter("object"):
        # Read the difficult flag
difficult = int(object.find("difficult").text) == 1
        # Get the class index
cls_name = object.find("name").text.strip()
cls_index = int(class_names.index(cls_name))
        # Read the ground-truth box coordinates
bndbox = object.find("bndbox")
bbox = []
points = ['xmin', 'ymin', 'xmax', 'ymax']
for point in points:
pt = float(bndbox.find(point).text)
bbox.append(pt)
        # Append the label
label = [cls_index] + bbox
labels.append(label)
labels = np.array(labels, dtype=np.float32)
return labels
# Remove images that contain no ground-truth boxes
def remove_imgs(text, img_root, anno_root, remove_flag=True):
    # Read the split file to get the list of image names
with open(text, 'r') as f:
img_names = f.readlines()
    # Build the annotation file paths; each image name reads like "2008_000013\n", so strip() removes the trailing newline
anno_paths = [os.path.join(anno_root, img_name.strip() + ".xml") for img_name in img_names]
    # Build the image file paths
img_paths = [os.path.join(img_root, img_name.strip() + ".jpg") for img_name in img_names]
    # Build new lists instead of removing items while iterating, which would shift indices and skip entries
    keep_img_paths, keep_anno_paths = [], []
    for img_path, anno_path in zip(img_paths, anno_paths):
        boxes = load_box(anno_path)
        if remove_flag and boxes.size == 0:
            continue
        keep_img_paths.append(img_path)
        keep_anno_paths.append(anno_path)
    return keep_img_paths, keep_anno_paths
def generate_random_color():
color_list = []
for _ in range(20):
r = random.randint(0, 255)
g = random.randint(0, 255)
b = random.randint(0, 255)
color_list.append((r, g, b))
return color_list
def show_img_boxes(title, img, boxes):
img = img.astype(np.uint8)
color_list = generate_random_color()
for category_index, x1, y1, x2, y2 in boxes:
color = color_list[int(category_index)]
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), color, 2)  # draw the box
        cv2.putText(img, class_names[int(category_index)], (int(x1), int(y1) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)  # draw the class name
# cv2.imshow("image", img)
# cv2.waitKey(0)
cv2.imwrite(os.path.join("./imgs", title+".jpg"), img)
# Random crop
def random_crop(img, boxes, crop_size):
    # Compute the valid range for the crop's top-left corner
height, width,_ = img.shape
x_left = width - crop_size[1]
y_left = height - crop_size[0]
    # Randomly pick the top-left corner
x_random_left = random.randint(0, x_left)
y_random_left = random.randint(0, y_left)
    # Crop the image
img_cropped = img[y_random_left:y_random_left+crop_size[0], x_random_left:x_random_left+crop_size[1]]
    # Shift the box coordinates into the crop and clip them to its borders
boxes_cropped = boxes.copy()
boxes_cropped[:, 1] = np.maximum(boxes[:, 1] - x_random_left, 0)
boxes_cropped[:, 2] = np.maximum(boxes[:, 2] - y_random_left, 0)
boxes_cropped[:, 3] = np.minimum(boxes[:, 3] - x_random_left, crop_size[1])
boxes_cropped[:, 4] = np.minimum(boxes[:, 4] - y_random_left, crop_size[0])
return img_cropped, boxes_cropped
# Random horizontal flip
def horizontal_flip(img, boxes, p=0.5):
    height, width, _ = img.shape
    img_horizontal_flip = img.copy()
    boxes_horizontal_flip = boxes.copy()
    if random.random() <= p:
        img_horizontal_flip = np.fliplr(img)
        # mirror the x coordinates and swap xmin/xmax so that xmin stays smaller than xmax
        boxes_horizontal_flip[:, [1, 3]] = width - boxes[:, [3, 1]]
    return img_horizontal_flip, boxes_horizontal_flip
# Random vertical flip
def vertical_flip(img, boxes, p=0.5):
    height, width, _ = img.shape
    img_vertical_flip = img.copy()
    boxes_vertical_flip = boxes.copy()
    if random.random() <= p:
        img_vertical_flip = np.flipud(img)
        # mirror the y coordinates and swap ymin/ymax so that ymin stays smaller than ymax
        boxes_vertical_flip[:, [2, 4]] = height - boxes[:, [4, 2]]
    return img_vertical_flip, boxes_vertical_flip
# Letterbox scaling (adapted from https://github.com/ultralytics/yolov5/blob/master/utils/augmentations.py#L111); pairs with random crop: crop first, then scale back to the common sample size
def scale(img, boxes, new_shape=(640, 640), color=(114, 114, 114), scaleFill=False, scaleup=True):
shape = img.shape[:2]
    # Scaling ratio
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    # Only scale down, never up, for better evaluation accuracy
if not scaleup:
r = min(r, 1.0)
    # Height and width scaling ratios
ratio = r, r
    # Resized (unpadded) height and width
new_unpad = int(round(shape[0] * r)), int(round(shape[1] * r))
    # Gray-border padding needed in height and width
ph, pw = new_shape[0] - new_unpad[0], new_shape[1] - new_unpad[1]
    # Stretch directly to the target shape, no padding
if scaleFill:
ph, pw = 0.0, 0.0
new_unpad = (new_shape[0], new_shape[1])
ratio = new_shape[0] / shape[0], new_shape[1] / shape[1] # height, width ratios
    # The padding is split between the two opposite sides, so divide by 2
ph /= 2
pw /= 2
    # Resize the image
if shape != new_unpad:
img = cv2.resize(img, (new_unpad[1], new_unpad[0]), interpolation=cv2.INTER_LINEAR)
    # Add the gray border
top, bottom = int(round(ph - 0.1)), int(round(ph + 0.1))
left, right = int(round(pw - 0.1)), int(round(pw + 0.1))
img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border
    # Scale the boxes; padding is added on each side, hence x * r + pw and y * r + ph
boxes_scaled = boxes.copy()
boxes_scaled[:, 1] = boxes[:, 1] * ratio[1] + pw
boxes_scaled[:, 2] = boxes[:, 2] * ratio[0] + ph
boxes_scaled[:, 3] = boxes[:, 3] * ratio[1] + pw
boxes_scaled[:, 4] = boxes[:, 4] * ratio[0] + ph
return img, boxes_scaled
# Color (HSV) augmentation
def augment_hsv(img, hgain=0.5, sgain=0.5, vgain=0.5):
    # Random gains
r = np.random.uniform(-1, 1, 3) * [hgain, sgain, vgain] + 1
    # Convert the image from BGR to HSV and split into H (hue), S (saturation) and V (value) channels
hue, sat, val = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))
dtype = img.dtype # uint8
    # Build look-up tables (LUTs) for the augmentation
x = np.arange(0, 256, dtype=np.int16)
lut_hue = ((x * r[0]) % 180).astype(dtype)
lut_sat = np.clip(x * r[1], 0, 255).astype(dtype)
lut_val = np.clip(x * r[2], 0, 255).astype(dtype)
    # Apply the LUTs to the HSV channels
img_hsv = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val))).astype(dtype)
    # Convert the augmented image back from HSV to BGR (written into img in place)
cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img)
# Load an image and resize its longer side to img_size
def load_img(img_path, img_size):
img = cv2.imread(img_path)
h, w = img.shape[0], img.shape[1]
r = img_size / max(h, w)
if r !=1:
img = cv2.resize(img, (int(w*r), int(h*r)), interpolation=cv2.INTER_AREA)
return img, h, w, img.shape[:2]
# Mosaic (stitch 4 images onto one canvas)
def mosaic(img_size, img_paths, all_labels, index, indices):
mosaic_border = [-img_size // 2, -img_size // 2]
yc, xc = (int(random.uniform(-x, 2 * img_size + x)) for x in mosaic_border)
index4 = [index] + random.sample(indices, 3)
label4 = []
img4 = np.full((img_size * 2, img_size * 2, 3), 114, dtype=np.uint8)
for i, index in enumerate(index4):
img, origin_h, origin_w, (scale_h, scale_w) = load_img(img_paths[index], img_size)
labels = all_labels[index]
h, w = img.shape[0], img.shape[1]
if i == 0:
            # Tile 0: top-left quadrant; destination region on the canvas
x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
            # Source region cropped from the original image
x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
elif i == 1:
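            # Tile 1: top-right quadrant of the canvas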
x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, img_size * 2), yc
x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
elif i == 2:
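            # Tile 2: bottom-left quadrant of the canvas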
x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(yc + h, img_size * 2)
x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
elif i == 3:
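            # Tile 3: bottom-right quadrant of the canvas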
x1a, y1a, x2a, y2a = xc, yc, min(w + xc, img_size * 2), min(yc + h, img_size * 2)
x1b, y1b, x2b, y2b = 0, 0, min(x2a - x1a, w), min(y2a - y1a, h)
        # Paste the source crop onto the canvas (array indices are [y1:y2, x1:x2])
img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]
padw = x1a - x1b
padh = y1a - y1b
ratio = scale_h / origin_h
boxes_pad = labels.copy()
boxes_pad[:, 1] = labels[:, 1] * ratio + padw
boxes_pad[:, 2] = labels[:, 2] * ratio + padh
boxes_pad[:, 3] = labels[:, 3] * ratio + padw
boxes_pad[:, 4] = labels[:, 4] * ratio + padh
label4.append(boxes_pad)
label4 = np.concatenate(label4, 0)
    # Clip the box coordinates to the mosaic canvas
    for label in label4[:, 1:]:
        np.clip(label, 0, 2 * img_size, out=label)
return img4, label4
# mixup
def mixup(img1, label1, img2, label2):
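    # Mixing ratio drawn from Beta(32, 32); values concentrate around 0.5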
r = np.random.beta(32.0, 32.0)
img = (img1 * r + img2 * (1 - r)).astype(np.uint8)
labels = np.concatenate((label1, label2), 0)
return img, labels
# perspective is normally left at 0.0 (pure affine transform)
def random_perspective(img, targets=(), segments=(), degrees=10, translate=.1, scale=.1, shear=10, perspective=0.0,
border=(0, 0), use_segments=True):
    # Height and width of the output sample (including the border offset)
height = img.shape[0] + border[0] * 2
width = img.shape[1] + border[1] * 2
    # Center matrix C: moves the image center to the origin (0, 0)
C = np.eye(3)
C[0, 2] = -img.shape[1] / 2 # x translation (pixels)
C[1, 2] = -img.shape[0] / 2 # y translation (pixels)
    # Perspective matrix P: random projective distortion
P = np.eye(3)
P[2, 0] = random.uniform(-perspective, perspective) # x perspective (about y)
P[2, 1] = random.uniform(-perspective, perspective) # y perspective (about x)
    # Rotation and scale matrix R: angle a controls the rotation, scale s controls the zoom
R = np.eye(3)
a = random.uniform(-degrees, degrees)
# a += random.choice([-180, -90, 0, 90]) # add 90deg rotations to small rotations
s = random.uniform(1 - scale, 1 + scale)
# s = 2 ** random.uniform(-scale, scale)
R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)
    # Shear matrix S: random shear along x and y
S = np.eye(3)
S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180) # x shear (deg)
S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180) # y shear (deg)
    # Translation matrix T: random shift
T = np.eye(3)
T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width # x translation (pixels)
T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height # y translation (pixels)
    # Combined transform matrix
    M = T @ S @ R @ P @ C  # the matrices act from right to left (the order matters)
    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # the image actually changed
        # Perspective warp
if perspective:
img = cv2.warpPerspective(img, M, dsize=(width, height), borderValue=(114, 114, 114))
        # Affine warp
else:
img = cv2.warpAffine(img, M[:2], dsize=(width, height), borderValue=(114, 114, 114))
    # Transform the ground-truth box coordinates
n = len(targets)
if n:
new = np.zeros((n, 4))
xy = np.ones((n * 4, 3))
xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2) # x1y1, x2y2, x1y2, x2y1
        # Apply the same transform to the four corners of every box
xy = xy @ M.T
        # Perspective divide, or plain affine result
        xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)
        # Axis-aligned rectangle enclosing the transformed corners
x = xy[:, [0, 2, 4, 6]]
y = xy[:, [1, 3, 5, 7]]
new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
        # Clip the coordinates to the image
new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)
        # Filter out boxes that become too small or too distorted
i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)
targets = targets[i]
targets[:, 1:5] = new[i]
return img, targets
# Keep candidate boxes by thresholding width/height (pixels), aspect ratio, and the area ratio before/after the transform
# eps prevents division by zero
def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1, eps=1e-16): # box1(4,n), box2(4,n)
# Compute candidate boxes: box1 before augment, box2 after augment, wh_thr (pixels), aspect_ratio_thr, area_ratio
w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps)) # aspect ratio
return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr) # candidates
def bbox_ioa(box1, box2):
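    # Intersection over the area of box2 (not IoU); used by cutout to measure how much of each box is covered by the mask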
box2 = box2.transpose()
    # Top-left and bottom-right coordinates of box1 and box2
b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
    # Intersection area of box1 and box2
inter_area = (np.minimum(b1_x2, b2_x2) - np.maximum(b1_x1, b2_x1)).clip(0) * \
(np.minimum(b1_y2, b2_y2) - np.maximum(b1_y1, b2_y1)).clip(0)
    # Area of box2
box2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1) + 1e-16
    # Intersection over box2 area
return inter_area / box2_area
def cutout(image, labels):
h, w = image.shape[:2]
    # Mask scales (as a fraction of the image size) and how many masks of each scale to apply
scales = [0.5] * 1 + [0.25] * 2 + [0.125] * 4 + [0.0625] * 8 + [0.03125] * 16
for s in scales:
mask_h = random.randint(1, int(h * s))
mask_w = random.randint(1, int(w * s))
        # Mask region
xmin = max(0, random.randint(0, w) - mask_w // 2)
ymin = max(0, random.randint(0, h) - mask_h // 2)
xmax = min(w, xmin + mask_w)
ymax = min(h, ymin + mask_h)
        # Cover the region with a random gray-ish color
image[ymin:ymax, xmin:xmax] = [random.randint(64, 191) for _ in range(3)]
        # Keep only boxes that are not heavily occluded by the mask
if len(labels) and s > 0.03:
box = np.array([xmin, ymin, xmax, ymax], dtype=np.float32)
            # Overlap ratio between the mask and each box
ioa = bbox_ioa(box, labels[:, 1:5])
            # Keep boxes with less than 60% of their area covered
labels = labels[ioa < 0.60]
return labels
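random_crop and cutout are defined above but are not wired into DetectDataset.__getitem__. If you want to experiment with cutout, here is a minimal sketch of where it could go, assuming it runs right after the flip augmentations and assuming a new probability hyperparameter self.cutout_value (for example 0.5) added in __init__:
# inside DetectDataset.__getitem__, after the flip augmentations
if random.random() < self.cutout_value:
    # cutout draws the masks on img in place and returns only the boxes that survive
    labels = cutout(img, labels)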