【图普科技】边界框的数据增强(二) ——缩放和平移

最新推荐文章于 2024-01-15 16:13:40 发布

图普科技

最新推荐文章于 2024-01-15 16:13:40 发布

阅读量1k

点赞数

分类专栏：图像识别深度学习文章标签：深度学习图像识别数据增强

图像识别同时被 2 个专栏收录

4 篇文章 0 订阅

订阅专栏

深度学习

2 篇文章 0 订阅

订阅专栏

本文由【图普科技】编译。

这是我们根据目标检测任务调整图像增强技术系列文章的第二部分。在这一部分中,我们将介绍如何实现缩放和平移的数据增强技术, 以及图像增强后，如果出现边界框在图像区域之外的情况该如何处理。

在上一部分中,我们介绍了实现增强的一般方法以及水平翻转的增强技术.

GitHub Repo

本文的所有内容以及所有的增强库都可以在下面的 GitHub Repo中找到。

https://github.com/Paperspace/DataAugmentationForObjectDetection

文档

可以通过在浏览器中打开 docs/build/html/index.html或在此链接中找到此项目的文档。

本系列的第1部分是本文的先决条件, 强烈建议您先阅读它。

本系列包括4个部分。
第1部分：基本设计和水平翻转

第2部分：缩放和平移

第3部分：旋转和裁剪

第4部分：所有技术整合

在这篇文章中, 我们将实现缩放和平移这两种增强技术。

缩放

缩放和平移后图像效果与下图相似。

左: 原始图像右: 应用了缩放增强技术。

设计决策

我们首先需要考虑的是缩放增强技术的参数。显而易见，我们要确定的是原始图像尺寸的缩放因子。因此,它必须是大于-1的值，不能缩小到小于其自身的尺寸。
我们可以选择通过控制高度和宽度缩放因子的一致来保持一定长宽比。但是, 我们可以允许缩放因子不同, 这不仅会产生缩放的图片增强效果, 而且可以更改图像的长宽比。我们引入了一个布尔变量diff,可以关闭/启用此功能。
在使用缩放技术实现随机图像增强时, 我们需要一个从某个时间间隔随机取得的样本缩放因子。我们处理它的方式是，用户需要提供缩放因子scale的采样范围。如果用户只提供一个浮点型的scale ，它必须是正的，而缩放因子的采样范围为 (-scale, scale)。

现在让我们来定义__init__ 方法。

class RandomScale(object):
    """Randomly scales an image    
    
    Bounding boxes which have an area of less than 25% in the remaining in the 
    transformed image is dropped. The resolution is maintained, and the remaining
    area if any is filled by black color.
    
    Parameters
    ----------
    scale: float or tuple(float)
        if **float**, the image is scaled by a factor drawn 
        randomly from a range (1 - `scale` , 1 + `scale`). If **tuple**,
        the `scale` is drawn randomly from values specified by the 
        tuple
        
    Returns
    -------
    
    numpy.ndaaray
        Scaled image in the numpy format of shape `HxWxC`
numpy.ndarray
        Tranformed bounding box co-ordinates of the format `n x 4` where n is 
        number of bounding boxes and 4 represents `x1,y1,x2,y2` of the box
        
    """

    def __init__(self, scale = 0.2, diff = False):
        self.scale = scale

        
        if type(self.scale) == tuple:
            assert len(self.scale) == 2, "Invalid range"
            assert self.scale[0] > -1, "Scale factor can't be less than -1"
            assert self.scale[1] > -1, "Scale factor can't be less than -1"
        else:
            assert self.scale > 0, "Please input a positive float"
            self.scale = (max(-1, -self.scale), self.scale)
        
        self.diff = diff

这里的一个经验教训是不应采样__init__函数的最终参数。如果你采样了__init__函数的最终参数，在每次调用该函数时，参数的值都相同。如此一来，该增强技术就无法设定随机元素

将其传递给__call__ 函数将导致在每次应用增强技术时，参数具有不同的值（范围内）。

该增强技术的逻辑

缩放平移的逻辑相当简单。我们使用 opencv 函数cv2.resize来缩放我们的图像内容, 并通过缩放因子来缩放边界框。

但是, 我们将保持图像的大小不变。因此，如果我们缩小的话, 将会产生多余的空白区域。我们将这部分用黑色表示。最终的输出将类似于上面显示的增强图像。

img_shape = img.shape
        
if self.diff:
	scale_x = random.uniform(*self.scale)
	scale_y = random.uniform(*self.scale)
else:
	scale_x = random.uniform(*self.scale)
	scale_y = scale_x

resize_scale_x = 1 + scale_x
resize_scale_y = 1 + scale_y

img=  cv2.resize(img, None, fx = resize_scale_x, fy = resize_scale_y)

bboxes[:,:4] *= [resize_scale_x, resize_scale_y, resize_scale_x, resize_scale_y]

首先, 我们创建一个黑色图像，大小与我们的原始图像相同。

canvas = np.zeros(img_shape, dtype = np.uint8)

然后, 我们需要确定缩放后图像的大小。如果超过了原始图像的尺寸（放大时），则需要裁剪超出原始尺寸的部分。然后，我们“粘贴”调整后的图像在画布上。

y_lim = int(min(resize_scale_y,1)*img_shape[0])
x_lim = int(min(resize_scale_x,1)*img_shape[1])

canvas[:y_lim,:x_lim,:] =  img[:y_lim,:x_lim,:]

img = canvas

边界框裁剪

放大后，目标可能从图像中被“挤出”了，我们要做的最后一件事就是处理这种情况。例如，放大因子为0.1时，最终图像的尺寸是原始图像的1.1倍。这就使得原图像中的足球被挤出，不再存在于我们的图像中。

在放大的情况下，足球被挤出图像

这样的情况不仅是在放大的时候会出现,在应用平移等其他增强技术的过程中也出现。

因此，我们在辅助文件 bbox_utils.py 中定义了一个函数 clip_box ，此函数将根据边界框总面积处于图像边界内的百分比来剪除边界框。此百分比是一个可控制的参数，可在文件bbox_utils.py 中定义,

def clip_box(bbox, clip_box, alpha):
    """Clip the bounding boxes to the borders of an image
    
    Parameters
    ----------
    
    bbox: numpy.ndarray
        Numpy array containing bounding boxes of shape `N X 4` where N is the 
        number of bounding boxes and the bounding boxes are represented in the
        format `x1 y1 x2 y2`
    
    clip_box: numpy.ndarray
        An array of shape (4,) specifying the diagonal co-ordinates of the image
        The coordinates are represented in the format `x1 y1 x2 y2`
        
    alpha: float
        If the fraction of a bounding box left in the image after being clipped is 
        less than `alpha` the bounding box is dropped. 
    
    Returns
    -------
    
    numpy.ndarray
        Numpy array containing **clipped** bounding boxes of shape `N X 4` where N is the 
        number of bounding boxes left are being clipped and the bounding boxes are represented in the
        format `x1 y1 x2 y2` 
    
    """
    ar_ = (bbox_area(bbox))
    x_min = np.maximum(bbox[:,0], clip_box[0]).reshape(-1,1)
    y_min = np.maximum(bbox[:,1], clip_box[1]).reshape(-1,1)
    x_max = np.minimum(bbox[:,2], clip_box[2]).reshape(-1,1)
    y_max = np.minimum(bbox[:,3], clip_box[3]).reshape(-1,1)
    
    bbox = np.hstack((x_min, y_min, x_max, y_max, bbox[:,4:]))
    
    delta_area = ((ar_ - bbox_area(bbox))/ar_)
    
    mask = (delta_area < (1 - alpha)).astype(int)
    
    bbox = bbox[mask == 1,:]


    return bbox

上述函数修改了bboxes 数组，并移除了那些在图像增强后有太多区域损失了的边界框。

然后, 我们只需在我们 RandomScale 中的__call__ 方法中调用此函数, 以确保所有边界框都已经被剪除。在这里，我们删除的是那些图像边界内的边界框部分占整个边界框面积的比例低于25% 的边界框。

bboxes = clip_box(bboxes, [0,0,1 + img_shape[1], img_shape[0]], 0.25)

为了计算边界框的面积, 我们还定义了一个函数bbox_area。

最后, 我们完整的 _ call _ 方法如下所示:

def __call__(self, img, bboxes):


	#Chose a random digit to scale by 

	img_shape = img.shape

	if self.diff:
		scale_x = random.uniform(*self.scale)
		scale_y = random.uniform(*self.scale)
	else:
		scale_x = random.uniform(*self.scale)
		scale_y = scale_x

    resize_scale_x = 1 + scale_x
    resize_scale_y = 1 + scale_y

    img=  cv2.resize(img, None, fx = resize_scale_x, fy = resize_scale_y)

    bboxes[:,:4] *= [resize_scale_x, resize_scale_y, resize_scale_x, resize_scale_y]



    canvas = np.zeros(img_shape, dtype = np.uint8)

    y_lim = int(min(resize_scale_y,1)*img_shape[0])
    x_lim = int(min(resize_scale_x,1)*img_shape[1])

    print(y_lim, x_lim)

    canvas[:y_lim,:x_lim,:] =  img[:y_lim,:x_lim,:]

    img = canvas
    bboxes = clip_box(bboxes, [0,0,1 + img_shape[1], img_shape[0]], 0.25)


    return img, bboxes

平移

我们接下来要讲的增强技术是平移，效果如下所示。

与缩放时一样，增强参数是原始图像尺寸的平移因子。上述的设计决策也同样适用。

除了确保该平移因子不小于-1，还应确保它不大于1，否则你只能得到一个黑色图片，原始图像内容被整体转移了。

class RandomTranslate(object):
    """Randomly Translates the image    
    
    
    Bounding boxes which have an area of less than 25% in the remaining in the 
    transformed image is dropped. The resolution is maintained, and the remaining
    area if any is filled by black color.
    
    Parameters
    ----------
    translate: float or tuple(float)
        if **float**, the image is translated by a factor drawn 
        randomly from a range (1 - `translate` , 1 + `translate`). If **tuple**,
        `translate` is drawn randomly from values specified by the 
        tuple
        
    Returns
    -------
    
    numpy.ndaaray
        Translated image in the numpy format of shape `HxWxC`
    
    numpy.ndarray
        Tranformed bounding box co-ordinates of the format `n x 4` where n is 
        number of bounding boxes and 4 represents `x1,y1,x2,y2` of the box
        
    """

    def __init__(self, translate = 0.2, diff = False):
        self.translate = translate
        
        if type(self.translate) == tuple:
            assert len(self.translate) == 2, "Invalid range"  
            assert self.translate[0] > 0 & self.translate[0] < 1
            assert self.translate[1] > 0 & self.translate[1] < 1


        else:
            assert self.translate > 0 & self.translate < 1
            self.translate = (-self.translate, self.translate)
            
            
        self.diff = diff

增强技术逻辑

这种增强的逻辑比缩放要棘手得多。所以, 我们要对它做一些解释。

我们首先设置变量。

def __call__(self, img, bboxes):        
    #Chose a random digit to scale by 
    img_shape = img.shape

    #translate the image

    #percentage of the dimension of the image to translate
    translate_factor_x = random.uniform(*self.translate)
translate_factor_y = random.uniform(*self.translate)

if not self.diff:
    translate_factor_y = translate_factor_x

现在, 当我们平移图像时, 会留下一些尾随空白。我们将把它用黑色表示。你可以在上面的平移演示图像中很容易地找出这些黑色区域。

我们首先初始化一个黑色图像, 其大小与我们原来的图像相同。

canvas = np.zeros(img_shape)

然后，我们有两个任务。首先，如第一个图像所示，确定图像将被粘贴在黑画布的哪部分。

然后，确定图像的哪一部分将被粘贴。

左:图像中将被粘贴的区域。右: 将要粘贴图像的画布区域,

因此, 我们首先要找出将要粘贴到画布图像上的图像区域。(左边图像上的紫色区域)。

#get the top-left corner co-ordinates of the shifted image 
corner_x = int(translate_factor_x*img.shape[1])
corner_y = int(translate_factor_y*img.shape[0])

mask = img[max(-corner_y, 0):min(img.shape[0], -corner_y + img_shape[0]), max(-corner_x, 0):min(img.shape[1], -corner_x + img_shape[1]),:]

现在，让我们确定将要“粘贴”图像mask的画布区域。

orig_box_cords =  [max(0,corner_y), max(corner_x,0), min(img_shape[0], corner_y + img.shape[0]), min(img_shape[1],corner_x + img.shape[1])]

canvas[orig_box_cords[0]:orig_box_cords[2], orig_box_cords[1]:orig_box_cords[3],:] = mask
img = canvas

边界框的平移相对简单，只需要偏移边界框的角。我们同样删除那些图像边界内的边界框部分占整个边界框面积的比例低于25% 的边界框。

bboxes[:,:4] += [corner_x, corner_y, corner_x, corner_y]

bboxes = clip_box(bboxes, [0,0,img_shape[1], img_shape[0]], 0.25)

总之, 我们的调用函数如下所示。

def __call__(self, img, bboxes):        
    #Chose a random digit to scale by 
    img_shape = img.shape
    
    #translate the image
    
    #percentage of the dimension of the image to translate
    translate_factor_x = random.uniform(*self.translate)
    translate_factor_y = random.uniform(*self.translate)
    
    if not self.diff:
        translate_factor_y = translate_factor_x
        
    canvas = np.zeros(img_shape).astype(np.uint8)


    corner_x = int(translate_factor_x*img.shape[1])
    corner_y = int(translate_factor_y*img.shape[0])
    
    
    
    #change the origin to the top-left corner of the translated box
    orig_box_cords =  [max(0,corner_y), max(corner_x,0), min(img_shape[0], corner_y + img.shape[0]), min(img_shape[1],corner_x + img.shape[1])]

    mask = img[max(-corner_y, 0):min(img.shape[0], -corner_y + img_shape[0]), max(-corner_x, 0):min(img.shape[1], -corner_x + img_shape[1]),:]
    canvas[orig_box_cords[0]:orig_box_cords[2], orig_box_cords[1]:orig_box_cords[3],:] = mask
    img = canvas
    
    bboxes[:,:4] += [corner_x, corner_y, corner_x, corner_y]
    
    
    bboxes = clip_box(bboxes, [0,0,img_shape[1], img_shape[0]], 0.25)
    

    

    
    return img, bboxes

测试

如上例所示，您可以在样本图像或您自己的图像上（前提是您已以正确的格式存储它）测试这些增强技术。

from data_aug.bbox_utils import *
import matplotlib.pyplot as plt 

scale = RandomScale(0.2, diff = True)  
translate = RandomTranslate(0.2, diff = True)

img, bboxes = translate(img, bboxes)
img,bboxes = scale(img, bboxes)

plt.imshow(draw_rect(img, bboxes))

最终的结果可能是这样的。