[Translation series] Data Augmentation for Bounding Boxes, Part 3: Rotation and Shearing

Latest recommended article published on 2023-11-24 20:21:13

亚里仕多德

Column: 人工ZZ看世界. Tags: computer vision, deep learning, python

Original article: https://blog.paperspace.com/data-augmentation-for-object-detection-rotation-and-shearing/

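This article shows how to implement image rotation and shearing with affine matrices in OpenCV, including computing the rotation matrix, adjusting the image size, updating the bounding boxes, and keeping the image content intact. Through examples and key code snippets, readers will learn how to apply these data augmentation techniques to object detection.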


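Hi folks, another week has gone by! In this installment we keep using the affine matrix, this time to implement rotation and shearing.

Before we start, if you have not read my previous two posts yet, I strongly recommend doing so, because the methods in this post build on them.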

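1. Part 1: Basic Design and Horizontal Flipping (Chinese translation)

2. Part 2: Scaling and Translation (Chinese translation)

Code

The methods used in this part, together with all the other augmentations, are available at the link below: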

https://github.com/Paperspace/DataAugmentationForObjectDetection

All right, let's get started!

Rotation

The final result of a rotation looks like this:

[Image: https://blog.paperspace.com/content/images/2018/09/rotate_pic.png]

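Rotation is one of the "most interesting" augmentation methods for images. Take a breath, settle in, and read carefully.

Before we get to the nasty code, let me serve a slightly nasty appetizer of background.

Affine transformation: a transformation under which lines that were parallel before the transformation remain parallel afterwards. Scaling, translation, and rotation are all special cases.

In computer graphics we use something called a transformation matrix, which makes all kinds of affine transformations very convenient to handle.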

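We won't go through the theory at length here; if you are interested, see the links I provide at the bottom of the article. For now, just treat it as a matrix: a matrix multiplication maps the coordinates of a point in the original image to its coordinates in the transformed image.
$T_p = M \cdot [x\ y\ 1]^T$
The transformation matrix $M$ is 2x3; multiplying it by $[x\ y\ 1]^T$ yields the transformed point $T_p$. The 1 appended to the point lets the third column of $M$ add a constant offset, so translation (and the re-centered rotation used below) fits into the same multiplication.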
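As a minimal sketch of the equation above, a 2x3 matrix can be applied to a point by appending a 1 to it. The helper name `apply_affine` and the matrix values are made up for illustration and are not part of the original code.

```python
import numpy as np

def apply_affine(M, x, y):
    """Apply a 2x3 affine matrix M to the point (x, y)."""
    point = np.array([x, y, 1.0])   # homogeneous coordinates [x, y, 1]
    return M @ point                # array holding the transformed (x', y')

# Illustrative matrix: identity plus a translation of (+5, -2).
M = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -2.0]])

print(apply_affine(M, 10, 20))      # x' = 15, y' = 18
```

The third column only has an effect because of the appended 1, which is why a 2x3 matrix is enough to express a shift as well as a rotation or a shear.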

With a transformation matrix we can quickly compute where a point ends up after rotating by an angle θ about the center of the image. The matrix looks like this:

[Image: https://blog.paperspace.com/content/images/2018/09/image-1.png]

Image source: https://cristianpb.github.io/blog/image-rotation-opencv. Scale is 1.
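Since the figure may not be visible here, the matrix can also be written out. This is the form given in the OpenCV documentation for getRotationMatrix2D, specialized to a scale of 1, with $(c_x, c_y)$ denoting the rotation center (the symbol names are chosen here for illustration):

$M = \begin{bmatrix} \cos\theta & \sin\theta & (1 - \cos\theta)\, c_x - \sin\theta\, c_y \\ -\sin\theta & \cos\theta & \sin\theta\, c_x + (1 - \cos\theta)\, c_y \end{bmatrix}$

Substituting $(x, y) = (c_x, c_y)$ into $M \cdot [x\ y\ 1]^T$ gives back $(c_x, c_y)$, which confirms that the center stays fixed under this transformation.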

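Fortunately, we don't have to implement the image rotation from the affine matrix ourselves. OpenCV already provides a shortcut, cv2.warpAffine; all we need to supply is the transformation matrix. With this background in place, let's move on to the code.

First, as before, we define the __init__ function.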

def __init__(self, angle = 10):
    self.angle = angle
    
    if type(self.angle) == tuple:
        assert len(self.angle) == 2, "Invalid range"   
    else:
        self.angle = (-self.angle, self.angle)
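The range handling in __init__ can be tried on its own. The sketch below uses a hypothetical standalone helper, `to_range`, that mirrors the same logic: a single number becomes a symmetric range, and a tuple is used as given.

```python
import random

def to_range(value):
    """Return a (low, high) tuple from a scalar or a 2-tuple (hypothetical helper)."""
    if isinstance(value, tuple):
        assert len(value) == 2, "Invalid range"
        return value
    return (-value, value)

angle_range = to_range(10)              # (-10, 10)
angle = random.uniform(*angle_range)    # a fresh angle is drawn each time this runs
print(angle_range, angle)
```

Storing the range rather than a fixed angle is what lets __call__ draw a different angle for every image.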

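Rotating the image

The first thing to do is rotate the image by θ degrees about its center. That requires a rotation matrix, which OpenCV provides through getRotationMatrix2D.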

(h, w) = image.shape[:2]
(cX, cY) = (w // 2, h // 2)
M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)

After that, we simply call the warpAffine function.

image = cv2.warpAffine(image, M, (w, h))

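The third argument, (w, h), is the size of the output. If we keep the original resolution, part of the rotated image inevitably falls outside the original dimensions, and OpenCV cuts it off, like this:

[Image: https://blog.paperspace.com/content/images/2018/09/opencv_cut_rotate.png]

The problem with OpenCV rotation

Clearly we have lost some information. What can we do? Thanks once more to the OpenCV developers, who anticipated this with the (w, h) argument: if we compute the smallest dimensions that contain the entire rotated image and pass those instead, nothing is lost.

The idea comes from Adrian Rosebrock's blog; thanks.

How do we find the new dimensions? A little geometry and the figure below do the job.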

Image source: https://cristianpb.github.io/blog/image-rotation-opencv

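Here,
$N_w = h \cdot |\sin(\theta)| + w \cdot |\cos(\theta)| \\ N_h = h \cdot |\cos(\theta)| + w \cdot |\sin(\theta)|$
This gives us the new dimensions. The sine and cosine can be read from the affine matrix (the code takes their absolute values), so the code is: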

cos = np.abs(M[0, 0])
sin = np.abs(M[0, 1])

# compute the new bounding dimensions of the image
nW = int((h * sin) + (w * cos))
nH = int((h * cos) + (w * sin))
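To get a feel for the numbers, here is a small standalone sketch of the same formula. The function name `rotated_size` is made up for this example, and it takes the sine and cosine from the angle directly rather than from the matrix.

```python
import math

def rotated_size(w, h, angle_deg):
    """Width and height of the tightest box around a w x h image rotated by angle_deg."""
    cos = abs(math.cos(math.radians(angle_deg)))
    sin = abs(math.sin(math.radians(angle_deg)))
    return int(h * sin + w * cos), int(h * cos + w * sin)

print(rotated_size(400, 300, 0))    # (400, 300): no rotation, no growth
print(rotated_size(400, 300, 30))   # (496, 459)
```

Because of the absolute values, rotating by -30 degrees gives the same size as rotating by 30 degrees.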

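**Don't relax yet!** There is still a problem: the center of the image has not moved. As mentioned earlier, the affine matrix rotates about the center of the original image, (cX, cY), whereas in the enlarged output the center must be at (nW/2, nH/2). We only need to add the difference to the matrix, namely nW/2 - cX and nH/2 - cY: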

# adjust the rotation matrix to take into account translation
M[0, 2] += (nW / 2) - cX
M[1, 2] += (nH / 2) - cY
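The effect of this adjustment can be checked numerically. The sketch below builds the rotation matrix by hand in plain Python (using the layout documented for getRotationMatrix2D with scale 1, so OpenCV is not needed to run it) and confirms that the old center ends up at the center of the enlarged image. The function name `rotation_matrix` and the image size are assumptions for illustration.

```python
import math

def rotation_matrix(cx, cy, angle_deg):
    """2x3 matrix for a rotation about (cx, cy) with scale 1."""
    a = math.cos(math.radians(angle_deg))
    b = math.sin(math.radians(angle_deg))
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]

w, h, angle = 400, 300, 30
cX, cY = w // 2, h // 2
M = rotation_matrix(cX, cY, angle)

# New bounding dimensions, read from the rotation part of the matrix.
nW = int(h * abs(M[0][1]) + w * abs(M[0][0]))
nH = int(h * abs(M[0][0]) + w * abs(M[0][1]))

# Shift the translation terms so the old centre lands on the new centre.
M[0][2] += nW / 2 - cX
M[1][2] += nH / 2 - cY

new_cx = M[0][0] * cX + M[0][1] * cY + M[0][2]
new_cy = M[1][0] * cX + M[1][1] * cY + M[1][2]
print(new_cx, new_cy)   # approximately nW / 2 and nH / 2
```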

Putting it all together, we define the rotate_im function in bbox_util.py to rotate the image.

def rotate_im(image, angle):
    """Rotate the image.
    
    Rotate the image such that the rotated image is enclosed inside the tightest
    rectangle. The area not occupied by the pixels of the original image is colored
    black. 
    
    Parameters
    ----------
    
    image : numpy.ndarray
        numpy image
    
    angle : float
        angle by which the image is to be rotated
    
    Returns
    -------
    
    numpy.ndarray
        Rotated Image
    
    """
    # grab the dimensions of the image and then determine the
    # centre
    (h, w) = image.shape[:2]
    (cX, cY) = (w // 2, h // 2)

    # grab the rotation matrix (OpenCV treats positive angles as
    # counter-clockwise), then grab the sine and cosine
    # (i.e., the rotation components of the matrix)
    M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])

    # compute the new bounding dimensions of the image
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))

    # adjust the rotation matrix to take into account translation
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY

    # perform the actual rotation and return the image
    image = cv2.warpAffine(image, M, (nW, nH))

#    image = cv2.resize(image, (w,h))
    return image

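Rotating the bounding boxes

This is the hardest part of this augmentation. We first rotate the bounding box along with the image, and then find the tightest rectangle with sides parallel to the image axes that encloses the rotated box.

A figure explains this better.

[Image: https://blog.paperspace.com/content/images/2018/09/rotate_box.png]

My suggestion is to get the coordinates of all four corners. Two corners would do, but four make the code easier to write.

So, in bbox_util.py, we write get_corners to obtain the coordinates of the four corners.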

def get_corners(bboxes):
    
    """Get corners of bounding boxes
    
    Parameters
    ----------
    
    bboxes: numpy.ndarray
        Numpy array containing bounding boxes of shape `N X 4` where N is the 
        number of bounding boxes and the bounding boxes are represented in the
        format `x1 y1 x2 y2`
    
    returns
    -------
    
    numpy.ndarray
        Numpy array of shape `N x 8` containing N bounding boxes each described by their 
        corner co-ordinates `x1 y1 x2 y2 x3 y3 x4 y4`      
        
    """
    width = (bboxes[:,2] - bboxes[:,0]).reshape(-1,1)
    height = (bboxes[:,3] - bboxes[:,1]).reshape(-1,1)
    
    x1 = bboxes[:,0].reshape(-1,1)
    y1 = bboxes[:,1].reshape(-1,1)
    
    x2 = x1 + width
    y2 = y1 
    
    x3 = x1
    y3 = y1 + height
    
    x4 = bboxes[:,2].reshape(-1,1)
    y4 = bboxes[:,3].reshape(-1,1)
    
    corners = np.hstack((x1,y1,x2,y2,x3,y3,x4,y4))
    
    return corners

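Then, also in bbox_util.py, we define rotate_box, which takes the eight coordinates x1,y1,x2,y2,x3,y3,x4,y4 of the four corners and returns the rotated box, using the affine transformation introduced above. Remember that one?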

def rotate_box(corners,angle,  cx, cy, h, w):
    
    """Rotate the bounding box.
    
    
    Parameters
    ----------
    
    corners : numpy.ndarray
        Numpy array of shape `N x 8` containing N bounding boxes each described by their 
        corner co-ordinates `x1 y1 x2 y2 x3 y3 x4 y4`
    
    angle : float
        angle by which the image is to be rotated
        
    cx : int
        x coordinate of the center of image (about which the box will be rotated)
        
    cy : int
        y coordinate of the center of image (about which the box will be rotated)
        
    h : int 
        height of the image
        
    w : int 
        width of the image
    
    Returns
    -------
    
    numpy.ndarray
        Numpy array of shape `N x 8` containing N rotated bounding boxes each described by their 
        corner co-ordinates `x1 y1 x2 y2 x3 y3 x4 y4`
    """

    corners = corners.reshape(-1,2)
    corners = np.hstack((corners, np.ones((corners.shape[0],1), dtype = type(corners[0][0]))))
    
    M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    
    
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    # adjust the rotation matrix to take into account translation
    M[0, 2] += (nW / 2) - cx
    M[1, 2] += (nH / 2) - cy
    # Prepare the vector to be transformed
    calculated = np.dot(M,corners.T).T
    
    calculated = calculated.reshape(-1,8)
    
    return calculated

Finally, we define the tightest enclosing rectangle we are after. The code lives in get_enclosing_box.

def get_enclosing_box(corners):
    """Get an enclosing box for rotated corners of a bounding box
    
    Parameters
    ----------
    
    corners : numpy.ndarray
        Numpy array of shape `N x 8` containing N bounding boxes each described by their 
        corner co-ordinates `x1 y1 x2 y2 x3 y3 x4 y4`  
    
    Returns 
    -------
    
    numpy.ndarray
        Numpy array containing enclosing bounding boxes of shape `N X 4` where N is the 
        number of bounding boxes and the bounding boxes are represented in the
        format `x1 y1 x2 y2`
        
    """
    x_ = corners[:,[0,2,4,6]]
    y_ = corners[:,[1,3,5,7]]
    
    xmin = np.min(x_,1).reshape(-1,1)
    ymin = np.min(y_,1).reshape(-1,1)
    xmax = np.max(x_,1).reshape(-1,1)
    ymax = np.max(y_,1).reshape(-1,1)
    
    final = np.hstack((xmin, ymin, xmax, ymax,corners[:,8:]))
    
    return final
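The three steps (corners, rotation, enclosing box) can be traced on a single box. The sketch below is a compact illustration, not the repository code: it uses a plain 90 degree rotation about the origin instead of the center-adjusted matrix built in rotate_box, so the resulting coordinates are negative in y.

```python
import numpy as np

# One box in x1 y1 x2 y2 format, expanded to its four corners (N x 8 layout).
box = np.array([[10.0, 20.0, 50.0, 40.0]])
x1, y1, x2, y2 = box[0]
corners = np.array([[x1, y1, x2, y1, x1, y2, x2, y2]])

# Rotation part of an OpenCV-style matrix for 90 degrees, no translation.
M = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0]])

# Append a 1 to every corner, transform, and go back to the N x 8 layout.
pts = corners.reshape(-1, 2)
pts = np.hstack((pts, np.ones((pts.shape[0], 1))))
rotated = (M @ pts.T).T.reshape(-1, 8)

# Enclosing axis-aligned box: min and max over the x and y columns.
xs, ys = rotated[:, 0::2], rotated[:, 1::2]
enclosing = np.hstack((xs.min(1, keepdims=True), ys.min(1, keepdims=True),
                       xs.max(1, keepdims=True), ys.max(1, keepdims=True)))
print(enclosing)   # x_min=20, y_min=-50, x_max=40, y_max=-10
```

The 40 x 20 box becomes a 20 x 40 box, as expected for a quarter turn. For angles that are not multiples of 90 degrees, the enclosing box is larger than the rotated box itself.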

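In the end, though, what we need are still the top-left and bottom-right coordinates. The __call__ function extracts them and ties everything together.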

def __call__(self, img, bboxes):

    angle = random.uniform(*self.angle)

    w,h = img.shape[1], img.shape[0]
    cx, cy = w//2, h//2

    img = rotate_im(img, angle)

    corners = get_corners(bboxes)

    corners = np.hstack((corners, bboxes[:,4:]))


    corners[:,:8] = rotate_box(corners[:,:8], angle, cx, cy, h, w)

    new_bbox = get_enclosing_box(corners)


    scale_factor_x = img.shape[1] / w

    scale_factor_y = img.shape[0] / h

    img = cv2.resize(img, (w,h))

    new_bbox[:,:4] /= [scale_factor_x, scale_factor_y, scale_factor_x, scale_factor_y] 

    bboxes  = new_bbox

    bboxes = clip_box(bboxes, [0,0,w, h], 0.25)

    return img, bboxes

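Did you skip reading the __call__ function? At the end, we resize the image from (nW, nH) back to (w, h) with cv2.resize and divide the box coordinates by the same scale factors. Then clip_box clips the boxes to the image and drops any box that has less than 25% of its area left inside it.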
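The rescaling step is easy to check by hand. In this sketch the sizes and the box are made-up values; the division mirrors the one in __call__ and needs a floating-point array, since an in-place division on an integer array raises an error in NumPy.

```python
import numpy as np

w, h = 400, 300          # original image size
nW, nH = 496, 459        # size after rotation (illustrative values)

# One box in x1 y1 x2 y2 format, expressed in the enlarged nW x nH image.
boxes = np.array([[124.0, 153.0, 248.0, 306.0]])

scale_x, scale_y = nW / w, nH / h
boxes[:, :4] /= [scale_x, scale_y, scale_x, scale_y]
print(boxes)   # coordinates now refer to the w x h image: about [100, 100, 200, 200]
```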

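Shearing

Shearing can also be implemented with an affine transformation. The effect looks roughly like this:

[Image: https://blog.paperspace.com/content/images/2018/09/shear_box.png]

It basically turns the rectangle into a parallelogram. The corresponding affine matrix is:
$M = \begin{bmatrix} 1 & \alpha & 0 \\ 0 & 1 & 0 \end{bmatrix}$

The matrix above implements a horizontal shear: the x coordinate of every pixel changes from x to x + alpha*y, where alpha is the shear factor. Let's write the initializer first.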
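A tiny sketch makes the mapping concrete. The function name `shear_x` is hypothetical; it applies x' = x + alpha*y to single points.

```python
def shear_x(x, y, alpha):
    """Horizontal shear: x moves in proportion to y, y is unchanged."""
    return x + alpha * y, y

alpha = 0.5
print(shear_x(10, 0, alpha))    # (10.0, 0): points on the top row do not move
print(shear_x(10, 100, alpha))  # (60.0, 100): lower rows shift further right
```

Since y grows downwards in image coordinates, a positive alpha pushes the bottom of the image to the right, which is also why the output has to be wider than the input.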

class RandomShear(object):
    """Randomly shears an image in horizontal direction   
    
    
    Bounding boxes that have less than 25% of their area remaining in the
    transformed image are dropped. The resolution is maintained, and the remaining
    area, if any, is filled with black.
    
    Parameters
    ----------
    shear_factor: float or tuple(float)
        if **float**, the image is sheared horizontally by a factor drawn 
        randomly from a range (-`shear_factor`, `shear_factor`). If **tuple**,
        the `shear_factor` is drawn randomly from values specified by the 
        tuple
        
    Returns
    -------
    
    numpy.ndarray
        Sheared image in the numpy format of shape `HxWxC`
    
    numpy.ndarray
        Transformed bounding box co-ordinates of the format `n x 4` where n is 
        number of bounding boxes and 4 represents `x1,y1,x2,y2` of the box
        
    """

    def __init__(self, shear_factor = 0.2):
        self.shear_factor = shear_factor
        
        if type(self.shear_factor) == tuple:
            assert len(self.shear_factor) == 2, "Invalid range for shear factor"   
        else:
            self.shear_factor = (-self.shear_factor, self.shear_factor)
        
        # the actual shear factor is drawn on every call, in __call__

Implementation logic

Since only the x coordinate is transformed, the bounding boxes also only need x = x + alpha*y, so the __call__ function is written as:

def __call__(self, img, bboxes):

    shear_factor = random.uniform(*self.shear_factor)

    w,h = img.shape[1], img.shape[0]

    if shear_factor < 0:
        img, bboxes = HorizontalFlip()(img, bboxes)

    M = np.array([[1, abs(shear_factor), 0],[0,1,0]])

    nW =  img.shape[1] + abs(shear_factor*img.shape[0])

    bboxes[:,[0,2]] += ((bboxes[:,[1,3]]) * abs(shear_factor) ).astype(int) 


    img = cv2.warpAffine(img, M, (int(nW), img.shape[0]))

    if shear_factor < 0:
        img, bboxes = HorizontalFlip()(img, bboxes)

    img = cv2.resize(img, (w,h))

    scale_factor_x = nW / w

    bboxes[:,:4] /= [scale_factor_x, 1, scale_factor_x, 1] 


    return img, bboxes

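Here is a question to think about: if alpha is positive, can the top-left corner overtake the bottom-right corner along the x axis and end up as the top-right point? And if alpha is negative, can that happen?

The answer: not in the first case, but it can in the second.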
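A quick numeric check shows the difference. The helper `shear_two_corners` and the box are made up for illustration; the helper shears only the top-left and bottom-right corners, as the box update in __call__ does.

```python
def shear_two_corners(box, alpha):
    """Shear only the top-left (x1, y1) and bottom-right (x2, y2) corners."""
    x1, y1, x2, y2 = box
    return (x1 + alpha * y1, y1, x2 + alpha * y2, y2)

box = (100, 50, 120, 250)   # a tall, narrow box
print(shear_two_corners(box, 0.5))    # (125.0, 50, 245.0, 250): x1 < x2 still holds
print(shear_two_corners(box, -0.5))   # (75.0, 50, -5.0, 250): x1 is now right of x2
```

With a positive alpha the bottom-right corner moves further right than the top-left one, so the two corners still span the sheared box. With a negative alpha the bottom-right corner moves left faster, and for a tall enough box the two corners swap sides.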

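So how do we handle a negative alpha? One option is to build the top-right and bottom-left corners, transform those, and convert back to top-left and bottom-right.

A more elegant option is:

Flip the image and the boxes horizontally
Negate alpha so that it is positive, and apply the shear
Flip the image and the boxes horizontally once more

Does this really work? I suggest taking a pen and working it out yourself!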
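For readers who prefer a numeric check to pen and paper, here is a rough sketch. It simplifies the flip to x -> width - x, which may differ slightly from how HorizontalFlip in this series mirrors around the center pixel, and all names are hypothetical.

```python
def flip_x(x, width):
    """Mirror an x coordinate inside an image of the given width."""
    return width - x

def shear_negative_via_flip(x, y, alpha, w, h):
    """Apply a negative horizontal shear as flip -> positive shear -> flip."""
    a = abs(alpha)
    new_w = w + a * h                  # the sheared image is wider
    x = flip_x(x, w)                   # 1. flip
    x = x + a * y                      # 2. shear with the positive factor
    return flip_x(x, new_w), y         # 3. flip back inside the wider image

w, h, alpha = 400, 300, -0.5
x, y = 100, 200

# Negative shear applied directly, then shifted right by |alpha| * h
# so that every pixel stays inside the frame.
direct = x + alpha * y + abs(alpha) * h

print(shear_negative_via_flip(x, y, alpha, w, h)[0], direct)   # 150.0 150.0
```

Under this simplified flip, the two routes agree for every point: flip, shear, flip equals the negative shear plus a constant shift of |alpha| * h.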

The corresponding code is

if shear_factor < 0:
    img, bboxes = HorizontalFlip()(img, bboxes)

Testing

Now that the rotation and shear code is finished, let's see how it performs.

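# module paths assume the layout of the repository linked above
from data_aug.data_aug import *    # RandomRotate, RandomShear
from data_aug.bbox_util import *   # draw_rect
import matplotlib.pyplot as plt

# img (the image array) and bboxes (the box array) are assumed to be loaded already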

rotate = RandomRotate(20)  
shear = RandomShear(0.7)

img, bboxes = rotate(img, bboxes)
img,bboxes = shear(img, bboxes)

plt.imshow(draw_rect(img, bboxes))

[Image: https://blog.paperspace.com/content/images/2018/09/rotate_shear.png]