[Tupu Technology] Data Augmentation for Bounding Boxes (Part 3): Rotation and Shearing


Tupu Technology


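This article was translated and compiled by Tupu Technology. Visit www.tuputech.com to explore the latest applications of image recognition technology.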

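This is the third part of a series on adapting image augmentation techniques to object detection tasks. In this part, we cover how to rotate and shear images, together with their bounding boxes, using OpenCV's affine transformation features.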

GitHub Repo

Everything in this article, and the whole augmentation library, can be found in the GitHub repo below.

https://github.com/Paperspace/DataAugmentationForObjectDetection

Documentation

The documentation for this project can be found by opening docs/build/html/index.html in your browser, or at this link.

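This series has 4 parts.

Part 1: Basic Design and Horizontal Flipping

Part 2: Scaling and Translation

Part 3: Rotation and Shearing

Part 4: Putting It All Together

This part assumes that you have read the previous two parts, because we will use functions introduced in them.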

Rotation

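The rotated image looks like the figure below.

Rotation is one of the nastiest data augmentations to deal with, and you will soon see why.

Before we get to the code, I'd like to define some terms.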

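Affine transformation: a transformation of an image such that parallel lines in the image remain parallel after the transformation. Scaling, translation and rotation are all examples of affine transformations.

In computer graphics, we also use something called a transformation matrix, which is a very convenient tool for performing affine transformations.

We won't discuss the transformation matrix in detail, as that would take us away from our task, so I have provided a link at the end of the article where you can read more about it. In the meantime, think of the transformation matrix as a matrix by which you multiply a point's coordinates to produce the transformed point.

The transformation matrix is a 2 × 3 matrix that is multiplied by [x y 1]ᵀ, where (x, y) are the coordinates of the point. The extra 1 (homogeneous coordinates) is what lets translation, which cannot be written as a 2 × 2 matrix product, be folded into the same matrix; you can read more about it in the link below. Multiplying the 2 × 3 matrix by this 3 × 1 vector gives a 2 × 1 vector containing the coordinates of the new point.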
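As a quick illustration of the [x y 1] trick, the NumPy sketch below multiplies a 2 × 3 matrix by points written in homogeneous form. The helper name apply_affine is hypothetical and not part of the article's library; the matrix used is a pure translation, which a 2 × 2 matrix alone cannot express.

```python
import numpy as np


def apply_affine(M, points):
    """Apply a 2x3 affine matrix M to an (N, 2) array of (x, y) points."""
    points = np.asarray(points, dtype=float)
    # append a 1 to every point: (x, y) -> (x, y, 1)
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    # (2 x 3) @ (3 x N) -> (2 x N); transpose back to one row per point
    return (M @ homogeneous.T).T


# translation by (+5, -3): the shift lives in the third column of M
M_translate = np.array([[1.0, 0.0, 5.0],
                        [0.0, 1.0, -3.0]])
print(apply_affine(M_translate, [[2, 4], [0, 0]]))  # points (7, 1) and (5, -3)
```

The third column only contributes because each point carries that trailing 1; with a plain (x, y) vector there would be nothing to multiply it by.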

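The transformation matrix can also be used to get the coordinates of a point after it has been rotated about the center of the image. With the center of rotation at (c_x, c_y) and a scale of 1, the transformation matrix that rotates a point by an angle θ is:

$$M = \begin{bmatrix} \cos\theta & \sin\theta & (1-\cos\theta)\,c_x - \sin\theta\,c_y \\ -\sin\theta & \cos\theta & \sin\theta\,c_x + (1-\cos\theta)\,c_y \end{bmatrix}$$

Image source: https://cristianpb.github.io/blog/image-rotation-opencv. Scale is 1.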
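To make the rotation matrix concrete, here is a NumPy-only sketch that builds the same 2 × 3 matrix that cv2.getRotationMatrix2D returns according to OpenCV's documentation, and checks that the center of rotation stays fixed. The function name rotation_matrix_2d is hypothetical; it is an illustration, not a replacement for the OpenCV call used below.

```python
import numpy as np


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 matrix rotating points by angle_deg about center.

    Follows the formula OpenCV documents for cv2.getRotationMatrix2D:
    with the origin at the top-left corner, a positive angle rotates
    counter-clockwise as seen on screen.
    """
    cx, cy = center
    theta = np.deg2rad(angle_deg)
    a = scale * np.cos(theta)
    b = scale * np.sin(theta)
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])


M = rotation_matrix_2d((50, 50), 90)
# the centre (50, 50) and a point 10 px to its right, in homogeneous form
pts = np.array([[50, 50, 1], [60, 50, 1]], dtype=float)
print(np.round((M @ pts.T).T, 6))  # centre stays at (50, 50); (60, 50) moves to (50, 40)
```

The point to the right of the center ends up 10 px above it, which is a counter-clockwise quarter turn on screen because the y axis points down.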

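Thankfully, we won't have to code it by hand: OpenCV already provides the built-in function cv2.warpAffine to apply it. With the necessary theory in place, let's get started.

We start by defining the __init__ function of the RandomRotate class.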


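def __init__(self, angle = 10):
    self.angle = angle

    if isinstance(self.angle, tuple):
        # an explicit (min, max) range of angles in degrees
        assert len(self.angle) == 2, "Invalid range"
    else:
        # a single value a means angles are drawn from (-a, a)
        self.angle = (-self.angle, self.angle)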

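Rotating the Image

The first thing we need to do is rotate the image by an angle θ about its center. For that we need the transformation matrix, which we obtain with OpenCV's getRotationMatrix2D function.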

(h, w) = image.shape[:2]
(cX, cY) = (w // 2, h // 2)
M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)

Now we can get the rotated image simply by using the warpAffine function.

image = cv2.warpAffine(image, M, (w, h))

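The third argument of the function is (w, h) because we want to keep the original resolution. But if you think about it, the rotated image has different dimensions, and whatever part of it extends beyond the original dimensions, OpenCV simply cuts off, as in the example below.

Side effect of rotating with OpenCV

We lose some information this way. So how do we get around it? Thankfully, that same argument of warpAffine (the output size) lets us decide the dimensions of the final image. If we change it from (w, h) to a size that just accommodates the rotated image, we are done.

This idea was inspired by a post by Adrian Rosebrock on his blog PyImageSearch.

The question now is how to find these new dimensions. A little trigonometry does the job, as the figure below shows:

Image source: https://cristianpb.github.io/blog/image-rotation-opencv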

where (taking absolute values so that the formulas hold for any angle θ)

N_w = h·|sin θ| + w·|cos θ|

N_h = h·|cos θ| + w·|sin θ|
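These formulas can be sanity-checked numerically. The NumPy-only sketch below (hypothetical helper names) compares them with the size measured by rotating the four corners of a w × h image directly. Rotating about the origin instead of the center does not affect the result, because a translation does not change the width and height of the enclosing box.

```python
import numpy as np


def rotated_size(w, h, angle_deg):
    """(nW, nH) from the closed-form formulas, using absolute values of sin and cos."""
    t = np.deg2rad(angle_deg)
    cos, sin = abs(np.cos(t)), abs(np.sin(t))
    return h * sin + w * cos, h * cos + w * sin


def rotated_size_from_corners(w, h, angle_deg):
    """Same quantity measured by rotating the four image corners about the origin."""
    t = np.deg2rad(angle_deg)
    R = np.array([[np.cos(t), np.sin(t)],
                  [-np.sin(t), np.cos(t)]])
    corners = np.array([[0, 0], [w, 0], [0, h], [w, h]], dtype=float)
    rotated = corners @ R.T
    extent = rotated.max(axis=0) - rotated.min(axis=0)
    return extent[0], extent[1]


for angle in (0, 30, 90, 120, -45):
    print(angle, rotated_size(640, 480, angle), rotated_size_from_corners(640, 480, angle))
```

For 30° both give roughly 794.3 × 735.7. The absolute values matter for angles such as 120°, where cos θ is negative.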

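Now we compute the new width and height. Note that we can read |cos θ| and |sin θ| straight off the transformation matrix: with a scale of 1, M[0, 0] = cos θ and M[0, 1] = sin θ.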

cos = np.abs(M[0, 0])
sin = np.abs(M[0, 1])

# compute the new bounding dimensions of the image
nW = int((h * sin) + (w * cos))
nH = int((h * cos) + (w * sin))

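Something is still missing. One thing is certain: the center of the image does not move, since it is the axis of rotation itself. However, since the width and height of the output image are now nW and nH, its center must lie at (nW/2, nH/2). To make sure this happens, we have to translate the image by (nW/2 − cX, nH/2 − cY), where (cX, cY) is the previous center. We do this by adding these offsets to the translation column of M.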

# adjust the rotation matrix to take into account translation
M[0, 2] += (nW / 2) - cX
M[1, 2] += (nH / 2) - cY
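A small check of the adjusted matrix, as a NumPy-only sketch that rebuilds what the steps above compute. The matrix entries follow OpenCV's documented getRotationMatrix2D formula, and the function name expanded_rotation is hypothetical. The old center lands exactly at (nW/2, nH/2), and the rotated image corners stay inside the new canvas, up to the fraction of a pixel lost when nW and nH are truncated with int().

```python
import numpy as np


def expanded_rotation(w, h, angle_deg):
    """Rotation about the image centre, shifted so the result fits in (nW, nH)."""
    cx, cy = w // 2, h // 2
    t = np.deg2rad(angle_deg)
    a, b = np.cos(t), np.sin(t)
    # same entries as cv2.getRotationMatrix2D((cx, cy), angle_deg, 1.0)
    M = np.array([[a, b, (1 - a) * cx - b * cy],
                  [-b, a, b * cx + (1 - a) * cy]])
    nW = int(h * abs(b) + w * abs(a))
    nH = int(h * abs(a) + w * abs(b))
    # the translation step from the text
    M[0, 2] += nW / 2 - cx
    M[1, 2] += nH / 2 - cy
    return M, nW, nH


M, nW, nH = expanded_rotation(640, 480, 30)
centre = np.array([320.0, 240.0, 1.0])
print(M @ centre, (nW / 2, nH / 2))

# the four image corners after rotation, in homogeneous form
corners = np.array([[0, 0, 1], [640, 0, 1], [0, 480, 1], [640, 480, 1]], dtype=float)
print((M @ corners.T).T)
```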

Putting everything together, we wrap the image-rotation code in a function rotate_im and put it in bbox_util.py.

import cv2
import numpy as np


def rotate_im(image, angle):
    """Rotate the image.

    Rotate the image such that the rotated image is enclosed inside the tightest
    rectangle. The area not occupied by the pixels of the original image is colored
    black.

    Parameters
    ----------
    image : numpy.ndarray
        numpy image

    angle : float
        angle by which the image is to be rotated

    Returns
    -------
    numpy.ndarray
        Rotated Image
    """
    # grab the dimensions of the image and then determine the
    # centre
    (h, w) = image.shape[:2]
    (cX, cY) = (w // 2, h // 2)

    # grab the rotation matrix (in OpenCV a positive angle rotates
    # counter-clockwise), then grab the sine and cosine
    # (i.e., the rotation components of the matrix)
    M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])

    # compute the new bounding dimensions of the image
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))

    # adjust the rotation matrix to take into account translation
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY

    # perform the actual rotation and return the image
    # (resizing back to (w, h) is done later, in __call__)
    image = cv2.warpAffine(image, M, (nW, nH))
    return image

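Rotating the Bounding Box

This is the most challenging part of this augmentation. We first need to rotate the bounding box, which gives us a tilted rectangle. Then we have to find the tightest rectangle whose sides are parallel to the sides of the image and which contains the tilted rectangle.

Final bounding box (shown for one image only)

To get the rotated bounding box, shown in the middle image, we need the coordinates of all four corners of the box.

We could actually compute the final bounding box (black, in the rightmost image above) from only two corners, but working out its dimensions would then take more trigonometry. With all four corners of the intermediate box in the middle image, the calculation is much easier; the only cost is somewhat more code.

So, first, we write the function get_corners in bbox_util.py to get all 4 corners.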

def get_corners(bboxes):
    """Get corners of bounding boxes

    Parameters
    ----------
    bboxes: numpy.ndarray
        Numpy array containing bounding boxes of shape `N X 4` where N is the
        number of bounding boxes and the bounding boxes are represented in the
        format `x1 y1 x2 y2`

    Returns
    -------
    numpy.ndarray
        Numpy array of shape `N x 8` containing N bounding boxes each described by their
        corner co-ordinates `x1 y1 x2 y2 x3 y3 x4 y4`
    """
    width = (bboxes[:, 2] - bboxes[:, 0]).reshape(-1, 1)
    height = (bboxes[:, 3] - bboxes[:, 1]).reshape(-1, 1)

    # top-left corner
    x1 = bboxes[:, 0].reshape(-1, 1)
    y1 = bboxes[:, 1].reshape(-1, 1)

    # top-right corner
    x2 = x1 + width
    y2 = y1

    # bottom-left corner
    x3 = x1
    y3 = y1 + height

    # bottom-right corner
    x4 = bboxes[:, 2].reshape(-1, 1)
    y4 = bboxes[:, 3].reshape(-1, 1)

    corners = np.hstack((x1, y1, x2, y2, x3, y3, x4, y4))
    return corners
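Because every corner coordinate is simply one of the four input columns, the same layout can also be produced with a single fancy-indexing operation. This is an alternative sketch with a hypothetical name, not the version in the article's repository; like get_corners, it returns only the 8 corner coordinates.

```python
import numpy as np


def get_corners_indexed(bboxes):
    """Corners x1 y1 x2 y2 x3 y3 x4 y4 (top-left, top-right, bottom-left, bottom-right)."""
    # input columns: 0 = x_min, 1 = y_min, 2 = x_max, 3 = y_max
    return bboxes[:, [0, 1, 2, 1, 0, 3, 2, 3]]


boxes = np.array([[10.0, 20.0, 50.0, 80.0],
                  [0.0, 0.0, 5.0, 5.0]])
print(get_corners_indexed(boxes))
```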

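Once this is done, each bounding box is described by 8 coordinates x1, y1, x2, y2, x3, y3, x4, y4, i.e. its top-left, top-right, bottom-left and bottom-right corners. Next we define the function rotate_box in bbox_util.py, which rotates the bounding boxes by giving us the transformed corner points. For this we use the same transformation matrix as for the image.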

def rotate_box(corners, angle, cx, cy, h, w):
    """Rotate the bounding box.

    Parameters
    ----------
    corners : numpy.ndarray
        Numpy array of shape `N x 8` containing N bounding boxes each described by their
        corner co-ordinates `x1 y1 x2 y2 x3 y3 x4 y4`

    angle : float
        angle by which the image is to be rotated

    cx : int
        x coordinate of the center of image (about which the box will be rotated)

    cy : int
        y coordinate of the center of image (about which the box will be rotated)

    h : int
        height of the image

    w : int
        width of the image

    Returns
    -------
    numpy.ndarray
        Numpy array of shape `N x 8` containing N rotated bounding boxes each described by their
        corner co-ordinates `x1 y1 x2 y2 x3 y3 x4 y4`
    """
    # view the N x 8 array as 4N points (x, y) and append a 1 to each point
    corners = corners.reshape(-1, 2)
    corners = np.hstack((corners, np.ones((corners.shape[0], 1), dtype=type(corners[0][0]))))

    # the same matrix rotate_im uses for the image
    M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))

    # adjust the rotation matrix to take into account translation
    M[0, 2] += (nW / 2) - cx
    M[1, 2] += (nH / 2) - cy

    # transform all points at once: (2 x 3) @ (3 x 4N) -> (2 x 4N), then transpose
    calculated = np.dot(M, corners.T).T

    # regroup the points into one row of 8 coordinates per box
    calculated = calculated.reshape(-1, 8)
    return calculated
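The core of rotate_box is the reshape trick: an N × 8 corner array is viewed as 4N points, transformed in one matrix product, and reshaped back. The NumPy-only sketch below isolates that step with a plain 90° rotation about the origin. The helper name transform_corners is hypothetical; rotate_box itself builds its matrix with cv2.getRotationMatrix2D plus the translation adjustment.

```python
import numpy as np


def transform_corners(corners, M):
    """Apply a 2x3 affine matrix M to an (N, 8) array of box corners."""
    pts = corners.reshape(-1, 2)                          # (4N, 2): one row per corner
    pts = np.hstack((pts, np.ones((pts.shape[0], 1))))    # (4N, 3): homogeneous form
    return (M @ pts.T).T.reshape(-1, 8)                   # back to (N, 8)


# rotation by 90 degrees about the origin (cos = 0, sin = 1), no translation
M90 = np.array([[0.0, 1.0, 0.0],
                [-1.0, 0.0, 0.0]])
box_corners = np.array([[10.0, 20.0, 50.0, 20.0, 10.0, 80.0, 50.0, 80.0]])
print(transform_corners(box_corners, M90))
```

The rotated corners are no longer in top-left, top-right, bottom-left, bottom-right order, and without a translation they can even be negative. That is why rotate_box adds the shift into the enlarged canvas, and why the enclosing box is computed from minima and maxima rather than from the corner order.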

The last thing left is a function get_enclosing_box, which gives us the tightest enclosing box we discussed earlier.

def get_enclosing_box(corners):
    """Get an enclosing box for rotated corners of a bounding box

    Parameters
    ----------
    corners : numpy.ndarray
        Numpy array of shape `N x 8` containing N bounding boxes each described by their
        corner co-ordinates `x1 y1 x2 y2 x3 y3 x4 y4`

    Returns
    -------
    numpy.ndarray
        Numpy array containing enclosing bounding boxes of shape `N X 4` where N is the
        number of bounding boxes and the bounding boxes are represented in the
        format `x1 y1 x2 y2`
    """
    # x and y coordinates of the four corners of each box
    x_ = corners[:, [0, 2, 4, 6]]
    y_ = corners[:, [1, 3, 5, 7]]

    xmin = np.min(x_, 1).reshape(-1, 1)
    ymin = np.min(y_, 1).reshape(-1, 1)
    xmax = np.max(x_, 1).reshape(-1, 1)
    ymax = np.max(y_, 1).reshape(-1, 1)

    # keep any extra columns (e.g. class labels) after the 8 coordinates
    final = np.hstack((xmin, ymin, xmax, ymax, corners[:, 8:]))
    return final
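An equivalent way to write get_enclosing_box is to view the 8 coordinates as 4 (x, y) points and take the minimum and maximum over them. The sketch below (hypothetical name enclosing_box) applies it to a 20 × 20 box centered at (50, 50) after a 45° rotation, with a class label in the last column. The enclosing box grows to about 28.3 × 28.3, which is the price of keeping boxes axis-aligned.

```python
import numpy as np


def enclosing_box(corners):
    """Tightest axis-aligned x1 y1 x2 y2 box around each row of 8 corner coordinates."""
    pts = corners[:, :8].reshape(-1, 4, 2)     # (N, 4 corners, xy)
    mins = pts.min(axis=1)                      # (N, 2): xmin, ymin
    maxs = pts.max(axis=1)                      # (N, 2): xmax, ymax
    # keep any extra columns (e.g. class labels) after the 8 coordinates
    return np.hstack((mins, maxs, corners[:, 8:]))


# corners of the box (40, 40, 60, 60) after a 45 degree rotation about (50, 50)
d = 10 * np.sqrt(2)
rotated = np.array([[50 - d, 50, 50, 50 - d, 50, 50 + d, 50 + d, 50, 3]])
print(enclosing_box(rotated))
```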

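The output once again describes each bounding box by 4 coordinates, i.e. two corners. With all these helper functions in place, we can finally write the __call__ function.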

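def __call__(self, img, bboxes):
    # sample a random angle from the configured range
    angle = random.uniform(*self.angle)

    w, h = img.shape[1], img.shape[0]
    cx, cy = w // 2, h // 2

    # rotate the image; the output is enlarged to (nW, nH) so nothing is cut off
    img = rotate_im(img, angle)

    # describe each box by its four corners, keeping extra columns (e.g. class labels)
    corners = get_corners(bboxes)
    corners = np.hstack((corners, bboxes[:, 4:]))

    # rotate the corners with the same matrix used for the image
    corners[:, :8] = rotate_box(corners[:, :8], angle, cx, cy, h, w)

    # tightest axis-aligned box around each rotated box
    new_bbox = get_enclosing_box(corners)

    # resize the enlarged image back to (w, h) and rescale the boxes accordingly
    scale_factor_x = img.shape[1] / w
    scale_factor_y = img.shape[0] / h
    img = cv2.resize(img, (w, h))
    new_bbox[:, :4] /= [scale_factor_x, scale_factor_y, scale_factor_x, scale_factor_y]

    # clip to the image and drop boxes with less than 25% of their area left
    bboxes = clip_box(new_bbox, [0, 0, w, h], 0.25)

    return img, bboxes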

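Notice that at the end of the function we rescale the image and the bounding boxes so that the final dimensions are w, h rather than nW, nH. This is only done to keep the image dimensions consistent. Finally, clip_box clips the boxes to the image borders and drops any box that keeps less than 25% of its area inside the image after the transformation.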
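The rescaling step divides x coordinates by nW/w and y coordinates by nH/h. A small standalone sketch (hypothetical helper rescale_boxes), using a 640 × 480 image rotated by 30°, whose enlarged canvas is 794 × 735 after int() truncation:

```python
import numpy as np


def rescale_boxes(boxes, from_size, to_size):
    """Map x1 y1 x2 y2 (+ extra columns) boxes from a (W, H) image to a (w, h) image."""
    (W, H), (w, h) = from_size, to_size
    sx, sy = W / w, H / h
    out = np.array(boxes, dtype=float)
    out[:, :4] /= [sx, sy, sx, sy]
    return out


# a box in the 794 x 735 rotated canvas, with class label 1 in the last column
boxes = np.array([[397.0, 367.5, 794.0, 735.0, 1.0]])
print(rescale_boxes(boxes, (794, 735), (640, 480)))
```

The box from the canvas center to its bottom-right corner maps to (320, 240, 640, 480), and the label column is left untouched.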

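Shearing

Shearing is another bounding box transformation that can be done with the help of the transformation matrix. The effect of shearing looks like the figure below.

In shearing, we turn the rectangular image into a parallelogram-like image. For a horizontal shear with shear factor α, the transformation matrix is:

$$\begin{bmatrix} 1 & \alpha & 0 \\ 0 & 1 & 0 \end{bmatrix}$$

This is an example of horizontal shear: a pixel at (x, y) is moved to (x + α·y, y), where α is the shear factor. Accordingly, we define the RandomShear class and its __init__ function as follows: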

class RandomShear(object):
    """Randomly shears an image in horizontal direction

    Bounding boxes which have an area of less than 25% remaining in the
    transformed image are dropped. The resolution is maintained, and the remaining
    area, if any, is filled with black.

    Parameters
    ----------
    shear_factor: float or tuple(float)
        if **float**, the image is sheared horizontally by a factor drawn
        randomly from a range (-`shear_factor`, `shear_factor`). If **tuple**,
        the `shear_factor` is drawn randomly from values specified by the
        tuple

    Returns
    -------
    numpy.ndarray
        Sheared image in the numpy format of shape `HxWxC`

    numpy.ndarray
        Transformed bounding box co-ordinates of the format `n x 4` where n is
        number of bounding boxes and 4 represents `x1,y1,x2,y2` of the box
    """

    def __init__(self, shear_factor = 0.2):
        self.shear_factor = shear_factor

        if isinstance(self.shear_factor, tuple):
            assert len(self.shear_factor) == 2, "Invalid range for shear factor"
        else:
            # a single value s means the range (-s, s)
            self.shear_factor = (-self.shear_factor, self.shear_factor)

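Augmentation Logic

Since we only deal with horizontal shear here, we only need to change the x coordinates of the box corners according to x' = x + α·y (x1 is shifted using y1, and x2 using y2). Our __call__ function looks like this: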

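def __call__(self, img, bboxes):
    shear_factor = random.uniform(*self.shear_factor)

    w, h = img.shape[1], img.shape[0]

    # negative shear: flip, apply a positive shear, flip back (explained below)
    if shear_factor < 0:
        img, bboxes = HorizontalFlip()(img, bboxes)

    # horizontal shear matrix: x' = x + |alpha| * y, y' = y
    M = np.array([[1, abs(shear_factor), 0], [0, 1, 0]])

    # the bottom row moves the farthest, so the canvas widens by |alpha| * h
    nW = img.shape[1] + abs(shear_factor * img.shape[0])

    # shift x1 by |alpha| * y1 and x2 by |alpha| * y2
    bboxes[:, [0, 2]] += ((bboxes[:, [1, 3]]) * abs(shear_factor)).astype(int)

    img = cv2.warpAffine(img, M, (int(nW), img.shape[0]))

    if shear_factor < 0:
        img, bboxes = HorizontalFlip()(img, bboxes)

    # resize back to the original width and rescale the x coordinates
    img = cv2.resize(img, (w, h))
    scale_factor_x = nW / w
    bboxes[:, :4] /= [scale_factor_x, 1, scale_factor_x, 1]

    return img, bboxes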
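To see the shear arithmetic on concrete numbers, here is a NumPy-only sketch (hypothetical helper shear_boxes) of the box update for a non-negative factor α. Unlike the __call__ above, it keeps the offsets as floats instead of truncating them with astype(int).

```python
import numpy as np


def shear_boxes(bboxes, alpha):
    """Horizontal shear x' = x + alpha * y applied to x1 y1 x2 y2 boxes (alpha >= 0)."""
    out = np.array(bboxes, dtype=float)
    # x1 moves with the top edge (y1), x2 with the bottom edge (y2)
    out[:, [0, 2]] += out[:, [1, 3]] * alpha
    return out


w, h, alpha = 100, 50, 0.2
nW = w + alpha * h   # the bottom row shifts by alpha * h, so the canvas widens by that much
sheared = shear_boxes([[10.0, 0.0, 30.0, 50.0]], alpha)
print(nW, sheared)
```

After the shear the box spans x from 10 to 40: its top edge (y = 0) does not move, and its bottom edge shifts by 0.2 × 50 = 10, the same amount by which the canvas grows to 110 pixels.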

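An interesting case is negative shear, which takes a bit more work. If we shear with a negative factor using the same approach as for positive shear, the resulting boxes come out too small. This is because, for the equation to work, the box coordinates must be in the format x1, y1, x2, y2, where x2 belongs to the corner that lies farther along the direction of shear.

This holds for positive shear because, in our default format, x2 is the x coordinate of the bottom-right corner and x1 that of the top-left corner, and the direction of shear is positive, i.e. from left to right.

With negative shear, the direction of shear is from right to left, and x2 is no longer farther along that direction than x1. One way around this would be to take the other pair of corners (top-right and bottom-left), apply the shear, and then convert back to our usual pair of corners.

We could do that, but there is a better way. Here is how to perform a negative shear with shear factor −α:

1. Flip the image and the bounding boxes horizontally.
2. Apply a positive shear with shear factor α.
3. Flip the image and the bounding boxes horizontally again.

I encourage you to take out pen and paper and verify why this works! That is why the following lines, which handle negative shear, appear twice in the function above: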

if shear_factor < 0:
    img, bboxes = HorizontalFlip()(img, bboxes)
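The flip trick can also be checked numerically. In the NumPy-only sketch below, hflip and shear_pos are hypothetical stand-ins for HorizontalFlip and the positive-shear box update, and shear_exact computes the correct enclosing box of a sheared rectangle from all four corners. Tracing a single point shows what flip, shear by α, flip does: x becomes w − x, then w − x + α·y, then (w + α·h) − (w − x + α·y) = x − α·y + α·h. That is a shear by −α followed by a shift of α·h to the right, which keeps the result inside the widened canvas.

```python
import numpy as np


def hflip(bboxes, width):
    """Mirror x1 y1 x2 y2 boxes horizontally in an image of the given width."""
    out = np.array(bboxes, dtype=float)
    # the new left edge comes from the old right edge and vice versa
    out[:, [0, 2]] = width - out[:, [2, 0]]
    return out


def shear_pos(bboxes, alpha):
    """Box update of the positive-shear code path: x1 += alpha*y1, x2 += alpha*y2."""
    out = np.array(bboxes, dtype=float)
    out[:, [0, 2]] += out[:, [1, 3]] * alpha
    return out


def shear_exact(bboxes, a):
    """Enclosing box of each rectangle after x' = x + a*y, for either sign of a."""
    x1, y1, x2, y2 = np.array(bboxes, dtype=float)[:, :4].T
    # x coordinates of all four sheared corners
    xs = np.stack([x1 + a * y1, x2 + a * y1, x1 + a * y2, x2 + a * y2])
    return np.stack([xs.min(axis=0), y1, xs.max(axis=0), y2], axis=1)


w, h, alpha = 100, 50, 0.2
box = np.array([[20.0, 10.0, 50.0, 40.0]])

naive = shear_pos(box, -alpha)      # positive-shear formula with a negative factor
exact = shear_exact(box, -alpha)    # true enclosing box of the sheared rectangle
via_flip = hflip(shear_pos(hflip(box, w), alpha), w + alpha * h)
print(naive, exact, via_flip)
```

The naive update gives x from 18 to 42, narrower than the exact 12 to 48, which is the "too small" effect described above. The flip route gives 22 to 58, i.e. the exact box shifted right by α·h = 10.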

Testing

Now that we're done with the rotation and shear augmentations, it's time to test them.

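import cv2
import numpy as np
import matplotlib.pyplot as plt
from data_aug.data_aug import *
from data_aug.bbox_util import *

# placeholder inputs: use your own image path and a float array of
# boxes in the format x1, y1, x2, y2, class
img = cv2.cvtColor(cv2.imread("path/to/your_image.jpg"), cv2.COLOR_BGR2RGB)
bboxes = np.array([[100., 50., 300., 250., 0.]])

rotate = RandomRotate(20)
shear = RandomShear(0.7)

img, bboxes = rotate(img, bboxes)
img, bboxes = shear(img, bboxes)

plt.imshow(draw_rect(img, bboxes))
plt.show()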