Image Data Augmentation (Data Augmentation in Computer Vision)

幼稚园的扛把子～

Article link: https://blog.csdn.net/qq_38765642/article/details/110393386

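Image Data Augmentation (Data Augmentation in Computer Vision): Offline Augmentation

I have recently been organizing notes on image data augmentation. This post covers three techniques: flipping, scaling, and rotation. Let's get started.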

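1. Overview

Deep neural networks generally need a large amount of training data to reach good results. When data is limited, data augmentation can increase the diversity of the training samples, improve model robustness, and reduce overfitting.

In computer vision, typical augmentation methods include flipping (Flip), rotation (Rotate), scaling (Scale), random cropping or zero padding (Random Crop or Pad), color jittering (Color Jittering), and adding noise (Noise).

I am currently working on human pose estimation in images, so this post only covers the three methods I use there: flipping (Flip), rotation (Rotate), and scaling (Scale).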

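2. Flip

Flipping, also called mirroring, comes in two forms: horizontal and vertical.
For a color image, cv2.imread() returns an array in HWC layout (height, width, channel), with the channels in BGR order.

# Horizontal flip: reverse the width axis
flip_h =  img[:,::-1]
# Vertical flip: reverse the height axis
flip_v =  img[::-1]
# Horizontal and vertical flip together
flip_hv =  img[::-1, ::-1]

The full code is simple and direct: flipping is implemented with NumPy indexing.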

import cv2
import numpy as np
from matplotlib import pyplot as plt

img = cv2.imread('1.jpg')   # HWC layout, BGR channel order
height, width, channel = img.shape

# Horizontal flip: ::-1 walks the axis from the last element to the first, i.e. reverses it.
flip_h = img[:, ::-1]

# Vertical flip
flip_v = img[::-1]

plt.subplot(311)
plt.title('src')
plt.imshow(img[:, :, ::-1])      # show the original image; BGR -> RGB for matplotlib

plt.subplot(312)
plt.title('Horizontally')
plt.imshow(flip_h[:, :, ::-1])

plt.subplot(313)
plt.title('Vertically')
plt.imshow(flip_v[:, :, ::-1])
plt.show()

Result:
[Figure: the source image, its horizontal flip, and its vertical flip]
Flipping can also be done with OpenCV's warpAffine or cv2.flip; that is not shown here.
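For pose estimation the keypoint coordinates have to be flipped together with the image. The following is a minimal sketch of that idea using NumPy only; the helper name hflip_with_keypoints and the tiny test image are made up for illustration and are not part of the original post.

```python
import numpy as np

def hflip_with_keypoints(img, x, y):
    """Mirror an HWC image left-right and mirror the keypoint x coordinates with it."""
    w = img.shape[1]
    flipped = img[:, ::-1]
    # Column j moves to column (w - 1 - j); the y coordinates are unchanged.
    x_flipped = (w - 1) - np.asarray(x)
    return flipped, x_flipped, np.asarray(y)

# Hypothetical 4x6 image with one marked pixel at (x=2, y=1)
demo = np.zeros((4, 6, 3), dtype=np.uint8)
demo[1, 2] = 255
flipped, x_f, y_f = hflip_with_keypoints(demo, [2], [1])
print(x_f, y_f)
```

In a real pose dataset, left and right joints (for example left and right wrist) usually also need their labels swapped after a horizontal flip; that step depends on the keypoint ordering and is left out here.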

3. Scale

You can use the resize() function or apply a custom scale factor.
Scaling simply changes the size of the image. OpenCV provides the function cv2.resize() for this.
cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]])

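Parameters

src: input image
dsize: output image size, given as (width, height)
dst: output image
fx: scale factor along the x axis
fy: scale factor along the y axis
interpolation: interpolation method
INTER_NEAREST - nearest-neighbor interpolation
INTER_LINEAR - bilinear interpolation (default)
INTER_AREA - resampling using the pixel area relation
INTER_CUBIC - bicubic interpolation
INTER_LANCZOS4 - Lanczos interpolation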
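To make the role of the scale factors and the interpolation method more concrete, here is a rough NumPy sketch of nearest-neighbor resizing. It is only an illustration of the idea behind INTER_NEAREST, not OpenCV's implementation, and the helper name resize_nearest is invented for this example; the result is not guaranteed to be pixel-identical to cv2.resize.

```python
import numpy as np

def resize_nearest(img, fx, fy):
    """Resize by scale factors fx (width) and fy (height) using nearest-neighbor lookup."""
    h, w = img.shape[:2]
    new_h, new_w = int(round(h * fy)), int(round(w * fx))
    # Map every output row/column back to a source row/column.
    rows = np.minimum((np.arange(new_h) / fy).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / fx).astype(int), w - 1)
    return img[rows[:, None], cols]

small = np.array([[1, 2],
                  [3, 4]])
big = resize_nearest(small, fx=2, fy=2)
print(big)
```

Each source pixel is simply repeated, which is why nearest-neighbor scaling looks blocky compared with the linear or cubic methods listed above.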

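image.shape: (256, 256, 3), one frame of a video sequence in HWC layout; after cropping, the network input is 256*256
bbox.shape: (4,), the human detection box, used for cropping
x.shape: (1, 13), the x coordinates of the 13 human keypoints
y.shape: (1, 13), the y coordinates of the 13 human keypoints
f_xy: scale factor

Scaling an image that contains a person: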

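import numpy as np
# Assumption: resize is skimage.transform.resize, whose keyword arguments match the call below.
from skimage.transform import resize

def scale(image, bbox, x, y, f_xy):
    (h, w, _) = image.shape
    h, w = int(h * f_xy), int(w * f_xy)
    image = resize(image, (h, w), preserve_range=True, anti_aliasing=True, mode='constant').astype(np.uint8)

    # Keypoints and the bounding box scale by the same factor as the image.
    x = x * f_xy
    y = y * f_xy
    bbox = bbox * f_xy

    # Keep the keypoints inside the scaled image (valid indices are 0..w-1 and 0..h-1).
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)

    return image, bbox, x, y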

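Scaling with warpAffine:
Mathematical principle
The transformation matrix M for scaling an image is
$$M = \begin{bmatrix} f_x & 0 & 0 \\ 0 & f_y & 0 \end{bmatrix}$$
where

fx: the scale factor along the x axis

fy: the scale factor along the y axis

This gives:
$$x' = f_x \cdot x, \qquad y' = f_y \cdot y$$
Implementation: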

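'''
Scaling implemented with an affine matrix
'''
import numpy as np
import cv2

img = cv2.imread('1.jpg')

height,width,channel = img.shape

# Scale factor along the x axis: 1.5x
fx = 1.5
# Scale factor along the y axis: 2x
fy = 2

# Build the transformation matrix: pure scaling, no translation
M = np.float32([[fx, 0, 0], [0, fy, 0]])

# Apply the 2D affine transformation; the output size is (width, height)
resized = cv2.warpAffine(img, M, (int(width*fx), int(height*fy)))
cv2.imwrite('resize_raw.jpg', resized)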

Original image:
[Figure: original image]

Scaled result:
[Figure: scaled image]

As you can see, w' = w × 1.5 = 2412 × 1.5 = 3618 and h' = h × 2 = 366 × 2 = 732.
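The size arithmetic above can be checked by pushing the image corners through the scaling matrix. This is a small NumPy-only sketch; the helper apply_affine is a name chosen for this example, and the 2412 × 366 size is the one quoted in the text.

```python
import numpy as np

fx, fy = 1.5, 2.0
M = np.float32([[fx, 0, 0], [0, fy, 0]])

def apply_affine(M, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    pts = np.asarray(points, dtype=np.float64)
    homo = np.hstack([pts, np.ones((len(pts), 1))])   # (x, y) -> (x, y, 1)
    return homo @ M.T.astype(np.float64)

width, height = 2412, 366
corners = [(0, 0), (width, 0), (0, height), (width, height)]
scaled = apply_affine(M, corners)
print(scaled[-1])   # the far corner gives the new width and height
```

The same helper works for keypoints: any (x, y) annotation is transformed by the same matrix as the image.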

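4. Rotate

angle: rotation angle

4.1 Mathematical Principle

Rotation with getRotationMatrix2D
OpenCV's getRotationMatrix2D function builds the rotation matrix M for us directly, so we do not have to compute the trigonometric functions ourselves:

getRotationMatrix2D(center, angle, scale)

Parameters

center: the center of rotation (cx, cy); it can be any point
angle: rotation angle in degrees; counterclockwise is the positive direction, so a positive value rotates counterclockwise
scale: scale factor; a value of 1.0 keeps the size unchanged

The function returns the affine transformation matrix M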

import cv2
import numpy as np

# Build the rotation matrix: center of rotation, rotation angle, scale factor
rotateMatrix = cv2.getRotationMatrix2D((100, 200), 90, 1.0)

# Set the NumPy print format
np.set_printoptions(precision=2,suppress=True)
print(rotateMatrix)

Result:

[[   0.    1. -100.]
 [  -1.    0.  300.]]
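The printed matrix can be reproduced by hand. The sketch below writes out the formula from the OpenCV documentation for getRotationMatrix2D, with α = scale·cos(angle) and β = scale·sin(angle), in plain NumPy; the function name rotation_matrix_2d is made up for this example.

```python
import numpy as np

def rotation_matrix_2d(center, angle_deg, scale):
    """2x3 rotation matrix in OpenCV's convention (positive angle = counterclockwise on screen)."""
    cx, cy = center
    a = scale * np.cos(np.radians(angle_deg))   # alpha
    b = scale * np.sin(np.radians(angle_deg))   # beta
    return np.array([[ a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

M = rotation_matrix_2d((100, 200), 90, 1.0)
np.set_printoptions(precision=2, suppress=True)
print(M)

# The center of rotation must map onto itself.
center_after = M @ np.array([100, 200, 1.0])
```

With center (100, 200), a 90 degree angle, and scale 1.0, the translation column works out to (100 - 200, 100 + 200) = (-100, 300), matching the output above.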

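4.2 Rotation with warpAffine

Rotation about the origin
[Figure: a point rotated about the origin]
Derivation

Let the point (x, y) lie at distance r from the origin, at angle φ from the x axis, so that x = r cos φ and y = r sin φ. Rotating it counterclockwise by θ gives
$$x' = r\cos(\phi + \theta) = x\cos\theta - y\sin\theta$$
$$y' = r\sin(\phi + \theta) = x\sin\theta + y\cos\theta$$

From this we get:
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

So the transformation matrix for rotation is:
$$M = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \end{bmatrix}$$

Note:

The derivation places the origin at the bottom-left corner, whereas in OpenCV the image origin is at the top-left corner, so the code negates the angle θ.

Rotation about an arbitrary point
Derivation
How do we rotate about an arbitrary point?

Translate the center of rotation to the origin, rotate about the origin, then translate back.
Let the center of rotation be (cx, cy):
$$M = T(c_x, c_y)\, R(\theta)\, T(-c_x, -c_y)$$
where
$$T(t_x, t_y) = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
So, keeping the top two rows:
$$M = \begin{bmatrix} \cos\theta & -\sin\theta & (1 - \cos\theta)\, c_x + \sin\theta\, c_y \\ \sin\theta & \cos\theta & -\sin\theta\, c_x + (1 - \cos\theta)\, c_y \end{bmatrix}$$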
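The translate, rotate, translate-back recipe for an arbitrary center can be checked numerically. The following sketch composes the three steps as 3×3 homogeneous matrices in NumPy and compares the product with the closed-form 2×3 matrix; the helper names and the sample values (30 degrees, center (120, 80)) are chosen only for illustration.

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def rotation(theta):
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    return np.array([[cos_t, -sin_t, 0], [sin_t, cos_t, 0], [0, 0, 1]])

def rotation_about(theta, cx, cy):
    # Rightmost matrix is applied first: move center to origin, rotate, move back.
    return translation(cx, cy) @ rotation(theta) @ translation(-cx, -cy)

theta, cx, cy = np.radians(30), 120.0, 80.0
composed = rotation_about(theta, cx, cy)[:2]      # keep the 2x3 part

cos_t, sin_t = np.cos(theta), np.sin(theta)
closed_form = np.array([
    [cos_t, -sin_t, (1 - cos_t) * cx + sin_t * cy],
    [sin_t,  cos_t, -sin_t * cx + (1 - cos_t) * cy]])

center_after = composed @ np.array([cx, cy, 1.0])
print(np.allclose(composed, closed_form))
```

Working in 3×3 form makes it easy to chain further steps (for example an extra scale) by multiplying one more matrix before dropping the last row.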

Source code for rotating about an arbitrary point:

# -*- coding: utf-8 -*-
'''
Rotate about an arbitrary point in the image
'''
import numpy as np
import cv2
from math import cos,sin,radians
from matplotlib import pyplot as plt

img = cv2.imread('lena1.jpg')

height, width, channel = img.shape

# Build the rotation matrix for an angle in degrees and a center (cx, cy)
def getRotationMatrix2D(theta, cx=0, cy=0):
    # radians() converts degrees to radians.
    # The derivation assumes a bottom-left origin, but the image origin is
    # the top-left corner, so the angle is multiplied by -1.
    theta = radians(-1 * theta)

    # Rotation about (cx, cy): rotation part plus the translation column
    M = np.float32([
        [cos(theta), -sin(theta), (1-cos(theta))*cx + sin(theta)*cy],
        [sin(theta), cos(theta), -sin(theta)*cx + (1-cos(theta))*cy]])
    return M

# Use the image center as the pivot of the rotation
cx = int(width / 2)
cy = int(height / 2)

# Apply the 2D affine transformation
# Rotate 30 degrees counterclockwise about the image center
M = getRotationMatrix2D(30, cx=cx, cy=cy)
rotated_30 = cv2.warpAffine(img, M, (width, height))

# Rotate 45 degrees counterclockwise about the image center
M = getRotationMatrix2D(45, cx=cx, cy=cy)
rotated_45 = cv2.warpAffine(img, M, (width, height))

# Rotate 60 degrees counterclockwise about the image center
M = getRotationMatrix2D(60, cx=cx, cy=cy)
rotated_60 = cv2.warpAffine(img, M, (width, height))

plt.subplot(221)
plt.title("Src Image")
plt.imshow(img[:, :, ::-1])   # convert BGR to RGB

plt.subplot(222)
plt.title("Rotated 30 Degree")
plt.imshow(rotated_30[:,:,::-1])

plt.subplot(223)
plt.title("Rotated 45 Degree")
plt.imshow(rotated_45[:,:,::-1])

plt.subplot(224)
plt.title("Rotated 60 Degree")
plt.imshow(rotated_60[:,:,::-1])

plt.show()

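[Figure: the source image and its rotations by 30, 45, and 60 degrees]

Rotating an image that contains human keypoints
When the image is rotated, the bounding box and the keypoints must be rotated as well.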

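import numpy as np
# Assumption: the image rotation comes from scikit-image. It is aliased so that it
# does not clash with the function defined below, which has the same name.
from skimage.transform import rotate as sk_rotate

def rotate(image, bbox, x, y, angle):
    # image: (256, 256, 3)   image size, HWC
    # bbox:  (4,) float array; the formulas below rotate (bbox[0], bbox[1]) and (bbox[2], bbox[3]) as two points
    # x: [126 129 124 117 107  99 128 107 108 105 137 155 122  99]   keypoint x coordinates
    # y: [209 176 136 123 178 225  65  47  46  24  44  64  49  54]   keypoint y coordinates
    # angle: -8.165648811999333   rotation angle in degrees (converted to radians below)
    # Image center, [127.5, 127.5] for a 256x256 image
    o_x, o_y = (np.array(image.shape[:2][::-1]) - 1) / 2.
    height, width = image.shape[0], image.shape[1]   # shape is (rows, cols, channels)
    # Flip y so that the origin is at the bottom-left, as in the derivation
    x1 = x
    y1 = height - y
    o_y_up = height - o_y
    image = sk_rotate(image, angle, preserve_range=True).astype(np.uint8)
    angle_rad = (np.pi * angle) / 180.0               # degrees -> radians
    x = o_x + np.cos(angle_rad) * (x1 - o_x) - np.sin(angle_rad) * (y1 - o_y_up)
    y = o_y_up + np.sin(angle_rad) * (x1 - o_x) + np.cos(angle_rad) * (y1 - o_y_up)
    y = height - y                                    # back to the top-left origin
    # Rotate the box points directly in image coordinates (y pointing down).
    # Read all four values first so that no formula uses an already overwritten entry.
    bx1, by1, bx2, by2 = bbox
    bbox[0] = o_x + np.cos(angle_rad) * (bx1 - o_x) + np.sin(angle_rad) * (by1 - o_y)
    bbox[1] = o_y - np.sin(angle_rad) * (bx1 - o_x) + np.cos(angle_rad) * (by1 - o_y)
    bbox[2] = o_x + np.cos(angle_rad) * (bx2 - o_x) + np.sin(angle_rad) * (by2 - o_y)
    bbox[3] = o_y - np.sin(angle_rad) * (bx2 - o_x) + np.cos(angle_rad) * (by2 - o_y)
    # np.int was removed from recent NumPy versions; the built-in int works everywhere
    return image, bbox, x.astype(int), y.astype(int)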
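Rotating only two points of a box generally does not give an axis-aligned box that still encloses the person. One common alternative, sketched here as an assumption rather than as the method used in this post, is to rotate all four corners and take their minimum and maximum. The helper name rotate_bbox_corners, the (x1, y1, x2, y2) box format, and the sample values are hypothetical.

```python
import numpy as np

def rotate_bbox_corners(bbox, angle_deg, center):
    """Axis-aligned box enclosing bbox = (x1, y1, x2, y2) after a counterclockwise
    on-screen rotation by angle_deg about center, in image coordinates (y down)."""
    x1, y1, x2, y2 = bbox
    cx, cy = center
    corners = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], dtype=float)
    t = np.radians(angle_deg)
    cos_t, sin_t = np.cos(t), np.sin(t)
    dx, dy = corners[:, 0] - cx, corners[:, 1] - cy
    # With y pointing down, a counterclockwise rotation on screen uses these signs.
    xr = cx + cos_t * dx + sin_t * dy
    yr = cy - sin_t * dx + cos_t * dy
    return np.array([xr.min(), yr.min(), xr.max(), yr.max()])

box = rotate_bbox_corners((100, 50, 140, 150), 90, (127.5, 127.5))
print(box)
```

For a 90 degree rotation the 40 × 100 box becomes a 100 × 40 box, which is a quick sanity check that width and height swap as expected.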

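5. Affine Transformation

The scaling, rotation, and flipping covered above are all affine transformations of the image. This section looks at the general form as a summary.
Affine transformations change the shape, position, and angle of an image and are commonly used in deep learning preprocessing, so here is a brief review. Applied to images, an affine transformation is mainly a combination of scaling (scale), rotation (rotate), shearing (shear), flipping (flip), and translation (translate). In OpenCV the affine matrix is a 2×3 matrix: the left 2×2 submatrix is the linear transformation and the 2×1 column on the right holds the translation terms:
$$M = \begin{bmatrix} a_{00} & a_{01} & b_0 \\ a_{10} & a_{11} & b_1 \end{bmatrix}$$

An affine transformation of an image moves each pixel to another location through some mapping.

Mathematically, it is a linear transformation of a vector space followed by adding a translation vector, which maps the space onto another vector space.

Vector m: m = (x, y)

Vector n: n = (x′, y′)

Transformation from m to n: n = A·m + b

Written out:
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$$
Combining A and b gives the affine matrix M, whose dimension is 2×3.

Different matrices M produce different 2D affine transformations.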
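As a small illustration of n = A·m + b and of how different matrices M combine, the sketch below builds M from A and b, applies it to a point, and chains two transformations. All helper names (affine_matrix, apply, compose) and the sample numbers are made up for this example.

```python
import numpy as np

def affine_matrix(A, b):
    """Stack the 2x2 linear part A and the translation b into the 2x3 matrix M = [A | b]."""
    return np.hstack([np.asarray(A, float), np.asarray(b, float).reshape(2, 1)])

def apply(M, m):
    """Compute n = A*m + b for a single point m = (x, y)."""
    x, y = m
    return M @ np.array([x, y, 1.0])

def compose(M_second, M_first):
    """Return the 2x3 matrix that applies M_first and then M_second."""
    to_3x3 = lambda M: np.vstack([M, [0, 0, 1]])
    return (to_3x3(M_second) @ to_3x3(M_first))[:2]

scale = affine_matrix([[1.5, 0], [0, 2.0]], [0, 0])     # fx = 1.5, fy = 2
shift = affine_matrix([[1, 0], [0, 1]], [10, 30])       # move right 10, down 30
scale_then_shift = compose(shift, scale)

n = apply(scale_then_shift, (2, 3))                      # (2, 3) -> (3, 6) -> (13, 36)

# A horizontal flip of an image of width w is also affine: x' = (w - 1) - x
w = 6
flip_h = affine_matrix([[-1, 0], [0, 1]], [w - 1, 0])
flipped_point = apply(flip_h, (2, 3))
print(n, flipped_point)
```

The order of composition matters: scaling and then shifting is not the same as shifting and then scaling, because the shift would be scaled as well in the second case.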
