A Complete Guide to Image Preprocessing in Keras (Part 1): Basic Image Transform Methods

LoveMIss-Y

Categories: keras, deep learning. Tags: keras, deep learning, image preprocessing, affine transformation, perspective transformation

Article link: https://blog.csdn.net/qq_27825451/article/details/90037062

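Keras's image-preprocessing code is spread across four main modules:

(1) ......\Lib\site-packages\keras\preprocessing\image.py (this one is incomplete; for the full, detailed version see item 2 below)

(2) ......\Lib\site-packages\keras_preprocessing\image.py

(3) ......\Lib\site-packages\keras\applications\imagenet_utils.py (this one is not detailed and contains only two methods; for the detailed version see item 4 below)

(4) ......\Lib\site-packages\keras_applications\imagenet_utils.py (a set of utilities dedicated to handling ImageNet data; it contains 5 methods, which are not discussed one by one here)

In older versions of Keras, all of this was handled in the two files inside the keras package itself. Newer versions have gradually moved the code into two separate packages, keras_preprocessing and keras_applications.

Because there is a lot of material and the functions are fairly intricate to use, it is covered in a series of articles. This is the first article in the series. It introduces the functions and classes related to image preprocessing, specifically the contents of the file ......\Lib\site-packages\keras_preprocessing\image.py.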

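I. Basic operations: reading, writing, and saving images

1. list_pictures()

Prototype: list_pictures(directory, ext='jpg|jpeg|bmp|png|ppm')

This function returns a list of the paths of all image files under a directory. The first argument is the directory name and the second is the set of image file extensions to match, for example: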

from keras_preprocessing import image

img_directory="data"
img_list=image.list_pictures(img_directory)
print(img_list)
'''Output:
['data\\2001.png', 'data\\2002.jpg', 'data\\2003.jpg', 'data\\2004.jpg', 'data\\2005.png']
'''
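
To make the behaviour concrete, here is a minimal standard-library sketch of what a function like list_pictures() plausibly does: walk the directory tree and keep the files whose names match the extension pattern. The name list_pictures_sketch and the exact regular expression are my own assumptions for illustration, not the library's code.

```python
import os
import re


def list_pictures_sketch(directory, ext="jpg|jpeg|bmp|png|ppm"):
    """Hypothetical stand-in for list_pictures(): collect image paths under `directory`."""
    # Match names such as "2001.png"; the comparison is done on the lower-cased name.
    pattern = re.compile(r"\w+\.(?:" + ext + r")$")
    paths = []
    # os.walk also descends into subdirectories.
    for root, _, files in os.walk(directory):
        for name in files:
            if pattern.match(name.lower()):
                paths.append(os.path.join(root, name))
    return sorted(paths)
```

Calling list_pictures_sketch("data") on the folder used above should give a list of the same kind as the output shown, with the path separator depending on the operating system.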

2. load_img()

Function prototype:

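def load_img(path, grayscale=False, color_mode='rgb', target_size=None,
             interpolation='nearest'):
    """Loads an image into PIL format.

    # Arguments
        path: Path to the image file, including the file name.
        grayscale: Deprecated; use color_mode="grayscale" instead.
        color_mode: One of "grayscale", "rgb", "rgba". Default: "rgb".
        target_size: Either None (the default), which keeps the original size,
                     or a tuple (img_height, img_width) giving the size the
                     image should have after loading.
        interpolation: Resampling method used when target_size differs from the
                       original size. Supported methods are "nearest", "bilinear",
                       and "bicubic". With PIL version 3.4.0 or newer, "box" and
                       "hamming" are also supported. Default: "nearest".

    # Returns
        A PIL.Image instance, so all methods and attributes of the Image class are available.
    """
'''
        PIL was the original Python imaging library and mainly supported Python 2.x.
Pillow is its maintained fork for Python 3, but it is still installed under the PIL
folder name in site-packages. You can check the installed version with
PIL.__version__; mine is 6.0.0.
'''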

img_01=image.load_img("data/2001.png")  # img_01 is an instance of the PIL.Image class
print(img_01.format)  # file format, e.g. JPEG
print(img_01.mode)    # RGB
print(img_01.size)    # (500,537), i.e. (width, height)
print(img_01.info)    # image metadata, returned as a dict

3. save_img()

def save_img(path,
             x,
             data_format='channels_last',
             file_format=None,
             scale=True,
             **kwargs):
    """Saves an image stored as a Numpy array to a path or file object.

    # Arguments
        path: Path or file object.
        x: Numpy array.
        data_format: Image data format,
                either "channels_first" or "channels_last".
        file_format: Image file format. With the default None, the format is
            determined from the file name extension. Use this parameter to
            specify a format explicitly.
        scale: Whether to rescale the pixel values to `[0, 255]`. Default: True.
        **kwargs: Additional keyword arguments passed to `PIL.Image.save()`.
    """

The key thing to look at is the difference between using the scale parameter and not using it. We define our own numpy array representing a 4x4 image:

import numpy as np
from keras_preprocessing import image

# a hand-written numpy array of shape (4,4,3)
img_num=np.array([[[10,30,60],[100,120,150],[77,99,130],[200,30,59]],
                  [[40,10,160],[150,120,150],[77,99,130],[100,30,59]],
                  [[20,90,210],[100,220,150],[37,199,230],[210,90,99]],
                  [[100,40,40],[200,50,20],[157,9,140],[50,230,119]]])

image.save_img("data/2006.jpg",img_num,scale=True)  # scale defaults to True
image.save_img("data/2007.jpg",img_num,scale=False) # scale set to False

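The results are as follows:

The left image was rescaled to [0, 255] and the right one was not, and the two clearly differ. Let's read the images back in to verify.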

img_06=image.load_img("data/2006.jpg")  # img_06 is an instance of the PIL.Image class
img_07=image.load_img("data/2007.jpg")
a_06=image.img_to_array(img_06)
a_07=image.img_to_array(img_07)

print(a_06)
print('==============================================')
print(a_07)
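
To see why the two arrays differ, it helps to look at what a rescale to [0, 255] does to the values. The sketch below is a plain NumPy min-max rescale, written under the assumption that scale=True behaves roughly like this; the function name rescale_to_255 is mine, and the exact formula in keras_preprocessing may differ between versions. Also keep in mind that JPEG is a lossy format, so the values read back from disk will not match the saved array exactly in either case.

```python
import numpy as np


def rescale_to_255(x):
    """Min-max rescale: the smallest value maps to 0 and the largest to 255."""
    x = np.asarray(x, dtype="float32")
    x = x - x.min()            # shift so the minimum becomes 0
    x_max = x.max()
    if x_max != 0:             # avoid dividing by zero for a constant image
        x = x / x_max
    return x * 255.0


# A small hypothetical 2x2 RGB image whose values do not span the full range.
pixels = np.array([[[10, 30, 60], [200, 30, 59]],
                   [[20, 90, 210], [50, 230, 119]]])
scaled = rescale_to_255(pixels)
print(pixels.min(), pixels.max())   # 10 230
print(scaled.min(), scaled.max())   # 0.0 255.0
```

After rescaling, the values are stretched to fill the whole range, which is why the image saved with scale=True has more contrast than the one saved with scale=False.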

4. img_to_array()

def img_to_array(img, data_format='channels_last', dtype='float32'):
    """Converts a PIL.Image object to a Numpy array.

    # Arguments
        img: PIL.Image instance.
        data_format: Image data format,
            either "channels_first" or "channels_last".
        dtype: Dtype to use for the returned array.

    # Returns
        A 3D Numpy array.
    """
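
The data_format argument only decides where the channel axis sits in the returned array. As a small NumPy-only illustration (the array here is a made-up 2x4 RGB image, not one loaded with Keras), the two layouts are related by a transpose:

```python
import numpy as np

# A hypothetical 2x4 RGB image in "channels_last" layout: (rows, cols, channels).
hwc = np.arange(2 * 4 * 3).reshape(2, 4, 3)

# The same data in "channels_first" layout: (channels, rows, cols).
chw = np.transpose(hwc, (2, 0, 1))

# Converting back restores the original array.
hwc_again = np.transpose(chw, (1, 2, 0))

print(hwc.shape, chw.shape)              # (2, 4, 3) (3, 2, 4)
print(np.array_equal(hwc, hwc_again))    # True
```

Which layout you need depends on the backend and model; the examples in this article all use channels_last.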

5. array_to_img()

def array_to_img(x, data_format='channels_last', scale=True, dtype='float32'):
    """Converts a 3D Numpy array to a PIL.Image object.

    # Arguments
        x: Input Numpy array.
        data_format: Image data format.
            either "channels_first" or "channels_last".
        scale: Whether to rescale image values
            to be within `[0, 255]`.
        dtype: Dtype to use.

    # Returns
        A PIL Image instance.
    """

II. Preprocessing operations: cropping, rotating, and so on

1. Random rotation: random_rotation()

Function prototype:

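def random_rotation(x, rg, row_axis=1, col_axis=2, channel_axis=0,
                    fill_mode='nearest', cval=0.):
    """Performs a random rotation of an image.

    # Arguments
        x: Input array. Must be 3D.
        rg: Rotation range, in degrees, e.g. 90 or 120.
        row_axis:
        col_axis:
        channel_axis:
            For (channel, row, col) input these are 1, 2, 0 respectively (the default).
            For (row, col, channel) input they must be set to 0, 1, 2 respectively.

        fill_mode: How points outside the boundaries of the input are filled
            (one of `{'constant', 'nearest', 'reflect', 'wrap'}`).
        cval: Value used for points outside the boundaries.
    # Returns
        The rotated 3D tensor.
    """
'''
Note that because this is a "random rotation", the image is not rotated by a fixed angle. The angle is drawn at random from the interval [-rg, rg], as follows:
'''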

from keras_preprocessing import image
import numpy as np

img_02=image.load_img("data/2002.jpg")  # load the image as a PIL.Image object
num_02=image.img_to_array(img_02)       # convert the PIL.Image object to a numpy array
num_02_=image.random_rotation(num_02,90,row_axis=0,col_axis=1,channel_axis=2)  # random rotation within [-90,90] degrees
image.save_img("data/2002_.jpg",num_02_) # save the rotated image; running this twice in a row gives different results

The first image is the original; the next two are the results of two random rotations.
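
The random part of random_rotation() can be sketched in a few lines of NumPy: draw an angle uniformly from [-rg, rg] degrees, then build the corresponding rotation matrix in homogeneous coordinates. This is only an illustration of the idea; the function name sample_rotation_matrix and the rng parameter are my own additions, and the library additionally resamples the image with the matrix.

```python
import numpy as np


def sample_rotation_matrix(rg, rng=None):
    """Draw an angle uniformly from [-rg, rg] degrees and build a 3x3 rotation matrix."""
    rng = np.random.default_rng() if rng is None else rng
    theta_deg = rng.uniform(-rg, rg)
    theta = np.deg2rad(theta_deg)          # trigonometric functions expect radians
    matrix = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                       [np.sin(theta),  np.cos(theta), 0.0],
                       [0.0,            0.0,           1.0]])
    return theta_deg, matrix


angle, m = sample_rotation_matrix(90, np.random.default_rng(0))
print(-90 <= angle <= 90)   # True
```

Because a new angle is drawn on every call, two runs on the same input give two differently rotated images, which matches the behaviour seen above.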

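2. Random shift: random_shift()

def random_shift(x, wrg, hrg, row_axis=1, col_axis=2, channel_axis=0,
                 fill_mode='nearest', cval=0.):
    """Performs a random spatial shift of an image.

    # Arguments
        x: 3D array.
        wrg: Width shift range, as a fraction of the image width. The value is
             not range-checked, but it is normally kept within [0, 1]: if it is
             too large, the image is shifted completely out of view. The same
             applies to hrg below.
        hrg: Height shift range, as a fraction of the image height.
        The remaining arguments are the same as above.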
    # Returns
        Shifted Numpy image tensor.
    """

img_03=image.load_img("data/2003.jpg")  # load the image as a PIL.Image object
num_03=image.img_to_array(img_03)       # convert the PIL.Image object to a numpy array
num_03_=image.random_shift(num_03,0.2,0.3,row_axis=0,col_axis=1,channel_axis=2)  # random shift
image.save_img("data/2003_.jpg",num_03_) # save the shifted image; running this twice in a row gives different results

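The left image is the original and the right one is the shifted result; you can see it has been shifted some distance toward the lower left.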
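
Since wrg and hrg are fractions rather than pixel counts, a quick sketch shows how they turn into an actual offset. The helper below is hypothetical (the name sample_shift_pixels and the 400x600 image size are assumptions for illustration): it draws a fraction from [-hrg, hrg] and [-wrg, wrg] and multiplies by the image height and width.

```python
import random


def sample_shift_pixels(height, width, wrg, hrg, rng=random):
    """Turn fractional shift ranges into a pixel offset (row_shift, col_shift)."""
    row_shift = rng.uniform(-hrg, hrg) * height   # vertical offset in pixels
    col_shift = rng.uniform(-wrg, wrg) * width    # horizontal offset in pixels
    return row_shift, col_shift


rows, cols = sample_shift_pixels(height=400, width=600, wrg=0.2, hrg=0.3)
print(abs(rows) <= 120, abs(cols) <= 120)   # True True
```

With wrg=0.2 and hrg=0.3, as in the call above, the image can move by at most 20% of its width and 30% of its height in either direction.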

3. Random shear: random_shear()

def random_shear(x, intensity, row_axis=1, col_axis=2, channel_axis=0,
                 fill_mode='nearest', cval=0.):
    """Performs a random spatial shear of an image.

    # Arguments
        x: Input tensor. Must be 3D.
        intensity: Transformation intensity in degrees.
      
    # Returns
        Sheared Numpy image tensor.
    """

img_04=image.load_img("data/2004.jpg")  # load the image as a PIL.Image object
num_04=image.img_to_array(img_04)       # convert the PIL.Image object to a numpy array
num_04_=image.random_shear(num_04,60,row_axis=0,col_axis=1,channel_axis=2)  # random shear
image.save_img("data/2004_.jpg",num_04_) # save the sheared image

4. Random zoom: random_zoom()

def random_zoom(x, zoom_range, row_axis=1, col_axis=2, channel_axis=0,
                fill_mode='nearest', cval=0.):
    """Performs a random spatial zoom of an image.

    # Arguments
        x: Input tensor. Must be 3D.
        zoom_range: Tuple of two floats, (lower, upper); the zoom factors for width and height are each drawn from this range.
        
    # Returns
        Zoomed Numpy image tensor.
    """

img_05=image.load_img("data/2005.png")  # load the image as a PIL.Image object
num_05=image.img_to_array(img_05)       # convert the PIL.Image object to a numpy array
num_05_=image.random_zoom(num_05,(0.5,0.5),row_axis=0,col_axis=1,channel_axis=2)  # random zoom
image.save_img("data/2005_.jpg",num_05_) # save the zoomed image
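
The meaning of zoom_range is easy to misread, so here is a small sketch of how I understand the sampling: the tuple is a (lower, upper) interval, and one factor for each of the two image axes is drawn from it independently. The helper name sample_zoom is hypothetical and this is not the library's code.

```python
import random


def sample_zoom(zoom_range, rng=random):
    """Draw two zoom factors (zx, zy) independently from [lower, upper]."""
    lower, upper = zoom_range
    if lower == 1 and upper == 1:
        return 1.0, 1.0                    # no zoom at all
    zx = rng.uniform(lower, upper)
    zy = rng.uniform(lower, upper)
    return zx, zy


print(sample_zoom((0.5, 0.5)))   # (0.5, 0.5)
zx, zy = sample_zoom((0.8, 1.2))
print(0.8 <= zx <= 1.2, 0.8 <= zy <= 1.2)   # True True
```

Under this reading, the call with (0.5,0.5) above is not random at all: both factors are always 0.5. To get a different zoom on every run, pass a real interval such as (0.8, 1.2).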

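5. transform_matrix_offset_center(matrix, x, y)

Given an affine matrix and the image dimensions x and y, this function returns a new matrix that applies the same transform about the center of the image instead of about the origin at the top-left corner. Note that it returns a matrix, not a transformed image.

This involves the mathematics of affine transformations, which is not discussed further here.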
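
The idea of re-centering a transform can be shown with three matrix products: move the image center to the origin, apply the transform, then move back. The sketch below uses height/2 and width/2 as the center, which is an assumption on my part; the library's own function uses a slightly different offset, and the name center_transform is made up for this example.

```python
import numpy as np


def center_transform(matrix, height, width):
    """Conjugate `matrix` with translations so that it acts about the image center."""
    o_y, o_x = height / 2.0, width / 2.0   # assumed center of the image
    to_center = np.array([[1, 0, o_y], [0, 1, o_x], [0, 0, 1]], dtype=float)
    to_origin = np.array([[1, 0, -o_y], [0, 1, -o_x], [0, 0, 1]], dtype=float)
    # Read right to left: shift center to origin, transform, shift back.
    return to_center @ matrix @ to_origin


theta = np.deg2rad(90)
rotation = np.array([[np.cos(theta), -np.sin(theta), 0],
                     [np.sin(theta),  np.cos(theta), 0],
                     [0,              0,             1]])
centered = center_transform(rotation, height=100, width=100)

center = np.array([50.0, 50.0, 1.0])      # image center in homogeneous coordinates
print(np.allclose(centered @ center, center))   # True
```

The final check confirms that the center point stays where it is, so the rotation pivots around the middle of the image rather than around its corner.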

6. Affine transformation: apply_affine_transform()

def apply_affine_transform(x, theta=0, tx=0, ty=0, shear=0, zx=1, zy=1,
                           row_axis=0, col_axis=1, channel_axis=2,
                           fill_mode='nearest', cval=0.):
    """Applies an affine transformation specified by the parameters given.

    # Arguments
        x: 2D numpy array, single image.
        theta: Rotation angle in degrees.
        tx: Width shift.
        ty: Height shift.
        shear: Shear angle in degrees.
        zx: Zoom in x direction.
        zy: Zoom in y direction
        row_axis: Index of axis for rows in the input image.
        col_axis: Index of axis for columns in the input image.
        channel_axis: Index of axis for channels in the input image.
        fill_mode: Points outside the boundaries of the input
            are filled according to the given mode
            (one of `{'constant', 'nearest', 'reflect', 'wrap'}`).
        cval: Value used for points outside the boundaries
            of the input if `mode='constant'`.

    # Returns
        The transformed version of the input.
    """

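An affine transformation is built by composing a series of atomic transformations: translation, scaling, rotation, flipping, shearing, and so on.

Supplementary notes:

An image transformation model is the geometric transformation model chosen, based on the geometric distortion between the image to be matched and the background image, to best fit the change between the two images. Available models include rigid, affine, perspective, and nonlinear transformations, as shown in the figure below:

Common image transformations include geometric ones: stretch, shrink, distortion, and rotation. These are used heavily in 3D vision and fall into two groups, affine transformations and perspective transformations.

Affine transformation

Perspective transformation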

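7. Other methods

Keras itself provides only a limited set of image-preprocessing functions, so the few remaining ones, which work in a similar way, are listed here without individual examples: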

def apply_channel_shift(x, intensity, channel_axis=0):

def random_channel_shift(x, intensity_range, channel_axis=0):

def apply_brightness_shift(x, brightness):
  
def random_brightness(x, brightness_range):

def flip_axis(x, axis):
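
Of these, flip_axis() is the simplest to picture: it reverses an array along one axis. The following NumPy sketch shows the idea on a tiny made-up image; flip_axis_sketch is my own stand-in written for illustration, not the library function.

```python
import numpy as np


def flip_axis_sketch(x, axis):
    """Reverse the order of the elements of `x` along `axis`."""
    x = np.asarray(x).swapaxes(axis, 0)   # bring the target axis to the front
    x = x[::-1, ...]                      # reverse it
    return x.swapaxes(0, axis)            # put the axes back in place


# A hypothetical 2x3 single-channel image in (rows, cols, channels) layout.
img = np.arange(2 * 3 * 1).reshape(2, 3, 1)
flipped = flip_axis_sketch(img, axis=1)   # axis 1 is the column axis: horizontal flip
print(flipped[..., 0])
# [[2 1 0]
#  [5 4 3]]
```

Flipping along axis 0 instead would mirror the image vertically.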

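Summary: all of the functions above live in the file keras_preprocessing\image.py. To use them, you only need

from keras_preprocessing import image

as in the examples above (or from keras.preprocessing import image in Keras versions that expose the same module).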