[TensorFlow] Selected tf.image methods explained, plus image augmentation based on the random subspace method (RSM)

Article link: https://blog.csdn.net/jin739738709/article/details/113600962

Image augmentation based on the random subspace method (RSM)

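In image data processing, Random Erasing means randomly selecting one or more regions of an image and erasing them. The erased image is a random subspace (RSM) of the original image: it randomly retains some of the sample's features rather than all of them.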

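Combined with ensemble learning, where several networks are trained, this can increase the number of training samples without a separate data augmentation step and can improve the model's generalization ability.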

import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import cv2
import random

def RSM_DataGenerator(path, im_w=256, im_h=256, im_channels=3, divide=16, lowerlimit=0.97, upperlimit=1.0, postfix='bmp'):
    # Draw the keep ratio alpha uniformly from [lowerlimit, upperlimit].
    alpha = random.uniform(lowerlimit, upperlimit)
    print('random_ratio: ', alpha)
    print('N: ', divide * divide)
    # Number of cells to keep out of the N = divide * divide grid cells.
    part = int(divide * divide * alpha)
    # Shuffle the cell indices 0..N-1 (a plain list, which random.shuffle expects).
    index = list(range(divide * divide))
    random.shuffle(index)

    # kb: sorted indices of the cells that are kept.
    kb = index[0:part]
    kb = np.sort(kb)
    print('kb: ')
    print(list(kb))
    # For a reproducible mask, assign a fixed array to kb here instead, e.g. kb = np.array([0, 2, 3, 4]).

    image = tf.io.read_file(path)  # read the raw file bytes
    if postfix == 'bmp':
        image = tf.image.decode_bmp(image, channels=im_channels)
    elif postfix in ('jpeg', 'jpg'):
        image = tf.image.decode_jpeg(image, channels=im_channels)
    else:
        raise ValueError('unsupported postfix: ' + postfix)
    # size is [height, width]; the default (bilinear) resize returns float32.
    image = tf.image.resize(image, [im_h, im_w])
    img_shape = image.shape  # tf.TensorShape: [height, width, channels]
    print(img_shape, type(img_shape))
    rows = img_shape[0]
    cols = img_shape[1]
    # Integer cell size: if the image size is not divisible by divide,
    # the leftover rows/columns at the bottom/right edge are dropped.
    w_cellsize = cols // divide
    h_cellsize = rows // divide
    img_list_h = []
    for i in range(0, divide):
        img_list_w = []
        for j in range(0, divide):
            offset_height = i * h_cellsize
            offset_width = j * w_cellsize
            target_height = h_cellsize
            target_width = w_cellsize
            img_temp = tf.image.crop_to_bounding_box(image, offset_height, offset_width, target_height, target_width)

            if i * divide + j in kb:
                # Kept cell: scale pixel values to [0, 1].
                img_list_w.append(img_temp / 255.0)
            else:
                # Erased cell: fill with zeros.
                img_list_w.append(img_temp * 0)
        # Join the cells of one grid row along the width axis.
        img_concat_w = tf.concat(img_list_w, axis=1)
        img_list_h.append(img_concat_w)

    # Stack the grid rows along the height axis.
    img_concat = tf.concat(img_list_h, axis=0)
    return img_concat

if __name__ == '__main__':
    image = RSM_DataGenerator('./v1/cat.jpg', im_w=300, im_h=400, divide=8, lowerlimit=0.7, upperlimit=0.7, postfix='jpg')
    # Scale back to 0-255 and reorder RGB -> BGR, because OpenCV displays BGR images.
    cv2.imshow('rsm_image', np.array(image * 255, dtype='uint8')[:, :, [2, 1, 0]])
    cv2.waitKey(0)

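The input image is split into N non-overlapping sub-images, which are used to build random subspaces. Each subspace contains α⋅N sub-images, with 0 ≤ α ≤ 1.

A random index vector $k_b \in \mathbb{Z}^{\alpha N}$ generates random subspace $b$ ($b = 1, 2, \dots, B$). The elements of $k_b$ are all distinct and take values between 1 and N (the code above uses zero-based indices, 0 to N−1).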
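The index-vector construction can be sketched on its own, without TensorFlow. The helper names sample_index_vector and rsm_mask are hypothetical (they do not appear in the code above), indices are zero-based as in that code, and the image size is assumed to be divisible by divide.

```python
import random

import numpy as np


def sample_index_vector(n_cells, alpha):
    """Hypothetical helper: pick int(alpha * n_cells) distinct cell indices in [0, n_cells)."""
    keep = int(alpha * n_cells)
    return sorted(random.sample(range(n_cells), keep))


def rsm_mask(height, width, divide, kb):
    """Hypothetical helper: build a (height, width) 0/1 mask that keeps only the cells in kb.

    Assumes height and width are divisible by divide.
    """
    cell_h, cell_w = height // divide, width // divide
    grid = np.zeros(divide * divide, dtype=np.float32)
    grid[list(kb)] = 1.0                      # mark the kept cells
    grid = grid.reshape(divide, divide)       # row-major cell layout, index = i * divide + j
    # Expand every grid entry into a cell_h x cell_w block.
    return np.kron(grid, np.ones((cell_h, cell_w), dtype=np.float32))


# B = 3 random subspaces over N = 8 * 8 = 64 cells with alpha = 0.7.
subspaces = [sample_index_vector(64, 0.7) for _ in range(3)]

# Apply one mask to a dummy all-ones image of shape (height, width, channels).
image = np.ones((64, 64, 3), dtype=np.float32)
masked = image * rsm_mask(64, 64, 8, subspaces[0])[:, :, None]
```

Multiplying by a mask gives the same zero-filled cells as the crop-and-concatenate loop, but only when the cell size divides the image size exactly.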

Figure: original image; divide=16, alpha=0.7; divide=8, alpha=0.7

Some tf.image methods and related image-processing methods are explained below:

tf.image.decode_jpeg or tf.image.decode_bmp

https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/io/decode_jpeg

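channels is the number of color channels wanted in the decoded image.

ratio is the downscaling factor (decode_jpeg only; the allowed values are 1, 2, 4, and 8). With ratio=2 the height and the width are each halved, so the area shrinks to one quarter.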
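A minimal sketch of the channels and ratio parameters. To avoid depending on a file on disk, it assumes an in-memory JPEG produced with tf.io.encode_jpeg, which is not part of the original post; tf.io.decode_jpeg is the same function as tf.image.decode_jpeg.

```python
import tensorflow as tf

# Build a small in-memory JPEG (64 rows x 48 columns, RGB) for illustration.
pixels = tf.zeros([64, 48, 3], dtype=tf.uint8)
jpeg_bytes = tf.io.encode_jpeg(pixels)

full = tf.io.decode_jpeg(jpeg_bytes, channels=3)           # original size, RGB
half = tf.io.decode_jpeg(jpeg_bytes, channels=3, ratio=2)  # height and width halved
gray = tf.io.decode_jpeg(jpeg_bytes, channels=1)           # converted to grayscale

print(full.shape, half.shape, gray.shape, full.dtype)
```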

tf.cast: converting data types

https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/cast

The operation supports data types (for x and dtype) of uint8, uint16, uint32, uint64, int8, int16, int32, int64, float16, float32, float64, complex64, complex128, bfloat16.

In case of casting from complex types (complex64, complex128) to real types, only the real part of x is returned.

In case of casting from real types to complex types (complex64, complex128), the imaginary part of the returned value is set to 0.

The handling of complex types here matches the behavior of numpy.

An image decoded with tf.image.decode_jpeg has dtype uint8; cast it to float32 before doing arithmetic on it.
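As a small illustration of the cast, the sketch below normalizes a hand-written uint8 tensor (a stand-in for decoded pixels, not data from the post) to the range [0, 1].

```python
import tensorflow as tf

# Stand-in for decoded image data: uint8 values in 0..255.
pixels = tf.constant([[0, 128, 255]], dtype=tf.uint8)

# Cast first, then do floating-point arithmetic.
as_float = tf.cast(pixels, tf.float32)
normalized = as_float / 255.0

print(normalized.dtype, normalized.numpy())
```

In RSM_DataGenerator above, the explicit cast is not needed because tf.image.resize already returns float32 with its default method.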

Getting the size of a tf.Tensor: the get_shape().as_list() method or the shape attribute

https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/Tensor

The get_shape method of tf.Tensor returns a tf.TensorShape object.

https://github.com/tensorflow/tensorflow/blob/v2.1.0/tensorflow/python/framework/ops.py#L572-L574

https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/TensorShape (the tf.TensorShape class)

The as_list method of tf.TensorShape returns a list containing the size of each dimension.

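The shape attribute of tf.Tensor

The shape attribute of tf.Tensor is itself a tf.TensorShape object.

Both a tf.TensorShape object and the list returned by tf.TensorShape.as_list can be indexed with square brackets.

For a decoded image the contents are [height, width, channels], that is, [number of rows, number of columns, number of channels].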
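The two ways of reading the size can be compared on a dummy tensor; the 400 x 300 x 3 size below is only an example value.

```python
import tensorflow as tf

# Dummy image tensor laid out as [height, width, channels].
image = tf.zeros([400, 300, 3])

shape_obj = image.shape                    # a tf.TensorShape
shape_list = image.get_shape().as_list()   # a plain Python list

# Both support square-bracket indexing.
rows = shape_obj[0]
cols = shape_list[1]

print(type(shape_obj), shape_list, rows, cols)
```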

tf.image.resize

https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/image/resize

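Note that the size argument is [height, width]: the image height comes first, then the width.

With the default (bilinear) method, resize returns a float32 tensor; with the nearest-neighbor method the input dtype is preserved.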
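A short sketch of the argument order and the returned dtype, using a dummy uint8 tensor in place of a real image:

```python
import tensorflow as tf

# Dummy uint8 image: 10 rows (height) x 20 columns (width) x 3 channels.
image = tf.zeros([10, 20, 3], dtype=tf.uint8)

# size is [height, width]; the default method is bilinear.
bilinear = tf.image.resize(image, [4, 6])

# Nearest-neighbor interpolation keeps the input dtype.
nearest = tf.image.resize(image, [4, 6], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)

print(bilinear.shape, bilinear.dtype, nearest.dtype)
```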

tf.image.crop_to_bounding_box

https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/image/crop_to_bounding_box

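offset_height and offset_width give the position of the bounding box's top-left corner in the image: offset_width is the coordinate along the width, and offset_height is the coordinate along the height.

target_height and target_width are the height and width of the bounding box. Note that the box includes the top-left corner pixel itself.

The function handles input with or without a batch dimension:

For a 4-D tensor it returns [batch, target_height, target_width, channels].

For a 3-D tensor it returns [target_height, target_width, channels].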
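The offsets and the two input ranks can be checked on a tiny tensor whose values encode their own position. The 4 x 5 size and the offsets are arbitrary example values.

```python
import tensorflow as tf

# 4 x 5 single-channel image whose pixel value is row * 5 + col.
image = tf.reshape(tf.range(4 * 5), [4, 5, 1])

# Box with top-left corner at row 1, column 2; height 2, width 3.
patch = tf.image.crop_to_bounding_box(image, 1, 2, 2, 3)

# The same region written as a slice: rows 1..2, columns 2..4.
same = image[1:3, 2:5, :]

# With a 4-D input the batch dimension is kept.
batch = tf.stack([image, image])
batch_patch = tf.image.crop_to_bounding_box(batch, 1, 2, 2, 3)

print(patch.shape, batch_patch.shape)
```

The top-left pixel of patch is the pixel at row 1, column 2 of the input, which confirms that the box includes its corner point.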