TensorFlow Study Notes: Image Preprocessing

TiRan_Yang

Category: TensorFlow. Tags: Tensorflow, image, preprocessing

Article link: https://blog.csdn.net/lovelyaiq/article/details/78716325

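Most neural networks that learn from images benefit from data augmentation before training. What is data augmentation? Consider an example. Suppose the training set contains 100,000 images; training directly on those 100,000 images may feel like too little data. Suppose further that the goal is to recognize cats, but in every training image the cat's head tilts to the left. The trained model will then struggle to recognize a cat whose head tilts to the right, because it never saw one during training. If you flip every training image horizontally at input time, you obtain 100,000 new samples, for 200,000 in total. Adding random cropping and random brightness and contrast changes enlarges the effective training set much further, and the trained model ends up more robust. That is the role of data augmentation: it produces multiple variants of each image, makes better use of the data, and helps prevent the model from overfitting to the structure of any individual image.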
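To make the arithmetic concrete, here is a minimal sketch of how a horizontal flip doubles a batch of images. It uses NumPy as a stand-in for TensorFlow, and the batch size and image dimensions are illustrative assumptions, not values from the article.

```python
import numpy as np

# Hypothetical mini "training set": 4 RGB images of 8x8 pixels,
# laid out as [batch, height, width, channels].
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)

# A horizontal flip reverses the width axis of every image.
flipped = images[:, :, ::-1, :]

# Original images plus their mirrored copies: twice as many samples.
augmented = np.concatenate([images, flipped], axis=0)
print(images.shape[0], "->", augmented.shape[0])  # prints: 4 -> 8
```

The same idea scales to the 100,000-image example: every image contributes one mirrored copy, so the set grows to 200,000 samples.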
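TensorFlow groups its image operations into these categories: encoding/decoding, resizing, cropping, flipping and transposing, and image adjustment.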

Encoding/Decoding

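When TensorFlow reads an image file, it gets the raw encoded bytes of the file rather than a matrix of pixels, so the JPEG or PNG data must first be decoded into a tensor. Encoding is the reverse step: the matrix of pixel values is encoded as JPEG or PNG so that it can be saved. The corresponding APIs are: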

tf.image.decode_jpeg(contents, channels=None, ratio=None, fancy_upscaling=None, try_recover_truncated=None, acceptable_fraction=None, name=None)
tf.image.encode_jpeg(image, format=None, quality=None, progressive=None, optimize_size=None, chroma_downsampling=None, density_unit=None, x_density=None, y_density=None, xmp_metadata=None, name=None)
tf.image.decode_png(contents, channels=None, name=None)
tf.image.encode_png(image, compression=None, name=None)

For example, reading a JPEG file and displaying it: http://blog.csdn.net/lovelyaiq/article/details/78690171
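The difference between encoded bytes and a pixel matrix can be illustrated without TensorFlow. The sketch below is a toy PNG round trip written with NumPy and the standard library; it is not how `tf.image.encode_png` or `tf.image.decode_png` are implemented. The helper names are hypothetical, and the decoder only handles what the encoder here produces (8-bit RGB, filter type 0, no interlacing).

```python
import struct
import zlib

import numpy as np

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag, data):
    """Build one PNG chunk: length, tag, data, CRC of tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def encode_png_rgb(image):
    """Encode a uint8 [height, width, 3] array as PNG bytes (no filtering)."""
    height, width, _ = image.shape
    # Each scanline is prefixed with filter type 0 ("None").
    raw = b"".join(b"\x00" + row.tobytes() for row in image)
    # Width, height, bit depth 8, color type 2 (RGB), default methods.
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (PNG_SIGNATURE + _chunk(b"IHDR", header)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))


def decode_png_rgb(contents):
    """Decode PNG bytes produced by encode_png_rgb back into an array."""
    assert contents[:8] == PNG_SIGNATURE
    pos, idat = 8, b""
    while pos < len(contents):
        (length,) = struct.unpack(">I", contents[pos:pos + 4])
        tag = contents[pos + 4:pos + 8]
        data = contents[pos + 8:pos + 8 + length]
        if tag == b"IHDR":
            width, height = struct.unpack(">II", data[:8])
        elif tag == b"IDAT":
            idat += data
        pos += 12 + length  # length + tag + data + CRC
    rows = np.frombuffer(zlib.decompress(idat), dtype=np.uint8)
    rows = rows.reshape(height, 1 + width * 3)
    return rows[:, 1:].reshape(height, width, 3)  # drop the filter bytes


image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
encoded = encode_png_rgb(image)      # a byte string, not a matrix
decoded = decode_png_rgb(encoded)    # back to a [2, 3, 3] uint8 array
```

Here `encoded` plays the role of the raw file contents, and `decoded` is the pixel matrix that the rest of a preprocessing pipeline operates on.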

Resizing

The resizing APIs are:

tf.image.resize_images(images, size, method=ResizeMethod.BILINEAR, align_corners=False)
tf.image.resize_area(images, size, name=None)
tf.image.resize_bicubic(images, size, name=None)
tf.image.resize_bilinear(images, size, name=None)
tf.image.resize_nearest_neighbor(images, size, name=None)
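As a rough illustration of what nearest-neighbor resizing does, the NumPy sketch below maps each output pixel back to one source pixel. The function name `resize_nearest` is hypothetical, and the index mapping is a simple assumption; TensorFlow's exact coordinate conventions may differ.

```python
import numpy as np


def resize_nearest(image, new_height, new_width):
    """Resize a [height, width, channels] array by nearest-neighbor lookup."""
    height, width = image.shape[:2]
    # Map every output row/column back to a source row/column.
    rows = np.arange(new_height) * height // new_height
    cols = np.arange(new_width) * width // new_width
    return image[rows[:, None], cols[None, :]]


small = np.array([[[1], [2]],
                  [[3], [4]]], dtype=np.uint8)   # shape [2, 2, 1]
large = resize_nearest(small, 4, 4)               # shape [4, 4, 1]
print(large[:, :, 0])
```

Doubling the size this way turns every source pixel into a 2x2 block. The bilinear, bicubic, and area methods instead blend neighboring pixels, which gives smoother results at a higher cost.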

Cropping

tf.image.resize_image_with_crop_or_pad(image, target_height, target_width)
tf.image.pad_to_bounding_box(image, offset_height, offset_width, target_height, target_width)
tf.image.crop_to_bounding_box(image, offset_height, offset_width, target_height, target_width)
tf.image.random_crop(image, size, seed=None, name=None)
tf.image.extract_glimpse(input, size, offsets, centered=None, normalized=None, uniform_noise=None, name=None)
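The sketch below shows, in NumPy, the idea behind two of these operations: centered crop-or-pad and random cropping. The function bodies are assumptions written for illustration (for example, how an odd size difference is split between the two sides), not TensorFlow's implementation.

```python
import numpy as np


def crop_or_pad(image, target_height, target_width):
    """Center-crop and/or zero-pad a [height, width, channels] array."""
    height, width, channels = image.shape
    # Crop: keep the central region along any axis that is too large.
    top = max((height - target_height) // 2, 0)
    left = max((width - target_width) // 2, 0)
    image = image[top:top + target_height, left:left + target_width]
    # Pad: place the result in the middle of a zero canvas.
    out = np.zeros((target_height, target_width, channels), dtype=image.dtype)
    pad_top = (target_height - image.shape[0]) // 2
    pad_left = (target_width - image.shape[1]) // 2
    out[pad_top:pad_top + image.shape[0],
        pad_left:pad_left + image.shape[1]] = image
    return out


def random_crop(image, size, rng):
    """Cut a [size[0], size[1]] window at a uniformly random offset."""
    top = rng.integers(0, image.shape[0] - size[0] + 1)
    left = rng.integers(0, image.shape[1] - size[1] + 1)
    return image[top:top + size[0], left:left + size[1]]


image = np.arange(4 * 4, dtype=np.uint8).reshape(4, 4, 1)
cropped = crop_or_pad(image, 2, 2)    # central 2x2 block
padded = crop_or_pad(image, 6, 6)     # 4x4 image with a zero border
window = random_crop(image, (3, 3), np.random.default_rng(0))
```

The centered variant is deterministic and suits evaluation, while the random variant yields a different window on each call and is the one used for augmentation.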

Flipping and Transposing

tf.image.flip_up_down(image)
tf.image.random_flip_up_down(image, seed=None)
tf.image.flip_left_right(image)
tf.image.random_flip_left_right(image, seed=None)
tf.image.transpose_image(image)
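On a single [height, width, channels] array, these operations amount to reversing or swapping axes. The NumPy sketch below is an illustration under that assumption; `random_flip_left_right` here is a hypothetical re-implementation of the coin-flip behavior, not the TensorFlow function.

```python
import numpy as np

image = np.array([[[1], [2], [3]],
                  [[4], [5], [6]]], dtype=np.uint8)   # [2, 3, 1]

up_down = image[::-1, :, :]              # reverse the row (height) axis
left_right = image[:, ::-1, :]           # reverse the column (width) axis
transposed = image.transpose(1, 0, 2)    # swap height and width


def random_flip_left_right(image, rng):
    """Mirror the image with probability 0.5, otherwise return it as is."""
    return image[:, ::-1, :] if rng.random() < 0.5 else image


rng = np.random.default_rng(0)
maybe_flipped = random_flip_left_right(image, rng)
```

The random variants are the ones used for augmentation, because each pass over the data then sees a different mix of original and mirrored images.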

Image Adjustment

tf.image.adjust_brightness(image, delta)
tf.image.random_brightness(image, max_delta, seed=None)
tf.image.adjust_contrast(images, contrast_factor)
tf.image.random_contrast(image, lower, upper, seed=None)
tf.image.per_image_standardization(image)
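The arithmetic behind these adjustments is short enough to write out. The NumPy sketch below follows the formulas described in the TensorFlow documentation for brightness, contrast, and per-image standardization (whitening). It assumes a floating-point image and is a re-implementation for illustration, not the library code.

```python
import numpy as np


def adjust_brightness(image, delta):
    """Add the same offset to every pixel value."""
    return image + delta


def adjust_contrast(image, contrast_factor):
    """Scale each channel's deviation from its own mean."""
    mean = image.mean(axis=(0, 1), keepdims=True)   # one mean per channel
    return (image - mean) * contrast_factor + mean


def per_image_standardization(image):
    """Shift to zero mean and scale to (approximately) unit variance."""
    # The lower bound on the divisor avoids dividing by zero for a
    # uniform image.
    adjusted_stddev = max(image.std(), 1.0 / np.sqrt(image.size))
    return (image - image.mean()) / adjusted_stddev


image = np.array([[[0.0], [2.0]],
                  [[4.0], [6.0]]])     # float [2, 2, 1] image, mean 3.0

brighter = adjust_brightness(image, 1.0)
more_contrast = adjust_contrast(image, 2.0)
standardized = per_image_standardization(image)
```

With a contrast factor of 2.0, pixel values move twice as far from the channel mean of 3.0, while standardization leaves the image with zero mean and unit standard deviation.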

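For details on these functions, see the corresponding API documentation. Below we use the image handling in the CIFAR-10 example, which applies random cropping, random left-right flipping, random brightness and contrast changes, and per-image standardization.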

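  # Input size expected by the network. IMAGE_SIZE and reshaped_image are
  # assumed to be defined earlier in the CIFAR-10 example.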
  height = IMAGE_SIZE
  width = IMAGE_SIZE

  # Image processing for training the network. Note the many random
  # distortions applied to the image.
  # Randomly crop a [height, width] section of the image.
  distorted_image = tf.random_crop(reshaped_image, [height, width, 3])

  # Randomly flip the image horizontally.
  distorted_image = tf.image.random_flip_left_right(distorted_image)

  # Because these operations are not commutative, consider randomizing
  # the order of their operation.
  distorted_image = tf.image.random_brightness(distorted_image,
                                               max_delta=63)
  distorted_image = tf.image.random_contrast(distorted_image,
                                             lower=0.2, upper=1.8)

  # Subtract off the mean and divide by the standard deviation of the pixels.
  float_image = tf.image.per_image_standardization(distorted_image)

  # Set the shapes of tensors.
  float_image.set_shape([height, width, 3])
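The same sequence of steps can be sketched end to end in NumPy to see what one pass through the pipeline produces. This is an illustrative re-implementation, not the CIFAR-10 code: the function name `distort`, the 32x32 source size, and the 24x24 crop size are assumptions, and the random ranges mirror the `max_delta=63`, `lower=0.2`, and `upper=1.8` arguments above.

```python
import numpy as np


def distort(image, crop_size, rng):
    """Apply the four random distortions, then standardize the result."""
    # 1. Random crop to [crop_size, crop_size, channels].
    top = rng.integers(0, image.shape[0] - crop_size + 1)
    left = rng.integers(0, image.shape[1] - crop_size + 1)
    out = image[top:top + crop_size, left:left + crop_size].astype(np.float64)
    # 2. Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # 3. Random brightness: add one offset drawn from [-63, 63).
    out = out + rng.uniform(-63.0, 63.0)
    # 4. Random contrast: scale deviations from the per-channel mean
    #    by a factor drawn from [0.2, 1.8).
    mean = out.mean(axis=(0, 1), keepdims=True)
    out = (out - mean) * rng.uniform(0.2, 1.8) + mean
    # 5. Per-image standardization: zero mean, unit standard deviation.
    return (out - out.mean()) / max(out.std(), 1.0 / np.sqrt(out.size))


rng = np.random.default_rng(0)
# Hypothetical stand-in for one 32x32 RGB CIFAR-10 image.
source = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float64)

first = distort(source, 24, rng)
second = distort(source, 24, rng)    # same source, different sample
```

Calling `distort` twice on the same source yields two different training samples of identical shape, which is how a single image ends up contributing many variants over the course of training.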