Computer Vision Series 3.1: Data Preprocessing in VGGNet

coasxu

Category: Deep Learning
Tags: data preprocessing, VGGNet, computer vision, tensorflow

Article link: https://blog.csdn.net/weixin_44633882/article/details/87705734

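P.S. The post "Computer Vision Series 3: The Ideas Behind VGGNet" is still being polished. I believe a published post should be responsible to its readers and help answer their questions, rather than confuse or mislead them with unfinished content, so I will upload it once it is complete. Thank you for your understanding!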

1. Introduction

This post walks through the source code of the VGG preprocessing module in TensorFlow slim.

Source: https://github.com/tensorflow/models/blob/master/research/slim/preprocessing/vgg_preprocessing.py

1.1 VGG preprocessing

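In TensorFlow slim, images are preprocessed differently for training and for evaluation.
Image preprocessing during training:

Randomly pick a shortest-side length resize_side in the range [256, 512]
Resize the image while preserving its aspect ratio so that the shortest side equals resize_side
Randomly crop the image to output_height × output_width
Randomly flip the image horizontally
Subtract the RGB means of the ImageNet training set

Image preprocessing during evaluation:

Use the given shortest-side length resize_side
Resize the image while preserving its aspect ratio so that the shortest side equals resize_side
Crop the center of the image to output_height × output_width
Subtract the RGB means of the ImageNet training set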

def preprocess_image(image, output_height, output_width, is_training=False,
                     resize_side_min=_RESIZE_SIDE_MIN,
                     resize_side_max=_RESIZE_SIDE_MAX):
  if is_training:
    return preprocess_for_train(image, output_height, output_width,
                                resize_side_min, resize_side_max)
  else:
    return preprocess_for_eval(image, output_height, output_width,
                               resize_side_min)

Training

def preprocess_for_train(image,
                         output_height,
                         output_width,
                         resize_side_min=_RESIZE_SIDE_MIN,
                         resize_side_max=_RESIZE_SIDE_MAX):
  # Randomly pick the length of the shortest side, in [resize_side_min, resize_side_max].
  resize_side = tf.random_uniform(
      [], minval=resize_side_min, maxval=resize_side_max+1, dtype=tf.int32)
  # Resize while preserving the aspect ratio so that the shortest side equals resize_side.
  image = _aspect_preserving_resize(image, resize_side)
  # Take a single random crop of size output_height x output_width.
  image = _random_crop([image], output_height, output_width)[0]
  image.set_shape([output_height, output_width, 3])
  image = tf.to_float(image)
  # Randomly flip the image horizontally.
  image = tf.image.random_flip_left_right(image)
  # Return the image with the ImageNet training-set RGB means subtracted.
  return _mean_image_subtraction(image, [_R_MEAN, _G_MEAN, _B_MEAN])
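
The maxval=resize_side_max+1 in preprocess_for_train is needed because, for integer dtypes, tf.random_uniform samples from the half-open range [minval, maxval). A minimal plain-Python sketch of the same sampling follows; the helper name sample_resize_side is hypothetical and not part of slim.

```python
import random

_RESIZE_SIDE_MIN = 256
_RESIZE_SIDE_MAX = 512


def sample_resize_side(rng, side_min=_RESIZE_SIDE_MIN, side_max=_RESIZE_SIDE_MAX):
    """Draw an integer shortest-side length from the closed range [side_min, side_max]."""
    # randrange excludes its upper bound, just like tf.random_uniform with an
    # integer dtype, so the +1 is what makes side_max itself reachable.
    return rng.randrange(side_min, side_max + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_resize_side(rng) for _ in range(5)])
```

Because a new side length is drawn for every image, the same fixed-size crop covers a different fraction of the object each time, which is how the scale jitter comes about.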

Evaluation

def preprocess_for_eval(image, output_height, output_width, resize_side):
  # Resize while preserving the aspect ratio so that the shortest side equals resize_side.
  image = _aspect_preserving_resize(image, resize_side)
  # Crop the center of the image to output_height x output_width.
  image = _central_crop([image], output_height, output_width)[0]
  image.set_shape([output_height, output_width, 3])
  image = tf.to_float(image)
  # Return the image with the ImageNet training-set RGB means subtracted.
  return _mean_image_subtraction(image, [_R_MEAN, _G_MEAN, _B_MEAN])

2. Code analysis

import tensorflow as tf

slim = tf.contrib.slim

_R_MEAN = 123.68
_G_MEAN = 116.78
_B_MEAN = 103.94

_RESIZE_SIDE_MIN = 256
_RESIZE_SIDE_MAX = 512

Cropping an image

def _crop(image, offset_height, offset_width, crop_height, crop_width):
  """Crops the given image using the provided offsets and sizes.
  Note that the method doesn't assume we know the input image size but it does
  assume we know the input image rank.
  Args:
    image: an image of shape [height, width, channels].
    offset_height: a scalar tensor indicating the height offset.
    offset_width: a scalar tensor indicating the width offset.
    crop_height: the height of the cropped image.
    crop_width: the width of the cropped image.
  Returns:
    the cropped (and resized) image.
  Raises:
    InvalidArgumentError: if the rank is not 3 or if the image dimensions are
      less than the crop size.
  """
  original_shape = tf.shape(image)

  rank_assertion = tf.Assert(
      tf.equal(tf.rank(image), 3),
      ['Rank of image must be equal to 3.'])
  with tf.control_dependencies([rank_assertion]):
    # Pack the crop height, crop width and channel count into one shape tensor.
    cropped_shape = tf.stack([crop_height, crop_width, original_shape[2]])
  
  size_assertion = tf.Assert(
      tf.logical_and(
          tf.greater_equal(original_shape[0], crop_height),
          tf.greater_equal(original_shape[1], crop_width)),
      ['Crop size greater than the image size.'])

  offsets = tf.to_int32(tf.stack([offset_height, offset_width, 0]))

  # Use tf.slice instead of crop_to_bounding box as it accepts tensors to
  # define the crop size.
  # Starting at offsets, slice a region of size cropped_shape out of image.
  with tf.control_dependencies([size_assertion]):
    image = tf.slice(image, offsets, cropped_shape)
  return tf.reshape(image, cropped_shape)
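
To make the offset-and-size semantics of the tf.slice call concrete, here is a minimal plain-Python sketch that crops a nested list laid out as [height][width][channels]. The function name crop and the extra out-of-bounds check are assumptions made for illustration; they are not part of the slim source.

```python
def crop(image, offset_height, offset_width, crop_height, crop_width):
    """Crop a [height][width][channels] nested list, keeping all channels."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Same condition as size_assertion: the crop may not exceed the image.
    if height < crop_height or width < crop_width:
        raise ValueError('Crop size greater than the image size.')
    # Assumption for this sketch: reject windows that run past the border,
    # because Python slicing would otherwise silently shorten the result.
    if (offset_height < 0 or offset_width < 0
            or offset_height + crop_height > height
            or offset_width + crop_width > width):
        raise ValueError('Crop window falls outside the image.')
    rows = image[offset_height:offset_height + crop_height]
    return [row[offset_width:offset_width + crop_width] for row in rows]


if __name__ == "__main__":
    # A 4 x 5 single-channel image whose pixel value encodes (row, column).
    image = [[[10 * r + c] for c in range(5)] for r in range(4)]
    print(crop(image, 1, 2, 2, 3))
```

The offsets play the role of the begin argument of tf.slice, and the crop height and width play the role of size; the channel axis always starts at 0 and is kept whole.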

Random crop, built on _crop()

def _random_crop(image_list, crop_height, crop_width):
  """Crops the given list of images.
  The function applies the same crop to each image in the list. This can be
  effectively applied when there are multiple image inputs of the same
  dimension such as:
    image, depths, normals = _random_crop([image, depths, normals], 120, 150)
  Args:
    image_list: a list of image tensors of the same dimension but possibly
      varying channel.
    crop_height: the new height.
    crop_width: the new width.
  Returns:
    the image_list with cropped images.
  Raises:
    ValueError: if there are multiple image inputs provided with different size
      or the images are smaller than the crop dimensions.
  """
  if not image_list:
    raise ValueError('Empty image_list.')

  # Compute the rank assertions.
  rank_assertions = []
  for i in range(len(image_list)):
    image_rank = tf.rank(image_list[i])  # number of dimensions of the image
    # Check that the rank is 3; tf.Assert returns an op.
    rank_assert = tf.Assert(
        tf.equal(image_rank, 3),
        ['Wrong rank for tensor  %s [expected] [actual]',
         image_list[i].name, 3, image_rank])
    # Collect the assertion in rank_assertions.
    rank_assertions.append(rank_assert)
  
  with tf.control_dependencies([rank_assertions[0]]):
    image_shape = tf.shape(image_list[0])
  image_height = image_shape[0]
  image_width = image_shape[1]
  # Check that the image height and width are both at least the crop size; returns an op.
  crop_size_assert = tf.Assert(
      tf.logical_and(
          tf.greater_equal(image_height, crop_height),
          tf.greater_equal(image_width, crop_width)),
      ['Crop size greater than the image size.'])
  
  asserts = [rank_assertions[0], crop_size_assert]

  for i in range(1, len(image_list)):
    image = image_list[i]
    asserts.append(rank_assertions[i])
    with tf.control_dependencies([rank_assertions[i]]):
      shape = tf.shape(image)
    height = shape[0]
    width = shape[1]

    height_assert = tf.Assert(
        tf.equal(height, image_height),
        ['Wrong height for tensor %s [expected][actual]',
         image.name, height, image_height])
    width_assert = tf.Assert(
        tf.equal(width, image_width),
        ['Wrong width for tensor %s [expected][actual]',
         image.name, width, image_width])
    asserts.extend([height_assert, width_assert])

  # Create a random bounding box.
  #
  # Use tf.random_uniform rather than numpy.random.rand:
  # the former generates random numbers at graph eval time,
  # the latter generates them once, at graph definition time.
  with tf.control_dependencies(asserts):
    # Reshaping to shape [] turns the tensor into a scalar.
    max_offset_height = tf.reshape(image_height - crop_height + 1, [])
  with tf.control_dependencies(asserts):
    max_offset_width = tf.reshape(image_width - crop_width + 1, [])
  # Randomly pick the crop offsets.
  offset_height = tf.random_uniform(
      [], maxval=max_offset_height, dtype=tf.int32)
  offset_width = tf.random_uniform(
      [], maxval=max_offset_width, dtype=tf.int32)

  return [_crop(image, offset_height, offset_width,
                crop_height, crop_width) for image in image_list]
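
The offset arithmetic in _random_crop can be sketched in plain Python as follows. The names random_crop_offsets and random_crop are hypothetical, and images are assumed to be nested lists laid out as [height][width]. Two points are worth noticing: the + 1 makes the largest valid offset reachable, since the upper bound is exclusive, and a single pair of offsets is shared by every image in the list.

```python
import random


def random_crop_offsets(rng, image_height, image_width, crop_height, crop_width):
    """Pick a top-left corner such that the crop window stays inside the image."""
    if image_height < crop_height or image_width < crop_width:
        raise ValueError('Crop size greater than the image size.')
    # Exclusive upper bounds, like max_offset_height and max_offset_width.
    max_offset_height = image_height - crop_height + 1
    max_offset_width = image_width - crop_width + 1
    return rng.randrange(max_offset_height), rng.randrange(max_offset_width)


def random_crop(rng, image_list, crop_height, crop_width):
    """Apply one shared random crop window to every image in image_list."""
    if not image_list:
        raise ValueError('Empty image_list.')
    height, width = len(image_list[0]), len(image_list[0][0])
    for image in image_list[1:]:
        if len(image) != height or len(image[0]) != width:
            raise ValueError('All images must share the same height and width.')
    top, left = random_crop_offsets(rng, height, width, crop_height, crop_width)
    return [[row[left:left + crop_width] for row in image[top:top + crop_height]]
            for image in image_list]


if __name__ == "__main__":
    rng = random.Random(0)
    print(random_crop_offsets(rng, 256, 341, 224, 224))
```

Sharing the offsets is what keeps paired inputs, such as an image and its depth map, aligned after cropping.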

Central crop: the cropped region is centered on the center of the original image. Built on _crop().

def _central_crop(image_list, crop_height, crop_width):
  """Performs central crops of the given image list.
  Args:
    image_list: a list of image tensors of the same dimension but possibly
      varying channel.
    crop_height: the height of the image following the crop.
    crop_width: the width of the image following the crop.
  Returns:
    the list of cropped images.
  """
  outputs = []
  for image in image_list:
    image_height = tf.shape(image)[0]
    image_width = tf.shape(image)[1]

    offset_height = (image_height - crop_height) / 2
    offset_width = (image_width - crop_width) / 2

    outputs.append(_crop(image, offset_height, offset_width,
                         crop_height, crop_width))
  return outputs
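
The offsets computed in _central_crop are divided by 2 and only become integers later, through the tf.to_int32 cast inside _crop(), which truncates. A minimal sketch of the resulting arithmetic, assuming the image is at least as large as the crop (the helper name central_crop_offsets is hypothetical):

```python
def central_crop_offsets(image_height, image_width, crop_height, crop_width):
    """Top-left corner of a crop window centered in the image."""
    # Floor division: for non-negative differences this equals dividing by 2
    # and then truncating to an integer.
    offset_height = (image_height - crop_height) // 2
    offset_width = (image_width - crop_width) // 2
    return offset_height, offset_width


if __name__ == "__main__":
    # A 256 x 341 image cropped to 224 x 224.
    print(central_crop_offsets(256, 341, 224, 224))
```

When the leftover margin is odd, as with 341 - 224 = 117 here, the extra pixel ends up on the right or bottom side of the window.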

Subtracting the ImageNet training-set RGB means from the image

def _mean_image_subtraction(image, means):
  """Subtracts the given means from each image channel.
  For example:
    means = [123.68, 116.779, 103.939]
    image = _mean_image_subtraction(image, means)
  Note that the rank of `image` must be known.
  Args:
    image: a tensor of size [height, width, C].
    means: a C-vector of values to subtract from each channel.
  Returns:
    the centered image.
  Raises:
    ValueError: If the rank of `image` is unknown, if `image` has a rank other
      than three or if the number of channels in `image` doesn't match the
      number of values in `means`.
  """
  if image.get_shape().ndims != 3:
    raise ValueError('Input must be of size [height, width, C>0]')
  num_channels = image.get_shape().as_list()[-1]
  if len(means) != num_channels:
    raise ValueError('len(means) must match the number of channels')
  # Split into a list of tensors, one per channel.
  channels = tf.split(axis=2, num_or_size_splits=num_channels, value=image)
  for i in range(num_channels):
    channels[i] -= means[i]
  return tf.concat(axis=2, values=channels)

Given smallest_side, scale height and width so that the shorter of the two becomes smallest_side, with the other side scaled by the same factor.
Returns new_height, new_width.

def _smallest_size_at_least(height, width, smallest_side):
  """Computes new shape with the smallest side equal to `smallest_side`.
  Computes new shape with the smallest side equal to `smallest_side` while
  preserving the original aspect ratio.
  Args:
    height: an int32 scalar tensor indicating the current height.
    width: an int32 scalar tensor indicating the current width.
    smallest_side: A python integer or scalar `Tensor` indicating the size of
      the smallest side after resize.
  Returns:
    new_height: an int32 scalar tensor indicating the new height.
    new_width: and int32 scalar tensor indicating the new width.
  """
  smallest_side = tf.convert_to_tensor(smallest_side, dtype=tf.int32)

  height = tf.to_float(height)
  width = tf.to_float(width)
  smallest_side = tf.to_float(smallest_side)

  scale = tf.cond(tf.greater(height, width),
                  lambda: smallest_side / width,
                  lambda: smallest_side / height)
  new_height = tf.to_int32(tf.rint(height * scale))
  new_width = tf.to_int32(tf.rint(width * scale))
  return new_height, new_width
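
The same computation can be traced in plain Python. This is a sketch under the assumption that Python floats stand in for the float32 tensors, so results could differ in rare rounding edge cases; the function name smallest_size_at_least is hypothetical.

```python
def smallest_size_at_least(height, width, smallest_side):
    """New (height, width) whose shorter side equals smallest_side."""
    height = float(height)
    width = float(width)
    smallest_side = float(smallest_side)
    # Scale by the shorter side so that it lands on smallest_side.
    scale = smallest_side / width if height > width else smallest_side / height
    # Python's round(), like tf.rint, rounds halves to the nearest even integer.
    return int(round(height * scale)), int(round(width * scale))


if __name__ == "__main__":
    # A 480 x 640 (height x width) image resized so the shorter side is 256.
    print(smallest_size_at_least(480, 640, 256))
```

For a 480 x 640 input the scale is 256 / 480, so the height becomes 256 and the width becomes 341 (341.33 rounded).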

Resizing the image, built on _smallest_size_at_least()

def _aspect_preserving_resize(image, smallest_side):
  """Resize images preserving the original aspect ratio.
  Args:
    image: A 3-D image `Tensor`.
    smallest_side: A python integer or scalar `Tensor` indicating the size of
      the smallest side after resize.
  Returns:
    resized_image: A 3-D tensor containing the resized image.
  """
  smallest_side = tf.convert_to_tensor(smallest_side, dtype=tf.int32)

  shape = tf.shape(image)
  height = shape[0]
  width = shape[1]
  # Compute the new height and width, scaled proportionally to meet smallest_side.
  new_height, new_width = _smallest_size_at_least(height, width, smallest_side)
  image = tf.expand_dims(image, 0)  # add a leading batch dimension
  # Resize with bilinear interpolation.
  resized_image = tf.image.resize_bilinear(image, [new_height, new_width],
                                           align_corners=False)
  # Remove the dimensions of size 1 (here, the batch dimension).
  resized_image = tf.squeeze(resized_image)
  resized_image.set_shape([None, None, 3])
  return resized_image
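
The expand_dims and squeeze calls are there because tf.image.resize_bilinear expects a 4-D [batch, height, width, channels] input. The shape bookkeeping can be sketched with plain tuples; all three helper names below are hypothetical and only mimic how the shapes change, not the pixel values.

```python
def expand_dims_shape(shape, axis=0):
    """Shape after inserting a dimension of size 1 at the given axis."""
    shape = list(shape)
    shape.insert(axis, 1)
    return tuple(shape)


def resized_shape(shape, new_height, new_width):
    """Shape after resizing a [batch, height, width, channels] tensor."""
    batch, _, _, channels = shape
    return (batch, new_height, new_width, channels)


def squeeze_shape(shape):
    """Drop every dimension of size 1, as squeeze does when no axis is given."""
    return tuple(dim for dim in shape if dim != 1)


if __name__ == "__main__":
    batched = expand_dims_shape((480, 640, 3))   # add the batch dimension
    resized = resized_shape(batched, 256, 341)   # resize height and width
    print(squeeze_shape(resized))                # drop the batch dimension
```

Since squeeze is called without an axis, any other dimension of size 1 would be dropped as well (a single-channel image, for example), which fits the function's assumption of 3-channel input made explicit by set_shape([None, None, 3]).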
