TensorFlow tf.nn.conv2d

In the convolutional model of the MNIST example, the two key functions are tf.nn.conv2d and tf.nn.max_pool. First, here is the code snippet that calls them:

def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')

Tracing tf.nn.conv2d down into gen_nn_ops.py:

def conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None):
  r"""Computes a 2-D convolution given 4-D `input` and `filter` tensors.

  Given an input tensor of shape `[batch, in_height, in_width, in_channels]`
  and a filter / kernel tensor of shape
  `[filter_height, filter_width, in_channels, out_channels]`, this op
  performs the following:

  1. Flattens the filter to a 2-D matrix with shape
     `[filter_height * filter_width * in_channels, output_channels]`.
  2. Extracts image patches from the input tensor to form a *virtual*
     tensor of shape `[batch, out_height, out_width,
     filter_height * filter_width * in_channels]`.
  3. For each patch, right-multiplies the filter matrix and the image patch
     vector.

  In detail, with the default NHWC format,

      output[b, i, j, k] =
          sum_{di, dj, q} input[b, strides[1] * i + di, strides[2] * j + dj, q] *
                          filter[di, dj, q, k]

  Must have `strides[0] = strides[3] = 1`.  For the most common case of the same
  horizontal and vertical strides, `strides = [1, stride, stride, 1]`.

  Args:
    input: A `Tensor`. Must be one of the following types: `float32`, `float64`.
    filter: A `Tensor`. Must have the same type as `input`.
    strides: A list of `ints`.
      1-D of length 4.  The stride of the sliding window for each dimension
      of `input`. Must be in the same order as the dimension specified with format.
    padding: A `string` from: `"SAME", "VALID"`.
      The type of padding algorithm to use.
    use_cudnn_on_gpu: An optional `bool`. Defaults to `True`.
    data_format: An optional `string` from: `"NHWC", "NCHW"`. Defaults to `"NHWC"`.
      Specify the data format of the input and output data. With the
      default format "NHWC", the data is stored in the order of:
          [batch, in_height, in_width, in_channels].
      Alternatively, the format could be "NCHW", the data storage order of:
          [batch, in_channels, in_height, in_width].
    name: A name for the operation (optional).

  Returns:
    A `Tensor`. Has the same type as `input`.
  """

As the signature shows, the first argument of tf.nn.conv2d, x, is the input data: a 4-D tensor of shape [batch, in_height, in_width, in_channels], where the four dimensions are the number of images in the batch, the image height and width, and the number of channels (typically 1 for grayscale images and 3 for RGB). The second argument, W, is the convolution kernel, usually square. The third argument, strides=[1, strides, strides, 1], gives the convolution stride along each dimension. The fourth argument, padding, takes one of two values: 'SAME' zero-pads the input before convolving, so that (for stride 1) the output has the same spatial dimensions as the input; 'VALID' applies no zero padding, so the output's spatial dimensions differ from the input's.
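The sum in the docstring above can be checked with a tiny pure-Python sketch (no TensorFlow required; stride 1, 'VALID' padding, and all shapes here are made up purely for illustration):

```python
# Pure-Python sketch of the NHWC conv2d sum from the docstring:
#   output[b, i, j, k] = sum_{di, dj, q} input[b, i+di, j+dj, q] * filter[di, dj, q, k]
# Stride 1 and 'VALID' padding only; conv2d_ref is a hypothetical name, not a TF API.

def conv2d_ref(inp, flt):
    batch, in_h, in_w, in_c = len(inp), len(inp[0]), len(inp[0][0]), len(inp[0][0][0])
    f_h, f_w, _, out_c = len(flt), len(flt[0]), len(flt[0][0]), len(flt[0][0][0])
    # 'VALID' output size with stride 1: (i - k) + 1
    out_h, out_w = in_h - f_h + 1, in_w - f_w + 1
    out = [[[[0.0] * out_c for _ in range(out_w)] for _ in range(out_h)]
           for _ in range(batch)]
    for b in range(batch):
        for i in range(out_h):
            for j in range(out_w):
                for k in range(out_c):
                    out[b][i][j][k] = sum(
                        inp[b][i + di][j + dj][q] * flt[di][dj][q][k]
                        for di in range(f_h)
                        for dj in range(f_w)
                        for q in range(in_c))
    return out

# 1x3x3x1 input of all ones, 2x2x1x1 kernel of all ones:
# each output entry sums four ones, so the 2x2 output is all 4.0.
inp = [[[[1.0] for _ in range(3)] for _ in range(3)]]
flt = [[[[1.0]] for _ in range(2)] for _ in range(2)]
res = conv2d_ref(inp, flt)
print(res)  # 2x2 spatial output, every value 4.0
```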

The spatial dimensions of the convolved output follow these rules:
Let the input image be i*i (in_height, in_width),
the kernel be k*k (shape of W),
and the stride be s (strides).
With padding='VALID', each output dimension is floor((i - k) / s) + 1.
With padding='SAME', each output dimension is ceil(i / s), i.e. the output is i*i when s = 1.
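These two formulas can be written as small helper functions (the function names are made up for illustration and are not part of TensorFlow):

```python
import math

def conv_out_size_valid(i, k, s):
    # padding='VALID': no zero padding, the window must fit entirely inside the input.
    return (i - k) // s + 1          # floor((i - k) / s) + 1

def conv_out_size_same(i, k, s):
    # padding='SAME': zero padding chosen so that the output is ceil(i / s),
    # which equals i when s = 1.
    return math.ceil(i / s)

# 28x28 MNIST image, 5x5 kernel:
print(conv_out_size_valid(28, 5, 1))  # 24
print(conv_out_size_same(28, 5, 1))   # 28
print(conv_out_size_valid(28, 5, 2))  # 12
print(conv_out_size_same(28, 5, 2))   # 14
```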

tf.nn.max_pool takes four arguments: x is the input image; ksize=[1, k, k, 1] is the size of the pooling window; strides=[1, k, k, 1] is the stride; and padding='SAME' works as above.
The pooled output size follows the same formula as convolution: floor((i - k) / s) + 1 for 'VALID' (and ceil(i / s) for 'SAME').
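A minimal pure-Python max-pool sketch on a single-channel image (window k = 2 with stride k, no padding needed since 4 is divisible by 2; maxpool2d_ref is a hypothetical name, not a TF API):

```python
def maxpool2d_ref(img, k=2):
    # 2-D max pooling with a k x k window and stride k on a single-channel image.
    # Output size per dimension: floor((i - k) / k) + 1.
    out_h = (len(img) - k) // k + 1
    out_w = (len(img[0]) - k) // k + 1
    return [[max(img[i * k + di][j * k + dj]
                 for di in range(k) for dj in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
print(maxpool2d_ref(img))  # [[6, 8], [14, 16]]
```

Each 2x2 window keeps only its maximum, halving both spatial dimensions.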

For more theory on the output dimensions of convolution and pooling, see: https://arxiv.org/pdf/1603.07285v1.pdf
A Chinese-language walkthrough of part of that document is also available, with thanks to its author for sharing: http://blog.csdn.net/kekong0713/article/details/68941498
