The difference between tf.nn.conv2d and tf.contrib.slim.conv2d

poppy_MCT

Categories: Deep Learning, Tensorflow

Article link: https://blog.csdn.net/weixin_42702666/article/details/88342959

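While reading code, I noticed that some projects build their convolution layers with tf.nn.conv2d while others use tf.contrib.slim.conv2d. To find out whether the two functions perform the same convolution, I read the API documentation and the source code of slim.conv2d. My summary follows.

The tf.nn.conv2d function

It is defined as follows: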

conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=None,
    data_format=None,
    name=None
)

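input

The input images to convolve. It must be a 4-D Tensor of shape [batch_size, in_height, in_width, in_channels], that is, [number of images in a training batch, image height, image width, number of image channels]. Its dtype must be a floating-point type such as float32 or float64 (the exact set of supported dtypes depends on the TensorFlow version).

filter

The convolution kernel of the CNN. It must be a Tensor of shape [filter_height, filter_width, in_channels, out_channels], that is, [kernel height, kernel width, number of input channels, number of kernels], with the same dtype as input. One point deserves attention: the third dimension, in_channels, must have the same size as the fourth dimension of input. Only the dimension sizes have to match; the values stored in the two tensors are unrelated.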
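The shape constraint between input and filter can be sketched in plain Python. This is only an illustration under the default NHWC layout; check_conv2d_shapes is a hypothetical helper written for this post, not a TensorFlow function.

```python
def check_conv2d_shapes(input_shape, filter_shape):
    """Check an NHWC input shape against a filter shape, as described above.

    Hypothetical helper for illustration only; not part of TensorFlow.
    Returns the number of output channels when the shapes are compatible.
    """
    if len(input_shape) != 4:
        raise ValueError(
            "input must be 4-D: [batch_size, in_height, in_width, in_channels]")
    if len(filter_shape) != 4:
        raise ValueError(
            "filter must be 4-D: "
            "[filter_height, filter_width, in_channels, out_channels]")
    # The third filter dimension must equal the fourth input dimension.
    if filter_shape[2] != input_shape[3]:
        raise ValueError(
            "in_channels mismatch: input has %d, filter expects %d"
            % (input_shape[3], filter_shape[2]))
    return filter_shape[3]


# A batch of 8 RGB images of 32x32 pixels, convolved with sixteen 5x5 kernels.
out_channels = check_conv2d_shapes([8, 32, 32, 3], [5, 5, 3, 16])
print(out_channels)
```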

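strides

The stride of the convolution along each dimension of the input. It is a 1-D vector of length 4, with one entry for each of the 4 dimensions of input. For the NHWC layout this is typically [1, stride_height, stride_width, 1].

padding

A string that must be either "SAME" or "VALID"; it selects the padding scheme. With SAME, the input is zero-padded so that the kernel can be centered on pixels at the image border. With VALID, no padding is added and the kernel must lie entirely inside the image.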
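The practical effect of the two padding modes is easiest to see in the output size. The sketch below uses the usual output-size formulas for SAME and VALID along one spatial dimension; conv_output_size is a hypothetical helper, and dilation is assumed to be 1.

```python
import math


def conv_output_size(in_size, kernel_size, stride, padding):
    """Output length along one spatial dimension (dilation assumed to be 1).

    Hypothetical helper for illustration only; not part of TensorFlow.
    """
    if padding == "SAME":
        # Zero padding is added so that only the stride shrinks the output.
        return math.ceil(in_size / stride)
    if padding == "VALID":
        if in_size < kernel_size:
            raise ValueError("kernel does not fit inside the input")
        # The kernel must stay fully inside the input.
        return math.ceil((in_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# A 28-pixel-wide input with a 5-pixel-wide kernel.
for stride in (1, 2):
    for padding in ("SAME", "VALID"):
        print(stride, padding, conv_output_size(28, 5, stride, padding))
```

With stride 1, SAME keeps the width at 28 while VALID shrinks it to 24; with stride 2 the results are 14 and 12.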

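use_cudnn_on_gpu

Whether to use cuDNN acceleration on the GPU. It is treated as true when left unset.

data_format

The layout of the input tensor. It defaults to NHWC.

The function returns a Tensor. This output is what we usually call the feature map.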
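To make "feature map" concrete, here is a deliberately simplified reference computation. It assumes a single image with a single channel and a single kernel (so the batch and channel dimensions are dropped), VALID padding, and nested Python lists instead of tensors. correlate2d_valid is a hypothetical name; the real op is far more general and far faster.

```python
def correlate2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` without padding and return the feature map.

    Simplified sketch: one image, one channel, one kernel, VALID padding.
    Like tf.nn.conv2d, this is a cross-correlation (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            # Multiply the kernel with the image patch under it and sum up.
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(correlate2d_valid(image, kernel))  # [[6, 8], [12, 14]]
```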

The tf.contrib.slim.conv2d function

convolution(inputs,
          num_outputs,
          kernel_size,
          stride=1,
          padding='SAME',
          data_format=None,
          rate=1,
          activation_fn=nn.relu,
          normalizer_fn=None,
          normalizer_params=None,
          weights_initializer=initializers.xavier_initializer(),
          weights_regularizer=None,
          biases_initializer=init_ops.zeros_initializer(),
          biases_regularizer=None,
          reuse=None,
          variables_collections=None,
          outputs_collections=None,
          trainable=True,
          scope=None):

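inputs

The input images to convolve.

num_outputs

The number of convolution kernels (that is, the number of filters).

kernel_size

The spatial size of the kernel, [kernel height, kernel width]. A single integer uses the same value for both dimensions.

stride

The stride of the convolution along each spatial dimension of the image. A single integer uses the same value for both dimensions.

padding

The padding scheme, either VALID or SAME.

data_format

The layout of the input tensor.

rate

The dilation rate for atrous (dilated) convolution. The tf.nn.conv2d signature shown earlier has no such parameter.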
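A dilation rate greater than 1 spreads the kernel taps apart, which enlarges the area the kernel covers without adding weights. The sketch below uses the standard relation for the effective kernel extent; effective_kernel_size is a hypothetical helper, not a slim function.

```python
def effective_kernel_size(kernel_size, rate):
    """Extent covered by a dilated kernel along one spatial dimension.

    With dilation `rate`, (rate - 1) gaps are inserted between neighbouring
    taps, so k taps span k + (k - 1) * (rate - 1) input positions.
    Hypothetical helper for illustration only.
    """
    if kernel_size < 1 or rate < 1:
        raise ValueError("kernel_size and rate must be positive integers")
    return kernel_size + (kernel_size - 1) * (rate - 1)


# A 3-tap kernel at increasing dilation rates.
for rate in (1, 2, 4):
    print(rate, effective_kernel_size(3, rate))
```

A 3x3 kernel therefore covers a 5x5 window at rate 2 and a 9x9 window at rate 4, while rate 1 is an ordinary convolution.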

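activation_fn

The activation function. The default is ReLU; set it to None to skip the activation.

normalizer_fn

The normalization function (for example batch_norm). When it is provided, no biases are created or added.

normalizer_params

The parameters passed to the normalization function.

weights_initializer

The initializer for the weights.

weights_regularizer

An optional regularizer for the weights.

biases_initializer

The initializer for the biases. If it is None, biases are skipped.

biases_regularizer

An optional regularizer for the biases.

reuse

Whether the layer and its variables should be reused. To reuse the layer, scope must be given.

variables_collections

An optional list of collections for all the variables, or a dictionary with a separate list of collections per variable.

outputs_collections

The collection to which the outputs are added.

trainable

Whether the parameters of the convolution layer can be trained.

scope

An optional scope for variable_scope, used when sharing variables.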

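From the APIs above we can see that, once the initialization, regularization, normalization, and activation options are set aside, the core convolution of the two functions is the same. tf.contrib.slim.conv2d is the higher-level of the two: it creates the weights variable itself from num_outputs and kernel_size and exposes many more options. tf.nn.conv2d expects a ready-made filter tensor, so specifying the filter is more involved. Keep in mind that slim.conv2d adds biases and applies ReLU by default, so its output matches a bare tf.nn.conv2d only when activation_fn=None and biases_initializer=None. Leaving out the rarely used options, the two APIs can be simplified as follows: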

tf.contrib.slim.conv2d(inputs,
                num_outputs,   # number of kernels
                kernel_size,   # [kernel height, kernel width]
                stride=1,
                padding='SAME',
)
tf.nn.conv2d(
    input,     # same as inputs above
    filter,    # [kernel height, kernel width, number of input channels, number of kernels]
    strides,
    padding,
)
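The correspondence between the two simplified signatures can be written out as a small shape calculation. The sketch below assumes the NHWC layout and is not how slim is implemented internally; slim_to_nn_args is a hypothetical helper that only shows which filter shape and strides vector tf.nn.conv2d would need for the same layer.

```python
def slim_to_nn_args(input_shape, num_outputs, kernel_size, stride=1):
    """Map slim-style arguments to the filter shape and strides of tf.nn.conv2d.

    Hypothetical helper for illustration only; assumes NHWC input.
    """
    def as_pair(value):
        # A single integer means the same value for height and width.
        if isinstance(value, int):
            return [value, value]
        pair = list(value)
        if len(pair) != 2:
            raise ValueError("expected an int or a sequence of two ints")
        return pair

    kernel_h, kernel_w = as_pair(kernel_size)
    stride_h, stride_w = as_pair(stride)
    in_channels = input_shape[3]
    filter_shape = [kernel_h, kernel_w, in_channels, num_outputs]
    strides = [1, stride_h, stride_w, 1]
    return filter_shape, strides


filter_shape, strides = slim_to_nn_args([8, 32, 32, 3], num_outputs=16,
                                        kernel_size=5, stride=2)
print(filter_shape)  # [5, 5, 3, 16]
print(strides)       # [1, 2, 2, 1]
```

With tf.nn.conv2d you would also have to create and initialize a variable of that filter shape yourself, which is the extra work slim.conv2d takes over.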

Finally, here is the original English API documentation of tf.contrib.slim.conv2d:

def convolution(inputs,
                num_outputs,
                kernel_size,
                stride=1,
                padding='SAME',
                data_format=None,
                rate=1,
                activation_fn=nn.relu,
                normalizer_fn=None,
                normalizer_params=None,
                weights_initializer=initializers.xavier_initializer(),
                weights_regularizer=None,
                biases_initializer=init_ops.zeros_initializer(),
                biases_regularizer=None,
                reuse=None,
                variables_collections=None,
                outputs_collections=None,
                trainable=True,
                scope=None):
  """Adds an N-D convolution followed by an optional batch_norm layer.
  It is required that 1 <= N <= 3.
  `convolution` creates a variable called `weights`, representing the
  convolutional kernel, that is convolved (actually cross-correlated) with the
  `inputs` to produce a `Tensor` of activations. If a `normalizer_fn` is
  provided (such as `batch_norm`), it is then applied. Otherwise, if
  `normalizer_fn` is None and a `biases_initializer` is provided then a `biases`
  variable would be created and added the activations. Finally, if
  `activation_fn` is not `None`, it is applied to the activations as well.
  Performs atrous convolution with input stride/dilation rate equal to `rate`
  if a value > 1 for any dimension of `rate` is specified.  In this case
  `stride` values != 1 are not supported.
  Args:
    inputs: A Tensor of rank N+2 of shape
      `[batch_size] + input_spatial_shape + [in_channels]` if data_format does
      not start with "NC" (default), or
      `[batch_size, in_channels] + input_spatial_shape` if data_format starts
      with "NC".
    num_outputs: Integer, the number of output filters.
    kernel_size: A sequence of N positive integers specifying the spatial
      dimensions of the filters.  Can be a single integer to specify the same
      value for all spatial dimensions.
    stride: A sequence of N positive integers specifying the stride at which to
      compute output.  Can be a single integer to specify the same value for all
      spatial dimensions.  Specifying any `stride` value != 1 is incompatible
      with specifying any `rate` value != 1.
    padding: One of `"VALID"` or `"SAME"`.
    data_format: A string or None.  Specifies whether the channel dimension of
      the `input` and output is the last dimension (default, or if `data_format`
      does not start with "NC"), or the second dimension (if `data_format`
      starts with "NC").  For N=1, the valid values are "NWC" (default) and
      "NCW".  For N=2, the valid values are "NHWC" (default) and "NCHW".
      For N=3, the valid values are "NDHWC" (default) and "NCDHW".
    rate: A sequence of N positive integers specifying the dilation rate to use
      for atrous convolution.  Can be a single integer to specify the same
      value for all spatial dimensions.  Specifying any `rate` value != 1 is
      incompatible with specifying any `stride` value != 1.
    activation_fn: Activation function. The default value is a ReLU function.
      Explicitly set it to None to skip it and maintain a linear activation.
    normalizer_fn: Normalization function to use instead of `biases`. If
      `normalizer_fn` is provided then `biases_initializer` and
      `biases_regularizer` are ignored and `biases` are not created nor added.
      default set to None for no normalizer function
    normalizer_params: Normalization function parameters.
    weights_initializer: An initializer for the weights.
    weights_regularizer: Optional regularizer for the weights.
    biases_initializer: An initializer for the biases. If None skip biases.
    biases_regularizer: Optional regularizer for the biases.
    reuse: Whether or not the layer and its variables should be reused. To be
      able to reuse the layer scope must be given.
    variables_collections: Optional list of collections for all the variables or
      a dictionary containing a different list of collection per variable.
    outputs_collections: Collection to add the outputs.
    trainable: If `True` also add variables to the graph collection
      `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
    scope: Optional scope for `variable_scope`.
  Returns:
    A tensor representing the output of the operation.
  Raises:
    ValueError: If `data_format` is invalid.
    ValueError: Both 'rate' and `stride` are not uniformly 1.
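The docstring above describes what happens after the convolution itself: a normalizer replaces the biases, otherwise biases are added, and the activation comes last. It also states that a non-unit stride cannot be combined with a non-unit rate. Both rules can be sketched on plain numbers. apply_post_conv and check_rate_and_stride are hypothetical helpers that only mirror the documented behaviour; they are not the slim source code.

```python
def apply_post_conv(conv_out, normalizer_fn=None, bias=None, activation_fn=None):
    """Apply the documented post-convolution steps to a single value."""
    out = conv_out
    if normalizer_fn is not None:
        # With a normalizer, biases are neither created nor added.
        out = normalizer_fn(out)
    elif bias is not None:
        out = out + bias
    if activation_fn is not None:
        out = activation_fn(out)
    return out


def check_rate_and_stride(stride, rate):
    """Raise if both stride and rate contain values other than 1."""
    if any(s != 1 for s in stride) and any(r != 1 for r in rate):
        raise ValueError("stride != 1 is incompatible with rate != 1")


def relu(x):
    return max(x, 0.0)


print(apply_post_conv(-2.0, bias=0.5, activation_fn=relu))  # 0.0
print(apply_post_conv(2.0, bias=0.5))                       # 2.5
# The normalizer wins over the bias, so the bias is ignored here.
print(apply_post_conv(2.0, normalizer_fn=lambda x: x * 2, bias=0.5))  # 4.0

check_rate_and_stride(stride=[2, 2], rate=[1, 1])  # allowed
check_rate_and_stride(stride=[1, 1], rate=[2, 2])  # allowed
```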