tf.nn.conv2d vs. tf.contrib.slim.conv2d

 
 

While reading code, I noticed that some implementations build convolutional layers with tf.nn.conv2d, while others use tf.contrib.slim.conv2d. Do these two functions compute the same convolution? After going through the API documentation and the source code of slim.conv2d, I summarize as follows. First, the commonly used tf.nn.conv2d is defined as:

conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=None,
    data_format=None,
    name=None
)

input is the image to be convolved. It must be a 4-D Tensor of shape [batch_size, in_height, in_width, in_channels], i.e. [number of images in a training batch, image height, image width, number of channels], with dtype float32 or float64.
filter specifies the convolution kernels of the CNN. It must be a Tensor of shape [filter_height, filter_width, in_channels, out_channels], i.e. [kernel height, kernel width, number of input channels, number of kernels], with the same dtype as input. Note that its third dimension, in_channels, must equal the channel count of input (input's fourth dimension).
strides is the stride of the convolution along each dimension of the input; a 1-D vector of length 4, one entry per dimension of input.
padding is a string, either "SAME" or "VALID", which selects the convolution scheme: SAME zero-pads the input so the kernel may slide past the image border (with stride 1 the spatial size is preserved), while VALID restricts the kernel to positions that fit entirely inside the image. See http://blog.csdn.net/mao_xiao_feng/article/details/53444333 for a more detailed description.
use_cudnn_on_gpu specifies whether to use cuDNN acceleration; defaults to true.
data_format specifies the layout of the input; defaults to NHWC.

The function returns a Tensor: this output is what we usually call the feature map.
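To make the shape rules concrete, here is a minimal sketch (assuming the TensorFlow 1.x session API used throughout this post; the tensor names are illustrative) of how the padding choice affects the output shape:

import tensorflow as tf

# One 8x8 RGB image and sixteen 3x3 kernels; the filter's in_channels=3 matches the input.
x = tf.ones(shape=[1, 8, 8, 3], dtype=tf.float32)
w = tf.ones(shape=[3, 3, 3, 16], dtype=tf.float32)

y_same = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
y_valid = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='VALID')

with tf.Session() as sess:
    print(sess.run(tf.shape(y_same)))   # [ 1  8  8 16]: zero-padding preserves 8x8
    print(sess.run(tf.shape(y_valid)))  # [ 1  6  6 16]: shrinks by kernel_size - 1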

As for tf.contrib.slim.conv2d, its underlying function is defined as follows:

convolution(inputs,
          num_outputs,
          kernel_size,
          stride=1,
          padding='SAME',
          data_format=None,
          rate=1,
          activation_fn=nn.relu,
          normalizer_fn=None,
          normalizer_params=None,
          weights_initializer=initializers.xavier_initializer(),
          weights_regularizer=None,
          biases_initializer=init_ops.zeros_initializer(),
          biases_regularizer=None,
          reuse=None,
          variables_collections=None,
          outputs_collections=None,
          trainable=True,
          scope=None):

inputs is likewise the image to be convolved.
num_outputs specifies the number of convolution kernels (i.e. the number of filters).
kernel_size specifies the spatial dimensions of the kernel, [kernel height, kernel width].
stride is the stride of the convolution along each spatial dimension.
padding selects the padding scheme, VALID or SAME.
data_format specifies the layout of the input.
rate is the dilation rate for atrous (dilated) convolution, a parameter tf.nn.conv2d does not have; with rate > 1, gaps are inserted between the kernel taps so the receptive field grows without adding weights (see the sketch after this list).
activation_fn specifies the activation function; defaults to ReLU.
normalizer_fn specifies an optional normalization function applied in place of the biases.
normalizer_params specifies the parameters of the normalization function.
weights_initializer specifies the initializer for the weights.
weights_regularizer is an optional regularizer for the weights.
biases_initializer specifies the initializer for the biases.
biases_regularizer is an optional regularizer for the biases.
reuse specifies whether the layer and its variables should be reused.
variables_collections specifies an optional list of collections for all the variables, or a dictionary with a list of collections per variable.
outputs_collections specifies the collection the outputs are added to.
trainable: whether the layer's parameters are trainable.
scope: the variable_scope used for variable sharing.
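The rate argument is what the docstring below calls atrous (dilated) convolution: with rate=2, a 3x3 kernel samples the input with one-pixel gaps between taps, covering a 5x5 receptive field with no extra weights. Here is a minimal sketch (assuming TF 1.x; the constant-ones weights are just a trick to make the two ops comparable) showing that slim.conv2d with rate=2 computes the same thing as tf.nn.atrous_conv2d:

import tensorflow as tf
import tensorflow.contrib.slim as slim

x = tf.ones(shape=[1, 16, 16, 3])
w = tf.ones(shape=[3, 3, 3, 8])

# tf.nn.atrous_conv2d dilates the kernel by inserting rate-1 zeros between taps.
y_atrous = tf.nn.atrous_conv2d(x, w, rate=2, padding='SAME')

# slim.conv2d expresses the same operation through its rate argument;
# activation_fn=None keeps the layer linear so the outputs are directly
# comparable (the default zero-initialized bias changes nothing).
y_slim = slim.conv2d(x, 8, [3, 3], rate=2, padding='SAME',
                     weights_initializer=tf.ones_initializer(),
                     activation_fn=None)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    a, b = sess.run([y_atrous, y_slim])
    print((a == b).all())  # True: both compute the same dilated convolution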

Looking at the APIs above, once the initialization-related arguments are set aside the two functions are not really different: tf.contrib.slim.conv2d simply exposes more configurable initialization options, while tf.nn.conv2d specifies the filter in a more cumbersome way. Stripping the rarely used initialization arguments, the two APIs simplify to:

tf.contrib.slim.conv2d(inputs,
                num_outputs,    # number of kernels
                kernel_size,    # [kernel height, kernel width]
                stride=1,
                padding='SAME',
)
tf.nn.conv2d(
    input,     # same as inputs above
    filter,    # [kernel height, kernel width, in_channels, number of kernels]
    strides,
    padding,
)

The two are thus almost identical, and running the code below confirms that they produce the same result. (Note that slim.conv2d applies ReLU and adds a zero-initialized bias by default; since every value in this example is positive, the ReLU changes nothing and the outputs match exactly.)

import tensorflow as tf
import tensorflow.contrib.slim as slim

x1 = tf.ones(shape=[1, 64, 64, 3])
# Constant all-ones kernel; the fill value must be a float to match x1's float32 dtype.
w = tf.fill([5, 5, 3, 64], 1.0)
y1 = tf.nn.conv2d(x1, w, strides=[1, 1, 1, 1], padding='SAME')
# The same layer via slim, with all-ones weights so both convolutions use identical kernels.
y2 = slim.conv2d(x1, 64, [5, 5], weights_initializer=tf.ones_initializer(), padding='SAME')


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    y1_value, y2_value, x1_value = sess.run([y1, y2, x1])
    print("shapes are", y1_value.shape, y2_value.shape)
    print(y1_value == y2_value)
    print(y1_value)
    print(y2_value)

Finally, here is the original English docstring for tf.contrib.slim.conv2d:

def convolution(inputs,
                num_outputs,
                kernel_size,
                stride=1,
                padding='SAME',
                data_format=None,
                rate=1,
                activation_fn=nn.relu,
                normalizer_fn=None,
                normalizer_params=None,
                weights_initializer=initializers.xavier_initializer(),
                weights_regularizer=None,
                biases_initializer=init_ops.zeros_initializer(),
                biases_regularizer=None,
                reuse=None,
                variables_collections=None,
                outputs_collections=None,
                trainable=True,
                scope=None):
  """Adds an N-D convolution followed by an optional batch_norm layer.
  It is required that 1 <= N <= 3.
  `convolution` creates a variable called `weights`, representing the
  convolutional kernel, that is convolved (actually cross-correlated) with the
  `inputs` to produce a `Tensor` of activations. If a `normalizer_fn` is
  provided (such as `batch_norm`), it is then applied. Otherwise, if
  `normalizer_fn` is None and a `biases_initializer` is provided then a `biases`
  variable would be created and added the activations. Finally, if
  `activation_fn` is not `None`, it is applied to the activations as well.
  Performs atrous convolution with input stride/dilation rate equal to `rate`
  if a value > 1 for any dimension of `rate` is specified.  In this case
  `stride` values != 1 are not supported.
  Args:
    inputs: A Tensor of rank N+2 of shape
      `[batch_size] + input_spatial_shape + [in_channels]` if data_format does
      not start with "NC" (default), or
      `[batch_size, in_channels] + input_spatial_shape` if data_format starts
      with "NC".
    num_outputs: Integer, the number of output filters.
    kernel_size: A sequence of N positive integers specifying the spatial
      dimensions of the filters.  Can be a single integer to specify the same
      value for all spatial dimensions.
    stride: A sequence of N positive integers specifying the stride at which to
      compute output.  Can be a single integer to specify the same value for all
      spatial dimensions.  Specifying any `stride` value != 1 is incompatible
      with specifying any `rate` value != 1.
    padding: One of `"VALID"` or `"SAME"`.
    data_format: A string or None.  Specifies whether the channel dimension of
      the `input` and output is the last dimension (default, or if `data_format`
      does not start with "NC"), or the second dimension (if `data_format`
      starts with "NC").  For N=1, the valid values are "NWC" (default) and
      "NCW".  For N=2, the valid values are "NHWC" (default) and "NCHW".
      For N=3, the valid values are "NDHWC" (default) and "NCDHW".
    rate: A sequence of N positive integers specifying the dilation rate to use
      for atrous convolution.  Can be a single integer to specify the same
      value for all spatial dimensions.  Specifying any `rate` value != 1 is
      incompatible with specifying any `stride` value != 1.
    activation_fn: Activation function. The default value is a ReLU function.
      Explicitly set it to None to skip it and maintain a linear activation.
    normalizer_fn: Normalization function to use instead of `biases`. If
      `normalizer_fn` is provided then `biases_initializer` and
      `biases_regularizer` are ignored and `biases` are not created nor added.
      default set to None for no normalizer function
    normalizer_params: Normalization function parameters.
    weights_initializer: An initializer for the weights.
    weights_regularizer: Optional regularizer for the weights.
    biases_initializer: An initializer for the biases. If None skip biases.
    biases_regularizer: Optional regularizer for the biases.
    reuse: Whether or not the layer and its variables should be reused. To be
      able to reuse the layer scope must be given.
    variables_collections: Optional list of collections for all the variables or
      a dictionary containing a different list of collection per variable.
    outputs_collections: Collection to add the outputs.
    trainable: If `True` also add variables to the graph collection
      `GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
    scope: Optional scope for `variable_scope`.
  Returns:
    A tensor representing the output of the operation.
  Raises:
    ValueError: If `data_format` is invalid.
    ValueError: Both 'rate' and `stride` are not uniformly 1.
  """
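As the docstring notes, supplying a normalizer_fn replaces the biases entirely. Below is a minimal sketch (assuming TF 1.x and slim's bundled batch_norm; the scope name is illustrative) of the conv + batch norm + ReLU block this enables, which is the main convenience slim.conv2d adds over composing tf.nn.conv2d by hand:

import tensorflow as tf
import tensorflow.contrib.slim as slim

x = tf.ones(shape=[1, 32, 32, 3])

# With normalizer_fn set, slim creates no biases and inserts batch norm
# between the convolution and the default ReLU activation.
y = slim.conv2d(x, 16, [3, 3],
                normalizer_fn=slim.batch_norm,
                normalizer_params={'is_training': True},
                scope='conv_bn')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y).shape)  # (1, 32, 32, 16)
    # Only weights and batch-norm variables exist; no 'biases' was created.
    for v in tf.global_variables():
        print(v.name)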


Author: 马小李23
Link: https://www.jianshu.com/p/a70c1d931395
Source: 简书 (Jianshu)
The copyright belongs to the author. For commercial reproduction, please contact the author for authorization; for non-commercial reproduction, please credit the source.
tf.nn.conv2d is a function in TensorFlow that performs a 2D convolution on a given input tensor with a set of filters. It is typically used in deep-learning models for image processing and computer-vision tasks. The function takes several arguments, including the input tensor, the filter tensor, the strides of the convolution, and the padding scheme; its output is a new tensor holding the result of applying the filters to the input. Here is an example usage of tf.nn.conv2d:

import numpy as np
import tensorflow as tf

# Define input and filter tensors
input_tensor = tf.placeholder(tf.float32, shape=[None, 28, 28, 3])
filter_tensor = tf.Variable(tf.random_normal([5, 5, 3, 32]))

# Perform a 2D convolution with strides of 1 and 'SAME' padding
conv = tf.nn.conv2d(input_tensor, filter_tensor, strides=[1, 1, 1, 1], padding='SAME')

# Run the convolution within a TensorFlow session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Define a sample input batch
    input_data = np.random.rand(1, 28, 28, 3)
    # Run the convolution on the input
    conv_result = sess.run(conv, feed_dict={input_tensor: input_data})

In this example, we define an input tensor of shape (None, 28, 28, 3), representing a batch of 28x28 RGB images, and a filter tensor of shape (5, 5, 3, 32), representing 32 filters of size 5x5 to apply to the input. We then call tf.nn.conv2d with these tensors, a stride of 1, and the 'SAME' padding scheme, so the output keeps the spatial dimensions of the input, with the edges zero-padded so the filters can be applied at every pixel. Finally, we run the convolution within a TensorFlow session on a sample input; the resulting conv_result has shape (1, 28, 28, 32), a batch of 28x28 feature maps, one per filter.