tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
The input parameter is a 4-D tensor of shape:
[batch, in_height, in_width, in_channels]
The filter parameter is a 4-D tensor of shape:
[filter_height, filter_width, in_channels, out_channels]
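The strides parameter is a 1-D tensor with four elements. The first and last elements must be 1; only the middle two can be changed. With the default NHWC layout, the middle two are the vertical (height) stride and the horizontal (width) stride, in that order:
strides=[1, 1, 1, 1] or strides=[1, 2, 2, 1]
My own guess is that the leading and trailing 1s correspond to the batch and in_channels dimensions of input. The convolution must not skip any sample or any channel, so these two strides can only be 1 and cannot be set to other values.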
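To make the stride layout concrete, here is a minimal pure-Python sketch. It does not call TensorFlow; `unpack_strides` and `window_starts` are hypothetical helper names introduced only for this illustration, and NHWC order is assumed.

```python
def unpack_strides(strides):
    """Return (stride_h, stride_w) from a 4-element strides list (NHWC assumed)."""
    if len(strides) != 4:
        raise ValueError("strides must have exactly 4 elements")
    if strides[0] != 1 or strides[3] != 1:
        # Batch and channel strides are required to be 1.
        raise ValueError("strides[0] and strides[3] must be 1")
    return strides[1], strides[2]


def window_starts(size, kernel, stride):
    """Start indices of the windows that fit fully inside one dimension (no padding)."""
    return list(range(0, size - kernel + 1, stride))


stride_h, stride_w = unpack_strides([1, 2, 2, 1])
print(stride_h, stride_w)                # 2 2
print(window_starts(5, 2, 1))            # [0, 1, 2, 3]
print(window_starts(5, 2, stride_w))     # [0, 2]
```

With a stride of 1 the window starts at every position; with a stride of 2 it starts at every second position, so fewer output elements are produced.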
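The padding parameter can be SAME or VALID:
For example, take a 5x5 image, a 2x2 kernel, and a stride of 2. Along each dimension the kernel fits twice, and then one element is left over, which is not enough for another window. SAME pads zeros after that element so the kernel can scan it too, giving a 3x3 output. In general, SAME pads just enough that the output size is ceil(input_size / stride); when the total padding is odd, the extra zero goes at the end.
If the leftover positions that cannot fill a window are simply discarded, that is VALID. In the example above this gives a 2x2 output.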
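The arithmetic behind the two padding modes can be sketched as below. This is a pure-Python illustration for one spatial dimension, not TensorFlow code; `conv_output_size` is a hypothetical helper name, and the formulas are the commonly documented ones for SAME and VALID.

```python
def conv_output_size(in_size, kernel, stride, padding):
    """Return (out_size, pad_before, pad_after) for one spatial dimension."""
    if padding == "VALID":
        # Only windows that fit entirely inside the input are used.
        out_size = (in_size - kernel) // stride + 1
        pad_total = 0
    elif padding == "SAME":
        # Output size is ceil(in_size / stride), computed with integers.
        out_size = -(-in_size // stride)
        pad_total = max((out_size - 1) * stride + kernel - in_size, 0)
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    pad_before = pad_total // 2
    pad_after = pad_total - pad_before  # an odd total puts the extra zero at the end
    return out_size, pad_before, pad_after


# The 5x5 image, 2x2 kernel, stride 2 example from the text:
print(conv_output_size(5, 2, 2, "SAME"))   # (3, 0, 1)
print(conv_output_size(5, 2, 2, "VALID"))  # (2, 0, 0)
```

For the example in the text, SAME adds a single zero after the last element and yields 3 outputs per dimension, while VALID drops the leftover element and yields 2.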
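For the multi-channel case: if input has shape [1, 3, 3, 2] (a 3x3 image with 2 channels), filter has shape [2, 2, 2, 1], all strides are 1, and padding=VALID, then the output has shape [1, 2, 2, 1], as shown in the figure.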
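The multi-channel case can be checked with a naive pure-Python convolution. This is only a sketch under stated assumptions: it uses nested lists in NHWC order, VALID padding, and a hypothetical function name `conv2d_valid`; it is not how TensorFlow implements the op. The all-ones input and filter are made-up test data.

```python
def conv2d_valid(inp, filt, stride_h=1, stride_w=1):
    """Naive NHWC convolution with VALID padding on nested Python lists."""
    batch = len(inp)
    in_h, in_w, in_c = len(inp[0]), len(inp[0][0]), len(inp[0][0][0])
    f_h, f_w, f_c, out_c = len(filt), len(filt[0]), len(filt[0][0]), len(filt[0][0][0])
    if in_c != f_c:
        raise ValueError("input and filter must agree on in_channels")
    out_h = (in_h - f_h) // stride_h + 1
    out_w = (in_w - f_w) // stride_w + 1
    out = [[[[0.0] * out_c for _ in range(out_w)] for _ in range(out_h)]
           for _ in range(batch)]
    for b in range(batch):
        for i in range(out_h):
            for j in range(out_w):
                for k in range(out_c):
                    total = 0.0
                    # Sum over the window and over ALL input channels.
                    for di in range(f_h):
                        for dj in range(f_w):
                            for q in range(in_c):
                                total += (inp[b][i * stride_h + di][j * stride_w + dj][q]
                                          * filt[di][dj][q][k])
                    out[b][i][j][k] = total
    return out


# Input [1, 3, 3, 2] and filter [2, 2, 2, 1], both filled with ones.
inp = [[[[1.0, 1.0] for _ in range(3)] for _ in range(3)]]
filt = [[[[1.0], [1.0]] for _ in range(2)] for _ in range(2)]
out = conv2d_valid(inp, filt)
print(len(out), len(out[0]), len(out[0][0]), len(out[0][0][0]))  # 1 2 2 1
print(out[0][0][0][0])                                           # 8.0
```

Each output value sums a 2x2 window across both channels (2 * 2 * 2 = 8 terms), which is why the two input channels collapse into a single output channel.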
Addendum: here is the docstring from the source code.
def conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None):
  r"""Computes a 2-D convolution given 4-D `input` and `filter` tensors.

  Given an input tensor of shape `[batch, in_height, in_width, in_channels]`
  and a filter / kernel tensor of shape
  `[filter_height, filter_width, in_channels, out_channels]`, this op
  performs the following:

  1. Flattens the filter to a 2-D matrix with shape
     `[filter_height * filter_width * in_channels, output_channels]`.
  2. Extracts image patches from the input tensor to form a *virtual*
     tensor of shape `[batch, out_height, out_width,
     filter_height * filter_width * in_channels]`.
  3. For each patch, right-multiplies the filter matrix and the image patch
     vector.

  In detail, with the default NHWC format,

      output[b, i, j, k] =
          sum_{di, dj, q} input[b, strides[1] * i + di, strides[2] * j + dj, q] *
                          filter[di, dj, q, k]

  Must have `strides[0] = strides[3] = 1`. For the most common case of the same
  horizontal and vertices strides, `strides = [1, stride, stride, 1]`.

  Args:
    input: A `Tensor`. Must be one of the following types: `half`, `float32`,
      `float64`. A 4-D tensor. The dimension order is interpreted according to
      the value of `data_format`, see below for details.
    filter: A `Tensor`. Must have the same type as `input`.
      A 4-D tensor of shape
      `[filter_height, filter_width, in_channels, out_channels]`
    strides: A list of `ints`.
      1-D tensor of length 4. The stride of the sliding window for each
      dimension of `input`. The dimension order is determined by the value of
      `data_format`, see below for details.
    padding: A `string` from: `"SAME", "VALID"`.
      The type of padding algorithm to use.
    use_cudnn_on_gpu: An optional `bool`. Defaults to `True`.
    data_format: An optional `string` from: `"NHWC", "NCHW"`. Defaults to `"NHWC"`.
      Specify the data format of the input and output data. With the
      default format "NHWC", the data is stored in the order of:
          [batch, height, width, channels].
      Alternatively, the format could be "NCHW", the data storage order of:
          [batch, channels, height, width].
    name: A name for the operation (optional).

  Returns:
    A `Tensor`. Has the same type as `input`.
    A 4-D tensor. The dimension order is determined by the value of
    `data_format`, see below for details.
  """
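The three steps in the docstring (flatten the filter, extract patches, multiply) can be sketched in pure Python as follows. This is an illustration under assumptions, not TensorFlow's actual implementation: it handles only NHWC nested lists with VALID padding, and `conv2d_im2col` is a hypothetical name. The small input and filter are made-up test data.

```python
def conv2d_im2col(inp, filt, stride_h=1, stride_w=1):
    """VALID NHWC convolution via flatten-filter / extract-patch / multiply."""
    f_h, f_w, in_c, out_c = len(filt), len(filt[0]), len(filt[0][0]), len(filt[0][0][0])
    in_h, in_w = len(inp[0]), len(inp[0][0])
    out_h = (in_h - f_h) // stride_h + 1
    out_w = (in_w - f_w) // stride_w + 1

    # Step 1: flatten the filter to a 2-D matrix [f_h * f_w * in_c, out_c].
    filt_2d = [filt[di][dj][q]
               for di in range(f_h) for dj in range(f_w) for q in range(in_c)]

    out = []
    for image in inp:
        rows = []
        for i in range(out_h):
            row = []
            for j in range(out_w):
                # Step 2: extract one patch as a vector of length f_h * f_w * in_c,
                # using the same (di, dj, q) order as the flattened filter.
                patch = [image[i * stride_h + di][j * stride_w + dj][q]
                         for di in range(f_h) for dj in range(f_w) for q in range(in_c)]
                # Step 3: multiply the patch vector by the filter matrix.
                row.append([sum(p * w[k] for p, w in zip(patch, filt_2d))
                            for k in range(out_c)])
            rows.append(row)
        out.append(rows)
    return out


# Input [1, 3, 3, 1] holding 1..9; filter [2, 2, 1, 1] that adds the
# top-left and bottom-right element of each window.
inp = [[[[1], [2], [3]], [[4], [5], [6]], [[7], [8], [9]]]]
filt = [[[[1]], [[0]]], [[[0]], [[1]]]]
result = conv2d_im2col(inp, filt)
print(result)  # [[[[6], [8]], [[12], [14]]]]
```

Because the patch and the flattened filter use the same element order, the dot product in step 3 is exactly the sum over di, dj, q in the docstring's formula.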