I. Environment
TensorFlow API r1.12
CUDA 9.2 V9.2.148
cudnn64_7.dll
Python 3.6.3
Windows 10
II. Official description
Computes a 2-D convolution given a 4-D input tensor and a 4-D filter (convolution kernel) tensor.
https://www.tensorflow.org/api_docs/python/tf/nn/conv2d
tf.nn.conv2d(
input,
filter,
strides,
padding,
use_cudnn_on_gpu=True,
data_format='NHWC',
dilations=[1, 1, 1, 1],
name=None
)
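Arguments:
(1) input: the input tensor. Its data type must be one of: half, bfloat16, float32, float64.
(2) filter: the filter (convolution kernel) tensor. It must have the same data type as input and, like input, must be 4-D. Its four dimensions are [filter_height, filter_width, in_channels, out_channels].
(3) strides: a list of ints of length 4 giving the sliding-window stride for each dimension of input. With the default "NHWC" layout the first and last entries must be 1, i.e. [1, stride, stride, 1]. In most cases the horizontal and vertical strides are the same.
(4) padding: the padding algorithm, a string: "SAME" or "VALID".
(5) use_cudnn_on_gpu: optional, bool, defaults to True.
(6) data_format: optional, string, defaults to "NHWC", which means the input and output tensors are laid out as [batch, height, width, channels]. If set to "NCHW", the input and output tensors are laid out as [batch, channels, height, width].
(7) dilations: optional, a list of ints of length 4, defaults to [1, 1, 1, 1]. It gives the dilation factor for each dimension of input. If an entry k is greater than 1, k-1 cells are skipped between filter elements along that dimension. Note that the entries for the batch and depth dimensions must be 1.
(8) name: optional, a name for the operation.
Returns:
(1) A tensor with the same data type as input.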
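To make the dilations rule concrete, here is a minimal sketch of how a dilation factor stretches the footprint of a filter along one dimension. The helper name effective_filter_size is hypothetical and not part of TensorFlow; it only restates the "skip k-1 cells" rule as arithmetic.

```python
def effective_filter_size(filter_size, dilation):
    """Span, in input cells, covered by a filter along one dimension.

    Hypothetical helper, not a TensorFlow API. With dilation d, d - 1 input
    cells are skipped between neighbouring filter elements.
    """
    if filter_size < 1 or dilation < 1:
        raise ValueError("filter_size and dilation must be >= 1")
    return (filter_size - 1) * dilation + 1


print(effective_filter_size(2, 1))  # 2: dilation 1 leaves the filter unchanged
print(effective_filter_size(3, 2))  # 5: one cell skipped between elements
```

With the default dilations=[1, 1, 1, 1] the footprint equals the filter size, which is the case used in the examples of this post.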
III. Example (default data layout "NHWC")
Input (input): the dimensions [batch_size, height, width, channels] are [1, 3, 3, 1]
Filter (filter): the dimensions [filter_height, filter_width, in_channels, out_channels] are [2, 2, 1, 1]
Strides (strides): the strides for [batch_size, height, width, channels] are [1, 1, 1, 1]
(1) padding is "VALID"
Output size (ceil rounds up to the nearest integer):
output_h = ceil (( input_h - filter_h + 1 ) / strides_h )
output_w = ceil (( input_w - filter_w + 1 ) / strides_w )
The input is usually square, i.e. height = width, so output_size = ceil (( input_s - filter_s + 1 ) / strides_s )
Output size for this example:
( 3 - 2 + 1 ) / 1 = 2, which rounds up to 2, so the output height and width are both 2. batch_size stays at 1, and the number of output channels is set by the filter's out_channels, which is 1. The output_tensor in the code below therefore has shape (1, 2, 2, 1)
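The "VALID" formula above can be sketched as a small pure-Python helper. The function name valid_output_size is an assumption made for illustration; it is not a TensorFlow function.

```python
import math


def valid_output_size(input_size, filter_size, stride):
    """Output length along one spatial dimension for padding="VALID".

    Illustrative helper (not a TensorFlow API) implementing
    ceil((input - filter + 1) / stride).
    """
    return math.ceil((input_size - filter_size + 1) / stride)


# The example in this section: 3x3 input, 2x2 filter, stride 1.
out_h = valid_output_size(3, 2, 1)
out_w = valid_output_size(3, 2, 1)
print((1, out_h, out_w, 1))  # (1, 2, 2, 1)
```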
>>> import tensorflow as tf
>>> import numpy as np
# Build the input tensor with NumPy
>>> input_data = [i+1 for i in range(9)]
>>> input_data = np.asarray(input_data)
>>> input_data = input_data.reshape(1,3,3,1)
>>> input_data = input_data.astype(np.float32)
>>> input_data
# array([[[[1.],
# [2.],
# [3.]],
#
# [[4.],
# [5.],
# [6.]],
#
# [[7.],
# [8.],
# [9.]]]], dtype=float32)
>>> input_tensor = tf.constant(input_data, dtype=tf.float32)
>>> input_tensor
# <tf.Tensor 'Const_1:0' shape=(1, 3, 3, 1) dtype=float32>
# Build the filter (convolution kernel) with NumPy
>>> filter_data = [i+1 for i in range(4)]
>>> filter_data = np.asarray(filter_data).reshape(2,2,1,1)
>>> filter_data = filter_data.astype(np.float32)
>>> filter_data
# array([[[[1.]],
#
# [[2.]]],
#
#
# [[[3.]],
#
# [[4.]]]], dtype=float32)
>>> filter_tensor = tf.constant(filter_data, dtype=tf.float32)
>>> filter_tensor
# <tf.Tensor 'Const_3:0' shape=(2, 2, 1, 1) dtype=float32>
>>> strides_list = [1,1,1,1]
>>> padding_str = "VALID"
# Apply TensorFlow's 2-D convolution op
>>> output_tensor = tf.nn.conv2d(input=input_tensor, filter=filter_tensor, strides=strides_list, padding=padding_str)
>>> output_tensor
# <tf.Tensor 'Conv2D:0' shape=(1, 2, 2, 1) dtype=float32>
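# Initialize all variables (this graph defines no variables, so init_op does nothing here)
>>> init_op = tf.global_variables_initializer()
# Run the operations of the default graph through a Session
# Note: tf.shape(...) adds an op to the graph, so print() shows a symbolic Tensor, not the shape values
>>> with tf.Session() as sess:
...     sess.run(init_op)
...     result = sess.run(output_tensor)
...     print(tf.shape(input_data))
...     print(input_data)
...     print(tf.shape(result))
...     print(result)
...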
# 2018-12-26 19:35:11.069319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
# 2018-12-26 19:35:11.072304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
# 2018-12-26 19:35:11.075594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
# 2018-12-26 19:35:11.077656: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
# 2018-12-26 19:35:11.080492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6459 MB memory) -> physical GPU (device: 0, name: Quadro M4000, pci bus id: 0000:03:00.0, compute capability: 5.2)
# Tensor("Shape:0", shape=(4,), dtype=int32)
# [[[[1.]
# [2.]
# [3.]]
#
# [[4.]
# [5.]
# [6.]]
#
# [[7.]
# [8.]
# [9.]]]]
# Tensor("Shape:0", shape=(4,), dtype=int32)
# [[[[37.]
# [47.]]
#
# [[67.]
# [77.]]]]
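As a cross-check of the "VALID" result, here is a dependency-free sketch that slides the 2x2 filter over the 3x3 input by hand. It assumes a single channel and a batch of one, so plain nested lists stand in for the 4-D tensors, and the function name conv2d_valid is hypothetical. Like tf.nn.conv2d, it does not flip the kernel (strictly speaking it computes a cross-correlation).

```python
def conv2d_valid(image, kernel, stride=1):
    """Single-channel 2-D cross-correlation without padding ("VALID")."""
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    # Same value as ceil((in - k + 1) / stride) for positive integers.
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Multiply the window under the kernel element by element and sum.
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 2], [3, 4]]
print(conv2d_valid(image, kernel))  # [[37, 47], [67, 77]]
```

For instance, the top-left output is 1*1 + 2*2 + 4*3 + 5*4 = 37, matching the first value printed by the session.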
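(2) padding is "SAME"
Output size:
output_h = ceil ( input_h / strides_h )
output_w = ceil ( input_w / strides_w )
The input is usually square, i.e. height = width, so output_size = ceil ( input_s / strides_s )
Output size for this example:
3 / 1 = 3, which rounds up to 3, so the output height and width are both 3. batch_size stays at 1, and the number of output channels is set by the filter's out_channels, which is 1. The output_tensor in the code below therefore has shape (1, 3, 3, 1)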
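The "SAME" rule can be sketched the same way. The helper below also computes how many zeros are added on each side, following the padding arithmetic described in the TensorFlow documentation (when the total is odd, the extra cell goes at the end, i.e. bottom or right). The function name same_output_and_padding is an assumption made for illustration.

```python
import math


def same_output_and_padding(input_size, filter_size, stride):
    """Output length and (before, after) zero padding for padding="SAME".

    Illustrative helper, not a TensorFlow API; one spatial dimension only.
    """
    out = math.ceil(input_size / stride)
    # Total zeros needed so that `out` windows fit along this dimension.
    pad_total = max((out - 1) * stride + filter_size - input_size, 0)
    pad_before = pad_total // 2
    pad_after = pad_total - pad_before  # the extra cell, if any, goes here
    return out, pad_before, pad_after


# The example in this section: 3x3 input, 2x2 filter, stride 1.
print(same_output_and_padding(3, 2, 1))  # (3, 0, 1)
```

For this example a single row of zeros is added at the bottom and a single column at the right, and nothing at the top or left.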
>>> import tensorflow as tf
>>> import numpy as np
>>> input_data = [i+1 for i in range(9)]
>>> input_data = np.asarray(input_data)
>>> input_data = input_data.reshape(1,3,3,1)
>>> input_data = input_data.astype(np.float32)
>>> input_tensor = tf.constant(input_data, dtype=tf.float32)
>>> filter_data = [i+1 for i in range(4)]
>>> filter_data = np.asarray(filter_data).reshape(2,2,1,1)
>>> filter_data = filter_data.astype(np.float32)
>>> filter_tensor = tf.constant(filter_data, dtype=tf.float32)
>>> strides_list = [1,1,1,1]
>>> padding_str = "SAME"
>>> output_tensor = tf.nn.conv2d(input=input_tensor, filter=filter_tensor, strides=strides_list, padding=padding_str)
>>> output_tensor
# <tf.Tensor 'Conv2D_1:0' shape=(1, 3, 3, 1) dtype=float32>
>>> init_op = tf.global_variables_initializer()
>>> with tf.Session() as sess:
...     sess.run(init_op)
...     result = sess.run(output_tensor)
...     print(tf.shape(input_data))
...     print(input_data)
...     print(tf.shape(result))
...     print(result)
...
# 2018-12-26 19:48:59.179410: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
# 2018-12-26 19:48:59.465250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
# name: Quadro M4000 major: 5 minor: 2 memoryClockRate(GHz): 0.7725
# pciBusID: 0000:03:00.0
# totalMemory: 8.00GiB freeMemory: 6.70GiB
# 2018-12-26 19:48:59.471612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
# 2018-12-26 19:49:00.895949: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
# 2018-12-26 19:49:00.901132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
# 2018-12-26 19:49:00.904139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
# 2018-12-26 19:49:00.908069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6459 MB memory) -> physical GPU (device: 0, name: Quadro M4000, pci bus id: 0000:03:00.0, compute capability: 5.2)
# Tensor("Shape_2:0", shape=(4,), dtype=int32)
# [[[[1.]
# [2.]
# [3.]]
# [[4.]
# [5.]
# [6.]]
#
# [[7.]
# [8.]
# [9.]]]]
# Tensor("Shape:0", shape=(4,), dtype=int32)
# [[[[37.]
# [47.]
# [21.]]
#
# [[67.]
# [77.]
# [33.]]
#
# [[23.]
# [26.]
# [ 9.]]]]
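The last column and last row of the "SAME" result (21, 33, 23, 26, 9) come from windows that overlap the zero padding. The sketch below reproduces them by hand, assuming stride 1, a single channel, a batch of one, and TensorFlow's documented convention of putting the extra padding cell at the bottom and right when the total padding is odd. The function name conv2d_same_stride1 is hypothetical.

```python
def conv2d_same_stride1(image, kernel):
    """Single-channel "SAME" cross-correlation for stride 1 (illustrative only)."""
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    # For stride 1 the total padding is k - 1; the smaller half goes first.
    pad_top, pad_left = (k_h - 1) // 2, (k_w - 1) // 2
    padded_h, padded_w = in_h + k_h - 1, in_w + k_w - 1
    padded = [[0] * padded_w for _ in range(padded_h)]
    for r in range(in_h):
        for c in range(in_w):
            padded[r + pad_top][c + pad_left] = image[r][c]
    # One output value per input position, as "SAME" with stride 1 requires.
    return [
        [
            sum(padded[i + di][j + dj] * kernel[di][dj]
                for di in range(k_h) for dj in range(k_w))
            for j in range(in_w)
        ]
        for i in range(in_h)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 2], [3, 4]]
print(conv2d_same_stride1(image, kernel))
# [[37, 47, 21], [67, 77, 33], [23, 26, 9]]
```

For instance, the top-right output is 3*1 + 0*2 + 6*3 + 0*4 = 21, because the right half of that window falls on the padded column.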