This post walks through what TensorFlow's tf.nn.conv2d actually computes.
First, the definition of conv2d:
conv2d(
    input,   # a 4-D tensor; the meaning of each dimension is given by data_format
    filter,  # the convolution kernel, a 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels]
    strides, # the step of the sliding window per input dimension, [1, stride_h, stride_w, 1] => [1, vertical step, horizontal step, 1]
    padding, # either "SAME" or "VALID". "VALID" means the kernel only slides inside the image (no padding); "SAME" zero-pads the border so the kernel can also cover positions near the edge.
    use_cudnn_on_gpu=True,
    data_format='NHWC', # layout of input; the default NHWC means [batch, height, width, channels]
    dilations=[1, 1, 1, 1], # introduced in TF 1.5; ignored here
    name=None
)
The output of conv2d is another 4-D tensor whose dimensions mean [batch, height_after, width_after, out_channels],
where height_after and width_after are roughly:
padding | height_after | width_after |
---|---|---|
SAME | ⌈height / stride_h⌉ | ⌈width / stride_w⌉ |
VALID | ⌊(height − filter_height) / stride_h⌋ + 1 | ⌊(width − filter_width) / stride_w⌋ + 1 |
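As a quick sanity check, the table above can be written directly as a helper (the function name conv2d_output_size is made up for illustration):

```python
import math

def conv2d_output_size(size, filter_size, stride, padding):
    # output size along one spatial dimension, per the table above
    if padding == 'SAME':
        return math.ceil(size / stride)            # border is zero-padded
    if padding == 'VALID':
        return (size - filter_size) // stride + 1  # kernel stays inside
    raise ValueError(padding)

# a 5-wide input, 3-wide kernel, stride 2:
conv2d_output_size(5, 3, 2, 'VALID')  # -> 2
conv2d_output_size(5, 3, 2, 'SAME')   # -> 3
```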
The simple implementation below handles only padding=VALID, and ignores many corner cases.
The computation is roughly:
output[b,i,j,k] = ∑_{di=0}^{filter_height−1} ∑_{dj=0}^{filter_width−1} ∑_{q=0}^{in_channels−1} input[b, strides[1]*i+di, strides[2]*j+dj, q] * filter[di, dj, q, k]
where
symbol | meaning |
---|---|
b | batch index |
i / j | output row / column index |
di / dj | filter row / column index |
q | input channel index |
k | output channel index |
A NumPy implementation looks roughly like this:
import numpy as np

def conv2d(input, filter, strides, padding=None):  # only padding='VALID' is handled
    ish = input.shape   # [batch, height, width, in_channels]
    fsh = filter.shape  # [filter_height, filter_width, in_channels, out_channels]
    # output shape for padding='VALID'
    output = np.zeros([ish[0],
                       (ish[1] - fsh[0]) // strides[1] + 1,
                       (ish[2] - fsh[1]) // strides[2] + 1,
                       fsh[3]])
    osh = output.shape
    for m in range(osh[0]):          # batch
        for i in range(osh[1]):      # output row
            for j in range(osh[2]):  # output column
                for di in range(fsh[0]):      # filter row
                    for dj in range(fsh[1]):  # filter column
                        # dot product over in_channels, one value per out_channel
                        output[m, i, j] += np.dot(
                            input[m, strides[1]*i + di, strides[2]*j + dj, :],
                            filter[di, dj, :, :]
                        )
    return output
An example:
# a simple case first
# input.shape = [1,3,3,1]
input = \
[[[0], [1], [2]],
[[3], [4], [5]],
[[6], [7], [8]]]
# filter.shape = [3,3,1,1]
filter = \
[[[[0]], [[1]], [[2]]],
[[[3]], [[4]], [[5]]],
[[[6]], [[7]], [[8]]]]
# strides = [1,1,1,1]
# with this configuration the kernel fits exactly once, and the input region it covers is the whole input
# the computation is
input[0,0,:]*filter[0,0,:,:]+\ # 0*0
input[0,1,:]*filter[0,1,:,:]+\ # 1*1
...+ \
input[2,2,:]*filter[2,2,:,:] # 8*8
= 204
# the result is
output = [[[[204.]]]]
output.shape = [1,1,1,1]
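The 204 above can be cross-checked with np.tensordot, which contracts the 3x3x1 input against the kernel over all three leading axes (this is just a verification, not the implementation itself):

```python
import numpy as np

inp = np.arange(9).reshape(3, 3, 1)     # the 3x3x1 input above
ker = np.arange(9).reshape(3, 3, 1, 1)  # the 3x3x1x1 filter above
# contract over (height, width, in_channels); what remains is out_channels
result = np.tensordot(inp, ker, axes=3)
print(result)  # [204]
```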
# now a more complex case
# input.shape = [1,5,5,3]
input = \
[[[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9,10,11], [12,13,14]],
[[15,16,17], [18,19,20], [21,22,23], [24,25,26], [27,28,29]],
[[30,31,32], [33,34,35], [36,37,38], [39,40,41], [42,43,44]],
[[45,46,47], [48,49,50], [51,52,53], [54,55,56], [57,58,59]],
[[60,61,62], [63,64,65], [66,67,68], [69,70,71], [72,73,74]]]
# filter.shape = [3,3,3,1]
filter = \
[[[[ 0], [ 1], [ 2]],
[[ 3], [ 4], [ 5]],
[[ 6], [ 7], [ 8]]],
[[[ 9], [10], [11]],
[[12], [13], [14]],
[[15], [16], [17]]],
[[[18], [19], [20]],
[[21], [22], [23]],
[[24], [25], [26]]]]
# strides = [1,2,2,1]
# padding = 'VALID'
# with this configuration the kernel fits exactly 4 times; the input regions it covers are
input_filter0 = \
[[[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8]],
[[15,16,17], [18,19,20], [21,22,23]],
[[30,31,32], [33,34,35], [36,37,38]]]
input_filter1 = \
[[[ 6, 7, 8], [ 9,10,11], [12,13,14]],
[[21,22,23], [24,25,26], [27,28,29]],
[[36,37,38], [39,40,41], [42,43,44]]]
input_filter2 = \
[[[30,31,32], [33,34,35], [36,37,38]],
[[45,46,47], [48,49,50], [51,52,53]],
[[60,61,62], [63,64,65], [66,67,68]]]
input_filter3 = \
[[[36,37,38], [39,40,41], [42,43,44]],
[[51,52,53], [54,55,56], [57,58,59]],
[[66,67,68], [69,70,71], [72,73,74]]]
# each of these four patches is then multiplied with filter and summed
# taking input_filter0 as an example
input_filter0[0,0,:]*filter[0,0,:,:]+\ # 0*0+1*1+2*2
input_filter0[0,1,:]*filter[0,1,:,:]+\ # 3*3+4*4+5*5
...+ \
input_filter0[2,2,:]*filter[2,2,:,:] # 36*24+37*25+38*26
= 9279
# doing the same for all four patches gives the overall result
[[[[ 9279.][11385.]]
[[19809.][21915.]]]]
output.shape = [1,2,2,1]
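The full 1x2x2x1 result can likewise be cross-checked in a few lines of NumPy, extracting each stride-2 patch and contracting it against the kernel:

```python
import numpy as np

inp = np.arange(75).reshape(1, 5, 5, 3)  # the 1x5x5x3 input above
ker = np.arange(27).reshape(3, 3, 3, 1)  # the 3x3x3x1 filter above
out = np.zeros((1, 2, 2, 1))
for i in range(2):
    for j in range(2):
        patch = inp[0, 2*i:2*i+3, 2*j:2*j+3, :]  # stride 2 in both directions
        out[0, i, j] = np.tensordot(patch, ker, axes=3)
print(out[0, :, :, 0])
```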
Code:
https://github.com/xo1988/numpy_deeplearning/blob/master/conv2d.py