Padding in CNNs

Latest recommended article published 2025-03-19 11:06:16

rain6789

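In a CNN, convolution and pooling are very common operations. They are generally credited with reducing the spatial dimensions of the input image and with providing some degree of translation invariance and, to a lesser extent, rotation invariance.
But how exactly does the size of the image (or feature map) change as these operations are applied?
This article describes how tensor sizes change in TensorFlow under the two padding modes, padding = 'SAME' and padding = 'VALID'.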

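In TensorFlow, the input is usually a 4-D tensor with shape [batch_size, in_height, in_width, channels] (the default NHWC layout). The convolution kernel is also a 4-D tensor, with shape [filter_height, filter_width, input_channels, output_channels], and the stride is usually given as strides = [1, stride, stride, 1], so that strides[1] applies along the height and strides[2] along the width.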

With padding = SAME, the input is zero-padded where necessary. Given the input size and the stride, the output size is: $\text{output\_size} = \left\lceil \frac{\text{input\_size}}{\text{strides}} \right\rceil$

out_height = ceil(float(in_height) / float(strides[1]))
out_width  = ceil(float(in_width) / float(strides[2]))
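As a quick check of the SAME output-size formula, here is a minimal pure-Python sketch for a single spatial dimension. The function name `same_output_size` is hypothetical and is not a TensorFlow API; it only mirrors the arithmetic shown above.

```python
import math


def same_output_size(input_size: int, stride: int) -> int:
    """Output length along one spatial dimension under SAME padding.

    Note that the kernel size does not appear: with SAME padding the
    output size depends only on the input size and the stride.
    """
    return math.ceil(input_size / stride)


print(same_output_size(28, 1))  # 28
print(same_output_size(28, 2))  # 14
print(same_output_size(7, 2))   # 4
```

With a stride of 1 the output keeps the input size, which is where the name SAME comes from.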

The total number of zeros to pad along that dimension is then: $\text{padding\_num} = \begin{cases} \max(\text{kernel\_size} - \text{strides},\ 0) & \text{if } \text{input\_size} \bmod \text{strides} = 0 \\ \max(\text{kernel\_size} - (\text{input\_size} \bmod \text{strides}),\ 0) & \text{if } \text{input\_size} \bmod \text{strides} \neq 0 \end{cases}$

if (in_height % strides[1] == 0):
  pad_along_height = max(filter_height - strides[1], 0)
else:
  pad_along_height = max(filter_height - (in_height % strides[1]), 0)
if (in_width % strides[2] == 0):
  pad_along_width = max(filter_width - strides[2], 0)
else:
  pad_along_width = max(filter_width - (in_width % strides[2]), 0)

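Once the total amount of padding is determined, the number of zeros added on the left/top is $p_{\text{left\_or\_top}} = \lfloor \text{padding\_num} / 2 \rfloor$, and the number added on the right/bottom is $p_{\text{right\_or\_bottom}} = \text{padding\_num} - p_{\text{left\_or\_top}}$. When the total is odd, the extra zero therefore goes on the right/bottom.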

pad_top = pad_along_height // 2
pad_bottom = pad_along_height - pad_top
pad_left = pad_along_width // 2
pad_right = pad_along_width - pad_left
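The padding rules above can be combined into one small helper for a single spatial dimension. This is a sketch under the formulas stated in this article; the name `same_padding` is hypothetical and not part of TensorFlow.

```python
from typing import Tuple


def same_padding(input_size: int, kernel_size: int, stride: int) -> Tuple[int, int]:
    """Return (pad_before, pad_after) for one dimension under SAME padding.

    pad_before is the top/left padding, pad_after the bottom/right padding.
    """
    remainder = input_size % stride
    if remainder == 0:
        total = max(kernel_size - stride, 0)
    else:
        total = max(kernel_size - remainder, 0)
    pad_before = total // 2            # floor division: the smaller half
    pad_after = total - pad_before     # any odd leftover goes after
    return pad_before, pad_after


print(same_padding(28, 5, 1))  # (2, 2)
print(same_padding(28, 2, 2))  # (0, 0)
print(same_padding(28, 4, 1))  # (1, 2)
```

The last call shows the asymmetric case: a total of 3 zeros is split into 1 before and 2 after.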

VALID mode is simpler, because no padding is added. Given the input size, kernel size, and stride, the output size is: $\text{output\_size} = \left\lceil \frac{\text{input\_size} - \text{kernel\_size} + 1}{\text{strides}} \right\rceil$

out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))
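The VALID formula can be checked the same way. The sketch below uses a hypothetical helper name, `valid_output_size`, and assumes the kernel is no larger than the input.

```python
import math


def valid_output_size(input_size: int, kernel_size: int, stride: int) -> int:
    """Output length along one spatial dimension under VALID (no) padding.

    Assumes kernel_size <= input_size.
    """
    return math.ceil((input_size - kernel_size + 1) / stride)


print(valid_output_size(28, 5, 1))  # 24
print(valid_output_size(24, 2, 2))  # 12
print(valid_output_size(8, 2, 2))   # 4
```

For integer arguments this is the same value as the often-quoted form `(input_size - kernel_size) // stride + 1`.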

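For example, suppose the input is a 28×28 single-channel image, so the input shape is [batch_size, 28, 28, 1].

The first convolution layer has 32 kernels of size 5×5, with filter shape [5, 5, 1, 32] and strides [1, 1, 1, 1]. It is followed by the first 2×2 max-pooling layer, with window shape [1, 2, 2, 1] and strides [1, 2, 2, 1].
The second convolution layer has 64 kernels of size 5×5, with filter shape [5, 5, 32, 64] and strides [1, 1, 1, 1]. It is followed by the second 2×2 max-pooling layer, with window shape [1, 2, 2, 1] and strides [1, 2, 2, 1].
All layers use SAME padding.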

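After these two convolutions and two pooling steps, the tensor shape changes as follows:
[batch_size, 28, 28, 1]
↓ (first convolution)
[batch_size, 28, 28, 32]
↓ (first pooling)
[batch_size, 14, 14, 32]
↓ (second convolution)
[batch_size, 14, 14, 64]
↓ (second pooling)
[batch_size, 7, 7, 64]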

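If all kernels, pooling windows, and strides stay the same but every layer uses VALID padding instead, the shape changes as follows:
[batch_size, 28, 28, 1]
↓ (first convolution)
[batch_size, 24, 24, 32]
↓ (first pooling)
[batch_size, 12, 12, 32]
↓ (second convolution)
[batch_size, 8, 8, 64]
↓ (second pooling)
[batch_size, 4, 4, 64]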
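Both shape traces can be reproduced without TensorFlow by applying the two output-size formulas layer by layer. The sketch below is only an arithmetic check: `output_size`, `trace_shapes`, and the `LAYERS` table are hypothetical names introduced here, and the batch dimension is omitted because it never changes.

```python
import math


def output_size(input_size: int, kernel_size: int, stride: int, padding: str) -> int:
    """Output length along one spatial dimension for SAME or VALID padding."""
    if padding == "SAME":
        return math.ceil(input_size / stride)
    if padding == "VALID":
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# (name, kernel size, stride, output channels; None means pooling keeps channels)
LAYERS = [
    ("conv1", 5, 1, 32),
    ("pool1", 2, 2, None),
    ("conv2", 5, 1, 64),
    ("pool2", 2, 2, None),
]


def trace_shapes(padding: str, size: int = 28, channels: int = 1):
    """Return the (height, width, channels) shape after each layer."""
    shapes = [(size, size, channels)]
    for _name, kernel, stride, out_channels in LAYERS:
        size = output_size(size, kernel, stride, padding)
        if out_channels is not None:
            channels = out_channels
        shapes.append((size, size, channels))
    return shapes


print(trace_shapes("SAME"))
# [(28, 28, 1), (28, 28, 32), (14, 14, 32), (14, 14, 64), (7, 7, 64)]
print(trace_shapes("VALID"))
# [(28, 28, 1), (24, 24, 32), (12, 12, 32), (8, 8, 64), (4, 4, 64)]
```

The final feature map is 7×7 under SAME and 4×4 under VALID, which matters when sizing a fully connected layer placed after the second pooling step.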