Implementing Convolution, Deconvolution, and Dilated Convolution in TensorFlow
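TensorFlow already provides convolution (tf.nn.conv2d), deconvolution, i.e. transposed convolution (tf.nn.conv2d_transpose), and atrous (dilated) convolution (tf.nn.atrous_conv2d). Explanations of the parameters of these three functions are easy to find online. The harder part is working out the output dimensions, so this post provides wrappers for convolution, deconvolution, and dilated convolution that are convenient to call: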
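1. Convolution
- Input image size: W×W
- Filter size: F×F
- Stride: S
- Padding (pixels added on each side): P
From these we get
N = ⌊(W − F + 2P)/S⌋ + 1
where ⌊·⌋ rounds down to the nearest integer. The output image size is N×N. For how the convolution dimensions are computed, see: https://blog.csdn.net/qq_21997625/article/details/87252780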
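As a quick check of the formula above, here is a minimal sketch in plain Python. The helper name conv_output_size is hypothetical and is not part of TensorFlow; it only evaluates N = ⌊(W − F + 2P)/S⌋ + 1 for explicit padding P.

```python
def conv_output_size(w, f, s, p):
    """Output side length N = floor((W - F + 2P) / S) + 1.

    Hypothetical helper for illustration; w, f, s, p follow the
    symbols W, F, S, P used in the text.
    """
    if s <= 0:
        raise ValueError("stride must be positive")
    if w + 2 * p < f:
        raise ValueError("filter is larger than the padded input")
    return (w - f + 2 * p) // s + 1


# 224x224 input, 7x7 filter, stride 2, 3 pixels of padding per side
print(conv_output_size(224, 7, 2, 3))  # 112
# 32x32 input, 5x5 filter, stride 1, no padding
print(conv_output_size(32, 5, 1, 0))   # 28
```

Integer division (//) implements the floor, so positions where the filter would run past the padded border are simply dropped.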
You can use slim.conv2d from TensorFlow's high-level API:
net = slim.conv2d(inputs=inputs,
                  num_outputs=num_outputs,
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=reg,
                  kernel_size=[kernel, kernel],
                  activation_fn=activation_fn,
                  stride=stride,
                  padding=padding,
                  trainable=True,
                  scope=scope)
In some special cases you can pad the feature map yourself:
def slim_conv2d(inputs, num_outputs, stride, padding, kernel, activation_fn, reg, scope):
    if padding == "VALID":
        # Pad by half the kernel size with reflected border pixels before the VALID convolution
        padding_size = kernel // 2
        inputs = tf.pad(inputs,
                        paddings=[[0, 0], [padding_size, padding_size], [padding_size, padding_size], [0, 0]],
                        mode='REFLECT')
        print("pad.inputs.shape:{}".format(inputs.get_shape()))
    net = slim.conv2d(inputs=inputs,
                      num_outputs=num_outputs,
                      weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                      weights_regularizer=reg,
                      kernel_size=[kernel, kernel],
                      activation_fn=activation_fn,
                      stride=stride,
                      padding=padding,
                      trainable=True,
                      scope=scope)
    print("net.{}.shape:{}".format(scope, net.get_shape()))
    return net
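To see what the manual padding does to the output size, the sketch below compares "pad by kernel // 2, then VALID" against TensorFlow's SAME rule, whose output length is ceil(W / S). The helper names are hypothetical and only do the size arithmetic; no TensorFlow is needed to run it.

```python
import math


def valid_output_size(w, f, s):
    # VALID convolution: no padding, floor((W - F) / S) + 1
    return (w - f) // s + 1


def same_output_size(w, s):
    # SAME convolution in TensorFlow: ceil(W / S)
    return math.ceil(w / s)


def padded_valid_output_size(w, f, s):
    # Pad kernel // 2 pixels on each side, then apply a VALID convolution
    pad = f // 2
    return valid_output_size(w + 2 * pad, f, s)


for w, f, s in [(100, 3, 1), (100, 3, 2), (101, 5, 2), (100, 4, 2)]:
    print(w, f, s, padded_valid_output_size(w, f, s), same_output_size(w, s))
```

For odd kernel sizes the two columns agree, so the reflect-padded VALID convolution keeps the SAME output size while replacing zero padding with reflected pixels. For an even kernel (the last case) the padded version comes out one pixel larger.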
Below is a convolution layer wrapped by hand on top of TensorFlow, with functionality similar to the built-in high-level slim.conv2d API:
def conv2D_layer(inputs, num_outputs, kernel_size, activation_fn, stride, padding, scope, weights_regularizer):
    '''
    A convolution layer API built with the TensorFlow slim module: convolution plus activation, without pooling
    :param inputs: input Tensor=[batch, in_height, in_width, in_channels]
    :param num_outputs: number of output channels
    :param kernel_size: kernel size, typically [1,1], [3,3], [5,5]
    :param activation_fn: activation function
    :param stride: e.g. [2,2]
    :param padding: SAME or VALID
    :param scope: scope name
    :param weights_regularizer: regularizer, e.g. weights_regularizer = slim.l2_regularizer(scale=0.01)
    :return: output Tensor
    '''
    with tf.variable_scope(name_or_scope=scope):
        in_channels = inputs.get_shape().as_list()[3]
        # kernel=[height, width, in_channels, output_channels]
        kernel = [kernel_size[0], kernel_size[1], in_channels, num_outputs]
        strides = [1, stride[0], stride[1], 1]
        # filter_weight=tf.Variable(initial_value=tf.truncated_normal(shape,stddev=0.1))
        filter_weight = slim.variable(name='weights',
                                      shape=kernel,
                                      initializer=tf.truncated_normal_initializer(stddev=0.1),
                                      regularizer=weights_regularizer)
        bias = tf.Variable(tf.constant(0.01, shape=[num_outputs]))
        inputs = tf.nn.conv2d(inputs, filter_weight, strides, padding=padding) + bias
        if activation_fn is not None:
            inputs = activation_fn(inputs)
        return inputs
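2. Deconvolution
TensorFlow's high-level APIs already wrap deconvolution as tf.layers.conv2d_transpose and slim.conv2d_transpose, which are used in much the same way. If you want to implement deconvolution with tf.nn.conv2d_transpose instead, you have to compute the output dimensions yourself for padding='VALID' and 'SAME'. The function deconv_output_length provided here computes the output dimension automatically from the input dimension, filter_size, padding, and stride.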
# -*-coding: utf-8 -*-
"""
@Project: YNet-python
@File : myTest.py
@Author : panjq
@E-mail : pan_jinquan@163.com
@Date : 2019-01-10 15:51:23
"""
import tensorflow as tf
import tensorflow.contrib.slim as slim
def deconv_output_length(input_length, filter_size, padding, stride):
    """Determines output length of a transposed convolution given input length.
    Arguments:
        input_length: integer.
        filter_size: integer.
        padding: one of 'SAME', 'VALID', or 'FULL'.
        stride: integer.
    Returns:
        The output length (integer).
    """
    if input_length is None:
        return None
    # SAME is the default case
    input_length *= stride
    if padding == 'VALID':
        input_length += max(filter_size - stride, 0)
    elif padding == 'FULL':
        input_length -= (stride + filter_size - 2)
    return input_length
def conv2D_transpose_layer(inputs, num_outputs, kernel_size, activation_fn, stride, padding, scope, weights_regularizer):
    '''
    A deconvolution API: deconvolution plus activation, without pooling
    :param inputs: input Tensor=[batch, in_height, in_width, in_channels]
    :param num_outputs: number of output channels
    :param kernel_size: kernel size, typically [1,1], [3,3], [5,5]
    :param activation_fn: activation function
    :param stride: e.g. [2,2]
    :param padding: SAME or VALID
    :param scope: scope name
    :param weights_regularizer: regularizer, e.g. weights_regularizer = slim.l2_regularizer(scale=0.01)
    :return: output Tensor
    '''
    with tf.variable_scope(name_or_scope=scope):
        # shape = [batch_size, height, width, channel]
        # (the batch size must be statically known, since it is used in output_shape)
        in_shape = inputs.get_shape().as_list()
        # Compute the output dimensions of the deconvolution
        output_height = deconv_output_length(in_shape[1], kernel_size[0], padding=padding, stride=stride[0])
        output_width = deconv_output_length(in_shape[2], kernel_size[1], padding=padding, stride=stride[1])
        output_shape = [in_shape[0], output_height, output_width, num_outputs]
        strides = [1, stride[0], stride[1], 1]
        # kernel=[kernel_size, kernel_size, output_channel, input_channel]
        kernel = [kernel_size[0], kernel_size[1], num_outputs, in_shape[3]]
        filter_weight = slim.variable(name='weights',
                                      shape=kernel,
                                      initializer=tf.truncated_normal_initializer(stddev=0.1),
                                      regularizer=weights_regularizer)
        bias = tf.Variable(tf.constant(0.01, shape=[num_outputs]))
        inputs = tf.nn.conv2d_transpose(value=inputs, filter=filter_weight, output_shape=output_shape,
                                        strides=strides, padding=padding) + bias
        if activation_fn is not None:
            inputs = activation_fn(inputs)
        return inputs
if __name__ == "__main__":
    inputs = tf.ones(shape=[4, 100, 100, 3])
    stride = 2
    kernel_size = 10
    padding = "SAME"
    net1 = tf.layers.conv2d_transpose(inputs=inputs,
                                      filters=32,
                                      kernel_size=kernel_size,
                                      strides=stride,
                                      padding=padding)
    net2 = slim.conv2d_transpose(inputs=inputs,
                                 num_outputs=32,
                                 kernel_size=[kernel_size, kernel_size],
                                 stride=[stride, stride],
                                 padding=padding)
    net3 = conv2D_transpose_layer(inputs=inputs,
                                  num_outputs=32,
                                  kernel_size=[kernel_size, kernel_size],
                                  activation_fn=tf.nn.relu,
                                  stride=[stride, stride],
                                  padding=padding,
                                  scope="conv2D_transpose_layer",
                                  weights_regularizer=None)
    print("net1.shape:{}".format(net1.get_shape()))
    print("net2.shape:{}".format(net2.get_shape()))
    print("net3.shape:{}".format(net3.get_shape()))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
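The output-length rule can be sanity-checked without TensorFlow. The sketch below restates the SAME and VALID branches of deconv_output_length so that it runs on its own, and adds a hypothetical conv_output_length helper for the forward convolution. It then checks that a forward convolution with the same filter size, stride, and padding maps the deconvolution output length back to the original input length, using the 100-pixel input, kernel size 10, and stride 2 from the script above.

```python
import math


def deconv_output_length(input_length, filter_size, padding, stride):
    # Same rule as in the post (SAME and VALID only), restated to be self-contained
    length = input_length * stride
    if padding == "VALID":
        length += max(filter_size - stride, 0)
    return length


def conv_output_length(input_length, filter_size, padding, stride):
    # Hypothetical helper: output length of the forward convolution
    if padding == "SAME":
        return math.ceil(input_length / stride)
    return (input_length - filter_size) // stride + 1  # VALID


for padding in ("SAME", "VALID"):
    up = deconv_output_length(100, 10, padding, 2)
    back = conv_output_length(up, 10, padding, 2)
    print(padding, up, back)
```

With SAME padding the length goes 100 to 200 and back to 100; with VALID padding it goes 100 to 208 and back to 100. This is the consistency that tf.nn.conv2d_transpose expects between output_shape and the input shape.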
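3. Dilated convolution: enlarging the receptive field
Dilated/atrous convolution (also called dilated convolutions) introduces a new parameter to the convolution layer called the "dilation rate", which defines the spacing between the values the kernel processes. The goal is to provide a larger receptive field at comparable computational cost without using pooling (pooling layers lose information).
An important parameter in atrous convolution is rate, which sets the size of the holes. There are two ways to understand what the holes are and how they work.
1) From the input image's point of view, the holes amount to sampling the input image. The sampling frequency is set by the rate parameter. When rate is 1, the input is sampled without losing any information, and the operation is a standard convolution. When rate > 1, for example 2, the input is sampled with (rate-1) pixels skipped between samples, as in figure (b), where the red dots can be pictured as the sampling points on the input. The sampled image is then convolved with the kernel, which in effect enlarges the receptive field.
2) From the kernel's point of view, the holes enlarge the kernel: rate-1 zeros are inserted between adjacent kernel entries, and the enlarged kernel is then convolved with the input image, which again enlarges the receptive field.
3) A standard convolutional network can enlarge the receptive field by downsampling with pooling, which shrinks the image while enlarging the receptive field, but pooling itself is not learnable and discards a lot of detail. Dilated convolution reaches a large receptive field and sees more context without pooling.
4) Increasing the kernel size also enlarges the receptive field, but this increases the computational cost.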
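The kernel view in point 2) can be sketched in a few lines of plain Python. The helpers effective_kernel_size and dilate_kernel are hypothetical names used for illustration only: inserting rate-1 zeros between adjacent entries turns a k×k kernel into one of side k + (k − 1)(rate − 1).

```python
def effective_kernel_size(kernel_size, rate):
    # Side length after inserting rate-1 zeros between adjacent kernel entries
    return kernel_size + (kernel_size - 1) * (rate - 1)


def dilate_kernel(kernel, rate):
    """Insert rate-1 zeros between neighbouring entries of a 2D kernel (list of lists)."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = effective_kernel_size(k_h, rate)
    out_w = effective_kernel_size(k_w, rate)
    dilated = [[0] * out_w for _ in range(out_h)]
    for i in range(k_h):
        for j in range(k_w):
            # Original entry (i, j) lands at (i * rate, j * rate)
            dilated[i * rate][j * rate] = kernel[i][j]
    return dilated


kernel = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
for row in dilate_kernel(kernel, 2):
    print(row)
```

With rate 2 the 3×3 kernel becomes a 5×5 kernel that still has only nine non-zero weights, which is why the receptive field grows while the number of parameters stays the same.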
Figure: standard convolution.
Figure: dilated convolution.
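The VGG network showed that stacking small kernels in place of one large kernel reduces the number of parameters while reaching the same receptive field as the large kernel. However, stacking small kernels only grows the receptive field linearly, as (kernelSize − 1) × layers + 1, whereas dilated convolution can grow the receptive field exponentially.
Reference: https://blog.csdn.net/silence2015/article/details/79748729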
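The linear versus exponential growth can be checked numerically. The sketch below assumes stride-1 layers and, for the dilated case, a rate that doubles at every layer (1, 2, 4, ...); both are assumptions made for this illustration. The helper receptive_field is a hypothetical name: each layer adds (kernelSize − 1) × rate to the receptive field.

```python
def receptive_field(kernel_size, rates):
    """Receptive field of stacked stride-1 conv layers, one dilation rate per layer."""
    rf = 1
    for rate in rates:
        rf += (kernel_size - 1) * rate
    return rf


layers = 4
# Standard 3x3 convolutions (rate 1 everywhere): (3 - 1) * 4 + 1
print(receptive_field(3, [1] * layers))                      # 9
# Dilated 3x3 convolutions with rates 1, 2, 4, 8
print(receptive_field(3, [2 ** i for i in range(layers)]))   # 31
```

With rate 1 in every layer the result matches the formula (kernelSize − 1) × layers + 1, while the doubling rates give 2^(layers+1) − 1 for a 3×3 kernel.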
def dilated_conv2D_layer(inputs, num_outputs, kernel_size, activation_fn, rate, padding, scope, weights_regularizer):
    '''
    A dilated convolution layer API built on TensorFlow: dilated convolution plus activation, without pooling
    :param inputs: input Tensor=[batch, in_height, in_width, in_channels]
    :param num_outputs: number of output channels
    :param kernel_size: kernel size, typically [1,1], [3,3], [5,5]
    :param activation_fn: activation function
    :param rate: dilation rate (rate=1 is a standard convolution)
    :param padding: SAME or VALID
    :param scope: scope name
    :param weights_regularizer: regularizer, e.g. weights_regularizer = slim.l2_regularizer(scale=0.01)
    :return: output Tensor
    '''
    with tf.variable_scope(name_or_scope=scope):
        in_channels = inputs.get_shape().as_list()[3]
        # kernel=[height, width, in_channels, output_channels]
        kernel = [kernel_size[0], kernel_size[1], in_channels, num_outputs]
        # filter_weight=tf.Variable(initial_value=tf.truncated_normal(shape,stddev=0.1))
        filter_weight = slim.variable(name='weights',
                                      shape=kernel,
                                      initializer=tf.truncated_normal_initializer(stddev=0.1),
                                      regularizer=weights_regularizer)
        bias = tf.Variable(tf.constant(0.01, shape=[num_outputs]))
        # inputs = tf.nn.conv2d(inputs,filter_weight, strides, padding=padding) + bias
        inputs = tf.nn.atrous_conv2d(inputs, filter_weight, rate=rate, padding=padding) + bias
        if activation_fn is not None:
            inputs = activation_fn(inputs)
        return inputs