Written by | title | date |
---|---|---|
zhengchu1994 | 《A guide to convolution arithmetic for deep learning》 | 2018-05-26 15:46:30 |
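Affine transformations
- Definition: a vector is multiplied by a matrix, a bias is added to the result, and the sum is passed through an activation function.
- Drawback: all axes are treated the same way, so the topological information in the input is not exploited.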
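A minimal NumPy sketch of the definition above. The function name `affine_layer`, the ReLU activation and the sample values are illustrative assumptions, not something taken from the guide.

```python
import numpy as np

def affine_layer(x, W, b):
    """Affine transformation followed by an activation (ReLU is an assumed choice)."""
    z = W @ x + b            # matrix-vector product plus bias
    return np.maximum(z, 0)  # element-wise ReLU

# A 2x2 image flattened to 4 values: the layer cannot tell which pixels were neighbours.
x = np.array([1.0, -2.0, 0.5, 3.0])
W = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, -1.0, 0.0]])
b = np.array([0.5, 0.0])
y = affine_layer(x, W, b)
print(y)
```

Because the input is flattened before the product, permuting the pixels and the columns of W in the same way gives the same output, which is one way to see that spatial structure plays no role.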
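Discrete convolutions
- Definition: a linear transformation, like the affine case, but sparse (only a few input units contribute to each output unit) and with parameter reuse (the same weights are applied at many locations of the input).
- Components:
- input feature maps AND output feature maps:
As in the figure, the light blue grid is the input feature map and the shaded area is the kernel, which slides over the input feature map. At each position, the overlapping elements are multiplied pairwise and the products are summed. The green grid collects these sums and is the output feature map.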
[Figure: definition]
Stacked input feature maps
- When several input feature maps are stacked, they are convolved with distinct kernels to produce multiple output feature maps.
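As in the figure, the convolution uses $(n = 3, m = 2, k_1 = 3, k_2 = 3)$: there are 3 output feature maps, 2 input feature maps, and the kernels are $3 \times 3$. On the left, input feature map 1 is convolved with kernel $w_{1,1}$ and input feature map 2 with kernel $w_{1,2}$, and the two results are summed element-wise to give the left output feature map. The middle output is computed with kernel $w_2$ and the rightmost one with kernel $w_3$, which gives three stacked output feature maps. The number of output feature maps therefore equals the number of kernels, where each kernel $w_j$ holds one $3 \times 3$ slice per input feature map.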
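The computation described above can be sketched with plain NumPy loops. This is an illustrative implementation under assumptions: unit stride, no padding, a channels-first layout, and the name `conv2d_multi` are all choices made here. As in deep learning libraries, the kernel is not flipped.

```python
import numpy as np

def conv2d_multi(inputs, kernels):
    """Naive multi-channel convolution, unit stride, no padding.

    inputs:  (m, H, W)       m input feature maps
    kernels: (n, m, k1, k2)  one k1 x k2 slice per (output map, input map) pair
    returns: (n, H - k1 + 1, W - k2 + 1)
    """
    m, H, W = inputs.shape
    n, m_k, k1, k2 = kernels.shape
    assert m == m_k, "kernel depth must match the number of input feature maps"
    out = np.zeros((n, H - k1 + 1, W - k2 + 1))
    for o in range(n):                      # one output feature map per kernel
        for c in range(m):                  # sum the contributions of every input map
            for r in range(H - k1 + 1):
                for s in range(W - k2 + 1):
                    window = inputs[c, r:r + k1, s:s + k2]
                    out[o, r, s] += np.sum(window * kernels[o, c])
    return out

rng = np.random.default_rng(0)
inputs = rng.standard_normal((2, 5, 5))      # m = 2 input feature maps
kernels = rng.standard_normal((3, 2, 3, 3))  # n = 3 kernels, each with 2 slices of 3x3
outputs = conv2d_multi(inputs, kernels)
print(outputs.shape)
```

The inner accumulation over `c` is the element-wise sum of the per-input-map results mentioned in the text.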
strides
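- Explanation: a stride is a form of subsampling of the input. Besides measuring how far the kernel moves at each step, it can be read as how much of the unit-stride output is retained.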
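The subsampling reading can be checked numerically. The sketch below assumes a single feature map, no padding, and a helper name `conv2d_single` invented for this example; it compares a stride-2 convolution with every second element of the stride-1 output.

```python
import numpy as np

def conv2d_single(x, k, stride=1):
    """Single-channel convolution without padding (kernel not flipped)."""
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            top, left = r * stride, c * stride
            out[r, c] = np.sum(x[top:top + kh, left:left + kw] * k)
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((3, 3))
dense = conv2d_single(x, k, stride=1)    # 5x5 output
strided = conv2d_single(x, k, stride=2)  # 3x3 output

# The stride-2 result keeps every second row and column of the stride-1 result.
print(np.array_equal(strided, dense[::2, ::2]))
```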
zero padding
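- Zero padding is used to obtain an output of the desired size.
Output size formulas:
1. The output size $o$ after a convolution is:
$$o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1$$
where $i$ is the size of the input feature map, $p$ is the padding size, $k$ is the kernel size, and $s$ is the stride.
2. The output size after pooling, where there is no padding and so $p = 0$, is:
$$o = \left\lfloor \frac{i - k}{s} \right\rfloor + 1$$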
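The two formulas translate directly into integer arithmetic. The function names below are hypothetical; the symbols `i`, `k`, `s` and `p` follow the definitions above.

```python
def conv_output_size(i, k, s=1, p=0):
    """Output size along one axis: floor((i + 2p - k) / s) + 1."""
    return (i + 2 * p - k) // s + 1

def pool_output_size(i, k, s):
    """Pooling uses the same relationship with p = 0."""
    return conv_output_size(i, k, s, p=0)

print(conv_output_size(4, 3))            # 4x4 input, 3x3 kernel, unit stride, no padding
print(conv_output_size(5, 3, s=2, p=1))  # 5x5 input, 3x3 kernel, stride 2, padding 1
print(pool_output_size(28, 2, 2))        # 28x28 input, 2x2 window, stride 2
```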
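TensorFlow convolution operation
conv2d
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu, data_format, name=None)
- input: the default layout is [batch, in_height, in_width, in_channels], i.e. NHWC. The other supported layout is NCHW; it is selected through data_format.
- filter: a kernel (also called a filter), with shape [filter_height, filter_width, in_channels, out_channels].
- strides: the step with which the kernel slides along each dimension of input.
- padding: either SAME or VALID.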
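As a rough guide to the two padding modes, TensorFlow's documented shape rules are ceil(i / s) for SAME and ceil((i - k + 1) / s) for VALID along each spatial axis. The helper below is a hypothetical sketch of those rules in plain Python, not a TensorFlow function.

```python
import math

def tf_output_size(i, k, s, padding):
    """Spatial output size for TensorFlow's SAME and VALID padding modes."""
    if padding == "SAME":
        # Zeros are added as needed so that only the stride shrinks the output.
        return math.ceil(i / s)
    if padding == "VALID":
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((i - k + 1) / s)
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(tf_output_size(28, 5, 1, "SAME"))   # 5x5 convolution, stride 1
print(tf_output_size(28, 5, 1, "VALID"))
print(tf_output_size(28, 2, 2, "SAME"))   # 2x2 pooling, stride 2
```

With SAME padding and stride 1 the output keeps the input size, which is why the Sobel example in this post can write an image with the original dimensions.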
Code
import tensorflow as tf

# Build the filename queue and read the JPEG file contents
filename_queue = tf.train.string_input_producer(tf.train.match_filenames_once("data/Akatsuki.(NARUTO).full.488229.jpg"))
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
# The input file is a JPEG, so decode it with decode_jpeg (shape [height, width, 3])
image = tf.image.decode_jpeg(value, channels=3)

# Sobel kernel, shape [filter_height, filter_width, in_channels, out_channels]
kernel = tf.constant(
    [
        [[[-1.]], [[-2.]], [[-1.]]],
        [[[0.]], [[0.]], [[0.]]],
        [[[1.]], [[2.]], [[1.]]]
    ]
)

# Coordinator for the queue-runner threads
coord = tf.train.Coordinator()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    threads = tf.train.start_queue_runners(sess, coord=coord)
    # Read the first image, add a batch dimension and convert to grayscale: [1, height, width, 1]
    image_tensor = tf.image.rgb_to_grayscale(tf.expand_dims(sess.run(image), 0))
    # Apply the convolution; SAME padding preserves the image size
    image_convoluted_tensor = tf.nn.conv2d(tf.cast(image_tensor, tf.float32), kernel, [1, 1, 1, 1], "SAME")
    # Rescale to 0..255 before casting to uint8, because the convolution changes the value range
    scaled = image_convoluted_tensor / tf.reduce_max(image_convoluted_tensor) * 255.
    # encode_png expects [height, width, channels], so drop the batch dimension
    out = tf.image.encode_png(tf.cast(scaled, tf.uint8)[0])
    with open("Sobel.png", "wb") as f:
        f.write(out.eval())
    print("write done!")
    coord.request_stop()
    # coord.join blocks until all the threads have terminated
    coord.join(threads)
Result:
Input image:
Output image:
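Pooling
- Purpose: slide a window over the feature maps and reduce the content of each window with a pooling function. The function summarizes subregions, giving a compressed representation of the important information.
- Drawback: the model loses positional information.
As in the figure, max pooling summarizes each subregion by its maximum.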
max_pool
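tf.nn.max_pool(value, ksize, strides, padding, data_format, name)
- value: data with shape [batch, height, width, channels].
- ksize: a list of ints giving the window size along each dimension of value.
- strides: the step with which the window slides along each dimension of value.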
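To make the window and stride parameters concrete, here is a small NumPy sketch of max pooling on one feature map. The name `max_pool_2d` and the sample values are assumptions for illustration; the function is not part of TensorFlow.

```python
import numpy as np

def max_pool_2d(x, k=2, s=2):
    """Max pooling over k x k windows with stride s, no padding."""
    oh = (x.shape[0] - k) // s + 1
    ow = (x.shape[1] - k) // s + 1
    out = np.empty((oh, ow), dtype=x.dtype)
    for r in range(oh):
        for c in range(ow):
            # Keep only the largest value of each window
            out[r, c] = x[r * s:r * s + k, c * s:c * s + k].max()
    return out

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [3, 6, 2, 8]])
pooled = max_pool_2d(x)
print(pooled)
# [[4 5]
#  [6 8]]
```

The output records that a 4 occurred somewhere in the top-left window but not where, which is the loss of positional information noted above.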
TensorFlow code
import tensorflow as tf

# Build the filename queue and read the JPEG file contents
filename_queue = tf.train.string_input_producer(tf.train.match_filenames_once("data/Akatsuki.(NARUTO).full.488229.jpg"))
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
# The input file is a JPEG, so decode it with decode_jpeg (shape [height, width, 3])
image = tf.image.decode_jpeg(value, channels=3)

# Coordinator for the queue-runner threads
coord = tf.train.Coordinator()

def normalize_and_encode(img_tensor):
    # Drop the batch dimension and encode the [height, width, 1] image as JPEG
    image_dimensions = tf.shape(img_tensor.eval()[0]).eval()
    return tf.image.encode_jpeg(tf.reshape(tf.cast(img_tensor, tf.uint8), image_dimensions))

with tf.Session() as sess:
    maxfile = open("maxpool.jpeg", "wb+")
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    threads = tf.train.start_queue_runners(coord=coord)
    # Add a batch dimension and convert to grayscale: [1, height, width, 1]
    image_tensor = tf.image.rgb_to_grayscale(tf.expand_dims(sess.run(image), 0))
    # 2x2 windows with stride 2 halve the height and the width
    maxed_tensor = tf.nn.max_pool(tf.cast(image_tensor, tf.float32), [1, 2, 2, 1], [1, 2, 2, 1], "SAME")
    # Average pooling over the same windows, kept for comparison (not written to disk)
    averaged_tensor = tf.nn.avg_pool(tf.cast(image_tensor, tf.float32), [1, 2, 2, 1], [1, 2, 2, 1], "SAME")
    maxfile.write(normalize_and_encode(maxed_tensor).eval())
    coord.request_stop()
    maxfile.close()
    coord.join(threads)
Output image:
Putting it together on MNIST
import tensorflow as tf
%matplotlib inline
import matplotlib.pyplot as plt
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
#Show the first training image
plt.imshow(mnist.train.images[0].reshape((28, 28), order='C'), cmap='Greys', interpolation='nearest')
# Parameters
batch_size = 128
learning_rate = 0.05
number_iterations = 2000
steps = 10
# Network Parameters
n_input = 784 # 28x28 images
n_classes = 10 # 10 digit classes
dropout = 0.80 # Keep probability fed to tf.nn.dropout (not the drop probability)
# tf Graph input
X = tf.placeholder(tf.float32, [None, n_input])
Y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)
# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def subsampling(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')
# Create model
def conv_net(x_in, weights, biases, dropout):
    # Reshape input picture
    x_in = tf.reshape(x_in, shape=[-1, 28, 28, 1])
    # Convolution Layer 1
    conv_layer_1 = conv2d(x_in, weights['wc1'], biases['bc1'])
    # Subsampling
    conv_layer_1 = subsampling(conv_layer_1, k=2)
    # Convolution Layer 2
    conv_layer_2 = conv2d(conv_layer_1, weights['wc2'], biases['bc2'])
    # Subsampling
    conv_layer_2 = subsampling(conv_layer_2, k=2)
    # Fully connected layer
    # Reshape conv_layer_2 output to fit fully connected layer input,
    # so that it can be multiplied by the fully connected weights
    fully_connected_layer = tf.reshape(conv_layer_2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fully_connected_layer = tf.add(tf.matmul(fully_connected_layer, weights['wd1']), biases['bd1'])
    fully_connected_layer = tf.nn.relu(fully_connected_layer)
    # Apply Dropout
    fully_connected_layer = tf.nn.dropout(fully_connected_layer, dropout)
    # Output, class prediction
    prediction_output = tf.add(tf.matmul(fully_connected_layer, weights['out']), biases['out'])
    return prediction_output
# Store layers weight & bias
weights = {
    # 5x5 convolutional units, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 convolutional units, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # fully connected, 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # 1024 inputs, 10 outputs (class prediction)
    'out': tf.Variable(tf.random_normal([1024, n_classes]))
}
biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
# Construct model
pred = conv_net(X, weights, biases, keep_prob)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y,logits=pred))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < number_iterations:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        test = batch_x[0]
        fig = plt.figure()
        plt.imshow(test.reshape((28, 28), order='C'), cmap='Greys',
                   interpolation='nearest')
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={X: batch_x, Y: batch_y,
                                       keep_prob: dropout})
        if step % steps == 0:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={X: batch_x,
                                                              Y: batch_y,
                                                              keep_prob: 1.})
            print("Iter " + str(step*batch_size) + ", Minibatch Loss= " +
                  "{:.6f}".format(loss) + ", Training Accuracy= " +
                  "{:.5f}".format(acc))
        step += 1
    # Calculate accuracy for 256 mnist test images
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={X: mnist.test.images[:256],
                                        Y: mnist.test.labels[:256],
                                        keep_prob: 1.}))
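Dropout, as used above
- Purpose: randomly set some elements of the input to 0 (the kept elements are scaled by 1/keep_prob), which has a decorrelating effect.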
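import tensorflow as tf
x = [1.0, 0.5, 0.75, 0.25, 0.2, 0.8, 0.4, 0.6]
# The second argument is keep_prob: each element is kept with probability 0.5 and scaled by 1/0.5
dropout = tf.nn.dropout(x, 0.5)
with tf.Session() as sess:
    print(sess.run(dropout))
Output (the dropped positions vary from run to run):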
[2. 1. 1.5 0. 0.4 1.6 0.8 0. ]
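The doubling of the kept values in the output above comes from the 1/keep_prob scaling. A NumPy sketch of the same idea follows; the name `inverted_dropout` and the explicit random generator are assumptions made for this example, and this is not TensorFlow's implementation.

```python
import numpy as np

def inverted_dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob and scale the kept ones by 1/keep_prob."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, 0.75, 0.25, 0.2, 0.8, 0.4, 0.6])
dropped = inverted_dropout(x, 0.5, rng)
print(dropped)  # every entry is either 0 or twice the corresponding input
```

Scaling the kept elements keeps the expected value of each element unchanged, so no rescaling is needed when dropout is switched off at test time (keep_prob set to 1).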
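Transposed convolution
- A convolution can be written as a sparse matrix $C$. For example, if the input feature map is $4 \times 4$, it is flattened into a 16-dimensional vector, $C$ is a $4 \times 16$ matrix, and the output feature map is a 4-dimensional vector that is then reshaped to $2 \times 2$. In backpropagation the error is handled analogously, except that it is multiplied by the transpose $C^T$.
- A transposed convolution swaps the two passes: the input feature map is multiplied by $C^T$ in the forward pass, and the error is multiplied by $C$ (that is, $(C^T)^T$) in the backward pass.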
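The matrix view can be made concrete in NumPy. The sketch below assumes a $3 \times 3$ kernel with unit stride and no padding (the setting in which a $4 \times 4$ input yields a $2 \times 2$ output); `conv_matrix` and the sample values are hypothetical.

```python
import numpy as np

def conv_matrix(kernel, in_size):
    """Build the sparse matrix C of a unit-stride, unpadded convolution on a square input."""
    k = kernel.shape[0]
    out_size = in_size - k + 1
    C = np.zeros((out_size * out_size, in_size * in_size))
    for r in range(out_size):
        for c in range(out_size):
            row = r * out_size + c  # index of the output unit
            for i in range(k):
                for j in range(k):
                    # Column index of the input unit touched by kernel[i, j]
                    C[row, (r + i) * in_size + (c + j)] = kernel[i, j]
    return C

kernel = np.arange(1.0, 10.0).reshape(3, 3)
x = np.arange(16.0).reshape(4, 4)

C = conv_matrix(kernel, 4)            # shape (4, 16)
y = (C @ x.ravel()).reshape(2, 2)     # convolution: 16 values -> 4 values
up = (C.T @ y.ravel()).reshape(4, 4)  # transposed convolution: 4 values -> 16 values
print(C.shape, y.shape, up.shape)
```

Multiplying by `C.T` restores the $4 \times 4$ shape but not the original values: the transposed convolution recovers the connectivity pattern, not the inverse of the convolution.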
Code: to be continued.