A Simple Look at CNN Structure and Transposed Convolution, with TensorFlow Implementations

NockinOnHeavensDoor

Article link: https://blog.csdn.net/NockinOnHeavensDoor/article/details/80575890

Written by	Title	Date
zhengchu1994	"A guide to convolution arithmetic for deep learning"	2018-05-26 15:46:30

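Affine transformations

Definition: a vector is multiplied by a matrix, a bias is added to the result, and the sum is passed through an activation function.
Drawback: every axis is treated identically, so the topological information in the input (for example, the spatial layout of an image) is not exploited.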
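
The definition above can be sketched in a few lines of NumPy. This is only an illustration: the weights `W`, the bias `b`, and the choice of ReLU as the activation are assumptions made for the example, not values from the article.

```python
import numpy as np

def affine_layer(x, W, b):
    """Affine transformation followed by an activation (ReLU is assumed here)."""
    z = x @ W + b               # vector-matrix product plus bias
    return np.maximum(z, 0.0)   # activation function

# A 2x2 "image" has to be flattened first, which discards its 2-D layout.
image = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
x = image.reshape(-1)           # shape (4,)
W = np.ones((4, 3))             # hypothetical weights: 4 inputs -> 3 outputs
b = np.array([0.0, -5.0, -20.0])
y = affine_layer(x, W, b)
print(y)                        # [10.  5.  0.]
```

Because the input is flattened before the product, shuffling the pixels (and the rows of `W` in the same way) gives the same output, which is one way to see that the spatial structure is not used.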

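Discrete convolutions

Definition: a discrete convolution is a linear transformation that is sparse (each output unit depends on only a few input units) and that reuses parameters (the same weights are applied at many input locations).
Components:
input feature maps AND output feature maps:

In the figure, the light blue grid is the input feature map. The shaded region is the kernel sliding across the input feature map; at each position the overlapping elements are multiplied and the products are summed. The green grid is the result, the output feature map.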
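
The sliding-window computation described above can be written out directly. The following is a minimal NumPy sketch for a single input feature map with no padding and a stride of 1; the function name `conv2d_single` and the example values are made up for illustration. Like most deep learning libraries, it does not flip the kernel.

```python
import numpy as np

def conv2d_single(x, w):
    """Slide kernel w over x (no padding, stride 1); multiply and sum at each position."""
    k1, k2 = w.shape
    out_h = x.shape[0] - k1 + 1
    out_w = x.shape[1] - k2 + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Elementwise product of the kernel and the patch it overlaps, then sum
            out[r, c] = np.sum(x[r:r + k1, c:c + k2] * w)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)   # 4x4 input feature map
w = np.ones((3, 3))                            # 3x3 kernel
y = conv2d_single(x, w)
print(y)   # [[45. 54.]
           #  [81. 90.]]
```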

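Definition

Stacked input feature maps

When several input feature maps are stacked together, a different kernel is applied for each output, producing multiple output feature maps.

In the figure, the kernel collection has $(n=3,m=2,k_1=3,k_2=3)$: 3 output feature maps, 2 input feature maps, and kernels of size $3 \times 3$. On the left, input feature map 1 is convolved with kernel $w_{1,1}$ and input feature map 2 with kernel $w_{1,2}$; the two results are summed elementwise to give the left output feature map. The middle path does the same with kernel $w_{2}$ and the right path with kernel $w_3$, which gives three stacked output feature maps. The number of output feature maps therefore equals the number of kernels.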
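
A small NumPy sketch of the $(n=3,m=2,k_1=3,k_2=3)$ case may make the bookkeeping clearer. The array layout `(n, m, k1, k2)` for the kernels, the $5 \times 5$ input size, and the random values are assumptions chosen for the example.

```python
import numpy as np

def conv2d_multi(x, w):
    """x: (m, H, W) input maps; w: (n, m, k1, k2) kernels -> (n, H-k1+1, W-k2+1) output maps."""
    n, m, k1, k2 = w.shape
    out_h = x.shape[1] - k1 + 1
    out_w = x.shape[2] - k2 + 1
    out = np.zeros((n, out_h, out_w))
    for j in range(n):                 # one kernel per output feature map
        for r in range(out_h):
            for c in range(out_w):
                # Convolve every input map with its slice of kernel j and sum the results
                out[j, r, c] = np.sum(x[:, r:r + k1, c:c + k2] * w[j])
    return out

rng = np.random.RandomState(0)
x = rng.randn(2, 5, 5)        # m = 2 input feature maps of size 5x5
w = rng.randn(3, 2, 3, 3)     # n = 3 kernels, each spanning both input maps
y = conv2d_multi(x, w)
print(y.shape)                # (3, 3, 3): three output feature maps
```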

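Strides

Explanation: the stride is a form of subsampling of the input. It is the distance the kernel moves between positions, so it also determines how much of the output is retained.

Zero padding

Zeros are added around the border of the input so that the output has the desired size.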
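
The effect of both settings can be checked with a short NumPy sketch. It assumes a square input and a square kernel, and the helper name `conv_stride` is made up for the example. The sketch pads a $3 \times 3$ input with one ring of zeros, then shows that a stride of 2 keeps every second position of the stride-1 result.

```python
import numpy as np

def conv_stride(x, w, s):
    """Square convolution with stride s and no extra padding."""
    k = w.shape[0]
    out_size = (x.shape[0] - k) // s + 1
    out = np.zeros((out_size, out_size))
    for r in range(out_size):
        for c in range(out_size):
            out[r, c] = np.sum(x[r * s:r * s + k, c * s:c * s + k] * w)
    return out

x = np.arange(1, 10, dtype=float).reshape(3, 3)

# Zero padding with p = 1 adds a border of zeros: 3x3 -> 5x5
padded = np.pad(x, pad_width=1, mode="constant", constant_values=0.0)

w = np.ones((3, 3))
full = conv_stride(padded, w, 1)      # stride 1: output is 3x3, same as the unpadded input
strided = conv_stride(padded, w, 2)   # stride 2: output is 2x2

print(padded.shape, full.shape, strided.shape)   # (5, 5) (3, 3) (2, 2)
# Stride 2 is the stride-1 output subsampled at every second row and column
print(np.array_equal(strided, full[::2, ::2]))   # True
```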

Output size formulas:

1. The output size $o$ after a convolution is:

$o = \lfloor \frac{i + 2p - k}{s}\rfloor + 1$

where $i$ is the size of the input feature map, $p$ is the padding size, $k$ is the kernel size, and $s$ is the stride.

2. The output size after pooling. Pooling uses no padding, so $p$ is 0 and the output size $o$ is:

$o = \lfloor \frac{i - k}{s}\rfloor + 1$
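
Both formulas translate directly into code. The function names below are made up for this sketch, and it assumes $i + 2p \ge k$ so that the floor division behaves as in the formula.

```python
def conv_output_size(i, k, s=1, p=0):
    """o = floor((i + 2p - k) / s) + 1"""
    return (i + 2 * p - k) // s + 1

def pool_output_size(i, k, s):
    """Pooling uses no padding, so p = 0."""
    return conv_output_size(i, k, s, p=0)

print(conv_output_size(4, 3))             # 2: 4x4 input, 3x3 kernel
print(conv_output_size(5, 3, s=2, p=1))   # 3
print(conv_output_size(28, 5, s=1, p=2))  # 28: the padding preserves the size
print(pool_output_size(28, 2, 2))         # 14: 2x2 pooling with stride 2
```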

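Convolution in TensorFlow

conv2d

tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu, data_format, name=None)

input: the default layout is [batch, in_height, in_width, in_channels], known as NHWC. The alternative is NCHW; the layout can be changed by setting data_format.
filter: the kernel (also called a filter), with shape [filter_height, filter_width, in_channels, out_channels].
strides: the step of the kernel as it slides over input, given as a list of 4 ints, one per dimension of input.
padding: either "SAME" or "VALID".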
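
The two padding modes differ in the output size they produce. As documented for TensorFlow, "SAME" gives ceil(i / s) along each spatial dimension and "VALID" gives ceil((i - k + 1) / s). The helper below is a plain-Python sketch of that rule; its name is made up for the example and it is not part of the TensorFlow API.

```python
import math

def tf_output_size(i, k, s, padding):
    """Spatial output size of a TensorFlow convolution or pooling op along one dimension."""
    if padding == "SAME":
        return math.ceil(i / s)            # zero padding is added as needed
    if padding == "VALID":
        return math.ceil((i - k + 1) / s)  # no padding at all
    raise ValueError("padding must be 'SAME' or 'VALID'")

# A [1, 28, 28, 1] input with a [5, 5, 1, 32] filter and strides [1, 1, 1, 1]:
print(tf_output_size(28, 5, 1, "SAME"))    # 28 -> output shape [1, 28, 28, 32]
print(tf_output_size(28, 5, 1, "VALID"))   # 24 -> output shape [1, 24, 24, 32]
```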

Code

import tensorflow as tf  # TensorFlow 1.x graph-mode API

# Build the filename queue and read the raw contents of the JPEG file
filename_queue = tf.train.string_input_producer(tf.train.match_filenames_once("data/Akatsuki.(NARUTO).full.488229.jpg"))
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
# The file is a JPEG, so decode it with decode_jpeg; the result has shape [height, width, 3]
image = tf.image.decode_jpeg(value, channels=3)

# Define the kernel parameters: a 3x3 Sobel kernel with shape [3, 3, 1, 1]
kernel = tf.constant(
        [
         [[[-1.]],[[-2.]],[[-1.]]],
         [[[0.]],[[0.]],[[0.]]],
         [[[1.]],[[2.]],[[1.]]]
         ]
    )

# Define the coordinator for the queue-runner threads
coord = tf.train.Coordinator()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    threads = tf.train.start_queue_runners(sess, coord=coord)
    # Get the first image, convert it to grayscale, and add a batch dimension: [1, height, width, 1]
    image_tensor = tf.expand_dims(tf.image.rgb_to_grayscale(sess.run(image)), 0)
    # Apply the convolution; SAME padding preserves the image size
    convolved = tf.nn.conv2d(tf.cast(image_tensor, tf.float32), kernel, [1, 1, 1, 1], "SAME")
    # The response can be negative, so take the absolute value before rescaling to 0..255 and casting to uint8
    magnitude = tf.abs(convolved)
    out = tf.image.encode_png(tf.cast(magnitude[0] / tf.reduce_max(magnitude) * 255., tf.uint8))
    # Save the result of the convolution
    with open("Sobel.png", "wb") as f:
        f.write(sess.run(out))
    print("write done!")
    coord.request_stop()
# coord.join blocks until all of the queue-runner threads have terminated
coord.join(threads)

Result:

Input image:
Output image:

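Pooling

Purpose: a window slides over the feature maps, and a pooling function reduces the contents of each window to a single value. In other words, a function is used to summarize subregions, which gives a compressed representation of the important information.
Drawback: the model loses precise positional information.

The figure shows max pooling extracting a feature from each subregion.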
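
Max pooling over subregions can be sketched in NumPy as follows. The function name `max_pool2d` and the input values are made up for the example; the window is $2 \times 2$ with a stride of 2 and no padding.

```python
import numpy as np

def max_pool2d(x, k=2, s=2):
    """Summarize each k x k window of x (stride s, no padding) by its maximum."""
    out_h = (x.shape[0] - k) // s + 1
    out_w = (x.shape[1] - k) // s + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.max(x[r * s:r * s + k, c * s:c * s + k])
    return out

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 5.],
              [0., 1., 9., 2.],
              [6., 2., 3., 4.]])
pooled = max_pool2d(x)
print(pooled)   # [[4. 5.]
                #  [6. 9.]]
```

The output records only the largest value in each window, not where in the window it occurred, which is the loss of positional information mentioned above.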

max_pool

tf.nn.max_pool(value, ksize, strides, padding, data_format, name)

value: data with shape [batch, height, width, channels].
ksize: a list of ints giving the window size for each dimension.
strides: the step of the window as it slides over the input.

TensorFlow code

import tensorflow as tf  # TensorFlow 1.x graph-mode API

# Build the filename queue and read the raw contents of the JPEG file
filename_queue = tf.train.string_input_producer(tf.train.match_filenames_once("data/Akatsuki.(NARUTO).full.488229.jpg"))
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
# The file is a JPEG, so decode it with decode_jpeg; the result has shape [height, width, 3]
image = tf.image.decode_jpeg(value, channels=3)

# Define the coordinator
coord = tf.train.Coordinator()

def normalize_and_encode(img_tensor):
    # Drop the batch dimension, cast to uint8, and encode the [height, width, 1] image as JPEG
    return tf.image.encode_jpeg(tf.cast(img_tensor[0], tf.uint8))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    threads = tf.train.start_queue_runners(sess, coord=coord)

    # Grayscale image with a batch dimension: [1, height, width, 1]
    image_tensor = tf.expand_dims(tf.image.rgb_to_grayscale(sess.run(image)), 0)

    # 2x2 windows with stride 2 halve the height and width
    maxed_tensor = tf.nn.max_pool(tf.cast(image_tensor, tf.float32), [1,2,2,1], [1,2,2,1], "SAME")
    # Average pooling is built for comparison; it is not written to disk here
    averaged_tensor = tf.nn.avg_pool(tf.cast(image_tensor, tf.float32), [1,2,2,1], [1,2,2,1], "SAME")

    with open("maxpool.jpeg", "wb") as maxfile:
        maxfile.write(sess.run(normalize_and_encode(maxed_tensor)))
    coord.request_stop()
coord.join(threads)

Output image:

Putting it all together on MNIST

import tensorflow as tf
%matplotlib inline
import matplotlib.pyplot as plt

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
#Show the first training image
plt.imshow(mnist.train.images[0].reshape((28, 28), order='C'), cmap='Greys',  interpolation='nearest')

# Parameters
batch_size = 128
learning_rate = 0.05
number_iterations = 2000
steps = 10

# Network Parameters
n_input = 784 # 28x28 images
n_classes = 10 # 10 digit classes
dropout = 0.80 # Dropout keep probability (fraction of units kept)

# tf Graph input
X = tf.placeholder(tf.float32, [None, n_input])
Y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)

# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)


def subsampling(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')


# Create model
def conv_net(x_in, weights, biases, dropout):
    # Reshape input picture
    x_in = tf.reshape(x_in, shape=[-1, 28, 28, 1])

    # Convolution Layer 1
    conv_layer_1 = conv2d(x_in, weights['wc1'], biases['bc1'])
    # Subsampling
    conv_layer_1 = subsampling(conv_layer_1, k=2)

    # Convolution Layer 2
    conv_layer_2 = conv2d(conv_layer_1, weights['wc2'], biases['bc2'])
    # Subsampling
    conv_layer_2 = subsampling(conv_layer_2, k=2)

    # Fully connected layer
    # Reshape conv_layer_2 output to fit fully connected layer input
    # (flattened so that it can be multiplied by the fully connected layer's weights)
    fully_connected_layer = tf.reshape(conv_layer_2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fully_connected_layer = tf.add(tf.matmul(fully_connected_layer, weights['wd1']), biases['bd1'])
    fully_connected_layer = tf.nn.relu(fully_connected_layer)
    # Apply Dropout
    fully_connected_layer = tf.nn.dropout(fully_connected_layer, dropout)

    # Output, class prediction
    prediction_output = tf.add(tf.matmul(fully_connected_layer, weights['out']), biases['out'])
    return prediction_output

# Store layers weight & bias
weights = {
    # 5x5 convolutional units, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 convolutional units, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # fully connected, 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # 1024 inputs, 10 outputs (class prediction)
    'out': tf.Variable(tf.random_normal([1024, n_classes]))
}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Construct model
pred = conv_net(X, weights, biases, keep_prob)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y,logits=pred))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < number_iterations:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        test = batch_x[0]
        fig = plt.figure()
        plt.imshow(test.reshape((28, 28), order='C'), cmap='Greys',
                   interpolation='nearest')
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={X: batch_x, Y: batch_y,
                                       keep_prob: dropout})
        if step % steps == 0:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={X: batch_x,
                                                              Y: batch_y,
                                                              keep_prob: 1.})
            print ("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1

    # Calculate accuracy for 256 mnist test images
    print ("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: mnist.test.images[:256],
                                      Y: mnist.test.labels[:256],
                                      keep_prob: 1.}))
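
The `7*7*64` input size of the fully connected layer follows from the layer settings in the code: both convolutions use a stride of 1 with SAME padding, and both max-pooling layers use a stride of 2 with SAME padding. The following plain-Python trace is a sketch of that arithmetic; the helper `same_size` is made up for the example and relies on the documented SAME rule, ceil(i / s).

```python
import math

def same_size(i, s):
    """Output size along one dimension with SAME padding and stride s."""
    return math.ceil(i / s)

size, channels = 28, 1                      # input reshaped to [-1, 28, 28, 1]
for layer, out_channels in [("conv1 + pool", 32), ("conv2 + pool", 64)]:
    size = same_size(size, 1)               # 5x5 convolution, stride 1: size unchanged
    size = same_size(size, 2)               # 2x2 max pooling, stride 2: size halved
    channels = out_channels
    print(layer, "->", (size, size, channels))

flat = size * size * channels
print(flat)                                 # 3136, that is 7*7*64
```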

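About the dropout used above

Purpose: during training, randomly chosen activations are set to 0, which has a decorrelating effect on the units. tf.nn.dropout(x, keep_prob) keeps each element with probability keep_prob and scales the kept elements by 1/keep_prob, so with keep_prob = 0.5 the surviving values are doubled.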

import tensorflow as tf
x = [1.0, 0.5, 0.75, 0.25, 0.2, 0.8, 0.4, 0.6]
dropout = tf.nn.dropout(x, 0.5)
with tf.Session() as sess:
    print(sess.run(dropout))
# One possible output (the mask is random, so it varies between runs):
# [2.  1.  1.5 0.  0.4 1.6 0.8 0. ]
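
The same behaviour can be reproduced without TensorFlow. The NumPy function below is a sketch of dropout with rescaling of the kept values by 1/keep_prob, not TensorFlow's actual implementation; the function signature and the seed are assumptions made for the example.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob and scale the kept ones by 1/keep_prob."""
    mask = rng.uniform(size=x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.RandomState(0)
x = np.array([1.0, 0.5, 0.75, 0.25, 0.2, 0.8, 0.4, 0.6])
y = dropout(x, 0.5, rng)
print(y)   # every entry is either 0 or twice the corresponding input
```

Scaling by 1/keep_prob keeps the expected value of each element unchanged, so no extra rescaling is needed at test time when keep_prob is set to 1.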

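Transposed convolution

A convolution can be viewed as multiplication by a sparse matrix $C$. For example, a $4 \times 4$ input feature map is flattened into a $1 \times 16$ vector; the convolution $C$ (for instance a $3 \times 3$ kernel with unit stride and no padding) is a $16 \times 4$ matrix, so the output feature map is a $1\times 4$ vector, which is then reshaped to $2\times 2$. Backpropagating the error works the same way, except that the matrix is the transpose of $C$, $C^T$.
A transposed convolution swaps the two roles: the forward pass multiplies the input feature map by $C^T$, and the error computation multiplies by $C$.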
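
The matrix view can be made concrete with NumPy. The sketch below assumes a $3 \times 3$ kernel with unit stride and no padding, and it uses the row-vector convention from the text, so $C$ has shape $16 \times 4$. The helper name `conv_matrix` and the kernel values are made up for the example.

```python
import numpy as np

def conv_matrix(w, in_size):
    """Build the sparse matrix C so that x_flat @ C equals the flattened convolution output."""
    k = w.shape[0]
    out_size = in_size - k + 1
    C = np.zeros((in_size * in_size, out_size * out_size))
    for r in range(out_size):
        for c in range(out_size):
            col = r * out_size + c                   # index of the output position
            for a in range(k):
                for b in range(k):
                    row = (r + a) * in_size + (c + b)  # index of the input pixel it touches
                    C[row, col] = w[a, b]
    return C

w = np.arange(1, 10, dtype=float).reshape(3, 3)
C = conv_matrix(w, 4)                                # shape (16, 4), mostly zeros
x = np.arange(16, dtype=float).reshape(4, 4)

# Convolution: 1x16 times 16x4 gives 1x4, reshaped to 2x2
y = (x.reshape(1, 16) @ C).reshape(2, 2)
direct = np.array([[np.sum(x[r:r + 3, c:c + 3] * w) for c in range(2)]
                   for r in range(2)])
print(np.allclose(y, direct))                        # True

# Transposed convolution: 1x4 times 4x16 gives 1x16, reshaped to 4x4
up = (y.reshape(1, 4) @ C.T).reshape(4, 4)
print(up.shape)                                      # (4, 4)
```

Multiplying by $C^T$ restores the shape of the original input, not its values, so a transposed convolution is not the inverse of the convolution.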