A simple look at CNN structure and transposed convolution, with TensorFlow implementations


Written by: zhengchu1994
Title: "A guide to convolution arithmetic for deep learning"
Date: 2018-05-26 15:46:30

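Affine transformations

  • Definition: the input vector is multiplied by a matrix, a bias is added, and the result is passed through an activation function.
  • Drawback: all axes are treated in the same way, so the topological information in the input (for example, the spatial layout of an image) is not exploited.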
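The definition above can be sketched in a few lines of plain Python. The function name `affine_layer`, the choice of ReLU as the activation, and the toy numbers are assumptions made for illustration only; they are not taken from the original post.

```python
def affine_layer(x, W, b):
    """Return relu(x @ W + b) for one input vector x, using plain lists."""
    out = []
    for j in range(len(b)):
        # Every output unit is a weighted sum over ALL inputs plus a bias,
        # so the layer ignores any spatial structure in x.
        z = sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
        out.append(max(0.0, z))  # ReLU used as the activation function
    return out

x = [1.0, 2.0, 3.0]           # flattened input vector
W = [[1.0, -1.0],
     [0.0,  1.0],
     [1.0, -1.0]]             # 3 inputs -> 2 outputs
b = [0.5, 0.0]

y = affine_layer(x, W, b)
print(y)  # [4.5, 0.0]
```

Because each output depends on every input through its own weight, shuffling the input axes (with the matching rows of `W`) gives the same result, which is one way to see that topology is not used.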

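Discrete convolutions

  • Definition: a linear transformation, like the affine one, but sparse (only a few input units contribute to each output unit) and with parameter reuse (the same weights are applied at many locations of the input).
  • Components:
  • input feature maps and output feature maps

In the figure, the light blue grid is the input feature map and the shaded area is the kernel sliding over it. At each position, the overlapping elements are multiplied elementwise and the products are summed. The green grid collects these results and is the output feature map.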
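The sliding-window computation described above can be written out as a minimal sketch in plain Python. The function name `conv2d_single` and the example values are hypothetical. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv2d_single(image, kernel):
    """Slide kernel over image (unit stride, no padding) and return the output map."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Multiply the overlapping elements and sum the products.
            acc = 0
            for dr in range(kh):
                for dc in range(kw):
                    acc += image[r + dr][c + dc] * kernel[dr][dc]
            row.append(acc)
        out.append(row)
    return out

image = [[1, 2, 3, 0],
         [0, 1, 2, 3],
         [3, 0, 1, 2],
         [2, 3, 0, 1]]
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]  # sums the main diagonal of each 3x3 window

result = conv2d_single(image, kernel)
print(result)  # [[3, 6], [0, 3]]
```

The same 9 kernel weights are reused at all 4 window positions, and each output value depends on only 9 of the 16 inputs.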

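Definitions

Stacked input feature maps

  • When several input feature maps are stacked together, different kernels are convolved with them to produce several output feature maps.

In the figure, a collection of discrete convolution kernels with $(n = 3, m = 2, k_1 = 3, k_2 = 3)$ is applied: there are 3 output feature maps, 2 input feature maps, and each kernel is $3 \times 3$. In the left pathway, input feature map 1 is convolved with kernel $w_{1,1}$ and input feature map 2 with kernel $w_{1,2}$; the two results are summed elementwise to give the first output feature map. The middle pathway does the same with the kernels $w_2$ (that is, $w_{2,1}$ and $w_{2,2}$) and the right pathway with the kernels $w_3$, giving three stacked output feature maps. The number of output feature maps therefore equals the number of kernel sets, with one set of $m$ kernels per output map.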
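A rough sketch of the $(n = 3, m = 2)$ case in plain Python follows. The helper names `correlate_valid` and `conv_multi` and the constant-valued inputs and kernels are assumptions chosen so the result is easy to check by hand.

```python
def correlate_valid(fmap, kernel):
    """Unit-stride, no-padding sliding-window sum of products (square inputs)."""
    k = len(kernel)
    size = len(fmap) - k + 1
    return [[sum(fmap[r + dr][c + dc] * kernel[dr][dc]
                 for dr in range(k) for dc in range(k))
             for c in range(size)]
            for r in range(size)]

def conv_multi(inputs, kernels):
    """inputs: list of m maps. kernels[j][i]: kernel from input map i to output map j."""
    outputs = []
    for kernel_set in kernels:
        # Convolve every input map with its own kernel ...
        partials = [correlate_valid(fmap, w) for fmap, w in zip(inputs, kernel_set)]
        rows, cols = len(partials[0]), len(partials[0][0])
        # ... then sum the partial results elementwise into one output map.
        outputs.append([[sum(p[r][c] for p in partials) for c in range(cols)]
                        for r in range(rows)])
    return outputs

# m = 2 input maps of size 4x4: all ones and all twos.
inputs = [[[1] * 4 for _ in range(4)], [[2] * 4 for _ in range(4)]]
# n = 3 kernel sets; every 3x3 kernel in set j is filled with the value j + 1.
kernels = [[[[j + 1] * 3 for _ in range(3)] for _ in range(2)] for j in range(3)]

outputs = conv_multi(inputs, kernels)
print(len(outputs))  # 3 output feature maps
print(outputs[0])    # [[27, 27], [27, 27]]
```

Each output map here is $9 \cdot (j+1) \cdot 1 + 9 \cdot (j+1) \cdot 2 = 27(j+1)$ at every position.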

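Strides

  • Explanation: the stride acts as a form of subsampling of the input. Besides measuring how far the kernel moves at each step, it can be read as how much of the unit-stride output is retained.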
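To illustrate the subsampling view, the sketch below compares a stride-2 convolution with the unit-stride output in which only every second row and column is kept. The function name `conv2d_strided` and the test values are hypothetical, and no padding is used.

```python
def conv2d_strided(image, kernel, stride=1):
    """Sliding-window sum of products with a configurable stride and no padding."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(0, ih - kh + 1, stride):
        row = []
        for c in range(0, iw - kw + 1, stride):
            row.append(sum(image[r + dr][c + dc] * kernel[dr][dc]
                           for dr in range(kh) for dc in range(kw)))
        out.append(row)
    return out

image = [[r * 5 + c for c in range(5)] for r in range(5)]  # 5x5 input, values 0..24
kernel = [[1] * 3 for _ in range(3)]                        # 3x3 kernel of ones

full = conv2d_strided(image, kernel, stride=1)      # 3x3 output
strided = conv2d_strided(image, kernel, stride=2)   # 2x2 output
subsampled = [row[::2] for row in full[::2]]        # keep every 2nd row and column

print(strided)                # [[54, 72], [144, 162]]
print(strided == subsampled)  # True
```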

Zero padding

  • Zeros are added around the border of the input so that the output has the desired size.
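A minimal sketch of zero padding in plain Python, assuming the same padding `p` on every side. The helper name `zero_pad` is hypothetical.

```python
def zero_pad(image, p):
    """Surround a 2-D list with p rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left and right borders
    padded.extend([0] * width for _ in range(p))      # bottom border
    return padded

padded = zero_pad([[1, 2],
                   [3, 4]], 1)
for row in padded:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 0]
# [0, 3, 4, 0]
# [0, 0, 0, 0]
```

With a 3x3 kernel and unit stride, padding by 1 on each side keeps the output the same size as the input.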

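Output size formulas:

1. The output size $o$ after a convolution is:

$$o = \left\lfloor \frac{i + 2p - k}{s} \right\rfloor + 1$$

where $i$ is the size of the input feature map, $p$ is the padding size, $k$ is the kernel size, and $s$ is the stride.
2. The output size after pooling: there is no padding, so $p = 0$ and the output size $o$ is:
$$o = \left\lfloor \frac{i - k}{s} \right\rfloor + 1$$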
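The two formulas translate directly into integer arithmetic. The function names below are hypothetical, and the sketch assumes square inputs and kernels with the same padding and stride along both axes.

```python
def conv_output_size(i, k, s=1, p=0):
    """o = floor((i + 2p - k) / s) + 1"""
    return (i + 2 * p - k) // s + 1

def pool_output_size(i, k, s):
    """Pooling without padding: o = floor((i - k) / s) + 1"""
    return (i - k) // s + 1

print(conv_output_size(4, 3))             # 2: 4x4 input, 3x3 kernel
print(conv_output_size(5, 3, s=2, p=1))   # 3
print(conv_output_size(28, 5, s=1, p=2))  # 28: padding of 2 preserves the size
print(pool_output_size(28, 2, 2))         # 14
```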

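Convolution in TensorFlow

conv2d

tf.nn.conv2d(input,filter,strides,padding,use_cudnn_on_gpu,data_format,name=None)
  • input: a 4-D tensor. The default layout is [batch,in_height,in_width,in_channels], i.e. NHWC; the other layout is NCHW. Set data_format to choose between them.
  • filter: the kernel (also called a filter), with shape [filter_height,filter_width,in_channels,out_channels].
  • strides: the step of the kernel along each dimension of input, given as a list of 4 integers such as [1,1,1,1].
  • padding: either "SAME" or "VALID".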
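The effect of the two padding modes on the output size can be sketched with the rules given in the TensorFlow documentation: `ceil(i / s)` for "SAME" and `ceil((i - k + 1) / s)` for "VALID". The helper name `tf_output_size` is hypothetical, and the sketch covers only one spatial dimension.

```python
import math

def tf_output_size(i, k, s, padding):
    """Output length along one spatial dimension for TensorFlow's padding modes."""
    if padding == "SAME":
        # Zeros are added as needed, so the size depends only on the stride.
        return math.ceil(i / s)
    if padding == "VALID":
        # No padding: only windows that fit entirely inside the input are used.
        return math.ceil((i - k + 1) / s)
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(tf_output_size(28, 5, 1, "SAME"))   # 28
print(tf_output_size(28, 5, 1, "VALID"))  # 24
print(tf_output_size(28, 2, 2, "SAME"))   # 14
print(tf_output_size(7, 2, 2, "SAME"))    # 4
print(tf_output_size(7, 2, 2, "VALID"))   # 3
```

For "VALID" this agrees with $\lfloor (i - k)/s \rfloor + 1$, the no-padding case of the general formula.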

Code

import tensorflow as tf

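#Generate the filename queue, and read the JPEG file contents
filename_queue = tf.train.string_input_producer(tf.train.match_filenames_once("data/Akatsuki.(NARUTO).full.488229.jpg"))
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
#The file is a JPEG, so decode it with decode_jpeg and add a batch dimension: conv2d expects a 4-D [batch, height, width, channels] tensor
image=tf.expand_dims(tf.image.decode_jpeg(value, channels=3), 0)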

#Define the kernel parameters (a 3x3 Sobel filter with shape [3, 3, 1, 1])
kernel=tf.constant(
        [
         [[[-1.]],[[-2.]],[[-1.]]],
         [[[0.]],[[0.]],[[0.]]],
         [[[1.]],[[2.]],[[1.]]]
         ]
    )

#Define the train coordinator
coord = tf.train.Coordinator()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    threads = tf.train.start_queue_runners(sess,coord=coord)
    #Get first image
    image_tensor = tf.image.rgb_to_grayscale(sess.run([image])[0])
    #apply convolution, preserving the image size
    imagen_convoluted_tensor=tf.nn.conv2d(tf.cast(image_tensor, tf.float32),kernel,[1,1,1,1],"SAME")
    #Prepare to save the convolution output
    file=open ("Sobel.png", "wb+")
    #Rescale to 0..255 and cast to uint8, because the convolution can change the value range of the image
    out=tf.image.encode_png(tf.reshape(tf.cast(imagen_convoluted_tensor/tf.reduce_max(imagen_convoluted_tensor)*255.,tf.uint8), tf.shape(imagen_convoluted_tensor.eval()[0]).eval()))
    file.write(out.eval())
    file.close()
    print("write done!")
    coord.request_stop()
#coord.join blocks until all the threads have terminated.
coord.join(threads)

Result:

  • Input image:
    [image]

  • Output image:


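Pooling

  • Purpose: slide a window over the feature maps and reduce the contents of each window with a pooling function. This summarizes subregions and gives a compressed representation of the important information.
  • Drawback: the model loses precise positional information.


In the figure, max pooling summarizes each subregion by its maximum value.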
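A minimal sketch of max pooling in plain Python, assuming a square window and no padding. The function name `max_pool2d` and the sample feature map are hypothetical.

```python
def max_pool2d(fmap, k=2, s=2):
    """Replace each k x k window (moved with stride s) by its maximum value."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[r + dr][c + dc] for dr in range(k) for dc in range(k))
             for c in range(0, w - k + 1, s)]
            for r in range(0, h - k + 1, s)]

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 0],
        [7, 2, 9, 8],
        [1, 0, 3, 4]]

pooled = max_pool2d(fmap)
print(pooled)  # [[6, 4], [7, 9]]
```

The output records that a 6 occurred somewhere in the top-left window but not where, which is the loss of positional information mentioned above.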

max_pool

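tf.nn.max_pool(value,ksize,strides,padding,data_format,name)
  • value: a tensor of shape [batch,height,width,channels].
  • ksize: a list of integers giving the window size along each dimension of value.
  • strides: the step of the window along each dimension of value.

TensorFlow code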

import tensorflow as tf

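#Generate the filename queue, and read the JPEG file contents
filename_queue = tf.train.string_input_producer(tf.train.match_filenames_once("data/Akatsuki.(NARUTO).full.488229.jpg"))
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
#The file is a JPEG, so decode it with decode_jpeg and add a batch dimension: the pooling ops expect a 4-D tensor
image = tf.expand_dims(tf.image.decode_jpeg(value, channels=3), 0)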

#Define the  coordinator
coord = tf.train.Coordinator()

def normalize_and_encode (img_tensor):
    image_dimensions = tf.shape(img_tensor.eval()[0]).eval()
    return tf.image.encode_jpeg(tf.reshape(tf.cast(img_tensor, tf.uint8), image_dimensions))

with tf.Session() as sess:
    maxfile=open ("maxpool.jpeg", "wb+")
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    threads = tf.train.start_queue_runners(coord=coord)

    image_tensor = tf.image.rgb_to_grayscale(sess.run([image])[0])

    maxed_tensor=tf.nn.max_pool(tf.cast(image_tensor, tf.float32),[1,2,2,1],[1,2,2,1],"SAME")
    averaged_tensor=tf.nn.avg_pool(tf.cast(image_tensor, tf.float32),[1,2,2,1],[1,2,2,1],"SAME")

    maxfile.write(normalize_and_encode(maxed_tensor).eval())
    coord.request_stop()
    maxfile.close()
coord.join(threads)

Output image:

Putting it all together on MNIST

import tensorflow as tf
%matplotlib inline
import matplotlib.pyplot as plt

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
#Show the first training image
plt.imshow(mnist.train.images[0].reshape((28, 28), order='C'), cmap='Greys',  interpolation='nearest')

# Parameters
batch_size = 128
learning_rate = 0.05
number_iterations = 2000
steps = 10

# Network Parameters
n_input = 784 # 28x28 images
n_classes = 10 # 10 digit classes
dropout = 0.80 # Keep probability fed to tf.nn.dropout (fraction of units kept)

# tf Graph input
X = tf.placeholder(tf.float32, [None, n_input])
Y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)

# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)


def subsampling(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')


# Create model
def conv_net(x_in, weights, biases, dropout):
    # Reshape input picture
    x_in = tf.reshape(x_in, shape=[-1, 28, 28, 1])

    # Convolution Layer 1
    conv_layer_1 = conv2d(x_in, weights['wc1'], biases['bc1'])
    # Subsampling
    conv_layer_1 = subsampling(conv_layer_1, k=2)

    # Convolution Layer 2
    conv_layer_2 = conv2d(conv_layer_1, weights['wc2'], biases['bc2'])
    # Subsampling
    conv_layer_2 = subsampling(conv_layer_2, k=2)

    # Fully connected layer
    # Reshape conv_layer_2 output to fit fully connected layer input
    # Flatten so the result can be multiplied by the fully connected layer's weights
    fully_connected_layer = tf.reshape(conv_layer_2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fully_connected_layer = tf.add(tf.matmul(fully_connected_layer, weights['wd1']), biases['bd1'])
    fully_connected_layer = tf.nn.relu(fully_connected_layer)
    # Apply Dropout
    fully_connected_layer = tf.nn.dropout(fully_connected_layer, dropout)

    # Output, class prediction
    prediction_output = tf.add(tf.matmul(fully_connected_layer, weights['out']), biases['out'])
    return prediction_output

# Store layers weight & bias
weights = {
    # 5x5 convolutional units, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 convolutional units, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # fully connected, 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # 1024 inputs, 10 outputs (class prediction)
    'out': tf.Variable(tf.random_normal([1024, n_classes]))
}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Construct model
pred = conv_net(X, weights, biases, keep_prob)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y,logits=pred))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < number_iterations:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        test = batch_x[0]
        fig = plt.figure()
        plt.imshow(test.reshape((28, 28), order='C'), cmap='Greys',
                   interpolation='nearest')
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={X: batch_x, Y: batch_y,
                                       keep_prob: dropout})
        if step % steps == 0:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={X: batch_x,
                                                              Y: batch_y,
                                                              keep_prob: 1.})
            print ("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1

    # Calculate accuracy for 256 mnist test images
    print ("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: mnist.test.images[:256],
                                      Y: mnist.test.labels[:256],
                                      keep_prob: 1.}))

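The dropout used above

  • Purpose: randomly sets some activations (elements of the input tensor) to 0, which has a "decorrelation" effect. The elements that are kept are scaled by 1/keep_prob.

import tensorflow as tf
x = [1.0, 0.5, 0.75, 0.25, 0.2, 0.8, 0.4, 0.6]
# In TensorFlow 1.x the second argument is keep_prob, the probability of keeping each element
dropout = tf.nn.dropout(x, 0.5)
with tf.Session() as sess:
    print(sess.run(dropout))

Sample output (it changes from run to run):
[2.  1.  1.5 0.  0.4 1.6 0.8 0. ]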
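The scaling visible in the output (kept values are doubled when the keep probability is 0.5) can be reproduced with a small sketch in plain Python. This is not TensorFlow's implementation; the function name `dropout` and the explicit random generator argument are assumptions made so the example is reproducible.

```python
import random

def dropout(values, keep_prob, rng):
    """Zero each value with probability 1 - keep_prob; scale survivors by 1 / keep_prob."""
    out = []
    for v in values:
        if rng.random() < keep_prob:
            out.append(v / keep_prob)  # scaling keeps the expected value unchanged
        else:
            out.append(0.0)
    return out

x = [1.0, 0.5, 0.75, 0.25, 0.2, 0.8, 0.4, 0.6]
keep_prob = 0.5
dropped = dropout(x, keep_prob, random.Random(0))
print(dropped)  # every entry is either 0.0 or twice the corresponding input
```

Dividing by `keep_prob` during training means nothing has to be rescaled at test time, which is why the MNIST code simply feeds `keep_prob: 1.` when evaluating.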

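Transposed convolution

  • A convolution can be viewed as multiplication by a sparse matrix $C$. For example, a $4 \times 4$ input feature map is flattened into a $1 \times 16$ vector. With a $3 \times 3$ kernel, unit stride and no padding, $C$ is a $16 \times 4$ matrix and the product is a $1 \times 4$ output, which is then reshaped to $2 \times 2$. Backpropagating the error works similarly, except that the matrix used is the transpose $C^T$.
  • A transposed convolution is the opposite operation: in the forward pass the input feature map is multiplied by $C^T$, and in the backward pass the error is multiplied by $C$.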
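The matrix view can be made concrete with a small sketch in plain Python that uses the row-vector convention above ($1 \times 16$ times $16 \times 4$). The helper names `build_conv_matrix`, `vec_mat` and `transpose`, the kernel, and the input values are all assumptions for illustration.

```python
def build_conv_matrix(kernel, i=4):
    """Build the (i*i) x (o*o) matrix C of a unit-stride, no-padding convolution."""
    k = len(kernel)
    o = i - k + 1
    C = [[0] * (o * o) for _ in range(i * i)]
    for r in range(o):
        for c in range(o):
            col = r * o + c  # one column per output position
            for dr in range(k):
                for dc in range(k):
                    # Row index = flattened position of the input element used.
                    C[(r + dr) * i + (c + dc)][col] = kernel[dr][dc]
    return C

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[a] * M[a][b] for a in range(len(v))) for b in range(len(M[0]))]

def transpose(M):
    return [list(col) for col in zip(*M)]

kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]
x = [1, 2, 3, 0,
     0, 1, 2, 3,
     3, 0, 1, 2,
     2, 3, 0, 1]               # 4x4 input flattened to 1x16

C = build_conv_matrix(kernel)  # 16 x 4, mostly zeros
y = vec_mat(x, C)              # convolution: 1x16 -> 1x4
up = vec_mat(y, transpose(C))  # transposed convolution: 1x4 -> 1x16

print(y)        # [3, 6, 0, 3], i.e. [[3, 6], [0, 3]] after reshaping
print(len(up))  # 16, i.e. a 4x4 map after reshaping
```

Multiplying by $C^T$ restores the $4 \times 4$ shape of the input but not its values, so a transposed convolution is not an inverse of the convolution.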

[image]

Code: to be continued.
