CNN卷积神经网络

最新推荐文章于 2022-07-12 23:33:32 发布

~扬之水~

最新推荐文章于 2022-07-12 23:33:32 发布

阅读量467

点赞数

分类专栏：深度学习文章标签： TensorFlow

深度学习专栏收录该内容

8 篇文章 1 订阅

订阅专栏

tf.nn.conv2d介绍

tf.nn.conv2d是TensorFlow里面实现卷积的函数，函数定义：

tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)

第一个参数input：指需要做卷积的输入图像，它要求是一个Tensor，具有[batch, in_height, in_width, in_channels]这样的shape，具体含义是[训练时一个batch的图片数量, 图片高度, 图片宽度, 图像通道数]，注意这是一个4维的Tensor，要求类型为float32和float64其中之一
第二个参数filter：相当于CNN中的卷积核，它要求是一个Tensor，具有[filter_height, filter_width, in_channels, out_channels]这样的shape，具体含义是[卷积核的高度，卷积核的宽度，图像通道数，卷积核个数]，要求类型与参数input相同，有一个地方需要注意，第三维in_channels，就是参数input的第四维；
out_channels是该层卷积核的个数，即feature map，为提取的特征个数，下图红线所标为该层的卷积核个数，卷积核个数可以手动指定，一般网络越深的地方这个值越大，因为随着网络的加深，feature map的长宽尺寸缩小，本卷积层的每个map提取的特征越具有代表性（精华部分），所以后一层卷积层需要增加feature map的数量，才能更充分的提取出前一层的特征，一般是成倍增加（不过具体论文会根据实验情况具体设置）。（该部分参考：https://blog.csdn.net/qq_36231549/article/details/80580100）
第三个参数strides：卷积时在图像每一维的步长，这是一个一维的向量，长度4
第四个参数padding：string类型的量，只能是"SAME","VALID"其中之一，这个值决定了不同的卷积方式（后面会介绍）
第五个参数：use_cudnn_on_gpu:bool类型，是否使用cudnn加速，默认为true
结果返回一个Tensor，这个输出，就是我们常说的feature map，shape仍然是[batch, height, width, channels]这种形式。

tf.contrib.slim.conv2d介绍

TF-Slim是tensorflow中定义、训练和评估复杂模型的轻量级库。tf-slim中的组件可以轻易地和原生tensorflow框架以及例如tf.contrib.learn这样的框架进行整合。
github：https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim
tf.contrib.slim.conv2d，其函数定义如下：

conv2d(inputs,

          num_outputs,
          
          kernel_size,

          stride=1,

          padding='SAME',

          data_format=None,

          rate=1,

          activation_fn=nn.relu,

          normalizer_fn=None,

          normalizer_params=None,

          weights_initializer=initializers.xavier_initializer(),

          weights_regularizer=None,

          biases_initializer=init_ops.zeros_initializer(),

          biases_regularizer=None,

          reuse=None,

          variables_collections=None,

          outputs_collections=None,

          trainable=True,

          scope=None):

inputs同样是指需要做卷积的输入图像

num_outputs指定卷积核的个数（就是filter的个数）

kernel_size用于指定卷积核的维度（卷积核的宽度，卷积核的高度）

stride为卷积时在图像每一维的步长

padding为padding的方式选择，VALID或者SAME

data_format是用于指定输入的input的格式

rate这个参数不是太理解，而且tf.nn.conv2d中也没有，对于使用atrous convolution的膨胀率（不是太懂这个atrous convolution）

activation_fn用于激活函数的指定，默认的为ReLU函数

normalizer_fn用于指定正则化函数

normalizer_params用于指定正则化函数的参数

weights_initializer用于指定权重的初始化程序

weights_regularizer为权重可选的正则化程序

biases_initializer用于指定biase的初始化程序

biases_regularizer: biases可选的正则化程序

reuse指定是否共享层或者和变量

variable_collections指定所有变量的集合列表或者字典

outputs_collections指定输出被添加的集合

trainable:卷积层的参数是否可被训练

scope:共享变量所指的variable_scope

slim.fully_connected

slim.fully_connected(x, 128,activation_fn=None, scope=‘fc1’)
前两个参数分别为网络输入、输出的神经元数量，activation_fn为激活函数。

CNN训练技巧

优化卷积核

在实际的卷积训练中，为了加快速度，常常把卷积核裁开。比如一个3×3的过滤器，可以裁成3×1和1×3两个过滤器，分别对原有输入做卷积操作，这样可以大大提升运算的速度。
原理：在浮点运算中乘法消耗的资源比较多，我们目的就是尽量减小乘法运算。

用以下代码举例

# tf Graph Input
x = tf.placeholder(tf.float32, [None, 24,24,3]) # cifar data image of shape 24*24*3
y = tf.placeholder(tf.float32, [None, 10]) # 0-9 数字=> 10 classes


W_conv1 = weight_variable([5, 5, 3, 64])
b_conv1 = bias_variable([64])

x_image = tf.reshape(x, [-1,24,24,3])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 64, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_conv3 = weight_variable([5, 5, 64, 10])
b_conv3 = bias_variable([10])
h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)

nt_hpool3=avg_pool_6x6(h_conv3)#10
nt_hpool3_flat = tf.reshape(nt_hpool3, [-1, 10])
y_conv=tf.nn.softmax(nt_hpool3_flat)

将上面代码中的第二层5×5的卷积操作conv2注释掉，换成两个5×1和1×5的卷积操作(换成下面的代码)，代码运行后可以看到准确率没有变化，但是速度快了一些。

x_image = tf.reshape(x, [-1,24,24,3])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
W_conv21 = weight_variable([5, 1, 64, 64])
b_conv21 = bias_variable([64])
h_conv21 = tf.nn.relu(conv2d(h_pool1, W_conv21) + b_conv21)
W_conv2 = weight_variable([1, 5, 64, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_conv21, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

多通道卷积技术

多通道卷积的变化就是，在单个卷积层中加入若干个不同尺寸的过滤器，这样会使生成的feature map特征更加多样性。
在这里插入图片描述

使用该方式可以将上面代码中的第二层5×5的卷积，扩展到7×7卷积、1×1卷积、3×3卷积，并将它们的输出通过concat函数并在一起。
具体代码改为：

# tf Graph Input
x = tf.placeholder(tf.float32, [None, 24,24,3]) # cifar data image of shape 24*24*3
y = tf.placeholder(tf.float32, [None, 10]) # 0-9 数字=> 10 classes


W_conv1 = weight_variable([5, 5, 3, 64])
b_conv1 = bias_variable([64])

x_image = tf.reshape(x, [-1,24,24,3])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
#######################################################多卷积核
W_conv2_5x5 = weight_variable([5, 5, 64, 64]) 
b_conv2_5x5 = bias_variable([64]) 
W_conv2_7x7 = weight_variable([7, 7, 64, 64]) 
b_conv2_7x7 = bias_variable([64]) 

W_conv2_3x3 = weight_variable([3, 3, 64, 64]) 
b_conv2_3x3 = bias_variable([64]) 

W_conv2_1x1 = weight_variable([3, 3, 64, 64]) 
b_conv2_1x1 = bias_variable([64]) 

h_conv2_1x1 = tf.nn.relu(conv2d(h_pool1, W_conv2_1x1) + b_conv2_1x1)
h_conv2_3x3 = tf.nn.relu(conv2d(h_pool1, W_conv2_3x3) + b_conv2_3x3)
h_conv2_5x5 = tf.nn.relu(conv2d(h_pool1, W_conv2_5x5) + b_conv2_5x5)
h_conv2_7x7 = tf.nn.relu(conv2d(h_pool1, W_conv2_7x7) + b_conv2_7x7)
h_conv2 = tf.concat([h_conv2_5x5,h_conv2_7x7,h_conv2_3x3,h_conv2_1x1],3)

#######################################################
#W_conv2 = weight_variable([5, 5, 64, 64])
#b_conv2 = bias_variable([64])
#
#h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
#######################################################

W_conv3 = weight_variable([5, 5, 256, 10])
b_conv3 = bias_variable([10])
h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)

nt_hpool3=avg_pool_6x6(h_conv3)#10
nt_hpool3_flat = tf.reshape(nt_hpool3, [-1, 10])
y_conv=tf.nn.softmax(nt_hpool3_flat)

上面代码中，1×1、3×3、5×5、7×7的卷积操作输入都是h_pool1，每个卷积操作后都生成了64个feature map，再用concat函数将它们合在一起变成一个[batch、12，12，256]大小的数据（4个64channels=256个channels）。

批量归一化

批量归一化（简称BN算法）。一般用在全连接或卷积神经网络中。这个里程碑式技术的问世，使得整个神经网络的识别准确度上升了一个台阶。
批量正则化的作用是要最大限度地保证每次的正向传播输出在同一分布上。
批量正则化的做法是将每一层运算出来的数据都归一化成均值为0方差为1的标准高斯分布。这样就会在保留样本分布特征的同时，又消除了层与层间的分布差异。

添加批量归一化处理

import cifar10_input
import tensorflow as tf
import numpy as np
from tensorflow.contrib.layers.python.layers import batch_norm

batch_size = 128
data_dir = '/tmp/cifar10_data/cifar-10-batches-bin'
print("begin")
images_train, labels_train = cifar10_input.inputs(eval_data = False,data_dir = data_dir, batch_size = batch_size)
images_test, labels_test = cifar10_input.inputs(eval_data = True, data_dir = data_dir, batch_size = batch_size)
print("begin data")

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
  
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')  
                        
def avg_pool_6x6(x):
  return tf.nn.avg_pool(x, ksize=[1, 6, 6, 1],
                        strides=[1, 6, 6, 1], padding='SAME')
 #定义BN函数
def batch_norm_layer(value,train = None, name = 'batch_norm'): 
  if train is not None:       
      return batch_norm(value, decay = 0.9,updates_collections=None, is_training = True)
  else:
      return batch_norm(value, decay = 0.9,updates_collections=None, is_training = False)
  
# tf Graph Input
x = tf.placeholder(tf.float32, [None, 24,24,3]) # cifar data image of shape 24*24*3
y = tf.placeholder(tf.float32, [None, 10]) # 0-9 数字=> 10 classes
train = tf.placeholder(tf.float32)#定义一个train将训练状态当成一个占位符来传入

W_conv1 = weight_variable([5, 5, 3, 64])
b_conv1 = bias_variable([64])

x_image = tf.reshape(x, [-1,24,24,3])
#在第一层h_conv1与第二层h_conv2的输出之前卷积之后加入BN层。
h_conv1 = tf.nn.relu(batch_norm_layer((conv2d(x_image, W_conv1) + b_conv1),train))
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 64, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(batch_norm_layer((conv2d(h_pool1, W_conv2) + b_conv2),train))
h_pool2 = max_pool_2x2(h_conv2)

W_conv3 = weight_variable([5, 5, 64, 10])
b_conv3 = bias_variable([10])
h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)

nt_hpool3=avg_pool_6x6(h_conv3)#10
nt_hpool3_flat = tf.reshape(nt_hpool3, [-1, 10])
y_conv=tf.nn.softmax(nt_hpool3_flat)

cross_entropy = -tf.reduce_sum(y*tf.log(y_conv))

global_step = tf.Variable(0, trainable=False)
decaylearning_rate = tf.train.exponential_decay(0.04, global_step,1000, 0.9)#加入退化学习率

train_step = tf.train.AdamOptimizer(decaylearning_rate).minimize(cross_entropy,global_step=global_step)

correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

sess = tf.Session()
sess.run(tf.global_variables_initializer())
tf.train.start_queue_runners(sess=sess)
for i in range(20000):
  image_batch, label_batch = sess.run([images_train, labels_train])
  label_b = np.eye(10,dtype=float)[label_batch] #one hot
  
  train_step.run(feed_dict={x:image_batch, y: label_b,train:1},session=sess)
  
  if i%200 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:image_batch, y: label_b},session=sess)
    print( "step %d, training accuracy %g"%(i, train_accuracy))


image_batch, label_batch = sess.run([images_test, labels_test])
label_b = np.eye(10,dtype=float)[label_batch]#one hot
print ("finished！ test accuracy %g"%accuracy.eval(feed_dict={
     x:image_batch, y: label_b},session=sess))