Convolutional Neural Networks
Automatic image feature extraction: convolutional layers + pooling layers
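Convolutional layer
After a convolutional layer, the node matrix usually becomes deeper.
Core structure: the filter (also called the kernel). Its size is the size of the input sub-matrix that it processes at each position, and its depth is the depth of the unit node matrix it outputs.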
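For example, when a $2 \times 2 \times 3$ input matrix passes through a $2 \times 2 \times 3$ filter of depth 5 (which can be read as five $2 \times 2 \times 3$ filters), the layer needs $2 \times 2 \times 3 \times 5 + 5 = 65$ parameters in total: 60 weights plus 5 biases.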
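Note: within a convolutional layer, the filter's parameters are shared. The same weights are applied at every position the filter slides over.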
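The parameter count in the example can be checked with a few lines of plain Python. This is a minimal sketch; the helper name `conv_param_count` is an assumption introduced for illustration and is not part of TensorFlow.

```python
def conv_param_count(filter_height, filter_width, in_depth, out_depth):
    """Number of parameters in one convolutional layer (hypothetical helper)."""
    weights = filter_height * filter_width * in_depth * out_depth
    biases = out_depth  # one bias per output channel
    return weights + biases


# The 2x2x3 filter of depth 5 from the text.
print(conv_param_count(2, 2, 3, 5))  # 65
```

Because the weights are shared across positions, the count does not depend on the height or width of the input image.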
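import tensorflow as tf
# Filter weights. The shape argument is [5, 5, 3, 16]: 5x5 is the filter size,
# 3 is the number of input channels, and 16 is the filter depth.
filter_weights = tf.get_variable('weights', [5, 5, 3, 16],
                                 initializer=tf.truncated_normal_initializer(stddev=0.1))
# Biases. The shape argument is [16]: one bias per output channel.
biases = tf.get_variable('biases', [16], initializer=tf.constant_initializer(0.1))
# tf.nn.conv2d implements the forward pass of a convolutional layer.
# `input` is the 4-D input tensor [batch, height, width, channels], defined elsewhere.
# Third argument (strides): the first and last entries must be 1, because the
# stride of a convolutional layer only applies to the height and width of the matrix.
# Fourth argument (padding): 'SAME' pads with zeros, 'VALID' adds no padding.
conv = tf.nn.conv2d(input, filter_weights, [1, 1, 1, 1], padding='SAME')
# Add the bias term.
bias = tf.nn.bias_add(conv, biases)
# Apply the ReLU activation.
actived_conv = tf.nn.relu(bias)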
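The effect of the strides and padding arguments on the output size can be sketched in plain Python. The helper `conv_output_size` is hypothetical (it is not a TensorFlow function); it follows the output-size rules that TensorFlow documents for 'SAME' and 'VALID' padding, applied to one spatial dimension.

```python
import math


def conv_output_size(in_size, filter_size, stride, padding):
    """Output length along one spatial dimension (hypothetical helper)."""
    if padding == 'SAME':
        # Zero padding is added, so only the stride shrinks the output.
        return math.ceil(in_size / stride)
    if padding == 'VALID':
        # No padding: the filter must fit entirely inside the input.
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(conv_output_size(32, 5, 1, 'SAME'))   # 32
print(conv_output_size(32, 5, 1, 'VALID'))  # 28
print(conv_output_size(32, 5, 2, 'VALID'))  # 14
```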
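Pooling layer
After a pooling layer, the depth of the matrix is unchanged and its size becomes smaller.
Pooling effectively shrinks the matrix, which reduces the number of parameters in the final fully connected layers. This speeds up computation and helps prevent overfitting.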
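# ksize is the size of the pooling window; the first and fourth entries must be 1,
# and the two middle entries give the window height and width.
# strides is the stride; the first and fourth entries must be 1.
pool = tf.nn.max_pool(actived_conv, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')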
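To make the operation concrete, here is a minimal pure-Python sketch of max pooling on a single-channel 2-D matrix. It is a simplification under stated assumptions: the function `max_pool_2d` is hypothetical, handles one channel only, and uses no padding (the 'VALID' behavior), unlike the 'SAME' call in the TensorFlow snippet.

```python
def max_pool_2d(matrix, ksize, stride):
    """Max pooling over a 2-D list of numbers, without padding (hypothetical helper)."""
    height, width = len(matrix), len(matrix[0])
    out_height = (height - ksize) // stride + 1
    out_width = (width - ksize) // stride + 1
    pooled = []
    for i in range(out_height):
        row = []
        for j in range(out_width):
            top, left = i * stride, j * stride
            # Take the maximum over the ksize x ksize window.
            window = [matrix[r][c]
                      for r in range(top, top + ksize)
                      for c in range(left, left + ksize)]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(max_pool_2d(feature_map, ksize=2, stride=2))  # [[6, 8], [14, 16]]
```

The 4x4 matrix shrinks to 2x2, while a multi-channel input would keep its depth because each channel is pooled independently.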
The LeNet-5 model
LeNet-5 has 7 layers in total.
Example:
- Input data format: a 4-D tensor [batch size, image height, image width, image depth]
x = tf.placeholder(tf.float32, [BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS], name='x-input')
- Model design
import tensorflow as tf
INPUT_NODE = 784
OUTPUT_NODE = 10
IMAGE_SIZE = 28
NUM_CHANNELS = 1
NUM_LABELS = 10
CONV1_DEEP = 32
CONV1_SIZE = 5
CONV2_DEEP = 64
CONV2_SIZE = 5
FC_SIZE = 512
# `train` tells the function whether to apply dropout (training only).
def inference(input_tensor, train, regularizer):
    with tf.variable_scope('layer1-conv1'):
        conv1_weights = tf.get_variable("weights", [CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP],
                                        initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv1_biases = tf.get_variable("bias", [CONV1_DEEP], initializer=tf.constant_initializer(0.0))
        conv1 = tf.nn.conv2d(input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))
    with tf.name_scope('layer2-pool1'):
        pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    with tf.variable_scope('layer3-conv2'):
        # The input depth of conv2 is the output depth of conv1 (CONV1_DEEP), not NUM_CHANNELS.
        conv2_weights = tf.get_variable("weights", [CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP],
                                        initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv2_biases = tf.get_variable("bias", [CONV2_DEEP], initializer=tf.constant_initializer(0.0))
        conv2 = tf.nn.conv2d(pool1, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))
    with tf.name_scope('layer4-pool2'):
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
    # Transition to the fully connected layers: flatten the 7*7*64 matrix into a vector.
    # Get the dimensions of the layer-4 output.
    pool_shape = pool2.get_shape().as_list()
    # pool_shape[0]: number of examples in the batch; pool_shape[1], pool_shape[2]:
    # image height and width; pool_shape[3]: image depth.
    nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]
    reshaped = tf.reshape(pool2, [pool_shape[0], nodes])
    # Only the weights of the fully connected layers are regularized.
    with tf.variable_scope('layer5-fc1'):
        fc1_weights = tf.get_variable("weights", [nodes, FC_SIZE], initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None:
            tf.add_to_collection("losses", regularizer(fc1_weights))
        fc1_biases = tf.get_variable("bias", [FC_SIZE], initializer=tf.constant_initializer(0.1))
        fc1 = tf.nn.relu(tf.matmul(reshaped, fc1_weights) + fc1_biases)
        # Dropout is only used during training to reduce overfitting, not at test time.
        if train:
            fc1 = tf.nn.dropout(fc1, 0.5)
    with tf.variable_scope('layer6-fc2'):
        fc2_weights = tf.get_variable("weights", [FC_SIZE, NUM_LABELS], initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None:
            tf.add_to_collection("losses", regularizer(fc2_weights))
        fc2_biases = tf.get_variable("bias", [NUM_LABELS], initializer=tf.constant_initializer(0.1))
        logit = tf.matmul(fc1, fc2_weights) + fc2_biases
    return logit
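The shapes flowing through the model can be traced by hand. The sketch below does this in plain Python with the same constants as the listing. The function `trace_shapes` is hypothetical, and it assumes 'SAME' padding throughout, so each stride-1 convolution keeps the height and width and each stride-2 pooling halves them (rounding up).

```python
import math

IMAGE_SIZE, NUM_CHANNELS = 28, 1
CONV1_DEEP, CONV2_DEEP = 32, 64
FC_SIZE, NUM_LABELS = 512, 10


def trace_shapes():
    """Return (layer name, shape) pairs for one input image (hypothetical helper)."""
    size = IMAGE_SIZE
    shapes = [('input', (size, size, NUM_CHANNELS))]
    shapes.append(('layer1-conv1', (size, size, CONV1_DEEP)))  # SAME, stride 1
    size = math.ceil(size / 2)                                  # SAME, stride 2
    shapes.append(('layer2-pool1', (size, size, CONV1_DEEP)))
    shapes.append(('layer3-conv2', (size, size, CONV2_DEEP)))  # SAME, stride 1
    size = math.ceil(size / 2)                                  # SAME, stride 2
    shapes.append(('layer4-pool2', (size, size, CONV2_DEEP)))
    nodes = size * size * CONV2_DEEP  # length of the flattened vector
    shapes.append(('reshaped', (nodes,)))
    shapes.append(('layer5-fc1', (FC_SIZE,)))
    shapes.append(('layer6-fc2', (NUM_LABELS,)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

The flattened vector has 7 * 7 * 64 = 3136 entries, which is the value of `nodes` that sizes the first fully connected layer.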
Note:
The difference between name_scope and variable_scope:
tf.get_variable ignores name_scope. So use name_scope to group operations, and use variable_scope to namespace variables.