This post is based on the books 《TensorFlow 实战Google深度学习框架》 and 《TensorFlow实战》.
Introduction to LeNet-5
The LeNet-5 model was proposed by Yann LeCun in the 1998 paper Gradient-based Learning Applied to Document Recognition, and is one of the earliest convolutional neural networks applied to digit recognition.
The LeNet-5 model has 7 layers in total; its structure is shown in the figure below (from 网络解析(一):LeNet-5详解):
A brief explanation of each layer in the figure:
- Input: the raw input image, of size 32×32.
- C1: the first convolutional layer. Its input is the raw image; the kernel size is 5×5 with no padding, so the output size is 32-5+1=28, and the depth is set to 6. This layer has 5×5×1×6+6=156 parameters, 6 of which are biases. The next layer's node matrix has 28×28×6=4704 nodes, and each node is produced by one 5×5 kernel plus a bias, so this layer has 4704×(25+1)=122304 connections.
- S2: the first subsampling (pooling) layer. Its input is the depth-6 output of the previous layer; the filter size is 2×2 with a stride of 2 in both directions, so the output is 14×14×6.
- C3: the second convolutional layer. Its input is the pooling output; the kernel size is 5×5 with no padding, so the output size is 14-5+1=10 and, with the depth set to 16, the output is 10×10×16. Assuming every output channel connects to every input channel, this layer has 5×5×6×16+16=2416 parameters and, by the source book's count, 10×10×16×(25+1)=41600 connections. (Note that counting the full depth-6 input per node would instead give 10×10×16×(5×5×6+1)=241600; the original paper also uses a sparser connection table between S2 and C3, which lowers both counts.)
- S4: the second subsampling (pooling) layer. Its input is the depth-16 output of the previous layer; the filter size is 2×2 with a stride of 2 in both directions, so the output is 5×5×16.
- C5: the third convolutional layer, effectively a fully connected layer. The input is 5×5×16 and the kernel size is also 5×5, so it is equivalent to a fully connected layer (equivalent in parameter count, although the actual operation still differs slightly). With 120 output nodes, this layer has 5×5×16×120+120=48120 parameters. (Treating it as fully connected is fine for MNIST-sized inputs, but the two are not interchangeable in general, since a convolutional C5 can slide over larger feature maps.)
- F6: the first true fully connected layer. It has 120 input nodes and 84 output nodes, for 120×84+84=10164 parameters.
- F7: the second fully connected layer. It has 84 input nodes and 10 output nodes, for 84×10+10=850 parameters.
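The per-layer parameter counts above reduce to a couple of lines of arithmetic. A minimal sketch (the helper names are my own, not from the books):

```python
# Hypothetical helpers that reproduce the per-layer parameter counts quoted above.

def conv_params(k, in_depth, out_depth):
    # k*k weights per input/output channel pair, plus one bias per output channel
    return k * k * in_depth * out_depth + out_depth

def fc_params(n_in, n_out):
    # a full weight matrix plus one bias per output node
    return n_in * n_out + n_out

print(conv_params(5, 1, 6))        # C1: 156
print(conv_params(5, 6, 16))       # C3: 2416 (full connectivity assumed)
print(fc_params(5 * 5 * 16, 120))  # C5: 48120
print(fc_params(120, 84))          # F6: 10164
print(fc_params(84, 10))           # F7: 850
```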
Implementing LeNet in TensorFlow
1. A simple LeNet
```python
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Load the MNIST dataset and create a default session
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
sess = tf.InteractiveSession()

# Weight initialization
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

# Bias initialization
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# Size-preserving 2-D convolution: with stride 1, 'SAME' padding keeps the
# output the same spatial size as the input
def conv2d(input_data, weights):
    return tf.nn.conv2d(input_data, weights, strides=[1, 1, 1, 1], padding='SAME')

# 2x2 max pooling; with 'SAME' padding the output size is ceil(input_size / 2)
def max_pool_2x2(input_data):
    return tf.nn.max_pool(input_data, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Placeholders for the input images and the ground-truth labels
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
# Reshape the flat vectors into 2-D images; -1 keeps the batch size unchanged
x_image = tf.reshape(x, [-1, 28, 28, 1])

# Parameters of the first convolutional layer and its pooling layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
# Output after this step: 28x28x32
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
# Output after this step: 14x14x32
h_pool1 = max_pool_2x2(h_conv1)

# Parameters of the second convolutional layer and its pooling layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
# Output after this step: 14x14x64
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
# Output after this step: 7x7x64
h_pool2 = max_pool_2x2(h_conv2)

# Simplified here to just two fully connected layers, but the idea is the same
# Parameters of the first fully connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
# Flatten the 2-D feature maps into a 1-D vector
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
# The fully connected operation itself
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Add a dropout layer to reduce overfitting, controlled by keep_prob:
# keep_prob=1 keeps every activation; smaller values keep fewer
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Feed the dropout output into a softmax layer for the final class probabilities
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# Cross-entropy loss
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# Accuracy
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Initialize all variables
tf.global_variables_initializer().run()

# Training loop
for i in range(20000):
    x_batch, y_batch = mnist.train.next_batch(50)
    # Every 100 steps, evaluate the current batch as a rough validation check
    if i % 100 == 0:
        train_accuracy = accuracy.eval({x: x_batch, y_: y_batch, keep_prob: 1.0})
        print("step %d, training accuracy %g" % (i, train_accuracy))
    train_step.run(feed_dict={x: x_batch, y_: y_batch, keep_prob: 0.5})

# Final test accuracy
print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
```
The code above reaches an accuracy of around 99.73%.
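The 28→14→7 sizes traced in the comments above follow directly from how 'SAME' padding works: a stride-1 convolution preserves the spatial size, and a stride-s pooling outputs ceil(n/s). A small sketch of that arithmetic (the helper name is my own):

```python
import math

def same_out_size(n, stride):
    # 'SAME' padding in TensorFlow: output size = ceil(input size / stride)
    return math.ceil(n / stride)

size = 28
size = same_out_size(size, 1)  # conv1, stride 1 -> 28
size = same_out_size(size, 2)  # pool1, stride 2 -> 14
size = same_out_size(size, 1)  # conv2, stride 1 -> 14
size = same_out_size(size, 2)  # pool2, stride 2 -> 7
print(size)  # 7, matching the 7 * 7 * 64 flatten above
```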
2. A complete implementation (adapted from 《TensorFlow实战Google深度学习框架》)
LeNet5_inference.py
```python
import tensorflow as tf

# Model hyper-parameters
INPUT_NODE = 784
OUTPUT_NODE = 10

IMAGE_SIZE = 28
NUM_CHANNELS = 1
NUM_LABELS = 10

CONV1_DEEP = 32
CONV1_SIZE = 5
CONV2_DEEP = 64
CONV2_SIZE = 5
FC_SIZE = 512

# Forward pass of the LeNet-5-style model
def inference(input_tensor, train, regularizer):
    # First convolutional layer: kernel 5x5, 'SAME' padding, stride 1
    with tf.variable_scope('layer1-conv1'):
        conv1_weights = tf.get_variable(
            "weight", [CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv1_biases = tf.get_variable("bias", [CONV1_DEEP], initializer=tf.constant_initializer(0.0))
        conv1 = tf.nn.conv2d(input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))

    # First pooling layer: kernel 2x2, 'SAME' padding, stride 2
    with tf.name_scope("layer2-pool1"):
        pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # Second convolutional layer: kernel 5x5, 'SAME' padding, stride 1
    with tf.variable_scope("layer3-conv2"):
        conv2_weights = tf.get_variable(
            "weight", [CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP],
            initializer=tf.truncated_normal_initializer(stddev=0.1))
        conv2_biases = tf.get_variable("bias", [CONV2_DEEP], initializer=tf.constant_initializer(0.0))
        conv2 = tf.nn.conv2d(pool1, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))

    # Second pooling layer: kernel 2x2, 'SAME' padding, stride 2
    with tf.name_scope("layer4-pool2"):
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # Flatten the feature maps for the fully connected layers
    pool_shape = pool2.get_shape().as_list()
    nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]
    reshaped = tf.reshape(pool2, [pool_shape[0], nodes])

    # First fully connected layer (with L2 regularization and dropout)
    with tf.variable_scope('layer5-fc1'):
        fc1_weights = tf.get_variable("weight", [nodes, FC_SIZE],
                                      initializer=tf.truncated_normal_initializer(stddev=0.1))
        # Only the fully connected weights are regularized
        if regularizer is not None:
            tf.add_to_collection('losses', regularizer(fc1_weights))
        fc1_biases = tf.get_variable("bias", [FC_SIZE], initializer=tf.constant_initializer(0.1))
        fc1 = tf.nn.relu(tf.matmul(reshaped, fc1_weights) + fc1_biases)
        # Dropout is applied only during training
        if train:
            fc1 = tf.nn.dropout(fc1, 0.5)

    # Second fully connected layer (produces the logits)
    with tf.variable_scope('layer6-fc2'):
        fc2_weights = tf.get_variable("weight", [FC_SIZE, NUM_LABELS],
                                      initializer=tf.truncated_normal_initializer(stddev=0.1))
        if regularizer is not None:
            tf.add_to_collection('losses', regularizer(fc2_weights))
        fc2_biases = tf.get_variable("bias", [NUM_LABELS], initializer=tf.constant_initializer(0.1))
        logit = tf.matmul(fc1, fc2_weights) + fc2_biases

    return logit
```
LeNet5_train.py
```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import LeNet5_inference
import os
import numpy as np

# Training hyper-parameters
BATCH_SIZE = 100
LEARNING_RATE_BASE = 0.01
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001
TRAINING_STEPS = 6000
MOVING_AVERAGE_DECAY = 0.99
MODEL_SAVE_PATH = "LeNet_model/"
MODEL_NAME = "lenet_model"

def train(mnist):
    # Placeholder for the input as a 4-D tensor
    x = tf.placeholder(tf.float32, [
        BATCH_SIZE,
        LeNet5_inference.IMAGE_SIZE,
        LeNet5_inference.IMAGE_SIZE,
        LeNet5_inference.NUM_CHANNELS],
        name='x-input')
    y_ = tf.placeholder(tf.float32, [None, LeNet5_inference.OUTPUT_NODE], name='y-input')

    regularizer = tf.contrib.layers.l2_regularizer(REGULARIZATION_RATE)
    # train=True so that dropout is enabled during training
    y = LeNet5_inference.inference(x, True, regularizer)
    global_step = tf.Variable(0, trainable=False)

    # Loss function, learning rate, moving averages, and the training op
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,
        global_step,
        mnist.train.num_examples / BATCH_SIZE, LEARNING_RATE_DECAY,
        staircase=True)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    with tf.control_dependencies([train_step, variables_averages_op]):
        train_op = tf.no_op(name='train')

    # Saver for checkpointing the model
    saver = tf.train.Saver()
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        for i in range(TRAINING_STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            reshaped_xs = np.reshape(xs, (
                BATCH_SIZE,
                LeNet5_inference.IMAGE_SIZE,
                LeNet5_inference.IMAGE_SIZE,
                LeNet5_inference.NUM_CHANNELS))
            _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: reshaped_xs, y_: ys})
            if i % 1000 == 0:
                print("After %d training step(s), loss on training batch is %g." % (step, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)

def main(argv=None):
    mnist = input_data.read_data_sets("../../../datasets/MNIST_data", one_hot=True)
    train(mnist)

if __name__ == '__main__':
    main()
```
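With staircase=True, tf.train.exponential_decay holds the learning rate constant within each epoch (decay_steps = num_examples / BATCH_SIZE steps) and multiplies it by LEARNING_RATE_DECAY once per epoch, i.e. lr = base · decay^⌊step/decay_steps⌋. A sketch of that schedule (assuming the standard 55000-image MNIST training split; the function name is my own):

```python
def staircase_decay(base_lr, decay_rate, global_step, decay_steps):
    # staircase=True: integer division, so the rate only drops once per epoch
    return base_lr * decay_rate ** (global_step // decay_steps)

decay_steps = 55000 // 100  # steps per epoch at BATCH_SIZE = 100
print(staircase_decay(0.01, 0.99, 0, decay_steps))    # 0.01 throughout the first epoch
print(staircase_decay(0.01, 0.99, 549, decay_steps))  # still 0.01
print(staircase_decay(0.01, 0.99, 550, decay_steps))  # drops to 0.01 * 0.99
```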
LeNet5_eval.py
```python
import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import LeNet5_inference
import LeNet5_train
import numpy as np

# Interval (in seconds) between two evaluations
EVAL_INTERVAL_SECS = 10

def evaluate(mnist):
    with tf.Graph().as_default() as g:
        # Placeholder for the input as a 4-D tensor
        x = tf.placeholder(tf.float32, [
            mnist.validation.num_examples,
            LeNet5_inference.IMAGE_SIZE,
            LeNet5_inference.IMAGE_SIZE,
            LeNet5_inference.NUM_CHANNELS],
            name='x-input')
        y_ = tf.placeholder(tf.float32, [None, LeNet5_inference.OUTPUT_NODE], name='y-input')

        # Reshape the flat validation images into 4-D tensors
        reshaped_xs = np.reshape(mnist.validation.images, (
            mnist.validation.num_examples,
            LeNet5_inference.IMAGE_SIZE,
            LeNet5_inference.IMAGE_SIZE,
            LeNet5_inference.NUM_CHANNELS))
        validate_feed = {x: reshaped_xs, y_: mnist.validation.labels}

        # Accuracy computation (no dropout, no regularization at eval time)
        y = LeNet5_inference.inference(x, False, None)
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        # Restore the shadow (moving-average) values of the variables
        variable_averages = tf.train.ExponentialMovingAverage(LeNet5_train.MOVING_AVERAGE_DECAY)
        variables_to_restore = variable_averages.variables_to_restore()
        saver = tf.train.Saver(variables_to_restore)

        while True:
            with tf.Session() as sess:
                ckpt = tf.train.get_checkpoint_state(LeNet5_train.MODEL_SAVE_PATH)
                if ckpt and ckpt.model_checkpoint_path:
                    saver.restore(sess, ckpt.model_checkpoint_path)
                    global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                    accuracy_score = sess.run(accuracy, feed_dict=validate_feed)
                    print("After %s training step(s), validation accuracy = %g" % (global_step, accuracy_score))
                else:
                    print('No checkpoint file found')
                    return
            time.sleep(EVAL_INTERVAL_SECS)

def main(argv=None):
    mnist = input_data.read_data_sets("../../../datasets/MNIST_data", one_hot=True)
    evaluate(mnist)

if __name__ == '__main__':
    main()
```
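tf.train.ExponentialMovingAverage keeps a "shadow" copy of each variable, updated as shadow ← decay·shadow + (1−decay)·value; restoring those shadow values at evaluation time is what variables_to_restore() enables above. (When a num_updates argument is passed, as in the training script, TensorFlow actually uses min(decay, (1+num_updates)/(10+num_updates)) so the shadow warms up faster early on.) A minimal sketch of the basic update rule:

```python
def ema_update(shadow, value, decay=0.99):
    # One moving-average step: shadow <- decay * shadow + (1 - decay) * value
    return decay * shadow + (1 - decay) * value

# Starting from 0 and repeatedly observing 1.0, the shadow approaches 1:
# after n steps it equals 1 - decay**n
shadow = 0.0
for _ in range(4):
    shadow = ema_update(shadow, 1.0)
print(shadow)  # 1 - 0.99**4 ≈ 0.0394
```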
Implementing LeNet in PyTorch
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.datasets as dset
import torchvision.transforms as transforms
from torch.autograd import Variable
import torch.utils.data as Data
import numpy as np
import matplotlib.pyplot as plt

# Basic hyper-parameters
batch_size = 100
learning_rate = 0.001

# Load the training and test data
train_dataset = dset.MNIST(root='mnist_data', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = dset.MNIST(root='mnist_data', train=False, transform=transforms.ToTensor())

# Define the LeNet-style network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input channel, 6 output channels, 5x5 convolution kernels
        self.conv1 = nn.Conv2d(1, 6, 5, padding=1)
        self.conv2 = nn.Conv2d(6, 16, 5, padding=1)
        # Affine operations: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5x5 feature maps after the second pooling
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        # If the window is square, a single number is enough
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax internally,
        # so no explicit softmax is needed here
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

# Data loaders for the training and test sets
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

# Create the network and define the loss function
net = Net()
criterion = nn.CrossEntropyLoss()
# Use the Adam optimizer
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
num_epochs = 5

# Train for 5 epochs
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = Variable(images)
        labels = Variable(labels)
        optimizer.zero_grad()
        outputs = net(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        if (i + 1) % 100 == 0:
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                  % (epoch + 1, num_epochs, i + 1, len(train_dataset) // batch_size, loss.data.item()))

# Test accuracy
correct = 0
total = 0
for images, labels in test_loader:
    images = Variable(images)
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()
print('Accuracy: %.2f %%' % (100 * float(correct) / total))
```
The code above reaches 99.11% accuracy.
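The nn.Linear(16 * 5 * 5, 120) input size in the network above can be verified by tracing the spatial dimensions: each Conv2d with a 5x5 kernel and padding=1 shrinks the map by 2, and each 2x2 max-pool halves it, rounding down. A quick check (the helper name is my own):

```python
def conv_out(n, k, padding=0, stride=1):
    # Standard conv/pool output-size formula: floor((n + 2p - k) / s) + 1
    return (n + 2 * padding - k) // stride + 1

size = 28
size = conv_out(size, 5, padding=1)  # conv1 -> 26
size = conv_out(size, 2, stride=2)   # pool1 -> 13
size = conv_out(size, 5, padding=1)  # conv2 -> 11
size = conv_out(size, 2, stride=2)   # pool2 -> 5
print(16 * size * size)  # 400, matching nn.Linear(16 * 5 * 5, 120)
```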
An open question: what do an image's frequency-domain features contribute to the recognition task? As a quick experiment, the code below trains the same network on the log-magnitude spectrum of each image instead of the raw pixels.
```python
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
num_epochs = 5

# Train on the log-magnitude spectrum of each image instead of the raw pixels
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # 2-D FFT per image, then the log of the magnitude as the network input
        images = np.fft.fft2(images)
        images = np.log(np.abs(images))
        images = torch.tensor(images, dtype=torch.float32)
        images = Variable(images)
        labels = Variable(labels)
        optimizer.zero_grad()
        outputs = net(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        if (i + 1) % 100 == 0:
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                  % (epoch + 1, num_epochs, i + 1, len(train_dataset) // batch_size, loss.data.item()))

# Test accuracy on the spectra
correct = 0
total = 0
for images, labels in test_loader:
    images = np.fft.fft2(images)
    images = np.log(np.abs(images))
    images = torch.tensor(images, dtype=torch.float32)
    images = Variable(images)
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()
print('Accuracy: %.2f %%' % (100 * float(correct) / total))
```
The code above reaches at most 83.63% accuracy.
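One caveat with the preprocessing above: np.log(np.abs(...)) yields -inf whenever a frequency bin is exactly zero, which can destabilize training. A hedged variant that keeps the features finite (the function name and the eps value are my own choices):

```python
import numpy as np

def log_magnitude_spectrum(img, eps=1e-8):
    # |FFT| can be exactly 0 for some bins, so add eps before taking the log
    # to avoid feeding -inf values into the network
    spec = np.abs(np.fft.fft2(img))
    return np.log(spec + eps)

# A synthetic 28x28 "image": a bright 8x8 block on a black background
img = np.zeros((28, 28), dtype=np.float32)
img[10:18, 10:18] = 1.0
feat = log_magnitude_spectrum(img)
print(feat.shape)                 # (28, 28)
print(np.isfinite(feat).all())    # True
```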