Reading the Code Step by Step: MNIST Handwritten Digit Recognition in TensorFlow (Qiu Xipeng, Neural Networks and Deep Learning, code exercise chap5_CNN)

一枚射手座的程序媛

Last modified 2024-06-10 14:02:27

Category: Code Analysis. Tags: tensorflow, python, deep learning

First published 2023-04-23 20:29:14

Article link: https://blog.csdn.net/qq_43761715/article/details/130329169

A detailed walkthrough of MNIST handwritten digit recognition implemented with TensorFlow 1.15

Hi, I'm a Sagittarius programmer girl. I look forward to learning and exchanging ideas with everyone; feel free to add 3512724768.

Environment

tensorflow = 1.15, no GPU needed

'''
Description: 
Author: WangXiaoyo
version: tf15
Date: 2023-04-18 20:11:04
LastEditors: WangXiaoyo
LastEditTime: 2023-04-23 19:48:50
'''
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

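learning_rate = 1e-4  # learning rate
keep_prob_rate = 0.7  # probability that a unit is kept by dropout
max_epoch = 2000  # number of training steps (mini-batches), not full passes over the data
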
def compute_accuracy(v_xs, v_ys):
    global prediction
    y_pre = sess.run(prediction, feed_dict={xs: v_xs, keep_prob: 1})
    correct_prediction = tf.equal(tf.argmax(y_pre,1), tf.argmax(v_ys,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys, keep_prob: 1})
    return result

def weight_variable(shape):
    # Draw the initial weights from a truncated normal distribution (stddev 0.1)
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    # Initialize every bias to 0.1
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    # Stride 1 in every dimension, padding='SAME'
    # Hint: use tf.nn.conv2d
    # Zero-padding at the borders keeps the output height and width equal to the input's
    return tf.nn.conv2d(x,W,strides=[1,1,1,1],padding='SAME')

def max_pool_2x2(x):
    # 2x2 pooling window, stride 2, padding='SAME'
    # Hint: use tf.nn.max_pool
    # Stride 2 halves the output height and width
    return tf.nn.max_pool(x,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

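# Placeholders for the network inputs; 784 = 28*28 pixels per flattened image
# input_data already scales pixel values to [0, 1], so there is no further division by 255
# (a value fed to xs would have bypassed that division anyway)
xs = tf.placeholder(tf.float32, [None, 784])
ys = tf.placeholder(tf.float32, [None, 10])
keep_prob = tf.placeholder(tf.float32)
x_image = tf.reshape(xs, [-1, 28, 28, 1]) # single-channel (grayscale) 28*28 images; -1 lets the number of images vary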

# Convolutional layer 1
## conv1 layer ##

W_conv1 = weight_variable([7,7,1,32]) # patch 7x7, in size 1, out size 32: a 7*7 kernel, 1 input channel, 32 output channels
b_conv1 = bias_variable([32]) # out size = 32
h_conv1 = tf.nn.relu(conv2d(x_image,W_conv1) + b_conv1) # convolution; the exercise leaves the activation open, ReLU is used here; output = 28*28*32
h_pool1 = max_pool_2x2(h_conv1) # pooling; output = 14*14*32

# Convolutional layer 2
W_conv2 = weight_variable([5,5,32,64]) # patch 5x5, in size 32, out size 64: a 5*5 kernel; the 32 input channels are the previous layer's output channels
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1,W_conv2) + b_conv2) # convolution; the exercise leaves the activation open, ReLU is used here; output = 14*14*64
h_pool2 = max_pool_2x2(h_conv2) # pooling; output = 7*7*64

# Fully connected layer 1
## fc1 layer ##
W_fc1 = weight_variable([7*7*64, 1024]) # input size 7*7*64 (the flattened pool2 output); the output size 1024 is a free choice
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64]) # flatten the second pooling output into one vector of length 7*7*64 per image
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1) # a matrix multiplication here, not a convolution
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob) # dropout to reduce overfitting

# Fully connected layer 2
## fc2 layer ##
W_fc2 = weight_variable([1024, 10]) # last layer: input size 1024 from the previous layer, output size 10 (one per digit 0-9)
b_fc2 = bias_variable([10])
prediction = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2) # final prediction: softmax gives a probability for each class


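# Cross-entropy loss: measures the error between the true labels and the predicted probabilities
# Clipping keeps tf.log away from log(0), which would turn the loss into NaN
cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys * tf.log(tf.clip_by_value(prediction, 1e-10, 1.0)),
                                              axis=1))
# Minimize cross_entropy with the Adam optimizer
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)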

with tf.Session() as sess:
    # Create the session and initialize all variables
    init = tf.global_variables_initializer()
    sess.run(init)
    
    # Run max_epoch = 2000 training steps and report the accuracy on the first 1000 test images every 100 steps
    for i in range(max_epoch):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob:keep_prob_rate})
        if i % 100 == 0:
            print(compute_accuracy(
                mnist.test.images[:1000], mnist.test.labels[:1000]))
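
The cross-entropy line in the listing above can be checked by hand without TensorFlow. The sketch below is only an illustration: the logits and labels are made-up values, the helper names `softmax` and `cross_entropy` are hypothetical, and the `eps` guard is an assumption added here to avoid `log(0)`.

```python
import math

def softmax(logits):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot, probs, eps=1e-10):
    # -sum(y * log(p)) for one sample; eps is an illustrative guard against log(0).
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

# Hypothetical mini-batch: two samples, three classes.
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
labels = [[1, 0, 0], [0, 1, 0]]

probs = [softmax(row) for row in logits]
per_sample = [cross_entropy(y, p) for y, p in zip(labels, probs)]  # the inner reduce_sum over classes
mean_loss = sum(per_sample) / len(per_sample)                      # the outer reduce_mean over the batch
print(per_sample, mean_loss)
```

Because the labels are one-hot, each per-sample loss reduces to the negative log of the probability assigned to the true class.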

Accuracy calculation: compute_accuracy

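It is best to try this part yourself: take a small matrix and run the code step by step, and you will see exactly how the result is computed. Note that every call to this function adds new tf.argmax, tf.equal and tf.reduce_mean ops to the graph. That is harmless for the 20 evaluations in this script, but it would make the graph grow in a long-running loop.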

def compute_accuracy(v_xs, v_ys):
    global prediction
    y_pre = sess.run(prediction, feed_dict={xs: v_xs, keep_prob: 1})
    correct_prediction = tf.equal(tf.argmax(y_pre,1), tf.argmax(v_ys,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys, keep_prob: 1})
    return result

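v_xs: the input image tensor
v_ys: the true labels of the images, i.e. the ground truth
keep_prob: the probability that a unit is kept by dropout (not the drop rate). It is set to 1 here, so nothing is dropped during evaluation
tf.argmax(y_pre,1): works row by row and returns the index (starting from 0) of the largest value in each row of y_pre. In the example below, the largest value in the first row is 3 at index 2, and the largest value in the second row is 6 at index 2, so the output is [2,2]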
a = [[1,2,3],[4,5,6]]

print(sess.run(tf.argmax(a,1))) #[2 2]
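
To see what the axis argument changes without starting a session, here is a small pure-Python sketch of the same idea. The helpers `argmax_rows` and `argmax_cols` are hypothetical names made up for this illustration, not TensorFlow functions.

```python
def argmax_rows(matrix):
    # Like axis=1: the index of the largest value within each row.
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]

def argmax_cols(matrix):
    # Like axis=0: the row index of the largest value within each column.
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [max(range(n_rows), key=lambda r: matrix[r][c]) for c in range(n_cols)]

a = [[1, 2, 3], [4, 5, 6]]
print(argmax_rows(a))  # [2, 2]
print(argmax_cols(a))  # [1, 1, 1]
```

With one image per row and one class per column, the row-wise version is the one that yields a predicted class for each image.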

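tf.equal(a,b): compares the two tensors element by element and returns True where the elements are equal and False otherwise. The shapes must match or be broadcastable to each other, and the result has the same (broadcast) shape as the inputs. In the example below the two matrices have exactly the same shape and are compared element by element: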
a = [[1,2,3],[4,5,6]]
b = [[1,0,3],[1,5,1]]
print(sess.run(tf.equal(a,b)))

[[ True False  True]
 [False  True False]]

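In other words, correct_prediction is a tensor of bool values. In compute_accuracy both arguments are argmax results, so it is a 1-D vector with one entry per image rather than a 2-D matrix.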

tf.cast(correct_prediction, tf.float32): converts correct_prediction to float32, i.e. True => 1. and False => 0.
The trailing dot appears because the values are float32. Converting to float makes the averaging that follows possible.

tf.reduce_mean(): takes the mean, which is the accuracy. tf.cast() has already turned the data into float32 values that are all either 1 or 0, so the mean is simply the fraction of correct predictions.
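
The four steps (argmax, equal, cast, mean) can be replayed on a tiny hand-made batch. This is a minimal pure-Python sketch with made-up probabilities and labels; `argmax` and `accuracy` are hypothetical helper names, not part of the original script.

```python
def argmax(row):
    # Index of the largest value in one row.
    return max(range(len(row)), key=row.__getitem__)

def accuracy(y_pre, y_true):
    pred = [argmax(r) for r in y_pre]                # predicted class per image
    true = [argmax(r) for r in y_true]               # true class per image (labels are one-hot)
    correct = [p == t for p, t in zip(pred, true)]   # element-wise comparison -> bools
    as_float = [1.0 if c else 0.0 for c in correct]  # True => 1., False => 0.
    return sum(as_float) / len(as_float)             # mean = fraction of correct predictions

# Hypothetical batch of four images and three classes.
y_pre = [[0.1, 0.7, 0.2],
         [0.8, 0.1, 0.1],
         [0.3, 0.3, 0.4],
         [0.2, 0.5, 0.3]]
y_true = [[0, 1, 0],
          [1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]
print(accuracy(y_pre, y_true))  # 0.5
```

Two of the four predicted classes match the labels, so the mean of [1., 1., 0., 0.] is 0.5.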
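feed_dict={xs: v_xs, ys: v_ys, keep_prob: 1}: this is effectively an assignment. For this one sess.run call, each placeholder is replaced by the data supplied for it, which is how the data gets loaded into the graph.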
The whole pipeline:
convolution -> pooling -> convolution -> pooling -> fully connected -> fully connected -> softmax prediction
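
The output sizes quoted in the code comments (28*28*32, 14*14*32, 14*14*64, 7*7*64) follow from the 'SAME' padding rule, where the output size is the input size divided by the stride, rounded up. The sketch below traces them in plain Python and also counts the trainable parameters. `same_out` and `trace_shapes` are hypothetical helpers written for this illustration.

```python
import math

def same_out(size, stride):
    # With padding='SAME', output size = ceil(input size / stride).
    return math.ceil(size / stride)

def trace_shapes(height=28, width=28, channels=1):
    # (name, stride, output channels); pooling keeps the channel count.
    layers = [("conv1", 1, 32), ("pool1", 2, None),
              ("conv2", 1, 64), ("pool2", 2, None)]
    shapes = []
    for name, stride, out_channels in layers:
        height, width = same_out(height, stride), same_out(width, stride)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, (height, width, channels)))
    return shapes

shapes = trace_shapes()
for name, shape in shapes:
    print(name, shape)

h, w, c = shapes[-1][1]
flat = h * w * c
print("flattened size:", flat)  # 3136

# Weights plus biases for each trainable layer.
params = {
    "conv1": 7 * 7 * 1 * 32 + 32,
    "conv2": 5 * 5 * 32 * 64 + 64,
    "fc1": flat * 1024 + 1024,
    "fc2": 1024 * 10 + 10,
}
print(params, sum(params.values()))
```

Almost all of the parameters sit in the first fully connected layer, which is also the layer the script applies dropout to.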
That is my own understanding of the code. If you spot any mistakes, please point them out!