MNIST Handwritten Digit Recognition with TensorFlow 2.0

AndSonder


Article link: https://blog.csdn.net/python_LC_nohtyp/article/details/104107648

Note: the complete code is at the end of the article.

Introduction to the MNIST dataset

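The MNIST dataset was published by (Lecun, Bottou, Bengio, & Haffner, 1998). It contains handwritten images of the 10 digits 0~9, about 7,000 images per digit, collected from real handwriting in different writing styles, 70,000 images in total. Of these, 60,000 images form the training set $\mathbb{D}^{train}$ (Training Set), used to train the model, and the remaining 10,000 images form the test set $\mathbb{D}^{test}$ (Test Set), used for prediction or testing. Together, the training set and the test set make up the whole MNIST dataset.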

Because handwritten digit images carry fairly simple information, every image is scaled to 28 × 28 pixels and only the grayscale information is kept.

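Now let's look at how an image is represented. An image has ℎ rows (Height/Row) and 𝑤 columns (Width/Column), and each position stores a pixel value. Pixel values are usually integers in 0~255 expressing color intensity, where 0 is the lowest intensity and 255 the highest. In a color image, each pixel holds the intensities of three channels, R, G, and B, for the red, green, and blue channels. So unlike a grayscale image, each pixel is represented by a one-dimensional vector of length 3 whose elements are the R, G, and B intensities at that pixel, and a color image is stored as a tensor of shape [ℎ, 𝑤, 3] (a tensor can loosely be thought of as a 3-dimensional array here). In a grayscale image a single value represents the gray intensity, for example 0 for pure black and 255 for pure white, so a two-dimensional matrix of shape [ℎ, 𝑤] is enough to represent one image (it can also be stored as a tensor of shape [ℎ, 𝑤, 1]). Figure 3.3 shows the matrix contents of an image of the digit 8: black pixels are represented by 0, gray levels by values in 0~255, and the whiter a pixel is, the larger the value at the corresponding matrix position.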

[Figure 3.3: matrix contents of an image of the digit 8]
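To make these shapes concrete, here is a small sketch that builds a grayscale image and a color image of the same size. NumPy is used only for illustration, and the array names are made up for this example.

```python
import numpy as np

h, w = 28, 28  # MNIST image height and width

# Grayscale image: one intensity per pixel, 0 = black, 255 = white
gray = np.zeros((h, w), dtype=np.uint8)
gray[10:18, 12:16] = 255  # paint a white rectangle on a black background

# Color image: three intensities (R, G, B) per pixel
color = np.zeros((h, w, 3), dtype=np.uint8)
color[..., 0] = 255  # set the red channel to maximum everywhere

# A grayscale image can also carry an explicit channel axis of size 1
gray_with_channel = gray[..., np.newaxis]

print(gray.shape, color.shape, gray_with_channel.shape)
```

The last array holds exactly the same pixel values as `gray`; only the shape differs, which is why either layout can be used for grayscale data.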

Network architecture

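This article uses a simple three-layer neural network:
$out = relu\{relu\{relu[X@W_1+b_1]@W_2+b_2\}@W_3+b_3\}$
The activation on the final output out is optional; the code in this article leaves the output layer without one.
The dataset is the MNIST handwritten digit image set. The network has 784 input nodes; the first layer has 256 output nodes, the second layer 128, and the third layer 10, one value per digit class, indicating how likely the current sample is to belong to each of the 10 classes.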
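The layer sizes described above can be checked with a quick shape walk-through. This is a NumPy sketch with random weights and a made-up batch of 4 inputs, not the TensorFlow code used later in the article; the output layer is left without an activation here.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Element-wise max(z, 0)
    return np.maximum(z, 0.0)

batch = 4                                   # hypothetical batch size
X = rng.random((batch, 784))                # flattened 28x28 images

W1, b1 = rng.normal(0, 0.1, (784, 256)), np.zeros(256)
W2, b2 = rng.normal(0, 0.1, (256, 128)), np.zeros(128)
W3, b3 = rng.normal(0, 0.1, (128, 10)), np.zeros(10)

h1 = relu(X @ W1 + b1)    # [batch, 784] -> [batch, 256]
h2 = relu(h1 @ W2 + b2)   # [batch, 256] -> [batch, 128]
out = h2 @ W3 + b3        # [batch, 128] -> [batch, 10]

print(h1.shape, h2.shape, out.shape)
```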

Code

Import the required packages

from matplotlib import pyplot as mp
import tensorflow as tf
from tensorflow.keras import datasets, layers, optimizers

Preprocessing function

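Data loaded from keras.datasets usually does not match the input format a model expects, so you need to implement a preprocessing function with your own logic. The Dataset object provides the map(func) utility, which makes it easy to apply user-defined preprocessing implemented in the function func: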

# The preprocessing logic is implemented in preprocess; just pass the function reference
train_db = train_db.map(preprocess)

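Consider the MNIST handwritten digit images. After loading from keras.datasets and applying .batch(), the images x have shape [𝑏, 28, 28] with pixels stored as integers in 0~255; the labels have shape [𝑏] and use integer encoding (each label is the digit itself). A neural network usually expects its input standardized to a range near 0 such as [0,1] or [−1,1], and, depending on the network, the [28,28] input must be reshaped into a valid format. The labels can be one-hot encoded either during preprocessing or when the loss is computed.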

Here we map the MNIST image data to 𝑥 ∈ [0,1] and reshape the view to [𝑏, 28 ∗ 28]; for the labels y, we do the one-hot encoding inside the preprocessing function:

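def preprocess(x, y):  # user-defined preprocessing function
    # x and y are passed in automatically, with shapes [b, 28, 28] and [b]
    # scale pixel values to 0~1
    x = tf.cast(x, dtype=tf.float32) / 255.
    x = tf.reshape(x, [-1, 28*28])  # flatten each image
    y = tf.cast(y, dtype=tf.int32)  # convert to an integer tensor
    y = tf.one_hot(y, depth=10)  # one-hot encoding
    # the returned x, y replace the incoming x, y, which is how the preprocessing takes effect
    return x, y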
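As an illustration of what the scaling, flattening, and one-hot steps do to the data, here is a NumPy sketch on a made-up batch of two images. It also shows the [−1, 1] scaling mentioned above, which the article's preprocess function does not use.

```python
import numpy as np

# Hypothetical mini-batch: one all-black and one all-white 28x28 image
x = np.stack([np.zeros((28, 28)), np.full((28, 28), 255)]).astype(np.uint8)
y = np.array([3, 7])                       # integer-encoded labels

x01 = x.astype(np.float32) / 255.0         # scale to [0, 1]
x_pm1 = 2 * x01 - 1                        # alternative: scale to [-1, 1]
x_flat = x01.reshape(-1, 28 * 28)          # [b, 28, 28] -> [b, 784]

# One-hot encoding: row i of the identity matrix encodes class i
y_onehot = np.eye(10, dtype=np.float32)[y]

print(x_flat.shape, x_pm1.min(), x_pm1.max())
print(y_onehot)
```

Each row of `y_onehot` has a single 1 at the position of the label, which is the same layout tf.one_hot(y, depth=10) produces.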

Loading the handwritten digit dataset and processing the data

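(x, y), (x_test, y_test) = datasets.mnist.load_data()  # load the MNIST data
batchsz = 512
train_db = tf.data.Dataset.from_tensor_slices((x, y))  # convert to a Dataset object
train_db = train_db.shuffle(1000)  # shuffle with a buffer of 1000 samples
train_db = train_db.batch(batchsz)  # group samples into batches
train_db = train_db.map(preprocess)  # apply the preprocessing
train_db = train_db.repeat(20)  # iterate over the data 20 times (20 epochs)
test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_db = test_db.shuffle(1000).batch(batchsz).map(preprocess)
x, y = next(iter(train_db))  # take one batch to check the shapes
print('train sample:', x.shape, y.shape)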

Shuffling, batching, and related topics are covered in another post of mine: https://blog.csdn.net/python_LC_nohtyp/article/details/104106498
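It can help to know how many training steps this pipeline yields. The following back-of-the-envelope sketch uses only the numbers from the article (60,000 training images, batch size 512, repeat(20), and logging every 100 steps as in the training loop); the variable names other than batchsz are made up, and it assumes the final partial batch of each epoch is kept, which is the default behavior of batch().

```python
import math

num_train = 60000   # MNIST training images
batchsz = 512
epochs = 20         # from train_db.repeat(20)
eval_every = 100    # from `step % 100 == 0`

steps_per_epoch = math.ceil(num_train / batchsz)   # the last batch is smaller
last_batch_size = num_train - (steps_per_epoch - 1) * batchsz
total_steps = steps_per_epoch * epochs
num_evals = math.ceil(total_steps / eval_every)    # steps 0, 100, 200, ...

print(steps_per_epoch, last_batch_size, total_steps, num_evals)
# prints: 118 96 2360 24
```

So the loss and accuracy lists each end up with 24 entries, one for every 100th step.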

The main() function

For this network we set the learning rate lr=1e-2 and use two lists, accs and losses, to store the accuracy and the loss for plotting later.

Defining the network layers

The network has 784 input nodes and 10 output nodes.

# 784 => 256
w1, b1 = tf.Variable(tf.random.normal([784, 256], stddev=0.1)), tf.Variable(tf.zeros([256]))
# 256 => 128
w2, b2 = tf.Variable(tf.random.normal([256, 128], stddev=0.1)), tf.Variable(tf.zeros([128]))
# 128 => 10
w3, b3 = tf.Variable(tf.random.normal([128, 10], stddev=0.1)), tf.Variable(tf.zeros([10]))

Training loop

Now we run the update loop: a for loop iterates over the train_db built above and updates w1, w2, w3, b1, b2, and b3.

for step, (x, y) in enumerate(train_db):
	...

Everything below goes inside the for loop:

First, flatten the image tensor (preprocess has already done this, so the reshape here is only a safeguard):

x = tf.reshape(x, (-1, 784))

Next, build the forward pass of the network and compute the loss:

with tf.GradientTape() as tape:
    # layer1.
    h1 = x @ w1 + b1
    h1 = tf.nn.relu(h1)
    # layer2
    h2 = h1 @ w2 + b2
    h2 = tf.nn.relu(h2)
    # output
    out = h2 @ w3 + b3
    # compute loss
    # [b, 10] - [b, 10]
    loss = tf.square(y - out)
    # [b, 10] => scalar
    loss = tf.reduce_mean(loss)
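The loss above is a mean squared error taken over every element of the [b, 10] difference. A tiny worked example, written in NumPy with made-up numbers and only 2 classes so the arithmetic is easy to follow:

```python
import numpy as np

# Two samples, two classes (hypothetical values for illustration)
y = np.array([[0.0, 1.0],
              [1.0, 0.0]])      # one-hot labels
out = np.array([[0.5, 0.5],
                [1.0, 0.0]])    # network outputs

squared = np.square(y - out)    # element-wise squared error, shape [2, 2]
loss = squared.mean()           # average over all 4 elements -> scalar

print(squared)
print(loss)   # (0.25 + 0.25 + 0 + 0) / 4 = 0.125
```

The second sample is predicted perfectly and contributes nothing; the first contributes 0.25 twice, and the mean over all four entries gives 0.125.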

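Compute the gradients (partial derivatives) with automatic differentiation and update the parameters.

The parameters are updated according to the formula:
$\theta' = \theta - \eta \cdot \frac{\partial L}{\partial \theta}$
where $\eta$ is the learning rate lr.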

# compute gradients
grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
# update parameters: p = p - lr * g
for p, g in zip([w1, b1, w2, b2, w3, b3], grads):
    p.assign_sub(lr * g)
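To see the update rule in isolation, here is a minimal sketch of a single gradient-descent step on a one-parameter toy loss $L(\theta) = (\theta - 3)^2$. The toy loss, the starting value, and the learning rate of 0.1 are assumptions chosen for easy arithmetic; the API calls are the same ones used in the article.

```python
import tensorflow as tf

lr = 0.1
theta = tf.Variable(0.0)

with tf.GradientTape() as tape:
    loss = tf.square(theta - 3.0)   # L(theta) = (theta - 3)^2

grad = tape.gradient(loss, theta)   # dL/dtheta = 2 * (theta - 3) = -6 at theta = 0
theta.assign_sub(lr * grad)         # theta' = 0 - 0.1 * (-6) = 0.6

print(float(grad.numpy()), float(theta.numpy()))
```

The parameter moves from 0 toward the minimum at 3; repeating the step would keep shrinking the distance, which is what the loop over [w1, b1, w2, b2, w3, b3] does for every weight of the network.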

Whenever step is divisible by 100, print the loss and append it to the list; the accuracy is computed at the same point:

# print
if step % 100 == 0:
    print(step, 'loss:', float(loss))
    losses.append(float(loss))
if step % 100 == 0:
    ...

Next, let's look at what goes inside the second if.

First, define two variables used to compute the accuracy:

total, total_correct = 0., 0

Then iterate over the test set to obtain the accuracy.

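We feed the test images through the current network and compare the result with the labels. The network outputs a tensor of shape [b,10], where b is the number of samples in the batch and the 10 values in each row are the scores for the 10 classes, so we take the class with the largest score as the prediction.
We use tf.argmax to pick the index of the largest value, that is, the most likely class of each sample:
pred = tf.argmax(out, axis=1)
The labels y were already one-hot encoded during preprocessing, which is not actually needed for testing, so tf.argmax is used again to recover the integer-encoded labels y:
y = tf.argmax(y, axis=1)
tf.equal compares the two element by element:
correct = tf.equal(pred, y)
Summing the number of True values (cast to 1) in the comparison gives the number of correct predictions:
total_correct += tf.reduce_sum(tf.cast(correct,dtype=tf.int32)).numpy()
Dividing the number of correct predictions by the total number of test samples gives the accuracy:
print(step, 'Evaluate Acc:', total_correct/total)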
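The same argmax, compare, and count steps can be traced by hand on a tiny example. This is a NumPy sketch with three made-up samples and three classes; the values are chosen only for illustration.

```python
import numpy as np

# Hypothetical network outputs for 3 samples and 3 classes
out = np.array([[0.1, 0.7, 0.2],
                [0.8, 0.1, 0.1],
                [0.3, 0.3, 0.4]])
# One-hot labels: the true classes are 1, 0, 1
y_onehot = np.array([[0, 1, 0],
                     [1, 0, 0],
                     [0, 1, 0]])

pred = np.argmax(out, axis=1)        # predicted class per sample -> [1, 0, 2]
y = np.argmax(y_onehot, axis=1)      # back to integer labels     -> [1, 0, 1]
correct = np.equal(pred, y)          # [True, True, False]

total_correct = int(correct.astype(np.int32).sum())
total = out.shape[0]
print(pred, y, total_correct / total)
```

Two of the three predictions match, so the accuracy is 2/3. In the article's loop, total_correct and total are accumulated over all test batches before the division.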

if step % 100 == 0:
    # evaluate/test
    total, total_correct = 0., 0
    # compute accuracy over the test set
    for x, y in test_db:
        # layer1.
        h1 = x @ w1 + b1
        h1 = tf.nn.relu(h1)
        # layer2
        h2 = h1 @ w2 + b2
        h2 = tf.nn.relu(h2)
        # output
        out = h2 @ w3 + b3
        # [b, 10] => [b]
        pred = tf.argmax(out, axis=1)
        # convert one_hot y to number y
        y = tf.argmax(y, axis=1)
        # bool type
        correct = tf.equal(pred, y)
        # bool tensor => int tensor => numpy
        total_correct += tf.reduce_sum(tf.cast(correct, dtype=tf.int32)).numpy()
        total += x.shape[0]
    print(step, 'Evaluate Acc:', total_correct / total)
    accs.append(total_correct / total)

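That completes the training loop.
With this simple 3-layer neural network, after 20 epochs of training we reached 87.25% accuracy on the test set. More complex network models, data augmentation, and careful hyperparameter tuning can yield higher performance.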

Generating the SVG plots

mp.figure()
x = [i * 100 for i in range(len(losses))]  # losses are recorded every 100 steps
mp.plot(x, losses, color='C0', marker='s', label='train')
mp.ylabel('MSE')
mp.xlabel('Step')
mp.legend()
mp.savefig('train.svg')
mp.figure()
mp.plot(x, accs, color='C1', marker='s', label='test')
mp.ylabel('Acc')
mp.xlabel('Step')
mp.legend()
mp.savefig('test.svg')

Complete code

from matplotlib import pyplot as mp
import tensorflow as tf
from tensorflow.keras import datasets, layers, optimizers


def preprocess(x, y):
    """
    Preprocessing function: scale, flatten, and one-hot encode a batch.
    """
    # [b, 28, 28], [b]
    print(x.shape, y.shape)
    x = tf.cast(x, dtype=tf.float32) / 255.
    x = tf.reshape(x, [-1, 28 * 28])  # flatten each image
    y = tf.cast(y, dtype=tf.int32)
    y = tf.one_hot(y, depth=10)
    return x, y


(x, y), (x_test, y_test) = datasets.mnist.load_data()  # load the MNIST data
print('x:', x.shape, 'y:', y.shape, 'x test:', x_test.shape, 'y test:', y_test.shape)

batchsz = 512
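train_db = tf.data.Dataset.from_tensor_slices((x, y))  # convert to a Dataset object
train_db = train_db.shuffle(1000)  # shuffle with a buffer of 1000 samples
train_db = train_db.batch(batchsz)  # group samples into batches
train_db = train_db.map(preprocess)  # apply the preprocessing
train_db = train_db.repeat(20)  # iterate over the data 20 times (20 epochs)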
test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_db = test_db.shuffle(1000).batch(batchsz).map(preprocess)
x, y = next(iter(train_db))
print('train sample:', x.shape, y.shape)


def main():
    # learning rate
    lr = 1e-2
    accs, losses = [], []
    # 784 => 256
    w1, b1 = tf.Variable(tf.random.normal([784, 256], stddev=0.1)), tf.Variable(tf.zeros([256]))
    # 256 => 128
    w2, b2 = tf.Variable(tf.random.normal([256, 128], stddev=0.1)), tf.Variable(tf.zeros([128]))
    # 128 => 10
    w3, b3 = tf.Variable(tf.random.normal([128, 10], stddev=0.1)), tf.Variable(tf.zeros([10]))
    for step, (x, y) in enumerate(train_db):
        # [b, 28, 28] => [b, 784]
        x = tf.reshape(x, (-1, 784))
        with tf.GradientTape() as tape:
            # layer1.
            h1 = x @ w1 + b1
            h1 = tf.nn.relu(h1)
            # layer2
            h2 = h1 @ w2 + b2
            h2 = tf.nn.relu(h2)
            # output
            out = h2 @ w3 + b3
            # compute loss
            # [b, 10] - [b, 10]
            loss = tf.square(y - out)
            # [b, 10] => scalar
            loss = tf.reduce_mean(loss)
        # compute gradients
        grads = tape.gradient(loss, [w1, b1, w2, b2, w3, b3])
        # update parameters: p = p - lr * g
        for p, g in zip([w1, b1, w2, b2, w3, b3], grads):
            p.assign_sub(lr * g)

        # print
        if step % 100 == 0:
            print(step, 'loss:', float(loss))
            losses.append(float(loss))

        if step % 100 == 0:
            # evaluate/test
            total, total_correct = 0., 0
            # compute accuracy over the test set
            for x, y in test_db:
                # layer1.
                h1 = x @ w1 + b1
                h1 = tf.nn.relu(h1)
                # layer2
                h2 = h1 @ w2 + b2
                h2 = tf.nn.relu(h2)
                # output
                out = h2 @ w3 + b3
                # [b, 10] => [b]
                pred = tf.argmax(out, axis=1)
                # convert one_hot y to number y
                y = tf.argmax(y, axis=1)
                # bool type
                correct = tf.equal(pred, y)
                # bool tensor => int tensor => numpy
                total_correct += tf.reduce_sum(tf.cast(correct, dtype=tf.int32)).numpy()
                total += x.shape[0]
            print(step, 'Evaluate Acc:', total_correct / total)
            accs.append(total_correct / total)

    mp.figure()
    x = [i * 100 for i in range(len(losses))]  # losses are recorded every 100 steps
    mp.plot(x, losses, color='C0', marker='s', label='train')
    mp.ylabel('MSE')
    mp.xlabel('Step')
    mp.legend()
    mp.savefig('train.svg')

    mp.figure()
    mp.plot(x, accs, color='C1', marker='s', label='test')
    mp.ylabel('Acc')
    mp.xlabel('Step')
    mp.legend()
    mp.savefig('test.svg')


if __name__ == '__main__':
    main()
