Step by Step: MNIST Recognition with TensorFlow

Lantern2020

Published 2024-09-30 16:10:08

Original link: https://blog.csdn.net/Lantern2020/article/details/142657684

1. MNIST: Introduction and Loading

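MNIST (Modified National Institute of Standards and Technology database) is a computer vision dataset built from data of the US National Institute of Standards and Technology (NIST). The training set consists of digits handwritten by 250 different people, 50% of them high school students and 50% employees of the Census Bureau. The test set contains handwritten digits in the same proportions. The images show the handwritten Arabic numerals 0 to 9. MNIST consists of a training set (one training image file and one training label file) and a test set (one test image file and one test label file); the training set has 60,000 samples and the test set has 10,000.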

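Download: http://yann.lecun.com/exdb/mnist/. The files on the official site are gz archives; decompressing them yields the raw files. The dataset consists of 4 files, holding the 60,000 training images, 60,000 training labels, 10,000 test images, and 10,000 test labels. The data is thus split into 60,000 rows of training data and 10,000 rows of test data (mnist.test); in the TensorFlow tutorial split, the 60,000 training rows are further divided into 55,000 training rows (mnist.train) and 5,000 validation rows (mnist.validation).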
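The archives can be decompressed in Python with the standard gzip module. This is a minimal sketch: `decompress_gz` is a hypothetical helper name, and it assumes the downloaded files keep their official names (for example `train-images-idx3-ubyte.gz`).

```python
import gzip
import shutil


def decompress_gz(gz_path, out_path):
    """Decompress one .gz archive (e.g. a downloaded MNIST file) into out_path."""
    with gzip.open(gz_path, 'rb') as f_in, open(out_path, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)  # stream the bytes instead of loading everything at once
    return out_path


# Hypothetical usage, assuming the archive sits in the current directory:
# decompress_gz('train-images-idx3-ubyte.gz', 'train-images-idx3-ubyte')
```

Alternatively, `gzip.open(path, 'rb').read()` returns the same decompressed bytes that `open(...).read()` returns for the extracted file, so the archives could also be read directly.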

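To extract the pixel values from the MNIST files correctly, the following must be settled:

The byte order of the binary files (big-endian or little-endian); MNIST stores all its integers big-endian (most significant byte first).
Function for reading the binary data: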
```
bin_data = open(filename, 'rb').read()
```

File format: how the offset relates to the file header and the file contents, and the function used to advance the offset:

```
struct.calcsize(fmt_header)  # number of bytes needed to pack data according to the format string fmt_header
```

Function for unpacking packed data from a bytes object:

```
# Unpack the header fields from the bytes object bin_data, starting at byte position offset
magic_number, num_images, num_rows, num_cols = struct.unpack_from(fmt_header, bin_data, offset)
```
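Why the byte order matters shows up on the very first field of the image file, its magic number 2051 (0x00000803), stored most significant byte first. A small sketch with four hand-written bytes:

```python
import struct

# The image file's magic number 2051 (0x00000803) as stored on disk, most significant byte first
raw = bytes([0x00, 0x00, 0x08, 0x03])

big = struct.unpack_from('>i', raw, 0)[0]     # big-endian: the correct reading
little = struct.unpack_from('<i', raw, 0)[0]  # little-endian: the wrong byte order
print(big, little)                            # 2051 50855936

# '>iiii' is four standard-size 32-bit integers, so the image header is 16 bytes long
print(struct.calcsize('>iiii'))               # 16
```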

Reference post: tensorflow入门数据集:mnist详解_tensorflow mnist-CSDN博客 (CSDN post "TensorFlow starter dataset: MNIST explained")

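Image file:
Header read in big-endian order with ">iiii" (">" selects big-endian; "iiii" reads four consecutive 32-bit integers, i being a 32-bit integer).
The first 16 bytes (offsets 0-15) form the header (magic number, number of images, number of rows, i.e. image height, and number of columns, i.e. image width); the pixel values follow. From the header, image_size = rows * cols.
Pixel values are unsigned bytes, one byte per pixel, so each handwritten-digit image occupies image_size bytes: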
```
fmt_image = '>' + str(image_size) + 'B'
```
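Because every image has the same size, image k (counting from 0) starts at offset 16 + k * image_size, so a single image can be read without looping over the earlier ones. A minimal sketch; `read_image_at` is a hypothetical helper, and the 2x3 buffer is synthetic test data, not real MNIST content:

```python
import struct

import numpy as np

HEADER_SIZE = struct.calcsize('>iiii')  # 16 bytes: magic number, count, rows, cols


def read_image_at(bin_data, k):
    """Return image k (0-based) of an idx3-ubyte buffer as a (rows, cols) array."""
    _, num_images, num_rows, num_cols = struct.unpack_from('>iiii', bin_data, 0)
    if not 0 <= k < num_images:
        raise IndexError('image index %d out of range' % k)
    image_size = num_rows * num_cols
    fmt_image = '>' + str(image_size) + 'B'
    offset = HEADER_SIZE + k * image_size  # skip the header and the k images before this one
    return np.array(struct.unpack_from(fmt_image, bin_data, offset)).reshape((num_rows, num_cols))


# Synthetic buffer: header for two 2x3 images, then pixel values 0..11
demo = struct.pack('>iiii', 2051, 2, 2, 3) + bytes(range(12))
print(read_image_at(demo, 1))  # the second image holds pixels 6..11
```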

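Label file:
Header read in big-endian order with ">ii" (">" selects big-endian; "ii" reads two consecutive 32-bit integers).
The first 8 bytes (offsets 0-7) form the header (magic number, number of labels); the label values follow.
Label values are unsigned bytes, so each label occupies 1 byte.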
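Together, the two header layouts fix the exact size of each uncompressed file, which gives a quick check that a download and decompression are complete. A sketch of the arithmetic, assuming 28x28 images; `expected_idx_size` and `file_size_ok` are hypothetical helpers:

```python
import os


def expected_idx_size(num_items, item_bytes, header_bytes):
    """Size in bytes of an uncompressed MNIST file: header plus one fixed-size record per item."""
    return header_bytes + num_items * item_bytes


def file_size_ok(path, num_items, item_bytes, header_bytes):
    """True if the file on disk has exactly the expected size."""
    return os.path.getsize(path) == expected_idx_size(num_items, item_bytes, header_bytes)


print(expected_idx_size(60000, 28 * 28, 16))  # training images: 47040016
print(expected_idx_size(60000, 1, 8))         # training labels: 60008
print(expected_idx_size(10000, 28 * 28, 16))  # test images: 7840016
print(expected_idx_size(10000, 1, 8))         # test labels: 10008
```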

The resulting reading functions:

```
import struct

import numpy as np


def load_mnist_images(filename):
    # Read the raw binary data (the with-block closes the file afterwards)
    with open(filename, 'rb') as f:
        bin_data = f.read()

    # Parse the header: magic number, number of images, image height (rows), image width (cols)
    offset = 0
    fmt_header = '>iiii'
    magic_number, num_images, num_rows, num_cols = struct.unpack_from(fmt_header, bin_data, offset)
    print('Magic number: %d, images: %d, image size: %d*%d' % (magic_number, num_images, num_rows, num_cols))

    # Parse the pixel data
    image_size = num_rows * num_cols
    offset += struct.calcsize(fmt_header)  # skip the 16-byte header: the pixel data starts at offset 0016
    # print(offset)
    fmt_image = '>' + str(image_size) + 'B'  # image_size unsigned bytes make up one image
    # print(fmt_image, offset, struct.calcsize(fmt_image))
    images = np.empty((num_images, num_rows, num_cols))
    for j in range(num_images):
        if (j + 1) % 10000 == 0:
            print('Parsed %d images' % (j + 1))
            # print(offset)
        images[j] = np.array(struct.unpack_from(fmt_image, bin_data, offset)).reshape((num_rows, num_cols))
        offset += struct.calcsize(fmt_image)

    return images


def load_mnist_labels(filename):
    with open(filename, 'rb') as f:
        bin_data = f.read()

    # Parse the header: magic number and number of labels
    offset = 0
    fmt_header = '>ii'
    magic_number, num_labels = struct.unpack_from(fmt_header, bin_data, offset)
    print('Magic number: %d, labels: %d' % (magic_number, num_labels))

    # Parse the labels: one unsigned byte each
    offset += struct.calcsize(fmt_header)
    fmt_image = '>B'
    labels = np.empty(num_labels)
    for j in range(num_labels):
        if (j + 1) % 10000 == 0:
            print('Parsed %d labels' % (j + 1))
        labels[j] = struct.unpack_from(fmt_image, bin_data, offset)[0]
        offset += struct.calcsize(fmt_image)
    return labels
```
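The loops above call struct.unpack_from once per image or label (60,000 times for the training set). Since pixels and labels are plain unsigned bytes, NumPy can also read them in one step with np.frombuffer. A sketch of that alternative; the function names are hypothetical, and it additionally checks the magic numbers 2051 (images) and 2049 (labels):

```python
import struct

import numpy as np


def load_mnist_images_fast(filename):
    """Read an idx3-ubyte image file into a (num_images, rows, cols) uint8 array."""
    with open(filename, 'rb') as f:
        bin_data = f.read()
    magic_number, num_images, num_rows, num_cols = struct.unpack_from('>iiii', bin_data, 0)
    if magic_number != 2051:
        raise ValueError('not an MNIST image file (magic number %d)' % magic_number)
    # Skip the 16-byte header and view the remaining bytes as unsigned 8-bit pixels
    pixels = np.frombuffer(bin_data, dtype=np.uint8,
                           count=num_images * num_rows * num_cols, offset=16)
    return pixels.reshape((num_images, num_rows, num_cols))


def load_mnist_labels_fast(filename):
    """Read an idx1-ubyte label file into a (num_labels,) uint8 array."""
    with open(filename, 'rb') as f:
        bin_data = f.read()
    magic_number, num_labels = struct.unpack_from('>ii', bin_data, 0)
    if magic_number != 2049:
        raise ValueError('not an MNIST label file (magic number %d)' % magic_number)
    # Skip the 8-byte header; each label is a single unsigned byte
    return np.frombuffer(bin_data, dtype=np.uint8, count=num_labels, offset=8)
```

The returned arrays are uint8 and read-only, because they share memory with the immutable bytes object; `.astype(np.float64)` gives a writable copy with the same dtype as the loop version.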

Note:

struct.unpack_from returns a tuple, so the result must be indexed (even when it holds only one element), and a tuple cannot be reshaped directly: convert it to a NumPy array first. The two corresponding lines are:

```
images[j] = np.array(struct.unpack_from(fmt_image, bin_data, offset)).reshape((num_rows, num_cols))

labels[j] = struct.unpack_from(fmt_image, bin_data, offset)[0]
```
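A quick illustration of both points, using a few hand-written bytes in place of real file data:

```python
import struct

import numpy as np

buf = bytes([7, 0, 255, 1, 2, 3])  # six arbitrary unsigned bytes

one = struct.unpack_from('>B', buf, 0)    # a 1-tuple: (7,)
label = one[0]                             # index it to get the plain int 7
many = struct.unpack_from('>6B', buf, 0)  # a 6-tuple, which has no reshape method
image = np.array(many).reshape((2, 3))    # convert to a NumPy array first, then reshape
print(one, label, image.shape)            # (7,) 7 (2, 3)
```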

2. Building the Neural Network

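```
import tensorflow as tf
from tensorflow.keras import layers, losses, models, optimizers

tf.random.set_seed(1234)  # applied to achieve consistent results
model = models.Sequential(
    [
        layers.Dense(25, activation='relu', name="L1"),
        layers.Dense(15, activation='relu', name="L2"),
        layers.Dense(10, activation='linear', name='L3')  # outputs logits; the softmax is applied inside the loss
    ]
)
model.compile(loss=losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer=optimizers.Adam(0.001),
)

# train_images / train_labels are the arrays returned by load_mnist_images / load_mnist_labels
X_train = train_images.reshape((60000, 28 * 28))  # flatten each 28x28 image into 784 features
y_train = train_labels
history = model.fit(X_train, y_train, epochs=40)
```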
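Because the last layer is linear, the model outputs raw scores (logits), and `from_logits=True` tells the loss to apply the softmax itself. A NumPy sketch of what this loss computes for one sample, -log(softmax(z)[y]), written with the log-sum-exp trick that stays finite for large logits. The logits are made up for illustration, and Keras additionally averages the per-sample losses over each batch by default:

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax of a 1-D logit vector."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()


def sparse_ce_from_logits(logits, label):
    """-log(softmax(logits)[label]) via log-sum-exp, without forming the softmax."""
    z = logits - np.max(logits)               # shifting logits leaves the softmax unchanged
    log_softmax = z - np.log(np.sum(np.exp(z)))
    return -log_softmax[label]


# Made-up output of layer L3 for one image whose true digit is 7
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.0, 0.5, -0.5, 3.0, 0.2, 1.5])
label = 7
print(sparse_ce_from_logits(logits, label))
print(-np.log(softmax(logits)[label]))                 # same value, computed the naive way
print(np.argmax(logits) == np.argmax(softmax(logits)))  # softmax keeps the arg max
```

The last line also explains why the `tf.nn.softmax` call in the visualization code is only needed for probabilities: softmax is monotonic, so it never changes which class scores highest.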

3. Visualizing the Trained Model's Predictions

```
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

# test_images / test_labels are the arrays returned by load_mnist_images / load_mnist_labels
X = test_images.reshape((10000, 28 * 28))
y = test_labels
m, n = X.shape
fig, axes = plt.subplots(8, 8, figsize=(5, 5))
fig.tight_layout(pad=0.13, rect=[0., 0.03, 1., 0.91])  # [left, bottom, right, top]
# widgvis(fig)  # external helper, not defined in this post
for i, ax in enumerate(axes.flat):
    # Select a random index
    random_index = np.random.randint(m)

    # Reshape the selected row back into a 28x28 image. The loader above stores
    # pixels row by row, so no transpose (.T) is needed; .T would flip the digit.
    X_random_reshaped = X[random_index].reshape((28, 28))

    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')

    # Predict using the neural network (the model outputs logits)
    prediction = model.predict(X[random_index].reshape(1, 784), verbose=0)
    prediction_p = tf.nn.softmax(prediction)
    yhat = np.argmax(prediction_p)

    # Display the label above the image
    ax.set_title(f"{int(y[random_index])},{yhat}", fontsize=10)
    ax.set_axis_off()
fig.suptitle("Label, yhat", fontsize=14)
plt.show()

errornum = display_errors(model, X, y)  # external helper, not defined in this post
print(f"{errornum} errors out of {len(X)} images")
print(f"Accuracy = {(1 - errornum / len(X)) * 100}%")
```
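display_errors is not defined in this post. If only the error count and accuracy are needed, they can be computed directly from the model's logits. A minimal sketch; `count_errors` is a hypothetical helper, and it takes the arg max over the logits, which picks the same class as the arg max of their softmax:

```python
import numpy as np


def count_errors(logits, labels):
    """Count samples whose highest-scoring class differs from the true label."""
    yhat = np.argmax(logits, axis=1)         # predicted digit for each row
    labels = np.asarray(labels).astype(int)  # the loop loader returns float labels
    return int(np.sum(yhat != labels))


# Hypothetical usage with the trained model and the test arrays X, y from above:
# errornum = count_errors(model.predict(X, verbose=0), y)
# print(f"Accuracy = {(1 - errornum / len(X)) * 100}%")
```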