Keras MNIST Handwritten Digit Recognition: Processing the Dataset

开心的小夏菇凉

Article link: https://blog.csdn.net/m0_54814566/article/details/117435128

I. Downloading the MNIST Data

1. Import Keras and related modules

# Import Keras and related modules
import numpy as np
import pandas as pd
from keras.utils import np_utils
np.random.seed(10)

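Code notes:

from keras.utils import np_utils imports the Keras utilities, which are needed later to convert the labels to One-Hot Encoding. (In recent Keras releases np_utils is no longer available; there, to_categorical can be imported directly with from keras.utils import to_categorical.)
import numpy as np imports NumPy, a Python extension library that supports multi-dimensional arrays and matrix operations.
np.random.seed(10) sets the random seed so that the random numbers generated are reproducible.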
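
The effect of np.random.seed can be seen in a small sketch. The helper name draw_numbers and the count of 5 are illustrative choices, not part of the original notes.

```python
import numpy as np

def draw_numbers(seed, count=5):
    """Reseed NumPy's global generator, then draw `count` uniform random numbers."""
    np.random.seed(seed)
    return np.random.rand(count)

first = draw_numbers(10)
second = draw_numbers(10)
other = draw_numbers(11)

# The same seed reproduces the same sequence; a different seed gives a different one.
print(np.array_equal(first, second))
print(np.array_equal(first, other))
```

Note that this only fixes NumPy's own global generator; other sources of randomness are not affected by it.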

2. Import the Keras dataset module

# Import the Keras MNIST dataset module
from keras.datasets import mnist

3. Download the dataset

# Download the MNIST data on the first run
(x_train_image,y_train_label),(x_test_image,y_test_label)=mnist.load_data()

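Note: the data is downloaded only the first time this runs; after that it does not need to be downloaded again.

4. Load and inspect the dataset

# Load the MNIST data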
(x_train_image,y_train_label),(x_test_image,y_test_label)=mnist.load_data()
# Inspect the MNIST data
print('train data=',len(x_train_image))
print('test data=',len(x_test_image))

Note: loading and downloading both use mnist.load_data().

II. Inspecting a Single Training Sample

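1. You can check on disk whether the dataset has been downloaded. By default Keras caches it as mnist.npz under ~/.keras/datasets in your home directory, not in the same directory as this file.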
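
One way to check for the downloaded file is sketched below. It assumes Keras's default cache location (~/.keras/datasets/mnist.npz); if your installation is configured to cache elsewhere, the path will differ.

```python
from pathlib import Path

# Assumption: Keras's default cache directory (~/.keras/datasets) is in use.
cache_file = Path.home() / ".keras" / "datasets" / "mnist.npz"

if cache_file.exists():
    print("MNIST is already cached at", cache_file)
else:
    print("MNIST has not been downloaded yet; expected location:", cache_file)
```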

2. Inspect the images and labels of the training data

# The training data consists of images and labels
print('x_train_image:',x_train_image.shape)
print('y_train_label:',y_train_label.shape)

3. Define the plot_image function

# Define plot_image to display a digit image
import matplotlib.pyplot as plt
def plot_image(image):
    fig=plt.gcf()
    fig.set_size_inches(2,2)
    plt.imshow(image,cmap='binary')
    plt.show()

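Code notes:

import matplotlib.pyplot as plt -> first import the matplotlib.pyplot module
def plot_image(image): -> define the plot_image function, which takes image as its parameter
fig=plt.gcf() fig.set_size_inches(2,2) -> set the size of the displayed figure
plt.imshow(image,cmap='binary') -> use plt.imshow to display the image; the image argument is a 28*28 array, and cmap is set to binary so that it is shown in black-and-white grayscale
plt.show() -> render the figure

4. View digit image 0

# Call plot_image to view digit image 0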
plot_image(x_train_image[0])

5. View the label of item 0

# View the label of item 0
y_train_label[0]

III. Inspecting Multiple Training Samples

1. Define a function that displays multiple items

# Create the plot_images_labels_prediction() function
import matplotlib.pyplot as plt #import pyplot; it is referenced below as plt
# Parameters: images (digit images), labels (true values), prediction (predicted results),
# idx (index of the first item to display), num (number of items to display; default 10, at most 25)
def plot_images_labels_prediction(images,labels,prediction,idx,num=10):
    fig=plt.gcf()
    fig.set_size_inches(12,14)#set the size of the displayed figure
    if num>25:#cap the item count at 25 so it fits the 5x5 grid
        num=25
    for i in range(0,num):#draw num digit images
        ax=plt.subplot(5,5,1+i)#create a subplot in a grid of 5 rows and 5 columns
        ax.imshow(images[idx],cmap='binary')#draw the image in the subplot
        title="label="+str(labels[idx])#the subplot title shows the label
        if len(prediction)>0:#if predictions were passed in
            title+=",predict="+str(prediction[idx])#append the prediction to the title
        ax.set_title(title,fontsize=10)#set the subplot title and its font size
        ax.set_xticks([]);ax.set_yticks([])#hide the axis ticks
        idx+=1#move on to the next item
    plt.show()

2. Use the function to view the first 10 items

# View the first 10 items of the training data
plot_images_labels_prediction(x_train_image,y_train_label,[],0,10)#there are no predictions yet, so pass an empty list []

3. Inspect the test data

# Inspect the test data
print('x_test_image:',x_test_image.shape)
print('y_test_label:',y_test_label.shape)

4. Display the first ten test images

# Display the test data
plot_images_labels_prediction(x_test_image,y_test_label,[],0,10)

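IV. Preprocessing the Features

Summary:

Preprocessing the features (the feature values of the digit images) consists of the following two steps:

(1) Use reshape to convert each 28*28 digit image into a one-dimensional vector of length 784, and convert it to float.
(2) Normalize the pixel values of the digit images.

1. Check the shape of the images

# Step 1: check the shape of the images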
print('x_train_image:',x_train_image.shape)
print('y_train_label:',y_train_label.shape)

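2. Convert the images with reshape

# Step 2: reshape the images (each 28*28 two-dimensional image becomes a one-dimensional vector, then astype converts it to float, giving 784 floating-point numbers per image)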
x_Train=x_train_image.reshape(60000,784).astype('float32')
x_Test=x_test_image.reshape(10000,784).astype('float32')
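
The reshape step can be tried on a small stand-in array without downloading anything. This is a minimal sketch, assuming a batch of 3 random 28*28 images in place of the real data; fake_images and flat are illustrative names. Passing -1 lets NumPy infer the 784, so the sample count does not need to be hard-coded.

```python
import numpy as np

rng = np.random.default_rng(10)
# Stand-in for x_train_image: 3 grayscale images of 28*28 pixels, values 0-255.
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Flatten each image to a 784-element vector and convert to float32.
flat = fake_images.reshape(fake_images.shape[0], -1).astype('float32')

print(fake_images.shape, '->', flat.shape, flat.dtype)
# Row-major flattening: the pixel at row r, column c lands at position r*28 + c.
print(flat[0, 5 * 28 + 7] == fake_images[0, 5, 7])
```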

3. Check the shape after conversion to one-dimensional vectors

# Step 3: check the shape after conversion to one-dimensional vectors
print('x_train:',x_Train.shape)
print('x_test:',x_Test.shape)

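4. View the content of item 0 of the converted images

# Step 4: view the content of item 0 of the images
x_Train[0]#each number is between 0 and 255 and represents the gray level of one pixel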

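5. Normalize the digit images

# Step 5: normalize the pixel values (normalization can improve the accuracy of the model trained later; since the pixel values range from 0 to 255, the simplest normalization is to divide by 255)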
x_Train_normalize=x_Train/255
x_Test_normalize=x_Test/255

6. Check the normalized result

# Step 6: view the normalized pixel values (the results lie between 0 and 1)
x_Train_normalize[0]
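
The range after dividing by 255 can be verified on a tiny example. This is a sketch with made-up pixel values rather than real MNIST data; pixels and normalized are illustrative names.

```python
import numpy as np

# Stand-in for part of one flattened image: values spanning the full 0-255 range.
pixels = np.array([0, 64, 128, 255], dtype='float32')

# Dividing by the maximum possible value maps the pixels into [0, 1].
normalized = pixels / 255

print(normalized)
print(normalized.min(), normalized.max())
```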

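V. Preprocessing the Labels

The label field (the true value of each digit image) is originally a digit from 0 to 9. It must be converted with One-Hot Encoding into a combination of ten 0s and 1s. For example, the digit 7 becomes 0000000100 after One-Hot Encoding, which corresponds exactly to the 10 neurons of the output layer.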
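
To illustrate the encoding itself, here is a minimal NumPy-only sketch. It shows the idea and is not the Keras implementation; one_hot is a hypothetical helper name, and the sample labels are made up.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return one row per label, with a 1 at the label's index and 0 elsewhere."""
    labels = np.asarray(labels)
    # Row k of the identity matrix is exactly the One-Hot vector for class k.
    return np.eye(num_classes, dtype='float32')[labels]

encoded = one_hot([7, 0, 3])
print(encoded[0])  # the digit 7: only index 7 is set
```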

1. View the original labels

# Step 1: view the original label field
y_train_label[:5]#the labels of the first five training items

2. Convert the labels

# Step 2: One-Hot encode the label field (pass the training labels y_train_label and the test labels y_test_label to np_utils.to_categorical)
y_TrainOneHot=np_utils.to_categorical(y_train_label)
y_TestOneHot=np_utils.to_categorical(y_test_label)

3. View the converted labels

# Step 3: view the label field after One-Hot Encoding
y_TrainOneHot[:5]#the One-Hot encoded labels of the first five training items
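
A quick sanity check on a One-Hot conversion is to decode it and compare the result with the original labels. The sketch below is NumPy-only and builds its own small One-Hot matrix instead of calling Keras; the sample labels are made up for illustration.

```python
import numpy as np

labels = np.array([3, 1, 4, 1, 5])  # made-up labels for illustration
# Stand-in for a One-Hot encoded label array such as y_TrainOneHot.
one_hot = np.eye(10, dtype='float32')[labels]

# argmax along each row returns the index of the 1, i.e. the original digit.
decoded = np.argmax(one_hot, axis=1)

print(decoded)
print(np.array_equal(decoded, labels))
```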

VI. Closing

That completes the data preprocessing.
