Converting the MNIST Database to Local Image Files

Article link: https://blog.csdn.net/vola9527/article/details/79929002

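MNIST is one of the standard introductory datasets in machine learning and is widely used. It contains 70,000 images of the handwritten digits 0-9, each 28x28 pixels. The original MNIST is distributed in a binary format and takes several conversion steps to turn into ordinary image files, which makes it inconvenient to extend or visualize the dataset. This article exports MNIST to local image files so that others can extend the dataset more easily.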
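For context on the binary form mentioned above: the original files use the IDX layout, which is a 4-byte header (two zero bytes, a data-type code, the number of dimensions), then one 4-byte big-endian size per dimension, then the raw bytes. The following is a minimal sketch of reading that layout, not the approach used in the rest of this article. The function name `parse_idx` is made up for illustration, and it only handles the unsigned-byte type (code 0x08) used by MNIST. The real files are distributed gzip-compressed, so they would need to be decompressed first.

```python
import struct

def parse_idx(data: bytes):
    """Parse an IDX byte string of unsigned bytes into (dims, payload)."""
    # Header: two zero bytes, a data-type code, and the number of dimensions.
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an IDX file of unsigned bytes")
    # Each dimension size is a 4-byte big-endian unsigned integer.
    dims = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    payload = data[4 + 4 * ndim:]
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError("payload size does not match header")
    return dims, payload

# Tiny synthetic example: two 2x2 "images" instead of a real MNIST file.
header = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2)
dims, payload = parse_idx(header + bytes(range(8)))
print(dims)           # (2, 2, 2)
print(list(payload))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

For the MNIST image files the dimensions would be (count, 28, 28), and for the label files just (count,).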
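This article uses the copy of MNIST bundled with keras, which has already been split into a train set and a test set. The two sets are therefore saved locally as they are.
Here is the code:
Import the necessary packages: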

import os
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist
from matplotlib.image import imsave
import itertools

Get the data, which has already been shuffled and split between train and test sets:

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print("X_train original shape", X_train.shape)
print("y_train original shape", y_train.shape)

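Result: the output shows that the training set contains 60,000 images (X_train has shape (60000, 28, 28) and y_train has shape (60000,)).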
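Because the images are later sorted into one folder per label, it can be useful to know how many samples each digit has before exporting. The sketch below counts labels using only the standard library. `label_counts` is a hypothetical helper and the short list is stand-in data. With the real dataset one would pass `y_train` or `y_test` instead.

```python
from collections import Counter

def label_counts(labels):
    """Return {label: number of samples}, ordered by label."""
    # int() turns NumPy integer scalars into plain Python ints.
    counts = Counter(int(label) for label in labels)
    return dict(sorted(counts.items()))

# Small stand-in list of labels for demonstration.
example_labels = [5, 0, 4, 1, 9, 2, 1, 3, 1, 4]
print(label_counts(example_labels))
# {0: 1, 1: 3, 2: 1, 3: 1, 4: 2, 5: 1, 9: 1}
```

Each count is the number of image files that should end up in the corresponding label folder for that split.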
Show some samples:

for i in range(9):
    plt.subplot(3,3,i+1)
    plt.imshow(X_train[i], cmap='gray', interpolation='none')
    plt.title("Class {}".format(y_train[i]))

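* The destination folder structure we intend to use:
Outer level: one folder for the train set and one for the test set (train_path and test_path in the code below).

Inner level: inside each of those, one subfolder per digit label, 0 through 9, holding the images of that digit.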
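The layout described above can be sketched as a small helper that creates the empty folder tree up front. The folder names "train" and "test" and the function name `make_skeleton` are assumptions for illustration, since the original post does not state its paths. The demonstration runs inside a temporary directory so nothing is left on disk.

```python
import os
import tempfile

def make_skeleton(root, splits=("train", "test"), num_classes=10):
    """Create root/<split>/<label>/ for every split and label; return the paths."""
    created = []
    for split in splits:
        for label in range(num_classes):
            folder = os.path.join(root, split, str(label))
            # exist_ok=True makes the call safe to repeat.
            os.makedirs(folder, exist_ok=True)
            created.append(folder)
    return created

# Demonstrate in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    folders = make_skeleton(root)
    print(len(folders))              # 20
    print(sorted(os.listdir(root)))  # ['test', 'train']
```

Creating the folders in advance is optional. The export loops can also create each label folder the first time it is needed.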

* Processing the train set: image file names are numbered starting from 0.

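# NOTE: train_path is not defined in the original post; 'mnist/train' is an assumed placeholder.
train_path = os.path.join('mnist', 'train')

image_counter = itertools.count(0)  # running index used as the file name
for image, label in zip(X_train, y_train):
    dest_folder = os.path.join(train_path, str(label))
    image_name = next(image_counter)
    image_path = os.path.join(dest_folder, str(image_name)+'.png')

    # create the label folder (and any missing parent folders) on first use
    os.makedirs(dest_folder, exist_ok=True)

    imsave(image_path, image, cmap='gray')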

* Processing the test set: numbering continues from where the train set left off.

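# NOTE: test_path is not defined in the original post; 'mnist/test' is an assumed placeholder.
test_path = os.path.join('mnist', 'test')

# image_counter is the counter created for the train set, so numbering continues from there
for image, label in zip(X_test, y_test):
    dest_folder = os.path.join(test_path, str(label))
    image_name = next(image_counter)
    image_path = os.path.join(dest_folder, str(image_name)+'.png')

    # create the label folder (and any missing parent folders) on first use
    os.makedirs(dest_folder, exist_ok=True)

    imsave(image_path, image, cmap='gray')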
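
After both loops finish, a quick sanity check is to count the files that were written under each label folder. The sketch below is one possible way to do that with the standard library. `count_images` is a hypothetical helper, and the demonstration builds a tiny fake tree of empty .png files rather than reading the real export.

```python
import os
import tempfile

def count_images(split_path, extension=".png"):
    """Return {label folder name: number of image files} for one split."""
    counts = {}
    for label in sorted(os.listdir(split_path)):
        folder = os.path.join(split_path, label)
        if os.path.isdir(folder):
            counts[label] = sum(
                1 for name in os.listdir(folder) if name.endswith(extension)
            )
    return counts

# Self-contained demonstration on a tiny fake tree.
with tempfile.TemporaryDirectory() as root:
    for label, n_files in (("0", 2), ("1", 3)):
        os.makedirs(os.path.join(root, label))
        for i in range(n_files):
            open(os.path.join(root, label, f"{label}_{i}.png"), "wb").close()
    demo_counts = count_images(root)
    print(demo_counts)                # {'0': 2, '1': 3}
    print(sum(demo_counts.values()))  # 5
```

On the real export, the totals would be expected to come to 60,000 for the train folder and 10,000 for the test folder.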