A Complete Guide to Reading the MNIST Dataset (train-images-idx3-ubyte.gz, train-labels.idx1-ubyte, etc.) (Reading gzip Files in Python)


鸾镜朱颜暗换


Article link: https://blog.csdn.net/qq_34769162/article/details/108766073


Note: if any package imported below is not installed, install it with pip.

The gzip package

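If all you need is to read a .gz file, Python's standard-library gzip package is enough.
Example: the current directory contains a file named input.gz, which can be read with the following code: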

import gzip

# gzip.open defaults to binary read mode ('rb'), so read() returns bytes
with gzip.open('input.gz') as file:
    all_content = file.read()

The entire decompressed contents of input.gz are now stored in all_content as a bytes object.
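Here is a self-contained sketch of the same call. It first writes a tiny input.gz into a temporary directory; the file contents are made up for illustration. It also shows the 'rt' mode, which returns str instead of bytes when the compressed file holds text:

```python
import gzip
import os
import tempfile

# Create a small input.gz in a temporary directory so the example runs anywhere
# (the contents are an illustrative placeholder, not MNIST data).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'input.gz')
with gzip.open(path, 'wb') as f:
    f.write(b'hello mnist\n')

# Default mode 'rb': read() returns the decompressed data as bytes.
with gzip.open(path) as f:
    all_content = f.read()
print(type(all_content), all_content)

# Mode 'rt' plus an encoding: read() returns a decoded str instead.
with gzip.open(path, 'rt', encoding='utf-8') as f:
    text = f.read()
print(repr(text))
```

MNIST files are binary, so 'rb' (the default) is the right mode for them. 'rt' only makes sense for compressed text files.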

Reading the MNIST dataset with Keras

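from tensorflow import keras

# Downloads mnist.npz on the first call (cached under ~/.keras/datasets/ by default).
# x_train: (60000, 28, 28) uint8, y_train: (60000,); x_test: (10000, 28, 28), y_test: (10000,)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()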
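If the download inside load_data() fails (for example behind a firewall), one workaround is to obtain mnist.npz yourself and load it with NumPy alone. The following is a minimal sketch. It assumes the archive uses the same four keys that Keras reads (x_train, y_train, x_test, y_test). The helper name load_mnist_npz and the tiny stand-in file in the demo are hypothetical:

```python
import os
import tempfile

import numpy as np


def load_mnist_npz(path):
    """Hypothetical helper: load an archive laid out like Keras's mnist.npz.

    Assumes the keys 'x_train', 'y_train', 'x_test', 'y_test'.
    """
    # Indexing the NpzFile reads each array into memory, so the
    # arrays remain valid after the file is closed.
    with np.load(path) as f:
        return (f['x_train'], f['y_train']), (f['x_test'], f['y_test'])


# Demo with a tiny stand-in archive (zeros, not real digits) that has the same keys.
path = os.path.join(tempfile.mkdtemp(), 'mnist.npz')
np.savez(path,
         x_train=np.zeros((2, 28, 28), np.uint8), y_train=np.array([3, 7], np.uint8),
         x_test=np.zeros((1, 28, 28), np.uint8), y_test=np.array([1], np.uint8))

(x_train, y_train), (x_test, y_test) = load_mnist_npz(path)
print(x_train.shape, y_train, x_test.shape, y_test)
```

With a real mnist.npz, the shapes should match the comments in the Keras snippet above.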

Reading the MNIST dataset from local files

Downloading the dataset

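Download the dataset from its download page. The training and test sets are distributed as four gzip files: train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz, t10k-images-idx3-ubyte.gz and t10k-labels-idx1-ubyte.gz.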

Decompressing and reading

Method 1: python-mnist

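import random

from mnist import MNIST  # pip install python-mnist

# 'samples' is the directory that holds the decompressed files
# (train-images-idx3-ubyte, train-labels-idx1-ubyte, t10k-...)
mndata = MNIST('samples')

images, labels = mndata.load_training()
# or, for the test set:
# images, labels = mndata.load_testing()

index = random.randrange(0, len(images))  # pick a random sample
print(mndata.display(images[index]))  # text rendering of the digit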

Method 2: mlxtend

Decompress the .gz files first, then read the raw idx files:
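Methods 1 and 2 both expect the raw idx files on disk. The following is a minimal sketch of decompressing a .gz file with only the standard library (gzip plus shutil.copyfileobj). The helper name gunzip and the throwaway demo file are hypothetical:

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip(gz_path):
    """Hypothetical helper: decompress 'name.gz' next to itself as 'name'."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix('')  # drops only the trailing '.gz'
    with gzip.open(gz_path, 'rb') as src, open(out_path, 'wb') as dst:
        # Copies in chunks, so the whole file is never held in memory at once.
        shutil.copyfileobj(src, dst)
    return out_path


# Demo on a throwaway file; for MNIST you would call e.g.
# gunzip('train-images-idx3-ubyte.gz') -> train-images-idx3-ubyte
demo_gz = Path(tempfile.mkdtemp()) / 'demo-idx1-ubyte.gz'
with gzip.open(demo_gz, 'wb') as f:
    f.write(b'\x00\x00\x08\x01')
demo_out = gunzip(demo_gz)
print(demo_out.name, demo_out.read_bytes())
```

This keeps the hyphenated names (train-images-idx3-ubyte). Some extraction tools on Windows produce dotted names such as train-images.idx3-ubyte, so check the actual file names before passing paths to a loader.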

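from mlxtend.data import loadlocal_mnist  # pip install mlxtend
import platform

# Decompressed file names depend on the extraction tool: on Windows they often
# come out as 'train-images.idx3-ubyte' (dot), elsewhere as
# 'train-images-idx3-ubyte' (hyphen). Adjust the paths to match your files.
if platform.system() != 'Windows':
    X, y = loadlocal_mnist(
            images_path='train-images-idx3-ubyte',
            labels_path='train-labels-idx1-ubyte')
else:
    X, y = loadlocal_mnist(
            images_path='train-images.idx3-ubyte',
            labels_path='train-labels.idx1-ubyte')

# X holds one flattened 28x28 image per row -> (60000, 784); y holds the labels -> (60000,)
print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))
print('\n1st row', X[0])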

Reading directly with the gzip package (no decompression to disk)

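import gzip
import numpy as np
import matplotlib.pyplot as plt

with gzip.open('train-images-idx3-ubyte.gz') as f:
    all_img = f.read()  # the whole decompressed file as bytes

# The idx3 file starts with a 16-byte header (magic number, image count,
# rows, cols); after it, each image is 28*28 = 784 unsigned bytes.
# print(all_img[:4])                 # magic number
# print((len(all_img) - 16) / 784)   # number of images: 60000.0 for the training set

# Take the first image: 784 bytes starting right after the header.
img = np.frombuffer(all_img, dtype=np.uint8, count=784, offset=16).reshape(28, 28)
print(img.shape)  # (28, 28)
plt.imshow(img, cmap='gray')
plt.show()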
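The snippet above reads only the first image. The following is a minimal sketch that parses whole idx files with struct and np.frombuffer. It assumes the standard MNIST layout: a big-endian header, with magic number 2051 for image files and 2049 for label files. The function names load_idx_images and load_idx_labels, and the tiny synthetic files in the demo, are hypothetical:

```python
import gzip
import os
import struct
import tempfile

import numpy as np


def _read(path):
    """Read a file's bytes, transparently decompressing it if it ends in .gz."""
    opener = gzip.open if str(path).endswith('.gz') else open
    with opener(path, 'rb') as f:
        return f.read()


def load_idx_images(path):
    """Hypothetical helper: idx3 image file -> uint8 array of shape (N, rows, cols)."""
    data = _read(path)
    # Header: four big-endian unsigned 32-bit integers.
    magic, n, rows, cols = struct.unpack('>IIII', data[:16])
    if magic != 2051:
        raise ValueError(f'not an idx3 image file (magic={magic})')
    pixels = np.frombuffer(data, dtype=np.uint8, count=n * rows * cols, offset=16)
    return pixels.reshape(n, rows, cols)


def load_idx_labels(path):
    """Hypothetical helper: idx1 label file -> uint8 array of shape (N,)."""
    data = _read(path)
    magic, n = struct.unpack('>II', data[:8])
    if magic != 2049:
        raise ValueError(f'not an idx1 label file (magic={magic})')
    return np.frombuffer(data, dtype=np.uint8, count=n, offset=8)


# Demo: build two tiny gzipped idx files (2 images, 2 labels) and parse them back.
tmp = tempfile.mkdtemp()
img_path = os.path.join(tmp, 'tiny-images-idx3-ubyte.gz')
lbl_path = os.path.join(tmp, 'tiny-labels-idx1-ubyte.gz')
with gzip.open(img_path, 'wb') as f:
    f.write(struct.pack('>IIII', 2051, 2, 28, 28)
            + bytes(i % 256 for i in range(2 * 28 * 28)))
with gzip.open(lbl_path, 'wb') as f:
    f.write(struct.pack('>II', 2049, 2) + bytes([5, 0]))

images = load_idx_images(img_path)
labels = load_idx_labels(lbl_path)
print(images.shape, labels)
```

With the real training files, load_idx_images('train-images-idx3-ubyte.gz') should return an array of shape (60000, 28, 28), and load_idx_labels('train-labels-idx1-ubyte.gz') should return one of shape (60000,). np.frombuffer returns a read-only view, so call .copy() if you need to modify the pixels.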

Reading bytes data

See the Stack Overflow question "Convert bytes to a string":

>>> b"abcde"
b'abcde'

# utf-8 is used here because it is a very common encoding, but you
# need to use the encoding your data is actually in.
>>> b"abcde".decode("utf-8") 
'abcde'
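decode() is only for text. The MNIST header and pixels are binary numbers, so the relevant operations are indexing (which yields an int), slicing (which yields bytes), and int.from_bytes or struct.unpack for multi-byte big-endian integers. The following short sketch uses the first 8 bytes of a training-image header as sample data:

```python
import struct

# First 8 bytes of an idx3 image header: magic number 0x00000803, then the count 0x0000EA60.
header = b'\x00\x00\x08\x03\x00\x00\xea\x60'

print(header[2])   # indexing a bytes object gives an int: 8
print(header[:4])  # slicing gives bytes: b'\x00\x00\x08\x03'

# Multi-byte fields are big-endian unsigned integers.
magic = int.from_bytes(header[:4], 'big')   # 2051
count = int.from_bytes(header[4:8], 'big')  # 60000

# The same two fields in one call with struct ('>' = big-endian, 'I' = uint32).
print(magic, count, struct.unpack('>II', header))
```

This is also why img1[28*i+j] in the original gzip snippet already produced integer pixel values without any decoding.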