Loading and Testing the MNIST Dataset Locally with Keras

guotianqing

Original link: https://blog.csdn.net/guotianqing/article/details/109229950

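Introduction

A classic example in machine learning is classifying grayscale images of handwritten digits into 10 categories.

The images are 28×28 pixels, and the 10 categories are the digits 0 through 9. The dataset is MNIST.

MNIST is a classic machine learning dataset containing 60,000 training images and 10,000 test images. It was assembled from data collected by the US National Institute of Standards and Technology (NIST) in the 1980s.

This problem can be thought of as the "hello world" of deep learning: you use it to check that your algorithm works as expected.

Let's get started!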

Loading the data

There are two ways to load the data:

Downloading from the network

from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

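Called like this, with no arguments, load_data downloads the data from the internet by default. But the file lives on a server outside China (you know how it is), so the download often fails: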

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Traceback (most recent call last):
  File "D:\anaconda3\envs\tf2\lib\urllib\request.py", line 1349, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "D:\anaconda3\envs\tf2\lib\http\client.py", line 1287, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "D:\anaconda3\envs\tf2\lib\http\client.py", line 1333, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "D:\anaconda3\envs\tf2\lib\http\client.py", line 1282, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "D:\anaconda3\envs\tf2\lib\http\client.py", line 1042, in _send_output
    self.send(msg)
  File "D:\anaconda3\envs\tf2\lib\http\client.py", line 980, in send
    self.connect()
  File "D:\anaconda3\envs\tf2\lib\http\client.py", line 1448, in connect
    server_hostname=server_hostname)
  File "D:\anaconda3\envs\tf2\lib\ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "D:\anaconda3\envs\tf2\lib\ssl.py", line 817, in __init__
    self.do_handshake()
  File "D:\anaconda3\envs\tf2\lib\ssl.py", line 1077, in do_handshake
    self._sslobj.do_handshake()
  File "D:\anaconda3\envs\tf2\lib\ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
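TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond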

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 278, in get_file
    urlretrieve(origin, fpath, dl_progress)
  File "D:\anaconda3\envs\tf2\lib\urllib\request.py", line 248, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "D:\anaconda3\envs\tf2\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "D:\anaconda3\envs\tf2\lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "D:\anaconda3\envs\tf2\lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "D:\anaconda3\envs\tf2\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "D:\anaconda3\envs\tf2\lib\urllib\request.py", line 1392, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "D:\anaconda3\envs\tf2\lib\urllib\request.py", line 1351, in do_open
    raise URLError(err)
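urllib.error.URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>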

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\datasets\mnist.py", line 62, in load_data
    '731c5ac602752760c8e48fbffcf8c3b850d9dc2a2aedcf2cc48468fc17b673d1')
  File "D:\anaconda3\envs\tf2\lib\site-packages\tensorflow\python\keras\utils\data_utils.py", line 282, in get_file
    raise Exception(error_msg.format(origin, e.errno, e.reason))
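Exception: URL fetch failure on https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz: None -- [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond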
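The long hexadecimal string in the traceback above ('731c5ac6...') is the hash Keras hands to get_file to validate mnist.npz; at 64 hex digits it is a SHA-256 digest. If you obtain the file some other way (a mirror, a friend, the comments), a small standard-library check like the sketch below can confirm the copy is intact before you use it. sha256_of and is_valid_mnist_file are hypothetical helpers, not part of Keras.

```python
import hashlib
from pathlib import Path

# SHA-256 of mnist.npz, copied from the traceback above
# (the hash argument Keras passes to get_file).
MNIST_SHA256 = "731c5ac602752760c8e48fbffcf8c3b850d9dc2a2aedcf2cc48468fc17b673d1"


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid_mnist_file(path, expected=MNIST_SHA256):
    """Return True only if the file exists and its SHA-256 equals `expected`."""
    p = Path(path)
    return p.is_file() and sha256_of(p) == expected
```

For example, is_valid_mnist_file(r"E:\practice\tf2\mnist.npz") should return True for a good copy and False for a truncated or corrupted download.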

One way around this is to load the data locally.

Loading locally

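First, download the dataset file mnist.npz to your machine (if you can't get it, leave your email address in the comments).
Then change the code so that load_data() is called with the local path, as follows: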

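from keras.datasets import mnist
path = r"E:\practice\tf2\mnist.npz"  # change this to where you saved the file
# Call load_data on the imported mnist module; the original called tf.keras... without importing tf
(train_images, train_labels), (test_images, test_labels) = mnist.load_data(path)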

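That's all. load_data passes the path on to Keras's get_file, and since the file already exists at that location, nothing is downloaded. Keras does check the file's hash first and tries to re-download it if the hash doesn't match, so the local file must be the complete, genuine mnist.npz.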
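If you'd rather not go through Keras just to read the file, mnist.npz is an ordinary NumPy archive and can be opened with np.load. The sketch below assumes the arrays are stored under the keys x_train, y_train, x_test and y_test (the names Keras's own MNIST loader reads); load_mnist_npz is a hypothetical helper that returns the same nested tuples as mnist.load_data().

```python
import numpy as np


def load_mnist_npz(path):
    """Read mnist.npz directly with NumPy.

    Assumes the archive holds the keys x_train, y_train, x_test, y_test.
    Returns ((train_images, train_labels), (test_images, test_labels)),
    the same structure as mnist.load_data().
    """
    # Indexing the NpzFile reads each array fully into memory,
    # so the arrays remain usable after the file is closed.
    with np.load(path) as data:
        train = (data["x_train"], data["y_train"])
        test = (data["x_test"], data["y_test"])
    return train, test
```

Usage mirrors the Keras call: (train_images, train_labels), (test_images, test_labels) = load_mnist_npz(r"E:\practice\tf2\mnist.npz"). Note that this skips the hash check Keras performs.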

Testing

The code is as follows:

# Just type this into an interactive Python session
>>> from keras.datasets import mnist
>>> path = r"E:\practice\tf2\mnist.npz" # change this to where you saved the file
>>> path
'E:\\practice\\tf2\\mnist.npz' # the data path (I'm on Windows)
# Load the training and test data: the model is trained on the training data and evaluated on the test data
>>> (train_images, train_labels), (test_images, test_labels) = mnist.load_data(path)
>>> train_images.shape
(60000, 28, 28) # the images are a NumPy array
>>> len(train_labels)
60000 # one label per image
>>> train_labels
array([5, 0, 4, ..., 5, 6, 8], dtype=uint8) # the labels are an array of digits 0-9
# The test data works the same way
>>> test_images.shape
(10000, 28, 28)
>>> len(test_labels)
10000
>>> test_labels
array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)
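The checks typed in by hand above can be bundled into one function and run after every load. check_mnist is a hypothetical helper; the expected sizes (60000 and 10000 images of 28×28, labels 0-9) come from the session above, and n_train/n_test are parameters only so the function can be tried on small arrays.

```python
import numpy as np


def check_mnist(train_images, train_labels, test_images, test_labels,
                n_train=60000, n_test=10000):
    """Raise ValueError unless the arrays have the layout shown in the session above."""
    splits = (("train", train_images, train_labels, n_train),
              ("test", test_images, test_labels, n_test))
    for name, images, labels, n in splits:
        # n grayscale images of 28x28 pixels each
        if images.shape != (n, 28, 28):
            raise ValueError(f"{name} images: expected {(n, 28, 28)}, got {images.shape}")
        # exactly one label per image
        if len(labels) != n:
            raise ValueError(f"{name} labels: expected {n}, got {len(labels)}")
        # labels are the digits 0-9
        labels = np.asarray(labels)
        if labels.min() < 0 or labels.max() > 9:
            raise ValueError(f"{name} labels must lie in 0-9")
    return True
```

After the load_data call above, check_mnist(train_images, train_labels, test_images, test_labels) should return True.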