Fixing errors when manually importing datasets for Dive into Deep Learning (《动手学深度学习》)

Latest recommended article published on 2022-07-21 17:03:32

OneLine_

Tags: deep learning

Article link: https://blog.csdn.net/OneLine_/article/details/106755741

Dive into Deep Learning

Online version of the book: https://zh.gluon.ai/chapter_preface/preface.html

Video lectures on Bilibili: https://space.bilibili.com/209599371?spm_id_from=333.788.b_765f7570696e666f.1

Book source code, PDF, and datasets: https://pan.baidu.com/s/1U53gc7ZIXsF1U23x1g8g4A (extraction code: f793)

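Running the book's source code requires loading datasets. If a dataset is not present locally, it is downloaded on first use.

However, various errors can prevent the automatic download from working (network problems, path problems behind the scenes, and so on).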

Reference: https://discuss.gluon.ai/t/topic/642/32

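First, try switching to a mirror. In the Windows command prompt, set the environment variable and then start Jupyter as two separate commands (if both are typed on one line, "jupyter notebook" becomes part of the variable's value):

set MXNET_GLUON_REPO=https://apache-mxnet.s3.cn-north-1.amazonaws.com.cn/
jupyter notebook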
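The mirror can also be selected from inside a notebook or script rather than from the shell. The following is a minimal sketch, not the book's own method: the helper name use_gluon_mirror is hypothetical, and it assumes MXNet reads MXNET_GLUON_REPO from the process environment when a download is attempted, so the variable is set before mxnet is imported to be safe.

```python
import os

# Mirror URL taken from the command shown in this post.
MIRROR = 'https://apache-mxnet.s3.cn-north-1.amazonaws.com.cn/'


def use_gluon_mirror(url=MIRROR):
    """Point MXNET_GLUON_REPO at `url` for the current process (hypothetical helper)."""
    # Keep a trailing slash, matching the form used in the shell command.
    if not url.endswith('/'):
        url += '/'
    os.environ['MXNET_GLUON_REPO'] = url
    return url


# Call this before importing mxnet or d2lzh in the same process.
use_gluon_mirror()
```

This only affects the current Python process, so it has to be repeated at the top of every notebook that triggers a download.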

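If you still get a RuntimeError, you can try my approach.

My fix was to use a classmate's copy of the dataset files and to modify the library source file dataset.py.

A way to look up the SHA1 values is included at the end.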

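1. Put the four dataset files in the folder C:\Users\OneLine\.mxnet\datasets\fashion-mnist

(do not decompress them)

You can also find the original download path from the message shown on the web page.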
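Because the library checks each downloaded file against a SHA1 hash, it can help to compute the SHA1 of the manually copied files and compare it with the values in dataset.py. The sketch below uses only the standard library; the function names are hypothetical, the four file names are the ones that appear in the listing later in this post, and the folder is assumed to be .mxnet\datasets\fashion-mnist under the current user's home directory.

```python
import hashlib
import os

# File names as they appear in the dataset.py listing in this post.
FILES = [
    'train-images-idx3-ubyte.gz',
    'train-labels-idx1-ubyte.gz',
    't10k-images-idx3-ubyte.gz',
    't10k-labels-idx1-ubyte.gz',
]


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    sha1 = hashlib.sha1()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sha1.update(chunk)
    return sha1.hexdigest()


def report(folder):
    """Map each expected file name to its SHA1, or None if the file is missing."""
    results = {}
    for name in FILES:
        path = os.path.join(folder, name)
        results[name] = sha1_of_file(path) if os.path.isfile(path) else None
    return results


if __name__ == '__main__':
    # Assumed default location; adjust if your files live elsewhere.
    folder = os.path.join(os.path.expanduser('~'), '.mxnet',
                          'datasets', 'fashion-mnist')
    for name, digest in report(folder).items():
        print(name, digest or 'missing')
```

If a printed digest differs from the hash paired with that file name in dataset.py, the check fails for that file.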

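2. Compare my original dataset.py with my classmate's dataset.py and make small changes (I suggest scrolling straight to the end).

dataset.py is located in D:\Anaconda\envs\gluon\Lib\site-packages\mxnet\gluon\data\vision

The drive may not be D:, depending on where you installed the software. Look for a path ending in ~\mxnet\gluon\data\vision and you should be in the right place.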
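If the folder is hard to find by hand, Python can report where the installed package lives. This is a minimal sketch using only the standard library; the helper name package_subdir is hypothetical, and it must be run with the same Python environment that the notebooks use (the gluon environment in the path above). It locates the package without importing it, so it works even when importing mxnet is slow or fails.

```python
import importlib.util
import os


def package_subdir(package, *parts):
    """Return the path of a subfolder inside an installed package, or None."""
    spec = importlib.util.find_spec(package)
    # find_spec returns None when the package is not installed;
    # origin is None for namespace packages.
    if spec is None or spec.origin is None:
        return None
    return os.path.join(os.path.dirname(spec.origin), *parts)


if __name__ == '__main__':
    vision_dir = package_subdir('mxnet', 'gluon', 'data', 'vision')
    if vision_dir is None:
        print('mxnet is not installed in this environment')
    else:
        print(vision_dir)
        # List the Python source files so the dataset module is easy to spot.
        if os.path.isdir(vision_dir):
            for name in sorted(os.listdir(vision_dir)):
                if name.endswith('.py'):
                    print('  ', name)
```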

Mine:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

# coding: utf-8
# pylint: disable=
"""Dataset container."""
__all__ = ['MNIST', 'FashionMNIST', 'CIFAR10', 'CIFAR100',
           'ImageRecordDataset', 'ImageFolderDataset']

import os
import gzip
import tarfile
import struct
import warnings
import numpy as np

from .. import dataset
from ...utils import download, check_sha1, _get_repo_file_url
from .... import nd, image, recordio, base


class MNIST(dataset._DownloadedDataset):
    """MNIST handwritten digits dataset from http://yann.lecun.com/exdb/mnist

    Each sample is an image (in 3D NDArray) with shape (28, 28, 1).

    Parameters
    ----------
    root : str, default $MXNET_HOME/datasets/mnist
        Path to temp folder for storing data.
    train : bool, default True
        Whether to load the training or testing set.
    transform : function, default None
        A user defined callback that transforms each sample. For example::

            transform=lambda data, label: (data.astype(np.float32)/255, label)

    """
    def __init__(self, root=os.path.join(base.data_dir(), 'datasets', 'mnist'),
                 train=True, transform=None):
        self._train = train
        self._train_data = ('train-images-idx3-ubyte.gz','0cf37b0d40ed5169c6b3aba31069a9770ac9043d')
        self._train_label = ('train-labels-idx1-ubyte.gz','236021d52f1e40852b06a4c3008d8de8aef1e40b')
        self._test_data = ('t10k-images-idx3-ubyte.gz','626ed6a7c06dd17c0eec72fa3be1740f146a2863')
        self._test_label = ('t10k-labels-idx1-ubyte.gz','17f9ab60e7257a1620f4ad76bbbaf857c3920701')
        self._namespace = 'mnist'
        super(MNIST, self).__init__(root, transform)

    def _get_data(self):
        if self._train:
            data, label = self._train_data, self._train_label
        else:
            data, label = self._test_data, self._test_label

        namespace = 'gluon/dataset/'+self._namespace
        data_file = download(_get_repo_file_url(namespace, data[0]),
                             path=self._root,
                             sha1_hash=data[1])
        label_file = download(_get_repo_file_url(namespace, label[0]),
                              path=self._root,
                              sha1_hash=label[1])

        with gzip.open(label_file, 'rb') as fin:
            struct.unpack(">II", fin.read(8))
            label = np.frombuffer(fin.read(), dtype=np.uint8).astype(np.int32)

        with gzip.open(data_file, 'rb') as fin:
            struct.unpack(">IIII", fin.read(16))
            data = np.frombuffer(fin.read(), dtype=np.uint8)
            data = data.reshape(len(label), 28, 28, 1)

        self._data = nd.array(data, dtype=data.dtype)
        self._label = label


class FashionMNIST(MNIST):
    """A dataset of Zalando's article images consisting of fashion products,
    a drop-in replacement of the original MNIST dataset from
    https://github.com/zalandoresearch/fashion-mnist

    Each sample is an image (in 3D NDArray) with shape (28, 28, 1).

    Parameters
    ----------
    root : str, default $MXNET_HOME/datasets/fashion-mnist'
        Path to temp folder for storing data.
    train : bool, default True
        Whether to load the training or testing set.
    transform : function, default None
        A user defined callback that transforms each sample. For example::

            transform=lambda