Datasets fail to download when following Hands-On Machine Learning

weixin_43364556

Category: Tensorflow. Tags: big data

Article link: https://blog.csdn.net/weixin_43364556/article/details/111667142

Background

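While reproducing the code from section 10.2.2 of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, I kept getting the following error (the data I needed to download was the Fashion-MNIST dataset):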


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz





---------------------------------------------------------------------------
ConnectionAbortedError                    Traceback (most recent call last)
c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
   1316                 h.request(req.get_method(), req.selector, req.data, headers,
-> 1317                           encode_chunked=req.has_header('Transfer-encoding'))
   1318             except OSError as err: # timeout error

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in request(self, method, url, body, headers, encode_chunked)
   1228         """Send a complete request to the server."""
-> 1229         self._send_request(method, url, body, headers, encode_chunked)
   1230 

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1274             body = _encode(body, 'body')
-> 1275         self.endheaders(body, encode_chunked=encode_chunked)
   1276 

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in endheaders(self, message_body, encode_chunked)
   1223             raise CannotSendHeader()
-> 1224         self._send_output(message_body, encode_chunked=encode_chunked)
   1225 

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in _send_output(self, message_body, encode_chunked)
   1015         del self._buffer[:]
-> 1016         self.send(msg)
   1017 

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in send(self, data)
    955             if self.auto_open:
--> 956                 self.connect()
    957             else:

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in connect(self)
   1391             self.sock = self._context.wrap_socket(self.sock,
-> 1392                                                   server_hostname=server_hostname)
   1393 

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
    411             context=self,
--> 412             session=session
    413         )

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\ssl.py in _create(cls, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, context, session)
    852                         raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 853                     self.do_handshake()
    854             except (OSError, ValueError):

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\ssl.py in do_handshake(self, block)
   1116                 self.settimeout(None)
-> 1117             self._sslobj.do_handshake()
   1118         finally:

ConnectionAbortedError: [WinError 10053] An established connection was aborted by the software in your host machine.

Solution

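Judging from the error, the response probably took too long during the download, so the connection was dropped and the data was never fetched (the traceback shows the connection being aborted during the SSL handshake). Reading the source code shows that four files need to be downloaded: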

dirname = os.path.join('datasets', 'fashion-mnist')
base = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
files = [
    'train-labels-idx1-ubyte.gz', 'train-images-idx3-ubyte.gz',
    't10k-labels-idx1-ubyte.gz', 't10k-images-idx3-ubyte.gz'
]
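The full download URLs follow from joining `base` with each entry of `files`. A minimal sketch that reuses the values from the snippet above; the `urls` name is mine and is not part of the Keras source:

```python
# Values copied from the Keras source snippet above.
base = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
files = [
    'train-labels-idx1-ubyte.gz', 'train-images-idx3-ubyte.gz',
    't10k-labels-idx1-ubyte.gz', 't10k-images-idx3-ubyte.gz',
]

# `urls` is an illustrative name, not part of the Keras source.
urls = [base + name for name in files]

# Print each URL so it can be pasted into a browser address bar.
for url in urls:
    print(url)
```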

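However, when I joined the base path with each filename and pasted the result into the browser address bar, the files downloaded fine. So the problem appears to lie in the download code, not in the site being blocked by the firewall. If you want a quick fix, download the files yourself with a browser and copy them into the cache folder (%USERPROFILE%\.keras\datasets\fashion-mnist), so that the program treats the data as already downloaded and does not try again. Here %USERPROFILE% is the current user's home directory, the equivalent of "~" on Linux.
I have also uploaded the files I downloaded to Baidu Cloud (link: https://pan.baidu.com/s/1vzoVIYKv7nOPnWwgZ8zsNw
extraction code: dx03), in the hope that they help anyone who needs them.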
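The manual copy step can also be scripted. This is a rough sketch, assuming the four files were saved by the browser into a single folder; `install_into_cache` and `default_cache_dir` are hypothetical helper names, not part of Keras:

```python
import shutil
from pathlib import Path

# The four file names listed in the Keras source.
FILES = [
    'train-labels-idx1-ubyte.gz', 'train-images-idx3-ubyte.gz',
    't10k-labels-idx1-ubyte.gz', 't10k-images-idx3-ubyte.gz',
]


def default_cache_dir():
    """Cache folder under the user's home directory (.keras/datasets/fashion-mnist)."""
    return Path.home() / '.keras' / 'datasets' / 'fashion-mnist'


def install_into_cache(download_dir, cache_dir=None):
    """Copy the four manually downloaded files into the cache folder."""
    cache_dir = Path(cache_dir) if cache_dir is not None else default_cache_dir()
    cache_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in FILES:
        src = Path(download_dir) / name
        if not src.is_file():
            raise FileNotFoundError(f'missing download: {src}')
        shutil.copy2(src, cache_dir / name)
        copied.append(cache_dir / name)
    return copied


# Example (assumes the browser saved the files to ~/Downloads):
# install_into_cache(Path.home() / 'Downloads')
```

Once the files are in place, loading the dataset again should find them in the cache and skip the download.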

Additional information

In addition, the source code shows that under Python 3, TensorFlow downloads files with urlretrieve, which is most likely where the problem arises.
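If the failure is intermittent, one possible alternative to the browser is to call `urlretrieve` yourself with a timeout and a few retries. This is only a sketch under that assumption, not what TensorFlow does; `download_with_retry` and its parameters are hypothetical names:

```python
import socket
import time
from urllib.request import urlretrieve


def download_with_retry(url, dest, attempts=3, timeout=30.0, delay=2.0,
                        fetch=urlretrieve):
    """Call fetch(url, dest), retrying when an OSError is raised.

    ConnectionAbortedError and urllib's URLError are both OSError subclasses.
    """
    previous = socket.getdefaulttimeout()
    # urlretrieve has no timeout argument, so set the process-wide default.
    socket.setdefaulttimeout(timeout)
    try:
        for attempt in range(1, attempts + 1):
            try:
                fetch(url, dest)
                return dest
            except OSError:
                if attempt == attempts:
                    raise
                time.sleep(delay)
    finally:
        # Restore whatever default timeout was set before the call.
        socket.setdefaulttimeout(previous)


# Example (performs a real network request):
# download_with_retry(
#     'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
#     'train-labels-idx1-ubyte.gz',
#     'train-labels-idx1-ubyte.gz')
```

The `fetch` parameter exists only so the retry logic can be exercised without touching the network.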

# Check the Python version with sys.version_info
if sys.version_info[0] == 2:

  def urlretrieve(url, filename, reporthook=None, data=None):
    """Replacement for `urlretrive` for Python 2.

    Under Python 2, `urlretrieve` relies on `FancyURLopener` from legacy
    `urllib` module, known to have issues with proxy management.

    Arguments:
        url: url to retrieve.
        filename: where to store the retrieved data locally.
        reporthook: a hook function that will be called once
            on establishment of the network connection and once
            after each block read thereafter.
            The hook will be passed three arguments;
            a count of blocks transferred so far,
            a block size in bytes, and the total size of the file.
        data: `data` argument passed to `urlopen`.
    """

    def chunk_read(response, chunk_size=8192, reporthook=None):
      content_type = response.info().get('Content-Length')
      total_size = -1
      if content_type is not None:
        total_size = int(content_type.strip())
      count = 0
      while True:
        chunk = response.read(chunk_size)
        count += 1
        if reporthook is not None:
          reporthook(count, chunk_size, total_size)
        if chunk:
          yield chunk
        else:
          break

    response = urlopen(url, data)
    with open(filename, 'wb') as fd:
      for chunk in chunk_read(response, reporthook=reporthook):
        fd.write(chunk)
else:
  from six.moves.urllib.request import urlretrieve

The download-related code:

try:
  try:
    urlretrieve(origin, fpath, dl_progress)
  except HTTPError as e:
    raise Exception(error_msg.format(origin, e.code, e.msg))
  except URLError as e:
    raise Exception(error_msg.format(origin, e.errno, e.reason))
except (Exception, KeyboardInterrupt) as e:
  if os.path.exists(fpath):
    os.remove(fpath)
  raise
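The code above removes a partially written file when a download fails, so a file that the library fetched itself should be complete. A file copied into the cache by hand gets no such check. Here is a minimal sketch for sanity-checking one, assuming the files are gzip-compressed IDX files as the `.gz` and `idx1`/`idx3` names suggest; `read_idx_header` and `is_complete_gzip` are hypothetical helpers, not part of Keras:

```python
import gzip
import struct
import zlib

# Magic numbers of the IDX format used by MNIST-style datasets.
IDX_LABELS_MAGIC = 2049  # 0x00000801: unsigned bytes, 1 dimension
IDX_IMAGES_MAGIC = 2051  # 0x00000803: unsigned bytes, 3 dimensions


def read_idx_header(path):
    """Return (magic, item_count) from a gzip-compressed IDX file."""
    with gzip.open(path, 'rb') as f:
        header = f.read(8)
    if len(header) != 8:
        raise ValueError(f'{path} is too short to be an IDX file')
    # Both fields are big-endian unsigned 32-bit integers.
    return struct.unpack('>II', header)


def is_complete_gzip(path):
    """Decompress the whole file; return False if it is truncated or corrupt."""
    try:
        with gzip.open(path, 'rb') as f:
            while f.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True


# Example (assumes the file sits in the current directory):
# print(read_idx_header('train-labels-idx1-ubyte.gz'))
# print(is_complete_gzip('train-labels-idx1-ubyte.gz'))
```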
