Fixing the dataset download failure in Hands-On Machine Learning

Background

While reproducing the code from section 10.2.2 of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, I kept getting an error (the data I needed to download was the Fashion-MNIST dataset):


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz





---------------------------------------------------------------------------
ConnectionAbortedError                    Traceback (most recent call last)
c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
   1316                 h.request(req.get_method(), req.selector, req.data, headers,
-> 1317                           encode_chunked=req.has_header('Transfer-encoding'))
   1318             except OSError as err: # timeout error

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in request(self, method, url, body, headers, encode_chunked)
   1228         """Send a complete request to the server."""
-> 1229         self._send_request(method, url, body, headers, encode_chunked)
   1230 

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1274             body = _encode(body, 'body')
-> 1275         self.endheaders(body, encode_chunked=encode_chunked)
   1276 

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in endheaders(self, message_body, encode_chunked)
   1223             raise CannotSendHeader()
-> 1224         self._send_output(message_body, encode_chunked=encode_chunked)
   1225 

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in _send_output(self, message_body, encode_chunked)
   1015         del self._buffer[:]
-> 1016         self.send(msg)
   1017 

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in send(self, data)
    955             if self.auto_open:
--> 956                 self.connect()
    957             else:

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in connect(self)
   1391             self.sock = self._context.wrap_socket(self.sock,
-> 1392                                                   server_hostname=server_hostname)
   1393 

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
    411             context=self,
--> 412             session=session
    413         )

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\ssl.py in _create(cls, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, context, session)
    852                         raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 853                     self.do_handshake()
    854             except (OSError, ValueError):

c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\ssl.py in do_handshake(self, block)
   1116                 self.settimeout(None)
-> 1117             self._sslobj.do_handshake()
   1118         finally:

ConnectionAbortedError: [WinError 10053] An established connection was aborted by the software in your host machine

Solution

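Judging from the error, the response probably took too long during the download, so the connection was dropped and the data never arrived. Reading the source shows that four files need to be downloaded: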

dirname = os.path.join('datasets', 'fashion-mnist')
base = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
files = [
    'train-labels-idx1-ubyte.gz', 'train-images-idx3-ubyte.gz',
    't10k-labels-idx1-ubyte.gz', 't10k-images-idx3-ubyte.gz'
]
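
As a small illustration, the full download URLs are simply the base path joined with each file name. The variable name `urls` is my own and does not appear in the TensorFlow source.

```python
# Base path and file names as they appear in the excerpt above.
base = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
files = [
    'train-labels-idx1-ubyte.gz', 'train-images-idx3-ubyte.gz',
    't10k-labels-idx1-ubyte.gz', 't10k-images-idx3-ubyte.gz',
]

# `urls` is a hypothetical helper name: one full URL per file.
urls = [base + name for name in files]

for url in urls:
    print(url)
```

The first URL printed is the same one shown in the "Downloading data from" line of the error output.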

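However, when I joined the base path and a file name and entered the result into the browser's address bar, the file downloaded fine. So the problem lies in the download code rather than in the site being blocked. For a quick fix, download the files yourself in a browser and copy them into the cache folder (%USERPROFILE%\.keras\datasets\fashion-mnist), so that the program sees the data as already downloaded and does not try again. Here %USERPROFILE% is the current user's home directory, the equivalent of "~" on Linux.
I have also uploaded the files I downloaded to Baidu Cloud (link: https://pan.baidu.com/s/1vzoVIYKv7nOPnWwgZ8zsNw, extraction code: dx03), in the hope that they help anyone who needs them.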
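
The copy step can also be scripted. The following is only a sketch: the function names `keras_cache_dir` and `copy_into_cache` are made up for illustration, and it assumes the default cache location under the user's home directory (a customized Keras home is not handled).

```python
import os
import shutil

# The four files listed in the TensorFlow source excerpt.
FILES = [
    'train-labels-idx1-ubyte.gz', 'train-images-idx3-ubyte.gz',
    't10k-labels-idx1-ubyte.gz', 't10k-images-idx3-ubyte.gz',
]


def keras_cache_dir(home=None):
    """Return <home>/.keras/datasets/fashion-mnist (hypothetical helper)."""
    # os.path.expanduser('~') resolves to %USERPROFILE% on Windows.
    home = home or os.path.expanduser('~')
    return os.path.join(home, '.keras', 'datasets', 'fashion-mnist')


def copy_into_cache(download_dir, home=None):
    """Copy the manually downloaded files into the cache folder.

    Returns the names of files that were not found in download_dir.
    """
    target = keras_cache_dir(home)
    os.makedirs(target, exist_ok=True)
    missing = []
    for name in FILES:
        src = os.path.join(download_dir, name)
        if os.path.isfile(src):
            shutil.copy2(src, os.path.join(target, name))
        else:
            missing.append(name)
    return missing
```

A call such as `copy_into_cache(r'C:\Users\me\Downloads')` (the path is just an example) returns an empty list once all four files are in place.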

Additional information

Reading the source also shows that under Python 3, TensorFlow downloads files with urlretrieve, which is most likely where the problem occurs.

# Use sys.version_info to check the Python version
if sys.version_info[0] == 2:

  def urlretrieve(url, filename, reporthook=None, data=None):
    """Replacement for `urlretrive` for Python 2.

    Under Python 2, `urlretrieve` relies on `FancyURLopener` from legacy
    `urllib` module, known to have issues with proxy management.

    Arguments:
        url: url to retrieve.
        filename: where to store the retrieved data locally.
        reporthook: a hook function that will be called once
            on establishment of the network connection and once
            after each block read thereafter.
            The hook will be passed three arguments;
            a count of blocks transferred so far,
            a block size in bytes, and the total size of the file.
        data: `data` argument passed to `urlopen`.
    """

    def chunk_read(response, chunk_size=8192, reporthook=None):
      content_type = response.info().get('Content-Length')
      total_size = -1
      if content_type is not None:
        total_size = int(content_type.strip())
      count = 0
      while True:
        chunk = response.read(chunk_size)
        count += 1
        if reporthook is not None:
          reporthook(count, chunk_size, total_size)
        if chunk:
          yield chunk
        else:
          break

    response = urlopen(url, data)
    with open(filename, 'wb') as fd:
      for chunk in chunk_read(response, reporthook=reporthook):
        fd.write(chunk)
else:
  from six.moves.urllib.request import urlretrieve

The download-related code

try:
  try:
    urlretrieve(origin, fpath, dl_progress)
  except HTTPError as e:
    raise Exception(error_msg.format(origin, e.code, e.msg))
  except URLError as e:
    raise Exception(error_msg.format(origin, e.errno, e.reason))
except (Exception, KeyboardInterrupt) as e:
  if os.path.exists(fpath):
    os.remove(fpath)
  raise
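
Note that the code above makes a single attempt and deletes the partial file on any failure. If the connection is only flaky, one possible alternative is to retry the download a few times yourself. The sketch below is not part of TensorFlow: `retrieve_with_retry` and its `attempts`, `delay` and `retrieve` parameters are hypothetical names chosen for illustration, and I have not confirmed that retrying resolves the WinError 10053 shown earlier.

```python
import os
import time
from urllib.request import urlretrieve


def retrieve_with_retry(url, fpath, attempts=3, delay=2.0, retrieve=urlretrieve):
    """Download url to fpath, retrying on network errors (illustrative sketch)."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            retrieve(url, fpath)
            return fpath
        except OSError as err:
            # URLError and ConnectionAbortedError are both OSError subclasses.
            last_error = err
            # Remove any partial file, as the TensorFlow code does.
            if os.path.exists(fpath):
                os.remove(fpath)
            if attempt < attempts:
                time.sleep(delay)
    raise last_error
```

The `retrieve` parameter exists only so that the function can be exercised with a stand-in downloader; by default it uses the standard library's `urlretrieve`.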