Background
While reproducing the code from section 10.2.2 of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, I kept hitting an error (the data I needed to download is the Fashion-MNIST dataset):
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
---------------------------------------------------------------------------
ConnectionAbortedError Traceback (most recent call last)
c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
1316 h.request(req.get_method(), req.selector, req.data, headers,
-> 1317 encode_chunked=req.has_header('Transfer-encoding'))
1318 except OSError as err: # timeout error
c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in request(self, method, url, body, headers, encode_chunked)
1228 """Send a complete request to the server."""
-> 1229 self._send_request(method, url, body, headers, encode_chunked)
1230
c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in _send_request(self, method, url, body, headers, encode_chunked)
1274 body = _encode(body, 'body')
-> 1275 self.endheaders(body, encode_chunked=encode_chunked)
1276
c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in endheaders(self, message_body, encode_chunked)
1223 raise CannotSendHeader()
-> 1224 self._send_output(message_body, encode_chunked=encode_chunked)
1225
c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in _send_output(self, message_body, encode_chunked)
1015 del self._buffer[:]
-> 1016 self.send(msg)
1017
c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in send(self, data)
955 if self.auto_open:
--> 956 self.connect()
957 else:
c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\http\client.py in connect(self)
1391 self.sock = self._context.wrap_socket(self.sock,
-> 1392 server_hostname=server_hostname)
1393
c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
411 context=self,
--> 412 session=session
413 )
c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\ssl.py in _create(cls, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, context, session)
852 raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 853 self.do_handshake()
854 except (OSError, ValueError):
c:\users\fanxuezhe\appdata\local\programs\python\python37\lib\ssl.py in do_handshake(self, block)
1116 self.settimeout(None)
-> 1117 self._sslobj.do_handshake()
1118 finally:
ConnectionAbortedError: [WinError 10053] An established connection was aborted by the software in your host machine.
Solution
Judging from the error, the connection was aborted mid-download (most likely because the response took too long), so the data never arrived. Reading the source code shows that four files need to be downloaded:
dirname = os.path.join('datasets', 'fashion-mnist')
base = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
files = [
    'train-labels-idx1-ubyte.gz', 'train-images-idx3-ubyte.gz',
    't10k-labels-idx1-ubyte.gz', 't10k-images-idx3-ubyte.gz'
]
However, when I concatenated the base URL with each file name and pasted it into my browser's address bar, the files downloaded fine, so the problem lies in the download code itself, not in the URL being blocked. For a quick fix, download the files with a browser and copy them into the cache folder (%USERPROFILE%\.keras\datasets\fashion-mnist) so the program sees the data as already downloaded and skips the download. Here %USERPROFILE% is the current user's home directory, the equivalent of "~" on Linux.
I have uploaded the files I downloaded to Baidu Cloud (link: https://pan.baidu.com/s/1vzoVIYKv7nOPnWwgZ8zsNw
extraction code: dx03); I hope this helps anyone who needs it.
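If you prefer to script the manual workaround, the steps above can be sketched as follows. This is a minimal sketch, not TensorFlow's own code; the cache path is the default location assumed from the Keras behavior described above, and `fetch_fashion_mnist` is a hypothetical helper name:

```python
import os
import urllib.request

BASE = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
FILES = [
    'train-labels-idx1-ubyte.gz', 'train-images-idx3-ubyte.gz',
    't10k-labels-idx1-ubyte.gz', 't10k-images-idx3-ubyte.gz',
]

def fetch_fashion_mnist(cache_dir=None):
    """Download any missing Fashion-MNIST archives into the Keras cache."""
    if cache_dir is None:
        # Assumed default Keras cache location; adjust if your version differs.
        cache_dir = os.path.join(os.path.expanduser('~'), '.keras',
                                 'datasets', 'fashion-mnist')
    os.makedirs(cache_dir, exist_ok=True)
    paths = []
    for name in FILES:
        fpath = os.path.join(cache_dir, name)
        if not os.path.exists(fpath):   # skip files already in the cache
            urllib.request.urlretrieve(BASE + name, fpath)
        paths.append(fpath)
    return paths
```

Once the four .gz files sit in that folder, `keras.datasets.fashion_mnist.load_data()` should find them and not attempt a network download.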
Additional notes
Also, reading the source shows that under Python 3, TensorFlow downloads files via urlretrieve, which is most likely where the problem lies.
# Check the Python version with sys.version_info
if sys.version_info[0] == 2:
    def urlretrieve(url, filename, reporthook=None, data=None):
        """Replacement for `urlretrieve` for Python 2.

        Under Python 2, `urlretrieve` relies on `FancyURLopener` from legacy
        `urllib` module, known to have issues with proxy management.

        Arguments:
            url: url to retrieve.
            filename: where to store the retrieved data locally.
            reporthook: a hook function that will be called once
                on establishment of the network connection and once
                after each block read thereafter.
                The hook will be passed three arguments;
                a count of blocks transferred so far,
                a block size in bytes, and the total size of the file.
            data: `data` argument passed to `urlopen`.
        """

        def chunk_read(response, chunk_size=8192, reporthook=None):
            content_type = response.info().get('Content-Length')
            total_size = -1
            if content_type is not None:
                total_size = int(content_type.strip())
            count = 0
            while True:
                chunk = response.read(chunk_size)
                count += 1
                if reporthook is not None:
                    reporthook(count, chunk_size, total_size)
                if chunk:
                    yield chunk
                else:
                    break

        response = urlopen(url, data)
        with open(filename, 'wb') as fd:
            for chunk in chunk_read(response, reporthook=reporthook):
                fd.write(chunk)
else:
    from six.moves.urllib.request import urlretrieve
The download-related code:
try:
    try:
        urlretrieve(origin, fpath, dl_progress)
    except HTTPError as e:
        raise Exception(error_msg.format(origin, e.code, e.msg))
    except URLError as e:
        raise Exception(error_msg.format(origin, e.errno, e.reason))
except (Exception, KeyboardInterrupt) as e:
    if os.path.exists(fpath):
        os.remove(fpath)
    raise
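As the code above shows, a single failure deletes the partial file and re-raises, with no retry. Since the failure here is a transient ConnectionAbortedError during the TLS handshake, one pragmatic workaround (a sketch of my own, not TensorFlow's code; the function name and the `attempts`/`backoff` defaults are illustrative) is to wrap urlretrieve in a simple retry loop:

```python
import time
import urllib.request

def urlretrieve_with_retry(url, filename, attempts=3, backoff=2.0):
    """Call urlretrieve, retrying on transient connection errors."""
    for i in range(attempts):
        try:
            return urllib.request.urlretrieve(url, filename)
        except OSError:  # covers URLError and ConnectionAbortedError
            if i == attempts - 1:
                raise  # out of attempts, give up
            time.sleep(backoff * (i + 1))  # simple linear backoff
```

A flaky connection that drops once or twice will then recover on a later attempt instead of aborting the whole download.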