问题
python requests 爬虫时报超时错误,具体如下:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 421, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 416, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.8/http/client.py", line 1332, in getresponse
response.begin()
File "/usr/lib/python3.8/http/client.py", line 303, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.8/http/client.py", line 264, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 719, in urlopen
retries = retries.increment(
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 400, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 665, in urlopen
httplib_response = self._make_request(
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 423, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 330, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='****', port=80): Read timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/xll/code/dzy/ddjs/parsebj.py", line 131, in <module>
r = s.get(url=url)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 543, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='****', port=80): Read timed out.
可能是网页访问慢,也可能是返回的资源太大。
解决办法:
方法一:
捕获异常,记录有问题的网页,下次再跑。
try:
get_img(zk_dic, single_result, bianhao)
except:
fp = open(log_file, 'a')
fp.write(str(name)+','+str(bianhao))
fp.write('\n')
fp.close()
方法二:
设置时间长一些的超时timeout
r = requests.get(url=url, headers=headers, timeout=60)