AI – Python 3 Crawler: How to Handle "Anti-Crawler" Measures ---- Error when requesting a site: http.client.RemoteDisconnected
1. Error info
When the crawler requests the website, the following error is raised:
http.client.RemoteDisconnected: Remote end closed connection without response
2. Cause:
The server restricts access for certain User-Agent values. urllib's default User-Agent ("Python-urllib/3.x") identifies the request as coming from a script, so such servers close the connection without sending a response.
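A quick way to see what User-Agent urllib sends by default (assuming CPython's standard-library implementation, where OpenerDirector pre-populates its header list):

```python
import urllib.request

# OpenerDirector initializes its header list with the default User-Agent,
# of the form "Python-urllib/3.x" -- the string some servers block.
opener = urllib.request.OpenerDirector()
default_headers = dict(opener.addheaders)
print(default_headers["User-agent"])  # e.g. "Python-urllib/3.11"
```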
3. Solution: add a User-Agent header
Example:
Original (failing) code:
data = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
Change it to:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36'}
# Or use any short label, e.g. headers = {'User-Agent': 'Firefox'}
req_data = urllib.request.Request(url, headers=headers)
data = urllib.request.urlopen(req_data).read().decode("utf-8", "ignore")
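If the crawler makes many requests, building a Request object every time is tedious. An alternative sketch (the User-Agent string here is just an example) installs a global opener so every plain urlopen() call carries the custom header:

```python
import urllib.request

# Install a global opener whose header list replaces the default
# "Python-urllib/3.x" User-Agent. Any realistic browser string works
# for servers that merely reject the default one.
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")]
urllib.request.install_opener(opener)

# From now on, a plain urlopen(url) sends the custom User-Agent:
# data = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
print(dict(opener.addheaders)["User-Agent"])
```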
Note:
This is for technical research only. Please be considerate: don't leave a crawler running against someone else's site all the time, so you don't put load on their server. Thanks~