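When fetching data with Python's urllib or the requests library, keep in mind that many websites inspect the User-Agent header of incoming requests. If the request does not carry a browser-like User-Agent, the server may close the connection without replying, and the client then raises an exception such as: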
http.client.RemoteDisconnected: Remote end closed connection without response
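To see why a server might reject an unmodified request, it can help to look at the User-Agent that urllib sends by default. The following is a minimal sketch that runs offline; the variable names are illustrative only, and the exact version suffix depends on the installed Python.

```python
import urllib.request

# build_opener() returns an OpenerDirector; its addheaders list holds the
# headers urllib attaches to every request when none are supplied.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# The default value identifies the client as a Python script, not a browser.
default_user_agent = default_headers["User-agent"]
print(default_user_agent)  # something like Python-urllib/3.x
```

A value of this form is easy for a site to recognize and block, which is why the examples that follow override it.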
- Setting the User-Agent with urllib:
from urllib.request import urlopen, Request
# URL to request
url = 'http://httpbin.org/get'
headers = {
    'User-Agent': "Mozilla/5.0 (Windows; U; Windows NT 5.2) Gecko/2008070208 Firefox/3.0.1"
}
# Pass the headers to Request; otherwise urllib sends its default User-Agent
request = Request(url, headers=headers)
resp = urlopen(request)
print(resp.read().decode())
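A quick way to confirm that the header is really attached is to inspect the Request object before sending it. The sketch below makes no network call; the names with_headers, without_headers, sent_user_agent and has_custom_agent are hypothetical and used only for illustration.

```python
from urllib.request import Request

url = 'http://httpbin.org/get'
headers = {
    'User-Agent': "Mozilla/5.0 (Windows; U; Windows NT 5.2) Gecko/2008070208 Firefox/3.0.1"
}

# Building a Request does not open a connection, so this runs offline.
with_headers = Request(url, headers=headers)
without_headers = Request(url)

# urllib normalizes header names with str.capitalize(), so the key
# to look up is 'User-agent' rather than 'User-Agent'.
sent_user_agent = with_headers.get_header('User-agent')
has_custom_agent = without_headers.has_header('User-agent')

print(sent_user_agent)   # the Firefox string defined above
print(has_custom_agent)  # False: no custom User-Agent was attached
```

If the headers argument is left out of Request, the custom value never reaches the server, even when the headers dictionary is defined in the script.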
- Setting the User-Agent with requests:
import requests
url = 'http://httpbin.org/get'
headers = {
    'User-Agent': "Mozilla/5.0 (Windows; U; Windows NT 5.2) Gecko/2008070208 Firefox/3.0.1"
}
resp = requests.get(url, headers=headers)
print(resp.text)
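When many requests go to the same site, one possible variation is to set the User-Agent once on a requests.Session instead of passing headers on every call. The sketch below assumes the requests package is installed and only prepares a request without sending it; the names session, prepared and prepared_user_agent are illustrative.

```python
import requests

url = 'http://httpbin.org/get'
user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.2) Gecko/2008070208 Firefox/3.0.1"

# The value requests would send if no User-Agent were set.
default_user_agent = requests.utils.default_user_agent()
print(default_user_agent)  # something like python-requests/2.x

# Headers stored on the session are merged into every request it makes.
session = requests.Session()
session.headers.update({'User-Agent': user_agent})

# prepare_request() builds the final request without touching the network.
prepared = session.prepare_request(requests.Request('GET', url))
prepared_user_agent = prepared.headers['User-Agent']
print(prepared_user_agent)
```

Calling session.get(url) afterwards would send the same header, so the per-call headers argument is no longer needed.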