Python urllib user_agent
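If you do not change it, Python's urllib sends a default User-Agent string, which shows up in the server log like this:

[19/Apr/2020:10:02:44 +0800] "GET / HTTP/1.1" 200 47657 "-" "Python-urllib/3.7"

This is an excerpt from an Nginx log file. The last quoted field is the default urllib User-Agent.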
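The default value can also be checked from Python itself, without reading a server log. The following is a minimal sketch: it relies on the fact that an opener created by `urllib.request.build_opener()` keeps the headers it adds to every request in its `addheaders` list. The variable names are illustrative only.

```python
import sys
from urllib.request import build_opener

# A freshly built opener lists the headers it adds to every request.
opener = build_opener()
default_headers = dict(opener.addheaders)

# The version suffix follows the running interpreter (3.7 in the log excerpt).
default_user_agent = default_headers['User-agent']
expected = 'Python-urllib/{}.{}'.format(*sys.version_info[:2])

print(default_user_agent)
print(default_user_agent == expected)
```

No network request is made here, so this is a quick way to see what a server would receive if nothing is overridden.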
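To send a custom header with urllib, add it to the Request before opening it. Note that urlopen() is the call that can raise HTTPError or URLError, so it is the call that belongs inside the try block:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import logging
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

logger = logging.getLogger(__name__)


def get_page_data(url):
    # Build the request and override the default User-Agent header.
    req = Request(url)
    req.add_header('User-Agent',
                   'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36')
    try:
        html = urlopen(req)
    except (HTTPError, URLError) as e:
        logger.error('Error while running get_page_data: {error_message}'.format(error_message=e))
        return False
    # Name the parser explicitly so BeautifulSoup does not have to guess.
    return BeautifulSoup(html.read(), 'html.parser')


bsObj = get_page_data('https://www.materialtools.com/')
print(bsObj)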
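As an alternative to add_header(), the headers can be passed when the Request is created. The sketch below only builds the request and inspects it, so it makes no network call. It also shows one detail that is easy to trip over: Request stores header names capitalized, so a header set as 'User-Agent' is looked up as 'User-agent'. The constant name USER_AGENT is illustrative.

```python
from urllib.request import Request

USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36')

# Headers can be supplied as a dict when the Request is created.
req = Request('https://www.materialtools.com/',
              headers={'User-Agent': USER_AGENT})

# Request capitalizes header names, so the stored key is 'User-agent'.
print(req.has_header('User-agent'))
print(req.has_header('User-Agent'))
print(req.get_header('User-agent') == USER_AGENT)
```

Checking the request object this way is a cheap sanity test before looking at the server log.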
The same approach works in code that also has to run on Python 2, where these names live in urllib2:

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

req = Request('http://api.company.com/items/details?country=US&language=en')
req.add_header('apikey', 'xxx')
content = urlopen(req).read()
print(content)
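When every request in a program should carry the same User-Agent, setting it on each Request gets repetitive. One option, sketched here for Python 3 only, is to install a global opener whose `addheaders` list replaces the default entry. The constant name USER_AGENT is illustrative, and this is one possible arrangement rather than the only one.

```python
from urllib.request import build_opener, install_opener

USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36')

opener = build_opener()
# Replace the default ('User-agent', 'Python-urllib/x.y') entry.
opener.addheaders = [('User-Agent', USER_AGENT)]
# From here on, plain urlopen(url) calls in this process use this opener.
install_opener(opener)

print(opener.addheaders)
```

A header set explicitly on an individual Request still takes precedence, because the opener only adds its headers when the request does not already have them.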
Then check the Nginx log file again. The last field now shows the custom User-Agent:

[19/Apr/2020:10:34:25 +0800] "GET / HTTP/1.1" 200 47658 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
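To compare log entries without reading them by eye, the User-Agent can be pulled out of each line. This is a rough sketch that assumes the log format matches the excerpts in this post, with the User-Agent as the last double-quoted field. The helper name user_agent_from_log_line is hypothetical, and a differently configured log format would need a different pattern.

```python
import re


def user_agent_from_log_line(line):
    """Return the last double-quoted field of a log line, or None."""
    quoted_fields = re.findall(r'"([^"]*)"', line)
    return quoted_fields[-1] if quoted_fields else None


# The two log excerpts from this post: before and after setting the header.
before = ('[19/Apr/2020:10:02:44 +0800] "GET / HTTP/1.1" 200 47657 "-" '
          '"Python-urllib/3.7"')
after = ('[19/Apr/2020:10:34:25 +0800] "GET / HTTP/1.1" 200 47658 "-" '
         '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
         '(KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"')

print(user_agent_from_log_line(before))
print(user_agent_from_log_line(after))
```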
Original article from Huang Bing's (黄兵) personal blog.