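When a crawler hits a site such as 12306 whose certificate is not trusted, the request usually fails with a certificate error, because 12306's certificate is self-issued rather than issued by a trusted CA.
Solution: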
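from urllib.request import Request, urlopen
import ssl

request = Request('http://www.12306.cn/normhweb')
request.add_header(
    'User-agent',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3554.0 Safari/537.36'
)
# Build an SSL context that skips certificate and hostname verification.
# ssl._create_unverified_context() is a private helper of the ssl module.
context = ssl._create_unverified_context()
res = urlopen(request, context=context)
with res:
    pass  # read the response here, e.g. res.read()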
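
The same effect can probably be had with only the public `ssl` API, without the underscore-prefixed helper. The sketch below is one way to do it, not the only one; `make_unverified_context` and `fetch` are hypothetical names introduced here for illustration, and the shortened User-agent string and the 10-second timeout are assumptions.

```python
import ssl
from urllib.request import Request, urlopen


def make_unverified_context() -> ssl.SSLContext:
    """Return an SSLContext that accepts any server certificate (hypothetical helper)."""
    context = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be set to CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Fetch url without verifying the server certificate (hypothetical helper)."""
    # The User-agent value is a shortened placeholder, not the one used above.
    request = Request(url, headers={'User-agent': 'Mozilla/5.0'})
    with urlopen(request, context=make_unverified_context(), timeout=timeout) as res:
        return res.read()
```

Calling `fetch` with the 12306 URL would then return the raw response bytes. Skipping verification removes protection against man-in-the-middle attacks, so if the site's CA certificate file is available, passing it as `ssl.create_default_context(cafile=...)` keeps verification on while still trusting that site.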