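Trying to scrape Bilibili cover images by AV number...
Attempt 1:
First, try fetching the page source directly from the video URL and pulling the cover out of it.
For example, request the page directly by its AV number: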
https://www.bilibili.com/video/av41949084
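import ssl
import urllib.request

url = 'https://www.bilibili.com/video/av41949084'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36',
    'Referer': 'https://www.bilibili.com',  # surprisingly, this was where the error came from
}
request = urllib.request.Request(url, headers=headers)
# create_default_context() negotiates the TLS version and verifies certificates;
# it stands in for the deprecated ssl.SSLContext(ssl.PROTOCOL_TLSv1_2).
response = urllib.request.urlopen(request, context=ssl.create_default_context())
print(response.status)  # HTTP status code of the reply
content = response.read()
# Save the raw bytes so the page source can be inspected afterwards
with open('bilibili.html', mode='wb') as f:
    f.write(content)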
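Once the page source is saved, the cover address still has to be dug out of the HTML. Here is a minimal sketch of that step. It assumes the page exposes the cover through an Open Graph tag of the form `<meta property="og:image" content="...">`, which is worth checking against the saved `bilibili.html` before relying on it. `CoverMetaParser`, `extract_cover_url` and the sample HTML are made-up names and data for illustration only.

```python
from html.parser import HTMLParser


class CoverMetaParser(HTMLParser):
    """Remembers the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.cover_url = None

    def handle_starttag(self, tag, attrs):
        # Only look at <meta> tags, and stop once a cover has been found.
        if tag != 'meta' or self.cover_url is not None:
            return
        attr_map = dict(attrs)
        # Assumption: the cover is published as an Open Graph image tag.
        if attr_map.get('property') == 'og:image':
            self.cover_url = attr_map.get('content')


def extract_cover_url(html_text):
    """Return the og:image URL found in html_text, or None if there is none."""
    parser = CoverMetaParser()
    parser.feed(html_text)
    return parser.cover_url


# Hypothetical sample, not real Bilibili output.
sample_html = (
    '<html><head>'
    '<meta property="og:image" content="https://example.com/cover.jpg">'
    '</head></html>'
)
print(extract_cover_url(sample_html))
```

To run it on the real page, read `bilibili.html` back in as text (for example with `open('bilibili.html', encoding='utf-8')`) and pass the result to `extract_cover_url`. If it returns `None`, the tag assumption does not hold for that page and the source needs a closer look.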
Bilibili does apply some anti-scraping measures, so the request has to carry certain header fields. In particular, be sure to add
‘Referer’: ‘https://w