Scraping the Bilibili overall ranking list and video danmaku (bullet comments)
Requirements
Tools: Python 3, Google Chrome, PyCharm
Modules: requests, re, lxml
Approach
Open the ranking page and scrape every video URL on it, then use a for loop to extract each video's danmaku in turn.
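This section does not show the danmaku step itself. The following is a minimal sketch of the loop, not the author's code. It assumes each video's danmaku has already been downloaded as XML text in which every comment is a `<d p="...">text</d>` element. That feed format is an assumption, not something stated here. Under it, the `re` module listed under Requirements can pull out the comment texts, and `html.unescape` decodes XML entities such as `&amp;`. `extract_danmaku`, `collect_all`, and the sample data are hypothetical.

```python
import html
import re

# Assumption: each danmaku comment appears as <d p="...">comment text</d> in the XML.
DANMAKU_RE = re.compile(r'<d p="[^"]*">(.*?)</d>')


def extract_danmaku(xml_text):
    """Return the decoded comment texts found in one video's danmaku XML."""
    return [html.unescape(text) for text in DANMAKU_RE.findall(xml_text)]


def collect_all(xml_by_video):
    """Loop over the videos in order and collect each one's danmaku list."""
    result = {}
    for video_url, xml_text in xml_by_video.items():
        result[video_url] = extract_danmaku(xml_text)
    return result


if __name__ == "__main__":
    # Hypothetical sample data standing in for a downloaded danmaku file.
    sample = '<i><d p="1.0,1,25,16777215">first!</d><d p="3.5,1,25,16777215">hello</d></i>'
    print(collect_all({"https://www.bilibili.com/video/BV1xx411c7mD": sample}))
```

A regex is enough for this flat, one-element-per-comment layout. If the real feed turns out to nest or split comment text, parsing it with `lxml` (already in the module list) would be the more robust choice.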
Get all video URLs from the ranking page URL (using XPath):
![](https://img-blog.csdnimg.cn/20210207115026369.jpg?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzQ1MzgxNzUw,size_16,color_FFFFFF,t_70#pic_center)
```python
import requests
from lxml import etree

url = 'https://www.bilibili.com/v/popular/rank/all'  # URL of the ranking page
head = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36",
}
r = requests.get(url, headers=head, timeout=10)
r.raise_for_status()  # stop on an HTTP error instead of parsing an error page
# print(r.text)
html = etree.HTML(r.text)  # parse the page so it can be queried with XPath
href_list = html.xpath('//*[@id="app"]/div[2]/div[2]/ul/li/div[2]/div[2]/a/@href')  # video URLs
name_list = html.xpath('//*[@id="app"]/div[2]/div[2]/ul/li/div[2]/div[2]/a/text()')  # video titles
# print(href_list)
# print(name_list)
```
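Once `href_list` and `name_list` are filled, the two lists still have to be paired up before the per-video loop. This is a minimal sketch of one way to do it. It assumes the `href` values are protocol-relative (for example `//www.bilibili.com/video/BV...`) or site-relative, which this section does not confirm. `urljoin` leaves already-absolute URLs unchanged either way. The function name `pair_videos`, the dictionary keys, and the "BV plus 10 alphanumerics" id pattern are illustrative assumptions.

```python
import re
from urllib.parse import urljoin

BASE = "https://www.bilibili.com/v/popular/rank/all"
# Assumption: a video id looks like "BV" followed by 10 alphanumeric characters.
BVID_RE = re.compile(r"BV[0-9A-Za-z]{10}")


def pair_videos(href_list, name_list):
    """Pair each title with an absolute URL and, if present, its BV id."""
    videos = []
    for href, name in zip(href_list, name_list):
        full_url = urljoin(BASE, href)  # "//www..." becomes "https://www..."
        match = BVID_RE.search(full_url)
        videos.append({
            "title": name.strip(),
            "url": full_url,
            "bvid": match.group(0) if match else None,
        })
    return videos
```

`zip` stops at the shorter list. Also, `text()` returns one item per text node. So if an `<a>` has no text or several text nodes, titles and URLs can drift out of alignment. In that case, iterating over the `<a>` elements once and reading `@href` and `text()` from each one is safer.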
Running the scraping code produces the following output: