Scraping All Comments on a Bilibili Video with Python 3
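1. Why this article exists
A friend recently asked me to scrape the comments under a Bilibili video to use as data for his analysis. I went through many tutorials and articles online, but none of them scraped all of the comments, so they did not meet his needs. I ended up writing my own, and I am sharing the code here for anyone who wants to learn from it or discuss it.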
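2. The code
Given a video's Bvid, the script saves all of the comments under that video to a CSV file. (Note: the output shows total comment count > current comment count. This is not a bug. The total includes both comments and replies, while the script only scrapes the top-level comments.)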
import requests
import re
import time
import csv
# Request headers
header = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
}
# Comment API; the two placeholders are the page number (next) and the video's aid (oid)
original_url = 'https://api.bilibili.com/x/v2/reply/main?jsonp=jsonp&next={}&type=1&oid={}&mode=3'
# Convert a Unix timestamp to a date string (in the local time zone)
def get_time(ctime):
    timeArray = time.localtime(ctime)
    # strftime already returns a str, so no extra conversion is needed
    return time.strftime("%Y.%m.%d", timeArray)
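# Get the video's aid (passed to the comment API as oid) from the video page
def get_oid(bvid):
    video_url = 'https://www.bilibili.com/video/' + bvid
    page = requests.get(video_url, headers=header).text
    match = re.search(r'"aid":([0-9]+)', page)
    if match is None:
        raise ValueError('aid not found in the page for ' + bvid)
    return match.group(1)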
# Scrape comments page by page, writing each page to the CSV file as it arrives
def online_save(Bvid):
    all_count = 0   # top-level comments written so far
    count = None    # total reported by the API (comments plus replies)
    oid = get_oid(Bvid)
    page = 1
    fname = Bvid + '_comments.csv'
    # utf_8_sig adds a BOM so that Excel detects the encoding
    with open(fname, 'w', newline='', encoding='utf_8_sig') as f:
        csv_writer = csv.writer(f)
        csv_writer.writerow(["date", "likes", "comment"])
        while True:
            url = original_url.format(page, oid)
            try:
                data = requests.get(url, headers=header).json()
                replies = data['data']['replies']
            except (requests.RequestException, ValueError, KeyError, TypeError):
                # Network error or unexpected response: stop scraping
                break
            if not replies:
                # No comments on this page means we are past the last one
                break
            if count is None:
                count = int(data['data']['cursor']['all_count'])
            for i in replies:
                # Remove all whitespace (including newlines) from the comment text
                message = re.sub(r'\s+', '', i['content']['message'])
                ctime = get_time(i['ctime'])
                like = i['like']
                csv_writer.writerow([ctime, str(like), message])
                all_count = all_count + 1
            print('Total comments: {}, scraped so far: {}, finished page {}.'.format(count, all_count, page))
            if all_count >= count:
                break
            time.sleep(5)
            page += 1
if __name__ == '__main__':
    Bvid = input('Enter the video Bvid: ')
    online_save(Bvid)
    print('Done!')
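While it runs, the script prints one progress line per page (the total comment count, the number of comments scraped so far, and the page number), followed by a completion message at the end.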
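As a separate illustration, the stopping rule and the count mismatch mentioned earlier can be tried offline, without touching the network. The following is only a minimal sketch: `FAKE_PAGES`, `fake_fetch` and `collect_comments` are hypothetical names invented for this example, and the fake response models only the fields the script reads (`data.replies`, `data.cursor.all_count`, `content.message`, `like`). It assumes, as the scraping loop implicitly does, that a page past the end comes back without a usable reply list.

```python
# Offline sketch: FAKE_PAGES and fake_fetch are hypothetical stand-ins for the
# real API; only the fields the script reads are modelled here.
FAKE_PAGES = {
    1: [{"content": {"message": "first"}, "like": 3},
        {"content": {"message": "second"}, "like": 1}],
    2: [{"content": {"message": "third"}, "like": 0}],
}
ALL_COUNT = 5  # pretend total: 3 top-level comments plus 2 nested replies


def fake_fetch(page):
    """Return a response-shaped dict; pages past the end carry replies=None."""
    return {"data": {"cursor": {"all_count": ALL_COUNT},
                     "replies": FAKE_PAGES.get(page)}}


def collect_comments(fetch, max_pages=1000):
    """Walk pages 1, 2, ... until a page has no replies; return (rows, total)."""
    rows, total = [], None
    for page in range(1, max_pages + 1):
        data = fetch(page)["data"]
        if total is None:
            total = int(data["cursor"]["all_count"])
        replies = data.get("replies")
        if not replies:  # None and [] both mean there is nothing left
            break
        for reply in replies:
            rows.append((reply["like"], reply["content"]["message"]))
    return rows, total


rows, total = collect_comments(fake_fetch)
print(len(rows), total)
```

Here the loop collects 3 top-level comments while the reported total is 5, which mirrors why the scraped count can stay below the total even after every page has been read. Passing the fetch function in as an argument is a design choice made for this sketch so that the loop can be exercised with fake data.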
That is all I have to share. I hope it helps anyone who needs it.