Scraping QQ Music comments and generating a word cloud
The song we picked is Jay Chou's 听妈妈的话 ("Listen to Mom").
Here's the final result first:
To start, open QQ Music and find the song (NetEase Cloud Music, come out and take your beating):
https://y.qq.com/n/yqq/song/002hXDfk0LX9KO.html
Click "Comments" or scroll down and the comments show up.
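Press F12 to open the developer tools and select the Network tab. Then click to the next page of comments and watch the requests: a few image requests show up, plus one whose name starts with fcg. Open its Response tab.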
Oh ho! It turns out the comment data is hiding in the commentlist array under the comment object, and it's JSON.
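To make that layout concrete, here is a small parsing sketch. It assumes the body is plain JSON (the request asks for `format=json`) and that each entry keeps its text in a `rootcommentcontent` field. That field name is an assumption, so check it against the Response tab first. The sample payload is hand-made, not real data.

```python
import json


def extract_comments(body, text_field="rootcommentcontent"):
    """Return the comment texts from one page of the comment API response.

    Follows the layout seen in DevTools: data["comment"]["commentlist"].
    `text_field` is an ASSUMED field name; change it to match the real response.
    """
    data = json.loads(body)
    # `or {}` / `or []` guard against missing keys and null values
    comment_list = (data.get("comment") or {}).get("commentlist") or []
    # Skip entries that have no text in the chosen field
    return [item[text_field] for item in comment_list if item.get(text_field)]


# Hand-made sample payload, for illustration only
sample = '{"comment": {"commentlist": [{"rootcommentcontent": "好听"}, {"rootcommentcontent": "童年回忆"}]}}'
print(extract_comments(sample))
```

The `or` fallbacks mean an empty or missing page simply yields an empty list, which makes a page loop easy to stop.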
Now take a look at the Headers tab.
Copy the request URL and paste it into the address bar. Luckily it opens directly, which saves us a lot of trouble.
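Because the URL works on its own, its query string can be split into a parameter dict. That is the same kind of dict as the `querystring` in the request code further down. A minimal standard-library sketch; the URL below is a shortened version of the copied one.

```python
from urllib.parse import urlsplit, parse_qsl


def split_request_url(url):
    """Split a copied request URL into (base_url, params_dict)."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qsl keeps every value as a string, like the querystring dict
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base, params


copied = ("https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg"
          "?format=json&cid=205360772&topid=102066257&pagenum=1&pagesize=25")
base, params = split_request_url(copied)
print(base)
print(params)
```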
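Next, compare the request URLs for different comment pages.
Looking back and forth, almost nothing changes. Only two parameters differ:
- "pagenum": the page number
- "lasthotcommentid": the id of the previous hot comment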
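Since only these two parameters change, each page request can be built by copying the fixed parameters and overwriting `pagenum` and `lasthotcommentid`. The sketch below assumes the other values from the copied request stay valid for every page. `build_page_params` is a hypothetical helper name, and `BASE_PARAMS` is trimmed to a few keys for brevity; in practice copy them all from the request. Where the next `lasthotcommentid` comes from is left to the caller.

```python
# Fixed parameters taken from the copied request (trimmed for brevity)
BASE_PARAMS = {
    "format": "json",
    "cid": "205360772",
    "reqtype": "2",
    "biztype": "1",
    "topid": "102066257",
    "cmd": "8",
    "pagesize": "25",
}


def build_page_params(pagenum, lasthotcommentid=""):
    """Return the query parameters for one comment page (hypothetical helper)."""
    params = dict(BASE_PARAMS)        # copy so BASE_PARAMS is never mutated
    params["pagenum"] = str(pagenum)  # the browser sends every value as a string
    params["lasthotcommentid"] = lasthotcommentid
    return params


print(build_page_params(1, "song_102066257_18578995_1591191607"))
```

Copying the base dict on every call keeps pages independent, so a loop over page numbers cannot leak one page's values into the next.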
Now for the important part, stay focused:
Right-click the comment request and choose Copy > Copy as cURL.
You get something like this:
curl 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg?g_tk_new_20200303=5381&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=GB2312&notice=0&platform=yqq.json&needNewCode=0&cid=205360772&reqtype=2&biztype=1&topid=102066257&cmd=8&needmusiccrit=0&pagenum=1&pagesize=25&lasthotcommentid=song_102066257_18578995_1591191607&domain=qq.com&ct=24&cv=10101010' \
-H 'authority: c.y.qq.com' \
-H 'sec-ch-ua: "\\Not\"A;Brand";v="99", "Chromium";v="84", "Microsoft Edge";v="84"' \
-H 'accept: application/json, text/javascript, */*; q=0.01' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.30 Safari/537.36 Edg/84.0.522.11' \
-H 'origin: https://y.qq.com' \
-H 'sec-fetch-site: same-site' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-dest: empty' \
-H 'referer: https://y.qq.com/n/yqq/song/002hXDfk0LX9KO.html' \
-H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6' \
-H 'cookie: pgv_pvid=2783427480; ts_uid=1847083617; pgv_pvi=8854632448; userAction=1; yqq_stat=0; pgv_info=ssid=s2338435723; pgv_si=s7558193152; ts_last=y.qq.com/n/yqq/song/002hXDfk0LX9KO.html' \
--compressed
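The request code turns the `-H` lines of this command into a `headers` dict. Below is a rough sketch of that mapping using the standard library's `shlex`. It only reads `-H`/`--header` options and ignores everything else (the URL, `--compressed`), so it is not a full cURL parser.

```python
import shlex


def curl_headers(curl_command):
    """Collect the -H 'name: value' options of a cURL command into a dict."""
    # Remove the backslash-newline line continuations before tokenizing
    tokens = shlex.split(curl_command.replace("\\\n", " "))
    headers = {}
    # Walk over (option, next token) pairs
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-H", "--header"):
            # Split at the first colon only, so URLs in values stay intact
            name, _, val = value.partition(":")
            headers[name.strip()] = val.strip()
    return headers


sample_curl = """curl 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg?format=json' \\
  -H 'authority: c.y.qq.com' \\
  -H 'referer: https://y.qq.com/n/yqq/song/002hXDfk0LX9KO.html' \\
  --compressed"""
headers = curl_headers(sample_curl)
print(headers)
```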
Convert it into Python request code:
import requests
url = 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg'
querystring = {
"g_tk_new_20200303": "5381", "g_tk": "5381", "loginUin": "0", "hostUin": "0", "format": "json",
"inCharset": "utf8", "outCharset": "GB2312", "notice": "0", "platform": "yqq.json", "needNewCode": "0",
"cid": "205360772", "reqtype": "2", "biztype": "1", "topid": "102066257", "cmd": "8", "needmusiccrit": "0",
"pagenum": "1", "pagesize": "25", "lasthotcommentid": "song_10206