Scraping comments from a certain Q Music site with Python

戏不能停啊

Article link: https://blog.csdn.net/weixin_45578934/article/details/107591328

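As the title says, this post uses Python to scrape comments from Q Music. (This is my first post as a newcomer; please point out any mistakes. Thanks.)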

The code is not difficult. I am posting it so we can learn from each other; try it out if you need it.

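The complete code is below. It uses only three libraries: requests, re, and time. Each part of the code has a brief explanatory comment. At the moment it extracts only the commenter's nickname, the comment text, and the comment time; you can try extracting other fields yourself.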

The URLs requested in the code below can all be scraped.

# This code is for learning and discussion only; do not use it illegally
import requests
import re
import time


def get_comment(url):
    '''Fetch the list of comments for the song page at url.'''
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    }
    # The song page contains the topid, e.g. data-id="273111528"
    res = requests.get(url, headers=headers)
    other = res.text
    # Either re.search or a compiled pattern works here
    what = re.search('data-id="(.*?)"', other)
    # what = re.compile('data-id="(.*?)"').search(other)
    if what is None:
        # Without a topid the comment API cannot be queried
        raise ValueError('data-id not found in page: ' + url)
    topid = what.group(1)
    print(topid)
    # Substitute the topid into the comment API URL below.
    # pagenum is the page index; parameters that are not required have been removed.
    # b_list collects every comment and n is the current page number
    b_list = []
    n = 0
    # Loop forever and break once a page returns no comments
    while True:
        res = requests.get(
            'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg?format=json&platform=yqq.json&reqtype=2&biztype=1&topid=' + topid + '&cmd=8&pagenum='+str(n)+'&pagesize=25',
            headers=headers)
        print(res.text)
        pattern = re.compile(r'"nick" : "(.*?)"[\S\s]*?"rootcommentcontent" : "(.*?)"[\S\s]*?"time" : (.*?),')
        com_list = re.findall(pattern, res.text)
        print(com_list)
        if com_list == []:
            break
        for i in com_list:
            b_list.append(i)
        n += 1
    print('Done')
    print(b_list)
    return b_list


def time_change(timeStamp):
    '''Convert a Unix timestamp to a formatted local time string.'''
    timeArray = time.localtime(int(timeStamp))
    otherStyleTime = time.strftime("%Y--%m--%d %H:%M:%S", timeArray)
    #print(otherStyleTime)
    return otherStyleTime


if __name__ == '__main__':
    # Find the URL of the song's page and pass it in, for example:
    # https://y.qq.com/n/yqq/song/0046jSRG3i7FBr.html
    test_list = get_comment('https://y.qq.com/n/yqq/song/0046jSRG3i7FBr.html')
    for coms in test_list:
        b_time = time_change(coms[2])
        print('Commenter: {}\nComment: {}\nTime: {}'.format(coms[0], coms[1], b_time))
        # Delay between printed comments; adjust the time as you like
        time.sleep(3)
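
To see what the regular expression in get_comment actually captures without sending any requests, the pattern can be run against a small piece of text. The following is only a sketch: the SAMPLE text and its "placeholder" field are invented for illustration and are not a real API response, and extract_comments is a hypothetical helper name. It assumes the response really does use the `"key" : value` spacing that the pattern expects.

```python
import re

# Same pattern as in get_comment: nickname, comment text, timestamp.
PATTERN = re.compile(
    r'"nick" : "(.*?)"[\S\s]*?"rootcommentcontent" : "(.*?)"[\S\s]*?"time" : (.*?),'
)

# Hypothetical fragment, made up for this example (not real data).
SAMPLE = '''
{
   "nick" : "alice",
   "rootcommentcontent" : "great song",
   "time" : 1595000000,
   "placeholder" : ""
},
{
   "nick" : "bob",
   "rootcommentcontent" : "on repeat",
   "time" : 1595000600,
   "placeholder" : ""
}
'''


def extract_comments(text):
    """Return a list of (nick, content, timestamp) tuples of strings."""
    return PATTERN.findall(text)


comments = extract_comments(SAMPLE)
print(comments)

# Text with no matching fields gives an empty list, which is the
# condition get_comment uses to stop paging.
print(extract_comments('{"comment": {}}'))
```

Because the pattern contains three groups, findall returns one three-element tuple per comment, and every element is a string, including the timestamp. That is why time_change calls int() before converting it.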

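Here is a different way to handle the list: convert each extracted item into a dictionary, then append the dictionary to the list. This displays the content very clearly, but it is somewhat more cumbersome than the first, plain-list approach. The code: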

        for i in com_list:
            # Second approach: create a local dict for each comment, fill in its keys and values, then append it to the list
            c_dict = {}
            c_dict['commenter'] = i[0]
            c_dict['content'] = i[1]
            c_dict['time'] = time_change(i[2])
            b_list.append(c_dict)


    # Reading the results back is much the same as with the plain list
    for coms in test_list:
        print('Commenter: {}\nComment: {}\nTime: {}'.format(coms['commenter'], coms['content'], coms['time']))
        time.sleep(3)
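
The dictionary approach can also be tried on its own, separately from the scraping loop. The following is a minimal sketch under a few assumptions: to_records is a hypothetical helper, the key names and the sample tuple are made up for illustration, and the date format uses single dashes, which differs from the format string in the original time_change.

```python
import time


def time_change(timestamp):
    """Convert a Unix timestamp (str or int) to a local-time string."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(int(timestamp)))


def to_records(com_list):
    """Turn (nick, content, timestamp) tuples into dictionaries."""
    return [
        {'commenter': nick, 'content': content, 'time': time_change(ts)}
        for nick, content, ts in com_list
    ]


# Made-up sample in the same shape that re.findall returns.
sample = [('alice', 'great song', '1595000000')]
records = to_records(sample)
print(records[0]['commenter'], records[0]['content'])
print(records[0]['time'])  # exact value depends on the local time zone
```

Unpacking each tuple into named variables in the comprehension avoids the index lookups i[0], i[1], i[2] and keeps the field order in one place.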

A variation on the second approach is to loop over each dictionary's keys and values with for. See the code:

    for coms in test_list:
        # Iterate over the dict, pulling out each key and value and printing them one at a time
        for key, value in coms.items():
            print(key+':'+value)
            time.sleep(3)
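
One thing to watch with key+':'+value is that string concatenation only works while every value is a string. If a field is stored as a number (for example a raw timestamp kept as an int), it raises TypeError. The sketch below shows one possible alternative using f-strings; format_record is a hypothetical helper, and the 'likes' field is invented purely to show a non-string value.

```python
def format_record(record):
    """Return one 'key: value' line per field, whatever the value type."""
    return [f'{key}: {value}' for key, value in record.items()]


# Made-up record; 'likes' is an int to show that formatting still works.
record = {'commenter': 'alice', 'content': 'great song', 'likes': 3}
for line in format_record(record):
    print(line)
```

Dictionaries keep insertion order in Python 3.7 and later, so the lines come out in the order the keys were added.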

My skills are limited, so if you spot mistakes or know a better way to extract the data, please share and correct me. Thanks.
