Processing JSON Data with Python: Scraping NetEase Cloud Music Comments


Dreamboat0707


Tags: python, json

Article link: https://blog.csdn.net/qq_40659901/article/details/104220679


- Preparation
- Code

Preparation

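1. Python 3.7

2. Libraries to install (the code also imports time, but that module ships with the Python standard library and needs no installation):
requests
jsonpath
pandas
fake_useragent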

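3. The URL that serves the JSON data

4. The raw JSON returned by that URL arrives as one unformatted string, so its key-value pairs are hard to read. A JSON formatting tool makes the structure visible:
JSON formatting tool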

Before formatting (screenshot: QQ截图20200208113203.png)
After formatting (screenshot: QQ截图20200208113240.png)
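If an online formatter is not at hand, the same before/after view can be produced locally. The following is a minimal sketch using only the standard-library json module; the `raw` string is a made-up sample shaped like the data this article works with, not real output from the API.

```python
import json

# Hypothetical one-line JSON string, similar in shape to what the comments URL returns.
raw = ('{"hotComments":[{"time":1497320494166,"content":"example",'
       '"user":{"userId":1,"nickname":"example_user"}}],"code":200}')

# Parse the text into Python objects (dicts and lists).
doc = json.loads(raw)

# Re-serialize with indentation; ensure_ascii=False keeps non-ASCII text readable.
pretty = json.dumps(doc, indent=2, ensure_ascii=False)
print(pretty)
```

With a live response, `response.json()` from requests would take the place of `json.loads(raw)`.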

Code

import requests
import jsonpath
import pandas as pd
import time
from fake_useragent import UserAgent

# Pick a random User-Agent string once and reuse it for every request
ua = UserAgent()
headers = {'User-Agent': ua.random}
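# Time conversion: the raw value is a millisecond timestamp, e.g. "time": 1497320494166,
# which becomes a local-time string, e.g. '2017-06-13 10:21:34.166' on a UTC+8 machine
def stampToTime(stamp):
    seconds, millis = divmod(int(stamp), 1000)  # split into whole seconds and milliseconds
    datatime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(seconds))
    return '{}.{:03d}'.format(datatime, millis)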
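# Fetch the JSON data; returns None if the request fails or the status is not 200
def get_json(url):
    try:
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.json()
        print('Unexpected status code:', response.status_code)
    except Exception:
        print('There was a problem with this page!')
    return None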
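# Extract the comment time, user ID, nickname, and comment text
def get_comments(url):
    data = []
    doc = get_json(url)  # call get_json() to fetch the JSON data
    if not doc:  # the request failed, so there is nothing to parse
        return data
    jobs = doc.get('hotComments', [])
    for job in jobs:
        dic = {}
        # Use jsonpath to pull out the key-value pairs we need.
        # JSONPath reference: https://pypi.org/project/jsonpath-ng/
        # (that page documents jsonpath-ng, a different package from the
        # `jsonpath` module imported here, so its Python API does not apply)
        # stampToTime() converts the timestamp
        dic['time'] = stampToTime(jsonpath.jsonpath(job, '$..time')[0])  # time
        dic['userId'] = jsonpath.jsonpath(job['user'], '$..userId')[0]  # user ID
        dic['nickname'] = jsonpath.jsonpath(job['user'], '$..nickname')[0]  # nickname
        dic['content'] = jsonpath.jsonpath(job, '$..content')[0].replace('\r', '')  # comment text
        data.append(dic)
    return data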

def main():
    # The comments are served as JSON; url is the address of that JSON data
    url="http://music.163.com/api/v1/resource/comments/R_SO_4_483671599?limit=10&offset=0"
    comments=get_comments(url)
    # comments is a list of dicts; turn each dict into a list so it is easy to write to CSV
    # print(type(comments))
    commentslist=[]
    for single_comment in comments:
        singleuser = []
        singleuser.append(single_comment['time'])
        singleuser.append(single_comment['userId'])
        singleuser.append(single_comment['nickname'])
        singleuser.append(single_comment['content'])
        commentslist.append(singleuser)
    # print(commentslist)
    columnsName=['Time','User ID','Nickname','Comment']
    # A plain list has no to_csv method, so it cannot be saved as CSV directly.
    # To get around this, use the pandas module and wrap the data in a DataFrame.
    testdata=pd.DataFrame(columns=columnsName,data=commentslist)
    # Save the DataFrame testdata as a CSV file. utf-8-sig replaces the original gbk,
    # which raises UnicodeEncodeError on characters it cannot represent (e.g. emoji).
    testdata.to_csv('comments.csv',encoding='utf-8-sig')

if __name__ == "__main__":
    main()
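
The extraction and time-conversion steps can also be tried without touching the network. The sketch below is an illustration, not the script above: it uses only the standard library, reads the fields with plain dictionary access instead of jsonpath, and fixes the time zone to UTC+8 so the result does not depend on the machine. The `sample` payload, `stamp_to_time`, and `extract_rows` are hypothetical names and data introduced for this example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical payload shaped like the fields the script reads; values are made up.
sample = {
    "hotComments": [
        {
            "time": 1497320494166,
            "content": "Great song\r",
            "user": {"userId": 1, "nickname": "example_user"},
        }
    ]
}

# Assumption: a fixed UTC+8 offset, so the output is the same on every machine.
CST = timezone(timedelta(hours=8))


def stamp_to_time(stamp_ms, tz=CST):
    """Convert a millisecond Unix timestamp to 'YYYY-MM-DD HH:MM:SS.mmm'."""
    seconds, millis = divmod(int(stamp_ms), 1000)
    base = datetime.fromtimestamp(seconds, tz).strftime("%Y-%m-%d %H:%M:%S")
    return f"{base}.{millis:03d}"


def extract_rows(doc):
    """Return one dict per hot comment; an empty list if doc is None or has none."""
    rows = []
    for item in (doc or {}).get("hotComments", []):
        user = item.get("user", {})
        rows.append({
            "time": stamp_to_time(item["time"]),
            "userId": user.get("userId"),
            "nickname": user.get("nickname"),
            "content": item.get("content", "").replace("\r", ""),
        })
    return rows


rows = extract_rows(sample)
print(rows)
```

Because each row is already a dict with the same keys, a list like `rows` could be passed straight to `pd.DataFrame(rows)` without first converting every entry to a list.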
