Analyzing Zhihu Comments

Art_Int

Category: Crawlers; Tags: python

Article link: https://blog.csdn.net/qq_45684803/article/details/108810406

Scraping and analyzing Zhihu comments with Python

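After reading through source code shared by many other users, I worked out a fairly quick way to scrape comments.
Open a topic with plenty of comments, for example: 超短的笑抽的笑话 (very short, laugh-out-loud jokes).
On the page, press F12, open the XHR filter in the Network panel, select the request root_comments?order=normal&limit=20&offset=0&status=open, and look at its Response tab. It holds the dynamically loaded JSON data, which is the witty comments left by users.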
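The query string of that request already shows how paging works: limit is the page size and offset is the index of the first comment on the page. As a minimal sketch, the page URLs can be built like this; the helper name page_url is hypothetical and not part of the original script.

```python
from urllib.parse import urlencode

# Endpoint copied from the request shown in the Network panel.
BASE = "https://www.zhihu.com/api/v4/answers/675151965/root_comments"


def page_url(offset, limit=20):
    """Build the comment URL for one page (hypothetical helper)."""
    query = urlencode({
        "order": "normal",
        "limit": limit,
        "offset": offset,
        "status": "open",
    })
    return f"{BASE}?{query}"


# Offsets advance by the page size: 0, 20, 40, ...
first_three = [page_url(offset) for offset in (0, 20, 40)]
for url in first_three:
    print(url)
```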
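Once the URL for the key data has been found, most of the problem is solved: JSON can be parsed directly with the json library, which is quick and convenient. Zhihu also did not appear to block these requests with any anti-scraping mechanism, which makes things especially easy for some of us, so let's pull the data directly with Python.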
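Before sending any request, the parsing step can be tried offline. The sketch below runs json.loads on a hand-written sample; the field names (paging, is_end, data, author, member, name) are the ones the script in this post reads, while the values are made up for illustration.

```python
import json

# Hand-written sample imitating one page of the response; values are invented.
sample = '''
{
  "paging": {"is_end": true},
  "data": [
    {"id": 1, "content": "first comment",
     "author": {"member": {"name": "user_a"}}},
    {"id": 2, "content": "second comment",
     "author": {"member": {"name": "user_b"}}}
  ]
}
'''

page = json.loads(sample)

# Each comment becomes an (id, author name, content) tuple.
rows = [(c["id"], c["author"]["member"]["name"], c["content"])
        for c in page["data"]]
is_end = page["paging"]["is_end"]  # JSON true becomes Python True

print(rows)
print(is_end)
```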

import json

import requests

# The user-agent value is left blank in the original; fill in your browser's string.
HEADERS = {'user-agent': '   '}
URL = ("https://www.zhihu.com/api/v4/answers/675151965/root_comments"
       "?order=normal&limit=20&offset={}&status=open")

offset = 0
while True:
    # Pages hold 20 comments each, so the page number follows from the offset.
    print(f"============ Printing page {offset // 20 + 1} =============")
    res = requests.get(URL.format(offset), headers=HEADERS).content.decode('utf-8')
    jsonfile = json.loads(res)
    is_end = jsonfile['paging']['is_end']  # True once the last page is reached
    print(is_end)
    for data in jsonfile['data']:
        comment_id = data['id']  # renamed so it no longer shadows the built-in id()
        content = data['content']
        author = data['author']['member']['name']
        print(comment_id, author, ':', content)
    if is_end:
        break
    offset += 20

This gives us the comments we want.
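We will notice, of course, that the replies under some comments are not shown, so we go one level deeper and also iterate over the child_comments list inside each entry of data.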
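The nested traversal can be sketched on its own, without any network access. The generator name flatten and the sample values are hypothetical; only the child_comments key and the author/member/name path come from the script in this post.

```python
def flatten(comments):
    """Yield (kind, author, content) for each comment and its inline replies."""
    for comment in comments:
        yield ("comment", comment["author"]["member"]["name"], comment["content"])
        # .get() tolerates entries that carry no child_comments key.
        for reply in comment.get("child_comments", []):
            yield ("reply", reply["author"]["member"]["name"], reply["content"])


# Invented sample in the same shape as jsonfile['data'].
sample_data = [
    {
        "content": "top-level joke",
        "author": {"member": {"name": "user_a"}},
        "child_comments": [
            {"content": "good one", "author": {"member": {"name": "user_b"}}},
        ],
    },
    {
        "content": "another joke",
        "author": {"member": {"name": "user_c"}},
        "child_comments": [],
    },
]

flat = list(flatten(sample_data))
for kind, author, content in flat:
    print(kind, author, ":", content)
```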

import json

import requests

# The user-agent value is left blank in the original; fill in your browser's string.
HEADERS = {'user-agent': '   '}
URL = ("https://www.zhihu.com/api/v4/answers/675151965/root_comments"
       "?order=normal&limit=20&offset={}&status=open")
BAR = '=' * 69

offset = 0
while True:
    # Pages hold 20 comments each, so the page number follows from the offset.
    print(f"{BAR} Printing page {offset // 20 + 1} {BAR}")
    res = requests.get(URL.format(offset), headers=HEADERS).content.decode('utf-8')
    jsonfile = json.loads(res)
    is_end = jsonfile['paging']['is_end']  # True once the last page is reached
    for data in jsonfile['data']:
        content = data['content']
        author = data['author']['member']['name']
        print(author, ':', content)
        # Replies that the API returns inline under this comment.
        for data_1 in data['child_comments']:
            content = data_1['content']
            author = data_1['author']['member']['name']
            print('reply:', author, ':', content)
        print('-' * 150)
    if is_end:
        break
    offset += 20

This way, we can retrieve all of the messages in the comment section, replies included.
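With the comments collected, a first pass at analysis could be simple counting. The sketch below is only one possible starting point, not the author's method: it assumes the comments were gathered as (author, content) pairs, that content may contain HTML tags, and it uses invented sample data.

```python
import re
from collections import Counter

# Invented (author, content) pairs standing in for scraped comments.
comments = [
    ("user_a", "<p>so funny</p>"),
    ("user_b", "<p>funny indeed</p>"),
    ("user_a", "ha ha"),
]


def strip_tags(html):
    """Remove anything that looks like an HTML tag (a rough cleanup)."""
    return re.sub(r"<[^>]+>", "", html)


# How many comments each author wrote.
author_counts = Counter(author for author, _ in comments)

# How often each whitespace-separated word appears across all comments.
word_counts = Counter(
    word
    for _, content in comments
    for word in strip_tags(content).split()
)

print(author_counts.most_common())
print(word_counts.most_common())
```

Whitespace splitting does not segment Chinese text, so real Zhihu comments would need a word segmenter before counting.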
