Fetching Course Reviews and Enrollment Count from China University MOOC with Python

Latest recommended article published on 2025-02-21 01:38:32

PyCrawlFlutter Lab

Category: Web Crawler Case Studies. Tags: python, crawler

Article link: https://blog.csdn.net/Uncle_wangcode/article/details/129181883

Contents

Preface
I. Requirements
II. Analysis
III. Results

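Preface

This series grows out of real-world requests.
You propose the requirement, and I build the solution.
This series is for learning and reference only.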

I. Requirements

1. The number of people enrolled in the course

2. The names of the course's learners and their reviews

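II. Analysis

First, check whether the page source already contains the data we need.

Course enrollment count

The enrollment count appears in the page source as an enrollCount field, so it can be extracted directly from the HTML.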
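
As a small illustration of pulling the count out of the page source, the sketch below applies the same pattern the script in this article uses (enrollCount : "...") to a string. The helper name and the sample fragment are made up for demonstration; the real page source may be laid out differently.

```python
import re

# Pattern taken from the article's script: the count appears as  enrollCount : "..."
ENROLL_PATTERN = re.compile(r'enrollCount : "(.*?)"')


def extract_enroll_count(html):
    """Return the enrollment count found in the page source, or None if absent.

    Hypothetical helper, not part of the original script.
    """
    match = ENROLL_PATTERN.search(html)
    return match.group(1) if match else None


# Made-up fragment standing in for the real page source (an assumption).
sample_html = 'window.courseDto = { name : "demo", enrollCount : "12345" };'
print(extract_enroll_count(sample_html))
```

Returning None instead of indexing into an empty result makes it easier to notice when the page layout changes and the pattern no longer matches.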

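Learner names and reviews

Press F12 to open the browser's developer tools and inspect the network traffic; the learner names and reviews are returned by an XHR request.

Resending that request from the browser (Replay XHR) succeeds, so the endpoint can be called again directly.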

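Analyze the request parameters

csrfKey: dynamic
courseId: the course ID, fixed
pageIndex: the page number, changes with each page
pageSize: the page size, fixed
orderBy: fixed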
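
The parameters above can be collected into a small helper, sketched here under some assumptions: the function name is hypothetical, and the default values (pageSize 20, orderBy 3) and the example courseId 268001 are simply the ones used in the script in this article. The csrfKey is not part of the form fields because the script passes it in the URL query string.

```python
def build_comment_payload(course_id, page_index, page_size=20, order_by=3):
    """Build the form fields for one page of reviews (hypothetical helper).

    Only pageIndex changes between requests; the other fields stay fixed.
    """
    return {
        "courseId": str(course_id),
        "pageIndex": page_index,
        "pageSize": str(page_size),
        "orderBy": str(order_by),
    }


# Example: form fields for the first page of the course used in this article.
payload = build_comment_payload("268001", page_index=1)
print(payload)
```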

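Once the source of csrfKey is found, we can construct the request and fetch the data from the review endpoint.

A global search across the requests made by the current page does not turn up the csrfKey.

Searching for the csrfKey value after loading the MOOC home page locates its source. (Clear the cookies and reload to capture the home page requests, then refresh the course detail page and compare the requests to pinpoint it.)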
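
The script in this article reads the csrfKey from a cookie named NTESSTUDYSI that the home page sets. As a rough offline sketch of that idea, the snippet below parses a Set-Cookie header value with the standard library. The helper name and the sample header (including its value and attributes) are invented for illustration; the real header will differ.

```python
from http.cookies import SimpleCookie


def csrf_key_from_set_cookie(header_value, cookie_name="NTESSTUDYSI"):
    """Return the value of cookie_name from a Set-Cookie header, or None.

    Hypothetical helper; the cookie name comes from the article's script.
    """
    cookie = SimpleCookie()
    cookie.load(header_value)
    morsel = cookie.get(cookie_name)
    return morsel.value if morsel is not None else None


# Made-up header value for demonstration only (an assumption).
sample_header = "NTESSTUDYSI=0123456789abcdef; Domain=.icourse163.org; Path=/"
print(csrf_key_from_set_cookie(sample_header))
```

With requests, a Session stores this cookie automatically, so in practice reading it from the response's cookie jar is usually enough.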

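Overall solution
1. Request the home page and process the value returned in Set-Cookie
2. Build the request for the review endpoint and send it to fetch the review data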

# -*- encoding:utf-8 -*-
__author__ = "Nick"
__created_date__ = "2023/02/23"


import requests
import re
import json


# Request headers
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"
}


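# Fetch the course enrollment count and reviews
def get_course_participate_comment():
    # Create a session so that cookies persist across requests
    session = requests.Session()
    # MOOC home page URL
    index_url = "https://www.icourse163.org/"
    # Request the home page to obtain the csrfKey needed later
    index_res = session.get(index_url, headers=HEADERS)
    # The csrfKey is the value of the NTESSTUDYSI cookie
    key = index_res.cookies.get("NTESSTUDYSI")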

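    # Course detail page URL
    course_url = "https://www.icourse163.org/course/XJTU-46016?from=searchPage&outVendor=zw_mooc_pcssjg_"
    course_res = session.get(url=course_url, headers=HEADERS)
    # Extract the enrollment count from the page source
    deal = re.compile(r'enrollCount : "(.*?)"')
    result = deal.findall(course_res.text)
    # Fail with a clear message if the page no longer contains the field
    if not result:
        raise ValueError("enrollCount not found in the course page source")
    participate_person = result[0]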

    # Course review endpoint URL; the csrfKey goes in the query string
    comment_url = f"https://www.icourse163.org/web/j/mocCourseV2RpcBean.getCourseEvaluatePaginationByCourseIdOrTermId.rpc?csrfKey={key}"
    # Reuse the shared headers and add the course page as the referer
    headers = {
        **HEADERS,
        "referer": course_url,
    }

    # Write the results to a text file
    with open("../慕课课程参与人数和课程评价.txt", mode="w", encoding="utf-8") as f:
        f.write(f"Enrollment count: {participate_person}\n")
        # Fetch the reviews; only the first 7 pages are retrieved here
        for i in range(1, 8):
            param = {
                "courseId": "268001",
                "pageIndex": i,
                "pageSize": "20",
                "orderBy": "3"
            }
            comment_res = session.post(url=comment_url, data=param, headers=headers)
            data = json.loads(comment_res.text)
            # Each item in the list is one review
            for item in data["result"]["list"]:
                user_name = item["userNickName"]
                content = item["content"]
                f.write(f"Learner: {user_name}, Review: {content}\n")
                print("One record written!")
            print(f"Page {i} written!")


if __name__ == '__main__':
    get_course_participate_comment()
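
The script stops after the first 7 pages. One possible variation, sketched below, keeps requesting pages until one comes back empty. This rests on an unverified assumption that the endpoint returns an empty list past the last page. The function names and the sample data are hypothetical; only the field names (result, list, userNickName, content) come from the script above. The network call is replaced by a stand-in so the sketch runs offline.

```python
def collect_reviews(fetch_page, max_pages=50):
    """Collect (nickname, content) pairs page by page until an empty page.

    fetch_page is any callable that takes a 1-based page index and returns
    the decoded JSON dict for that page. max_pages is a safety cap.
    """
    reviews = []
    for page_index in range(1, max_pages + 1):
        data = fetch_page(page_index)
        # Tolerate a missing or null "result" / "list" field
        items = (data.get("result") or {}).get("list") or []
        if not items:
            break
        for item in items:
            reviews.append((item.get("userNickName"), item.get("content")))
    return reviews


# Stand-in for the network call: two pages of made-up data, then empty pages.
fake_pages = {
    1: {"result": {"list": [{"userNickName": "alice", "content": "Great course"}]}},
    2: {"result": {"list": [{"userNickName": "bob", "content": "Very clear"}]}},
}


def fake_fetch(page_index):
    return fake_pages.get(page_index, {"result": {"list": []}})


print(collect_reviews(fake_fetch))
```

In the real script, fetch_page could wrap the session.post call and return the parsed response; the cap on pages guards against looping forever if the endpoint never returns an empty list.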