Python爬虫js分析

Code糕手

已于 2024-04-23 16:36:32 修改

阅读量1.5k

点赞数 5

分类专栏： Python 爬虫文章标签： python 爬虫 javascript

于 2021-11-29 23:11:56 首次发布

本文链接：https://blog.csdn.net/qq_36684855/article/details/121616330

版权

Python 爬虫专栏收录该内容

3 篇文章 4 订阅

订阅专栏

文章目录

前言
一、思路分析
- 1.watchPoint分析
- 2.ev分析
- 3.learningTokenId分析
- 4.courseId分析
- 5.uuid分析
二、代码实现
总结

前言

最近让去知到看网课，我寻思这不是折磨兄弟吗？当即我直接破防，然后决定搞一手，从问题根源来解决问题。

提示：以下是本篇文章正文内容，下面案例可供参考

一、思路分析

我们先打开F12开发者模式，播放个视频抓下接口看看嗷。
找到了很多接口，但是最有用的应该还是这个
在这里插入图片描述
为什么我要说它最有用了？
因为看接口名称我们就应该知道，这个接口和视频时长是相关的，做个测试大家就明白了。
我们打开一个视频，反复点播放和暂停就能看出来。

好了，现在我们找到了接口，下一步我们就来看一下它提交的参数有什么？
在这里插入图片描述
可以看到有6个参数，分别是：
watchPoint:
ev:
learingTokenId:
courseId:
uuid:
dateFormate:
我们现在就只知道最后一个参数是时间戳，其他的一个都不知道是啥，这咋办？
最好的方法就是直接复制一个参数名称到top文件夹里去搜索，我就用watchPoint测试。
在这里插入图片描述
我们点击去，再在js文件里面搜索watchPoint

我们直接找到了我们刚刚看的接口对应的函数，而且这里面就有着watchPoint、ev、learningTokenId、courseId四个参数了，而且还有个时间戳。
所以现在还就只有个uuid不知道是啥了。
我们先来看看这四个参数

1.watchPoint分析

我们继续在当前js文件里面搜索剩余的watchPoint，最后在一个叫learningTimeRecord的方法里面找到了watchPoint是如何生成的。

learningTimeRecord: function() {
                var t = parseInt(this.totalStudyTime / 5) + 2
                  , e = null == this.watchPointPost || "" == this.watchPointPost ? "0,1," : this.watchPointPost + ",";
                this.watchPointPost = e + t
            },

这里面出现了一个变量totalStudyTime ，我们在当前js文件里面搜索一下看看。
最后还是在totalStudyTimeFun方法中找到了这个变量

totalStudyTimeFun: function() {
                this.totalStudyTime += 5 * this.playRate,
                this.totalTimeFinish += 5 * this.playRate,
                this.playTimes += 5 * this.playRate,
                this.computeProgree()
            },

那这个变量到底是啥意思啊？其实在上面的图片中就有了，是当前学习总时间。而this.playRate又是啥了？是当前播放的倍速。我们可以直接默认为1。所以说this.totalStudyTime += 5，并且在最下面可以看到this.totalStudyTime初始是为0的。
我们现在再回到learningTimeRecord方法中
变量t 它是等于parseInt(this.totalStudyTime / 5) + 2；
变量e 它先做了一个判断当(null == this.watchPointPost || “” == this.watchPointPost)为true时，e就等于"0,1,"如果为false时 e就等于this.watchPointPost + “,”；
最后是this.watchPointPost = e + t。

那现在我们就有点迷糊了，就这啊？怎么感觉不对劲啊？我们继续在js文件里面查找，最后找到了个叫startTotalTimer的方法

startTotalTimer: function() {
                this.learningTimeRecordInterval = setInterval(this.learningTimeRecord, 1990),
                this.totalStudyTimeInterval = setInterval(this.totalStudyTimeFun, 4990),
                this.cacheInterval = setInterval(this.saveCacheIntervalTime, 18e4),
                this.databaseInterval = setInterval(this.saveDatabaseIntervalTime, 3e5)
            },

这一下就明了了嗷，这里有4个定时器，而里面有两个方法就是我们目前需要的。我们现在知道了每1990毫秒执行一次learningTimeRecord，每4990毫秒执行一次totalStudyTimeFun。我们用Python代码模拟一下看看

# 生成watchPoint
# videoSec为当前视频总长度
def generateWatchPoint(videoSec):
    # 初始化为0
    tolStudyTime = 0
    watchPointPost = ""
    # 因为要想一次性提交整个视频的时间的话
    # 那么我们用for循环来实现定时器，从0s开始，视频总长度结束
    for i in range(0, videoSec):
        # 取余来判断当前该执行哪一个
        if i % 2 == 0:
            t, e = learningTimeRecord(tolStudyTime, watchPointPost)
            watchPointPost = e + str(t)
        if i % 5 == 0:
            tolStudyTime += 5
    return watchPointPost


# 模拟learningTimeRecord
def learningTimeRecord(tolStudyTime, watchPointPost):
    t = int(tolStudyTime / 5) + 2
    if watchPointPost is None or watchPointPost == "":
        e = "0,1,"
    else:
        e = watchPointPost + ","
    return t, e

至此watchPoint参数我们算是搞定了嗷！！

2.ev分析

我们先把saveDatabaseIntervalTime整过来

saveDatabaseIntervalTime: function(t, e, n) {
                var o = this
                  , a = this.lessonId
                  , r = this.smallLessonId
                  , s = [this.recruitId, a, r, this.lastViewVideoId, this.videoDetail.chapterId, this.data.studyStatus, parseInt(this.playTimes), parseInt(this.totalStudyTime), i.i(p.g)(ablePlayerX("container").getPosition())]
                  , l = {
                    watchPoint: this.watchPointPost,
                    ev: this.D26666.Z(s),
                    learningTokenId: C.encode(this.preVideoInfo.studiedLessonDto.id),
                    courseId: this.courseId
                };
                console.log("提交进度时间:" + this.playTimes),
                console.log("观看总时间:" + this.totalStudyTime),
                m.a.saveDatabaseIntervalTime(l).then(function(i) {
                    o.playTimes = 0,
                    o.watchPointPost = "",
                    -10 == i.code ? (o.tipsDialog = !0,
                    o.tipsMsg = "同时播放多个视频，其他页面的学习进度将停止记录哦！",
                    o.tipsBtn = "我知道了") : 403 == i.code ? setTimeout(function() {
                        window.location.href = root + "/login/gologin?fromurl=" + encodeURIComponent(window.location.href)
                    }, 3e3) : 0 != i.code ? o.backDialog = !0 : o.saveDataFilish && t && (o.prelearningNote(t, e, n),
                    o.saveDataFilish = !1)
                })
            },

其实可以看到ev其实就是this.D26666.Z(s)，而s是啥了？变量s就是上面的
s = [this.recruitId, a, r, this.lastViewVideoId, this.videoDetail.chapterId, this.data.studyStatus, parseInt(this.playTimes), parseInt(this.totalStudyTime), i.i(p.g)(ablePlayerX(“container”).getPosition())]
分析一下数据：
开头五个参数都在/learning/videolist接口返回的json数据中
this.recruitId: 对应recruitId
a(this.lessonId): 对应lessonId
r(this.smallLessonId): 对应id（这个值有些课程没有所以需要自行适配）
this.lastViewVideoId: 对应videoId
this.videoDetail.chapterId: 对应chapterId
this.data.studyStatus: 固定值 “0”
parseInt(this.playTimes): 当前播放时长（上一次暂停到这一次暂停的时长）
parseInt(this.totalStudyTime): 该视频播放的总时长
i.i(p.g)(ablePlayerX(“container”).getPosition()): 当前视频总时间需要转换成hh:mm:ss形式
该视频总时长也在返回的json数据中对应的数据是videoSec

我们s搞清楚了，再来看看this.D26666.Z这个方法。这个方法直接搜索是找不到的，所以我建议直接打断点，然后直接定位过去

, , function(t, e, i) {
    "use strict";
    var n = {
        _a: "AgrcepndtslzyohCia0uS@",
        _b: "A0ilndhga@usreztoSCpyc",
        _c: "d0@yorAtlhzSCeunpcagis",
        _d: "zzpttjd",
        X: function(t) {
            for (var e = "", i = 0; i < t[this._c[8] + this._a[4] + this._c[15] + this._a[1] + this._a[8] + this._b[6]]; i++) {
                var n = t[this._a[3] + this._a[14] + this._c[18] + this._a[2] + this._b[18] + this._b[16] + this._c[0] + this._a[4] + this._b[0] + this._b[15]](i) ^ this._d[this._b[21] + this._b[6] + this._a[17] + this._c[5] + this._b[18] + this._c[4] + this._a[7] + this._a[4] + this._a[0] + this._c[7]](i % this._d[this._a[10] + this._b[13] + this._b[4] + this._a[1] + this._c[7] + this._a[14]]);
                e += this.Y(n)
            }
            return e
        },
        Y: function(t) {
            var e = t[this._c[7] + this._a[13] + this._a[20] + this._b[15] + this._a[2] + this._b[2] + this._c[15] + this._c[19]](16);
            return e = e[this._b[3] + this._a[4] + this._b[4] + this._a[1] + this._c[7] + this._c[9]] < 2 ? this._b[1] + e : e,
            e[this._a[9] + this._b[3] + this._c[20] + this._c[17] + this._c[13]](-4)
        },
        Z: function(t) {
            for (var e = "", i = 0; i < t.length; i++)
                e += t[i] + ";";
            return e = e.substring(0, e.length - 1),
            this.X(e)
        }
    };
    e.a = n
}

这不就找到了吗？我们这个就不模拟了，直接用Python的execjs去执行js代码就行。

3.learningTokenId分析

learningTokenId 是在/learning/prelearningNote接口中返回的id，且用Base64编码。

4.courseId分析

这个就很简单了就是上面watchPoint里面的courseId

5.uuid分析

uuid就在/login/getLoginUserInfo接口中，我们直接取出来就行了

二、代码实现

import execjs
import requests
import time
import base64

# recruitId
recruitId = ""
# courseId
courseId = ""
# 存储当前课程所有视频信息
videoInformationList = []
# 时间戳 毫秒级
t = time.time()
dateFormate = int(round(t * 1000) / 1000)
# 课程secret码
recruitAndCourseId = ""
# session
SESSION = input("请输入session登录：")
# 请求头
headers = {
    "Cookie": "SESSION=" + SESSION + ";"
}
# 用户id
uuid = ""
# 判断类型
typeNumbers = 0


ctx = execjs.compile("""
var _a = "AgrcepndtslzyohCia0uS@",
_b = "A0ilndhga@usreztoSCpyc",
_c = "d0@yorAtlhzSCeunpcagis",
_d = "zzpttjd";

function X(t) {
    for (var e = "", i = 0; i < t[_c[8] + _a[4] + _c[15] + _a[1] + _a[8] + _b[6]]; i++) {
        var n = t[_a[3] + _a[14] + _c[18] + _a[2] + _b[18] + _b[16] + _c[0] + _a[4] + _b[0] + _b[15]](i) ^ 
        _d[_b[21] + _b[6] + _a[17] + _c[5] + _b[18] + _c[4] + _a[7] + _a[4] + _a[0] + _c[7]]
        (i % _d[_a[10] + _b[13] + _b[4] + _a[1] + _c[7] + _a[14]]);
        e += Y(n)
    }
    return e
}

function Y(t) {
    var e = t[_c[7] + _a[13] + _a[20] + _b[15] + _a[2] + _b[2] + _c[15] + _c[19]](16);
    return e = e[_b[3] + _a[4] + _b[4] + _a[1] + _c[7] + _c[9]] < 2 ? _b[1] + e : e,
    e[_a[9] + _b[3] + _c[20] + _c[17] + _c[13]](-4)
}

function Z(t) {
    for (var e = "", i = 0; i < t.length; i++) {
        e += t[i] + ";";
    }
    return e = e.substring(0, e.length - 1), X(e);
}
""")


def getTolTime(tolTime):
    m, s = divmod(tolTime, 60)
    h, m = divmod(m, 60)
    tolTime = "%02d:%02d:%02d" % (h, m, s)
    return tolTime


def learningTimeRecord(tolStudyTime, watchPointPost):
    t = int(tolStudyTime / 5) + 2
    if watchPointPost is None or watchPointPost == "":
        e = "0,1,"
    else:
        e = watchPointPost + ","
    return t, e


def generateWatchPoint(videoSec):
    tolStudyTime = 0
    watchPointPost = ""
    for i in range(0, videoSec):
        if i % 2 == 0:
            t, e = learningTimeRecord(tolStudyTime, watchPointPost)
            watchPointPost = e + str(t)
        if i % 5 == 0:
            tolStudyTime += 5
    return watchPointPost


def submitData(k, studyTotalTime):
    resp = requests.post("https://studyservice.zhihuishu.com/learning/prelearningNote", {
        "ccCourseId": courseId,
        "chapterId": k["chapterId"],
        "isApply": 1,
        "lessonId": k["id"],
        "recruitId": recruitId,
        "videoId": k["videoId"],
        "uuid": uuid,
        "dateFormate": dateFormate
    }, headers=headers)
    learningTokenId = str(resp.json()["data"]["studiedLessonDto"]["id"])
    learningTokenId = base64.encodebytes(learningTokenId.encode("utf8")).decode()

    s = [recruitId, k["id"], 0, k["videoId"], k["chapterId"], "0", k["videoSec"] - studyTotalTime,
         k["videoSec"], getTolTime(k["videoSec"])]
    resp = requests.post("https://studyservice.zhihuishu.com/learning/saveDatabaseIntervalTime", {
        "watchPoint": generateWatchPoint(k["videoSec"]),
        "ev": ctx.call("Z", s),
        "learningTokenId": learningTokenId,
        "courseId": courseId,
        "uuid": uuid,
        "dateFormate": dateFormate
    }, headers=headers)
    return resp.json()


def submitData2(k, chapterId, studyTotalTime):
    resp = requests.post("https://studyservice.zhihuishu.com/learning/prelearningNote", {
        "ccCourseId": courseId,
        "chapterId": chapterId,
        "isApply": 1,
        "lessonId": k["lessonId"],
        "lessonVideoId": k["id"],
        "recruitId": recruitId,
        "videoId": k["videoId"],
        "uuid": uuid,
        "dateFormate": dateFormate
    }, headers=headers)
    learningTokenId = str(resp.json()["data"]["studiedLessonDto"]["id"])
    learningTokenId = base64.encodebytes(learningTokenId.encode("utf8")).decode()

    s = [recruitId, k["lessonId"], k["id"], k["videoId"], chapterId, "0", k["videoSec"] - studyTotalTime,
         k["videoSec"], getTolTime(k["videoSec"])]
    resp = requests.post("https://studyservice.zhihuishu.com/learning/saveDatabaseIntervalTime", {
        "watchPoint": generateWatchPoint(k["videoSec"]),
        "ev": ctx.call("Z", s),
        "learningTokenId": learningTokenId,
        "courseId": courseId,
        "uuid": uuid,
        "dateFormate": dateFormate
    }, headers=headers)
    return resp.json()


# 登录
resp = requests.get("https://studyservice.zhihuishu.com/login/getLoginUserInfo?dateFormate=" + str(dateFormate) + "000"
                    , headers=headers)
# 读取返回数据
data = resp.json()
if data["code"] == 200:
    # 赋值
    uuid = data["data"]["uuid"]
    print("             用户信息            ")
    print("realName：" + data["data"]["realName"])
    print("uuid：" + data["data"]["uuid"])
    print("username：" + data["data"]["username"])
    print("登录成功！")
    print()
else:
    print("登录失败！停止运行")
    quit()

recruitAndCourseId = input("请输入课程secret码：")
if recruitAndCourseId is None or recruitAndCourseId == "":
    print("获取失败！停止运行")
    quit()
else:
    # 获取当前课程所有视频信息
    resp = requests.post("https://studyservice.zhihuishu.com/learning/videolist", {
        "recruitAndCourseId": recruitAndCourseId,
        "uuid": uuid,
        "dateFormate": dateFormate
    }, headers=headers)
    if resp.json()["code"] == 0:
        content = resp.json()["data"]
        recruitId = content["recruitId"]
        courseId = content["courseId"]
        videoInformationList = content["videoChapterDtos"]
        print("获取课程所有视频信息成功！")
        print()
    else:
        print("获取课程所有视频信息失败！停止运行")
        quit()

print("正在查询当前视频学习情况。。。。")
data = {}
count = 0
for i in videoInformationList:
    videoLessons = i["videoLessons"]
    for j in videoLessons:
        data["lessonIds[" + str(count) + "]"] = j["id"]
        count += 1
count = 0
for i in videoInformationList:
    videoLessons = i["videoLessons"]
    for k in videoLessons:
        if len(k) == 9:
            videoSmallLessons = k["videoSmallLessons"]
            for l in videoSmallLessons:
                data["lessonVideoIds[" + str(count) + "]"] = l["id"]
                count += 1
data["recruitId"] = recruitId
data["uuid"] = uuid
data["dateFormate"] = dateFormate

resp = requests.post("https://studyservice.zhihuishu.com/learning/queryStuyInfo", data=data, headers=headers)
if resp.json()["code"] == 0:
    print("开始检测。。。。")
    print()
    queryStuyInfoData = resp.json()["data"]
    lessonList = {}
    lvList = {}
    if len(queryStuyInfoData) == 2:
        lessonList = resp.json()["data"]["lesson"]
        lvList = resp.json()["data"]["lv"]
    else:
        lessonList = resp.json()["data"]["lesson"]
    for m in videoInformationList:
        videoLessons = m["videoLessons"]
        for n in videoLessons:
            if len(n) == 10:
                lessonInformation = lessonList[str(n["id"])]
                print("视频名称：" + n["name"])
                print("视频总时长：" + getTolTime(n["videoSec"]))
                print("学习总时长：" + str(lessonInformation["studyTotalTime"]) + "s")
                if n["videoSec"] - lessonInformation["studyTotalTime"] < 50:
                    print("状态：已完成")
                    print()
                else:
                    print("状态：未完成")
                    choose = input("是否刷取该节视频？（Y/N）：")
                    if choose == "Y" or choose == "y":
                        result = submitData(n, lessonInformation["studyTotalTime"])
                        if result["code"] == 0:
                            if result["data"]["submitSuccess"]:
                                print("提交数据成功！")
                                print()
                            else:
                                print("提交数据失败！")
                        else:
                            print("请求失败！停止运行")
                    else:
                        print("刷取完成！停止运行")
                        quit()
            else:
                chapterId = n["chapterId"]
                videoSmallLessons = n["videoSmallLessons"]
                for p in videoSmallLessons:
                    lvInformation = lvList[str(p["id"])]
                    print("视频名称：" + p["name"])
                    print("视频总时长：" + getTolTime(p["videoSec"]))
                    print("学习总时长：" + str(lvInformation["studyTotalTime"]) + "s")
                    if p["videoSec"] - lvInformation["studyTotalTime"] < 50:
                        print("状态：已完成")
                        print()
                    else:
                        print("状态：未完成")
                        choose = input("是否刷取该节视频？（Y/N）：")
                        if choose == "Y" or choose == "y":
                            result = submitData2(p, chapterId, lvInformation["studyTotalTime"])
                            if result["code"] == 0:
                                if result["data"]["submitSuccess"]:
                                    print("提交数据成功！")
                                    print()
                                else:
                                    print("提交数据失败！")
                            else:
                                print("刷取完成！停止运行")
                                quit()
            time.sleep(1)
    print("全部刷取完成！感谢使用")

补充几点:
1.第一步需要先去播放界面打开开发者模式随便抓个包，拿到cookies里面SESSION
2.第二步需要复制当前课程的recruitAndCourseId，在播放界面的URL地址里面
图片放一下，好理解
在这里插入图片描述

总结

没啥总结的，就/learning/prelearningNote和/learning/videolist的参数需要你们自己去搞定，当然代码给出来了，可以自己看嗷。
别的不多说，直接跑路！
打扰了，告辞！！！
最新版本已在我GitHub更新

Code糕手

关注

5
点赞
踩
25

收藏

觉得还不错? 一键收藏
2
评论
Python爬虫js分析

文章目录前言一、思路分析1.watchPoint分析2.ev分析3.learningTokenId分析4.courseId分析5.uuid分析二、代码实现总结前言学校让去知到看网课，我寻思这不是折磨兄弟吗？当即我直接破防，然后决定搞一手，从问题根源来解决问题。提示：以下是本篇文章正文内容，下面案例可供参考一、思路分析我们先打开F12开发者模式，播放个视频抓下接口看看嗷。找到了很多接口，但是最有用的应该还是这个为什么我要说它最有用了？因为看接口名称我们就应该知道，这个接口和视频时长是相关
复制链接

扫一扫