Python爬虫js分析

文章目录

  • 前言
  • 一、思路分析
    • 1.watchPoint分析
    • 2.ev分析
    • 3.learningTokenId分析
    • 4.courseId分析
    • 5.uuid分析
  • 二、代码实现
  • 总结


前言

最近让去知到看网课,我寻思这不是折磨兄弟吗?当即我直接破防,然后决定搞一手,从问题根源来解决问题。


提示:以下是本篇文章正文内容,下面案例可供参考

一、思路分析

我们先打开F12开发者模式,播放个视频抓下接口看看嗷。
找到了很多接口,但是最有用的应该还是这个
在这里插入图片描述
为什么我要说它最有用了?
因为看接口名称我们就应该知道,这个接口和视频时长是相关的,做个测试大家就明白了。
我们打开一个视频,反复点播放和暂停就能看出来。
在这里插入图片描述
好了,现在我们找到了接口,下一步我们就来看一下它提交的参数有什么?
在这里插入图片描述
可以看到有6个参数,分别是:
watchPoint:
ev:
learingTokenId:
courseId:
uuid:
dateFormate:
我们现在就只知道最后一个参数是时间戳,其他的一个都不知道是啥,这咋办?
最好的方法就是直接复制一个参数名称到top文件夹里去搜索,我就用watchPoint测试。
在这里插入图片描述
我们点击去,再在js文件里面搜索watchPoint
在这里插入图片描述
我们直接找到了我们刚刚看的接口对应的函数,而且这里面就有着watchPoint、ev、learningTokenId、courseId四个参数了,而且还有个时间戳。
所以现在还就只有个uuid不知道是啥了。
我们先来看看这四个参数

1.watchPoint分析

我们继续在当前js文件里面搜索剩余的watchPoint,最后在一个叫learningTimeRecord的方法里面找到了watchPoint是如何生成的。

learningTimeRecord: function() {
                var t = parseInt(this.totalStudyTime / 5) + 2
                  , e = null == this.watchPointPost || "" == this.watchPointPost ? "0,1," : this.watchPointPost + ",";
                this.watchPointPost = e + t
            },

这里面出现了一个变量totalStudyTime ,我们在当前js文件里面搜索一下看看。
最后还是在totalStudyTimeFun方法中找到了这个变量

totalStudyTimeFun: function() {
                this.totalStudyTime += 5 * this.playRate,
                this.totalTimeFinish += 5 * this.playRate,
                this.playTimes += 5 * this.playRate,
                this.computeProgree()
            },

那这个变量到底是啥意思啊?其实在上面的图片中就有了,是当前学习总时间。而this.playRate又是啥了?是当前播放的倍速。我们可以直接默认为1。所以说this.totalStudyTime += 5,并且在最下面可以看到this.totalStudyTime初始是为0的。
我们现在再回到learningTimeRecord方法中
变量t 它是等于parseInt(this.totalStudyTime / 5) + 2;
变量e 它先做了一个判断当(null == this.watchPointPost || “” == this.watchPointPost)为true时,e就等于"0,1,"如果为false时 e就等于this.watchPointPost + “,”;
最后是this.watchPointPost = e + t。

那现在我们就有点迷糊了,就这啊?怎么感觉不对劲啊?我们继续在js文件里面查找,最后找到了个叫startTotalTimer的方法

startTotalTimer: function() {
                this.learningTimeRecordInterval = setInterval(this.learningTimeRecord, 1990),
                this.totalStudyTimeInterval = setInterval(this.totalStudyTimeFun, 4990),
                this.cacheInterval = setInterval(this.saveCacheIntervalTime, 18e4),
                this.databaseInterval = setInterval(this.saveDatabaseIntervalTime, 3e5)
            },

这一下就明了了嗷,这里有4个定时器,而里面有两个方法就是我们目前需要的。我们现在知道了每1990毫秒执行一次learningTimeRecord,每4990毫秒执行一次totalStudyTimeFun。我们用Python代码模拟一下看看

# 生成watchPoint
# videoSec为当前视频总长度
def generateWatchPoint(videoSec):
    # 初始化为0
    tolStudyTime = 0
    watchPointPost = ""
    # 因为要想一次性提交整个视频的时间的话
    # 那么我们用for循环来实现定时器,从0s开始,视频总长度结束
    for i in range(0, videoSec):
        # 取余来判断当前该执行哪一个
        if i % 2 == 0:
            t, e = learningTimeRecord(tolStudyTime, watchPointPost)
            watchPointPost = e + str(t)
        if i % 5 == 0:
            tolStudyTime += 5
    return watchPointPost


# 模拟learningTimeRecord
def learningTimeRecord(tolStudyTime, watchPointPost):
    t = int(tolStudyTime / 5) + 2
    if watchPointPost is None or watchPointPost == "":
        e = "0,1,"
    else:
        e = watchPointPost + ","
    return t, e

至此watchPoint参数我们算是搞定了嗷!!

2.ev分析

我们先把saveDatabaseIntervalTime整过来

saveDatabaseIntervalTime: function(t, e, n) {
                var o = this
                  , a = this.lessonId
                  , r = this.smallLessonId
                  , s = [this.recruitId, a, r, this.lastViewVideoId, this.videoDetail.chapterId, this.data.studyStatus, parseInt(this.playTimes), parseInt(this.totalStudyTime), i.i(p.g)(ablePlayerX("container").getPosition())]
                  , l = {
                    watchPoint: this.watchPointPost,
                    ev: this.D26666.Z(s),
                    learningTokenId: C.encode(this.preVideoInfo.studiedLessonDto.id),
                    courseId: this.courseId
                };
                console.log("提交进度时间:" + this.playTimes),
                console.log("观看总时间:" + this.totalStudyTime),
                m.a.saveDatabaseIntervalTime(l).then(function(i) {
                    o.playTimes = 0,
                    o.watchPointPost = "",
                    -10 == i.code ? (o.tipsDialog = !0,
                    o.tipsMsg = "同时播放多个视频,其他页面的学习进度将停止记录哦!",
                    o.tipsBtn = "我知道了") : 403 == i.code ? setTimeout(function() {
                        window.location.href = root + "/login/gologin?fromurl=" + encodeURIComponent(window.location.href)
                    }, 3e3) : 0 != i.code ? o.backDialog = !0 : o.saveDataFilish && t && (o.prelearningNote(t, e, n),
                    o.saveDataFilish = !1)
                })
            },

其实可以看到ev其实就是this.D26666.Z(s),而s是啥了?变量s就是上面的
s = [this.recruitId, a, r, this.lastViewVideoId, this.videoDetail.chapterId, this.data.studyStatus, parseInt(this.playTimes), parseInt(this.totalStudyTime), i.i(p.g)(ablePlayerX(“container”).getPosition())]
分析一下数据:
开头五个参数都在/learning/videolist接口返回的json数据中
this.recruitId: 对应recruitId
a(this.lessonId): 对应lessonId
r(this.smallLessonId): 对应id(这个值有些课程没有 所以需要自行适配)
this.lastViewVideoId: 对应videoId
this.videoDetail.chapterId: 对应chapterId
this.data.studyStatus: 固定值 “0”
parseInt(this.playTimes): 当前播放时长(上一次暂停到这一次暂停的时长)
parseInt(this.totalStudyTime): 该视频播放的总时长
i.i(p.g)(ablePlayerX(“container”).getPosition()): 当前视频总时间需要转换成hh:mm:ss形式
该视频总时长也在返回的json数据中 对应的数据是videoSec

我们s搞清楚了,再来看看this.D26666.Z这个方法。这个方法直接搜索是找不到的,所以我建议直接打断点,然后直接定位过去

, , function(t, e, i) {
    "use strict";
    var n = {
        _a: "AgrcepndtslzyohCia0uS@",
        _b: "A0ilndhga@usreztoSCpyc",
        _c: "d0@yorAtlhzSCeunpcagis",
        _d: "zzpttjd",
        X: function(t) {
            for (var e = "", i = 0; i < t[this._c[8] + this._a[4] + this._c[15] + this._a[1] + this._a[8] + this._b[6]]; i++) {
                var n = t[this._a[3] + this._a[14] + this._c[18] + this._a[2] + this._b[18] + this._b[16] + this._c[0] + this._a[4] + this._b[0] + this._b[15]](i) ^ this._d[this._b[21] + this._b[6] + this._a[17] + this._c[5] + this._b[18] + this._c[4] + this._a[7] + this._a[4] + this._a[0] + this._c[7]](i % this._d[this._a[10] + this._b[13] + this._b[4] + this._a[1] + this._c[7] + this._a[14]]);
                e += this.Y(n)
            }
            return e
        },
        Y: function(t) {
            var e = t[this._c[7] + this._a[13] + this._a[20] + this._b[15] + this._a[2] + this._b[2] + this._c[15] + this._c[19]](16);
            return e = e[this._b[3] + this._a[4] + this._b[4] + this._a[1] + this._c[7] + this._c[9]] < 2 ? this._b[1] + e : e,
            e[this._a[9] + this._b[3] + this._c[20] + this._c[17] + this._c[13]](-4)
        },
        Z: function(t) {
            for (var e = "", i = 0; i < t.length; i++)
                e += t[i] + ";";
            return e = e.substring(0, e.length - 1),
            this.X(e)
        }
    };
    e.a = n
}

这不就找到了吗?我们这个就不模拟了,直接用Python的execjs去执行js代码就行。

3.learningTokenId分析

learningTokenId 是在/learning/prelearningNote接口中返回的id,且用Base64编码。

4.courseId分析

这个就很简单了就是上面watchPoint里面的courseId

5.uuid分析

uuid就在/login/getLoginUserInfo接口中,我们直接取出来就行了

二、代码实现

import execjs
import requests
import time
import base64

# recruitId
recruitId = ""
# courseId
courseId = ""
# 存储当前课程所有视频信息
videoInformationList = []
# 时间戳 毫秒级
t = time.time()
dateFormate = int(round(t * 1000) / 1000)
# 课程secret码
recruitAndCourseId = ""
# session
SESSION = input("请输入session登录:")
# 请求头
headers = {
    "Cookie": "SESSION=" + SESSION + ";"
}
# 用户id
uuid = ""
# 判断类型
typeNumbers = 0


ctx = execjs.compile("""
var _a = "AgrcepndtslzyohCia0uS@",
_b = "A0ilndhga@usreztoSCpyc",
_c = "d0@yorAtlhzSCeunpcagis",
_d = "zzpttjd";

function X(t) {
    for (var e = "", i = 0; i < t[_c[8] + _a[4] + _c[15] + _a[1] + _a[8] + _b[6]]; i++) {
        var n = t[_a[3] + _a[14] + _c[18] + _a[2] + _b[18] + _b[16] + _c[0] + _a[4] + _b[0] + _b[15]](i) ^ 
        _d[_b[21] + _b[6] + _a[17] + _c[5] + _b[18] + _c[4] + _a[7] + _a[4] + _a[0] + _c[7]]
        (i % _d[_a[10] + _b[13] + _b[4] + _a[1] + _c[7] + _a[14]]);
        e += Y(n)
    }
    return e
}

function Y(t) {
    var e = t[_c[7] + _a[13] + _a[20] + _b[15] + _a[2] + _b[2] + _c[15] + _c[19]](16);
    return e = e[_b[3] + _a[4] + _b[4] + _a[1] + _c[7] + _c[9]] < 2 ? _b[1] + e : e,
    e[_a[9] + _b[3] + _c[20] + _c[17] + _c[13]](-4)
}

function Z(t) {
    for (var e = "", i = 0; i < t.length; i++) {
        e += t[i] + ";";
    }
    return e = e.substring(0, e.length - 1), X(e);
}
""")


def getTolTime(tolTime):
    m, s = divmod(tolTime, 60)
    h, m = divmod(m, 60)
    tolTime = "%02d:%02d:%02d" % (h, m, s)
    return tolTime


def learningTimeRecord(tolStudyTime, watchPointPost):
    t = int(tolStudyTime / 5) + 2
    if watchPointPost is None or watchPointPost == "":
        e = "0,1,"
    else:
        e = watchPointPost + ","
    return t, e


def generateWatchPoint(videoSec):
    tolStudyTime = 0
    watchPointPost = ""
    for i in range(0, videoSec):
        if i % 2 == 0:
            t, e = learningTimeRecord(tolStudyTime, watchPointPost)
            watchPointPost = e + str(t)
        if i % 5 == 0:
            tolStudyTime += 5
    return watchPointPost


def submitData(k, studyTotalTime):
    resp = requests.post("https://studyservice.zhihuishu.com/learning/prelearningNote", {
        "ccCourseId": courseId,
        "chapterId": k["chapterId"],
        "isApply": 1,
        "lessonId": k["id"],
        "recruitId": recruitId,
        "videoId": k["videoId"],
        "uuid": uuid,
        "dateFormate": dateFormate
    }, headers=headers)
    learningTokenId = str(resp.json()["data"]["studiedLessonDto"]["id"])
    learningTokenId = base64.encodebytes(learningTokenId.encode("utf8")).decode()

    s = [recruitId, k["id"], 0, k["videoId"], k["chapterId"], "0", k["videoSec"] - studyTotalTime,
         k["videoSec"], getTolTime(k["videoSec"])]
    resp = requests.post("https://studyservice.zhihuishu.com/learning/saveDatabaseIntervalTime", {
        "watchPoint": generateWatchPoint(k["videoSec"]),
        "ev": ctx.call("Z", s),
        "learningTokenId": learningTokenId,
        "courseId": courseId,
        "uuid": uuid,
        "dateFormate": dateFormate
    }, headers=headers)
    return resp.json()


def submitData2(k, chapterId, studyTotalTime):
    resp = requests.post("https://studyservice.zhihuishu.com/learning/prelearningNote", {
        "ccCourseId": courseId,
        "chapterId": chapterId,
        "isApply": 1,
        "lessonId": k["lessonId"],
        "lessonVideoId": k["id"],
        "recruitId": recruitId,
        "videoId": k["videoId"],
        "uuid": uuid,
        "dateFormate": dateFormate
    }, headers=headers)
    learningTokenId = str(resp.json()["data"]["studiedLessonDto"]["id"])
    learningTokenId = base64.encodebytes(learningTokenId.encode("utf8")).decode()

    s = [recruitId, k["lessonId"], k["id"], k["videoId"], chapterId, "0", k["videoSec"] - studyTotalTime,
         k["videoSec"], getTolTime(k["videoSec"])]
    resp = requests.post("https://studyservice.zhihuishu.com/learning/saveDatabaseIntervalTime", {
        "watchPoint": generateWatchPoint(k["videoSec"]),
        "ev": ctx.call("Z", s),
        "learningTokenId": learningTokenId,
        "courseId": courseId,
        "uuid": uuid,
        "dateFormate": dateFormate
    }, headers=headers)
    return resp.json()


# 登录
resp = requests.get("https://studyservice.zhihuishu.com/login/getLoginUserInfo?dateFormate=" + str(dateFormate) + "000"
                    , headers=headers)
# 读取返回数据
data = resp.json()
if data["code"] == 200:
    # 赋值
    uuid = data["data"]["uuid"]
    print("             用户信息            ")
    print("realName:" + data["data"]["realName"])
    print("uuid:" + data["data"]["uuid"])
    print("username:" + data["data"]["username"])
    print("登录成功!")
    print()
else:
    print("登录失败!停止运行")
    quit()

recruitAndCourseId = input("请输入课程secret码:")
if recruitAndCourseId is None or recruitAndCourseId == "":
    print("获取失败!停止运行")
    quit()
else:
    # 获取当前课程所有视频信息
    resp = requests.post("https://studyservice.zhihuishu.com/learning/videolist", {
        "recruitAndCourseId": recruitAndCourseId,
        "uuid": uuid,
        "dateFormate": dateFormate
    }, headers=headers)
    if resp.json()["code"] == 0:
        content = resp.json()["data"]
        recruitId = content["recruitId"]
        courseId = content["courseId"]
        videoInformationList = content["videoChapterDtos"]
        print("获取课程所有视频信息成功!")
        print()
    else:
        print("获取课程所有视频信息失败!停止运行")
        quit()

print("正在查询当前视频学习情况。。。。")
data = {}
count = 0
for i in videoInformationList:
    videoLessons = i["videoLessons"]
    for j in videoLessons:
        data["lessonIds[" + str(count) + "]"] = j["id"]
        count += 1
count = 0
for i in videoInformationList:
    videoLessons = i["videoLessons"]
    for k in videoLessons:
        if len(k) == 9:
            videoSmallLessons = k["videoSmallLessons"]
            for l in videoSmallLessons:
                data["lessonVideoIds[" + str(count) + "]"] = l["id"]
                count += 1
data["recruitId"] = recruitId
data["uuid"] = uuid
data["dateFormate"] = dateFormate

resp = requests.post("https://studyservice.zhihuishu.com/learning/queryStuyInfo", data=data, headers=headers)
if resp.json()["code"] == 0:
    print("开始检测。。。。")
    print()
    queryStuyInfoData = resp.json()["data"]
    lessonList = {}
    lvList = {}
    if len(queryStuyInfoData) == 2:
        lessonList = resp.json()["data"]["lesson"]
        lvList = resp.json()["data"]["lv"]
    else:
        lessonList = resp.json()["data"]["lesson"]
    for m in videoInformationList:
        videoLessons = m["videoLessons"]
        for n in videoLessons:
            if len(n) == 10:
                lessonInformation = lessonList[str(n["id"])]
                print("视频名称:" + n["name"])
                print("视频总时长:" + getTolTime(n["videoSec"]))
                print("学习总时长:" + str(lessonInformation["studyTotalTime"]) + "s")
                if n["videoSec"] - lessonInformation["studyTotalTime"] < 50:
                    print("状态:已完成")
                    print()
                else:
                    print("状态:未完成")
                    choose = input("是否刷取该节视频?(Y/N):")
                    if choose == "Y" or choose == "y":
                        result = submitData(n, lessonInformation["studyTotalTime"])
                        if result["code"] == 0:
                            if result["data"]["submitSuccess"]:
                                print("提交数据成功!")
                                print()
                            else:
                                print("提交数据失败!")
                        else:
                            print("请求失败!停止运行")
                    else:
                        print("刷取完成!停止运行")
                        quit()
            else:
                chapterId = n["chapterId"]
                videoSmallLessons = n["videoSmallLessons"]
                for p in videoSmallLessons:
                    lvInformation = lvList[str(p["id"])]
                    print("视频名称:" + p["name"])
                    print("视频总时长:" + getTolTime(p["videoSec"]))
                    print("学习总时长:" + str(lvInformation["studyTotalTime"]) + "s")
                    if p["videoSec"] - lvInformation["studyTotalTime"] < 50:
                        print("状态:已完成")
                        print()
                    else:
                        print("状态:未完成")
                        choose = input("是否刷取该节视频?(Y/N):")
                        if choose == "Y" or choose == "y":
                            result = submitData2(p, chapterId, lvInformation["studyTotalTime"])
                            if result["code"] == 0:
                                if result["data"]["submitSuccess"]:
                                    print("提交数据成功!")
                                    print()
                                else:
                                    print("提交数据失败!")
                            else:
                                print("刷取完成!停止运行")
                                quit()
            time.sleep(1)
    print("全部刷取完成!感谢使用")

补充几点:
1.第一步需要先去播放界面打开开发者模式随便抓个包,拿到cookies里面SESSION
2.第二步需要复制当前课程的recruitAndCourseId,在播放界面的URL地址里面
图片放一下,好理解
在这里插入图片描述
在这里插入图片描述


总结

没啥总结的,就/learning/prelearningNote和/learning/videolist的参数需要你们自己去搞定,当然代码给出来了,可以自己看嗷。
别的不多说,直接跑路!
打扰了,告辞!!!
最新版本已在我GitHub更新

  • 5
    点赞
  • 25
    收藏
    觉得还不错? 一键收藏
  • 2
    评论
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值