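[36 Web Scraping Examples, No. 01] [No longer works; subscribe with caution] Analyzing the _signature parameter

This article shows how to analyze and reproduce the _signature parameter used by the Toutiao PC web pages. It covers the problems encountered, how they were solved, a walkthrough of the complete sign.js file, and the setup and debugging of the Python and JavaScript environments.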

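[Friendly reminder]

[Environment] Python 3.7, Node.js 12.18.2, Windows 10 64-bit
[Reminder] This article applies only to the PC web version of Toutiao. It does not work for TikTok or Douyin! Please think carefully before subscribing.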

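[Key pitfalls]

① [Cannot read property 'sign' of undefined]: window.byted_acrawler is undefined, so sign cannot be read. The fix is described below.
② The URL passed to the _signature signing function needs the path prefix /toutiao inserted; otherwise no data comes back.
The home page, user pages, and detail pages need this prefix. Other endpoints may not (the article publishing endpoint, for example, does not).

For example, the URL of the "Recommended" channel on the home page is:
https://www.toutiao.com/api/pc/feed/?min_behot_time=0&category=__all__&utm_source=toutiao&widen=1&tadrequire=true
When generating _signature, pass sign.js the URL with the prefix inserted:
https://www.toutiao.com/toutiao/api/pc/feed/?min_behot_time=0&category=__all__&utm_source=toutiao&widen=1&tadrequire=true

③ The User-Agent in the Python file must match the User-Agent in the sign.js signing file.
[If they differ, the generated _signature will not return any data.]
[Remember to change the UA to your own, in both the JS and the Python code.]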
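
Pitfall ② can be sketched as a small helper. This is only an illustration and not code from the original post: the name `toSignUrl` is hypothetical, and the only thing it assumes is what the text states, namely that /toutiao goes in front of the path of the URL handed to sign.js.

```javascript
// Hypothetical helper: build the URL that is handed to sign.js by
// inserting the /toutiao prefix in front of the request path.
function toSignUrl(requestUrl) {
  const u = new URL(requestUrl); // WHATWG URL, a global in Node.js 10+
  u.pathname = "/toutiao" + u.pathname;
  return u.toString();
}

const feedUrl =
  "https://www.toutiao.com/api/pc/feed/?min_behot_time=0&category=__all__&utm_source=toutiao&widen=1&tadrequire=true";
console.log(toSignUrl(feedUrl));
```

Only the signing input changes; judging by the sample links in this article, the request itself still goes to the URL without the prefix.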

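URL analysis

Open any page of the web version. This example uses the "Hot" channel, https://www.toutiao.com/ch/news_hot/
Scroll down so that the page keeps loading new content.
Press F12, open the XHR tab under Network, keep scrolling the Toutiao page, and watch the request URLs.
Here are three sample URLs to analyze: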

https://www.toutiao.com/api/pc/feed/?max_behot_time=1593976908&category=__all__&utm_source=toutiao&widen=1&tadrequire=true&_signature=_02B4Z6wo00d01zpHuZwAAIBCdwlrxaqqUH86Qr0AAJGcHAhZpQ3J5FlvtL7YPc7aHHkzMj8.4OCcbDzsZLdx9nyJFsORucCKvpjaNa7XZXlWKlGeT1Axyx3wjBwVHdSG-pNe9BUjC6ZDDLQ19d

https://www.toutiao.com/api/pc/feed/?max_behot_time=1593966482&category=__all__&utm_source=toutiao&widen=1&tadrequire=true&_signature=_02B4Z6wo00d013nPwKQAAIBCNIES.61rCDd5ysQAAIF5HAhZpQ3J5FlvtL7YPc7aHHkzMj8.4OCcbDzsZLdx9nyJFsORucCKvpjaNa7XZXlWKlGeT1Axyx3wjBwVHdSG-pNe9BUjC6ZDDLQ1f0

https://www.toutiao.com/api/pc/feed/?max_behot_time=1593958007&category=__all__&utm_source=toutiao&widen=1&tadrequire=true&_signature=_02B4Z6wo00d01cR.V4gAAIBAiTGF0ZJPBfXEelMAAC4gHAhZpQ3J5FlvtL7YPc7aHHkzMj8.4OCcbDzsZLdx9nyJFsORucCKvpjaNa7XZXlWKlGeT1Axyx3wjBwVHdSG-pNe9BUjC6ZDDLQ1e2

Comparing them shows that the parameters that change between requests are max_behot_time and _signature.
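
The same comparison can be done in code rather than by eye. The sketch below is an illustration, not part of the original analysis: `varyingParams` is a hypothetical name, the max_behot_time values come from the sample links, and the signatures are shortened placeholders.

```javascript
// Return the names of query parameters whose values differ between URLs.
function varyingParams(urls) {
  const seen = new Map(); // parameter name -> Set of observed values
  for (const url of urls) {
    for (const [name, value] of new URL(url).searchParams) {
      if (!seen.has(name)) seen.set(name, new Set());
      seen.get(name).add(value);
    }
  }
  return [...seen].filter(([, values]) => values.size > 1).map(([name]) => name);
}

// max_behot_time values are taken from the sample links; the signatures
// are shortened placeholders, not real values.
const base = "https://www.toutiao.com/api/pc/feed/?";
const tail = "&category=__all__&utm_source=toutiao&widen=1&tadrequire=true";
const samples = [
  base + "max_behot_time=1593976908" + tail + "&_signature=sigA",
  base + "max_behot_time=1593966482" + tail + "&_signature=sigB",
  base + "max_behot_time=1593958007" + tail + "&_signature=sigC",
];
console.log(varyingParams(samples)); // [ 'max_behot_time', '_signature' ]
```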

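max_behot_time analysis

The value of max_behot_time looks like a timestamp, but comparison shows that it is not the actual time at which the request was made.
The first guess is that some function generates it.
Look at the JSON returned by the request: besides the news items, it contains a next field that holds a max_behot_time value.
Comparison shows that the max_behot_time in next is exactly the max_behot_time used in the URL of the next request when the page is scrolled, so it acts as a "page number". Since next can be read directly from the response, there is no need to analyze how it is generated.
[Also, the first request for the news list has no max_behot_time. It uses min_behot_time=0 instead, and a successful response returns the max_behot_time for the next page.]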
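
The paging rule above can be sketched as follows. This is a minimal illustration under stated assumptions: `nextFeedUrl` is a hypothetical name, the response shape `{ next: { max_behot_time } }` is inferred from the description above, and the returned URL still needs a _signature appended before it can be requested.

```javascript
// Build the feed URL for the next page from the previous JSON response.
// Pass null for the very first request.
function nextFeedUrl(prevResponse) {
  const params = new URLSearchParams();
  if (prevResponse && prevResponse.next) {
    // Later pages: reuse the value the server returned in "next".
    params.set("max_behot_time", String(prevResponse.next.max_behot_time));
  } else {
    // First page: there is no max_behot_time yet.
    params.set("min_behot_time", "0");
  }
  params.set("category", "__all__");
  params.set("utm_source", "toutiao");
  params.set("widen", "1");
  params.set("tadrequire", "true");
  return "https://www.toutiao.com/api/pc/feed/?" + params.toString();
}

console.log(nextFeedUrl(null));
console.log(nextFeedUrl({ next: { max_behot_time: 1593976908 } }));
```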

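_signature analysis

Press F12, open Sources, and do a global search for "_signature".
There is only one result. Nice.
Pretty-print the code and locate "_signature" (marked with a red box in the original screenshot).

Set a breakpoint on the last line of the key function and refresh the page.
Click the "step" button (marked 3 in the original screenshot) repeatedly. Toutiao calls the signing function many times, for example when fetching the city or the weather, so we need to find the call that fetches the news articles.
Keep going until the url at position 4 is the one for the news request, as in the original screenshot.
At that point _signature really is the value we want; it is assigned from the variable r.
r is produced by r = I(a, e). Hover over I(a, e) and jump to the target function. (The function name may differ.)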
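
Stepping through every call by hand can be tedious. One alternative, not used in the original post, is a conditional breakpoint in Chrome DevTools whose condition only holds for the feed request. The predicate below is a sketch: `isFeedRequest` is a hypothetical name, and which local variable holds the URL at the breakpoint depends on the minified code.

```javascript
// Hypothetical predicate: true only for the news feed request.
function isFeedRequest(url) {
  // The base argument lets relative URLs such as "/api/pc/feed/?..." parse too.
  const { pathname } = new URL(url, "https://www.toutiao.com");
  return pathname.endsWith("/api/pc/feed/");
}

console.log(isFeedRequest("https://www.toutiao.com/api/pc/feed/?min_behot_time=0"));
console.log(isFeedRequest("https://www.toutiao.com/ch/news_hot/"));
```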
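In the function we land in, set a breakpoint at the end and inspect the values.
The variable c is the value of _signature. It is produced by window.byted_acrawler.sign(i), where i is the URL to be requested.
(Note that the /toutiao prefix has been inserted into the URL.)
Hover over a.sign and click the f a() that pops up to jump to the target function.
This lands at around line 700.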

var glb;
(glb = "undefined" == typeof window ? global : window)._$jsvmprt = function(b, a, f) {
    function e() {
        if ("undefined" == typeof Reflect || !Reflect.construct)
            return !1;
        if (Reflect.construct.sham)
            return !1;
        if ("function" == typeof Proxy)
            return !0;
        try {
            return Date.prototype.toString.call(Reflect.construct(Date, [], (function() {}
            ))),
            !0
        } catch (b) {
            return !1
        }
    }
    
// ... several hundred lines omitted ...

function K(b, a, f, e, d, c, n, i) {
        var r, t;
        null == c && (c = this),
        d && !d.d && (d.d = 0,
        d.$0 = d,
        d[1] = {});
        var o = {}
          , l = o.d = d ? d.d + 1 : 0;
        for (o["$" + l] = o,
        t = 0; t < l; t++)
            o[r = "$" + t] = d[r];
        for (t = 0,
        l = o.length = e.length; t < l; t++)
            o[t] = e[t];
        return i && !B[a] && F(b, a, 2 * f),
        B[a] ? G(b, a, f, 0, o, c, null, 1)[1] : G(b, a, f, 0, o, c, null, 0)[1]
    }
}
,
(glb = "undefined" == typeof window ? global : window)._$jsvmprt(/* ... many characters omitted ... */);

Save the code above as a separate file, for example sign.js.
Add two lines at the end to test the output:

sign = window.byted_acrawler.sign({url:"https://www.toutiao.com/toutiao/api/pc/feed/?min_behot_time=0&category=__all__&utm_source=toutiao&widen=1&tadrequire=true"});
console.log(sign);

I have the Node.js plugin installed in PyCharm, so I can run the file directly there.

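The real trouble starts here

Running the file produces a series of errors, and a number of values have to be added one step at a time.
The first run of the code above fails with:
[window is not defined], so we need to stub the browser environment: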

window = global;

Run it again; the next error is:
[Cannot read property 'referrer' of undefined]: window.document is undefined, so referrer cannot be read. Stub it with the Toutiao home page:

window.document = {referrer: "https://www.toutiao.com/"};
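
Each error so far names one property that the script tried to read. A Proxy around a stub object can record such reads in advance. This is a different technique from the run-and-fix loop used in this article, shown only as a sketch: `traced` is a hypothetical name, and `cookie` is just an example of a property the stub lacks.

```javascript
// Wrap a stub object so that reads of properties it does not define are recorded.
function traced(name, target) {
  const missing = [];
  const proxy = new Proxy(target, {
    get(obj, prop, receiver) {
      // Ignore symbol keys, which the runtime itself may look up.
      if (typeof prop === "string" && !(prop in obj)) {
        missing.push(name + "." + prop);
      }
      return Reflect.get(obj, prop, receiver);
    },
  });
  return { proxy, missing };
}

const doc = traced("document", { referrer: "https://www.toutiao.com/" });
doc.proxy.referrer; // defined, not recorded
doc.proxy.cookie;   // not defined, recorded
console.log(doc.missing); // [ 'document.cookie' ]
```

A proxy only sees reads on the object it wraps. A top-level object that is missing altogether still surfaces as a TypeError, which is how the steps in this article find it.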

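Run it again; the next error is:
[Cannot read property 'sign' of undefined]: window.byted_acrawler is undefined, so sign cannot be read.

The key step

After a lot of debugging and trial and error, the fix turns out to be a small change in the last line of the window.byted_acrawler.sign code:
in the array that follows (glb = "undefined" == typeof window ? global : window)._$jsvmprt, change the third element ["undefined" != typeof exports ? exports : void 0] to [void 0].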
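
The post does not say why this change removes the 'sign' error. What can be shown is how the original expression behaves in the two environments. The sketch below evaluates it once with an `exports` binding present, as inside a Node.js CommonJS module, and once without, as in a browser page. The reading that the script then attaches its API to `exports` instead of `window` is an assumption, not something the original post confirms.

```javascript
const vm = require("vm");

// The third array element, exactly as it appears in the original call.
const expr = '"undefined" != typeof exports ? exports : void 0';

// With an `exports` binding in scope (as in a Node.js CommonJS module),
// the expression evaluates to that object.
const fakeExports = {};
const inModule = vm.runInNewContext(expr, { exports: fakeExports });

// Without an `exports` binding (as in a browser page), it evaluates to undefined.
const inBrowser = vm.runInNewContext(expr, {});

console.log(inModule === fakeExports); // true
console.log(inBrowser);                // undefined
```

Replacing the element with void 0 makes Node.js behave like the second case.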
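Run it after the change; the next error is:
[Cannot read property 'href' of undefined]: window.location is undefined, so href cannot be read. Stub it.
Where does href come from? Open the page, press F12 to open the console, type "window.location" and press Enter to see the full object.

Adding href to window.location is enough. To be on the safe side, we add the other location properties as well.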

var glb;
window = global;
window.document = {referrer: "https://www.toutiao.com/"}
window.location = {
    hash: "",
    host: "www.toutiao.com",
    hostname: "www.toutiao.com",
    href: "https://www.toutiao.com",
    origin: "https://www.toutiao.com",
    pathname: "/",
    port: "",
    protocol: "https:",
    search: "",
}

// ... the copied window.byted_acrawler.sign code goes here ...

sign = window.byted_acrawler.sign({url:"https://www.toutiao.com/toutiao/api/pc/feed/?min_behot_time=0&category=__all__&utm_source=toutiao&widen=1&tadrequire=true"});
console.log(sign);
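
The location stub can also be derived from a single URL instead of being typed by hand. This is a sketch, not the author's approach, and `locationFrom` is a hypothetical name. Note that the href it produces ends with a slash, which differs slightly from the hand-written stub above.

```javascript
// Hypothetical helper: build a location-like stub from a page URL.
function locationFrom(pageUrl) {
  const u = new URL(pageUrl);
  return {
    hash: u.hash,
    host: u.host,
    hostname: u.hostname,
    href: u.href,
    origin: u.origin,
    pathname: u.pathname,
    port: u.port,
    protocol: u.protocol,
    search: u.search,
  };
}

const loc = locationFrom("https://www.toutiao.com/");
console.log(loc);
```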

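Run it again; the next error is:
[Cannot read property 'userAgent' of undefined]: window.navigator is undefined, so userAgent cannot be read. Stub it.
Open the page, press F12 to open the console, type "window.navigator" and press Enter to see the full object.

Adding userAgent to window.navigator is enough. To be on the safe side, we add the other navigator properties as well.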

var glb;
window = global;
window.document = {referrer: "https://www.toutiao.com/"}
window.location = {
    hash: "",
    host: "www.toutiao.com",
    hostname: "www.toutiao.com",
    href: "https://www.toutiao.com",
    origin: "https://www.toutiao.com",
    pathname: "/",
    port: "",
    protocol: "https:",
    search: "",
}
window.navigator={
    appCodeName: "Mozilla",
    appName: "Netscape",
    appVersion: "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36",
    cookieEnabled: true,
    deviceMemory: 8,
    doNotTrack: null,
    hardwareConcurrency: 4,
    language: "zh-CN",
    languages: ["zh-CN", "zh"],
    maxTouchPoints: 0,
    onLine: true,
    platform: "Win32",
    product: "Gecko",
    productSub: "20030107",
    userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36",
    vendor: "Google Inc.",
    vendorSub: "",
}

// ... the copied window.byted_acrawler.sign code goes here ...

sign = window.byted_acrawler.sign({url:"https://www.toutiao.com/toutiao/api/pc/feed/?min_behot_time=0&category=__all__&utm_source=toutiao&widen=1&tadrequire=true"});
console.log(sign);
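
Because the User-Agent has to match everywhere (pitfall ③), it helps to keep it in one place and derive the dependent fields from it. A sketch, with `navigatorFields` as a hypothetical name; it relies only on the pattern visible in the stub above, where appVersion is the User-Agent without its "Mozilla/" prefix.

```javascript
// Single source for the User-Agent; replace it with your own browser's value.
const USER_AGENT =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36";

// Hypothetical helper: derive the UA-dependent navigator fields from one string.
function navigatorFields(userAgent) {
  return {
    userAgent,
    // appVersion is the User-Agent without the leading "Mozilla/".
    appVersion: userAgent.replace(/^Mozilla\//, ""),
  };
}

console.log(navigatorFields(USER_AGENT).appVersion);
```

The same string would then be sent as the User-Agent header by the Python side.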

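Run it, and this time there is a result!
However, this is only part of a _signature; it is shorter than the normal length.
It took many attempts to find out why: the real page is visited with cookies, and our simulated environment has none. So the next step is to add the cookies.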

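cookies = '…………your cookies…………'
// _f3 is called as in the original post; its definition is not shown in the code above.
for (let cookie of cookies.split(";")) {
    // Split on the first "=" only, so values that contain "=" stay intact.
    let idx = cookie.indexOf("=");
    if (idx < 0) continue;
    _f3(cookie.slice(0, idx).trim(), cookie.slice(idx + 1).trim(), 1800)
}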
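
Cookie strings copied from the browser usually have a space after each ";", and a value can itself contain "=". A parser that handles both could look like the sketch below. `parseCookies` is a hypothetical name, and the cookie names and values in the example are made up.

```javascript
// Parse a "name=value; name2=value2" cookie string into [name, value] pairs.
function parseCookies(cookieString) {
  const pairs = [];
  for (const part of cookieString.split(";")) {
    const idx = part.indexOf("=");
    if (idx < 0) continue; // skip fragments without "="
    const name = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (name) pairs.push([name, value]);
  }
  return pairs;
}

// Made-up cookie names and values, for illustration only.
console.log(parseCookies("a=1; token=abc==; flag"));
// [ [ 'a', '1' ], [ 'token', 'abc==' ] ]
```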

Run it again, and this time we get a complete _signature value.
The page can be requested normally and data comes back.
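
In the sample links earlier in this article, _signature is simply the last query parameter of the request URL, and that URL has no /toutiao prefix. Attaching a computed value can be sketched like this; `withSignature` is a hypothetical name and the signature in the example is a placeholder, not a real value.

```javascript
// Hypothetical helper: attach a computed signature to the request URL.
function withSignature(requestUrl, signature) {
  const sep = requestUrl.includes("?") ? "&" : "?";
  return requestUrl + sep + "_signature=" + encodeURIComponent(signature);
}

const requestUrl =
  "https://www.toutiao.com/api/pc/feed/?min_behot_time=0&category=__all__&utm_source=toutiao&widen=1&tadrequire=true";
console.log(withSignature(requestUrl, "SIG.placeholder-1"));
```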

The complete sign.js

var glb;
window = global;
window.document = {referrer: "https://www.toutiao.com/"}
window.location = {
    hash: "",
    host: "www.toutiao.com",
    hostname: "www.toutiao.com",
    href: "https://www.toutiao.com",
    origin: "https://www.toutiao.com",
    pathname: "/",
    port: "",
    protocol: "https:",
    search: "",
}
window.navigator={
    appCodeName: "Mozilla",
    appName: "Netscape",
    appVersion: "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    cookieEnabled: true,
    deviceMemory: 8,
    doNotTrack: null,
    hardwareConcurrency: 4,
    language: "zh-CN",
    languages: ["zh-CN", "zh"],
    maxTouchPoints: 0,
    onLine: true,
    platform: "Win32",
    product: "Gecko",
    productSub: "20030107",
    userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    vendor: "Google Inc.",
    vendorSub: "",
}

window._$jsvmprt = function(b, a, f) {
    function e() {
        if ("undefined" == typeof Reflect || !Reflect.construct)
            return !1;
        if (Reflect.construct.sham)
            return !1;
        if ("function" == typeof Proxy)
            return !0;
        try {
            return Date.prototype.toString.call(Reflect.construct(Date, [], (function() {}
            ))),
            !0
        } catch (b) {
            return !1
        }
    }
    function d(b, a, f) {
        return (d = e() ? Reflect.construct : function(b, a, f) {
            var e = [null];
            e.push.apply(e, a);
            var d = new (Function.bind.apply(b, e));
            return f && c(d, f.prototype),
            d
        }
        ).apply(null, arguments)
    }
    function c(b, a) {
        return (c = Object.setPrototypeOf || function(b, a) {
            return b.__proto__ = a,
            b
        }
        )(b, a)
    }
    function n(b) {
        return function(b) {
            if (Array.isArray(b)) {
                for (var a = 0, f = new Array(b.length); a < b.length; a++)
                    f[a] = b[a];
                return f
            }
        }(b) || function(b) {
            if (Symbol.iterator in Object(b) || "[object Arguments]" === Object.prototype.toString.call(b))
                return Array.from(b)
        }(b) || function() {
            throw new TypeError("Invalid attempt to spread non-iterable instance")
        }()
    }
    for (var i = [], r = 0, t = [], o = 0, l = function(b, a) {
        var f = b[a++]
          , e = b[a]
          , d = parseInt("" + f + e, 16);
        if (d >> 7 == 0)
            return [1, d];
        if (d >> 6 == 2) {
            var c = parseInt("" + b[++a] + b[++a], 16);
            return d &= 63,
            [2, c = (d <<= 8) + c]
        }
        if (d >> 6 == 3) {
            var n = parseInt("" + b[++a] + b[++a], 16)
              , i = parseInt("" + b[++a] + b[++a], 16);
            return d &= 63,
            [3, i = (d <<= 16) + (n <<= 8) + i]
        }
    }, u = function(b, a) {
        var f = parseInt("" + b[a] + b[a + 1], 16);
        return f = f > 127 ? -256 + f : f
    }, s = function(b, a) {
        var f = parseInt("" + b[a] + b[a + 1] + b[a + 2] + b[a + 3], 16);
        return f = f > 32767 ? -65536 + f : f
    }, p = function(b, a) {
        var f = parseInt("" + b[a] + b[a + 1] + b[a + 2] + b[a + 3] + b[a + 4] + b[a + 5] + b[a + 6] + b[a + 7], 16);
        return f = f > 2147483647 ? 0 + f : f
    }, y = function(b, a) {
        return parseInt("" + b[a] + b[a + 1], 16)
    }, v = function(b, a) {
        return parseInt("" + b[a] + b[a + 1] + b[a + 2] + b[a + 3], 16)
    }, g = g || this || window, h = Object.keys || function(b) {
        var a = {}
          , f = 0;
        for (var e in b)
            a[f++] = e;
        return a.length = f,
        a
    }
    , m = (b.length,
    0), I = "", C = m; C < m + 16; C++) {
        var q = "" + b[C++] + b[C];
        q = parseInt(q, 16),
        I += String.fromCharCode(q)
    }
    if ("HNOJ@?RC" != I)
        throw new Error("error magic number " + I);
    m += 16;
    parseInt("" + b[m] + b[m + 1], 16);
    m += 8,
    r = 0;
    for (var w = 0; w < 4; w++) {
        var S = m + 2 * w
          , R = "" + b[S++] + b[S]
          , x = parseInt(R, 16);
        r += (3 & x) << 2 * w
    }
    m += 16,
    m += 8;
    var z = parseInt("" + b[m] + b[m + 1] + b[m + 2] + b[m + 3] + b[m + 4] + b[m + 5] + b[m + 6] + b[m + 7], 16)
      , O = z
      , E = m += 8
      , j = v(b, m += z);
    j[1];
    m += 4,
    i = {
        p: [],
        q: []
    };
    for (var A = 0; A < j; A++) {
        for (var D = l(b, m), T = m += 2 * D[0], $ = i.p.length, P = 0; P < D[1]; P++) {
            var U = l(b, T);
            i.p.push(U[1]),
            T += 2 * U[0]
        }
        m = T,
        i.q.push([$, i.p.length])
    }
    var _ = {
        5: 1,
        6: 1,
        70: 1,
        22: 1,
        23: 1,
        37: 1,
        73: 1
    }
      , k = {
        72: 1
    }
      , M = {
        74: 1
    }
      , H = {
        11: 1,
        12: 1,
        24: 1,
        26: 1,
        27: 1,
        31: 1
    }
      , J = {
        10: 1
    }
      , N = {
        2: 1,
        29: 1,
        30: 1,
        20: 1
    }
      , B = []
      , W = [];
    function F(b, a, f) {
        for (var e = a; e < a + f; ) {
            var d = y(b, e);
            B[e] = d,
            e += 2;
            k[d] ? (W[e] = u(b, e),
            e += 2) : _[d] ? (W[e] = s(b, e),
            e += 4) : M[d] ? (W[e] = p(b, e),
            e += 8) : H[d] ? (W[e] = y(b, e),
            e += 2) : J[d] ? (W[e] = v(b, e),
            e += 4) : N[d] && (W[e] = v(b, e),
            e += 4)
        }
    }
    return K(b, E, O / 2, [], a, f);
    function G(b, a, f, e, c, l, m, I) {
        null == l && (l = this);
        var C, q, w, S = [], R = 0;
        m && (C = m);
        var x, z, O = a, E = O + 2 * f;
        if (!I)
            for (; O < E; ) {
                var j = parseInt("" + b[O] + b[O + 1], 16);
                O += 2;
                var A = 3 & (x = 13 * j % 241);
                if (x >>= 2,
                A > 2) {
                    A = 3 & x;
                    if (x >>= 2,
                    A > 2)
                        (A = x) < 2 ? (C = S[R--],
                        S[R] = S[R] < C) : A < 9 ? (z = y(b, O),
                        O += 2,
                        S[R] = S[R][z]) : A < 11 ? S[++R] = !0 : A < 13 ? (C = S[R--],
                        S[R] = S[R] >>> C) : A < 15 && (S[++R] = p(b, O),
                        O += 8);
                    else if (A > 1) {
                        (A = x) < 6 || (A < 8 ? C = S[R--] : A < 10 ? (C = S[R--],
                        S[R] = S[R] ^ C) : A < 12 && (z = s(b, O),
                        t[++o] = [[O + 4, z - 3], 0, 0],
                        O += 2 * z - 2))
                    } else if (A > 0) {
                        if ((A = x) > 7)
                            C = S[R--],
                            S[R] = S[R]in C;
                        else if (A > 5)
                            S[R] = ++S[R];
                        else if (A > 3)
                            z = y(b, O),
                            O += 2,
                            C = c[z],
                            S[++R] = C;
                        else if (A > 1) {
                            var D = 0
                              , T = S[R].length
                              , $ = S[R];
                            S[++R] = function() {
                                var b = D < T;
                                if (b) {
                                    var a = $[D++];
                                    S[++R] = a
                                }
                                S[++R] = b
                            }
                        }
                    } else {
                        if ((A = x) < 2) {
                            for (z = v(b, O),
                            A = "",
                            P = i.q[z][0]; P < i.q[z][1]; P++)
                                A += String.fromCharCode(r ^ i.p[P]);
                            A = +A,
                            O += 4,
                            S[++R] = A
                        } else
                            A < 4 ? (C = S[R--],
                            S[R] = S[R] - C) : A < 6 ? (C = S[R--],
                            S[R] = S[R] === C) : A < 15 && (C = S[R],
                            S[R] = S[R - 1],
                            S[R - 1] = C)
                    }
                } else if (A > 1) {
                    A = 3 & x;
                    if (x >>= 2,
                    A > 2)
                        (A = x) > 7 ? (C = S[R--],
                        S[R] = S[R] | C) : A > 5 ? (z = y(b, O),
                        O += 2,
                        S[++R] = c["$" + z]) : A > 3 && (z = s(b, O),
                        t[o][0] && !t[o][2] ? t[o][1] = [O + 4, z - 3] : t[o++] = [0, [O + 4, z - 3], 0],
                        O += 2 * z - 2);
                    else if (A > 1) {
                        if ((A = x) > 13)
                            S[++R] = !1;
                        else if (A > 6)
                            C = S[R--],
                            S[R] = S[R]instanceof C;
                        else if (A > 4)
                            C = S[R--],
                            S[R] = S[R] % C;
                        else if (A > 2)
                            if (S[R--])
                                O += 4;
                            else {
                                if ((z = s(b, O)) < 0) {
                                    I = 1,
                                    F(b, a, 2 * f),
                                    O += 2 * z - 2;
                                    break
                                }
                                O += 2 * z - 2
                            }
                        else if (A > 0) {
                            for (z = v(b, O),
                            C = "",
                            P = i.q[z][0]; P < i.q[z][1]; P++)
                                C += String.fromCharCode(r ^ i.p[P]);
                            S[++R] = C,
                            O += 4
                        }
                    } else if (A > 0) {
                        (A = x) < 1 ? S[++R] = g : A < 3 ? (C = S[R--],
                        S[R] = S[R] + C) : A < 5 ? (C = S[R--],
                        S[R] = S[R] == C) : A < 14 && (C = S[R - 1],
                        q = S[R],
                        S[++R] = C,
                        S[++R] = q)
                    } else {
                        (A = x) < 2 ? (C = S[R--],
                        S[R] = S[R] > C) : A < 9 ? (z = v(b, O),
                        O += 4,
                        q = R + 1,
                        S[R -= z - 1] = z ? S.slice(R, q) : []) : A < 11 ? (z = y(b, O),
                        O += 2,
                        C = S[R--],
                        c[z] = C) : A < 13 ? (C = S[R--],
                        S[R] = S[R] >> C) : A < 15 && (S[++R] = s(b, O),
                        O += 4)
                    }
                } else if (A > 0) {
                    A = 3 & x;
                    if (x >>= 2,
                    A > 2)
                        if ((A = x) < 1)
                            S[R] = !S[R];
                        else if (A < 3) {
                            if ((z = s(b, O)) < 0) {
                                I = 1,
                                F(b, a, 2 * f),
                                O += 2 * z - 2;
                                break
                            }
                            O += 2 * z - 2
                        } else
                            A < 5 ? (C = S[R--],
                            S[R] = S[R] / C) : A < 7 ? (C = S[R--],
                            S[R] = S[R] !== C) : A < 14 && (S[++R] = l);
                    else if (A > 1) {
                        (A = x) < 2 ? S[++R] = C : A < 4 ? (C = S[R--],
                        S[R] = S[R] <= C) : A < 11 ? (C = S[R -= 2][S[R + 1]] = S[R + 2],
                        R--) : A < 13 && (C = S[R],
                        S[++R] = C)
                    } else if (A > 0) {
                        if ((A = x) < 8)
                            q = S[R--],
                            C = delete S[R--][q];
                        else if (A < 10) {
                            for (z = v(b, O),
                            A = "",
                            P = i.q[z][0]; P < i.q[z][1]; P++)
                                A += String.fromCharCode(r ^ i.p[P]);
                            O += 4,
                            S[R] = S[R][A]
                            //console.log(S[R]);
                            // if (S[R]) {
                            //     S[R] = S[R][A]
                            // };
                            //console.log("1");
                        } else
                            A < 12 ? (C = S[R--],
                            S[R] = S[R] << C) : A < 14 && (S[++R] = u(b, O),
                            O += 2)
                    } else {
                        if ((A = x) < 5) {
                            z = s(b, O);
                            try {
                                if (t[o][2] = 1,
                                1 == (C = G(b, O + 4, z - 3, [], c, l, null, 0))[0])
                                    return C
                            } catch (m) {
                                if (t[o] && t[o][1] && 1 == (C = G(b, t[o][1][0], t[o][1][1], [], c, l, m, 0))[0])
                                    return C
                            } finally {
                                if (t[o] && t[o][0] && 1 == (C = G(b, t[o][0][0], t[o][0][1], [], c, l, null, 0))[0])
                                    return C;
                                t[o] = 0,
                                o--
                            }
                            O += 2 * z - 2
                        } else
                            A < 7 ? (z = y(b, O),
                            O += 2,
                            S[R -= z] = 0 === z ? new S[R] : d(S[R], n(S.slice(R + 1, R + z + 1)))) : A < 9 && (C = S[R--],
                            S[R] = S[R] & C)
                    }
                } else {
                    A = 3 & x;
                    if (x >>= 2,
                    A < 1) {
                        if ((A = x) < 1)
                            return [1, S[R--]];
                        if (A < 5)
                            C = S[R--],
                            S[R] = S[R] * C;
                        else if (A < 7)
                            C = S[R--],
                            S[R] = S[R] != C;
                        else if (A < 14)
                            q = S[R--],
                            w = S[R--],
                            (A = S[R--]).x === G ? A.y >= 1 ? S[++R] = K(b, A.c, A.l, q, A.z, w, null, 1) : (S[++R] = K(b, A.c, A.l, q, A.z, w, null, 0),
                            A.y++) : S[++R] = A.apply(w, q);
                        else if (A < 16) {
                            z = s(b, O),
                            (U = function a() {
                                var f = arguments;
                                return a.y > 0 ? K(b, a.c, a.l, f, a.z, this, null, 0) : (a.y++,
                                K(b, a.c, a.l, f, a.z, this, null, 0))
                            }
                            ).c = O + 4,
                            U.l = z - 2,
                            U.x = G,
                            U.y = 0,
                            U.z = c,
                            S[R] = U,
                            O += 2 * z - 2
                        }
                    } else if (A < 2) {
                        (A = x) < 4 ? (q = S[R--],
                        (A = S[R]).x === G ? A.y >= 1 ? S[R] = K(b, A.c, A.l, [q], A.z, w, null, 1) : (S[R] = K(b, A.c, A.l, [q], A.z, w, null, 0),
                        A.y++) : S[R] = A(q)) : A < 6 ? S[R -= 1] = S[R][S[R + 1]] : A < 8 ? S[R] = --S[R] : A < 10 && (C = S[R--],
                        S[R] = typeof C)
                    } else if (A < 3) {
                        if ((A = x) < 7)
                            S[R] = h(S[R]);
                        else if (A < 9) {
                            for (C = S[R--],
                            z = v(b, O),
                            A = "",
                            P = i.q[z][0]; P < i.q[z][1]; P++)
                                A += String.fromCharCode(r ^ i.p[P]);
                            O += 4,
                            S[R--][A] = C
                        } else if (A < 13)
                            throw S[R--]
                    } else {
                        (A = x) < 1 ? S[++R] = null : A < 3 ? (C = S[R--],
                        S[R] = S[R] >= C) : A < 12 && (S[++R] = void 0)
                    }
                }
            }
        if (I)
            for (; O < E; ) {
                j = B[O];
                O += 2;
                A = 3 & (x = 13 * j % 241);
                if (x >>= 2,
                A > 2) {
                    A = 3 & x;
                    if (x >>= 2,
                    A < 1) {
                        if ((A = x) > 13)
                            C = S[R],
                            S[R] = S[R - 1],
                            S[R - 1] = C;
                        else if (A > 4)
                            C = S[R--],
                            S[R] = S[R] === C;
                        else if (A > 2)
                            C = S[R--],
                            S[R] = S[R] - C;
                        else if (A > 0) {
                            for (z = W[O],
                            A = "",
                            P = i.q[z][0]; P < i.q[z][1]; P++)
                                A += String.fromCharCode(r ^ i.p[P]);
                            A = +A,
                            O += 4,
                            S[++R] = A
                        }
                    } else if (A < 2) {
                        if ((A = x) > 7)
                            C = S[R--],
                            S[R] = S[R]in C;
                        else if (A > 5)
                            S[R] = ++S[R];
                        else if (A > 3)
                            z = W[O],
                            O += 2,
                            C = c[z],
                            S[++R] = C;
                        else if (A > 1) {
                            D = 0,
                            T = S[R].length,
                            $ = S[R];
                            S[++R] = function() {
                                var b = D < T;
                                if (b) {
                                    var a = $[D++];
                                    S[++R] = a
                                }
                                S[++R] = b
                            }
                        }
                    } else if (A < 3) {
                        (A = x) > 10 ? (z = W[O],
                        t[++o] = [[O + 4, z - 3], 0, 0],
                        O += 2 * z - 2) : A > 8 ? (C = S[R--],
                        S[R] = S[R] ^ C) : A > 6 && (C = S[R--])
                    } else {
                        (A = x) < 2 ? (C = S[R--],
                        S[R] = S[R] < C) : A < 9 ? (z = W[O],
                        O += 2,
                        S[R] = S[R][z]) : A < 11 ? S[++R] = !0 : A < 13 ? (C = S[R--],
                        S[R] = S[R] >>> C) : A < 15 && (S[++R] = W[O],
                        O += 8)
                    }
                } else if (A > 1) {
                    A = 3 & x;
                    if (x >>= 2,
                    A > 2)
                        (A = x) < 5 ? (z = W[O],
                        t[o][0] && !t[o][2] ? t[o][1] = [O + 4, z - 3] : t[o++] = [0, [O + 4, z - 3], 0],
                        O += 2 * z - 2) : A < 7 ? (z = W[O],
                        O += 2,
                        S[++R] = c["$" + z]) : A < 9 && (C = S[R--],
                        S[R] = S[R] | C);
                    else if (A > 1) {
                        if ((A = x) > 13)
                            S[++R] = !1;
                        else if (A > 6)
                            C = S[R--],
                            S[R] = S[R]instanceof C;
                        else if (A > 4)
                            C = S[R--],
                            S[R] = S[R] % C;
                        else if (A > 2)
                            S[R--] ? O += 4 : O += 2 * (z = W[O]) - 2;
                        else if (A > 0) {
                            for (z = W[O],
                            C = "",
                            P = i.q[z][0]; P < i.q[z][1]; P++)
                                C += String.fromCharCode(r ^ i.p[P]);
                            S[++R] = C,
                            O += 4
                        }
                    } else if (A > 0) {
                        (A = x) < 1 ? S[++R] = g : A < 3 ? (C = S[R--],
                        S[R] = S[R] + C) : A < 5 ? (C = S[R--],
                        S[R] = S[R] == C) : A < 14 && (C = S[R - 1],
                        q = S[R],
                        S[++R] = C,
                        S[++R] = q)
                    } else {
                        (A = x) > 13 ? (S[++R] = W[O],
                        O += 4) : A > 11 ? (C = S[R--],
                        S[R] = S[R] >> C) : A > 9 ? (z = W[O],
                        O += 2,
                        C = S[R--],
                        c[z] = C) : A > 7 ? (z = W[O],
                        O += 4,
                        q = R + 1,
                        S[R -= z - 1] = z ? S.slice(R, q) : []) : A > 0 && (C = S[R--],
                        S[R] = S[R] > C)
                    }
                } else if (A > 0) {
                    A = 3 & x;
                    if (x >>= 2,
                    A < 1)
                        if ((A = x) < 5) {
                            z = W[O];
                            try {
                                if (t[o][2] = 1,
                                1 == (C = G(b, O + 4, z - 3, [], c, l, null, 0))[0])
                                    return C
                            } catch (m) {
                                if (t[o] && t[o][1] && 1 == (C = G(b, t[o][1][0], t[o][1][1], [], c, l, m, 0))[0])
                                    return C
                            } finally {
                                if (t[o] && t[o][0] && 1 == (C = G(b, t[o][0][0], t[o][0][1], [], c, l, null, 0))[0])
                                    return C;
                                t[o] = 0,
                                o--
                            }
                            O += 2 * z - 2
                        } else
                            A < 7 ? (z = W[O],
                            O += 2,
                            S[R -= z] = 0 === z ? new S[R] : d(S[R], n(S.slice(R + 1, R + z + 1)))) : A < 9 && (C = S[R--],
                            S[R] = S[R] & C);
                    else if (A < 2) {
                        if ((A = x) > 12)
                            S[++R] = W[O],
                            O += 2;
                        else if (A > 10)
                            C = S[R--],
                            S[R] = S[R] << C;
                        else if (A > 8) {
                            for (z = W[O],
                            A = "",
                            P = i.q[z][0]; P < i.q[z][1]; P++)
                                A += String.fromCharCode(r ^ i.p[P]);
                            O += 4,
                            S[R] = S[R][A]
                        } else
                            A > 6 && (q = S[R--],
                            C = delete S[R--][q])
                    } else if (A < 3) {
                        (A = x) < 2 ? S[++R] = C : A < 4 ? (C = S[R--],
                        S[R] = S[R] <= C) : A < 11 ? (C = S[R -= 2][S[R + 1]] = S[R + 2],
                        R--) : A < 13 && (C = S[R],
                        S[++R] = C)
                    } else {
                        (A = x) > 12 ? S[++R] = l : A > 5 ? (C = S[R--],
                        S[R] = S[R] !== C) : A > 3 ? (C = S[R--],
                        S[R] = S[R] / C) : A > 1 ? O += 2 * (z = W[O]) - 2 : A > -1 && (S[R] = !S[R])
                    }
                } else {
                    A = 3 & x;
                    if (x >>= 2,
                    A > 2)
                        (A = x) < 1 ? S[++R] = null : A < 3 ? (C = S[R--],
                        S[R] = S[R] >= C) : A < 12 && (S[++R] = void 0);
                    else if (A > 1) {
                        if ((A = x) < 7)
                            S[R] = h(S[R]);
                        else if (A < 9) {
                            for (C = S[R--],
                            z = W[O],
                            A = "",
                            P = i.q[z][0]; P < i.q[z][1]; P++)
                                A += String.fromCharCode(r ^ i.p[P]);
                            O += 4,
                            S[R--][A] = C
                        } else if (A < 13)
                            throw S[R--]
                    } else if (A > 0) {
                        (A = x) > 8 ? (C = S[R--],
                        S[R] = typeof C) : A > 6 ? S[R] = --S[R] : A > 4 ? S[R -= 1] = S[R][S[R + 1]] : A > 2 && (q = S[R--],
                        (A = S[R]).x === G ? A.y >= 1 ? S[R] = K(b, A.c, A.l, [q], A.z, w, null, 1) : (S[R] = K(b, A.c, A.l, [q], A.z, w, null, 0),
                        A.y++) : S[R] = A(q))
                    } else {
                        var U;
                        if ((A = x) > 14)
                            z = W[O],
                            (U = function a() {
                                var f = arguments;
                                return a.y > 0 ? K(b, a.c, a.l, f, a.z, this, null, 0) : (a.y++,
                                K(b, a.c, a.l, f, a.z, this, null, 0))
                            }
                            ).c = O + 4,
                            U.l = z - 2,
                            U.x = G,
                            U.y = 0,
                            U.z = c,
                            S[R] = U,
                            O += 2 * z - 2;
                        else if (A > 12)
                            q = S[R--],
                            w = S[R--],
                            (A = S[R--]).x === G ? A.y >= 1 ? S[++R] = K(b, A.c, A.l, q, A.z, w, null, 1) : (S[++R] = K(b, A.c, A.l, q, A.z, w, null, 0),
                            A.y++) : S[++R] = A.apply(w, q);
                        else if (A > 5)
                            C = S[R--],
                            S[R] = S[R] != C;
                        else if (A > 3)
                            C = S[R--],
                            S[R] = S[R] * C;
                        else if (A > -1)
                            return [1, S[R--]]
                    }
                }
            }
        return [0, null]
    }
    function K(b, a, f, e, d, c, n, i) {
        var r, t;
        null == c && (c = this),
        d && !d.d && (d.d = 0,
        d.$0 = d,
        d[1] = {});
        var o = {}
          , l = o.d = d ? d.d + 1 : 0;
        for (o["$" + l] = o,
        t = 0; t < l; t++)
            o[r = "$" + t] = d[r];
        for (t = 0,
        l = o.length = e.length; t < l; t++)
            o[t] = e[t];
        return i && !B[a] && F(b, a, 2 * f),
        B[a] ? G(b, a, f, 0, o, c, null, 1)[1] : G(b, a, f, 0, o, c, null, 0)[1]
    }
}
,
window._$jsvmprt("", [, , void 0, "undefined" != typeof module ? module : void 0, "undefined" != typeof define ? define : void 0, "undefined" != typeof Object ? Object : void 0, void 0, "undefined" != typeof TypeError ? TypeError : void 0, "undefined" != typeof document ? document : void 0, "undefined" != typeof InstallTrigger ? InstallTrigger : void 0, "undefined" != typeof safari ? safari : void 0, "undefined" != typeof Date ? Date : void 0, "undefined" != typeof Math ? Math : void 0, "undefined" != typeof navigator ? navigator : void 0, "undefined" != typeof location ? location : void 0, "undefined" != typeof history ? history : void 0, "undefined" != typeof Image ? Image : void 0, "undefined" != typeof console ? console : void 0, "undefined" != typeof PluginArray ? PluginArray : void 0, "undefined" != typeof indexedDB ? indexedDB : void 0, "undefined" != typeof DOMException ? DOMException : void 0, "undefined" != typeof parseInt ? parseInt : void 0, "undefined" != typeof String ? String : void 0, "undefined" != typeof Array ? Array : void 0, "undefined" != typeof Error ? Error : void 0, "undefined" != typeof JSON ? JSON : void 0, "undefined" != typeof Promise ? Promise : void 0, "undefined" != typeof WebSocket ? WebSocket : void 0, "undefined" != typeof eval ? eval : void 0, "undefined" != typeof setTimeout ? setTimeout : void 0, "undefined" != typeof encodeURIComponent ? encodeURIComponent : void 0, "undefined" != typeof encodeURI ? encodeURI : void 0, "undefined" != typeof Request ? Request : void 0, "undefined" != typeof Headers ? Headers : void 0, "undefined" != typeof decodeURIComponent ? decodeURIComponent : void 0, "undefined" != typeof RegExp ? RegExp : void 0]);

// Cookie string passed in from Python as the second command-line argument
var cookies = process.argv[3];
//cookies = 'MONITOR_WEB_ID=1dc79f28-d11f-4af3-8484-61b43adfeca3; ttwid=1%7CguGfII75IGd61Z_IfxMScfZpSWJcypmSJKoJ91YbdTc%7C1618888965%7C28fa374e120d75588fea3080529423fe90138f74ae626d88a7959062601a1521; tt_webid=6953075147956192776; csrftoken=3e66dec3e69746a815b8f1d013023f24; ttcid=c5ece7f03d4f4c25b27f8b008522569815; s_v_web_id=verify_knpgow0v_3p0v1kx1_elWP_4gdn_BpeS_2pdrAvkfSlDl; tt_scid=5PCJu8qWGB1F-wrfBlHc-4TTl0YJikHEHvSyHNCp2LAp43H2E9Jamboyq6ngo-QE18d6'
for (const cookie of cookies.split(";")) {
    const pair = cookie.trim();          // drop the space that follows each ";"
    if (!pair) continue;                 // skip empty segments, e.g. after a trailing ";"
    const idx = pair.indexOf("=");       // split on the FIRST "=" only, values may contain "="
    if (idx < 0) continue;
    _f3(pair.slice(0, idx), pair.slice(idx + 1), 1800)
}

// Write one cookie into the emulated browser storage and document.cookie
function _f3(e, t, o) {
    o && (window.sessionStorage && window.sessionStorage.setItem(e, t), window.localStorage && window.localStorage.setItem(e, t));
    var n = 31536e6; // 365 days in milliseconds
    document.cookie = e + "=; expires=Mon, 20 Sep 1970 00:00:00 UTC; path=/;",
    document.cookie = e + "=" + t + "; expires=" + new Date((new Date).getTime() + n).toUTCString() + "; path=/;"
}


// Signature for a detail page
function get_detail(page_id){
    var _signature = window.byted_acrawler.sign('',page_id)
    return _signature
}

// Signature for a list/feed URL
function get_page(url){
    var _signature = window.byted_acrawler.sign({url:url})
    return _signature
}

// URL to sign (with the /toutiao prefix), passed as the first command-line argument
var url = process.argv[2];
//url = 'https://www.toutiao.com/toutiao/api/pc/feed/?category=profile_all&utm_source=toutiao&visit_user_token=MS4wLjABAAAAiAce5qhH31TeuB3UdpFMV8u-uwy2LnoiqI10uZHqAt8&max_behot_time=1618887418542'
console.log(get_page(url))
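
The loop at the end of sign.js takes the cookie string apart one `name=value` pair at a time. If you want to check on the Python side what sign.js will receive before calling it, the same split can be sketched roughly as follows. This is only an illustration: `parse_cookie_header` is a hypothetical helper that does not exist in the original scripts, and the sample cookie values are made up.

```python
def parse_cookie_header(cookie_header):
    """Split a 'name=value; name2=value2' string into a dict.

    Hypothetical helper, not part of the original scripts.
    """
    cookies = {}
    for pair in cookie_header.split(";"):
        pair = pair.strip()              # remove the space after each ";"
        if not pair:
            continue                     # ignore empty segments
        # partition() splits on the first "=" only, so values that
        # themselves contain "=" are kept intact
        name, sep, value = pair.partition("=")
        if sep:
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    sample = "tt_webid=123; token=abc=def"   # made-up values
    print(parse_cookie_header(sample))
```

Splitting on the first `=` only is the important part: a value that itself contains `=` would otherwise be cut short.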

toutiao_redian.py

Using the "Hot" (热点) channel as an example:

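import random, subprocess, time
import requests


def get_signature(url, cookies):
    # Pass arguments as a list: no shell is involved, so quotes or & in the cookie string cannot break the command
    result = subprocess.run(['node', 'sign.js', url, cookies], capture_output=True, text=True)
    return "&_signature=" + result.stdout.strip()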


def parse(max_behot_time=0):
    headers = {
        'authority': 'www.toutiao.com',
        'sec-ch-ua': 'Google',
        'accept': 'application/json, text/plain, */*',
        'sec-ch-ua-mobile': '?0',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'cors',
        'sec-fetch-dest': 'empty',
        'referer': 'https://www.toutiao.com/',
        'accept-language': 'zh-CN,zh;q=0.9',
        'cookie': '………your cookies………',
    }

    base_url = 'https://www.toutiao.com'

    if max_behot_time == 0:
        param = '/api/pc/feed/?min_behot_time=0&category=news_hot&utm_source=toutiao&widen=1&tadrequire=true'
    else:
        param = '/api/pc/feed/?max_behot_time={}&category=news_hot&utm_source=toutiao&widen=1&tadrequire=true'.format(max_behot_time)

    sign_url = base_url + "/toutiao" + param
    signature = get_signature(sign_url, headers["cookie"]).replace('\n', '')

    path = param + signature
    headers['path'] = path

    req_url = base_url + param + signature
    # print(req_url)
    response = requests.get(url=req_url, headers=headers)
    # print(response.text)
    
    data = response.json()
    for k in data.get('data'):
        k_dic = {}
        k_dic['source'] = k.get('source')  # publishing account
        k_dic['behot_time'] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(int(k.get('behot_time'))))  # publish time
        k_dic['title'] = k.get('title')  # article title
        k_dic['source_url'] = 'https://www.toutiao.com' + k.get('source_url')  # article URL
        print(k_dic)
    next_time = data.get('next').get('max_behot_time')  # renamed so the built-in next() is not shadowed
    print("next:", next_time)
    if next_time != 0:
        time.sleep(random.randint(3, 6))
        print('-------- fetching the next page --------')
        parse(next_time)


if __name__ == '__main__':
    parse()
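
The script above works with two different URLs: the one passed to sign.js carries the extra /toutiao path prefix, while the one actually requested does not. A minimal sketch of that relationship, assuming the prefix is always inserted directly after the host, could look like this. `to_sign_url` and `with_signature` are hypothetical helper names, and `SIG` is a placeholder rather than a real signature.

```python
from urllib.parse import urlsplit, urlunsplit


def to_sign_url(request_url, prefix="/toutiao"):
    """Return the URL to hand to sign.js: the request URL with the prefix
    inserted in front of its path. Hypothetical helper."""
    parts = urlsplit(request_url)
    if parts.path.startswith(prefix + "/"):
        return request_url               # prefix already present
    return urlunsplit(parts._replace(path=prefix + parts.path))


def with_signature(request_url, signature):
    """Append _signature to the request URL (the one WITHOUT the prefix)."""
    sep = "&" if urlsplit(request_url).query else "?"
    return request_url + sep + "_signature=" + signature


if __name__ == "__main__":
    req = "https://www.toutiao.com/api/pc/feed/?min_behot_time=0&category=news_hot"
    print(to_sign_url(req))              # URL that goes to sign.js
    print(with_signature(req, "SIG"))    # URL that goes to requests.get
```

Keeping the two URLs in separate helpers makes it harder to request the prefixed URL by mistake or to sign the unprefixed one.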