Python Scraping Blockbuster: Reverse-Engineering and Scraping Baidu Translate

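Preparation: the reverse engineering involves JavaScript code, which the PyExecJS library (imported as execjs) runs from Python. The execjs library needs Node.js to run that code, so install Node.js first and then install PyExecJS with pip install PyExecJS.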

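1. Installing the PyExecJS library (PyCharm)

Step 1: open PyCharm on a project folder (you can drag the folder onto the PyCharm application icon):

Step 2: in the Terminal, enter: pip install PyExecJS

Note: if the download is very slow, press Ctrl + C to cancel, then do one of the following:

1. Use the Tsinghua mirror for a single install (temporary):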

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple some-package

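2. To avoid slow downloads for later Python packages as well, change the default configuration:

Upgrade pip to the latest version (>=10.0.0), then set the mirror as the default index: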

python -m pip install --upgrade pip
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

If your connection to the default pip index is poor, use the mirror temporarily to upgrade pip:

python -m pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade pip

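Both of the operations above are run in the Terminal.

2. Installing Node.js (https://nodejs.org/en)

2.1 Opening the URL takes you to the following page:

2.2 Once the installer has downloaded, run it and click through with the default options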
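Before moving on, it can help to confirm that both pieces are in place. The following is a minimal sketch rather than part of the original walkthrough: check_environment is a hypothetical helper name, and it assumes Node.js is exposed on PATH as node.

```python
import shutil


def check_environment():
    """Report whether Node.js and PyExecJS look usable."""
    report = {"node_path": shutil.which("node"), "execjs_ok": False}
    try:
        import execjs  # provided by the PyExecJS package

        # Evaluate a trivial expression through the JavaScript runtime.
        report["execjs_ok"] = execjs.eval("1 + 2") == 3
    except Exception as exc:  # missing package, or no JavaScript runtime found
        report["error"] = repr(exc)
    return report


if __name__ == "__main__":
    print(check_environment())
```

If node_path is None or execjs_ok is False, revisit the two installation steps above before continuing.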

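3. Analyzing the Baidu Translate web page

Step 1: search for Baidu Translate and open it:

Step 2: press F12, or right-click and choose Inspect, then open the Network tab

Step 3: find the API request

If the request list is blank, don't panic:

After refreshing the page, several requests appear: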
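When many requests are listed, it can be quicker to filter them programmatically. The sketch below assumes the Network panel's contents were exported as a HAR file (a JSON format that browsers can save). find_post_requests is a hypothetical helper, and the sample entries are made-up placeholders, not captured traffic.

```python
def find_post_requests(har, keyword=""):
    """Return URLs of POST requests in a HAR dict whose URL contains keyword."""
    urls = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        if request.get("method") == "POST" and keyword in request.get("url", ""):
            urls.append(request["url"])
    return urls


# Made-up sample in HAR shape; a real export would be loaded with the json module.
sample_har = {
    "log": {
        "entries": [
            {"request": {"method": "GET", "url": "https://example.com/static/app.js"}},
            {"request": {"method": "POST", "url": "https://example.com/v2transapi?from=zh&to=en"}},
            {"request": {"method": "POST", "url": "https://example.com/log"}},
        ]
    }
}

print(find_post_requests(sample_har, keyword="transapi"))
```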


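4. Analyzing the request we found

Step 1: to make sure we found the right request, translate a different word

After clicking it:

Step 2: check whether the request carries any parameters that look unintelligible or follow no obvious pattern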
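One way to spot such parameters is to compare the form data of two requests and list the fields that changed. This is only an illustrative sketch: changed_fields is a hypothetical helper, the field names follow the request used in this article, and the sign and ts values are invented placeholders.

```python
def changed_fields(first, second):
    """Return the sorted names of form fields whose values differ."""
    keys = set(first) | set(second)
    return sorted(k for k in keys if first.get(k) != second.get(k))


# Placeholder payloads: real values would be copied from the Network panel.
payload_a = {"from": "zh", "to": "en", "query": "你好", "sign": "111111.222222",
             "token": "TOKEN", "ts": "1700000000001"}
payload_b = {"from": "zh", "to": "en", "query": "世界", "sign": "333333.444444",
             "token": "TOKEN", "ts": "1700000005002"}

print(changed_fields(payload_a, payload_b))  # ['query', 'sign', 'ts']
```

Here query is our own input and ts looks like a millisecond timestamp, which leaves sign as the field that needs reverse engineering.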

Step 3: search for the JS file that produces sign

Then press Ctrl + F and search for paramData (the cursor must be inside that JS file)
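The same search can be run offline on a saved copy of the script. A small sketch, assuming the JS source has been saved as text: find_keyword is a hypothetical helper and sample_js is a made-up stand-in for the real minified bundle.

```python
def find_keyword(source, keyword, width=40):
    """Yield (offset, snippet) for every occurrence of keyword in source."""
    start = source.find(keyword)
    while start != -1:
        lo = max(0, start - width)
        hi = min(len(source), start + len(keyword) + width)
        yield start, source[lo:hi]
        start = source.find(keyword, start + 1)


# Made-up stand-in for a minified bundle saved from the browser.
sample_js = 'var a=1;var paramData={query:q,sign:b(q),token:t};send(paramData);'

for offset, snippet in find_keyword(sample_js, "paramData", width=12):
    print(offset, snippet)
```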

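5. Analyzing the algorithm in the JS that produces sign

Step 1: locate the algorithm:

Step 2: find the complete code of the function:

Step 3: copy that code into a file in PyCharm:

Step 4: run the JS code:

Step 5: look up the value of r on the web page (the copied function references r without defining it):

r turns out to be a constant, 320305.131321201. Declare it in the JS file as a string, var r = '320305.131321201', because the code calls r.split("."), which a number does not support.

Run it again:

Next, look up n on the web page (it is the helper function n(t, e) shown in the full JS code at the end of this article):

Run it again:

At last, we have the sign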

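6. Writing the scraper code

Step 1: copy the request from the Network panel (the tool in the next step takes a cURL command)

Step 2: paste what you copied into this site (https://www.spidertools.cn/#/curl2Request) to generate the corresponding scraper code

7. Full code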

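Python code:

import json

import execjs
import requests

# Load the sign function and compile it once (PyExecJS runs it through Node.js).
with open("baidu.js", "r", encoding="utf-8") as f:
    dast = f.read()
d = execjs.compile(dast)

pd = input("Enter the text to translate: ")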
cookies = {
    'BIDUPSID': 'D53F07CEEF689C2FFBEF843AEA90DE40',
    'PSTM': '1682315982',
    'BAIDUID': 'D53543225F2614EECC29693E00C52712:FG=1',
    'APPGUIDE_10_6_6': '1',
    'REALTIME_TRANS_SWITCH': '1',
    'FANYI_WORD_SWITCH': '1',
    'HISTORY_SWITCH': '1',
    'SOUND_SPD_SWITCH': '1',
    'SOUND_PREFER_SWITCH': '1',
    'Hm_lvt_64ecd82404c51e03dc91cb9e8c025574': '1697525220,1698133480,1698732983',
    'BAIDUID_BFESS': 'D53543225F2614EECC29693E00C52712:FG=1',
    'delPer': '0',
    'BA_HECTOR': 'a1a4ag0l8l0l00a4248404ak1ik1btb1r',
    'ZFY': 's6rwTHvs:BfIfkeDPYeayMegd2h6WSslDBkkQaEwBQBA:C',
    'RT': '"z=1&dm=baidu.com&si=f133dd20-163c-41a2-b854-6e864df9dd1b&ss=loe0yt8k&sl=6&tt=1q9w&bcn=https%3A%2F%2Ffclog.baidu.com%2Flog%2Fweirwood%3Ftype%3Dperf&ld=42mg&ul=952h&hd=955r"',
    'BDRCVFR[L29SIzvbcb0]': '0GUpX2T7xvfP1T3nhwCQhPEUf',
    'PSINO': '6',
    'BDRCVFR[fb3VbsUruOn]': 'H964g_Y6U4Rfj6dnjTsnWnkg1nzgv99',
    'H_PS_PSSID': '39530_39523_39522_39497_39467_26350_39564_22158',
    'BDORZ': 'B490B5EBF6F3CD402E515D22BCDA1598',
    'Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574': '1698740789',
    'ab_sr': '1.0.1_YThmZjM3NTZiYzU2MzYzMzQyODAwYWJkN2JmYjgyZTQ1NjQ5YjU4OTIwNTc2MDIwMDU1OWRjNGFjZmIzMTg3YWNmMzY2MjU5MTQyNjJmYWZhNjY3ZDA4MWI4YmVlZTkxNDNmZDg1NzA5YjgyYzFlOTlhOWMwMGYwNjI2NmJiNGI0MWM1ZWIzMmUzMTYzZjNmNmU3YmM1Njk1NzY4M2Q1Yw==',
}

headers = {
    'Accept': '*/*',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Acs-Token': '1698740789190_1698740805365_eT9z0e3gC0pCVTs0Fww8t9+oa5AAsWmy2PG2bIcEvEfTRHDq2KGE0tzycFFdxQuuy3JyRXZM1GFMr1hLmGT5eDttyMiqbojD4FaD1+dgYH48IK8nJuOI/Vo3HHfhmJyUXaeAptGfbmE+p+pl4YyQL49GUrvL8tbtwuTEYYXC36p5yEd46EDCI4R6YDuD0okRe0NY/ibbZUBQvPeDP8JOU5lBVdtHB+ez/lhnToBr6HB9YS0YNFNlAfHNcPkSNjeVlVSnpJkkk8yrt3VruuWKpyzX4HUyGGJXZ5A7dUDwTKytSDJko+L0epdDoiO/6vDBfxesEV+HsgK13OpTYIO2Qtioo8gZhZIzuV8Da+WK/cAKlViRsvB90IUs8JQ/2pjNgkCbSemuaBqV/hod7RDwLpJlQKjO/y0pGmaZESKyzsijr95S6OogC+EPLr8F8op2PZ/XkiJSOBGeGZhVwYkZqwIUuLP8jxSVVvVDeHMBKuUG4mTAqRXnKptZ7CoMfK4h',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    # 'Cookie': 'BIDUPSID=D53F07CEEF689C2FFBEF843AEA90DE40; PSTM=1682315982; BAIDUID=D53543225F2614EECC29693E00C52712:FG=1; APPGUIDE_10_6_6=1; REALTIME_TRANS_SWITCH=1; FANYI_WORD_SWITCH=1; HISTORY_SWITCH=1; SOUND_SPD_SWITCH=1; SOUND_PREFER_SWITCH=1; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1697525220,1698133480,1698732983; BAIDUID_BFESS=D53543225F2614EECC29693E00C52712:FG=1; delPer=0; BA_HECTOR=a1a4ag0l8l0l00a4248404ak1ik1btb1r; ZFY=s6rwTHvs:BfIfkeDPYeayMegd2h6WSslDBkkQaEwBQBA:C; RT="z=1&dm=baidu.com&si=f133dd20-163c-41a2-b854-6e864df9dd1b&ss=loe0yt8k&sl=6&tt=1q9w&bcn=https%3A%2F%2Ffclog.baidu.com%2Flog%2Fweirwood%3Ftype%3Dperf&ld=42mg&ul=952h&hd=955r"; BDRCVFR[L29SIzvbcb0]=0GUpX2T7xvfP1T3nhwCQhPEUf; PSINO=6; BDRCVFR[fb3VbsUruOn]=H964g_Y6U4Rfj6dnjTsnWnkg1nzgv99; H_PS_PSSID=39530_39523_39522_39497_39467_26350_39564_22158; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574=1698740789; ab_sr=1.0.1_YThmZjM3NTZiYzU2MzYzMzQyODAwYWJkN2JmYjgyZTQ1NjQ5YjU4OTIwNTc2MDIwMDU1OWRjNGFjZmIzMTg3YWNmMzY2MjU5MTQyNjJmYWZhNjY3ZDA4MWI4YmVlZTkxNDNmZDg1NzA5YjgyYzFlOTlhOWMwMGYwNjI2NmJiNGI0MWM1ZWIzMmUzMTYzZjNmNmU3YmM1Njk1NzY4M2Q1Yw==',
    'Origin': 'https://fanyi.baidu.com',
    'Referer': 'https://fanyi.baidu.com/translate?aldtype=16047&query=%E7%9A%84&keyfrom=baidu&smartresult=dict&lang=auto2zh',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'sec-ch-ua': '"Chromium";v="118", "Google Chrome";v="118", "Not=A?Brand";v="99"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
}

params = {
    'from': 'zh',
    'to': 'en',
}

data = {
    'from': 'zh',
    'to': 'en',
    'query': pd,
    'transtype': 'enter',
    'simple_means_flag': '3',
    'sign': d.call("js",pd),
    'token': '4c31967aaeec4f6c9c8208d5f3baf9fa',
    'domain': 'common',
    'ts': '1698740805348',
}

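response = requests.post('https://fanyi.baidu.com/v2transapi', params=params, cookies=cookies, headers=headers, data=data)
response.raise_for_status()  # fail early on an HTTP error status
dataJson = json.loads(response.text)
# The translated text is nested under trans_result -> data -> [0] -> dst.
print(dataJson["trans_result"]["data"][0]['dst'])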
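The script above hard-codes ts and indexes straight into the response, so it fails with a KeyError whenever the response lacks trans_result. The sketch below shows two small helpers that may make it less brittle. Both names (build_form, extract_translation) are hypothetical, the sample dictionaries are hand-written, and whether the server actually checks ts is an assumption that this article does not establish.

```python
import time


def build_form(query, sign, token):
    """Build the form fields with a fresh millisecond timestamp."""
    return {
        "from": "zh",
        "to": "en",
        "query": query,
        "transtype": "enter",
        "simple_means_flag": "3",
        "sign": sign,
        "token": token,
        "domain": "common",
        "ts": str(int(time.time() * 1000)),
    }


def extract_translation(payload):
    """Return the first translated string, or None if the shape differs."""
    try:
        return payload["trans_result"]["data"][0]["dst"]
    except (KeyError, IndexError, TypeError):
        return None


# Hand-written samples: the first mirrors the keys read above, the second
# stands in for any response that lacks them.
ok = {"trans_result": {"data": [{"dst": "hello"}]}}
bad = {"error": "placeholder"}
print(extract_translation(ok), extract_translation(bad))
```

In the script above, build_form(pd, d.call("js", pd), token) could replace the literal data dictionary, with token copied from the captured request.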


JavaScript code (baidu.js):

function n(t, e) {
            for (var n = 0; n < e.length - 2; n += 3) {
                var r = e.charAt(n + 2);
                r = "a" <= r ? r.charCodeAt(0) - 87 : Number(r),
                r = "+" === e.charAt(n + 1) ? t >>> r : t << r,
                t = "+" === e.charAt(n) ? t + r & 4294967295 : t ^ r
            }
            return t
        }
// Assumption: e() is not defined in the code copied from the page. Its call sites
// match Babel's array-copy helper, so a stand-in is defined here; without it, input
// containing emoji (surrogate pairs) raises a ReferenceError.
function e(t, n) {
    if (n == null || n > t.length) n = t.length;
    for (var i = 0, a = new Array(n); i < n; i++) a[i] = t[i];
    return a
}
// Value of r observed on the page. It must be a string: the code below calls r.split(".").
var r = '320305.131321201';
var js = function(t) {
            var o, i = t.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
            if (null === i) {
                var a = t.length;
                a > 30 && (t = "".concat(t.substr(0, 10)).concat(t.substr(Math.floor(a / 2) - 5, 10)).concat(t.substr(-10, 10)))
            } else {
                for (var s = t.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), c = 0, u = s.length, l = []; c < u; c++)
                    "" !== s[c] && l.push.apply(l, function(t) {
                        if (Array.isArray(t))
                            return e(t)
                    }(o = s[c].split("")) || function(t) {
                        if ("undefined" != typeof Symbol && null != t[Symbol.iterator] || null != t["@@iterator"])
                            return Array.from(t)
                    }(o) || function(t, n) {
                        if (t) {
                            if ("string" == typeof t)
                                return e(t, n);
                            var r = Object.prototype.toString.call(t).slice(8, -1);
                            return "Object" === r && t.constructor && (r = t.constructor.name),
                            "Map" === r || "Set" === r ? Array.from(t) : "Arguments" === r || /^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(r) ? e(t, n) : void 0
                        }
                    }(o) || function() {
                        throw new TypeError("Invalid attempt to spread non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method.")
                    }()),
                    c !== u - 1 && l.push(i[c]);
                var p = l.length;
                p > 30 && (t = l.slice(0, 10).join("") + l.slice(Math.floor(p / 2) - 5, Math.floor(p / 2) + 5).join("") + l.slice(-10).join(""))
            }
            for (var d = "".concat(String.fromCharCode(103)).concat(String.fromCharCode(116)).concat(String.fromCharCode(107)), h = (null !== r ? r : (r = window[d] || "") || "").split("."), f = Number(h[0]) || 0, m = Number(h[1]) || 0, g = [], y = 0, v = 0; v < t.length; v++) {
                var _ = t.charCodeAt(v);
                _ < 128 ? g[y++] = _ : (_ < 2048 ? g[y++] = _ >> 6 | 192 : (55296 == (64512 & _) && v + 1 < t.length && 56320 == (64512 & t.charCodeAt(v + 1)) ? (_ = 65536 + ((1023 & _) << 10) + (1023 & t.charCodeAt(++v)),
                g[y++] = _ >> 18 | 240,
                g[y++] = _ >> 12 & 63 | 128) : g[y++] = _ >> 12 | 224,
                g[y++] = _ >> 6 & 63 | 128),
                g[y++] = 63 & _ | 128)
            }
            for (var b = f, w = "".concat(String.fromCharCode(43)).concat(String.fromCharCode(45)).concat(String.fromCharCode(97)) + "".concat(String.fromCharCode(94)).concat(String.fromCharCode(43)).concat(String.fromCharCode(54)), k = "".concat(String.fromCharCode(43)).concat(String.fromCharCode(45)).concat(String.fromCharCode(51)) + "".concat(String.fromCharCode(94)).concat(String.fromCharCode(43)).concat(String.fromCharCode(98)) + "".concat(String.fromCharCode(43)).concat(String.fromCharCode(45)).concat(String.fromCharCode(102)), x = 0; x < g.length; x++)
                b = n(b += g[x], w);
            return b = n(b, k),
            (b ^= m) < 0 && (b = 2147483648 + (2147483647 & b)),
            "".concat((b %= 1e6).toString(), ".").concat(b ^ f)
        }
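As a cross-check that does not need Node.js, the JavaScript above can be ported to pure Python. The following is a sketch of such a port, not the site's own code: _int32, _mix and sign are hypothetical names, and the default gtk is the value this article assigns to r, which the site may change at any time.

```python
def _int32(x):
    """Wrap x to a signed 32-bit integer, as JavaScript bitwise operators do."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def _mix(t, ops):
    """Port of n(t, e): apply the shift/add/xor steps encoded in ops."""
    for i in range(0, len(ops) - 2, 3):
        c = ops[i + 2]
        shift = ord(c) - 87 if c >= "a" else int(c)
        if ops[i + 1] == "+":
            r = (t & 0xFFFFFFFF) >> shift  # JavaScript >>>
        else:
            r = _int32(t << shift)  # JavaScript <<
        t = _int32(t + r) if ops[i] == "+" else _int32(t ^ r)
    return t


def sign(text, gtk="320305.131321201"):
    """Port of js(t); gtk plays the role of r in the JavaScript."""
    chars = list(text)  # code points, so a surrogate pair counts once
    n = len(chars)
    if n > 30:  # keep the first, middle and last 10 characters
        text = "".join(chars[:10] + chars[n // 2 - 5:n // 2 + 5] + chars[-10:])
    head, _, tail = gtk.partition(".")
    f, m = int(head or 0), int(tail or 0)
    b = f
    for byte in text.encode("utf-8", "surrogatepass"):
        b = _mix(b + byte, "+-a^+6")
    b = _mix(b, "+-3^+b+-f")
    b = _int32(b ^ m)
    if b < 0:
        b &= 0xFFFFFFFF  # same as 2147483648 + (2147483647 & b)
    b %= 1000000
    return f"{b}.{b ^ f}"


print(sign("hello"))  # 54706.276099
```

Comparing sign(text) with d.call("js", text) for a few inputs is a quick way to confirm that the JavaScript was copied correctly.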
