Web Scraping with the requests Library (3): Submitting data in a POST Request, Using Baidu Translate Online (Including Cracking the sign)

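This article shows how to analyze and apply Baidu Translate's sign algorithm and adapt YoudaoTranslate.py into BaiduTranslate.py. Youdao's anti-scraping check rejects the original script, so the article switches to Baidu Translate to get real-time translation working. It covers the key steps of computing sign and using execjs to call a JavaScript function that returns the correct signature.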

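The following content is for learning and discussion only. If it infringes on any rights, please contact me and I will remove it.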

Use the post method of requests to implement online translation. The Youdao Dictionary version:
YoudaoTranslate.py:

import json
import os

import requests
from fake_useragent import UserAgent


# Define the URL
url = "http://fanyi.youdao.com/translate_o?smartresult=dict,rule"

# Set the request header with a random User-Agent
ua = UserAgent()
headers = {
    "User-Agent": ua.random
}

# Read the keyword interactively and pack it into the form parameters.
# salt, sign, lts, and bv are copied from one captured browser request.
word = input('enter a word to translate:')
data = {
    "i": word,
    "from": "AUTO",
    "to": "AUTO",
    "smartresult": "dict",
    "client": "fanyideskweb",
    "salt": "16139844601955",
    "sign": "3848b8d687ceebc75f707e67446b302e",
    "lts": "1613984460195",
    "bv": "e2a78ed30c66e16a857c5b6486a1d326",
    "doctype": "json",
    "version": "2.1",
    "keyfrom": "fanyi.web",
    "action": "lan-select"
}

# Send the request and get the server's response
response = requests.post(url=url, data=data, headers=headers)
print(response.text)
dic = response.json()

# Store the data (create the output directory if it does not exist)
os.makedirs('./file', exist_ok=True)
with open('./file/' + word + '.json', 'w', encoding='utf-8') as fp:
    json.dump(dic, fp=fp, ensure_ascii=False)
# Report completion
print("Scraping finished!")

Enter 苹果 (apple), fetch the result, and open the saved file:

{"errorCode": 50}
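A response like this is still valid JSON, so the script above saves it and reports completion anyway. Below is a minimal sketch of a guard that could run before saving. It assumes, without confirmation from this post, that Youdao uses errorCode 0 for success; the helper name check_youdao_response is hypothetical.

```python
def check_youdao_response(dic):
    """Return dic unchanged, or raise if it carries a non-zero errorCode."""
    # Assumption: errorCode == 0 (or a missing errorCode) means success.
    code = dic.get("errorCode", 0)
    if code != 0:
        raise RuntimeError(f"Youdao rejected the request, errorCode={code}")
    return dic


# Usage with the response shown above
try:
    check_youdao_response({"errorCode": 50})
except RuntimeError as exc:
    print(exc)  # Youdao rejected the request, errorCode=50
```

Calling this on dic right after response.json() would stop the script before an error response is written to disk.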

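That failed. This is Youdao Dictionary's anti-scraping mechanism: the form submitted by POST contains encrypted verification fields. These 4 look like the likely candidates.
Apple 1: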

"salt": "16139844601955",
"sign": "3848b8d687ceebc75f707e67446b302e",
"lts": "1613984460195",
"bv": "e2a78ed30c66e16a857c5b6486a1d326",

Try translating 香蕉 (banana):
Banana:

"salt": "16139881928628",
"sign": "cb03cc38942f03041377142dea882b7c",
"lts": "1613988192862",
"bv": "e2a78ed30c66e16a857c5b6486a1d326",

bv did not change, so rule it out. Translate 苹果 again.
Apple 2:

"salt": "16139885727316",
"sign": "c45efe7de329e3f1081d02ab427d52a2",
"lts": "1613988572731",

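Compare Apple 1 with Apple 2: salt, sign, and lts all differ even though the word is the same. lts looks like a millisecond timestamp, so the signing algorithm most likely uses the timestamp, which makes it more troublesome to reproduce.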
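The timestamp guess can be checked against the three captures above. The sketch below treats lts as milliseconds since the Unix epoch (an assumption) and tests whether each salt is simply lts with one extra digit appended; the helper name salt_extends_lts is made up for this illustration.

```python
from datetime import datetime, timezone

# (salt, lts) pairs copied from the three captured requests above
CAPTURES = {
    "apple 1": ("16139844601955", "1613984460195"),
    "banana": ("16139881928628", "1613988192862"),
    "apple 2": ("16139885727316", "1613988572731"),
}


def salt_extends_lts(salt, lts):
    """True when salt is lts followed by exactly one more digit."""
    return (len(salt) == len(lts) + 1
            and salt.startswith(lts)
            and salt[-1].isdigit())


for label, (salt, lts) in CAPTURES.items():
    # Assumption: lts counts milliseconds since the Unix epoch
    when = datetime.fromtimestamp(int(lts) / 1000, tz=timezone.utc)
    print(label, when.isoformat(), salt_extends_lts(salt, lts))
```

All three pairs pass the check, so salt appears to carry no information beyond the timestamp and one extra digit. How sign itself is derived is not determined here.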
Now look at Baidu Translate and the request data it sends when translating 苹果:

{
	"from": "zh",
	"to": "en",
	"query": "苹果",
	"transtype": "realtime",
	"simple_means_flag": "3",
	"sign": "927377.705952",
	"token": "176a3e7a500893c9efb677c76f32b266",
	"domain": "common"
}

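Both sign and token look like verification fields, so at least one of them has to be reverse-engineered. Baidu Translate turns out to be the easier one:
a. Of sign and token, only sign is actually verified.
b. The value of sign depends only on the value of query, together with a seed string that the page keeps in window.gtk and that is hardcoded as i in the script below. It does not depend on the timestamp or the current user, and it is computed by JavaScript code.
The relevant JavaScript code: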

baidujs.js

var i = "320305.131321201"

function n(r, o) {
    for (var t = 0; t < o.length - 2; t += 3) {
        var a = o.charAt(t + 2);
        a = a >= "a" ? a.charCodeAt(0) - 87 : Number(a), a = "+" === o.charAt(t + 1) ? r >>> a : r << a, r = "+" === o.charAt(t) ? r + a & 4294967295 : r ^ a
    }
    return r
}


function e(r) {
    var o = r.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
    if (null === o) {
        var t = r.length;
        t > 30 && (r = "" + r.substr(0, 10) + r.substr(Math.floor(t / 2) - 5, 10) + r.substr(-10, 10))
    } else {
        for (var e = r.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), C = 0, h = e.length, f = []; h > C; C++) "" !== e[C] && f.push.apply(f, e[C].split("")), C !== h - 1 && f.push(o[C]);
        var g = f.length;
        g > 30 && (r = f.slice(0, 10).join("") + f.slice(Math.floor(g / 2) - 5, Math.floor(g / 2) + 5).join("") + f.slice(-10).join(""))
    }
    var u = void 0, l = "" + String.fromCharCode(103) + String.fromCharCode(116) + String.fromCharCode(107);
    u = null !== i ? i : (i = window[l] || "") || "";
    for (var d = u.split("."), m = Number(d[0]) || 0, s = Number(d[1]) || 0, S = [], c = 0, v = 0; v < r.length; v++) {
        var A = r.charCodeAt(v);
        128 > A ? S[c++] = A : (2048 > A ? S[c++] = A >> 6 | 192 : (55296 === (64512 & A) && v + 1 < r.length && 56320 === (64512 & r.charCodeAt(v + 1)) ? (A = 65536 + ((1023 & A) << 10) + (1023 & r.charCodeAt(++v)), S[c++] = A >> 18 | 240, S[c++] = A >> 12 & 63 | 128) : S[c++] = A >> 12 | 224, S[c++] = A >> 6 & 63 | 128), S[c++] = 63 & A | 128)
    }
    for (var p = m, F = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(97) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(54)), D = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(51) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(98)) + ("" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(102)), b = 0; b < S.length; b++) p += S[b], p = n(p, F);
    return p = n(p, D), p ^= s, 0 > p && (p = (2147483647 & p) + 2147483648), p %= 1e6, p.toString() + "." + (p ^ m)
}

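Save this file in the project root directory and call its function with the execjs library (installed as PyExecJS; it needs a JavaScript runtime such as Node.js). A quick test: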

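import execjs


# Read the word interactively
word = input('enter a word to translate:')
# Load the JavaScript source and call its e() function to compute sign
with open('baidujs.js', encoding='utf-8') as f:
    jsData = f.read()
p = execjs.compile(jsData).call('e', word)
print("sign value is:" + str(p))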

Run it and check the result:

enter a word to translate:苹果
sign value is:927377.705952
enter a word to translate:香蕉
sign value is:816986.562283
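If installing a JavaScript runtime is inconvenient, the same calculation can be ported to Python. The following is a sketch of such a port, not code from Baidu: the names baidu_sign, _mix, and _to_int32 are made up here, GTK is the string hardcoded as i in baidujs.js, and lone surrogate characters (which the JavaScript tolerates) are not handled.

```python
GTK = "320305.131321201"  # same value as `var i` in baidujs.js


def _to_int32(x):
    """Wrap a Python int to a signed 32-bit value, like JS bitwise operators."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def _mix(r, ops):
    """Port of n(r, o): apply the 3-character operations encoded in ops."""
    for t in range(0, len(ops) - 2, 3):
        c = ops[t + 2]
        shift = ord(c) - 87 if c >= "a" else int(c)
        if ops[t + 1] == "+":
            a = (r & 0xFFFFFFFF) >> shift      # JS: r >>> shift
        else:
            a = _to_int32(r << shift)          # JS: r << shift
        if ops[t] == "+":
            r = _to_int32(r + a)               # JS: r + a & 4294967295
        else:
            r = _to_int32(r ^ a)               # JS: r ^ a
    return r


def baidu_sign(query, gtk=GTK):
    """Port of e(r): compute the sign string for a query."""
    # Long queries are reduced to first 10 + middle 10 + last 10 characters.
    if len(query) > 30:
        mid = len(query) // 2
        query = query[:10] + query[mid - 5:mid + 5] + query[-10:]
    m_str, _, s_str = gtk.partition(".")
    m, s = int(m_str or 0), int(s_str or 0)
    p = m
    # The JavaScript builds the UTF-8 bytes by hand; encode() does the same.
    for byte in query.encode("utf-8"):
        p = _mix(p + byte, "+-a^+6")
    p = _mix(p, "+-3^+b+-f")
    p ^= s
    if p < 0:
        p = (p & 0x7FFFFFFF) + 0x80000000
    p %= 1000000
    return f"{p}.{p ^ m}"


print(baidu_sign("苹果"))  # 927377.705952
```

For 苹果 this gives the same value as the execjs run above. The execjs route stays closer to the site's own code, so it is the safer choice if Baidu changes the script.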

OK, now adapt the YoudaoTranslate.py from the beginning of the article:
BaiduTranslate.py

import json
import os

import execjs
import requests
from fake_useragent import UserAgent


# Define the URL
url = "https://fanyi.baidu.com/v2transapi"

# Set the request headers (copied from a captured browser request)
ua = UserAgent()
headers = {
    'Accept': '*/*',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Cookie': 'replace with the cookie from your own captured request',
    'Host': 'fanyi.baidu.com',
    'Origin': 'https://fanyi.baidu.com',
    'Referer': 'https://fanyi.baidu.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    #"User-Agent":ua.random
}


# Read the keyword interactively
word = input('enter a word to translate:')
# Compute sign by calling e() in baidujs.js
with open('baidujs.js', encoding='utf-8') as f:
    jsData = f.read()
p = execjs.compile(jsData).call('e', word)

# Pack the form parameters
data = {
    "from": "zh",
    "to": "en",
    "query": word,
    "transtype": "realtime",
    "simple_means_flag": "3",
    "sign": p,
    "token": "176a3e7a500893c9efb677c76f32b266",
    "domain": "common"
}


# Send the request and get the server's response
response = requests.post(url=url, data=data, headers=headers)
dic = response.json()

# Store the data (create the output directory if it does not exist)
os.makedirs('./file', exist_ok=True)
with open('./file/' + word + '.json', 'w', encoding='utf-8') as fp:
    json.dump(dic, fp=fp, ensure_ascii=False)
# Report completion
print("Scraping finished!")

Run it and check the result:

enter a word to translate:橘子
Scraping finished!

橘子.json
[Screenshot of the saved 橘子.json file]
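To look at the saved result without opening the file by hand, it can be loaded back and pretty-printed. This is a small sketch that makes no assumption about the layout of Baidu's response; the helper names load_saved and pretty are hypothetical, and the folder default mirrors the ./file/ path used by the scripts above.

```python
import json
from pathlib import Path


def load_saved(word, folder="./file"):
    """Load the JSON file that the script saved for the given word."""
    path = Path(folder) / f"{word}.json"
    with path.open(encoding="utf-8") as fp:
        return json.load(fp)


def pretty(dic):
    """Format a dict as indented JSON, keeping Chinese characters readable."""
    return json.dumps(dic, ensure_ascii=False, indent=2)
```

After running BaiduTranslate.py with 橘子, print(pretty(load_saved("橘子"))) shows the whole response with indentation.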
