Web Crawler: Cracking the JS MD5 Encryption of Youdao Translate
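I. Capture packets to find the Youdao Translate API
As shown in the figure, the page sends its translation request to `http://fanyi.youdao.com/translate_o`: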
II. Analyze the request method and the transmitted data
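1. Request headers
   Analysis shows that the cookie's `___rl__test__cookies` parameter is **a timestamp** (in milliseconds).
2. params
   The query parameters are all fixed, so they can be passed in unchanged.
3. data
   Four of the form fields in data are generated ("encrypted") by the page's JavaScript: `salt`, `sign`, `ts`, and `bv`.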
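One catch with the "fixed" params: the original code lists `smartresult` twice in its params dict, which suggests the query string repeats that key (assumed here to be `?smartresult=dict&smartresult=rule`). A Python dict literal keeps only the last value for a repeated key, so only `rule` would be sent. A minimal sketch of the pitfall and the fix, using the standard library's `urlencode`:

```python
from urllib.parse import urlencode

# A dict literal silently keeps only the LAST value of a repeated key,
# so this does NOT reproduce ?smartresult=dict&smartresult=rule.
broken = {'smartresult': 'dict', 'smartresult': 'rule'}

# Store the values as a list instead; doseq=True expands it into repeated keys.
fixed = {'smartresult': ['dict', 'rule']}

broken_query = urlencode(broken, doseq=True)
fixed_query = urlencode(fixed, doseq=True)

print(broken_query)  # smartresult=rule
print(fixed_query)   # smartresult=dict&smartresult=rule
```

`requests` treats list values in `params` the same way, emitting one `key=value` pair per element, so `params={'smartresult': ['dict', 'rule']}` can be passed straight to `requests.post`.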
III. Analyze how the encrypted fields are generated
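- bv: the MD5 hash of the browser identifier (`navigator.appVersion`, here "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36")
- ts: the current timestamp in milliseconds
- salt: ts with one random integer from [0, 10) appended
- sign: the MD5 hash of **`"fanyideskweb" + input text + salt + "Nw(nmmbP%A-r6U3EUn]Aj"`**
- cookie: `___rl__test__cookies` is a timestamp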
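The four formulas above can be checked in isolation by feeding them explicit inputs instead of the clock and the random generator. A minimal sketch; the helper names `md5_hex` and `make_fields` are hypothetical, and the timestamp and random digit in the usage line are arbitrary example values:

```python
import hashlib

APP_VERSION = ('5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
               '(KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36')
SECRET_SUFFIX = 'Nw(nmmbP%A-r6U3EUn]Aj'


def md5_hex(text: str) -> str:
    """Hex MD5 digest of the UTF-8 bytes of text."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def make_fields(word: str, app_version: str, ts_ms: int, rand_digit: int) -> dict:
    """Build bv/ts/salt/sign from explicit inputs so each formula is easy to verify."""
    ts = str(ts_ms)              # ts: millisecond timestamp, kept as a string
    salt = ts + str(rand_digit)  # salt: ts followed by one digit 0-9
    return {
        'bv': md5_hex(app_version),  # bv: md5(navigator.appVersion)
        'ts': ts,
        'salt': salt,
        # sign: md5("fanyideskweb" + input text + salt + secret suffix)
        'sign': md5_hex('fanyideskweb' + word + salt + SECRET_SUFFIX),
    }


# Arbitrary example inputs: a fixed timestamp and random digit 7
fields = make_fields('transaction', APP_VERSION, 1589500000000, 7)
print(fields['salt'])  # 15895000000007
print(fields['sign'])
```

Separating the inputs from the formulas makes it possible to compare the result against the values the browser produces for the same `ts` and digit.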
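The JavaScript source is shown in the figure (it is minified and its variables are renamed, so it is hard to read); its key lines are reproduced as comments at the top of `get_cipher` below.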
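IV. Write the Python crawler
Four modules are needed:
- requests: the main HTTP library for the crawler
- time: `time.time()` generates the timestamp
- random: generates the random number
- hashlib: MD5 hashing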
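Two details matter when mirroring the JavaScript with `time` and `random`: `time.time()` returns seconds as a float while `(new Date).getTime()` returns whole milliseconds, and `random.randint(a, b)` includes both endpoints, so `randint(0, 10)` can return 10, which `parseInt(10 * Math.random(), 10)` never does. A small sketch of the equivalents:

```python
import random
import time

# JS: ts = "" + (new Date).getTime()  -> whole milliseconds since the epoch, as a string
ts = str(int(time.time() * 1000))

# JS: parseInt(10 * Math.random(), 10) -> an integer in 0..9
# random.randint is inclusive at BOTH ends, so randint(0, 10) could yield 10;
# randrange(10) (equivalently randint(0, 9)) matches the JS range.
digit = random.randrange(10)
salt = ts + str(digit)

print(ts, salt)
```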
```python
import hashlib
import random
import time

import requests

# User-Agent sent with the request; navigator.appVersion is this string without the "Mozilla/" prefix
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36')


def md5_hex(text):
    """Return the hex MD5 digest of the UTF-8 encoding of text."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def get_cipher(input_words):
    # Key lines from the minified JS:
    # bv = n.md5(navigator.appVersion),
    # ts = "" + (new Date).getTime(),
    # salt = ts + parseInt(10 * Math.random(), 10)
    # sign: n.md5("fanyideskweb" + key + salt + "Nw(nmmbP%A-r6U3EUn]Aj")
    cipher = {}
    app_version = USER_AGENT[len('Mozilla/'):]  # '5.0 (Windows NT 10.0; ...'
    cipher['bv'] = md5_hex(app_version)
    # Millisecond timestamp, kept as a string like the JS ts
    cipher['ts'] = str(int(time.time() * 1000))
    # Append one random digit 0-9; random.randint(0, 10) would also allow 10
    cipher['salt'] = cipher['ts'] + str(random.randint(0, 9))
    cipher['sign'] = md5_hex(f"fanyideskweb{input_words}{cipher['salt']}Nw(nmmbP%A-r6U3EUn]Aj")
    return cipher


def translate(input_words):
    url = 'http://fanyi.youdao.com/translate_o'
    cipher = get_cipher(input_words)
    # The ___rl__test__cookies cookie is also a millisecond timestamp
    ___rl__test__cookies = int(time.time() * 1000)
    print(cipher)
    # smartresult appears twice in the query string; a dict literal with a
    # repeated key keeps only the last value, so pass a list instead
    params = {'smartresult': ['dict', 'rule']}
    data = {
        'i': input_words,
        'from': 'AUTO',
        'to': 'AUTO',
        'smartresult': 'dict',
        'client': 'fanyideskweb',
        'salt': cipher['salt'],
        'sign': cipher['sign'],
        'ts': cipher['ts'],
        'bv': cipher['bv'],
        'doctype': 'json',
        'version': '2.1',
        'keyfrom': 'fanyi.web',
        'action': 'FY_BY_REALTlME',  # kept exactly as in the original (lowercase "l")
    }
    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
        # No Content-Length here: requests computes it from the encoded form body
        # (len(str(data)) measured the dict's repr, not the body actually sent)
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'Cookie': f'OUTFOX_SEARCH_USER_ID=1988932807@39.179.63.34; OUTFOX_SEARCH_USER_ID_NCOO=1465447971.2625; _ntes_nnid=c207828eb003f1bcb7e40bf96b5e4798,1587631980566; JSESSIONID=abcLiLImTPEXUaHBeNrix; DICT_UGC=be3af0da19b5c5e6aa4e17bd8d90b28a|; ___rl__test__cookies={___rl__test__cookies}',
        'Host': 'fanyi.youdao.com',
        'Origin': 'http://fanyi.youdao.com',
        'Pragma': 'no-cache',
        'Referer': 'http://fanyi.youdao.com/?keyfrom=dict2.top',
        'User-Agent': USER_AGENT,  # must match the string hashed into bv
        'X-Requested-With': 'XMLHttpRequest',
    }
    response = requests.post(url, params=params, headers=headers, data=data)
    print(response.status_code)
    print(response.text)
    # result = response.json()
    # print(result['translateResult'])


if __name__ == "__main__":
    while True:
        sentence = input('Enter the sentence to translate (q to quit): ')
        if sentence.upper() == 'Q':
            break
        translate(sentence)
```
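The original code leaves JSON parsing commented out. A minimal sketch of pulling just the translated text out of `response.text`, assuming (not confirmed by the captured data above) that the body looks like `{"errorCode": 0, "translateResult": [[{"src": ..., "tgt": ...}]]}`, where `errorCode` 0 means success and `translateResult` is a list of paragraphs, each a list of segments. The helper name `extract_translation` is hypothetical, and the sample string is hand-written, not a captured response:

```python
import json


def extract_translation(response_text: str) -> str:
    """Join the 'tgt' pieces of translateResult (assumed response shape)."""
    result = json.loads(response_text)
    if result.get('errorCode') != 0:
        raise RuntimeError(f"request failed with errorCode {result.get('errorCode')}")
    # Assumed: translateResult is a list of paragraphs, each a list of segments
    return '\n'.join(
        ''.join(segment['tgt'] for segment in paragraph)
        for paragraph in result['translateResult']
    )


# Hand-written sample in the assumed shape, for illustration only
sample = '{"errorCode": 0, "translateResult": [[{"src": "hello", "tgt": "你好"}]]}'
print(extract_translation(sample))  # 你好
```

Inside `translate`, this would be called as `print(extract_translation(response.text))` in place of the raw `print(response.text)`.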
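Testing confirmed that the crawler successfully retrieves Youdao's translation results.
For example, entering `transaction` returns its translation.
This example was tested on May 15, 2020.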