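I. Identify the scraping target
Target site: https://fanyi.youdao.com/
II. Analyzing how the response is decrypted
Let's look at a few of the requests the page makes.
The response body is clearly encrypted. Judging from earlier cases, the plaintext behind it is most likely JSON.
Encrypted responses come in essentially three forms:
From experience, this one should be the first form.
There are two usual approaches: search the code for keywords such as decrypt, or search for JSON.parse( in the code that handles this endpoint.
Today we go straight for the heavy weapon: hooking JSON.parse, which catches any encrypted response whose plaintext is JSON.
The hook fires at the moment the decrypted plaintext is handed to JSON.parse, so the call stack at that point leads straight back to the decryption code.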
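The hooking idea can be sketched roughly as follows. This is a minimal illustration rather than the exact snippet used on the site: hookJsonParse and onCall are made-up names, and in the browser console you would typically place a debugger statement inside the wrapper instead of collecting the arguments.

```javascript
// Sketch: wrap JSON.parse so that every call can be inspected.
// hookJsonParse and onCall are illustrative names, not part of the site's code.
function hookJsonParse(onCall) {
  const original = JSON.parse;
  JSON.parse = function (text, reviver) {
    onCall(text); // in DevTools, a `debugger;` statement here pauses on each call
    return original.call(this, text, reviver);
  };
  // Return a function that removes the hook again.
  return function restore() {
    JSON.parse = original;
  };
}

const seen = [];
const restore = hookJsonParse((text) => seen.push(text));
const parsed = JSON.parse('{"code":0}'); // goes through the wrapper
restore();
console.log(seen, parsed);
```

When the wrapper pauses on the decrypted translation result, the frames just below it in the call stack are the place to look for the decryption routine.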
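A little debugging is enough to locate the decryption code.
Following the calls down into function O, this is clearly AES in CBC mode, and testing confirms it is the standard AES-CBC algorithm with no custom modifications.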
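III. Generating the encrypted request parameter
With decryption sorted out, all that remains is to send a correct request to the webtranslate endpoint.
Start with the request headers: replay the request to test whether cookies are part of the anti-scraping checks (release any breakpoints before replaying, otherwise the response may come back as 0 B).
Cookies do turn out to matter (start with a static, hard-coded cookie and only revisit this if requests fail).
Next, look at the form data: sign is the only parameter that looks suspicious.
Searching for the sign value finds no match in any other response, which means it is generated dynamically on the client.
For encrypted request parameters, simply walk up the call stack. Keep an eye on the URL so you don't follow the wrong request.
Walk up the stack until the form data is no longer visible, then look nearby for how the parameters e and t are processed. That leads to the function k(t); scrolling up and down around it may turn up a few pleasant surprises as well.
OK, that captures all the key parameters in one go.
And sign itself is simply an MD5 hash.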
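IV. Implementation
The implementation has two steps:
1. Fetch the encrypted response data
2. Decrypt the response data
First, implement the request that fetches the encrypted data.
Make sure you are analyzing the webtranslate request.
Note: different endpoints give different results.
Additional notes:
The difference between crypto and crypto-js: crypto is the module built into Node.js, while crypto-js is a third-party npm package with a different API. The code in this post uses the built-in crypto.
Once you have a result, you can pin the parameters to fixed values and compare the output with what the browser generates.
While writing the code I ran into the error request error:500 (the server failed to process the request).
Note: this usually means the request parameters are wrong. First check that the method (GET vs. POST) is correct, then make sure every POST parameter is present, because not a single one can be omitted. This is slightly different from the request headers, where some can be left out.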
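Since a missing form field is an easy mistake to make, a small pre-flight check can help. The sketch below is only an illustration: findMissingFields is a hypothetical helper, and the field list is simply copied from the form data used in this post, so it may not match what the endpoint expects at a later date.

```javascript
// Field names copied from the form data in this post (assumed to all be required).
const REQUIRED_FIELDS = [
  "i", "from", "to", "useTerm", "domain", "dictResult", "keyid", "sign",
  "client", "product", "appVersion", "vendor", "pointParam", "mysticTime",
  "keyfrom", "mid", "screen", "model", "network", "abtest", "yduuid",
];

// Hypothetical helper: returns the required field names that are absent from the form.
// A field that is present but empty (like "to": "") counts as present.
function findMissingFields(form, required = REQUIRED_FIELDS) {
  return required.filter((name) => !(name in form));
}

const partialForm = { i: "hello", from: "auto", to: "" };
console.log(findMissingFields(partialForm));
```

Running a check like this before sending the POST makes it obvious which parameter was dropped, instead of having to guess from the error response.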
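At one point I also left out the cookie, and the response came back incomplete. After adding the static cookie, the full response data came through:
JS code that generates the sign parameter: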
const crypto = require('crypto'); // Node's built-in module; don't write crypto-js here
const d = "fanyideskweb";
const u = "webfanyi";
// Hard-coded for now
var t = 'fsdsogkndfokasodnaso';

// Current timestamp in milliseconds
function get_ts() {
    return Date.now();
}

// Build the string to sign and hash it; e is the millisecond timestamp, t is the key
function get_sign(e, t) {
    return _(`client=${d}&mysticTime=${e}&product=${u}&key=${t}`);
}

// MD5 digest as a lowercase hex string
function _(e) {
    return crypto.createHash("md5").update(e.toString()).digest("hex");
}

/*
var ts = get_ts();
var ret = get_sign(ts, t);
console.log(ret);
console.log('over');
*/
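To compare against the browser as suggested earlier, it helps to pin the timestamp so the output is reproducible. The following is a sketch under stated assumptions: md5Hex and buildSign are illustrative names that mirror the logic above, and fixedTs is an arbitrary value; in practice you would paste in the mysticTime captured from a real browser request and compare the result with the sign sent alongside it.

```javascript
const crypto = require("crypto");

// MD5 digest as a lowercase hex string.
function md5Hex(text) {
  return crypto.createHash("md5").update(String(text)).digest("hex");
}

// Same string layout as get_sign above, with the constants inlined.
function buildSign(ts, key) {
  return md5Hex(`client=fanyideskweb&mysticTime=${ts}&product=webfanyi&key=${key}`);
}

const fixedTs = 1700000000000;          // arbitrary fixed timestamp (assumption)
const signKey = "fsdsogkndfokasodnaso"; // key taken from this post
const first = buildSign(fixedTs, signKey);
const second = buildSign(fixedTs, signKey);

console.log(first === second);              // same inputs give the same sign
console.log(/^[0-9a-f]{32}$/.test(first));  // 32 lowercase hex characters
```

If the value produced for the browser's own mysticTime matches the sign in that request, the signing logic has been reproduced correctly.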
Python code that sends the request:
import requests
import time
import execjs
headers = {
"Connection": "keep-alive",
"Pragma": "no-cache",
"Cache-Control": "no-cache",
"Accept": "application/json, text/plain, */*",
"Content-Type": "application/x-www-form-urlencoded",
"sec-ch-ua-mobile": "?0",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36 Core/1.94.253.400 QQBrowser/12.6.5678.400",
"Origin": "https://fanyi.youdao.com",
"Sec-Fetch-Site": "same-site",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Dest": "empty",
"Referer": "https://fanyi.youdao.com/",
"Accept-Language": "zh-CN,zh;q=0.9"
}
cookies = {
"OUTFOX_SEARCH_USER_ID": "16045087@183.209.166.141",
"OUTFOX_SEARCH_USER_ID_NCOO": "1755943084.195078"
}
secretKey = 'fsdsogkndfokasodnaso'
query = '你好啊'
url = "https://dict.youdao.com/webtranslate"
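# Load the JS file that defines get_sign
with open('有道翻译.js', 'r', encoding='utf-8') as f:
    ctx = execjs.compile(f.read())
# Millisecond timestamp; the same value is used for sign and for mysticTime
mil_ts = int(time.time() * 1000)
sign = ctx.call('get_sign', mil_ts, secretKey)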
data = {
"i": query,
"from": "auto",
"to": "",
"useTerm": "false",
"domain": "0",
"dictResult": "true",
"keyid": "webfanyi",
"sign": sign,
"client": "fanyideskweb",
"product": "webfanyi",
"appVersion": "1.0.0",
"vendor": "web",
'pointParam':'client,mysticTime,product',
"mysticTime": mil_ts,
"keyfrom": "fanyi.web",
"mid": "1",
"screen": "1",
"model": "1",
"network": "wifi",
"abtest": "0",
"yduuid": "abcdefg"
}
response = requests.post(url, headers=headers, cookies=cookies,data=data)
print(response.text)
print(response)
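The next step is to decrypt the response data.
Keep extracting the relevant JS code. It passes testing straight away.
Then comes the Python side, which fails at runtime with:
1)UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 205: illegal multibyte sequence
2)AttributeError: 'NoneType' object has no attribute 'replace'
The fix is the same as before: make subprocess.Popen default to UTF-8 before execjs is imported, since otherwise the output of the Node process is decoded with the system default codec (gbk here, as the first error shows).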
import subprocess
from functools import partial
subprocess.Popen = partial(subprocess.Popen, encoding="utf-8")
Run it again and the data decrypts correctly.
JS code used for decryption:
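// Appended to the same JS file as get_sign, so it reuses the crypto module required there
var key = "ydsecret://query/key/B*RGygVywfNBwpmBaZg*WT7SIOUP2T0C9WHMZN39j^DAdaZhAnxvGcCY6VYFwnHl";
var iv = "ydsecret://query/iv/C@lZe2YzHtZ2CYgaXKSVfsb7Y4QWHjITPPZ0nQp87fBeJ!Iv6v^6fvi2WN@bYpJ4";

// MD5 digest as a raw 16-byte Buffer
function T(e) {
    return crypto.createHash("md5").update(e).digest();
}

// e: base64 ciphertext from the response; t: key seed string; o: IV seed string
function decrypt_youdao(e, t, o) {
    if (!e)
        return null;
    const a = Buffer.alloc(16, T(t)); // AES key = MD5 of the key seed
    const n = Buffer.alloc(16, T(o)); // IV = MD5 of the IV seed
    const r = crypto.createDecipheriv("aes-128-cbc", a, n);
    let l = r.update(e, "base64", "utf-8");
    l += r.final("utf-8");
    return l;
}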
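One way to sanity-check this kind of routine without touching the network is a local round trip: encrypt a sample string with the same derivation (MD5 of a seed string for both key and IV) and confirm that decryption returns it. The sketch below is only an illustration; md5Bytes, encryptSample, decryptSample and the two demo seed strings are made up for the test and are not taken from the site.

```javascript
const crypto = require("crypto");

// MD5 digest as a raw 16-byte Buffer, used for both the AES key and the IV.
function md5Bytes(text) {
  return crypto.createHash("md5").update(text).digest();
}

// Encrypt a UTF-8 string with AES-128-CBC and return base64 ciphertext.
function encryptSample(plain, keySeed, ivSeed) {
  const cipher = crypto.createCipheriv("aes-128-cbc", md5Bytes(keySeed), md5Bytes(ivSeed));
  return Buffer.concat([cipher.update(plain, "utf8"), cipher.final()]).toString("base64");
}

// Decrypt base64 ciphertext produced with the same seeds.
function decryptSample(b64, keySeed, ivSeed) {
  const decipher = crypto.createDecipheriv("aes-128-cbc", md5Bytes(keySeed), md5Bytes(ivSeed));
  const bytes = Buffer.concat([decipher.update(Buffer.from(b64, "base64")), decipher.final()]);
  return bytes.toString("utf8");
}

const demoKeySeed = "demo-key-seed"; // placeholder seed (assumption)
const demoIvSeed = "demo-iv-seed";   // placeholder seed (assumption)
const sample = '{"code":0,"msg":"你好"}';
const cipherText = encryptSample(sample, demoKeySeed, demoIvSeed);
const roundTrip = decryptSample(cipherText, demoKeySeed, demoIvSeed);
console.log(roundTrip === sample);
```

If the round trip holds locally but real responses still fail to decrypt, the problem is more likely in the seed strings or the captured ciphertext than in the AES-CBC logic itself.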
Complete Python implementation:
# Patch Popen before importing execjs so that Node's output is decoded as UTF-8
import subprocess
from functools import partial
subprocess.Popen = partial(subprocess.Popen, encoding="utf-8")
import requests
import time
import execjs
headers = {
"Connection": "keep-alive",
"Pragma": "no-cache",
"Cache-Control": "no-cache",
"Accept": "application/json, text/plain, */*",
"Content-Type": "application/x-www-form-urlencoded",
"sec-ch-ua-mobile": "?0",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36 Core/1.94.253.400 QQBrowser/12.6.5678.400",
"Origin": "https://fanyi.youdao.com",
"Sec-Fetch-Site": "same-site",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Dest": "empty",
"Referer": "https://fanyi.youdao.com/",
"Accept-Language": "zh-CN,zh;q=0.9"
}
cookies = {
"OUTFOX_SEARCH_USER_ID": "16045087@183.209.166.141",
"OUTFOX_SEARCH_USER_ID_NCOO": "1755943084.195078"
}
secretKey = 'fsdsogkndfokasodnaso'
query = '你好啊'
url = "https://dict.youdao.com/webtranslate"
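# Load the JS file that defines get_sign and decrypt_youdao
with open('有道翻译.js', 'r', encoding='utf-8') as f:
    ctx = execjs.compile(f.read())
# Millisecond timestamp; the same value is used for sign and for mysticTime
mil_ts = int(time.time() * 1000)
sign = ctx.call('get_sign', mil_ts, secretKey)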
data = {
"i": query,
"from": "auto",
"to": "",
"useTerm": "false",
"domain": "0",
"dictResult": "true",
"keyid": "webfanyi",
"sign": sign,
"client": "fanyideskweb",
"product": "webfanyi",
"appVersion": "1.0.0",
"vendor": "web",
'pointParam':'client,mysticTime,product',
"mysticTime": mil_ts,
"keyfrom": "fanyi.web",
"mid": "1",
"screen": "1",
"model": "1",
"network": "wifi",
"abtest": "0",
"yduuid": "abcdefg"
}
response = requests.post(url, headers=headers, cookies=cookies,data=data)
print(response.text)
print(response)
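# Seed strings for the AES key and IV; decrypt_youdao hashes each with MD5 before use
key = "ydsecret://query/key/B*RGygVywfNBwpmBaZg*WT7SIOUP2T0C9WHMZN39j^DAdaZhAnxvGcCY6VYFwnHl"
iv = "ydsecret://query/iv/C@lZe2YzHtZ2CYgaXKSVfsb7Y4QWHjITPPZ0nQp87fBeJ!Iv6v^6fvi2WN@bYpJ4"
# response.text is the base64 ciphertext returned by webtranslate
plain_text = ctx.call('decrypt_youdao', response.text, key, iv)
print(plain_text)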
OK, that's everything for today. If you enjoyed it, remember to like and bookmark 🎃🎃