Python Web Scraping: JS Reverse-Engineering in Practice
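The target URL below is Base64-encoded only so it is not exposed in plain text (Base64 is an encoding, not real encryption).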
- Decoding strings with base64
- Reversing a JS signing routine
import base64
import requests
import handout
import time
import json
from hashlib import md5
Get the real URL:
doc = handout.Handout('/handout')
string = 'aHR0cDovL3d3dy5kZGt5LmNvbS9jb21tb2RpdHkuaHRtbD9kZGt5Y2FjaGU9YTdiMTllODc5ZDJmMmYyNzlkMzU2ZjVhZmE2ZDVjZmY='
string = str(base64.b64decode(string), encoding='utf-8')
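Base64 is a reversible encoding, so decoding needs no key. Here is a minimal standard-library check that decoding the string above gives a plain URL, and that re-encoding that URL reproduces the original string. The helper name `decode_url` is only illustrative.

```python
import base64

ENCODED = 'aHR0cDovL3d3dy5kZGt5LmNvbS9jb21tb2RpdHkuaHRtbD9kZGt5Y2FjaGU9YTdiMTllODc5ZDJmMmYyNzlkMzU2ZjVhZmE2ZDVjZmY='

def decode_url(encoded: str) -> str:
    # Base64 text -> raw bytes -> UTF-8 string; no secret is involved
    return base64.b64decode(encoded).decode('utf-8')

url = decode_url(ENCODED)
print(url)  # http://www.ddky.com/commodity.html?ddkycache=a7b19e879d2f2f279d356f5afa6d5cff
# Round trip: encoding the decoded text gives back the original string
assert base64.b64encode(url.encode('utf-8')).decode('ascii') == ENCODED
```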
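- Among the page's network requests, /product/queryOrgcodeProduct… returns the drug information.
- Its suspicious parameter is sign. Searching the site's JS turns up this code: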
for (var f = t.get("method") + p + r, y = MD5(f), v = e + "?sign=" + y, k = 0; k < l.length; k++)
(t.get(l[k]) + "").indexOf("+") >= 0 || -1 != (t.get(l[k]) + "").indexOf("&") ? v += "&" + l[k] + "=" + encodeURIComponent(t.get(l[k])) : "pageUrl" == l[k] ? v += "&" + l[k] + "=" + encodeURIComponent(t.get(l[k])) : v += "&" + l[k] + "=" + t.get(l[k]);
return v
- Here sign is y. y is computed as MD5(f), and f is t.get("method") + p + r.
So all we need are the values of t, p and r. The code right before this loop in the site's JS shows how t is built:
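Read as Python, the loop in the sign snippet starts from `e + "?sign=" + y`. It then appends each parameter in the order of the list `l`. A value is percent-encoded only when it contains `+` or `&`, or when its key is `pageUrl`.

The snippet does not show how `l` is built. The transcription below assumes `l` is the alphabetically sorted list of parameter names. `encode_uri_component` and `build_request_url` are hypothetical helpers, and `urllib.parse.quote` is used to approximate JavaScript's `encodeURIComponent`.

```python
from urllib.parse import quote

def encode_uri_component(value: str) -> str:
    # Leave the same characters unescaped as JS encodeURIComponent:
    # letters, digits and - _ . ! ~ * ' ( )
    return quote(value, safe="!~*'()")

def build_request_url(base_url: str, params: dict, sign: str) -> str:
    # Mirrors v = e + "?sign=" + y, then one "&key=value" per parameter
    url = base_url + "?sign=" + sign
    for key in sorted(params):  # assumption: l is the sorted key list
        value = str(params[key])
        if "+" in value or "&" in value or key == "pageUrl":
            url += "&" + key + "=" + encode_uri_component(value)
        else:
            url += "&" + key + "=" + value
    return url

print(build_request_url("http://example.com/q.htm", {"pageNo": "1", "expr": "a+b"}, "ABC"))
# http://example.com/q.htm?sign=ABC&expr=a%2Bb&pageNo=1
```

A value such as the comma-separated orgcode contains neither `+` nor `&`, so the JS sends it unescaped.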
t.put("t", u),
t.put("v", "1.0"),
t.containsKey("versionName") || t.put("versionName", _this.versionName),
t.put("plat", _this.getPlat()),
t.put("platform", _this.getPlatform()),
!t.containsKey("userId") || t.containsKey("loginToken") || t.containsKey("uDate") ? t.containsKey("userId") || _this.getUserId() && (t.put("loginToken", _this.getLoginToken()),
t.put("uDate", _this.getUDate()),
t.put("userId", _this.getUserId())) : (t.put("loginToken", _this.getLoginToken()),
t.put("uDate", _this.getUDate()));
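Setting aside the login branch, which only matters once a userId is present, the snippet stamps every request with the same common fields. Below is a sketch of that step as a Python dict update, with these assumptions:
- `add_common_params` is a hypothetical name.
- The `H5` values for plat/platform come from the value of p captured at a breakpoint.
- The default versionName `3.2.0` is assumed; it matches what loadProduct passes explicitly.

```python
import time

def add_common_params(params: dict, timestamp: str) -> dict:
    params["t"] = timestamp        # t.put("t", u): request timestamp
    params["v"] = "1.0"            # t.put("v", "1.0")
    # containsKey("versionName") || put(...): only fill if missing
    params.setdefault("versionName", "3.2.0")  # assumed default
    params["plat"] = "H5"          # getPlat() returned "H5" in the captured p
    params["platform"] = "H5"      # getPlatform() likewise
    # loginToken / uDate / userId are skipped: anonymous request assumed
    return params

params = {"method": "ddsy.product.query.orgcode.product.list.b2c", "versionName": "3.2.0"}
add_common_params(params, time.strftime("%Y-%m-%d %H:%M:%S"))
print(sorted(params))
```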
method is not set here, but searching the source finds it:
loadProduct:function(index){
var _this = this;
var _url = $$.online ? 'http://product.ddky.com/product/queryOrgcodeProductListForB2C.htm' : 'http://192.168.89.38/product/queryOrgcodeProductListForB2C.htm';
var params = new $$.DMap();
params.put('orgcode', _this.codes[index]);
params.put( 'orderTypeId' , '0');
params.put( 'shopId' , _this.shopId);
params.put( 'pageNo' , '1');
params.put( 'pageSize' , '100');
params.put( 'method' , 'ddsy.product.query.orgcode.product.list.b2c');
params.put( 'versionName' , '3.2.0');
var req = $$.getRequestURL( _url , params );
$$.sendAjax( req , function(res){
if(res.code == 0){
_this.renderList(res.data.productList);
}else{
alert(res.msg);
}
})
},
So method = 'ddsy.product.query.orgcode.product.list.b2c'.
With method known, how do we get p and r?
Setting a breakpoint in the signing code reveals them:
r = "6C57AB91A1308E26B797F4CD382AC79D"
p = "methodddsy.product.query.orgcode.product.list.b2corderTypeId0orgcode010101,010104pageNo1pageSize100platH5platformH5shopId-1t2019-9-24 16:51:10v1.0versionName3.2.0"
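But what are these values?
After repeated tries, r appears to be a fixed value.
t also depends on _this.codes and _this.shopId (used for orgcode and shopId). Searching for them turns up this code: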
var product = {
init:function(){
this.loadProduct(0);
this.clickEvent()
},
shopId:'-1',
codes:[
'010101,010104', // cold and flu medicine
'010502,010503,010504,010505,010506,010507',// children's medicine
'010401,010402,010403,010404,010406,010407',// rheumatism and bone injuries
'010801,010802,010803,010804,010901,010902,010807',// sexual health
'011303,010609',// hypertension, hyperlipidemia and diabetes
'011501,011502,011503,011504,011505,010301,010302,010303,010305,010306,010307,011601,011602,011603,011605,010201,010202,010203,011401,011402,011403,011404,010701,010702,010703,010704,010706,010707,010709,010710,010711,011101,011106',// other drugs
'020101,020103,020105,020201,020202,020203',// premium tonics
'020403,020404,020406,020409,020414',// medicinal foods
'020301,020307,020310,020311,020315',// TCM decoction pieces
'050301,050303,050305,050401,050601,050602,050604,050605,050606,050101,050106,050102',// medical devices
'030101,030102,030103,030107,030108,030109,030110,030111,030113',// nutrition and health
'030201,030202,030204',// nutritional foods
'040101,040501,040502,040404,040201,040204',// adult products
'060901,061103,060603,060604,060607,060609,060610,060505,060301,060302,060304,060306,060803,060804,060805,060402,060403,060201,060701,060702',// cosmetics and personal care
'060104'// baby and infant products
],
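Each element of codes is one category, and loadProduct(index) sends codes[index] as orgcode. To cover more than one category, you can generate the per-category parameter sets from a name-to-code table. The sketch below copies only the first three entries from the list above. The English names are translations of the JS comments, and `category_params` is a hypothetical helper.

```python
CATEGORY_CODES = {
    "cold and flu medicine": "010101,010104",
    "children's medicine": "010502,010503,010504,010505,010506,010507",
    "rheumatism and bone injuries": "010401,010402,010403,010404,010406,010407",
}

def category_params(shop_id: str = "-1"):
    # Yield (category, params) pairs mirroring loadProduct's params.put calls
    for name, orgcode in CATEGORY_CODES.items():
        yield name, {
            "orgcode": orgcode,
            "orderTypeId": "0",
            "shopId": shop_id,
            "pageNo": "1",
            "pageSize": "100",
            "method": "ddsy.product.query.orgcode.product.list.b2c",
            "versionName": "3.2.0",
        }

for name, params in category_params():
    print(name, params["orgcode"])
```

Each dict still needs the common fields (t, v, plat, platform) and the sign before it is sent.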
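A closer look at p shows it is just the long run of parameters the request must carry, all taken from t. Each key is followed directly by its value, and the keys appear in alphabetical order. So the plan is:
1. Find the values of these parameters and assemble them into t.
2. Build p from t.
3. Form f = t.get("method") + p + r.
4. Take the upper-case hex MD5 of f to get sign.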
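You can check the hypothesis about p offline before writing the request code:
1. Take the parameter values visible at the breakpoint, including its timestamp `2019-9-24 16:51:10`.
2. Concatenate key + value in sorted key order.
3. Compare the result with the captured string.

`build_p` and `make_sign` are illustrative names. Upper-casing the digest follows the get_sign code in this article; the JS itself does not confirm it.

```python
from hashlib import md5

R = "6C57AB91A1308E26B797F4CD382AC79D"  # r from the breakpoint; assumed constant

def build_p(t: dict) -> str:
    # key1 + value1 + key2 + value2 ... in alphabetical key order
    return "".join(key + t[key] for key in sorted(t))

def make_sign(t: dict) -> str:
    # f = method + p + r, then upper-case hex MD5
    f = t["method"] + build_p(t) + R
    return md5(f.encode("utf-8")).hexdigest().upper()

t = {
    "method": "ddsy.product.query.orgcode.product.list.b2c",
    "orderTypeId": "0", "orgcode": "010101,010104",
    "pageNo": "1", "pageSize": "100",
    "plat": "H5", "platform": "H5", "shopId": "-1",
    "t": "2019-9-24 16:51:10", "v": "1.0", "versionName": "3.2.0",
}
captured_p = ("methodddsy.product.query.orgcode.product.list.b2corderTypeId0"
              "orgcode010101,010104pageNo1pageSize100platH5platformH5shopId-1"
              "t2019-9-24 16:51:10v1.0versionName3.2.0")
print(build_p(t) == captured_p)  # → True
print(make_sign(t))
```

Python's default string sort is by code point, so `orderTypeId` lands before `orgcode` and `pageNo` before `pageSize`, exactly as in the captured p.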
R = '6C57AB91A1308E26B797F4CD382AC79D'  # r: stayed the same across every breakpoint run
API_URL = 'http://product.ddky.com/product/queryOrgcodeProductListForB2C.htm'
def md5value(s):
    # Hex MD5 digest of a UTF-8 string
    return md5(s.encode('utf-8')).hexdigest()
def build_params(orgcode):
    # Assemble t: loadProduct's parameters plus the common fields the JS adds.
    # The browser sent e.g. '2019-9-24 16:51:10'; the zero-padded form used here
    # is assumed to be accepted, since the same string is both signed and sent.
    str_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
    return {
        'method': 'ddsy.product.query.orgcode.product.list.b2c',
        'orderTypeId': '0',
        'orgcode': orgcode,
        'pageNo': '1',
        'pageSize': '100',
        'plat': 'H5',
        'platform': 'H5',
        'shopId': '-1',
        't': str_time,
        'v': '1.0',
        'versionName': '3.2.0',
    }
def get_sign(t):
    # p = key + value for every parameter, in alphabetical key order
    p = ''.join(key + t[key] for key in sorted(t))
    # f = method + p + r; sign is the upper-case hex MD5 of f
    f = t['method'] + p + R
    return md5value(f).upper()
# Sign exactly the parameters that are sent, then attach the sign
params = build_params('010101,010104')
params['sign'] = get_sign(params)
text = json.loads(requests.get(API_URL, params=params).text)
medicines = text['data']['productList']
for medicine in medicines:
    print(medicine['name'], medicine['productSpecifications'])
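The page's own callback reads res.data.productList only when res.code == 0; otherwise it shows res.msg. Below is a defensive parsing helper that does the same, where:
- `extract_products` is a hypothetical name.
- The sample payload is made up for illustration and shaped only after the fields used in this article.
- Comparing `str(code)` to `"0"` is meant to mimic JS loose equality in case the server sends the code as a string.

```python
import json

def extract_products(body: str):
    # Parse the JSON body and return (name, specification) pairs
    res = json.loads(body)
    if str(res.get("code")) != "0":
        # Same check as the page's JS: a non-zero code carries an error in msg
        raise RuntimeError(res.get("msg", "unknown error"))
    products = (res.get("data") or {}).get("productList") or []
    return [(p.get("name"), p.get("productSpecifications")) for p in products]

sample = '{"code": 0, "data": {"productList": [{"name": "Demo", "productSpecifications": "10 tablets"}]}}'
for name, spec in extract_products(sample):
    print(name, spec)  # Demo 10 tablets
```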
Finally, render the handout:
doc.show()