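Preface
This case is of moderate difficulty. It requires reverse engineering two encrypted parameters, which can be fairly troublesome for beginners. I hope that walking through my analysis helps you understand it quickly.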
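Analysis
First, find the comment data endpoint and check what anti-scraping measures it uses.
Two of the request parameters are encrypted: params and encSecKey. params is, as the name suggests, the set of parameters submitted with the request, so we do a global search for encSecKey instead.
The code below appears to be where encSecKey is generated. It is taken from the value bKB8t, so next we look at how bKB8t is produced.
The JS code that produces it: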
var bKB8t = window.asrsea(JSON.stringify(i7b), buU5Z(["流泪", "强"]), buU5Z(Rg1x.md), buU5Z(["爱心", "女孩", "惊恐", "大笑"]));
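JSON.stringify(i7b) is presumably our request parameters. The other three arguments always evaluate to the same values, so when completing the code we can simply hardcode them: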
var b = "010001";
var c = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7";
var d = "0CoJUm6Qyw8W8jud";
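The names of the helpers that have to be extracted later (RSAKeyPair, encryptedString) suggest that b and c are an RSA public exponent and modulus, and that d is a fixed symmetric key. That is an inference from the names, not something the page states. A quick Python sketch for inspecting the hardcoded values:

```python
# Hardcoded values copied from the JS above.
b = "010001"
c = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
d = "0CoJUm6Qyw8W8jud"

# Both b and c are hex strings, so parse them with base 16.
exponent = int(b, 16)
modulus = int(c, 16)

print(exponent)              # 65537, the most common RSA public exponent
print(modulus.bit_length())  # size of the (assumed) modulus in bits
print(len(d))                # 16 characters
```

If the inference holds, none of these values depends on the request, which is consistent with hardcoding them.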
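Next, locate the window.asrsea method.
Extract the whole method, then fill in whatever turns out to be missing.
You also need to extract the following yourself: CryptoJS, setMaxDigits, encryptedString, and RSAKeyPair. None of them is defined inside the extracted method, so they must all be filled in or the script will not run. Extracting them is tedious and hard to explain with text and screenshots, so I won't go through it in detail.
Code walkthrough
Use the execjs library to run the JS and obtain the encrypted parameters (the JS file used here has already been completed).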
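import json
import execjs
import requests

def encrypt(params):
    # Load the completed JS file and compile it with execjs
    with open("music163.js", "r", encoding="utf-8") as f:
        ctx = execjs.compile(f.read())
    # Serialize the request parameters to a JSON string, call the JS encrypt(),
    # and parse the JSON string it returns
    result = json.loads(ctx.call("encrypt", json.dumps(params)))
    return result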
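Because the result is later passed straight to requests.post as form data, it presumably is a dict carrying the two encrypted fields, params and encSecKey. Assuming that shape (the JS file is not shown here, so this is a guess), a small sanity check can catch an incomplete JS port before any request is sent. check_payload is a hypothetical helper name, not part of the original code:

```python
def check_payload(payload):
    """Return the payload if both expected fields are non-empty strings, else raise."""
    for key in ("params", "encSecKey"):
        value = payload.get(key)
        if not isinstance(value, str) or not value:
            raise ValueError(f"missing or empty field: {key}")
    return payload


# Placeholder values only; real ones would come from encrypt().
sample = {"params": "PLACEHOLDER_PARAMS", "encSecKey": "PLACEHOLDER_KEY"}
print(check_payload(sample) is sample)  # True

try:
    check_payload({"params": "PLACEHOLDER_PARAMS"})
except ValueError as err:
    print(err)  # missing or empty field: encSecKey
```

Wrapping the call as check_payload(encrypt(params)) would fail early with a clear message instead of sending a request the server is likely to reject.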
Comment API (Base64-encoded):
aHR0cHM6Ly9tdXNpYy4xNjMuY29tL3dlYXBpL3YxL3Jlc291cmNlL2NvbW1lbnRzL1JfU09fNF8xMzg0MDI2ODg5P2NzcmZfdG9rZW49
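The string above is plain Base64, so it can be decoded with the Python standard library. A minimal sketch:

```python
import base64

encoded = "aHR0cHM6Ly9tdXNpYy4xNjMuY29tL3dlYXBpL3YxL3Jlc291cmNlL2NvbW1lbnRzL1JfU09fNF8xMzg0MDI2ODg5P2NzcmZfdG9rZW49"

# b64decode returns bytes; the URL itself is ASCII, so UTF-8 decoding is safe.
url = base64.b64decode(encoded).decode("utf-8")
print(url)
# https://music.163.com/weapi/v1/resource/comments/R_SO_4_1384026889?csrf_token=
```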
Requesting the comment endpoint
def get_page(page):
    params = {
        "csrf_token": "",
        "limit": "20",  # comments per page
        "offset": str(page * 20),  # pagination offset, grows by 20 per page
        "rid": "R_SO_4_1384026889",  # "R_SO_4_" plus the song id
        # "true" only for the first page, "false" afterwards
        # (the original used an undefined name i here; it must be page)
        "total": "true" if page == 0 else "false",
    }
    url = "https://music.163.com/weapi/v1/resource/comments/R_SO_4_1384026889?csrf_token="
    try:
        response = requests.post(url, headers=headers, data=encrypt(params))
        if response.status_code == 200:
            return response.json()
        else:
            print("Request failed:", response.status_code)
    except Exception as e:
        print("ERROR:", e)
    return None
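The pagination logic is easier to see once the parameter dict is separated from the request. The sketch below uses a hypothetical helper, build_params, which is not in the original code. It only assumes what the comments above state: the offset grows by the page size, and total is "true" only on the first page.

```python
def build_params(song_id, page, page_size=20):
    """Build the plaintext parameter dict for one page of comments."""
    return {
        "csrf_token": "",
        "limit": str(page_size),
        "offset": str(page * page_size),
        "rid": "R_SO_4_" + str(song_id),
        "total": "true" if page == 0 else "false",
    }


for page in range(3):
    p = build_params("1384026889", page)
    print(p["offset"], p["total"])
# 0 true
# 20 false
# 40 false
```

Taking the song id as an argument also keeps it out of the function body, so switching songs would not require editing the dict.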
Extracting the comment data
def parse(response):
    # Each comment carries its text and the commenter's nickname
    comments = response["comments"]
    for comment in comments:
        content = comment["content"]
        nickname = comment["user"]["nickname"]
        print(nickname + ": " + content + "\n")
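The parsing step can be tried without touching the network by feeding it a hand-written dict. The sample below is made up and contains only the keys that the parsing code reads (comments, content, user, nickname); the real response very likely has more fields. collect_comments is a hypothetical variant that returns the pairs instead of printing them and tolerates a failed request:

```python
def collect_comments(response):
    """Return (nickname, content) pairs, or an empty list if the response is unusable."""
    if not response:
        return []
    pairs = []
    for comment in response.get("comments", []):
        nickname = comment.get("user", {}).get("nickname", "")
        content = comment.get("content", "")
        pairs.append((nickname, content))
    return pairs


# Made-up sample data, not a real API response.
sample = {"comments": [{"content": "nice song", "user": {"nickname": "alice"}}]}
print(collect_comments(sample))  # [('alice', 'nice song')]
print(collect_comments(None))    # []
```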
Results
Complete code
import json
import execjs
import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
}

def encrypt(params):
    # Load the completed JS file and compile it with execjs
    with open("music163.js", "r", encoding="utf-8") as f:
        ctx = execjs.compile(f.read())
    # Serialize the request parameters to a JSON string, call the JS encrypt(),
    # and parse the JSON string it returns
    result = json.loads(ctx.call("encrypt", json.dumps(params)))
    return result

def get_page(page):
    params = {
        "csrf_token": "",
        "limit": "20",  # comments per page
        "offset": str(page * 20),  # grows by 20 per page
        "rid": "R_SO_4_1303289043",  # "R_SO_4_" plus the song id
        # "true" only for the first page, "false" afterwards
        # (uses page; the original relied on the global loop variable i)
        "total": "true" if page == 0 else "false",
    }
    url = "https://music.163.com/weapi/v1/resource/comments/R_SO_4_1303289043?csrf_token="
    try:
        response = requests.post(url, headers=headers, data=encrypt(params))
        if response.status_code == 200:
            return response.json()
        else:
            print("Request failed:", response.status_code)
            return None
    except Exception as e:
        print("ERROR:", e)
        return None

def parse_page(response):
    # Skip pages whose request failed (get_page returned None)
    if response:
        comments = response["comments"]
        for comment in comments:
            content = comment["content"]
            nickname = comment["user"]["nickname"]
            print(nickname + ": " + content + "\n")

if __name__ == "__main__":
    for i in range(3):  # number of pages to fetch
        html = get_page(i)
        parse_page(html)
I won't share the JS code here. If you need it, send me a private message.