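Playlists
1. Getting the song titles and artists
I originally wanted to reverse-engineer the playlist search, but Kuwo's implementation turned out to be too much of a mess,
so let's change strategy: we will scrape the songs of a recommended playlist instead. Start by opening one playlist.
A search in the Network panel shows that the page is static, so the next step is to extract the song titles, the artists, and their links. There are many ways to do this; I recommend XPath here. The page is highly structured and nested layer upon layer, so the XPath is not easy to write by hand and is better copied from the browser. I suggest copying two of them and then comparing the differences.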
//*[@id="__layout"]/div/div[2]/div/div[1]/div[2]/div[1]/div[2]/div[1]/ul/li[1]/div[2]/a
//*[@id="__layout"]/div/div[2]/div/div[1]/div[2]/div[1]/div[2]/div[1]/ul/li[2]/div[2]/a
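I copied two XPaths. The leading parts are identical; they differ only in the index after ul/li. What we want is the content of the a tag in every li, so we modify the XPath by dropping the index: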
//*[@id="__layout"]/div/div[2]/div/div[1]/div[2]/div[1]/div[2]/div[1]/ul//li/div[2]/a
This now matches every entry. To extract an attribute, append /@attribute-name after /a.
title:'//*[@id="__layout"]/div/div[2]/div/div[1]/div[2]/div[1]/div[2]/div[1]/ul//li/div[2]/a/@title'
url:'//*[@id="__layout"]/div/div[2]/div/div[1]/div[2]/div[1]/div[2]/div[1]/ul//li/div[2]/a/@href'
author:'//*[@id="__layout"]/div/div[2]/div/div[1]/div[2]/div[1]/div[2]/div[1]/ul/li//div[3]/span/text()'
The artist is obtained in the same way as above, so I won't repeat the steps.
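As a rough illustration of the idea, the sketch below runs the same kind of index-free path against a tiny hand-written snippet. The markup is made up for the example and only mimics the ul/li/div layout; it uses the standard-library xml.etree.ElementTree (which supports only a subset of XPath) instead of parsel, so that it runs without extra packages.

```python
import xml.etree.ElementTree as ET

# Made-up markup that imitates the playlist layout: each li holds
# an index div, a div with the song link, and a div with the artist.
SAMPLE = """
<div id="__layout">
  <ul>
    <li>
      <div>1</div>
      <div><a title="Song A" href="/play_detail/111">Song A</a></div>
      <div><span>Artist A</span></div>
    </li>
    <li>
      <div>2</div>
      <div><a title="Song B" href="/play_detail/222">Song B</a></div>
      <div><span>Artist B</span></div>
    </li>
  </ul>
</div>
"""

root = ET.fromstring(SAMPLE)

rows = []
# No index on li, so every list item is matched.
for li in root.findall(".//ul/li"):
    link = li.find("div[2]/a")        # second div holds the song link
    artist = li.find("div[3]/span")   # third div holds the artist
    rows.append((link.get("title"), link.get("href"), artist.text))

for row in rows:
    print(row)
```

Walking one li at a time keeps each title, link, and artist together, which is a little more robust than zipping three separately extracted lists if one entry happens to lack a field.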
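2. Getting the song download links
First pick a song and play it.
I clicked a random one, cleared the Network panel, and then clicked the Play Now button. The request carrying the song's MP3 link shows up. Click a few more songs and compare the payloads.
mid is the number in the link above, the other parameters are fixed, and reqId is an encrypted parameter that requires JS reverse engineering.
Do a global search and open the first JS file found.
Search for reqId, set a breakpoint on the first match, and refresh. Execution does not pause, so this is not the right place; keep looking further down.
At the next match, set a breakpoint and refresh. This time execution pauses.
n is clearly what we want, so keep analyzing n. n = c()(), so scroll further up and look for the definition of c.
Scrolling up reveals the definition of c. This is clearly webpack. If you don't know what webpack is, look it up yourself; I am focusing on the reversing here and won't cover anything else.
Set a new breakpoint on c, resume execution, and refresh.
Select n.n and click the blue link. That jumps into the webpack loader; copy the whole JS file.
After copying it, declare window = global, then define a variable of your own with any name; we need it later to call into the modules. The code is full of d.something, which tells us that the entry function is d, so at the end of the loader we assign d to the variable we defined.
Go back to the previous JS file and copy out these two variable definitions.
As analyzed earlier, reqId equals c()(), so we assign it directly, change n to the variable we defined, and then try to print reqId.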
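It throws an error, which means a module is missing. Go to the loader's entry function to see which module id is being requested.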
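It tells us that module 109 is missing.
Type n(109) directly in the console to find it; clicking the printed output jumps to its definition.
Once it is located, collapse the code and copy it out.
The webpack modules here are stored in a list; we change [] to {} so that they become a dictionary keyed by module id.
Paste it in and run again. It now reports that modules 204 and 205 are missing; find them the same way.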
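For readers who have not met webpack before, the toy model below may help explain why the missing modules surface one at a time. It is only an analogy written in Python, not the real loader: make_require and the three module functions are hypothetical names, and the "modules" return fixed bytes rather than anything Kuwo-specific.

```python
def make_require(modules):
    """Return a require(module_id) function over a dict of module factories."""
    cache = {}

    def require(module_id):
        if module_id in cache:
            return cache[module_id]["exports"]
        if module_id not in modules:
            # This is the situation the tutorial hits with 109, 204 and 205.
            raise KeyError(f"missing module {module_id}")
        module = {"exports": None}
        modules[module_id](module, require)  # run the factory once
        cache[module_id] = module
        return module["exports"]

    return require


def module_109(module, require):
    # Dependencies are resolved as soon as the factory runs.
    random_bytes = require("204")
    to_hex = require("205")
    module["exports"] = lambda: to_hex(random_bytes())


def module_204(module, require):
    module["exports"] = lambda: [0xAB, 0xCD]


def module_205(module, require):
    module["exports"] = lambda data: "".join(f"{b:02x}" for b in data)


# Start with only the module we know about, as in the tutorial.
modules = {"109": module_109}
require = make_require(modules)

try:
    require("109")
except KeyError as exc:
    missing = exc.args[0]
print(missing)

# Add the dependencies that were reported missing and try again.
modules.update({"204": module_204, "205": module_205})
result = require("109")()
print(result)
```

Keying the registry by id is also why the list has to become a dictionary once only a few modules are copied: the ids 109, 204 and 205 no longer correspond to positions in a list.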
Run it once more and reqId prints normally. Write an entry function so that Python can call it easily.
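One observation that may save some work: the copied module 109 contains the error message "uuid.v1(): Can't create more than 10M uuids/sec", and modules 204 and 205 look like the random-bytes and bytes-to-hex helpers of a UUID library, so reqId appears to be a version-1 UUID. Under that assumption, a value of the same format can be produced in Python without running the JS at all. make_req_id is a hypothetical helper name, and whether the server accepts such a value has not been verified here.

```python
import secrets
import uuid


def make_req_id() -> str:
    """Return a version-1 UUID string, the format module 109 appears to produce."""
    # Like the JS code, use a random 48-bit node with the multicast bit set
    # rather than the machine's MAC address.
    node = secrets.randbits(48) | (1 << 40)
    clock_seq = secrets.randbits(14)
    return str(uuid.uuid1(node=node, clock_seq=clock_seq))


req_id = make_req_id()
print(req_id)
```

If this turns out not to work, calling the extracted JS through execjs, as the source below does, remains the approach the article actually tested.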
Write a function that extracts mid and assembles api_url.
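A minimal sketch of that step, assuming the parameter list seen in the source at the end of this article. build_api_url is a hypothetical name, and the mid and reqId values in the usage line are placeholders.

```python
import re
from urllib.parse import urlencode

API_BASE = "https://bd.kuwo.cn/api/v1/www/music/playUrl"


def build_api_url(detail_url: str, req_id: str) -> str:
    """Build the playUrl request from a /play_detail/<mid> link."""
    match = re.search(r"/play_detail/(\d+)", detail_url)
    if match is None:
        raise ValueError(f"no mid found in {detail_url!r}")
    params = {
        "mid": match.group(1),   # numeric id taken from the link
        "type": "music",
        "httpsStatus": 1,
        "reqId": req_id,         # value produced by the extracted JS
        "plat": "web_www",
        "from": "",
    }
    return f"{API_BASE}?{urlencode(params)}"


example = build_api_url("/play_detail/123456", "placeholder-req-id")
print(example)
```

Using urlencode rather than an f-string keeps the query valid if a value ever contains characters that need escaping, and the explicit check gives a clearer error than indexing into an empty findall result.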
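With mid extracted and reqId obtained, try sending the request.
The request fails, so try adding request headers. I usually leave them out on the first attempt, because some sites do not check them.
After some debugging, the headers need two fields (Secret and User-Agent in the source below); also copy in your own cookies.
Now the request succeeds. A single song works, so next we use multithreading to fetch the whole list.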
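The source below reads res['data']['url'] directly, which raises an exception inside the worker thread whenever the server rejects a request. A small defensive helper could look like the following sketch. extract_audio_url is a hypothetical name, and the shape of a failed response is an assumption; only the successful data.url shape comes from the article's code.

```python
def extract_audio_url(payload):
    """Return payload['data']['url'] if present and non-empty, else None."""
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, dict):
        return None
    url = data.get("url")
    return url if isinstance(url, str) and url else None


# Placeholder payloads: one successful, one rejected (assumed shape).
ok = extract_audio_url({"code": 200, "data": {"url": "https://example.com/a.mp3"}})
bad = extract_audio_url({"code": -1, "msg": "fail"})
print(ok, bad)
```

Returning None lets the caller skip a song and continue with the rest of the playlist instead of losing the thread to an unhandled KeyError.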
That is basic material, so I won't explain it here. After that, each song can be downloaded separately.
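For readers who do want a starting point, here is a small sketch using a thread pool from the standard library instead of one thread per song. download_one is a placeholder worker and the song tuples are invented; a real worker would request the play URL and save the file.

```python
from concurrent.futures import ThreadPoolExecutor


def download_one(song):
    """Placeholder worker; a real one would fetch and save the MP3."""
    url, title, author = song
    return f"{title} - {author}"


songs = [
    ("/play_detail/111", "Song A", "Artist A"),
    ("/play_detail/222", "Song B", "Artist B"),
    ("/play_detail/333", "Song C", "Artist C"),
]

# max_workers caps how many songs are processed at the same time.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(download_one, songs))

print(results)
```

A pool limits the number of simultaneous requests, which is gentler on the server than starting a thread for every entry of a long playlist, and pool.map returns the results in the same order as the input.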
Write the download code.
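One detail worth handling when saving: song titles can contain characters such as / or ? that are not valid in file names, and the output folder may not exist yet. The sketch below shows one possible way to deal with both. safe_filename and save_audio are hypothetical helpers, and data/kuwo is simply the folder used in the source below.

```python
import re
from pathlib import Path


def safe_filename(title: str, suffix: str = ".mp3") -> str:
    """Replace characters that are invalid in file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return (cleaned or "untitled") + suffix


def save_audio(audio: bytes, title: str, out_dir: str = "data/kuwo") -> Path:
    """Write the audio bytes to out_dir, creating the folder if needed."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / safe_filename(title)
    path.write_bytes(audio)
    return path


print(safe_filename("AC/DC: Live?"))
```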
Change the parameter of get_info so that it can scrape any playlist URL.
Source code:
import re
import threading
import execjs
import requests
import parsel
cookies = {
'_ga': 'GA1.2.1458867714.1695815628',
'_gid': 'GA1.2.1093541141.1696499426',
'Hm_lvt_cdb524f42f0ce19b169a8071123a4797': '1695815628,1696499426,1696509109',
'Hm_lpvt_cdb524f42f0ce19b169a8071123a4797': '1696513188',
'_ga_ETPBRPM9ML': 'GS1.2.1696509109.4.1.1696513188.60.0.0',
'Hm_Iuvt_cdb524f42f0cer9b268e4v7y734w5esq24': 'JBE4ymzMGyfDAJ4fmQTZyNiXBRXDm2ZG',
}
headers = {
'Secret': '6afb5aff80965e273c89b414a8d33bcd77b068454e291814811757d747c4322804faecff',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36',
}
def get_info(url):
    # Fetch the playlist page; the song list is already in the static HTML.
    res = requests.get(url=url).text
    select = parsel.Selector(res)
    urls = select.xpath(
        '//*[@id="__layout"]/div/div[2]/div/div[1]/div[2]/div[1]/div[2]/div[1]/ul//li/div[2]/a/@href').getall()
    titles = select.xpath(
        '//*[@id="__layout"]/div/div[2]/div/div[1]/div[2]/div[1]/div[2]/div[1]/ul//li/div[2]/a/@title').getall()
    authors = select.xpath(
        '//*[@id="__layout"]/div/div[2]/div/div[1]/div[2]/div[1]/div[2]/div[1]/ul/li//div[3]/span/text()').getall()
    # Start one download thread per song.
    for i, j, k in zip(urls, titles, authors):
        t = threading.Thread(args=(i, j, k), target=get_mp3)
        t.start()


def get_mp3(url, title, author):
    # mid is the number at the end of the /play_detail/ link.
    mid = re.findall("/play_detail/(.*)", url)[0]
    # kuwo.js is the extracted webpack code listed below.
    with open("kuwo.js", 'r') as f:
        js = f.read()
    js = execjs.compile(js)
    req_id = js.call("start")
    api_url = f"https://bd.kuwo.cn/api/v1/www/music/playUrl?mid={mid}&type=music&httpsStatus=1&reqId={req_id}&plat=web_www&from="
    res = requests.get(url=api_url, headers=headers, cookies=cookies).json()
    audio_url = res['data']['url']
    audio = requests.get(url=audio_url).content
    # The data/kuwo directory must already exist.
    with open(f"data/kuwo/{title}.mp3", "wb") as f:
        f.write(audio)
    print(title + " downloaded successfully!")


if __name__ == "__main__":
    # Paste the playlist URL when prompted.
    url = input()
    get_info(url)
js:
window=global
var nuo
!function(e) {
function n(data) {
for (var n, t, d = data[0], l = data[1], f = data[2], i = 0, m = []; i < d.length; i++)
t = d[i],
Object.prototype.hasOwnProperty.call(o, t) && o[t] && m.push(o[t][0]),
o[t] = 0;
for (n in l)
Object.prototype.hasOwnProperty.call(l, n) && (e[n] = l[n]);
for (h && h(data); m.length; )
m.shift()();
return c.push.apply(c, f || []),
r()
}
function r() {
for (var e, i = 0; i < c.length; i++) {
for (var n = c[i], r = !0, t = 1; t < n.length; t++) {
var l = n[t];
0 !== o[l] && (r = !1)
}
r && (c.splice(i--, 1),
e = d(d.s = n[0]))
}
return e
}
var t = {}
, o = {
32: 0
}
, c = [];
function d(n) {
console.log(n)
if (t[n])
return t[n].exports;
var r = t[n] = {
i: n,
l: !1,
exports: {}
};
return e[n].call(r.exports, r, r.exports, d),
r.l = !0,
r.exports
}
d.e = function(e) {
var n = []
, r = o[e];
if (0 !== r)
if (r)
n.push(r[2]);
else {
var t = new Promise((function(n, t) {
r = o[e] = [n, t]
}
));
n.push(r[2] = t);
var c, script = document.createElement("script");
script.charset = "utf-8",
script.timeout = 120,
d.nc && script.setAttribute("nonce", d.nc),
script.src = function(e) {
return d.p + "" + ({
0: "commons/5b7f9e1d",
1: "vendors/f2d66b02",
2: "vendors/0f68e262",
5: "pages/album_detail/_index",
6: "pages/blackshark/index",
7: "pages/callback",
8: "pages/down/index",
9: "pages/downtingshu/index",
10: "pages/index",
11: "pages/logout/index",
12: "pages/musician/index",
13: "pages/musician/page",
14: "pages/mvplay/_index",
15: "pages/mvs/index",
16: "pages/play_detail/_index",
17: "pages/playlist_detail/_index",
18: "pages/playlists/index",
19: "pages/rankList/index",
20: "pages/search",
21: "pages/search/album",
22: "pages/search/list",
23: "pages/search/mv",
24: "pages/search/playlist",
25: "pages/search/singers",
26: "pages/singer_detail/_index",
27: "pages/singer_detail/index/album",
28: "pages/singer_detail/index/index",
29: "pages/singer_detail/index/info",
30: "pages/singer_detail/index/mv",
31: "pages/singers/index"
}[e] || e) + "." + {
0: "7f2c0cc",
1: "587cb3e",
2: "4deec49",
5: "504c7f9",
6: "c52f389",
7: "ffdfc35",
8: "b4aa1c8",
9: "d621113",
10: "c3974e6",
11: "28722e8",
12: "d5af07f",
13: "5215e8f",
14: "f8ebf2a",
15: "1d8d1c3",
16: "21b1a5f",
17: "64090ae",
18: "0407eb0",
19: "a5a4c3a",
20: "df68ca7",
21: "87fa638",
22: "8e8f3e5",
23: "d5141ec",
24: "db31910",
25: "3b69ddc",
26: "5ca0d98",
27: "362866f",
28: "d44680b",
29: "c53e7d8",
30: "6c9f4d7",
31: "1cd60fe"
}[e] + ".js"
}(e);
var l = new Error;
c = function(n) {
script.onerror = script.onload = null,
clearTimeout(f);
var r = o[e];
if (0 !== r) {
if (r) {
var t = n && ("load" === n.type ? "missing" : n.type)
, c = n && n.target && n.target.src;
l.message = "Loading chunk " + e + " failed.\n(" + t + ": " + c + ")",
l.name = "ChunkLoadError",
l.type = t,
l.request = c,
r[1](l)
}
o[e] = void 0
}
}
;
var f = setTimeout((function() {
c({
type: "timeout",
target: script
})
}
), 12e4);
script.onerror = script.onload = c,
document.head.appendChild(script)
}
return Promise.all(n)
}
,
d.m = e,
d.c = t,
d.d = function(e, n, r) {
d.o(e, n) || Object.defineProperty(e, n, {
enumerable: !0,
get: r
})
}
,
d.r = function(e) {
"undefined" != typeof Symbol && Symbol.toStringTag && Object.defineProperty(e, Symbol.toStringTag, {
value: "Module"
}),
Object.defineProperty(e, "__esModule", {
value: !0
})
}
,
d.t = function(e, n) {
if (1 & n && (e = d(e)),
8 & n)
return e;
if (4 & n && "object" == typeof e && e && e.__esModule)
return e;
var r = Object.create(null);
if (d.r(r),
Object.defineProperty(r, "default", {
enumerable: !0,
value: e
}),
2 & n && "string" != typeof e)
for (var t in e)
d.d(r, t, function(n) {
return e[n]
}
.bind(null, t));
return r
}
,
d.n = function(e) {
var n = e && e.__esModule ? function() {
return e.default
}
: function() {
return e
}
;
return d.d(n, "a", n),
n
}
,
d.o = function(object, e) {
return Object.prototype.hasOwnProperty.call(object, e)
}
,
d.p = "https://h5static.kuwo.cn/www/kw-www/",
d.oe = function(e) {
throw console.error(e),
e
}
;
var l = window.webpackJsonp = window.webpackJsonp || []
, f = l.push.bind(l);
l.push = n,
l = l.slice();
for (var i = 0; i < l.length; i++)
n(l[i]);
var h = f;
r()
nuo=d
}({
"109":function(t, e, n) {
var r, o, l = n(204), c = n(205), d = 0, h = 0;
t.exports = function(t, e, n) {
var i = e && n || 0
, b = e || []
, f = (t = t || {}).node || r
, v = void 0 !== t.clockseq ? t.clockseq : o;
if (null == f || null == v) {
var m = l();
null == f && (f = r = [1 | m[0], m[1], m[2], m[3], m[4], m[5]]),
null == v && (v = o = 16383 & (m[6] << 8 | m[7]))
}
var y = void 0 !== t.msecs ? t.msecs : (new Date).getTime()
, w = void 0 !== t.nsecs ? t.nsecs : h + 1
, dt = y - d + (w - h) / 1e4;
if (dt < 0 && void 0 === t.clockseq && (v = v + 1 & 16383),
(dt < 0 || y > d) && void 0 === t.nsecs && (w = 0),
w >= 1e4)
throw new Error("uuid.v1(): Can't create more than 10M uuids/sec");
d = y,
h = w,
o = v;
var x = (1e4 * (268435455 & (y += 122192928e5)) + w) % 4294967296;
b[i++] = x >>> 24 & 255,
b[i++] = x >>> 16 & 255,
b[i++] = x >>> 8 & 255,
b[i++] = 255 & x;
var _ = y / 4294967296 * 1e4 & 268435455;
b[i++] = _ >>> 8 & 255,
b[i++] = 255 & _,
b[i++] = _ >>> 24 & 15 | 16,
b[i++] = _ >>> 16 & 255,
b[i++] = v >>> 8 | 128,
b[i++] = 255 & v;
for (var A = 0; A < 6; ++A)
b[i + A] = f[A];
return e || c(b)
}
},
"204":function(t, e) {
var n = "undefined" != typeof crypto && crypto.getRandomValues && crypto.getRandomValues.bind(crypto) || "undefined" != typeof msCrypto && "function" == typeof window.msCrypto.getRandomValues && msCrypto.getRandomValues.bind(msCrypto);
if (n) {
var r = new Uint8Array(16);
t.exports = function() {
return n(r),
r
}
} else {
var o = new Array(16);
t.exports = function() {
for (var t, i = 0; i < 16; i++)
0 == (3 & i) && (t = 4294967296 * Math.random()),
o[i] = t >>> ((3 & i) << 3) & 255;
return o
}
}
},
"205":function(t, e) {
for (var n = [], i = 0; i < 256; ++i)
n[i] = (i + 256).toString(16).substr(1);
t.exports = function(t, e) {
var i = e || 0
, r = n;
return [r[t[i++]], r[t[i++]], r[t[i++]], r[t[i++]], "-", r[t[i++]], r[t[i++]], "-", r[t[i++]], r[t[i++]], "-", r[t[i++]], r[t[i++]], "-", r[t[i++]], r[t[i++]], r[t[i++]], r[t[i++]], r[t[i++]], r[t[i++]]].join("")
}
}
});
function start(){
var l = nuo(109)
, c = nuo.n(l)
var reqld=c()()
return reqld
}
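Thanks for reading. It is normal to find this a bit hard; that is what scraping is like. But if even writing ordinary scraper code feels like a struggle, build up your fundamentals first and don't rush.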