4-9

Scraping QQ Music

1: Analyzing the search page

How to find the song information

Searching for different singers or song titles shows that only the song list changes!

Looking at the page URL, only the value at the end of the URL changes as the page loads different searches.

Copying the browser URL for two different searches gives:

y.qq.com/portal/sear…

and

y.qq.com/portal/sear…

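The Chinese keyword is presumably encoded before it goes into the URL, so now we know how to build the search-page URL!! Hahaha! This only covers one page of results, though. Changing the page value in the URL changes the songs shown on the page!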
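You can check the "encoded Chinese" guess with Python's standard `urllib.parse`. The sketch below percent-encodes a keyword and decodes a `w` value copied from one of the search URLs in this post. It only reproduces the encoding and never contacts QQ Music.

```python
from urllib.parse import quote, unquote

# Percent-encode a Chinese keyword as UTF-8, the form that appears after w= in the URL
encoded = quote('张国荣')
print(encoded)  # %E5%BC%A0%E5%9B%BD%E8%8D%A3

# Decoding the w= value taken from another search URL gives the keyword back
decoded = unquote('%E6%88%91%E6%9B%BE')
print(decoded)  # 我曾
```

The first result matches the `w=` value in the 张国荣 search URL used below.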

OK, let's scrape it!

import requests
import json
url = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp?ct=24&qqmusic_ver=1298&new_json=1&remoteplace=txt.yqq.center&searchid=47789770466433535&t=0&aggr=1&cr=1&catZhida=1&lossless=0&flag_qc=0&p=1&n=10&w=%E5%BC%A0%E5%9B%BD%E8%8D%A3&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0'
resp = requests.get(url)
resp.encoding='utf-8'
a = json.loads(resp.text)
print(a)

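But, what the heck!

The scraped content doesn't contain any song information!?

So my first guess is that the songs are loaded by JavaScript. That means analyzing the requests the page makes: press F12 in the browser, switch to DevTools, refresh the page, and use Ctrl+F to search for the relevant request!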

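I found a request that returns a JSON file. Expanding it level by level reveals the information we need! So the plan is to request that URL, take the JSON string in the response, and parse it with Python.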

Raw JSON is messy to read, but I found a handy online viewer that formats it: www.bejson.com/jsonviewern…
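If you would rather not paste responses into a website, Python's built-in `json` module can pretty-print them too. This is a minimal sketch. The tiny sample string only imitates the shape of the search response and is not a real reply.

```python
import json

# A tiny sample shaped like the search response (hypothetical, trimmed data)
raw = '{"data": {"song": {"list": [{"id": 3961, "title": "玻璃之情"}]}}}'

# indent adds line breaks; ensure_ascii=False keeps Chinese readable instead of \u escapes
pretty = json.dumps(json.loads(raw), indent=2, ensure_ascii=False)
print(pretty)
```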

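With the viewer, the song information is easy to find. An outline appears on the right side of the site. The songs are under data → song → list, which is a list where each song is a JSON object (a dict). A song's ID and name sit directly under 'id' and 'title'. The singer's name is the 'title' of the first dict in the list named singer. Now we can scrape it!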
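Here is the path just described, walked on a hand-made sample. The sample reuses values from the results later in this post, but its structure is simplified, so treat it as an illustration rather than the exact API response. The `.get(..., default)` calls avoid crashing if a key is missing.

```python
# Simplified sample shaped like the search response (values taken from this post's results)
response = {
    "data": {
        "song": {
            "list": [
                {"id": 105602369, "title": "风继续吹", "singer": [{"title": "张国荣"}]},
                {"id": 471461, "title": "当爱已成往事", "singer": [{"title": "张国荣"}]},
            ]
        }
    }
}

def extract_songs(resp_json):
    """Walk data -> song -> list and keep id, title and the first singer's name."""
    songs = resp_json.get("data", {}).get("song", {}).get("list", [])
    results = []
    for item in songs:
        singers = item.get("singer") or [{}]   # guard against a missing or empty singer list
        results.append({
            "id": item.get("id"),
            "title": item.get("title"),
            "singer": singers[0].get("title"),
        })
    return results

for song in extract_songs(response):
    print(song)
```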

First, parse the JSON string into a Python object with json.loads(). Here's the code:

import requests
import json
url = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp?ct=24&qqmusic_ver=1298&new_json=1&remoteplace=txt.yqq.center&searchid=47789770466433535&t=0&aggr=1&cr=1&catZhida=1&lossless=0&flag_qc=0&p=1&n=10&w=%E5%BC%A0%E5%9B%BD%E8%8D%A3&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0'
resp = requests.get(url)
resp.encoding='utf-8'
a = json.loads(resp.text)
b = a.get('data').get('song').get('list')
#print(b)
songs_list = []
for i in b:
    result = {}
    result['id'] = i.get('id')
    result['title'] = i.get('title')
    result['singer'] = i.get('singer')[0].get('title')
    songs_list.append(result)
    print(result)

Result:

{'id': 105602369, 'title': '风继续吹', 'singer': '张国荣'}
{'id': 471461, 'title': '当爱已成往事', 'singer': '张国荣'}
{'id': 4899362, 'title': '当年情', 'singer': '张国荣'}
{'id': 1375623, 'title': '沉默是金', 'singer': '张国荣'}
{'id': 3961, 'title': '玻璃之情', 'singer': '张国荣'}
{'id': 106731742, 'title': '倩女幽魂', 'singer': '张国荣'}
{'id': 4787727, 'title': '千千阙歌 (90 Live)', 'singer': '张国荣'}
{'id': 1377649, 'title': '风再起时', 'singer': '张国荣'}
{'id': 7132726, 'title': '我 (国语)', 'singer': '张国荣'}
{'id': 163233, 'title': '共同渡过', 'singer': '张国荣'}

Haha, we've found the song information!


Now let's construct the search URL. The URL of the JSON request found earlier is:

https://c.y.qq.com/soso/fcgi-bin/client_search_cp?ct=24&qqmusic_ver=1298&new_json=1&remoteplace=txt.yqq.center&searchid=38127408304238659&t=0&aggr=1&cr=1&catZhida=1&lossless=0&flag_qc=0&p=1&n=10&w=%E5%BC%A0%E5%9B%BD%E8%8D%A3&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0

Compare it with the URL of another search:

https://c.y.qq.com/soso/fcgi-bin/client_search_cp?ct=24&qqmusic_ver=1298&new_json=1&remoteplace=txt.yqq.song&searchid=56386297828639744&t=0&aggr=1&cr=1&catZhida=1&lossless=0&flag_qc=0&p=1&n=10&w=%E6%88%91%E6%9B%BE&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0

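The value after w= is the search keyword! So we just URL-encode the Chinese keyword and pass it in. This time I also add a browser User-Agent header, pretending to be a browser rather than a crawler!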
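Instead of splicing an encoded string into a template, you can let `urllib.parse.urlencode` build the whole query from a dict. This sketch copies the fixed parameters from the URLs above and varies only `w`, `p` (page) and `n` (results per page). `build_search_url` is a hypothetical helper, and I have not checked whether the endpoint still accepts these parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SEARCH_BASE = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'

def build_search_url(keyword, page=1, per_page=10):
    """Hypothetical helper: rebuild the search URL with w, p and n filled in."""
    params = {
        'ct': 24, 'qqmusic_ver': 1298, 'new_json': 1, 'remoteplace': 'txt.yqq.song',
        'searchid': '64768420417553403', 't': 0, 'aggr': 1, 'cr': 1, 'catZhida': 1,
        'lossless': 0, 'flag_qc': 0,
        'p': page,          # page number
        'n': per_page,      # number of results per page
        'w': keyword,       # search keyword, percent-encoded by urlencode
        'g_tk': 5381, 'loginUin': 0, 'hostUin': 0, 'format': 'json',
        'inCharset': 'utf8', 'outCharset': 'utf-8', 'notice': 0,
        'platform': 'yqq.json', 'needNewCode': 0,
    }
    return SEARCH_BASE + '?' + urlencode(params)

url = build_search_url('张国荣')
print(url)
```

The result could then be fetched with `requests.get(url, headers=headers)`, as in the code below.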

import requests
from urllib import parse
import json
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'}
# Adjacent string literals inside parentheses are joined without a stray space;
# the {} placeholder receives the encoded keyword (w=...)
url = ('https://c.y.qq.com/soso/fcgi-bin/client_search_cp?ct=24&qqmusic_ver=1298&new_json=1&remoteplace=txt.yqq.song&searchid=64768420417553403&t=0&aggr=1&cr=1&catZhida=1&lossless=0&flag_qc=0&'
       'p=1&n=10&{}&g_tk=1531112714&loginUin=3237707674&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0')

word = '陈奕迅'
params = {'w': word}                 # named params to avoid shadowing the built-in dict
url_data = parse.urlencode(params)   # URL-encode the keyword
resp = requests.get(url.format(url_data), headers=headers)
resp.encoding='utf-8'
a = json.loads(resp.text)
b = a.get('data').get('song').get('list')
#print(b)
songs_list = []
for i in b:
    result = {}
    result['id'] = i.get('id')
    result['title'] = i.get('title')
    result['singer'] = i.get('singer')[0].get('title')
    songs_list.append(result)
    print(result)

Result:

{'id': 1313990, 'title': '红玫瑰', 'singer': '陈奕迅'}
{'id': 1249550, 'title': '富士山下', 'singer': '陈奕迅'}
{'id': 4830342, 'title': '十年', 'singer': '陈奕迅'}
{'id': 1313993, 'title': '好久不见', 'singer': '陈奕迅'}
{'id': 9059607, 'title': '不要说话', 'singer': '陈奕迅'}
{'id': 1313988, 'title': '淘汰', 'singer': '陈奕迅'}
{'id': 4907894, 'title': '单车', 'singer': '陈奕迅'}
{'id': 1251166, 'title': '浮夸', 'singer': '陈奕迅'}
{'id': 1313992, 'title': '爱情转移', 'singer': '陈奕迅'}
{'id': 4907901, 'title': 'K歌之王 (粤语)', 'singer': '陈奕迅'}

Success! Song search finally works! I also noticed that the number of songs returned is controlled by n in the URL: changing n=10 to n=100 returns a file with 100 songs.
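Besides raising `n`, you can also step through pages with `p`. Here is a minimal sketch. It assumes, without having checked, that a page past the last result comes back with an empty song list. The real request is hidden behind a hypothetical `fetch_page(p)` callable, so the loop can be tried offline with fake data.

```python
def collect_pages(fetch_page, max_pages=5):
    """Call fetch_page(p) for p = 1, 2, ... and stop at the first empty page.

    fetch_page is a hypothetical callable that requests one search page
    (with p set in the URL) and returns that page's song list.
    """
    songs = []
    for p in range(1, max_pages + 1):
        page = fetch_page(p)
        if not page:        # assumed: an empty list means no more results
            break
        songs.extend(page)
    return songs

# Offline stand-in for the real request: three pages of fake data
fake_pages = {1: ['a', 'b'], 2: ['c', 'd'], 3: ['e']}
all_songs = collect_pages(lambda p: fake_pages.get(p, []))
print(all_songs)  # ['a', 'b', 'c', 'd', 'e']
```

The `max_pages` cap keeps the loop from hammering the server if the empty-page assumption turns out to be wrong.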

So how do we find the song file to download?




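Open a song's playback page and look for its audio file. A page's audio files can usually be found under the Media filter in DevTools: press F12, switch to the Network panel, and refresh the page.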

Opening the files one by one shows that only the last one is the song file.

Haha, found the song file too! Now let's study its URL and try to construct song URLs ourselves:

http://183.60.23.15/amobile.music.tc.qq.com/C400003msXea3kjDlz.m4a?guid=3719823069&vkey=FB0C17C7CDDF0BEBF3518AA8019BF295AE625C0DB17D989AAC19EAA1EE91B86200110C50E5874EAE3A0E120DF12F0870306C0943B285BA45&uin=0&fromtag=66

http://14.152.88.149/amobile.music.tc.qq.com/C400004YMXMx3Yo5vE.m4a?guid=3719823069&vkey=A9569680E2B3C6214A462C8EC8A160B83FC6CFEF009E22670721AD0F04EC0749BFC90337215249BA96BC6AB93532ACE75102D66DF73AC493&uin=0&fromtag=66

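Comparing the two URLs, the parts that change are vkey and the C40000+XXXXXXX filename. If we can find these two values, we're set. Let's Ctrl+F for them!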
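To see exactly which parts of such a URL vary, you can split one with the standard library. The sketch below takes the first audio URL above apart into its filename and its `vkey` query parameter. It is purely string handling and makes no request.

```python
import posixpath
from urllib.parse import urlsplit, parse_qs

audio_url = ('http://183.60.23.15/amobile.music.tc.qq.com/C400003msXea3kjDlz.m4a'
             '?guid=3719823069&vkey=FB0C17C7CDDF0BEBF3518AA8019BF295AE625C0DB17D989AAC19EAA1EE91B86200110C50E5874EAE3A0E120DF12F0870306C0943B285BA45'
             '&uin=0&fromtag=66')

parts = urlsplit(audio_url)
filename = posixpath.basename(parts.path)   # the C40000... part of the path
query = parse_qs(parts.query)               # dict of query parameters, each a list
vkey = query['vkey'][0]                     # the long vkey token
print(filename)
print(vkey)
```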

Several requests contain these values. Paste one into the JSON viewer again, and there's our target!

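It's this thing called mid! The search results earlier also had a field called mid. Comparing them (Ctrl+F) shows it's the same value, so we can pick it up directly while scraping the search results!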

Add mid to the scraped information.
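Below, one search-result item is extended with the `mid` field. The item is hypothetical except for its mid, which is the songmid used later in this post. The `expected_filename` helper is also hypothetical. It encodes a pattern observed in this post's audio URLs (songmid `0032TY8H2bEqEP` → `C4000032TY8H2bEqEP.m4a`), not a documented rule.

```python
# One search-result item trimmed to the fields used here (hypothetical values except mid)
item = {'id': 0, 'title': 'example', 'singer': [{'title': 'example singer'}],
        'mid': '0032TY8H2bEqEP'}

result = {
    'id': item.get('id'),
    'title': item.get('title'),
    'singer': item.get('singer')[0].get('title'),
    'mid': item.get('mid'),   # new field: needed later to request the vkey
}

def expected_filename(mid):
    """Hypothetical helper: the 'C400' + mid + '.m4a' pattern seen in this post's audio URLs."""
    return 'C400' + mid + '.m4a'

print(result['mid'], expected_filename(result['mid']))
```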

Next, let's find the vkey!! Ctrl+F again: same method, same recipe, you know the drill!

Found a JSON file! Great, let's parse it right away. Target spotted!
The vkey is under req → data. Let's extract it:

import requests
from urllib import parse
import json
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'}
url = 'https://u.y.qq.com/cgi-bin/musicu.fcg?-=getplaysongvkey2954502924310327&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0&data=%7B%22req%22%3A%7B%22module%22%3A%22CDN.SrfCdnDispatchServer%22%2C%22method%22%3A%22GetCdnDispatch%22%2C%22param%22%3A%7B%22guid%22%3A%223719823069%22%2C%22calltype%22%3A0%2C%22userip%22%3A%22%22%7D%7D%2C%22req_0%22%3A%7B%22module%22%3A%22vkey.GetVkeyServer%22%2C%22method%22%3A%22CgiGetVkey%22%2C%22param%22%3A%7B%22guid%22%3A%223719823069%22%2C%22songmid%22%3A%5B%22000bSg2U4GcrUi%22%5D%2C%22songtype%22%3A%5B0%5D%2C%22uin%22%3A%220%22%2C%22loginflag%22%3A1%2C%22platform%22%3A%2220%22%7D%7D%2C%22comm%22%3A%7B%22uin%22%3A0%2C%22format%22%3A%22json%22%2C%22ct%22%3A24%2C%22cv%22%3A0%7D%7D'
resp = requests.get(url, headers=headers)   # no .format(): this URL has no {} placeholder and url_data is undefined here
resp.encoding='utf-8'
a = json.loads(resp.text)
b = a.get('req').get('data').get('vkey')
print(b)

Output:

C3BDBA226168243D649A67EF479BF6C2F0CA827800422E7590E7F6B3E551DA853E74227E4B6550D9D9F0066124A8F0D3CAFA0499C329D25D

That's the vkey we want!

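...or so I thought. It turned out I had grabbed the wrong one. The vkey we need is in a different dictionary, under req_0 → data → midurlinfo!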

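Under that path there is also a field called purl, which already combines the mid and the vkey into one ready-made path. How thoughtful! So let's just use it directly: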
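The difference between the two lookups can be seen on a hand-made response. In the sample below, only the key layout follows the code in this post (`req` → `data` → `vkey` versus `req_0` → `data` → `midurlinfo`). The values are shortened placeholders, and `get_purl` is a hypothetical helper that returns an empty string instead of crashing when the keys are missing.

```python
# Simplified sample of the musicu.fcg response; key layout from this post, values are placeholders
resp_json = {
    'req': {'data': {'vkey': 'WRONG_ONE'}},   # the vkey that turned out to be the wrong one
    'req_0': {'data': {'midurlinfo': [
        {'vkey': 'ABC123',
         'purl': 'C4000032TY8H2bEqEP.m4a?guid=3719823069&vkey=ABC123&uin=0&fromtag=66'},
    ]}},
}

def get_purl(data):
    """Return the first song's purl from req_0 -> data -> midurlinfo, or '' if absent."""
    infos = data.get('req_0', {}).get('data', {}).get('midurlinfo') or [{}]
    return infos[0].get('purl', '')

print(get_purl(resp_json))
```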

import requests
from urllib import parse
import json
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'}
url ='https://u.y.qq.com/cgi-bin/musicu.fcg?-=getplaysongvkey8774412539618848&g_tk=360481176&loginUin=3237707674&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0&data=%7B%22req%22%3A%7B%22module%22%3A%22CDN.SrfCdnDispatchServer%22%2C%22method%22%3A%22GetCdnDispatch%22%2C%22param%22%3A%7B%22guid%22%3A%223719823069%22%2C%22calltype%22%3A0%2C%22userip%22%3A%22%22%7D%7D%2C%22req_0%22%3A%7B%22module%22%3A%22vkey.GetVkeyServer%22%2C%22method%22%3A%22CgiGetVkey%22%2C%22param%22%3A%7B%22guid%22%3A%223719823069%22%2C%22songmid%22%3A%5B%220032TY8H2bEqEP%22%5D%2C%22songtype%22%3A%5B0%5D%2C%22uin%22%3A%223237707674%22%2C%22loginflag%22%3A1%2C%22platform%22%3A%2220%22%7D%7D%2C%22comm%22%3A%7B%22uin%22%3A3237707674%2C%22format%22%3A%22json%22%2C%22ct%22%3A24%2C%22cv%22%3A0%7D%7D'
resp = requests.get(url, headers=headers)
resp.encoding='utf-8'
a = json.loads(resp.text)
b = a.get('req_0').get('data').get('midurlinfo')[0].get('vkey')
c = a.get('req_0').get('data').get('midurlinfo')[0].get('purl')
print(c)
url_2 = 'http://dl.stream.qqmusic.qq.com/'
print(url_2+c)

Result:

C4000032TY8H2bEqEP.m4a?guid=3719823069&vkey=1C4B824609DF35C9D27A89E0F323F5EA5D3CAA55FA70514F6831E2AAC0B27B8D4D6463DE9E2ED7CB2006BCE7A1A08C38034F4A0838B2EABF&uin=0&fromtag=66
http://dl.stream.qqmusic.qq.com/C4000032TY8H2bEqEP.m4a?guid=3719823069&vkey=1C4B824609DF35C9D27A89E0F323F5EA5D3CAA55FA70514F6831E2AAC0B27B8D4D6463DE9E2ED7CB2006BCE7A1A08C38034F4A0838B2EABF&uin=0&fromtag=66

Opening this URL plays the target audio file.

The next step is to analyze this request and construct the URL that returns purl:

https://u.y.qq.com/cgi-bin/musicu.fcg?-=getplaysongvkey2954502924310327&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0&data=%7B%22req%22%3A%7B%22module%22%3A%22CDN.SrfCdnDispatchServer%22%2C%22method%22%3A%22GetCdnDispatch%22%2C%22param%22%3A%7B%22guid%22%3A%223719823069%22%2C%22calltype%22%3A0%2C%22userip%22%3A%22%22%7D%7D%2C%22req_0%22%3A%7B%22module%22%3A%22vkey.GetVkeyServer%22%2C%22method%22%3A%22CgiGetVkey%22%2C%22param%22%3A%7B%22guid%22%3A%223719823069%22%2C%22songmid%22%3A%5B%22000bSg2U4GcrUi%22%5D%2C%22songtype%22%3A%5B0%5D%2C%22uin%22%3A%220%22%2C%22loginflag%22%3A1%2C%22platform%22%3A%2220%22%7D%7D%2C%22comm%22%3A%7B%22uin%22%3A0%2C%22format%22%3A%22json%22%2C%22ct%22%3A24%2C%22cv%22%3A0%7D%7D
# Its parameters:
-: getplaysongvkey2954502924310327
g_tk: 5381
loginUin: 0
hostUin: 0
format: json
inCharset: utf8
outCharset: utf-8
notice: 0
platform: yqq.json
needNewCode: 0
data: {"req":{"module":"CDN.SrfCdnDispatchServer","method":"GetCdnDispatch","param":{"guid":"3719823069","calltype":0,"userip":""}},"req_0":{"module":"vkey.GetVkeyServer","method":"CgiGetVkey","param":{"guid":"3719823069","songmid":["000bSg2U4GcrUi"],"songtype":[0],"uin":"0","loginflag":1,"platform":"20"}},"comm":{"uin":0,"format":"json","ct":24,"cv":0}}
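The readable `data` value in the parameter list can be recovered from the URL yourself. It is just percent-encoded JSON, so `unquote` followed by `json.loads` decodes it. The string below is the `data=` part of the URL above, split into adjacent literals for readability.

```python
import json
from urllib.parse import unquote

# The data= value from the URL above, still percent-encoded
encoded_data = (
    '%7B%22req%22%3A%7B%22module%22%3A%22CDN.SrfCdnDispatchServer%22%2C%22method%22%3A%22GetCdnDispatch%22%2C'
    '%22param%22%3A%7B%22guid%22%3A%223719823069%22%2C%22calltype%22%3A0%2C%22userip%22%3A%22%22%7D%7D%2C'
    '%22req_0%22%3A%7B%22module%22%3A%22vkey.GetVkeyServer%22%2C%22method%22%3A%22CgiGetVkey%22%2C'
    '%22param%22%3A%7B%22guid%22%3A%223719823069%22%2C%22songmid%22%3A%5B%22000bSg2U4GcrUi%22%5D%2C'
    '%22songtype%22%3A%5B0%5D%2C%22uin%22%3A%220%22%2C%22loginflag%22%3A1%2C%22platform%22%3A%2220%22%7D%7D%2C'
    '%22comm%22%3A%7B%22uin%22%3A0%2C%22format%22%3A%22json%22%2C%22ct%22%3A24%2C%22cv%22%3A0%7D%7D'
)

decoded = json.loads(unquote(encoded_data))   # percent-decode, then parse the JSON
print(json.dumps(decoded, indent=2))
print(decoded['req_0']['param']['songmid'])
```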

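Comparing different URLs, only getplaysongvkey and the songmid inside data change. I had no idea how getplaysongvkey is generated and couldn't find it anywhere (sob). Then I tried changing only songmid, and it still returned a vkey! Hahaha, simply wonderful! So we only need to pass in the songmid. Another Ctrl+F confirms that this songmid is the same mid we found earlier, which makes things much simpler! Now let's get the URL of every song!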
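Rather than formatting the songmid into a pre-encoded string, you can build the `data` JSON as a dict and let `urlencode` encode it. This sketch copies the structure of `data` and the other parameters from the decoded listing above, including the fixed `getplaysongvkey…` value. That follows the observation that only songmid needs to change. `build_vkey_url` is a hypothetical helper, and whether the server still answers such requests is not verified.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

VKEY_BASE = 'https://u.y.qq.com/cgi-bin/musicu.fcg'

def build_vkey_url(songmid, guid='3719823069'):
    """Hypothetical helper: rebuild the musicu.fcg request for one songmid."""
    data = {
        'req': {'module': 'CDN.SrfCdnDispatchServer', 'method': 'GetCdnDispatch',
                'param': {'guid': guid, 'calltype': 0, 'userip': ''}},
        'req_0': {'module': 'vkey.GetVkeyServer', 'method': 'CgiGetVkey',
                  'param': {'guid': guid, 'songmid': [songmid], 'songtype': [0],
                            'uin': '0', 'loginflag': 1, 'platform': '20'}},
        'comm': {'uin': 0, 'format': 'json', 'ct': 24, 'cv': 0},
    }
    params = {
        '-': 'getplaysongvkey2954502924310327',   # kept fixed, as in the experiment above
        'g_tk': 5381, 'loginUin': 0, 'hostUin': 0, 'format': 'json',
        'inCharset': 'utf8', 'outCharset': 'utf-8', 'notice': 0,
        'platform': 'yqq.json', 'needNewCode': 0,
        # compact separators produce JSON without spaces, like the browser's request
        'data': json.dumps(data, separators=(',', ':')),
    }
    return VKEY_BASE + '?' + urlencode(params)

url = build_vkey_url('0032TY8H2bEqEP')
print(url)
```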

import requests
from urllib import parse
import json
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'}
# Search URL: the {} placeholder receives the encoded keyword (w=...)
url = ('https://c.y.qq.com/soso/fcgi-bin/client_search_cp?ct=24&qqmusic_ver=1298&new_json=1&remoteplace=txt.yqq.song&searchid=64768420417553403&t=0&aggr=1&cr=1&catZhida=1&lossless=0&flag_qc=0&'
       'p=1&n=10&{}&g_tk=1531112714&loginUin=3237707674&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0')
# vkey URL: the {} placeholder inside the encoded songmid list receives each song's mid
url_1 = 'https://u.y.qq.com/cgi-bin/musicu.fcg?-=getplaysongvkey6989843649964012&g_tk=5381&loginUin=0&hostUin=0&format=json&inCharset=utf8&outCharset=utf-8&notice=0&platform=yqq.json&needNewCode=0&data=%7B%22req%22%3A%7B%22module%22%3A%22CDN.SrfCdnDispatchServer%22%2C%22method%22%3A%22GetCdnDispatch%22%2C%22param%22%3A%7B%22guid%22%3A%223719823069%22%2C%22calltype%22%3A0%2C%22userip%22%3A%22%22%7D%7D%2C%22req_0%22%3A%7B%22module%22%3A%22vkey.GetVkeyServer%22%2C%22method%22%3A%22CgiGetVkey%22%2C%22param%22%3A%7B%22guid%22%3A%223719823069%22%2C%22songmid%22%3A%5B%22{}%22%5D%2C%22songtype%22%3A%5B0%5D%2C%22uin%22%3A%220%22%2C%22loginflag%22%3A1%2C%22platform%22%3A%2220%22%7D%7D%2C%22comm%22%3A%7B%22uin%22%3A0%2C%22format%22%3A%22json%22%2C%22ct%22%3A24%2C%22cv%22%3A0%7D%7D'
url_2 = 'http://dl.stream.qqmusic.qq.com/'

word = '陈奕迅'
params = {'w': word}
url_data = parse.urlencode(params)   # URL-encode the keyword
resp = requests.get(url.format(url_data), headers=headers)
resp.encoding = 'utf-8'
a = json.loads(resp.text)
b = a.get('data').get('song').get('list')
songs_list = []
for i in b:
    result = {}
    result['id'] = i.get('id')
    result['title'] = i.get('title')
    result['singer'] = i.get('singer')[0].get('title')
    result['mid'] = i.get('mid')
    songs_list.append(result['mid'])
    # Ask for the vkey/purl of this song's mid
    mid = result['mid']
    resp = requests.get(url_1.format(mid), headers=headers)
    resp.encoding = 'utf-8'
    a = json.loads(resp.text)
    c = a.get('req_0').get('data').get('midurlinfo')[0].get('purl')
    # An empty purl prints only the base URL (see the output below)
    print(url_2 + c)
    print('\n')

Result:

http://dl.stream.qqmusic.qq.com/C40000481cWs2JgWe0.m4a?guid=3719823069&vkey=D23EC91C0CE54E56315F8F837CDE5CCCE9A830592DC2EB06BF9D782A8811B52B3544863AFE67ACD9BE7DDE4248C57F4BE0644B75D3B36C20&uin=0&fromtag=66


http://dl.stream.qqmusic.qq.com/C400000Hv0Nh0m4ye8.m4a?guid=3719823069&vkey=AF477E991571D7118E921935B26E5009FC000278EC49A7FD02E66113070E28E448FCAA2DD3FFDC8C9704114E018AF87ABA0BD2C8970D1C25&uin=0&fromtag=66


http://dl.stream.qqmusic.qq.com/C400003Idtm746YJCM.m4a?guid=3719823069&vkey=17264244C83309BEA525942B4EBBB99D3D55D47E9A7E319641ACA65CEA9134AEDD40E096BC09A74237121D91DE65EC0587C4FB68336E0CDD&uin=0&fromtag=66


http://dl.stream.qqmusic.qq.com/


http://dl.stream.qqmusic.qq.com/C400002B2EAA3brD5b.m4a?guid=3719823069&vkey=0E30A1E14E29B3698FA67BCF91DC1BDB3B0F6E6C4E56F14975AA6808BB80C0CA87879C1F8EFDB210E8C64B44B831E659D88371169D78D0BA&uin=0&fromtag=66


http://dl.stream.qqmusic.qq.com/C400002BuJzd3ye6uP.m4a?guid=3719823069&vkey=E3D6360F04C79FD8A57D5FFFD206F188D650352B33D8049A8F5D844A6B0B01DB52406A50BA9F0A3D41F38BCB4E705AE4F545916157B3CBCA&uin=0&fromtag=66


http://dl.stream.qqmusic.qq.com/


http://dl.stream.qqmusic.qq.com/C400003wRtRu3w2W62.m4a?guid=3719823069&vkey=93B9233DDA7383C7E478D1028DA61E417861AE5D41B238B0A549906E91608A0CA01649707286E70826299B2F9A0D9605DD490F29907B5A59&uin=0&fromtag=66


http://dl.stream.qqmusic.qq.com/C400003kCfyN2zp9AW.m4a?guid=3719823069&vkey=AF4B635D20A30F81E60077349CBA52C41FAFF9F916E83F76435E6F9A8E7201E52584529D30BA71DC83AF860A14DCAC180F7BDEF5216A6A84&uin=0&fromtag=66

Success!!! The entries without a URL are probably songs without streaming rights!? Well, that's basically it!
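The bare `http://dl.stream.qqmusic.qq.com/` lines in the output come from an empty purl being appended to the base URL. One way to skip them is a small check before joining, sketched here with a hypothetical helper `full_song_url` and placeholder purl values. As above, "no streaming rights" is only a guess at why purl comes back empty.

```python
BASE = 'http://dl.stream.qqmusic.qq.com/'

def full_song_url(purl):
    """Return BASE + purl, or None when purl is empty (no playable URL returned)."""
    return BASE + purl if purl else None

# Placeholder purl values: one normal entry and one empty one
purls = ['C4000032TY8H2bEqEP.m4a?guid=3719823069&vkey=ABC&uin=0&fromtag=66', '']
for p in purls:
    song_url = full_song_url(p)
    print(song_url if song_url else 'no URL returned (possibly no streaming rights)')
```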

Reposted from: https://juejin.im/post/5cab2f91f265da037941524b
