前言
总有一些人的声音,你永远忘不掉
正文
有听她唱的歌写作业的习惯,前几天在下了一个全民k歌PC版,发现不能连续播放,那给我气啊。。。于是去官网游了一遍没发现啥,但是发现了歌曲的分享链接
浏览器访问,Network发现一个Response Header中Content-Type为audio/mp4,再看Request Url的Get参数,发现其中有一个名为fname的变量,不难想到它就是file name,那么其值就是歌曲文件的文件名了
Elements
发现了audio标签,src就是刚才的m4a文件,下载下来就是我要的她唱的歌
那么脚本的思路很清晰了:
- 获取html
- 获取src
- 下载m4a
最开始我用的是urllib获取html,后来发现html中没有audio标签
我在浏览器中禁用JavaScript
发现页面不播放音乐,同样没有audio标签,页面许多元素消失,所以应该是JavaScript渲染。
本来想用selenium,但是我的Chrome更新之后没更新驱动,Google一番发现了名为pyppeteer的module,pip安装之后看了一下文档,是异步。
用pyppeteer获取html:
async def getContent(url):
browser = await launch()
page = await browser.newPage()
await page.setJavaScriptEnabled(enabled=True)
await page.goto(url)
log("+", "Get content from " + url)
content = await page.content()
await browser.close()
return content
解析html(我用的是bs4),并下载歌曲
def DownloadM4a(url):
'''
Download .M4a from share link
'''
loop = asyncio.get_event_loop()
task = asyncio.ensure_future(getContent(url))
loop.run_until_complete(task)
soup = BeautifulSoup(task.result(),"html.parser")
singer = soup.find(name = 'a', attrs = { 'class': 'singer_user__name' }).get_text()
singer = singer.replace(" ","").replace('\n',"")
mdir = os.getcwd()
path = mdir + "\\Music\\"
if not os.path.exists(path):
log("+", "Create a new folder named 'Music'")
os.mkdir(path)
path += singer + "\\"
if not os.path.exists(path):
log("+","Create a new folder for singer named " + singer)
os.mkdir(path)
name = soup.audio['meta'] + ".m4a"
uri = soup.audio['src']
#Download
if os.path.exists(path + name):
log("-", " The music which is " + name + " has existed")
exit(1)
log("+","Downloading " + name + " from " + uri + " into " + path)
music = requests.get(uri)
with open(path + name, 'wb') as fp:
fp.write(music.content)
因为她唱的歌也不是一两首,所以一首一首下还是很麻烦,毕竟我这么 懒 的人(bushi,逃)
所以我又去访问了她的主页,f12
发现了一个ul,ul标签下就有各歌曲的class为mod_playlist__cover的a链接
我还发现了查看更多的a标签
但是点击之后服务器返回的数组为空
我一开始以为是游客身份访问的缘故,但是我登录之后还是无济于事。后来在移动端试了才发现原来是urlScheme,引导打开全民k歌
气人!没办法,只能先下载主页的8首歌曲了
def DownloadM4aEx(url):
'''
Download .M4a from user`s index
'''
loop = asyncio.get_event_loop()
task = asyncio.ensure_future(getContent(url))
loop.run_until_complete(task)
soup = BeautifulSoup(task.result(),"html.parser")
links = soup.findAll(name = 'a', attrs = { 'class': 'mod_playlist__cover'})
for link in links:
DownloadM4a(link['href'])
根据主页连接和分享连接的参数,区分一下
def main(url):
if url:
if "personal?uid=" in url:
try:
DownloadM4aEx(url)
except Exception as e:
log("-", str(e))
elif "play?s=" in url:
try:
DownloadM4a(url)
except Exception as e:
log("-", str(e))
试试效果
Github
源码尽在Github