Target URL:
https://music.163.com/#/artist?id=××××××××
This works the same way as the chart scraper; only the url needs to be replaced.
For example, chart: url='.../discover/toplist?id=××××××××'
Artist's top songs: url='.../artist?id=××××××××'
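- Prevent an invalid character in title from raising an error and aborting the run
- Name the output folder after the artist, generating filename dynamically from the content of the <h2> tag
- Download the music for a list of URLs in multiple threads, using Python's concurrent.futures module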
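1. Preventing invalid characters from aborting the run
The following error occurs:
FileNotFoundError: [Errno 2] No such file or directory: 'music\霄玉若惜【仙剑奇侠传四(玄霄/夙玉)】.mp3'
Cause:
The title contains /, a character that cannot be written in a file name. The slash is read as a path separator, so Python looks for a subdirectory that does not exist.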
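Characters like this have a special meaning or use in Windows. Allowing them in a file name could prevent the system from handling the file correctly or trigger errors. Specifically:
- Backslash (\) and forward slash (/): used in Windows paths to separate directories and file names. Using them in a file name causes path-parsing errors.
- Colon (:): used as the drive separator (for example C:), so it cannot appear in a file name.
- Asterisk (*) and question mark (?): wildcards in Windows, used to match file-name patterns. Using them in a file name could cause unexpected matching or filtering.
- Double quote ("): on the command line, double quotes delimit arguments, so a file name containing one could break command parsing.
- Angle brackets (< >) and vertical bar (|): have specific uses on the command line, such as input/output redirection and piping. Using them in a file name could interfere with those operations.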
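Solution:
Before creating the file name, use a regular expression to replace the disallowed characters, for example /.
Whenever title contains any of the characters \/:*?"<>| it is replaced with a space.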
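The replacement rule can be tried in isolation before wiring it into the downloader. This is a minimal sketch: the helper name `sanitize` and the constant `INVALID_CHARS` are chosen here for illustration and are not part of the original script. The backslash is written as `\\` inside the character class so that a literal backslash is matched too.

```python
import re

# Characters that Windows does not allow in a file name: \ / : * ? " < > |
# Inside the character class the backslash is escaped as \\ to match it literally.
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def sanitize(title):
    """Replace every invalid character with a space and trim both ends."""
    return INVALID_CHARS.sub(' ', title).strip()


if __name__ == "__main__":
    # The title from the error message above; the slash becomes a space.
    print(sanitize('霄玉若惜【仙剑奇侠传四(玄霄/夙玉)】'))
```

Replacing with a space rather than deleting keeps the two halves of a title such as 玄霄/夙玉 visually separated in the resulting file name.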
Implementation:
import os
import re
import requests

filename = 'music'
os.makedirs(filename, exist_ok=True)  # create the output folder if it does not exist yet

url = 'https://music.163.com/artist?id=××××××××'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
# match each song's ID and title with a regular expression
html_data = re.findall(r'<li><a href="/song\?id=(\d+)">(.*?)</a>', response.text)
# invalid characters to replace; the backslash is written as \\ so that it is matched too
invalid_chars_pattern = r'[\\/:*?"<>|]'
for num_id, title in html_data:
    title = re.sub(invalid_chars_pattern, ' ', title)  # replace invalid characters with a space
    music_url = f'https://music.163.com/song/media/outer/url?id={num_id}.mp3'  # build the URL of the audio file
    music_content = requests.get(url=music_url, headers=headers).content
    with open(os.path.join(filename, title.strip() + '.mp3'), mode='wb') as f:
        f.write(music_content)
    print(num_id, title)
When substituting your own url, remove the #/ from the middle of the address.
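The reason for dropping the #/ is that everything after a # in a URL is a fragment, which the client does not send to the server, so requesting the browser-style address would not ask for the artist page itself. A small sketch of the conversion follows; the helper name `strip_fragment_marker` is hypothetical and only illustrates the string replacement.

```python
def strip_fragment_marker(url):
    """Turn a browser-style '/#/' address into the plain path that is actually requested."""
    # Only the first occurrence is replaced; the rest of the URL is left untouched.
    return url.replace('/#/', '/', 1)


if __name__ == "__main__":
    print(strip_fragment_marker('https://music.163.com/#/artist?id=××××××××'))
```

A URL that has no #/ in it passes through unchanged, so the helper can be applied to every address in a list without checking first.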
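2. Code cleanup
Reduce redundancy, keep the structure clear, and give each part a well-defined job, so that the code is more readable and easier to maintain and extend.
- Split by function: dividing the code into functions makes the purpose of each part explicit.
  - create_music_folder(): creates the folder the music is saved to.
  - get_music_data(): fetches the music data and returns a list of song IDs and titles.
  - sanitize_title(): cleans up a title by replacing invalid characters.
  - download_music(): downloads the given song and saves it.
  - main(): the program's entry point.
- Fewer global variables: avoid relying on many globals and pass the required information through parameters instead.
- String formatting: use f-strings (such as f"{value}") to improve readability.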
import os
import re
import requests

# create the folder the music is saved to
def create_music_folder(folder_name='music1'):
    os.makedirs(folder_name, exist_ok=True)

# fetch the page and return a list of (song ID, title) pairs
def get_music_data(url, headers):
    response = requests.get(url, headers=headers)
    return re.findall(r'<li><a href="/song\?id=(\d+)">(.*?)</a>', response.text)

# clean up a title by replacing invalid characters (\\ matches a literal backslash)
def sanitize_title(title):
    return re.sub(r'[\\/:*?"<>|]', ' ', title)

# download one song and save it
def download_music(num_id, title, folder_name='music1'):
    music_url = f'https://music.163.com/song/media/outer/url?id={num_id}.mp3'
    music_content = requests.get(music_url).content
    file_path = os.path.join(folder_name, f"{sanitize_title(title).strip()}.mp3")
    with open(file_path, 'wb') as f:
        f.write(music_content)

# program entry point
def main():
    create_music_folder()
    url = 'https://music.163.com/artist?id=××××××××'  # e.g. 'https://music.163.com/artist?id=12480034'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'
    }
    music_data = get_music_data(url, headers)
    for num_id, title in music_data:
        download_music(num_id, title)

if __name__ == "__main__":
    main()
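To see what get_music_data() hands to the download loop, the same regular expression can be run against a small HTML fragment without any network access. The fragment below is hypothetical: its IDs and titles are made up and only mimic the `<li><a href="/song?id=...">` shape that the pattern expects.

```python
import re

# Same pattern as in get_music_data(): group 1 is the song ID, group 2 the title.
SONG_PATTERN = re.compile(r'<li><a href="/song\?id=(\d+)">(.*?)</a>')

# Hypothetical fragment shaped like the song list the pattern is written for.
sample_html = (
    '<ul>'
    '<li><a href="/song?id=101">First Song</a></li>'
    '<li><a href="/song?id=102">Second/Song</a></li>'
    '</ul>'
)

# findall() returns one (id, title) tuple per match, in page order.
songs = SONG_PATTERN.findall(sample_html)
for num_id, title in songs:
    print(num_id, title)
```

Because the pattern has two capture groups, findall() yields tuples, which is why the loop in main() can unpack each item directly into num_id and title. The second title still contains a slash at this point; it is only cleaned when sanitize_title() is applied.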
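3. Naming the folder after the artist
The function get_artist_name() extracts the content of the <h2> tag from the page HTML and uses it as the folder name.
To download a different artist, only the id needs to change.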
import os
import re
import requests

# create a folder to save the music
def create_music_folder(folder_name):
    os.makedirs(folder_name, exist_ok=True)

# extract the artist name or list name from the first <h2> tag
def get_artist_name(html):
    match = re.search(r'<h2[^>]*>(.*?)</h2>', html)
    if match:
        return match.group(1).strip()
    return "Unknown Artist"

# fetch the page; return the list of (song id, title) pairs and the raw html
def get_music_data(url, headers):
    response = requests.get(url, headers=headers)
    return re.findall(r'<li><a href="/song\?id=(\d+)">(.*?)</a>', response.text), response.text

# title cleanup, replaces illegal characters (\\ matches a literal backslash)
def sanitize_title(title):
    return re.sub(r'[\\/:*?"<>|]', ' ', title)

# download the song and save it
def download_music(num_id, title, folder_name):
    music_url = f'https://music.163.com/song/media/outer/url?id={num_id}.mp3'
    music_content = requests.get(music_url).content
    file_path = os.path.join(folder_name, f"{sanitize_title(title).strip()}.mp3")
    with open(file_path, 'wb') as f:
        f.write(music_content)

# main entry point
def main():
    url = 'https://music.163.com/artist?id=××××'  # can be changed to the appropriate URL
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'
    }
    # get music data and html content
    music_data, html = get_music_data(url, headers)
    # extract the artist name or list name and clean it, since it becomes a folder name
    artist_name = sanitize_title(get_artist_name(html)).strip()
    create_music_folder(artist_name)
    # download
    for num_id, title in music_data:
        download_music(num_id, title, artist_name)

if __name__ == "__main__":
    main()
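The <h2> extraction can be checked on its own with a hand-written snippet. In the sketch below the sample markup, including its id and class attributes, is an assumption made for illustration and is not copied from the real page.

```python
import re


def get_artist_name(html):
    """Return the text of the first <h2> tag, or a fallback when none is found."""
    # [^>]* skips any attributes on the tag; (.*?) captures the text lazily.
    match = re.search(r'<h2[^>]*>(.*?)</h2>', html)
    if match:
        return match.group(1).strip()
    return "Unknown Artist"


# Hypothetical markup: the attributes here are placeholders.
sample = '<div><h2 id="artist-name" class="sname"> Example Artist </h2></div>'

if __name__ == "__main__":
    print(get_artist_name(sample))
    print(get_artist_name('<p>no heading on this page</p>'))
```

Two limits are worth keeping in mind: the pattern returns only the first <h2> on the page, and because `.` does not match a newline by default, a heading whose text spans several lines would fall through to the "Unknown Artist" fallback.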
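4. Multithreaded download of the music for a list of URLs
This can be done with Python's concurrent.futures module: pass in a list of URLs and use ThreadPoolExecutor to download them in multiple threads. Each URL in the list is handled by one worker thread; the songs of a single artist are still downloaded one after another.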
import os
import re
import requests
from concurrent.futures import ThreadPoolExecutor

# create a folder to save the music
def create_music_folder(folder_name):
    os.makedirs(folder_name, exist_ok=True)

# extract the artist name or list name from the first <h2> tag
def get_artist_name(html):
    match = re.search(r'<h2[^>]*>(.*?)</h2>', html)
    if match:
        return match.group(1).strip()
    return "Unknown Artist"

# fetch the page; return the list of (song id, title) pairs and the raw html
def get_music_data(url, headers):
    response = requests.get(url, headers=headers)
    return re.findall(r'<li><a href="/song\?id=(\d+)">(.*?)</a>', response.text), response.text

# title cleanup, replaces illegal characters (\\ matches a literal backslash)
def sanitize_title(title):
    return re.sub(r'[\\/:*?"<>|]', ' ', title)

# download the song and save it
def download_music(num_id, title, folder_name):
    music_url = f'https://music.163.com/song/media/outer/url?id={num_id}.mp3'
    music_content = requests.get(music_url).content
    file_path = os.path.join(folder_name, f"{sanitize_title(title).strip()}.mp3")
    with open(file_path, 'wb') as f:
        f.write(music_content)

# download all songs listed on one artist page
def download_artist_music(url, headers):
    # get music data and html content
    music_data, html = get_music_data(url, headers)
    # extract the artist name or list name and clean it, since it becomes a folder name
    artist_name = sanitize_title(get_artist_name(html)).strip()
    create_music_folder(artist_name)
    # download
    for num_id, title in music_data:
        download_music(num_id, title, artist_name)

def main(urls):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36'
    }
    # multi-threading download: one worker thread per URL
    with ThreadPoolExecutor() as executor:
        # list() consumes the results so that an exception raised in a worker is re-raised here
        list(executor.map(lambda url: download_artist_music(url, headers), urls))

if __name__ == "__main__":
    urls = [
        'https://music.163.com/artist?id=××××××××',
        'https://music.163.com/artist?id=××××'  # add more URLs
    ]
    main(urls)
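With executor.map(), an exception raised inside a worker only surfaces when the results are iterated, and the first failure then hides the outcome of the remaining URLs. One possible alternative, sketched here under stated assumptions, is submit() together with as_completed(), which lets each URL be reported separately. The names `run_all` and `fake_worker` are hypothetical; `fake_worker` stands in for download_artist_music() so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(urls, worker, max_workers=4):
    """Run worker(url) for every URL in a thread pool.

    Returns (succeeded, failed): the URLs that finished without error, and a
    dict mapping each failing URL to the exception it raised.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which URL each future belongs to.
        futures = {executor.submit(worker, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
            except Exception as exc:
                failed[url] = exc
            else:
                succeeded.append(url)
    return succeeded, failed


def fake_worker(url):
    """Stand-in for the real download function; fails for URLs containing 'bad'."""
    if 'bad' in url:
        raise ValueError(f'cannot download {url}')


if __name__ == "__main__":
    ok, errors = run_all(['artist-a', 'artist-bad', 'artist-c'], fake_worker)
    print(sorted(ok), list(errors))
```

Since as_completed() yields futures in completion order, the succeeded list is not guaranteed to follow the order of the input list; sort it if the order matters.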