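Music Player
Overview: System Components
Includes:
PyQt5 front-end GUI module
User-based song recommendation module
User security management, song management, favorites management, and other modules
Abstract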
-
The front-end UI is built with PyQt5
-
The database is stored as CSV text, which is read into Python lists and accessed by index (room for improvement)
-
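The music library is the Chinese-language subset of NetEase Cloud Music playlists (50,000+ distinct songs); splitting the dataset improves speed
The full crawl covers NetEase Cloud Music playlists with 800,000 songs and 4,000,000+ favorites; the recommendation data is the Chinese-language subset split from it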
-
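Playback supports network MP3 resources (NetEase Cloud Music external link + song id) and local MP3 files, using PyQt's multimedia module (which can play network MP3 resources)
from PyQt5.QtMultimedia import QMediaPlayer, QMediaContent (in an Anaconda virtual environment this import can fail; upgrading or downgrading PyQt resolved it)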
-
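The recommender uses each user's favorites data to build a recommendation list with user-based collaborative filtering
Similarity is computed with the Jaccard index (binary data), using an inverted index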
-
Favorites management supports adding and removing songs; all data is stored in playlist_1.csv and playlist_2.csv
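Development notes
- Choosing the right data structure matters (for data access):
json dict set list tuple
pandas DataFrame pd.DataFrame()
numpy ndarray np.array()
Example: reading playlist.csv into a DataFrame makes insert, delete, update, and query operations (such as adding columns) awkward
Some users have only 3 favorite songs while others have more than 1,000, so the table is too sparse
So the file is read line by line into lists, which are easy to index, read, and write, and the lists are written back at the end. Splitting the table into several files noticeably improves efficiency
Another option is a binary format that stores the file data as a dictionary
- Every PyQt window is a class, and the classes are peers:
They can call each other only if they are defined together; UI code is kept separate from functional code
UI resources are initialized through inheritance, all window classes are defined in one .py file, and attributes shared between classes are declared with global xxx
Example: the user input from the login window (class ui_1) must be passed to the player window (class ui_2)
- Parsing MP3 metadata:
eyeD3 is used to add, delete, and modify song metadata
- To personalize the data, every user has their own UID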
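The row-wise approach from the first note can be sketched as follows. This is a minimal illustration rather than the project's code: the helper names `load_rows` and `dump_rows`, the tab delimiter, and the sample rows are assumptions made for the example.

```python
def load_rows(lines, delimiter="\t"):
    """Turn ragged text lines into a list of lists, one list per user."""
    return [line.rstrip("\n").split(delimiter) for line in lines if line.strip()]


def dump_rows(rows, delimiter="\t"):
    """Serialize the lists back to text, one line per user."""
    return "".join(delimiter.join(row) + "\n" for row in rows)


# Hypothetical data: the first field is the user, the rest are favorites.
raw = ["u1\ts1\ts2\ts3\n", "u2\ts9\n"]
rows = load_rows(raw)
rows[1].append("s4")   # add a favorite for u2
rows[0].remove("s2")   # remove a favorite for u1
text = dump_rows(rows)
```

Because every row is an independent list, rows of length 3 and rows of length 1,000 coexist without padding, which is what a fixed-width DataFrame cannot do cheaply.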
Jupyter code link
Local code, hosted on GitHub
I. Using the recommender system
-
Read the user data. The favorites module's data is self-contained, so it is enough to read everything in the playlist.csv database
-
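Convert the data to a binary sparse matrix (rows: user UID, columns: song UID, values: 0 = not favorited, 1 = favorited)
Songs could be split into many tables by genre, but since recommendations here are user-based, the song columns are not split into multiple tables.
The steps are:
Read the raw crawled playlist data (JSON) and select the needed fields (playlist data [id, name], the list of song IDs under each playlist ID [1, 2, 3, 4, ...], playlist popularity)
(see figure)
Format of each playlist
It contains a lot of information (genre, artist, song play count, song duration, song release date, ...)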
`
{
"result": {
"id": 111450065,
"status": 0,
"commentThreadId": "A_PL_0_111450065",
"trackCount": 120,
"updateTime": 1460164523907,
"commentCount": 227,
"ordered": true,
"anonimous": false,
"highQuality": false,
"subscribers": [],
"playCount": 687070,
"trackNumberUpdateTime": 1460164523907,
"createTime": 1443528317662,
"name": "带本书去旅行吧,人生最美好的时光在路上。",
"cloudTrackCount": 0,
"shareCount": 149,
"adType": 0,
"trackUpdateTime": 1494134249465,
"userId": 39256799,
"coverImgId": 3359008023885470,
"coverImgUrl": "http://p1.music.126.net/2ZFcuSJ6STR8WgzkIi2U-Q==/3359008023885470.jpg",
"artists": null,
"newImported": false,
"subscribed": false,
"privacy": 0,
"specialType": 0,
"description": "现在是一年中最美好的时节,世界上很多地方都不冷不热,有湛蓝的天空和清冽的空气,正是出游的好时光。长假将至,你是不是已经收拾行装准备出发了?行前焦虑症中把衣服、洗漱用品、充电器之类东西忙忙碌碌地丢进箱子,打进背包的时候,我打赌你肯定会留个位置给一位好朋友:书。不是吗?不管是打发时间,小读怡情,还是为了做好攻略备不时之需,亦或是为了小小地装上一把,你都得有一本书傍身呀。读大仲马,我是复仇的伯爵;读柯南道尔,我穿梭在雾都的暗夜;读村上春树,我是寻羊的冒险者;读马尔克斯,目睹百年家族兴衰;读三毛,让灵魂在撒哈拉流浪;读老舍,嗅着老北京的气息;读海茵莱茵,于科幻狂流遨游;读卡夫卡,在城堡中审判……读书的孩子不会孤单,读书的孩子永远幸福。",
"subscribedCount": 10882,
"totalDuration": 0,
"tags": [
"旅行",
"钢琴",
"安静"],
"creator": {
"followed": false,
"remarkName": null,
"expertTags": [
"古典",
"民谣",
"华语"
],
"userId": 39256799,
"authority": 0,
"userType": 0,
"gender": 1,
"backgroundImgId": 3427177752524551,
"city": 360600,
"mutual": false,
"avatarUrl": "http://p1.music.126.net/TLRTrJpOM5lr68qJv1IyGQ==/1400777825738419.jpg",
"avatarImgIdStr": "1400777825738419",
"detailDescription": "",
"province": 360000,
"description": "",
"birthday": 637516800000,
"nickname": "有梦人生不觉寒",
"vipType": 0,
"avatarImgId": 1400777825738419,
"defaultAvatar": false,
"djStatus": 0,
"accountStatus": 0,
"backgroundImgIdStr": "3427177752524551",
"backgroundUrl": "http://p1.music.126.net/LS96S_6VP9Hm7-T447-X0g==/3427177752524551.jpg",
"signature": "漫无目的的乱听,听着,听着,竟然灵魂出窍了。更多精品音乐美图分享请加我微信hu272367751。微信是我的精神家园,有我最真诚的分享。",
"authStatus": 0},
"tracks": [{song 1}, {song 2}, ...]
}
}
Format of each song
{
"id": 29738501,
"name": "跟着你到天边 钢琴版",
"duration": 174001,
"hearTime": 0,
"commentThreadId": "R_SO_4_29738501",
"score": 40,
"mvid": 0,
"hMusic": null,
"disc": "",
"fee": 0,
"no": 1,
"rtUrl": null,
"ringtone": null,
"rtUrls": [],
"rurl": null,
"status": 0,
"ftype": 0,
"mp3Url": "http://m2.music.126.net/vrVa20wHs8iIe0G8Oe7I9Q==/3222668581877701.mp3",
"audition": null,
"playedNum": 0,
"copyrightId": 0,
"rtype": 0,
"crbt": null,
"popularity": 40,
"dayPlays": 0,
"alias": [],
"copyFrom": "",
"position": 1,
"starred": false,
"starredNum": 0,
"bMusic": {
"name": "跟着你到天边 钢琴版",
"extension": "mp3",
"volumeDelta": 0.0553125,
"sr": 44100,
"dfsId": 3222668581877701,
"playTime": 174001,
"bitrate": 96000,
"id": 52423394,
"size": 2089713
},
"lMusic": {
"name": "跟着你到天边 钢琴版",
"extension": "mp3",
"volumeDelta": 0.0553125,
"sr": 44100,
"dfsId": 3222668581877701,
"playTime": 174001,
"bitrate": 96000,
"id": 52423394,
"size": 2089713
},
"mMusic": {
"name": "跟着你到天边 钢琴版",
"extension": "mp3",
"volumeDelta": -0.000265076,
"sr": 44100,
"dfsId": 3222668581877702,
"playTime": 174001,
"bitrate": 128000,
"id": 52423395,
"size": 2785510
},
"artists": [
{
"img1v1Url": "http://p1.music.126.net/6y-UleORITEDbvrOLV0Q8A==/5639395138885805.jpg",
"name": "群星",
"briefDesc": "",
"albumSize": 0,
"img1v1Id": 0,
"musicSize": 0,
"alias": [],
"picId": 0,
"picUrl": "http://p1.music.126.net/6y-UleORITEDbvrOLV0Q8A==/5639395138885805.jpg",
"trans": "",
"id": 122455
}
],
"album": {
"id": 3054006,
"status": 2,
"type": null,
"tags": "",
"size": 69,
"blurPicUrl": "http://p1.music.126.net/2XLMVZhzVZCOunaRCOQ7Bg==/3274345629219531.jpg",
"copyrightId": 0,
"name": "热门华语248",
"companyId": 0,
"songs": [],
"description": "",
"pic": 3274345629219531,
"commentThreadId": "R_AL_3_3054006",
"publishTime": 1388505600004,
"briefDesc": "",
"company": "",
"picId": 3274345629219531,
"alias": [],
"picUrl": "http://p1.music.126.net/2XLMVZhzVZCOunaRCOQ7Bg==/3274345629219531.jpg",
"artists": [
{
"img1v1Url": "http://p1.music.126.net/6y-UleORITEDbvrOLV0Q8A==/5639395138885805.jpg",
"name": "群星",
"briefDesc": "",
"albumSize": 0,
"img1v1Id": 0,
"musicSize": 0,
"alias": [],
"picId": 0,
"picUrl": "http://p1.music.126.net/6y-UleORITEDbvrOLV0Q8A==/5639395138885805.jpg",
"trans": "",
"id": 122455
}
],
"artist": {
"img1v1Url": "http://p1.music.126.net/6y-UleORITEDbvrOLV0Q8A==/5639395138885805.jpg",
"name": "",
"briefDesc": "",
"albumSize": 0,
"img1v1Id": 0,
"musicSize": 0,
"alias": [],
"picId": 0,
"picUrl": "http://p1.music.126.net/6y-UleORITEDbvrOLV0Q8A==/5639395138885805.jpg",
"trans": "",
"id": 0
}
}
}
`
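Recommendation libraries: surprise and lightfm. Read their core code first, then write your own. The core parts are data processing (data), similarity computation (algorithm), and evaluation (model prediction and evaluation)
import surprise: when sorting with this library, a result such as 0.99978 was rounded to 1, so results were multiplied by 100000 before use
import lightfm
project = offline modelling + online prediction
1) offline: Python scripts
2) online: efficiency first, C++/Java
Principle: whatever can be precomputed offline should be computed offline; ideally the online side is just a K-V dictionary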
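The offline/online split can be sketched as follows. This is an illustrative example under stated assumptions, not the project's pipeline: `build_offline_table`, `recommend_online`, and the toy `favourites`/`neighbours` data are hypothetical names introduced here.

```python
import json


def build_offline_table(neighbours, favourites, top_n=3):
    """Offline step: precompute, per user, songs their neighbours like.

    neighbours: user -> list of similar users, most similar first
    favourites: user -> set of song ids
    """
    table = {}
    for user, similar in neighbours.items():
        seen = favourites.get(user, set())
        picks = []
        for other in similar:
            for song in sorted(favourites.get(other, set())):
                if song not in seen and song not in picks:
                    picks.append(song)
        table[user] = picks[:top_n]
    return table


def recommend_online(table, user):
    """Online step: a single dictionary lookup, no computation."""
    return table.get(user, [])


favourites = {"u1": {"s1", "s2"}, "u2": {"s2", "s3"}, "u3": {"s4"}}
neighbours = {"u1": ["u2", "u3"], "u2": ["u1"], "u3": []}
table = build_offline_table(neighbours, favourites)
serialized = json.dumps(table)  # could be written to a file or a K-V store
```

All similarity work happens in `build_offline_table`; the online path only reads the finished table, which is why it could later be served from any key-value store.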
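1) Per-user recommendation: NetEase Cloud Music (30 songs / 7 songs per day)
2) Per-song recommendation: while you listen to a song, find "similar songs"
Extract from the JSON files to produce the final playlist.csv: (see figure)
Each line contains: a playlist segment (name##tags##UID##popularity) and song segments (SID:::song name:::artist:::song score)
1. Converting the data to dictionary form
Process each line into the following format so it can be loaded into a dictionary (column 1: playlist UID, column 2: song SID, column 3: favorite status (1), column 4: timestamp)
This can be extended to a Redis K-V database
popular_music_suprise_format.txt
Stored in binary form with one record per user, reading becomes faster still
2. Generating the binary sparse matrix
From the data in popular_music_suprise_format.txt, build a K-V dictionary and collect the rows (playlist UIDs) and columns (song list)
Initialize an all-zero matrix whose dimensions are the number of playlist UIDs by the number of songs. Then, using the K-V dictionary built earlier,
set the entry to 1 for every song a playlist contains. This yields a sparse matrix ready for matrix operations, which is saved as CSV for the next step.
The code of a1__生成稀疏矩阵.py is as follows: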
import pandas as pd


def create_ndarrary():
    file = r'D:\music_recommdation\working_data\data\output\popular\popular_music_suprise_format.txt'
    with open(file, encoding='utf-8') as f:
        lines = f.readlines()
    # lines = lines[1:]  # uncomment to skip a header row

    aDict = {}   # playlist UID -> {song SID: favorite flag}
    sample = []  # every song SID seen
    for line in lines:
        # details = line.strip().split('\t')[:3]  # for tab-separated input
        details = line.strip().split(',')[:3]  # first three columns of each line
        if len(details) < 3:
            continue  # skip blank or malformed lines
        key1, key2 = details[0], details[1]
        aDict.setdefault(key1, {})[key2] = details[2]
        sample.append(key2)  # record the song

    h = list(aDict.keys())   # row labels: playlist UIDs
    l = sorted(set(sample))  # column labels: de-duplicated song SIDs
    print(len(l))
    print(len(h))

    df1 = pd.DataFrame(0, index=h, columns=l)
    for k, v in aDict.items():
        for k2 in v:
            df1.loc[k, k2] = 1  # mark the song as favorited
    print(df1)
    df1.to_csv(r'D:\music_recommdation\working_data\cse\zzzj.csv')

    # Save the row labels so a row number can be mapped back to a playlist UID.
    with open('h.txt', 'w') as f:
        for i in h:
            f.write(i + '\n')


if __name__ == '__main__':
    create_ndarrary()
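3. Computing the Jaccard similarity
With the matrix in place, the Jaccard similarity can be computed
Example:
import numpy as np
from scipy.spatial.distance import cdist
x = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], dtype=bool)
y = np.array([0, 1, 1, 0, 0, 0, 1, 0, 0, 1], dtype=bool)
# metric='jaccard' returns the Jaccard distance, i.e. 1 - similarity.
# The vectors share 2 of their 5 non-zero positions, so the distance is 3/5.
sim1 = cdist(x.reshape(1, -1), y.reshape(1, -1), metric='jaccard')[0][0]
print('Jaccard distance computed with cdist:', sim1)
print('Jaccard similarity:', 1 - sim1)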
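The abstract mentions an inverted index for the Jaccard step. A minimal standard-library sketch of that idea follows, storing each user's favorites as a set instead of a dense 0/1 row. The function names and the toy data are assumptions for illustration; users "x" and "y" hold the positions of the ones in the two example vectors.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def build_inverted_index(favourites):
    """Map each song to the set of users who favorited it."""
    index = defaultdict(set)
    for user, songs in favourites.items():
        for song in songs:
            index[song].add(user)
    return index


def similar_users(target, favourites, index):
    """Score only users who share at least one song with the target."""
    candidates = set()
    for song in favourites[target]:
        candidates |= index[song]
    candidates.discard(target)
    return {u: jaccard(favourites[target], favourites[u]) for u in candidates}


favourites = {"x": {0, 1, 2}, "y": {1, 2, 6, 9}, "z": {7, 8}}
index = build_inverted_index(favourites)
sims = similar_users("x", favourites, index)
```

User "z" shares no song with "x", so it is never scored; that pruning is the point of the inverted index. The similarity for "y" is 2/5, the complement of the 3/5 distance in the `cdist` example.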
a2_jaccard计算.py, time- and memory-optimized version (rows are read as lists, because loading a DataFrame takes longer at startup)
import numpy as np
from scipy.spatial.distance import cdist


def use_UID_get_preid(UID):
    """Return the 0-based line index of playlist UID in playlist_1.csv, or -1."""
    path_2 = r'D:\music_recommdation\working_data\data\playlist_1.csv'
    with open(path_2, 'r', encoding='utf-8') as f:
        for preid, line in enumerate(f):
            contents = line.strip().split("\t")
            name, tags, playlist_id, subscribed_count = contents[0].split("##")
            if UID == playlist_id:
                print(preid)
                return preid
    return -1


matrix_csv = r'D:\music_recommdation\working_data\cse\zzzj.csv'
preid = use_UID_get_preid('95441924')
k = 10

# Pass 1: fetch the target row. The "- 3" offset is kept from the original
# script; it depends on how this matrix file lines up with playlist_1.csv.
list_y = []
with open(matrix_csv, 'r', encoding='utf-8') as f2:
    for j, line in enumerate(f2):
        if j == preid - 3:
            list_y = line.strip('\n').split(',')[1:]  # drop the row label
            break
print(list_y)
y = np.array(list_y, dtype=float).astype(bool).reshape(1, -1)

# Pass 2: Jaccard distance between the target row and every data row.
list_sim = []  # one distance per data row, in file order
with open(matrix_csv, 'r', encoding='utf-8') as f:
    next(f, None)  # skip the header row of song SIDs
    for line in f:
        list_x = line.strip('\n').split(',')[1:]
        x = np.array(list_x, dtype=float).astype(bool).reshape(1, -1)
        # metric='jaccard' is a distance: 0 means identical rows
        sim1 = cdist(x, y, metric='jaccard')[0][0]
        list_sim.append(float(sim1))

# Ascending distance is descending similarity. Sorting row indices (not
# values) keeps rows with equal distances distinct.
order = np.argsort(list_sim, kind='stable')
print([list_sim[i] for i in order])
# Position 0 is the target itself (distance 0); take the next k rows.
id_list_get = [int(i) for i in order[1:k + 1]]
print('id_list_get', id_list_get)

# Map row indices back to playlist UIDs using h.txt written by a1.
with open('h.txt', 'r') as f:
    list_to_id = f.readlines()  # each entry keeps its trailing newline
print(len(list_to_id))
id_list = [list_to_id[i] for i in id_list_get]
with open('id_list.txt', 'w') as f2:
    f2.writelines(id_list)
print(id_list)
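Sorted by similarity from high to low:
After removing the target itself, take the 10 most similar playlists (k=10): (this gives each playlist's row number in the database and the corresponding UID)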
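The top-k selection can be sketched with the standard library as follows. This is an illustration, not the project's code: `top_k_neighbours` and the sample distances are hypothetical. It ranks (distance, row) pairs, so rows that tie on distance are all returned rather than the first matching row being returned repeatedly.

```python
import heapq


def top_k_neighbours(distances, target_row, k=10):
    """Row indices of the k smallest distances, excluding target_row.

    Ties on distance are broken by the smaller row index.
    """
    candidates = ((d, row) for row, d in enumerate(distances) if row != target_row)
    return [row for d, row in heapq.nsmallest(k, candidates)]


# Hypothetical Jaccard distances of five rows to the target (row 1).
distances = [0.5, 0.0, 0.5, 0.2, 1.0]
neighbours = top_k_neighbours(distances, target_row=1, k=3)
```

Excluding the target by its row index, instead of assuming it sorts first, also stays correct when another row happens to have distance 0.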
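4. Generating the recommendation result files
With the row numbers and corresponding UIDs in hand, use the dictionaries built earlier to look up the playlist names and the songs under them, then save the results under the user's UID as name.txt and id.txt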
import os
import pickle


def use_UID_get_preid(UID):
    """Return the 0-based line index of playlist UID in playlist_1.csv, or -1."""
    path_2 = r'D:\music_recommdation\working_data\data\playlist_1.csv'
    with open(path_2, 'r', encoding='utf-8') as f:
        for preid, line in enumerate(f):
            contents = line.strip().split("\t")
            name, tags, playlist_id, subscribed_count = contents[0].split("##")
            if UID == playlist_id:
                print(preid)
                return preid
    return -1


def playid_name_2to(UID):
    # playlist id -> playlist name
    with open("../data/output/popular/popular_playlist.pkl", "rb") as fh:
        id_name_dic = pickle.load(fh)
    # song id -> "song name\tartist"
    with open("../data/output/popular/popular_song.pkl", "rb") as fh:
        song_id_name_dic = pickle.load(fh)

    # Playlist ids selected by the similarity step (a2).
    with open('id_list.txt', 'r') as f:
        list_music = [line.strip('\n') for line in f if line.strip()]
    for playlist_id in list_music:
        print(id_name_dic[playlist_id], playlist_id)  # show the playlist name
    wanted = set(list_music)

    # Collect the song ids of those playlists. Each row of the format file
    # is playlist_id,song_id,rating,timestamp.
    music_song_id_list_2 = []  # song id list
    in_file = '../data/output/popular/popular_music_suprise_format.txt'
    with open(in_file, encoding='utf-8') as fh:
        for line in fh:
            music_id, song_id, rating, timestamp = line.strip().split(",")
            if music_id in wanted:
                music_song_id_list_2.append(song_id)
    print(music_song_id_list_2)

    # id --> name; keep the id itself when no name is known
    music_song_id_list = [song_id_name_dic.get(i, i) for i in music_song_id_list_2]
    print(music_song_id_list)

    path = r"D:/music_recommdation/working_data/data/recommendation_list/" + UID
    os.makedirs(path, exist_ok=True)
    with open(path + "/name.txt", "w", encoding='utf-8') as file:
        for line in music_song_id_list:
            file.write(line + '\n')
    with open(path + "/id.txt", "w", encoding='utf-8') as file2:
        for line in music_song_id_list_2:
            file2.write(line + '\n')


if __name__ == '__main__':
    # play_id = use_UID_get_preid('1000002')
    # print(play_id)
    UID = '95441924'
    playid_name_2to(UID)
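The generated result is shown below; the music player later loads its recommendations from here.
5. Evaluating the recommender system
The figure below shows the favorite songs of user UID 1633933 in playlist.csv
The recommender's output follows: you can see that (哦漏) was accurately recommended to the user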
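The check above is done by eye for a single song. One way to put a number on it, sketched here as an assumption rather than something the project implements, is to hold out some of a user's favorites before building the recommendations and then measure how many come back. The function names and the sample ids are hypothetical.

```python
def hit_rate(recommended, held_out):
    """Fraction of held-out favorites that appear in the recommendations."""
    if not held_out:
        return 0.0
    recommended_set = set(recommended)
    hits = sum(1 for song in held_out if song in recommended_set)
    return hits / len(held_out)


def precision_at_k(recommended, held_out, k):
    """Fraction of the first k recommendations that are held-out favorites."""
    top = recommended[:k]
    if not top:
        return 0.0
    return sum(1 for song in top if song in held_out) / len(top)


# Hypothetical ids: "s1" and "s4" were hidden from the recommender.
recommended = ["s3", "s7", "s1", "s9"]
held_out = {"s1", "s4"}
rate = hit_rate(recommended, held_out)
p_at_4 = precision_at_k(recommended, held_out, 4)
```

Averaging these values over many users gives a single score that can be compared across values of k or across similarity measures.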
1. playlist.csv -->
popular_music_suprise_format.txt
# Parse into lines of the form: userid itemid rating timestamp
def is_null(s):
    # Despite its name, this returns True for rows that DO carry song fields
    # (playlist_id,song_id,rating,timestamp); a bare "playlist_id," is dropped.
    return len(s.split(",")) > 2


def parse_song_info(song_info):
    try:
        # Split the song segment on ":::"
        song_id, name, artist, popularity = song_info.split(":::")
        # Binary feedback: the rating is fixed to 1, with a placeholder timestamp.
        return ",".join([song_id, '1', '1300000'])
        # return ",".join([song_id, popularity, '1300000'])  # popularity as rating
    except ValueError:
        return ""


def parse_playlist_line(in_line):
    try:
        contents = in_line.strip().split("\t")
        name, tags, playlist_id, subscribed_count = contents[0].split("##")
        # Prefix every song row with the playlist id
        songs_info = map(lambda x: playlist_id + "," + parse_song_info(x), contents[1:])
        # Keep only rows for which is_null returns True (well-formed rows)
        songs_info = filter(is_null, songs_info)
        return "\n".join(songs_info)
    except Exception as e:
        print(e)
        return False


def parse_file(in_file, out_file):
    with open(out_file, 'w', encoding='utf-8') as out:
        with open(in_file, encoding='utf-8') as src:
            for line in src:
                result = parse_playlist_line(line)
                if result:
                    out.write(result.strip() + "\n")


path = "./data/output/popular/"
parse_file("./data/playlist.csv", path + "popular_music_suprise_format.txt")
2. playlist.csv -->
popular_playlist.pkl # mapping from playlist id to playlist name
popular_song.pkl # mapping from song id to song name
import pickle

"""
playlist id --> playlist name
song id --> song name
playlist id --> sequence of all its song ids (not built in this script)
"""
path = "./data/output/popular/"


def parse_playlist_get_info(in_line, playlist_dic, song_dic):
    contents = in_line.strip().split("\t")
    name, tags, playlist_id, subscribed_count = contents[0].split("##")
    # Build the playlist id -> playlist name mapping
    playlist_dic[playlist_id] = name
    for song in contents[1:]:
        try:
            song_id, song_name, artist, popularity = song.split(":::")
            # Build the song id -> "song name\tartist" mapping
            song_dic[song_id] = song_name + "\t" + artist
        except ValueError:
            print("song format error")
            print(song + "\n")


def parse_file(in_file, out_playlist, out_song):
    playlist_dic = {}  # playlist id -> playlist name
    song_dic = {}      # song id -> song name
    with open(in_file, encoding='utf-8') as fh:
        for line in fh:
            parse_playlist_get_info(line, playlist_dic, song_dic)
    print(playlist_dic)
    # Save the mapping dictionaries as binary pickle files
    with open(out_playlist, "wb") as fh:
        pickle.dump(playlist_dic, fh)
    # Reload with: playlist_dic = pickle.load(open("playlist.pkl", "rb"))
    with open(out_song, "wb") as fh:
        pickle.dump(song_dic, fh)


parse_file("./data/playlist.csv", path + "popular_playlist.pkl", path + "popular_song.pkl")