spotify音乐下载_使用Python和R对音乐进行聚类以在Spotify上创建播放列表。

最新推荐文章于 2024-09-01 08:26:49 发布

weixin_26752765

最新推荐文章于 2024-09-01 08:26:49 发布

阅读量1.2k

点赞数

文章标签： python linux

原文链接：https://towardsdatascience.com/clustering-music-to-create-your-personal-playlists-on-spotify-using-python-and-k-means-a39c4158589a

版权

该博客介绍了如何通过Python和R对音乐数据进行聚类分析，以在Spotify上创建个人化的播放列表。作者详细讲解了从Spotify下载音乐数据的过程，并应用K-Means算法来对歌曲进行智能分组。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

spotify音乐下载

Spotify is one of the most famous Music Platforms to discover new music. The company uses a lot of different algorithms to recommend the user new music based on their music preferences and most of these recommendations are located in Playlists. These Playlists are created for different users based on a wide diversity of music genres and even Spotify is capable to recommend new music based in moods.

小号 potify是最有名的音乐平台之一，以发现新的音乐。该公司使用许多不同的算法根据用户的音乐喜好向他们推荐新音乐，并且其中大多数推荐都位于播放列表中。这些播放列表是根据各种各样的音乐类型为不同的用户创建的，甚至Spotify也能够根据心情推荐新音乐。

Music has been in my daily routine during all my life, It’s a kind of drug that I need when I’m doing housework, working at the office, walking the dog, workouts and so on. I have a lot of music on Spotify that I always wanted to separate according to the similarities of the songs and save them into different playlists. Fortunately, with a little knowledge of Machine Learning Algorithms and Python, I could achieve that goal !!!.

在我的一生中，音乐一直是我的日常生活，这是我做家务，在办公室工作，walking狗，锻炼身体时需要的一种药物。我在Spotify上有很多音乐，我一直想根据歌曲的相似性进行分离，然后将它们保存到不同的播放列表中。幸运的是，只要对机器学习算法和Python有所了解，我就能实现这一目标！

So to do that, first I will list the tools required and some definitions of the Spotify Audio Features that I will use for built the Clustering model.

为此，首先，我将列出所需的工具以及将用于构建集群模型的Spotify音频功能的一些定义。

工具： (Tools:)

Pandas and Numpy for data analysis.
Pandas和Numpy用于数据分析。
Sklearn to build the Machine Learning model.
Sklearn建立机器学习模型。
Spotipy Python Library (click here for more info).
Spotipy Python库( 单击此处了解更多信息)。
Spotify Credentials to access Api Database and Playlists Modify (click here for more info).
Spotify凭据可访问Api数据库和播放列表修改( 单击此处了解更多信息)。

Spotify音频功能： (Spotify Audio Features:)

Spotify uses a series of different features to classify the tracks. I copy/paste the information from the Spotify Webpage.

Spotify使用一系列不同的功能对曲目进行分类。我从Spotify网页复制/粘贴信息。

Acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
声学：轨道是否声学的置信度，范围为0.0到1.0。 1.0表示音轨是声学的高置信度。
Danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
舞蹈性：舞蹈性是根据节奏，节奏稳定性，拍子强度和整体规律性等音乐元素的组合来描述轨道适合跳舞的方式。值0.0最低可跳舞，而1.0最高可跳舞。
Energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
能量：能量是从0.0到1.0的量度，表示强度和活动的感知量度。通常，充满活力的曲目会感觉快速，响亮且嘈杂。例如，死亡金属具有较高的能量，而巴赫前奏的得分则较低。有助于此属性的感知特征包括动态范围，感知的响度，音色，发作率和一般熵。
Instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
器乐性：预测曲目是否不包含人声。在这种情况下，“哦”和“啊”的声音被当作工具。说唱或说出的单词轨迹显然是“发声的”。器乐性值越接近1.0，则轨道中没有声音的可能性越大。高于0.5的值旨在表示乐器轨迹，但随着该值接近1.0，置信度更高。
Liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides a strong likelihood that the track is live.
生动度：检测记录中是否有听众。较高的活跃度值表示增加了实时执行轨道的可能性。高于0.8的值很有可能会启用该轨道。
Loudness: the overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 db.
响度：轨道的整体响度，以分贝(dB)为单位。响度值是整个轨道的平均值，可用于比较轨道的相对响度。响度是声音的质量，它是身体力量(振幅)的主要心理关联。值通常在-60到0 db之间。
Speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audiobook, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
语音能力：语音能力可检测曲目中是否存在口语。与录音类似的录音(例如脱口秀，有声读物，诗歌)越多，属性值就越接近1.0。高于0.66的值描述的曲目可能完全由口语组成。介于0.33到0.66之间的值描述了可能同时包含音乐和语音的曲目，无论是分段还是分层的(包括说唱音乐)。低于0.33的值最有可能代表音乐和其他非语音类曲目。
Valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
价：从0.0到1.0的小节，描述了曲目传达的音乐积极性。价态高的音轨听起来更积极(例如，快乐，开朗，欣快)，而价态低的音轨听起来更加消极(例如，悲伤，沮丧，愤怒)。
Tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, the tempo is the speed or pace of a given piece and derives directly from the average beat duration.
速度：以每分钟节拍数(BPM)为单位的曲目的总体估计速度。用音乐术语来说，节奏是指给定乐曲的速度或节奏，它直接来自平均拍子持续时间。

For information reduction purposes I decided to use the features of Loudness, Valence, Energy, and Danceability because they have more influence to differentiate between Energetic and Relaxed songs.

为了减少信息量，我决定使用响度，价，能量和可跳舞性的功能，因为它们具有更大的影响力来区分活力和轻松的歌曲。

1.获取和分析数据： (1. Obtaining and Analysing the Data:)

My favorite band is Radiohead, so I decided to obtain their discography and all the music created by the musicians in their solo music careers.

我最喜欢的乐队是Radiohead ，所以我决定获得他们的唱片以及音乐家在其个人音乐事业中创作的所有音乐。

Using the Spotipy Library I created some functions to download all the songs of Radiohead, Thom Yorke, Atoms For Peace, Jonny Greenwood, Ed O’Brien, Colin Greenwood, and Phil Selway (Yes, I’m obsessed with their music hehe). You can access those functions on my Github Repository (click here).

使用Spotipy库，我创建了一些函数来下载Radiohead ， Thom Yorke ， Atoms For Peace ， Jonny Greenwood ， Ed O'Brien ， Colin Greenwood和Phil Selway的所有歌曲(是的，我很着迷于他们的音乐呵呵)。您可以在我的Github存储库中访问这些功能( 单击此处 )。

I obtained the following data:

我获得了以下数据：

Image for post — Data Frame with shape of 423 rows and 10 columns. (Image by author)

I always wondered why I like a lot the music of Radiohead and I realized that most of their songs tend towards melancholy. Describing the features above, the data showed me that Valence and Energy are less than 0.5 and Danceability tends to low values, so I like tracks with low energy and negative sound (there is still something of my 2000’s Emo Side watching MTV Videos).

我一直想知道为什么我非常喜欢Radiohead的音乐，并且我意识到他们的大多数歌曲都倾向于忧郁。描述上述功能后，数据显示价价和能量小于0.5，而可跳舞性趋向于低值，因此我喜欢低能量和负面声音的音轨(仍然有我2000年的Emo Side观看MTV视频的内容)。

Main Stats of Data Frame (Image by author)

2.建立模型： (2. Building the Model:)

I decided to use K-means Clustering for Unsupervised Machine Learning due to the shape of my data (423 tracks ) and considering I want to create 2 playlists separating Relaxed tracks from Energetic tracks (K=2).

由于数据的形状(423个音轨)，我决定将K均值聚类用于无监督机器学习，并考虑到我要创建2个播放列表，将轻松音轨与能量音轨(K = 2)分开。

Important: I’m not using train and test data because in this case I just want to group all the tracks into 2 different groups to create playlists with the entire data.

重要提示 ：我不使用训练和测试数据，因为在这种情况下，我只想将所有轨道分为2个不同的组，以创建包含整个数据的播放列表。

So let’s do it!. I first import the libraries:

因此，让我们开始吧！我首先导入库：

from sklearn.cluster import KMeansfrom sklearn.preprocessing import MinMaxScaler

Then I need to define features and normalize the values of the model. I’ will use MinMaxScaler to preserve the shape of the original distribution and scale the features between a range from 0 to 1. Once I have the values in the correct format, I just simply create the K-Means model and then save the labels into the main Data Frame called “df”.

然后，我需要定义特征并标准化模型的值。我将使用MinMaxScaler保留原始分布的形状，并在0到1的范围内缩放要素。一旦以正确的格式获得了值，我只需创建K-Means模型，然后将标签保存到主数据帧称为“ df”。

col_features = ['danceability', 'energy', 'valence', 'loudness']
X = MinMaxScaler().fit_transform(df[col_features])kmeans = KMeans(init="kmeans++",
                n_clusters=2,
                random_state=15).fit(X)df['kmeans'] = kmeans.labels_

That’s All, I have the music clustered in 2 groups !!!

就是这样，我将音乐分为2组！！！

But now I need to study the features of these labels, so I plot the tracks in a 3D Scatter and then I analyze the respective mean of each feature grouping the data frame by the K-Means result labels.

但是现在我需要研究这些标签的特征，因此我将轨迹绘制在3D散点图中，然后分析通过K-Means结果标签将数据框分组的每个特征的平均值。

As I noticed on the graph the values are quite well grouped, blue values are located in label 0 and red values in label 1. Looking at the table of means, the label 0 grouped tracks with less danceability, energy, valence, loudness, so this one corresponds to Relaxed songs, likewise, the label 1 has the Energetic songs.

正如我在图表上所注意到的，这些值被很好地分组了，蓝色值位于标签0中，红色值位于标签1中。从均值表来看， 标签0分组后的音轨具有较低的可跳舞性，活力，化合价，响度，因此这首歌曲对应于“轻松的歌曲” ，同样， 标签1具有“劲歌” 。

3.具有R的模型的准确性： (3. Accuracy of the Model with R:)

I know that Clustering accuracy is a bit subjective trying to evaluate the best result of a Clustering Algorithm, but in the same way, I wanted to observe if my model is separating the tracks well. So with a little help from Rstudio, I used the Silhouette Analysis. to measure the accuracy of my model.

我知道聚类精度在尝试评估聚类算法的最佳结果时有点主观，但是以同样的方式，我想观察我的模型是否很好地分隔了轨道。因此，在Rstudio的一点帮助下，我使用了“轮廓分析”。来衡量我模型的准确性。

In Rstudio I used the library “cluster” and “factoextra” to visualize and calculate the Silhouette Analysis using the Euclidean distance. the complete code is on my Github Repository (click here):

在Rstudio中，我使用库“群集”和“ factoextra”来使用欧几里得距离可视化和计算“轮廓分析”。完整的代码在我的Github存储库中( 单击此处 )：

#Calculate The euclidean distance of my dataframe values.
dd <- dist(df,method="euclidean")#Silhouette Analysis using the K-means model(km) and distance(dd)
sil.km <- silhouette(km$cluster,dd)
fviz_silhouette(sil.km)

The result is:

结果是：

The Silhouette Analysis is a way to measure how close each point in a cluster is to the points in its neighboring clusters. Silhouette values lies in the range of [-1, 1]. A value of +1 indicates that the sample is far away from its neighboring cluster and very close to the cluster its assigned. Similarly, a value of -1 indicates that the point is close to its neighboring cluster than to the cluster its assigned. So in my case values are between 0.25 and 0.60 inferring that most of the values are quite well grouped.

轮廓分析是一种测量聚类中的每个点与其相邻聚类中的点的接近程度的方法。轮廓值在[-1，1]的范围内。值+1表示样本离其邻近的簇很远，并且非常接近为其分配的簇。类似地，值-1表示该点比其分配的群集更靠近其相邻群集。因此，在我的情况下，值介于0.25和0.60之间，这表明大多数值都进行了很好的分组。

4.在Spotify上创建播放列表： (4. Creating the Playlists on Spotify:)

To create the playlists and add the clustered tracks I use the library Spotipy explained in the first part of this article. You just need to get a client id, client secret, and username code to use the Spotify’s Apis and manipulate your library music. I let you the link with the information (Click Here).

为了创建播放列表并添加群集的曲目，我使用了本文第一部分中解释的库Spotipy。您只需要获取客户端ID，客户端密钥和用户名代码，即可使用Spotify的Apis并操作您的音乐库。我让您链接信息( 单击此处 )。

I had to separate the tracks grouped into 2 different variables and then having the ids of the tracks I just create 2 new playlists and pass them the ids of the tracks. The code is the following one:

我必须将轨道分为2个不同的变量，然后让它们的ID分别创建两个新的播放列表，并将它们的ID传递给它们。代码如下：

#Separating the clusters into new variables
cluster_0 = df[df['kmeans']==0]
cluster_1  = df[df['kmeans']==1]#Obtaining the ids of the songs and conver the id dataframe column to a list.
ids_0 = cluster_0['id'].tolist()
ids_1 = cluster_1['id'].tolist()#Creating 2 new playlists on my Spotify User
pl_energy = sp.user_playlist_create(username=username,
                                           name="Radiohead :)")pl_relaxed = sp.user_playlist_create(user=username,
                                            name="Radiohead :(")#Adding the tracks into the playlists
#For energetic Playlist
sp.user_playlist_add_tracks(user=username,
                            playlist_id = pl_energy['id'],
                            tracks=ids_1)#For relaxed Playlist
sp.user_playlist_add_tracks(user=username,
                            playlist_id = pl_relaxed['id'],
                            tracks=ids_0)

Finally, I have my 2 playlists of Radiohead’s songs using a K-means Clustering Algorithm to separate Energetic Songs from Relaxed Songs !!!!.

最后，我有2个Radiohead歌曲的播放列表，使用K-means聚类算法将精力充沛的歌曲与轻松的歌曲分开！！！！

If you want to listen to both playlists you can access them below.

如果您想同时收听两个播放列表，则可以在下面访问它们。

最具活力的Radiohead歌曲(209首曲目) (Most Energetic Radiohead Songs (209 tracks))

Playlist created by the author

作者创建的播放列表

最轻松的Radiohead歌曲(214首曲目) (Most Relaxed Radiohead Songs (214 tracks))

Playlist created by the author

作者创建的播放列表

结论 (Conclusion)

Machine Learning Algorithms are a lot of fun to implement ideas or projects related to things you like. In my case, I like music a lot, so I could use this knowledge to create cool ways helping me to automate a task that can take a long time to perform it. I also could learn more about this amazing world of Data Science and my tendencies to music tastes.

机器学习算法对于实现与您喜欢的事物相关的想法或项目很有趣。就我而言，我非常喜欢音乐，因此我可以利用这些知识来创建一些很酷的方法，以帮助我自动执行一项需要很长时间才能完成的任务。我还可以了解有关数据科学这个神奇世界的更多信息，以及我对音乐品味的倾向。