by Gautham Koorma

Rolling Stone’s 500 Greatest Albums Visualized Using Pandas and Bokeh

In 2003, Rolling Stone magazine polled musicians, producers, and industry executives about their favorite albums. The result was a special issue titled “The 500 Greatest Albums of All Time.”

The list, which was revised in 2012, mainly features American and British music from the 1960s and 1970s.

As an ardent music fan and an aspiring music producer, I listen to a wide variety of genres. The Rolling Stone list served as my introduction to rock music back in the day.

One day, while browsing through Kaggle for a simple data set to test my newly acquired data visualization skills, I stumbled upon the list uploaded as a CSV dataset. I decided to get my hands dirty by using pandas to explore the data and bokeh to visualize the results.

Bokeh is a Python library for interactive visualization. It features a powerful interface that supports high-level charting, intermediate-level plotting, and lower-level modeling.

The complete code I used for reading, refining, exploring, and visualizing the data can be found on my GitHub page, and also in this notebook submitted on Kaggle.

This post will describe the approaches I took, complete with my visualizations and the insight I gained from building them.

Getting and Structuring the Data

Getting the data was simple, since it was in a 500 x 6 Excel spreadsheet. All I had to do was read it directly into a pandas data frame using the read_excel() function.

The data frame had 500 rows, one per album, listing the Chart Number, Year, Album, Artist, Genre, and Subgenre. The Genre and Subgenre columns each held multiple comma-separated values in a single string, so I split each string at the first comma and kept only the first value in a new column, treating it as the most relevant categorization of the album’s Genre and Subgenre.
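
The split described above might look roughly like the sketch below. The three rows are invented toy data rather than entries from the real list, and the column names Genre_Refined and Subgenre_Refined follow the spelling used later in this post; the actual notebook may differ.

```python
import pandas as pd

# Toy rows standing in for the real 500-row data frame (values are invented).
albums = pd.DataFrame({
    "Album": ["Album 1", "Album 2", "Album 3"],
    "Genre": ["Rock", "Rock, Pop", "Funk / Soul, Jazz"],
    "Subgenre": ["Blues Rock, Rock & Roll", "Pop Rock", "Soul"],
})

# Split each string once, at the first comma, and keep the part before it.
albums["Genre_Refined"] = albums["Genre"].str.split(",", n=1).str[0].str.strip()
albums["Subgenre_Refined"] = albums["Subgenre"].str.split(",", n=1).str[0].str.strip()

print(albums[["Genre_Refined", "Subgenre_Refined"]])
```

Values without a comma pass through unchanged, so the same expression works for every row.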

The master data frame became 500 x 8 after the Genre_Refined and Subgenre_Refined columns were added.

I used a Python 3.5.2 kernel (Anaconda 4.2.0 distribution) on a Jupyter notebook.

Exploring the Data and Gaining Insights

In most cases I adopted the split-apply-combine strategy using pandas’ built-in groupby() method; in one case I reshaped the data with the built-in pivot_table() function instead. I fed the resulting data frames into bokeh charts and figures.

Here are the questions I posed and their resulting visualizations.

The top-10 artists who have the most albums on the list

To get the top 10 artists, I used groupby() on the Artist column, took a count, and sorted the resulting data frame to find the 10 artists with the most albums.
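
As a rough illustration of that group, count, and sort sequence, here is a minimal sketch on invented toy data. The artist and album names are placeholders, and the variable names are my own rather than those in the notebook.

```python
import pandas as pd

# Toy data: three placeholder artists with 3, 2, and 1 albums respectively.
albums = pd.DataFrame({
    "Artist": ["Artist A", "Artist B", "Artist A", "Artist C", "Artist A", "Artist B"],
    "Album": ["A1", "B1", "A2", "C1", "A3", "B2"],
})

top_artists = (
    albums.groupby("Artist")["Album"]  # split by artist
    .count()                           # apply: albums per artist
    .sort_values(ascending=False)      # most albums first
    .head(10)                          # keep at most ten artists
)

print(top_artists)
```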

To visualize the results, I used a figure object from the bokeh.plotting module and drew black circles using the circle() method.
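
A plot of this kind could be sketched as follows, using toy counts in place of the real ones. Note one deliberate substitution: the sketch calls scatter(), which draws circle markers by default, instead of circle(), because to my understanding the behaviour of circle() has changed across Bokeh releases. The title and marker size are arbitrary choices.

```python
import pandas as pd
from bokeh.plotting import figure

# Toy counts standing in for the real top-10 result.
top_artists = pd.Series(
    [3, 2, 1], index=["Artist A", "Artist B", "Artist C"], name="Albums"
)

artists = list(top_artists.index)
counts = [int(n) for n in top_artists.values]

# A categorical x-axis: one slot per artist name.
p = figure(x_range=artists, title="Albums per artist (toy data)")

# One black circle marker per artist, at the height of the album count.
p.scatter(x=artists, y=counts, size=12, color="black")

# Tilt the artist names so longer labels do not overlap.
p.xaxis.major_label_orientation = 0.8
```

Passing p to bokeh.plotting.show() renders the figure in a browser or, after calling output_notebook(), inline in a Jupyter notebook.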

Clearly, the Beatles, Bob Dylan, and the Rolling Stones topped the list with 10 albums apiece.

Year-wise count of the number of albums in the list

To get this, I used groupby() on the Year column and took a count. I then sorted the data by year and plotted the resulting data frame using a line chart from bokeh.charts.
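
The counting step can be sketched on toy data as below. Only the pandas part is shown; as far as I know, bokeh.charts has since been removed from Bokeh, so with a current release the line would need to be drawn with bokeh.plotting instead. The years and album names here are placeholders.

```python
import pandas as pd

# Toy data: placeholder albums spread over three release years.
albums = pd.DataFrame({
    "Year": [1970, 1967, 1970, 1991, 1967, 1970],
    "Album": ["A1", "A2", "A3", "A4", "A5", "A6"],
})

# Count albums per year, then order the result chronologically.
per_year = albums.groupby("Year")["Album"].count().sort_index()

# The year with the most albums in this toy sample.
peak_year = per_year.idxmax()

print(per_year)
print(peak_year)
```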

The largest number of albums on the list were released in 1970, and albums from the late 1960s and early 1970s are also abundant. A final spike appears in the early 1990s, which corresponds to the rise of Pop, R&B, and Hip-Hop music.

Top Genres and Subgenres

To identify the top genres and the subgenres within them, I reshaped the data using the pandas pivot_table() function, setting the index to the Genre_Refined and Subgenre_Refined columns and the aggfunc parameter to count.

After taking a subset of the data frame using a condition that there should be more than 5 albums in a subgenre, I fed the data frame to a bokeh donut chart and set the palette to Purples9.
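
The reshape and the "more than 5 albums" filter might look something like this sketch. The rows are invented, and counting the Album column is my assumption about which values were aggregated.

```python
import pandas as pd

# Toy data: 6 Blues Rock, 2 Pop Rock, and 2 Bop albums (all invented).
albums = pd.DataFrame({
    "Album": [f"Album {i}" for i in range(1, 11)],
    "Genre_Refined": ["Rock"] * 8 + ["Jazz"] * 2,
    "Subgenre_Refined": ["Blues Rock"] * 6 + ["Pop Rock"] * 2 + ["Bop"] * 2,
})

# One row per (genre, subgenre) pair, holding the number of albums in it.
counts = albums.pivot_table(
    index=["Genre_Refined", "Subgenre_Refined"],
    values="Album",
    aggfunc="count",
)

# Keep only the subgenres with more than 5 albums.
popular = counts[counts["Album"] > 5]

print(popular)
```

The two-level index keeps each subgenre nested under its genre, which is the shape a two-ring donut chart needs.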

Rock and its subgenres cover about 80% of the selection. Hip-Hop, R&B, Soul, Country, and Electronic music albums make up the remaining 20%.

Albums in each Genre by year

To get this data, I did a groupby() on Year and Genre_Refined, took the count, sorted the values by Year, and fed the resulting data frame to a bokeh heatmap. This time I used the Reds9 palette.
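
Below is a hedged sketch of this step on toy data. The heatmap chart I used came from bokeh.charts, which, as far as I know, is no longer part of Bokeh, so the sketch approximates it with rect() glyphs from bokeh.plotting and a linear_cmap over Reds9. The rows, the Count column name, and the title are all illustrative assumptions.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Reds9
from bokeh.plotting import figure
from bokeh.transform import linear_cmap

# Toy data (invented rows).
albums = pd.DataFrame({
    "Year": [1967, 1967, 1967, 1970, 1970, 1991],
    "Genre_Refined": ["Rock", "Rock", "Jazz", "Rock", "Funk / Soul", "Hip Hop"],
    "Album": ["A1", "A2", "A3", "A4", "A5", "A6"],
})

# Count albums per (year, genre) pair and order the rows by year.
counts = (
    albums.groupby(["Year", "Genre_Refined"])["Album"]
    .count()
    .reset_index(name="Count")
    .sort_values("Year")
)

# Categorical axes need string factors, so years are converted to strings.
years = [str(y) for y in counts["Year"]]
genres = counts["Genre_Refined"].tolist()
source = ColumnDataSource(
    data={"Year": years, "Genre": genres, "Count": counts["Count"].tolist()}
)

p = figure(
    x_range=sorted(set(years)),
    y_range=sorted(set(genres)),
    title="Albums per genre and year (toy data)",
)

# One cell per (year, genre); Reds9 is reversed so higher counts are darker.
p.rect(
    x="Year", y="Genre", width=1, height=1, source=source, line_color=None,
    fill_color=linear_cmap(
        "Count", Reds9[::-1],
        low=int(counts["Count"].min()), high=int(counts["Count"].max()),
    ),
)
```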

Rock albums from the late 60s and the 70s are clearly the most numerous. Funk, Soul, and Jazz albums declined in number over the years, paving the way for Hip-Hop and Electronic albums.

Subgenres of Rock Over the Years

To get this data, I did a groupby() on the Year, Genre_Refined, and Subgenre_Refined, took a count, and subset the data frame to include just Rock in the Genre_Refined column. I then fed the resulting data frame to a bokeh heatmap.
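
The three-column grouping and the Rock filter can be sketched as follows, again on invented rows; the Count column name is my own choice.

```python
import pandas as pd

# Toy data (invented rows).
albums = pd.DataFrame({
    "Year": [1965, 1965, 1965, 1975, 1975],
    "Genre_Refined": ["Rock", "Rock", "Jazz", "Rock", "Funk / Soul"],
    "Subgenre_Refined": ["Blues Rock", "Blues Rock", "Bop", "Pop Rock", "Soul"],
    "Album": ["A1", "A2", "A3", "A4", "A5"],
})

# Count albums per (year, genre, subgenre) combination.
counts = (
    albums.groupby(["Year", "Genre_Refined", "Subgenre_Refined"])["Album"]
    .count()
    .reset_index(name="Count")
)

# Keep only the rows whose refined genre is Rock.
rock = counts[counts["Genre_Refined"] == "Rock"]

print(rock)
```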

The first few years were dominated by Rock & Roll, while Blues Rock and Pop Rock slowly increased in number by the mid-1960s. By the mid-1970s, Alternative Rock started coming into the picture, followed by Indie Rock in the mid-1980s.

A summary of the Top 10 albums

Finally, I summarized the top 10 albums in the list after grouping it by artist.

The final results are not really surprising. The Rolling Stone list mostly contains albums from Rock and its various subgenres, with a few outliers in the form of Hip-Hop, R&B, Soul, Country, and Electronic music albums.

If you’re like me and like to occasionally reconnect with the music of the Beatles, Bob Dylan, the Rolling Stones, and the other pioneers of Rock and Roll during the 60s and 70s, I suggest you give these top albums a listen, then explore from there.

If you’re curious, you can read the full list of albums here.

I’m a technology consultant, data science enthusiast, and aspiring music producer. If you have writing opportunities or are interested in getting in touch for work, feel free to write to me at contact at gautham dot biz.

If you liked this article, please hit the recommend button and share it with your friends.

Translated from: https://www.freecodecamp.org/news/visualising-rolling-stones-500-greatest-songs-using-bokeh-78ebc0eaff3f/
