The Guardian: How to predict whether a user will like a song

Originally published in The Guardian (UK): Datablog: Can you predict who will love a song?

The author, Jeremy Howard, is president and chief scientist of Kaggle, a platform for data analysis and predictive modelling.


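I had planned to translate this, but decided it was better to keep the original text. Keywords: EMI, music recommendation, Shanda Innovations, Kaggle.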


The original text follows:


Data science communities teamed up with EMI to find out how accurately you can predict someone's opinion of a song based on a handful of details about their general musical taste.

 

Contestants set out to analyse EMI's data to predict how listeners would rate songs. Photograph: Judith Collins / Alamy/Alamy

As finales go, it couldn't have been much more tense. With the finish tantalisingly in sight, the relatively unknown frontrunner held a clear and seemingly unbreakable lead, only to find a veteran champion breaking through. And then as the two grappled for first place, in a true Cinderella story, a third darted in from nowhere in the final moments to steal it from them both and claim the victory.


But this nailbiting finish had nothing to do with the Tour de France, the Olympics, or any other kind of traditional sporting event for that matter. Instead, it involved a battle between hundreds of data scientists around the world racing to help shape the future of the music industry. Their task: to develop an algorithm capable of predicting if a listener will love a new song.

Not that long ago such a pursuit would have been considered utter folly and best left to soothsayers and astrologers. Thanks to the sheer scale and quality of data that's now becoming available, and to the development of better algorithms through events such as this, it is now not only quite feasible but rapidly becoming a way of doing business in many industries.

This event, the Music Data Science Hackathon, is clear evidence of that because it involved the music giant EMI Music sharing its highly prized EMI Million Interview Dataset for the very first time. This is a vast and uniquely rich dataset compiled from 20-minute interviews with 800,000 music lovers from 25 different countries, recording their interests, attitudes, behaviours, and their familiarity and appreciation of music. For the data science community in London and those further afield – through Kaggle's online platform – this was a chance to show just what can be achieved when the right kind of data meets the right minds.

Held in partnership with Data Science London, EMI Music, EMC, Lightspeed Research and Kaggle, the challenge was to use this dataset to predict the rating someone would give a song based on their demographic, the artist and track ratings, their answers to questions about musical preferences and the words they use to describe EMI artists.

With a prize fund of £6,500, we saw more than 1,300 entries submitted by 138 different teams. Some of these attended the event in person, while the rest were made up of Kaggle's online community of 45,000 data scientists. We saw a broad range of approaches, from generalised boosted methods to random forests, singular value decomposition to matrix factorisation and collaborative filtering, with no one class of model outperforming all the others.
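
For readers unfamiliar with the techniques named above, here is a rough sketch of one of them, matrix factorisation for collaborative filtering, trained with stochastic gradient descent. It is only an illustration and not the code of any team in the competition. The listener names, track names, 1-to-5 ratings, function names and hyperparameters are all invented for the example, and the real EMI dataset also carries demographic and survey fields that this sketch ignores.

```python
import random


def factorise(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.05, seed=0):
    """Fit a tiny matrix-factorisation model with stochastic gradient descent.

    ratings: list of (user, track, rating) tuples (hypothetical toy format).
    Returns (global_mean, user_bias, track_bias, user_vecs, track_vecs).
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    tracks = sorted({t for _, t, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = {u: 0.0 for u in users}   # per-user offset from the mean
    bt = {t: 0.0 for t in tracks}  # per-track offset from the mean
    # Small random latent vectors for every user and track.
    pu = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    qt = {t: [rng.gauss(0, 0.1) for _ in range(n_factors)] for t in tracks}

    data = list(ratings)
    for _ in range(n_epochs):
        rng.shuffle(data)
        for u, t, r in data:
            pred = mu + bu[u] + bt[t] + sum(a * b for a, b in zip(pu[u], qt[t]))
            err = r - pred
            # Gradient step on the regularised squared error.
            bu[u] += lr * (err - reg * bu[u])
            bt[t] += lr * (err - reg * bt[t])
            for k in range(n_factors):
                p, q = pu[u][k], qt[t][k]
                pu[u][k] += lr * (err * q - reg * p)
                qt[t][k] += lr * (err * p - reg * q)
    return mu, bu, bt, pu, qt


def predict(model, user, track):
    """Predict a rating; unseen users or tracks fall back towards the global mean."""
    mu, bu, bt, pu, qt = model
    score = mu + bu.get(user, 0.0) + bt.get(track, 0.0)
    if user in pu and track in qt:
        score += sum(a * b for a, b in zip(pu[user], qt[track]))
    return score


def rmse(model, ratings):
    """Root-mean-square error of the model on a list of ratings."""
    se = sum((r - predict(model, u, t)) ** 2 for u, t, r in ratings)
    return (se / len(ratings)) ** 0.5


# Invented toy data: (listener, track, rating on a 1-5 scale).
TOY = [
    ("ann", "t1", 5), ("ann", "t2", 4), ("ann", "t3", 1),
    ("bob", "t1", 4), ("bob", "t2", 5), ("bob", "t4", 2),
    ("cat", "t3", 5), ("cat", "t4", 4), ("cat", "t1", 1),
    ("dan", "t3", 4), ("dan", "t4", 5),
]

model = factorise(TOY)
print("training RMSE:", round(rmse(model, TOY), 3))
print("ann on t4:", round(predict(model, "ann", "t4"), 2))
```

The model learns a short vector per listener and per track so that their dot product, plus simple bias terms, approximates the observed ratings. A real entry would measure error on held-out ratings rather than on the training data, as this toy run does.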

The results were outstanding, both in terms of quality and quantity of algorithms. However, in the end there was a very clear winning team, which came from Shanda Innovations, a tech incubator based in Shanghai and Beijing and a rising star in the Kaggle community. As in several previous Kaggle and Data Science London collaborations, the winners' code and algorithms will be open sourced.

But besides showing that it is possible to make these kinds of predictions, this event also uncovered some other nice gems, such as how women tended to be generally more positive than men, using words like "current", "edgy" and "cool" to describe songs, as opposed to "cheap", "unoriginal" and "superficial". Retired people tended to rate songs higher, while students and unemployed people often gave lower ratings. And it was interesting to see correlations between the words people used to describe the same song, often seemingly at odds with each other.

The words "noisy" and "uplifting" are one example. And similarly one person's "superficial" is another's "playful". Another consistent theme was that the characteristics commonly used by the music industry to inform their marketing, such as "age" and "gender", turned out to be not the most powerful predictors after all.

Perhaps the loudest message to take from this is how very qualitative data sets – extremely subjective survey questions about people, their relationship with the music they like, and the words they associate with different tracks – can be mined. It's a great reminder that collaboration, bright minds, and machine learning can be used to understand even a very non-technical question such as "Will you like a new song?"

