Netflix Reveals All (well, at least a lot)



Last night I had the distinct pleasure of attending a Data Science Track event sponsored by the LA Machine Learning meetup group: Data Science @ Netflix. Held at the new, much larger, Cross Campus location in Santa Monica, the event attracted 250 people with another hundred-plus on hand at a satellite location in Pasadena using a streaming video link. Presenting were Douglas Twisselmann, Ph.D., Senior Data Scientist, and Kevin Wylie, Director of Content Data Science, from the Netflix content team in Beverly Hills. Netflix has another data science group in Los Gatos, Calif.

The Netflix content team is tasked with licensing, purchasing, and developing the best TV shows and movies for its 44 million users in 41 countries. The talk gave an overview of what the content data science teams do for the organization toward four goals: identifying the characteristics of an “ideal” content library, predicting demand for titles that Netflix does not have, determining the customer impact of adding or losing sets of content, and helping to identify the next original series. In addition, the speakers covered some data and techniques one might use in demand prediction. Here is a slide describing the Netflix data pipeline:

[Slide: Netflix data pipeline]

Netflix does it right by having both a Data Science Engineering group and a Science & Algorithms group. They wisely keep two distinct teams, one for engineering and one for theoretical data science (mathematical statistics, probability theory, machine learning), instead of trying to hire unicorns as many other companies do. The Netflix corporate culture was also discussed: “high performance” is valued above all else, i.e., you can be fired for being average. It sounds like a pressure cooker, but some people thrive in work environments like that.

One cool slide included in the presentation, and worth the price of admission in my opinion, was a list of machine learning technology Netflix uses in one form or another:

  • Regression models (logistic, linear, elastic nets)
  • GBDT/RF
  • SVD & other MF models
  • Factorization machines
  • Restricted Boltzmann machines
  • Markov Chains and other graphical models
  • Clustering (from k-means to HDP)
  • Deep ANN
  • LDA
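
For readers who have not run into the "SVD & other MF models" entry before, here is a minimal sketch of the general idea in Python with NumPy. It is purely illustrative and says nothing about how Netflix actually implements it: the ratings matrix is made-up toy data and the function name `low_rank_approximation` is hypothetical. The sketch compresses a small viewer-by-title matrix down to two latent factors using a truncated singular value decomposition.

```python
import numpy as np

# Hypothetical toy data (not from the talk): rows are viewers,
# columns are titles, values are ratings on a 1-5 scale.
ratings = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])


def low_rank_approximation(matrix, rank):
    """Approximate `matrix` with a rank-`rank` matrix via truncated SVD."""
    # Full decomposition: matrix = u @ diag(s) @ vt, singular values sorted descending.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    return u[:, :rank] @ np.diag(s[:rank]) @ vt[:rank, :]


approx = low_rank_approximation(ratings, rank=2)
print(np.round(approx, 2))
```

Keeping two factors is an arbitrary choice for this toy matrix, which was built with two obvious taste groups. Real rating data is mostly missing entries, so a plain SVD like this one would not apply directly without some handling of the gaps.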

Another slide showed a group of academic books favored by the Netflix data science team and, lo and behold, I saw my favorite book!

The Netflix data science guys were as candid as they were allowed to be with their insights into how the company maximizes its data assets. However, there were a number of limitations on what they could talk about, especially how they utilize user rankings. But all Netflix customers intrinsically know that its recommender systems are second to none. We were treated to some actual tweets from customers sent after they received a highly targeted e-mail. One woman responded this way to an alert that another season of “The Office” would be available: “Netflix, you understand me better than any man has!” I think that says it all.

April 10, 2014 by Daniel Gutierrez

Original: http://inside-bigdata.com/2014/04/10/netflix-reveals/


