Netflix Reveals All (well, at least a lot)

Last night I had the distinct pleasure of attending a Data Science Track event sponsored by the LA Machine Learning meetup group: Data Science @ Netflix. Held at the new, much larger, Cross Campus location in Santa Monica, the event attracted 250 people with another hundred-plus on hand at a satellite location in Pasadena using a streaming video link. Presenting were Douglas Twisselmann, Ph.D., Senior Data Scientist, and Kevin Wylie, Director of Content Data Science, from the Netflix content team in Beverly Hills. Netflix has another data science group in Los Gatos, Calif.

The Netflix content team is tasked with licensing, purchasing, and developing the best TV shows and movies for its 44 million users in 41 countries. The talk gave an overview of what the content data science teams do for the organization: identifying the characteristics of an “ideal” content library, predicting demand for titles that Netflix does not have, determining the customer impact of adding or losing sets of content, and helping to identify the next original series. The speakers also covered some of the data and techniques one might use in demand prediction. Here is a slide describing the Netflix data pipeline:

Netflix does it right with both a Data Science Engineering group and a Science & Algorithms group. They wisely have two distinct teams, one for engineering and one for theoretical data science (mathematical statistics, probability theory, machine learning), instead of trying to hire unicorns like many other companies. The Netflix corporate culture was also discussed: “high performance” is valued above all else, i.e., you can be fired for being average. It sounds like a pressure cooker, but some people thrive in work environments like that.

One cool slide included in the presentation, and worth the price of admission in my opinion, was a list of machine learning technology Netflix uses in one form or another:

Regression models (logistic, linear, elastic nets)
GBDT/RF
SVD & other MF models
Factorization machines
Restricted Boltzmann machines
Markov Chains and other graphical models
Clustering (from k-means to HDP)
Deep ANN
LDA
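For readers who have not met the items on this list, here is a minimal sketch of one of the simplest, a first-order Markov chain, which estimates the probability of moving from one state to the next by counting observed transitions. The function name `transition_probabilities` and the genre sessions are made up for illustration; nothing here reflects how Netflix actually applies these models, which the speakers did not detail.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities.

    Counts how often each state is followed by each other state,
    then normalizes the counts for every starting state.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each state with its successor in the same sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    probs = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[state] = {nxt: c / total for nxt, c in next_counts.items()}
    return probs


# Hypothetical viewing sessions (invented data, not from the talk).
sessions = [
    ["comedy", "comedy", "drama"],
    ["comedy", "drama", "drama"],
    ["drama", "comedy", "drama"],
]

probs = transition_probabilities(sessions)
print(probs["comedy"])  # estimated probabilities of what follows "comedy"
```

With this toy data, "comedy" is followed by "drama" in three of its four observed transitions, so the estimated probability is 0.75. A real system would need far more data and some smoothing for transitions that were never observed.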

Another slide showed a group of academic books favored by the Netflix data science team and, lo and behold, I saw my favorite book!

The Netflix data science guys were as candid as they were allowed to be about how the company maximizes its data assets. However, there were a number of limitations on what they could talk about, especially how they utilize user rankings. But all Netflix customers intrinsically know that its recommender systems are second to none. We were treated to some actual tweets from customers who had received a highly targeted e-mail. One woman responded this way to an alert that another season of “The Office” would be available: “Netflix, you understand me better than any man has!” I think that says it all.

April 10, 2014 by Daniel Gutierrez

Original article: http://inside-bigdata.com/2014/04/10/netflix-reveals/
