Applications of Data Science and Machine Learning in Netflix

Using data science, Netflix has surpassed its competition and now has over 100 million users globally. Data science helps Netflix keep track of your likes and dislikes to make sure you stay satisfied.

Photo by fauxels from Pexels

The Concept

Data science is a combination of tools, algorithms, and machine learning principles that help users find useful patterns in raw data. A data scientist can predict future occurrences of an event by using advanced machine learning algorithms. The data generated by the Internet of Things (IoT) has fuelled the rise of data science, making data one of the most valuable resources for companies today.

The Aim

Netflix has always strived to improve its user interface at every level. Its primary goal is to add contextual awareness to its recommendations; that is, each recommendation should have a clear rationale behind it. According to DataFlair, two types of contextual classes are relevant to Netflix.

1. Explicit

● Location

● Language

● Time of day

● Device

2. Inferred

● Binge-watching patterns

● Companion
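
The two context classes above can be pictured as a pair of small records that are flattened into features for a recommender. This is only an illustrative sketch: the class names, field names, and the day/night cut-off are assumptions made for the example, not Netflix's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExplicitContext:
    """Signals the client can report directly (hypothetical fields)."""
    location: str
    language: str
    hour_of_day: int  # 0-23, local time
    device: str


@dataclass(frozen=True)
class InferredContext:
    """Signals estimated from behaviour; None means "not known yet"."""
    binge_watcher: Optional[bool] = None
    companion: Optional[str] = None  # e.g. "alone" or "family"


def context_features(explicit: ExplicitContext, inferred: InferredContext) -> dict:
    """Flatten both context classes into one feature dictionary."""
    is_night = explicit.hour_of_day >= 20 or explicit.hour_of_day < 5  # assumed cut-off
    features = {
        "location": explicit.location,
        "language": explicit.language,
        "daypart": "night" if is_night else "day",
        "device": explicit.device,
    }
    # Inferred signals are added only once they have been estimated.
    if inferred.binge_watcher is not None:
        features["binge_watcher"] = inferred.binge_watcher
    if inferred.companion is not None:
        features["companion"] = inferred.companion
    return features
```

Keeping the inferred fields optional reflects the fact that they are estimates that may be missing for a new user, whereas the explicit fields are always available.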

Photo by energepic.com from Pexels

The Application

Netflix has used data science to ensure that users get value for money. With the help of various analytical tools, the streaming giant identifies users' likes and inclinations and directs them towards similar options. A study suggests that recommendations influence more than 80% of all content streamed on Netflix.

Netflix does not use a conventional Hadoop data warehouse. Instead, it stores its data in Amazon S3, which lets it spin up multiple Hadoop clusters for different workloads that all access the same data. It uses Hive for ad hoc queries and analytics, and Pig for ETL (extract, transform, load).

The Data

To begin its analysis, Netflix gathers raw data, from which it extracts useful information using data science algorithms. A combination of these algorithms transforms plain numbers into a detailed recommendation plan. For every 5 minutes a user spends scrolling, Netflix can predict more than 40% of their relative selection patterns. Netflix collects, captures, and stores data in several areas.

● Time: The first step is to record the time and date when users stream content. It helps Netflix identify your Sunday-night horror movie plans or your afternoon thriller preferences.

● Searches: All search terms are stored automatically so that later recommendations can be steered towards them. Let's say you search for "John Wick," watch the movie, and close Netflix. The next time you open the application, you will very likely find more action movies or more films starring Keanu Reeves.

● Browsing and scrolling behavior: Netflix also uses advanced analytical programs to identify which movie or TV show you stopped to read about. It helps Netflix showcase more similar content to catch your eye and get you interested again.

● Pause/Fast-forward: Netflix records the exact points at which a user pauses or fast-forwards while streaming content. This helps it identify which kinds of scenes are preferred over others. If you skip an emotional scene in an action movie, the algorithm learns to avoid emotional movies in future recommendations. But if you re-watch an emotional scene, it adapts accordingly.

● Device used: If you use different devices to stream different content, this distinction is stored permanently. For example, children watching cartoons on the home TV will not be recommended the movies their parents watch on the iPad, even though both use the same account.
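
As a rough illustration of how the time and device signals in the list above could be combined, the sketch below counts the most-played genre for each (weekday, part of day, device) slot. The `PlayEvent` record, the day-part boundaries, and the function names are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


@dataclass(frozen=True)
class PlayEvent:
    """One streaming session (hypothetical schema)."""
    started_at: datetime
    device: str
    genre: str


def daypart(ts: datetime) -> str:
    """Bucket a timestamp into a coarse part of the day (assumed boundaries)."""
    if 5 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 18:
        return "afternoon"
    return "night"


def favourite_genres(events):
    """Return the most-played genre for each (weekday, daypart, device) slot."""
    counts = defaultdict(Counter)
    for e in events:
        slot = (WEEKDAYS[e.started_at.weekday()], daypart(e.started_at), e.device)
        counts[slot][e.genre] += 1
    return {slot: c.most_common(1)[0][0] for slot, c in counts.items()}
```

Because the device is part of the slot key, cartoons played on the TV never influence what is suggested for the iPad, which mirrors the behaviour described in the last bullet.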

Photo by Lukas from Pexels

The Project

Netflix uses data at every level possible. From the moment a user logs in until they log out, it stores all the information it needs. It then channels this data into actionable information. The most famous story of Netflix's marketing is how it acquired the "House of Cards" series. The series, starring Kevin Spacey and directed by David Fincher, was one of its biggest hits. Netflix spent more than a hundred million dollars to acquire the series, for several reasons.

● Netflix identified a vast fan base for actor Kevin Spacey, who has acted in movies such as 21 and American Beauty.

● It also checked which movies were trending and popular on its platform. Movies like Fight Club and The Social Network, both directed by the renowned David Fincher, were highly rated and widely viewed by its audience.

● Netflix also looked at the statistics for the British version of the series, which had been released earlier. The UK version was well received by its target audience, which strengthened the case.

● Political drama was one of its most active genres, with titles like Elizabeth I: The Virgin Queen and Winnie Mandela proving popular on the site.

Algorithms linked all of these factors into a pattern that persuaded Netflix to spend big on House of Cards. The series then became a massive hit and climbed to the #1 position on its trending charts, making it a successful and profitable analysis.

Photo by Bich Tran from Pexels

The Benefits

Why would a company like Netflix, with its dominant market position, spend time on data science? The answer is consumer retention. It is crucial to attract new customers while retaining existing ones. Thanks to its data analysis tools, users have preferred Netflix over other service providers such as Hotstar and Amazon Prime. Netflix has drawn millions of users to its platform, achieving 20 billion dollars in revenue in 2019.

The Outcome

Netflix gained more than 3.1 million subscribers after the release of House of Cards, most of them in the US. This helped Netflix in several ways.

● Revenue: Newly subscribed users added more than 72.5 million dollars in revenue for Netflix. That was more than 75% of the combined investment Netflix made to air both seasons of the show.

● Word of mouth: Adding a large number of users and tending to their needs using data science helped Netflix gain even more popularity globally. It also led to a steady stream of new users through referrals, creating further growth opportunities.

The Display

Every section on Netflix's home page is unique to the user's account. Each section is displayed based on a vast set of collected data, combined to produce the most relevant recommendations.

1. Trending

The Trending section is tailored to the location and preferences of the user. Chris Hemsworth's Extraction was at the top of the Trending list in India just after its release. Every user in India who had viewed action content or Chris Hemsworth's movies was recommended Extraction.

Image by Jorge Gryntysz from Pixabay

2. Continue Watching

This section collects the content that a user has begun streaming but has left unfinished. The playback position is stored so that streaming resumes at the exact scene where it was previously paused or stopped.
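
A minimal sketch of this resume behaviour is shown below. The `ContinueWatching` class and the 95% "finished" cut-off are assumptions made for illustration; the article does not describe how Netflix decides that a title is complete.

```python
class ContinueWatching:
    """Tracks unfinished titles and where to resume them (illustrative only)."""

    def __init__(self, finished_ratio: float = 0.95):
        # Assumed rule: a title counts as finished once this share has played.
        self.finished_ratio = finished_ratio
        self._positions = {}  # title -> seconds already watched

    def record_stop(self, title: str, position_s: float, duration_s: float) -> None:
        """Store the playback position when the user pauses or leaves."""
        if position_s >= self.finished_ratio * duration_s:
            self._positions.pop(title, None)  # finished: drop it from the row
        else:
            self._positions[title] = position_s

    def resume_position(self, title: str) -> float:
        """Seconds into the title at which playback should restart."""
        return self._positions.get(title, 0.0)

    def row(self) -> list:
        """Titles to show in the Continue Watching row."""
        return list(self._positions)
```

For example, after `record_stop("Polar", 1200, 7000)` the title appears in `row()` and resumes at 1200 seconds; once the user watches it almost to the end, it drops out of the row.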

3. Genre Content

If a user frequently watches action movies, a separate section named "Violent Movies" is created, containing popular action movies with plenty of violent scenes. If a user watches shows like Money Heist (a top-rated show about thieves in Spain), they will find an additional section named "Risk-Taker and Rule-Breaker TV" on their home page.

4. Because You Watched

There is also a combination section, where all other data is factored in. Suppose a user watched the movie Polar: a new section called "Because you watched Polar" is created, containing other movies with the same genre, actors, directors, and producers.
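
One simple way to build such a row is to score every other title by how many attributes it shares with the watched one. The sketch below uses Jaccard similarity over attribute sets; this is a generic content-based technique swapped in for illustration, not Netflix's confirmed method, and the function names and the tag format are assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Share of attributes two titles have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def because_you_watched(watched: str, catalog: dict, top_n: int = 3) -> list:
    """Rank other titles by attribute overlap with the watched title.

    `catalog` maps a title to a set of tags such as "genre:action",
    "actor:...", "director:..." or "producer:..." (hypothetical format).
    """
    seed = catalog[watched]
    scored = [
        (jaccard(seed, attrs), title)
        for title, attrs in catalog.items()
        if title != watched
    ]
    # Keep only titles that share something, best match first;
    # ties are broken alphabetically so the output is stable.
    scored = [(s, t) for s, t in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]
```

A title that shares both the genre and an actor with the watched movie ranks above one that shares only the genre, and titles with nothing in common are left out of the row altogether.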

Netflix wants people to wonder how it always has a ready-made list that will entertain them. Every pause, scroll, and log-in time is used to improve the user interface as much as possible.

Photo by Stas Knop from Pexels

The Testing

Netflix runs background tests at scale to understand how well its data-driven recommendations work. The results and statistics from these tests determine whether a set of algorithms should be rolled out across its platforms globally.

Personalization Based on Interleaving

Netflix conventionally relied on A/B testing, where two candidate algorithms are tested on two different sets of sample users. The results of these tests depend on how well the recommendation section appeals to each sample. On its own, this approach proved impractical for comparing many candidate algorithms, because every comparison needs its own large sample.

Netflix adopted a new method of testing: interleaving, in which the rankings produced by two algorithms are blended into a single list shown to the same user, to decide on the best page-ranking algorithm for improving the user interface. This method benefited the American media service provider in several ways.

Image by Michal Jarmoluk from Pixabay

● Cost-friendly: Interleaving involves blending, which means Netflix carried out two tests for the price of one. Background testing is expensive, and this method saved a significant part of that cost.

● Time-saving: Combining two tests into one frees up time for other work and produces results quickly. Time is money, so this is considered a more suitable and profitable way of testing.
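
To make the blending idea concrete, here is a minimal sketch of team-draft interleaving, one common interleaving scheme. The article does not say which variant Netflix uses, so treat the function names and the play-based credit rule as assumptions for illustration.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Blend two rankings into one list, remembering who contributed each title."""
    rng = rng or random.Random()
    rankings = {"A": ranking_a, "B": ranking_b}
    interleaved, team_of = [], {}
    count = {"A": 0, "B": 0}
    total = len(set(ranking_a) | set(ranking_b))
    while len(interleaved) < total:
        # The team with fewer picks goes next; ties are broken by a coin flip.
        if count["A"] != count["B"]:
            team = "A" if count["A"] < count["B"] else "B"
        else:
            team = rng.choice("AB")
        pick = next((t for t in rankings[team] if t not in team_of), None)
        if pick is None:
            # This ranker has nothing new to add, so the other one picks.
            team = "B" if team == "A" else "A"
            pick = next(t for t in rankings[team] if t not in team_of)
        interleaved.append(pick)
        team_of[pick] = team
        count[team] += 1
    return interleaved, team_of


def credit_plays(team_of, played):
    """Count how many played titles each ranker contributed."""
    wins = {"A": 0, "B": 0}
    for title in played:
        if title in team_of:
            wins[team_of[title]] += 1
    return wins
```

Because every user sees titles from both rankers in the same list, one sample serves both algorithms: the ranker whose titles collect more plays is preferred, which is where the cost and time savings described above come from.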

The Importance

As the world moves into the future, digitization has become the norm. The number of users on the Internet keeps growing rapidly, creating intense competition among media service providers like Netflix and Amazon Prime.

1. Engagement

Data science helps Netflix increase user participation powerfully and creatively. Analytics creates a virtual rapport between the user and the service provider, and Netflix aims to build on this rapport with its market share advantage.

2. Solution

Netflix aims to use data science as its go-to approach for problem-solving. There are plenty of problems that data science can help with.

● Low reach: Recommendations on Netflix can improve the view count of overlooked content. This helps Netflix keep its audience engaged on its platform.

● Feedback and ratings: Analytical programs and probability models help Netflix average a cluster of user ratings to categorize content according to how well it impresses viewers.

● Policy control: Netflix has a strict policy that discourages multiple people from sharing a single account. It allows up to five individual profiles to access the service with one account. Data science is used to monitor the devices that log in to the same account in order to detect breaches.

Video by BUMIPUTRA from Pixabay

● Innovation and efficiency: The critical quality of data science is that it never goes out of fashion. Machine learning continually adapts to the present, using previously stored data to predict future outcomes. For Netflix, efficiency means delivering the right content to the right user.

● Decision making: Gathering data to make decisions is not, by itself, the key to success. The key lies in mastering analytics to use the data and channel it in the right direction. Netflix has used data science to identify the opportunities and paths available to it.

● Personalization: In a market where goods are sold physically, a consumer can ask for a personalized product, test it, and purchase it. Data science has helped Netflix stretch its range to meet the customized demands of the public.

A consumer is satisfied when the right product is available at the right time and place, for the right price. Netflix has made its users' lives more convenient by putting high-quality, relevant content at their fingertips.

The Conclusion

It all comes down to one question:

Based on the historical actions taken by a user and the data available, what is the most probable video a user will play right now?
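
As a toy illustration of that question, the sketch below estimates the probability of each title being played in a given context from past plays, using add-alpha (Laplace) smoothing so that unseen titles keep a small probability. The context labels, function names, and the smoothing choice are assumptions for the example; a production model would use far richer signals.

```python
from collections import Counter


def play_probabilities(history, catalog, context, alpha=1.0):
    """Estimate P(title is played | context) from (context, title) pairs."""
    counts = Counter(
        title for ctx, title in history
        if ctx == context and title in catalog
    )
    # Add-alpha smoothing: every catalog title gets `alpha` pseudo-plays.
    total = sum(counts.values()) + alpha * len(catalog)
    return {title: (counts[title] + alpha) / total for title in catalog}


def most_probable(history, catalog, context):
    """Return the title with the highest estimated play probability."""
    probs = play_probabilities(history, catalog, context)
    return max(probs, key=probs.get)
```

Here `history` might look like `[("night", "show_a"), ("day", "show_c")]`, and asking for the "night" context returns whichever title was played most often at night.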

Photo by Dominika Roseclay from Pexels

The list of recommendations can be prepared within seconds using probability models and analytical programs. Data science has become an integral part of a growing world, and it has built the foundation on which companies like Netflix will develop their future. With it, Netflix has reduced its scope for error, improved its user interface, and boosted user engagement.

I've always taken life as a journey from one experience to another. So far it has been a road full of interesting events and people. Join me on my journey through LinkedIn, Instagram, and YouTube.

Once in action, decision-making seems like an easy task. But it requires creative workers using high-end tools to create solutions that adapt across all verticals. Netflix holds a dominant market share and has been crowned the "HBO of Internet TV." No platform on the World Wide Web succeeds without a strong foundation. Without data science, companies would be stuck with unfiltered clusters of databases, with no clue how to proceed.

Everyone should ask themselves whether data analytics would improve their business. Netflix did it, and so should you.

With all this information at hand, you are hopefully better prepared to become a successful data scientist. I hope this helps, and all the best for your future endeavors! Thanks for reading this article. Leave a comment below if you have any questions.

Translated from: https://medium.com/towards-artificial-intelligence/applications-of-data-science-and-machine-learning-in-netflix-dcdf6abbb194
