Netflix: Why Build a Dedicated Media Database?

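Netflix's NMDB is a microservice-based data system for storing and querying technical metadata about media assets such as video, audio, and text. It supports structured data, media timeline modeling, spatio-temporal queries, and multi-tenancy. Its purpose is to improve the user experience, content recommendations, encoding efficiency, and content quality control.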
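The summary above mentions media timeline modeling and spatio-temporal queries. The following Python sketch is for illustration only. It shows one way that timeline-anchored metadata and an overlap query could be modeled. The names `TimelineEvent`, `MediaTimeline`, `query`, and the normalized `(x, y, width, height)` box convention are hypothetical, and they are not NMDB's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical model for illustration only; NOT NMDB's schema or API.
Box = Tuple[float, float, float, float]  # (x, y, width, height), normalized to [0, 1]


@dataclass
class TimelineEvent:
    start_s: float  # start time in seconds (inclusive)
    end_s: float    # end time in seconds (exclusive)
    kind: str       # e.g. "subtitle" or "face"
    box: Optional[Box] = None  # spatial region; None means the whole frame
    data: Dict[str, Any] = field(default_factory=dict)


def boxes_intersect(a: Box, b: Box) -> bool:
    """True if two axis-aligned rectangles overlap with positive area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


@dataclass
class MediaTimeline:
    asset_id: str
    events: List[TimelineEvent] = field(default_factory=list)

    def add(self, event: TimelineEvent) -> None:
        if event.end_s <= event.start_s:
            raise ValueError("event must have a positive duration")
        self.events.append(event)

    def query(self, start_s: float, end_s: float, kind: Optional[str] = None,
              region: Optional[Box] = None) -> List[TimelineEvent]:
        """Events overlapping [start_s, end_s), optionally filtered by kind and region."""
        hits = []
        for e in self.events:
            if not (e.start_s < end_s and start_s < e.end_s):
                continue  # no overlap in time
            if kind is not None and e.kind != kind:
                continue
            if region is not None and e.box is not None and not boxes_intersect(e.box, region):
                continue  # an event without a box covers the whole frame
            hits.append(e)
        return hits


tl = MediaTimeline("example-asset")
tl.add(TimelineEvent(0.0, 4.0, "subtitle", data={"text": "Hello"}))
tl.add(TimelineEvent(2.0, 6.0, "face", box=(0.1, 0.1, 0.2, 0.3)))
tl.add(TimelineEvent(10.0, 12.0, "subtitle", data={"text": "Bye"}))
print([e.kind for e in tl.query(3.0, 5.0)])  # ['subtitle', 'face']
```

The query does a linear scan to keep the sketch short. A store that serves many titles would more likely index events by time, for example with an interval tree, but that choice is an assumption here.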

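This article explains why Netflix built a dedicated media database: more accurate recommendations for users, highly optimized encoding, and a more efficient path from creative ideas to finished content.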


By Rohit Puri

Translated by 王月美

Original article: https://medium.com/netflix-techblog/the-netflix-media-database-nmdb-9bf8e6d0944d


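Imagine that we are working on a next-generation adaptive video streaming algorithm. Our goal is to minimize playback start time for millions of Netflix members around the world. To do this, we need aggregate statistics (minimum, maximum, median, mean, and arbitrary percentiles) on the headers of bitstreams formatted as ISO BMFF (Base Media File Format). The Netflix transcoding cluster handles a large catalog of content and generates many bitstreams for each title, one for each codec and quality combination. In the past, we had to write one-off scripts to extract header information from the bitstreams the hard way before we could analyze any data. This approach clearly does not scale: a single software bug in one of those scripts forced us to restart the entire job.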
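The following Python sketch shows the kind of one-off header-scraping script described above. It is not NMDB and not Netflix's code. It relies on the top-level box layout defined by ISO BMFF:

- each box begins with a 32-bit big-endian size and a 4-character type;
- a size of 1 means a 64-bit "largesize" follows;
- a size of 0 means the box runs to the end of the file;
- a `uuid` box carries a 16-byte extended type.

The article does not say exactly which header metric was measured. As a loudly labeled assumption, the sketch treats the "header" as all top-level boxes before the first `moof` or `mdat` box. All function names are hypothetical.

```python
import math
import statistics
import struct
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: "header" = all top-level boxes before the first media box.
MEDIA_BOX_TYPES = {"moof", "mdat"}


def iter_top_level_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, header_len, box_size) for each top-level ISO BMFF box."""
    offset = 0
    while offset + 8 <= len(data):
        # Every box starts with a 32-bit big-endian size and a 4-character type.
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                raise ValueError(f"truncated largesize header at offset {offset}")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # Size 0 means the box extends to the end of the data.
            size = len(data) - offset
        if raw_type == b"uuid":
            header_len += 16  # 16-byte extended (user) type
        if size < header_len or offset + size > len(data):
            raise ValueError(f"malformed box at offset {offset}")
        yield raw_type.decode("latin-1"), header_len, size
        offset += size


def header_bytes(data: bytes) -> int:
    """Total size of the top-level boxes that precede the first moof/mdat box."""
    total = 0
    for box_type, _, size in iter_top_level_boxes(data):
        if box_type in MEDIA_BOX_TYPES:
            break
        total += size
    return total


def percentile(values: Iterable[float], p: float) -> float:
    """Percentile with linear interpolation between closest ranks, p in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty data")
    if not 0 <= p <= 100:
        raise ValueError("p must be in [0, 100]")
    k = (len(xs) - 1) * p / 100
    lo = math.floor(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def aggregate(values: List[float], percentiles: Iterable[float] = (90, 99)) -> Dict[str, float]:
    """Min, max, median, mean, and the requested percentiles of `values`."""
    stats = {
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
    }
    for p in percentiles:
        stats[f"p{p:g}"] = percentile(values, p)
    return stats


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a compact 32-bit size header (synthetic test data only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Two synthetic "bitstreams" standing in for real transcoder output.
files = [
    make_box(b"ftyp", b"isom" + bytes(4)) + make_box(b"moov", bytes(24)) + make_box(b"mdat", bytes(100)),
    make_box(b"ftyp", b"isom" + bytes(4)) + make_box(b"moov", bytes(92)) + make_box(b"mdat", bytes(10)),
]
sizes = [header_bytes(f) for f in files]
print(sizes)  # [48, 116]
print(aggregate(sizes))
```

A script like this has to be rerun over every bitstream whenever it changes or crashes. That is the scaling problem the paragraph describes: the parsed results are thrown away instead of being stored somewhere queryable.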


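Moreover, analyzing a completely different dimension of our media data required yet another "one-off" script. After repeating this approach many times for problems from different domains, we realized there was a pattern here. That led us to build a system that solves this class of problem in a scalable way.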


This blog post introduces the Netflix Media Database (NMDB), a … built on the Netflix microservices platform
