Netflix

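The Netflix Prize, which launched in 2006, concluded in 2009. Netflix released a set of anonymized data and invited teams to use it to build better algorithms. The winning team delivered a 10.06% improvement over Netflix's existing algorithm. Netflix had planned to hold a second Netflix Prize, but ultimately cancelled it because of privacy concerns raised by the FTC (Federal Trade Commission).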

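Netflix's recommendation system is made up of many algorithms. The two core algorithms used in production are Restricted Boltzmann Machines (RBM) and a form of matrix factorization called SVD++. The outputs of the two are combined in a linear blend to produce a single, more accurate estimate.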
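As a rough illustration of a linear blend, the sketch below fits two weights by least squares so that a weighted sum of two models' predictions matches known ratings as closely as possible. The function names, the sample numbers, and the choice to fit the weights on a held-out set without an intercept are all assumptions made for this example; the text only says that the two algorithms are blended linearly.

```python
def fit_blend_weights(pred_a, pred_b, truth):
    """Least-squares weights (w_a, w_b) minimizing sum((w_a*a + w_b*b - y)^2)."""
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * y for a, y in zip(pred_a, truth))
    sby = sum(b * y for b, y in zip(pred_b, truth))
    # Solve the 2x2 normal equations with Cramer's rule.
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("predictors are collinear; weights are not unique")
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


def blend(pred_a, pred_b, w_a, w_b):
    """Combine two prediction lists into one blended estimate per rating."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]


# Hypothetical held-out predictions from two models, plus the true ratings.
rbm_pred = [3.8, 2.1, 4.6, 1.4]
svd_pred = [4.2, 2.9, 4.4, 2.0]
truth = [4, 3, 5, 1]

w_rbm, w_svd = fit_blend_weights(rbm_pred, svd_pred, truth)
blended = blend(rbm_pred, svd_pred, w_rbm, w_svd)
print(w_rbm, w_svd)
print(blended)
```

On the data used for fitting, the blended squared error can never exceed that of either model alone, because using one model by itself is just the special case of weights (1, 0) or (0, 1).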

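An RBM is a neural network that has been adapted for collaborative filtering. Each user has their own RBM, whose input nodes represent the movies that user has rated.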
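The idea of "one RBM per user, with inputs only for the movies that user rated" can be sketched as below. This is a minimal forward pass only, under several assumptions that the text does not state: each movie is represented by five softmax units (one per star value), the weights are shared across all users, the weights here are random rather than trained, and hidden probabilities are used directly instead of sampled binary states. All names are hypothetical.

```python
import math
import random

NUM_STARS = 5  # ratings 1..5, one visible softmax unit per star value


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(user_ratings, weights, hidden_bias):
    """p(h_j = 1 | ratings); only the movies this user rated contribute."""
    probs = []
    for j, bias in enumerate(hidden_bias):
        total = bias
        for movie, stars in user_ratings.items():
            total += weights[movie][stars - 1][j]
        probs.append(sigmoid(total))
    return probs


def rating_distribution(movie, h_probs, weights, visible_bias):
    """Softmax over star values for one movie, given hidden activations."""
    scores = [
        visible_bias[movie][k]
        + sum(h * weights[movie][k][j] for j, h in enumerate(h_probs))
        for k in range(NUM_STARS)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Untrained, randomly initialized parameters (illustration only).
rng = random.Random(0)
num_movies, num_hidden = 4, 3
weights = [[[rng.gauss(0, 0.1) for _ in range(num_hidden)]
            for _ in range(NUM_STARS)] for _ in range(num_movies)]
visible_bias = [[0.0] * NUM_STARS for _ in range(num_movies)]
hidden_bias = [0.0] * num_hidden

user_ratings = {0: 5, 2: 3}  # movie_id -> stars; movies 1 and 3 are unrated
h = hidden_probs(user_ratings, weights, hidden_bias)
dist = rating_distribution(1, h, weights, visible_bias)
expected_stars = sum((k + 1) * p for k, p in enumerate(dist))
print(dist, expected_stars)
```

Because only rated movies feed the hidden layer, users with different rating histories effectively use different sub-networks of the same shared weights.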

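SVD++ is an asymmetric form of SVD (singular value decomposition) that, like the RBM, makes use of implicit information. It was developed by the team that won the Netflix Prize.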
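To make the role of implicit information concrete, here is a small sketch of the SVD++ prediction rule as it is commonly written: r_hat = mu + b_u + b_i + q_i . (p_u + |N(u)|^(-1/2) * sum of y_j over j in N(u)), where N(u) is the set of items the user has interacted with. The parameter values below are made up for illustration, and training (for example by stochastic gradient descent) is left out.

```python
import math


def svdpp_predict(mu, b_u, b_i, p_u, q_i, implicit_items, y):
    """Predict one rating with the SVD++ rule.

    mu: global mean rating; b_u, b_i: user and item biases
    p_u, q_i: explicit user and item factor vectors
    implicit_items: N(u), ids of items the user has rated at all
    y: mapping item id -> implicit factor vector
    """
    user_vec = list(p_u)
    if implicit_items:
        norm = 1.0 / math.sqrt(len(implicit_items))
        for j in implicit_items:
            for k in range(len(user_vec)):
                user_vec[k] += norm * y[j][k]
    return mu + b_u + b_i + sum(q * u for q, u in zip(q_i, user_vec))


# Hypothetical parameters for one user/item pair with two latent factors.
y = {0: [0.2, 0.0], 1: [0.0, 0.2], 2: [0.2, 0.2], 3: [0.0, 0.0]}
pred = svdpp_predict(
    mu=3.6, b_u=0.2, b_i=-0.1,
    p_u=[0.1, -0.2], q_i=[0.5, 0.3],
    implicit_items=[0, 1, 2, 3], y=y,
)
print(round(pred, 4))  # about 3.85
```

The implicit term means that which movies a user chose to rate shifts their factor vector, independently of the star values they gave.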

The Netflix team's write-up on their tech blog: Learning a Personalized Homepage

Open source projects

https://netflix.github.io/. Netflix has an excellent engineering blog, where they recently published a post titled The Evolution of Open Source at Netflix.

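Big data

Genie: a powerful, REST-based abstraction over Netflix's various data processing frameworks, notably Hadoop.

Inviso: provides detailed insight into the performance of Netflix's Hadoop jobs and clusters.

Lipstick: shows the workflow of Pig jobs in a clear, visual way.

Aegisthus: extracts data from Cassandra in bulk for downstream analytic processing.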

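Build and delivery tools

Nebula: Netflix's effort to share its internal build infrastructure.

Aminator: a tool for creating EBS AMIs.

Asgard: a web interface for application deployment and cloud management on Amazon Web Services (AWS).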

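Common runtime services and libraries

Eureka: service discovery for the Netflix cloud platform.

Archaius: distributed configuration.

Ribbon: resilient and intelligent inter-process and service communication.

Hystrix: provides reliability beyond single service calls, isolating latency and providing fault tolerance at runtime.

Karyon and Governator: JVM container services.

Prana sidecar: provides proxy capabilities within an instance.

Zuul: provides dynamically scripted proxying at the edge of the cloud deployment.

Fenzo: provides advanced scheduling and resource management for cloud-native frameworks.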

Data persistence

EVCache and Dynomite: for using Memcached and Redis at scale.

Astyanax and Dyno: client libraries for making better use of cloud data stores.

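Insight, reliability and performance

Atlas: a time-series telemetry platform.

Edda: a service that tracks changes in the cloud.

Spectator: easy integration of Java application code with Atlas.

Vector: exposes high-resolution host-level metrics with minimal overhead.

Ice: shows current costs and cloud usage trends.

SimianArmy: tests Netflix instances against random failures.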

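Security

Security Monkey: helps monitor and secure large AWS-based environments.

scumblr: uses targeted searches across the Internet to surface specific security issues for investigation.

MSL: an extensible and flexible secure messaging protocol that addresses many secure communication use cases and requirements.

falcor: represents remote data sources as a single domain model through a virtual JSON graph.

restify: a Node.js REST framework built specifically for web service APIs.

RxJS: a reactive programming library for JavaScript.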

http://www.tuicool.com/articles/7zaqye

Reposted from: https://www.cnblogs.com/softidea/p/5632966.html

This is the data set used in the well-known Netflix Prize, the million-dollar recommendation contest. Because the contest has closed, it can no longer be downloaded from the Netflix website.

Netflix provided a training data set of 100,480,507 ratings that 480,189 users gave to 17,770 movies. Each training rating is a quadruplet of the form <user, movie, date of grade, grade>. The user and movie fields are integer IDs, while grades are from 1 to 5 (integral) stars.

The qualifying data set contains 2,817,131 triplets of the form <user, movie, date of grade>, with grades known only to the jury. A participating team's algorithm must predict grades on the entire qualifying set, but the team is only informed of the score for half of the data, the quiz set of 1,408,342 ratings. The other half is the test set of 1,408,789 ratings, and performance on this is used by the jury to determine potential prize winners. Only the judges know which ratings are in the quiz set and which are in the test set; this arrangement is intended to make it difficult to hill climb on the test set.

Submitted predictions are scored against the true grades in terms of root mean squared error (RMSE), and the goal is to reduce this error as much as possible. Note that while the actual grades are integers in the range 1 to 5, submitted predictions need not be. Netflix also identified a probe subset of 1,408,395 ratings within the training data set. The probe, quiz, and test data sets were chosen to have similar statistical properties.

In summary, the data used in the Netflix Prize looks as follows:

Training set: 99,072,112 ratings not including the probe set, 100,480,507 including the probe set
Probe set: 1,408,395 ratings
Qualifying set: 2,817,131 ratings, consisting of:
    Test set: 1,408,789 ratings, used to determine winners
    Quiz set: 1,408,342 ratings, used to calculate leaderboard scores

For each movie, title and year of release are provided in a separate dataset. No information at all is provided about users. In order to protect the privacy of customers, "some of the rating data for some customers in the training and qualifying ...
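The quiz/test scoring arrangement described above can be sketched as follows. This is only an illustration under assumptions: the function names, the random even split, and the tiny made-up ratings are hypothetical, and nothing here reflects how Netflix actually implemented its judging.

```python
import math
import random


def rmse(predictions, truths):
    """Root mean squared error between predicted and true grades."""
    squared = sum((p - t) ** 2 for p, t in zip(predictions, truths))
    return math.sqrt(squared / len(truths))


def split_qualifying(num_ratings, seed):
    """Secretly assign each qualifying index to the quiz or the test half."""
    indices = list(range(num_ratings))
    random.Random(seed).shuffle(indices)
    half = num_ratings // 2
    return sorted(indices[:half]), sorted(indices[half:])


def score_submission(predictions, truths, quiz_idx, test_idx):
    """Score both halves; only quiz_rmse would be reported back to teams."""
    quiz = rmse([predictions[i] for i in quiz_idx],
                [truths[i] for i in quiz_idx])
    test = rmse([predictions[i] for i in test_idx],
                [truths[i] for i in test_idx])
    return {"quiz_rmse": quiz, "test_rmse": test}


# Made-up true grades (integers 1..5) and real-valued predictions.
truths = [4, 3, 5, 1, 2, 5, 3, 4]
predictions = [3.7, 3.2, 4.5, 1.8, 2.4, 4.9, 2.6, 4.1]

quiz_idx, test_idx = split_qualifying(len(truths), seed=42)
scores = score_submission(predictions, truths, quiz_idx, test_idx)
print(f"leaderboard (quiz) RMSE: {scores['quiz_rmse']:.4f}")
```

Keeping the split secret means a team that tunes its submissions against the leaderboard number is fitting the quiz half only, so the hidden test half remains a fair measure.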