Netflix's Polynote Is a New Open Source Framework to Build Better Data Science Notebooks

Notebooks are a data scientist's best friend, and they can also be a nightmare to work with. For someone accustomed to working with modern integrated development environments (IDEs), working with notebooks feels like going back decades. Furthermore, modern notebook environments are mostly constrained to Python programs and lack first-class support for other programming languages. A few days ago, Netflix open sourced Polynote, a new notebook environment that addresses some of those challenges.

Polynote was born out of the need to accelerate data science experimentation at Netflix. Over the years, Netflix has built a world-class machine learning platform mostly based on JVM languages like Scala. Support for those languages in mainstream notebook technologies such as Jupyter is rudimentary, so the team needed a better solution. Polynote started from that basic requirement, but it also incorporates the lessons learned from building one of the most ambitious notebook-based experimentation platforms in the data science world.

Inside Netflix's Notebook-Driven Architecture

Over the last few years, Netflix has transformed its use of data science notebooks from an experimentation artifact into a key component of the lifecycle of machine learning solutions. Initially, Netflix adopted Jupyter Notebooks as a data exploration and analysis tool. However, the engineering team quickly realized that Jupyter offered tangible advantages in terms of runtime abstraction, extensibility, interpretability of the code and debugging that could have a major impact on data science workloads if used correctly. In order to expand the use of Jupyter as a data science runtime, the Netflix team needed to solve a few major challenges:

· The Code-Output Mismatch: Notebooks are frequently changed and, many times, the output you are seeing in the environment does not correspond to the current code.

· The Server Requirement: Notebooks typically require a notebook server runtime to run, which represents an architectural challenge when they are adopted at scale.

· Scheduling: Most data science models need to be executed on a periodic basis, but the tools for scheduling notebooks are still fairly limited.

· Parametrizing: Notebooks are fairly static code environments, and the processes for passing input parameters are far from trivial.

· Integration Testing: Notebooks are isolated code environments that are notoriously difficult to integrate with other notebooks. As a result, tasks like integration testing become a nightmare when using notebooks.

To address those requirements, Netflix built a very ambitious architecture that enables the operationalization of Jupyter notebooks. The initial implementation included technologies such as Papermill, which enables the parametrization of notebooks.

Figure source: https://polynote.org/
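
To make the parametrization idea more concrete, here is a minimal sketch in plain Python. It is not Papermill's code or API: the notebook is modeled as a simple dictionary of cells, and `inject_parameters` and `run_notebook` are hypothetical helpers invented for this illustration. The assumption is that a cell tagged `parameters` holds default values and that a new cell with the caller's values is inserted right after it, so the overrides win when the notebook runs from top to bottom.

```python
import copy


def inject_parameters(notebook, params):
    """Return a copy of `notebook` with an extra cell that overrides the defaults."""
    result = copy.deepcopy(notebook)
    source = "\n".join(f"{name} = {value!r}" for name, value in params.items())
    injected = {"cell_type": "code", "tags": ["injected-parameters"], "source": source}
    # Insert right after the first cell tagged "parameters", or at the top if none exists.
    position = 0
    for index, cell in enumerate(result["cells"]):
        if "parameters" in cell.get("tags", []):
            position = index + 1
            break
    result["cells"].insert(position, injected)
    return result


def run_notebook(notebook):
    """Toy executor: run every code cell in order in one shared namespace."""
    namespace = {}
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            exec(cell["source"], namespace)
    return namespace


# Hypothetical notebook template: a defaults cell followed by a cell that uses them.
template = {
    "cells": [
        {"cell_type": "code", "tags": ["parameters"], "source": "region = 'US'\ndays = 7"},
        {"cell_type": "code", "tags": [], "source": "label = f'{region}-{days}'"},
    ]
}

weekly_us = run_notebook(template)
monthly_br = run_notebook(inject_parameters(template, {"region": "BR", "days": 30}))
print(weekly_us["label"], monthly_br["label"])
```

The template itself is never modified, so the same notebook can be scheduled many times with different inputs, which is the property that makes parametrized notebooks useful as jobs.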

While the initial notebook architecture at Netflix was certainly ambitious, it was also constrained to Python programs. Now it was time to expand.

Entering Polynote

Polynote is a multi-language notebook experimentation environment. In addition to Python, the current release supports languages such as SQL, Vega (for visualizations) and, of course, Scala. The platform is also integrated with data science infrastructure such as Apache Spark. At its core, Polynote includes the following capabilities:

a) Improved Editing Experience: Polynote tries to enable an editing experience closer to modern IDEs.

b) Multi-Language Support: Polynote introduces first-class support for Scala and other languages used in data science environments.

c) Data Visualization Improvements: Polynote integrates native data visualizations for the datasets in a notebook without requiring a lot of additional code.

d) Configuration and Dependency Management: Languages like Scala require complex package dependencies in their programs. Polynote saves the package dependency configuration within the notebook itself, addressing some of the common challenges that JVM developers experience in this area.

e) Reproducibility: The combination of code, data and execution results in a single document makes notebooks powerful, but also difficult to reproduce. Polynote includes reproducibility as a first-class capability of the framework.

Improved Editing Experience

Polynote includes features common in IDEs, such as code auto-completion and syntax error highlighting, which improve the experience for data scientists and researchers building notebooks. Most of the editing capabilities are powered by the Monaco editor, which also powers the editing experience of Visual Studio Code.

Figure source: https://polynote.org/

Multi-Language Support

Polynote not only supports multiple languages but also allows those languages to be combined in a single program. In Polynote, every cell can be based on a different language. When a cell is run, the kernel provides the available typed input values to the cell's language interpreter. In turn, the interpreter provides the resulting typed output values back to the kernel. This allows cells in Polynote notebooks to operate within the same context. The example in the figure below uses a Python library to compute an isotonic regression of a dataset generated with Scala.

Figure source: https://polynote.org/
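
The hand-off between the kernel and the language interpreters can be pictured with a toy model. The sketch below is an assumption-heavy illustration, not Polynote's implementation: `ToyKernel`, `numbers_interpreter` and `python_interpreter` are hypothetical names, the "numbers" language is invented for the example, and plain Python values stand in for the typed values a real kernel would exchange between Scala and Python.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

Interpreter = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def numbers_interpreter(source, inputs):
    """Invented mini-language: each line `name: 1 2 3` defines a list of floats."""
    outputs = {}
    for line in source.strip().splitlines():
        name, _, values = line.partition(":")
        outputs[name.strip()] = [float(v) for v in values.split()]
    return outputs


def python_interpreter(source, inputs):
    """Run Python source against the inputs and return only new or rebound names."""
    namespace = dict(inputs)
    exec(source, namespace)
    return {
        name: value
        for name, value in namespace.items()
        if not name.startswith("__") and (name not in inputs or inputs[name] is not value)
    }


@dataclass
class ToyKernel:
    # Shared context: values produced by earlier cells, whatever their language.
    symbols: Dict[str, Any] = field(default_factory=dict)
    interpreters: Dict[str, Interpreter] = field(default_factory=dict)

    def run_cell(self, language, source):
        # The kernel hands the available values to the cell's interpreter...
        outputs = self.interpreters[language](source, dict(self.symbols))
        # ...and takes the resulting values back into the shared context.
        self.symbols.update(outputs)
        return outputs


kernel = ToyKernel(interpreters={"numbers": numbers_interpreter, "python": python_interpreter})
kernel.run_cell("numbers", "samples: 1 2 3 4")
result = kernel.run_cell("python", "mean = sum(samples) / len(samples)")
print(result)
```

The point of the design is that neither interpreter knows about the other. Each one only sees named values coming in and reports named values going out, which is what lets a cell in one language consume data produced by a cell in another.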

Data Visualization Improvements

Data visualizations are a common component of most notebook environments. However, Polynote takes the visualization value proposition to another level by including it as a native component of the platform, so developers do not need to write any code in order to visually explore a dataset.

Figure source: https://polynote.org/

Configuration and Dependency Management

Most of the time, data scientists working on notebooks can enjoy the efficiency of Python's package management model to handle the dependencies of a program. However, in JVM languages like Scala, dependency management can become a total nightmare. Polynote addresses that challenge by storing the configuration and dependency information directly in the notebook itself, rather than relying on external files. Additionally, Polynote provides a user-friendly Configuration section where users can set dependencies for each notebook.

Figure source: https://polynote.org/
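
As a rough illustration of what "dependencies stored in the notebook itself" can mean, the sketch below reads a per-language dependency list out of a notebook document. The JSON layout, the `metadata.config.dependencies` path, the package names and the helper functions are all hypothetical choices for this example and should not be read as Polynote's actual file format.

```python
import json
from typing import NamedTuple


class Coordinate(NamedTuple):
    """A JVM-style dependency written as group:artifact:version."""
    group: str
    artifact: str
    version: str


def parse_coordinate(text):
    group, artifact, version = text.split(":")
    return Coordinate(group, artifact, version)


def dependencies_for(notebook, language):
    """Return the dependency strings a notebook declares for one language."""
    config = notebook.get("metadata", {}).get("config", {})
    return list(config.get("dependencies", {}).get(language, []))


# Hypothetical notebook file: the configuration travels with the cells.
notebook_json = """
{
  "metadata": {
    "config": {
      "dependencies": {
        "scala": ["org.example:feature-store_2.12:1.4.0"],
        "python": ["example-plots==0.3"]
      }
    }
  },
  "cells": []
}
"""

notebook = json.loads(notebook_json)
scala_deps = [parse_coordinate(d) for d in dependencies_for(notebook, "scala")]
python_deps = dependencies_for(notebook, "python")
print(scala_deps, python_deps)
```

Because the list lives inside the document, sharing the notebook file is enough for someone else to see which packages it expects, with no separate build file to keep in sync.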

Reproducibility

With Polynote, Netflix built a new code interpretation block instead of relying on a REPL model like a traditional notebook. One of the key capabilities of the new interpretation model is that it removes hidden state, which allows data scientists to copy cells within a notebook without carrying over any state from the previous position.

Figure source: https://polynote.org/
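
The hidden-state problem, and one way to avoid it, can be shown with a small Python sketch. This is an assumed model written for this article, not Polynote's code: `run_cell` and `PositionalNotebook` are hypothetical names, and the rule "a cell only sees names defined by the cells currently above it" is the assumption being illustrated.

```python
def run_cell(source, inputs):
    """Execute one cell and return only the names it defined or rebound."""
    namespace = dict(inputs)
    exec(source, namespace)
    return {
        name: value
        for name, value in namespace.items()
        if not name.startswith("__") and (name not in inputs or inputs[name] is not value)
    }


class PositionalNotebook:
    """Each cell's input is built from the cells above it, not from a global REPL."""

    def __init__(self, sources):
        self.sources = list(sources)
        self.defined = [{} for _ in self.sources]  # names produced by each cell

    def run(self, index):
        inputs = {}
        for outputs in self.defined[:index]:
            inputs.update(outputs)
        self.defined[index] = run_cell(self.sources[index], inputs)
        return self.defined[index]

    def delete(self, index):
        del self.sources[index]
        del self.defined[index]


# REPL style: one shared namespace remembers `x` even after its cell is deleted.
repl_namespace = {}
exec("x = 10", repl_namespace)      # the cell runs, then the user deletes it
exec("y = x + 1", repl_namespace)   # still works, but the notebook no longer explains why

# Positional style: deleting the cell also removes the state it produced.
nb = PositionalNotebook(["x = 10", "y = x + 1"])
nb.run(0)
first_result = nb.run(1)
nb.delete(0)
try:
    nb.run(0)
    hidden_state_survived = True
except NameError:
    hidden_state_survived = False
print(repl_namespace["y"], first_result, hidden_state_survived)
```

In the REPL version the notebook keeps working by accident and would fail for anyone who reruns it from scratch. In the positional version the failure shows up immediately, which is the behavior you want if notebooks are supposed to be reproducible.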

Polynote is a new entrant in the competitive field of data science notebooks, but one that stands on its own merits. The support for JVM-based languages could make Polynote a favorite of developers working on Spark infrastructure. The editing and reproducibility capabilities are also welcome enhancements over traditional notebook environments. Polynote is available on GitHub, and you can also follow the project's website.

Translated from: https://medium.com/dataseries/netflixs-polynote-is-a-new-open-source-framework-to-build-better-data-science-notebooks-4bdab6b8d0ae
