An Overview of Uber’s Impressive Contributions to Open Source Machine Learning

I recently started a new newsletter focused on AI education. TheSequence is a no-BS (meaning no hype, no news, etc.) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. Please give it a try by subscribing below:

Artificial intelligence (AI) has been an atypical technology trend. In a traditional technology cycle, innovation typically begins with startups trying to disrupt industry incumbents. In the case of AI, most of the innovation in the space has come from the big corporate labs of companies like Google, Facebook, Uber or Microsoft. Those companies are not only leading impressive research programs but also regularly open sourcing new frameworks and tools that streamline the adoption of AI technologies. In that context, Uber has emerged as one of the most active contributors of open source AI technologies in the current ecosystem. In just a few years, Uber has regularly open sourced projects across different areas of the AI lifecycle. Today, I would like to review a few of my favorites.

Uber is a near-perfect playground for AI technologies. The company combines all the traditional AI requirements of a large scale tech company with a front-row seat to AI-first transportation scenarios. As a result, Uber has been building machine/deep learning applications across widely diverse scenarios ranging from customer classification to self-driving vehicles. Many of the technologies used by Uber teams have been open sourced and received accolades from the machine learning community. Let’s look at some of my favorites:

Note: I am not covering technologies like Michelangelo or PyML, as they are already well documented, having been open sourced.

Ludwig: A Toolbox for No-Code Machine Learning Models

Source: https://uber.github.io/ludwig/

Ludwig is a TensorFlow-based toolbox that allows users to train and test deep learning models without the need to write code. Conceptually, Ludwig was created under five fundamental principles:

  • No coding required: no coding skills are required to train a model and use it for obtaining predictions.

  • Generality: a new data type-based approach to deep learning model design that makes the tool usable across many different use cases.

  • Flexibility: experienced users have extensive control over model building and training, while newcomers will find it easy to use.

  • Extensibility: easy to add new model architectures and new feature data types.

  • Understandability: deep learning model internals are often considered black boxes, but Ludwig provides standard visualizations to understand their performance and compare their predictions.

Using Ludwig, a data scientist can train a deep learning model by simply providing a CSV file that contains the training data as well as a YAML file with the inputs and outputs of the model. Using those two inputs, Ludwig performs a multi-task learning routine to predict all outputs simultaneously and evaluate the results. Under the covers, Ludwig provides a series of deep learning models that are constantly evaluated and can be combined in a final architecture. The Uber engineering team explains this process by using the following analogy: “if deep learning libraries provide the building blocks to make your building, Ludwig provides the buildings to make your city, and you can choose among the available buildings or add your own building to the set of available ones.”
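As a rough sketch of what this looks like in practice, here is a hypothetical model definition for a reviews.csv file with a text column and a class label column (the file name and column names are made up for illustration, and the exact keys may vary across Ludwig versions):

```yaml
# model_definition.yaml (hypothetical example)
input_features:
  - name: text
    type: text
output_features:
  - name: class
    type: category
```

Training then reduces to a single command along the lines of `ludwig experiment --data_csv reviews.csv --model_definition_file model_definition.yaml`, with no model code written by the user.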

Pyro: A Native Probabilistic Programming Language

Source: http://pyro.ai/

Pyro is a deep probabilistic programming language (PPL) released by Uber AI Labs. Pyro is built on top of PyTorch and is based on four fundamental principles:

  • Universal: Pyro is a universal PPL — it can represent any computable probability distribution. How? By starting from a universal language with iteration and recursion (arbitrary Python code), and then adding random sampling, observation, and inference.

  • Scalable: Pyro scales to large data sets with little overhead above hand-written code. How? By building modern black box optimization techniques, which use mini-batches of data, to approximate inference.

  • Minimal: Pyro is agile and maintainable. How? Pyro is implemented with a small core of powerful, composable abstractions. Wherever possible, the heavy lifting is delegated to PyTorch and other libraries.

  • Flexible: Pyro aims for automation when you want it and control when you need it. How? Pyro uses high-level abstractions to express generative and inference models, while allowing experts to easily customize inference.

These principles often pull Pyro’s implementation in opposite directions. Being universal, for instance, requires allowing arbitrary control structure within Pyro programs, but this generality makes it difficult to scale. However, in general, Pyro achieves a brilliant balance between these capabilities, making it one of the best PPLs for real world applications.
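To make the “random sampling, observation, and inference” idea concrete without pulling in PyTorch, here is a dependency-free sketch of the core PPL pattern: a generative model is just a program with random choices, and inference conditions that program on observed data. This is plain Python with illustrative names, not Pyro’s API; Pyro replaces the brute-force rejection step below with scalable variational inference.

```python
import random

def model():
    # Latent variable: is the coin biased towards heads? (prior: 50/50)
    biased = random.random() < 0.5
    p_heads = 0.9 if biased else 0.5
    # Generative step: three flips of the coin (True = heads).
    flips = [random.random() < p_heads for _ in range(3)]
    return biased, flips

def rejection_inference(observed, n=100_000):
    # Approximate P(biased | flips == observed) by rejection sampling:
    # run the model many times, keep only runs matching the observation.
    kept = [b for b, flips in (model() for _ in range(n)) if flips == observed]
    return sum(kept) / len(kept)

random.seed(0)
posterior = rejection_inference([True, True, True])
# Analytically: 0.5 * 0.9**3 / (0.5 * 0.9**3 + 0.5 * 0.5**3) ≈ 0.854
print(posterior)
```

Three heads in a row make the biased hypothesis much more likely than the 50/50 prior, which is exactly the kind of conditioning a PPL automates.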

Manifold: A Debugging and Interpretation Toolset for Machine Learning Models

Source: https://github.com/uber/manifold

Manifold is Uber’s technology for debugging and interpreting machine learning models at scale. With Manifold, the Uber engineering team wanted to accomplish some very tangible goals:

· Debug code errors in a machine learning model.

· Understand the strengths and weaknesses of one model, both in isolation and in comparison with other models.

· Compare and ensemble different models.

· Incorporate insights gathered through inspection and performance analysis into model iterations.

To accomplish those goals, Manifold segments the machine learning analysis process into three main phases: Inspection, Explanation and Refinement.

· Inspection: In the first part of the analysis process, the user designs a model and attempts to investigate and compare the model outcome with other existing ones. During this phase, the user compares typical performance metrics, such as accuracy, precision/recall, and the receiver operating characteristic (ROC) curve, to get a coarse-grained view of whether the new model outperforms the existing ones.

· Explanation: This phase of the analysis process attempts to explain the different hypotheses formulated in the previous phase. It relies on comparative analysis to explain some of the symptoms of the specific models.

· Refinement: In this phase, the user attempts to verify the explanations generated in the previous phase by encoding the knowledge extracted from them into the model and testing the performance.
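The inspection phase is easiest to picture with a toy example. The sketch below (plain Python with made-up data, not Manifold’s API) slices two models’ predictions by a feature segment, which is exactly the kind of view that surfaces where one model underperforms another:

```python
# Toy illustration of the inspection phase: slice two models'
# predictions by a feature segment to compare their performance.

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

labels   = [1, 0, 1, 1, 0, 1, 0, 0]
segments = ["a", "a", "a", "a", "b", "b", "b", "b"]  # hypothetical feature slice
model_a  = [1, 0, 1, 1, 1, 0, 1, 0]  # strong on segment "a", weak on "b"
model_b  = [1, 0, 0, 1, 0, 1, 0, 0]  # more balanced across segments

for seg in ("a", "b"):
    idx = [i for i, s in enumerate(segments) if s == seg]
    acc_a = accuracy([model_a[i] for i in idx], [labels[i] for i in idx])
    acc_b = accuracy([model_b[i] for i in idx], [labels[i] for i in idx])
    print(f"segment {seg}: model_a={acc_a:.2f} model_b={acc_b:.2f}")
```

Aggregate accuracy alone (0.625 vs 0.875 here) would hide that model_a is perfect on segment "a"; the sliced view is what turns a vague hunch into a testable hypothesis for the explanation phase.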

Plato: A Framework for Building Conversational Agents at Scale

Source: https://github.com/uber-research/plato-research-dialogue-system

Uber built the Plato Research Dialogue System (PRDS) to address the challenges of building large scale conversational applications. Conceptually, PRDS is a framework to create, train and evaluate conversational AI agents in diverse environments. From a functional standpoint, PRDS includes the following building blocks:

  • Speech recognition (transcribe speech to text)
  • Language understanding (extract meaning from that text)
  • State tracking (aggregate information about what has been said and done so far)
  • API call (search a database, query an API, etc.)
  • Dialogue policy (generate abstract meaning of agent’s response)
  • Language generation (convert abstract meaning into text)
  • Speech synthesis (convert text into speech)
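The text-based middle of that pipeline can be sketched as a chain of swappable components. The stub functions below are illustrative, not Plato’s actual API, and the speech recognition/synthesis stages are omitted:

```python
# Schematic conversational-agent pipeline: each stage is a component
# that could be replaced with a trained model.

def understand(text):
    # Language understanding: text -> meaning (a toy intent classifier).
    return {"intent": "order_ride"} if "ride" in text else {"intent": "unknown"}

def track_state(state, meaning):
    # State tracking: aggregate what has been said so far.
    return state + [meaning]

def policy(state):
    # Dialogue policy: abstract meaning of the agent's response.
    return "confirm_ride" if state[-1]["intent"] == "order_ride" else "ask_clarify"

def generate(action):
    # Language generation: abstract action -> text.
    return {"confirm_ride": "Booking your ride now.",
            "ask_clarify": "Could you rephrase that?"}[action]

def agent_turn(state, user_text):
    meaning = understand(user_text)
    state = track_state(state, meaning)
    return state, generate(policy(state))

state, reply = agent_turn([], "I need a ride to the airport")
print(reply)  # -> Booking your ride now.
```

In PRDS the value of this modularity is that any single stage, say the policy, can be swapped for a learned component and trained online or offline without touching the rest of the chain.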

PRDS was designed with modularity in mind in order to incorporate state-of-the-art research in conversational systems as well as to continuously evolve every component of the platform. In PRDS, each component can be trained either online (from interactions) or offline and then incorporated into the core engine. From the training standpoint, PRDS supports interactions with both human and simulated users. The latter are commonly used to jumpstart conversational AI agents in research scenarios, while the former are more representative of live interactions.

Horovod: A Framework for Training Deep Learning at Scale

Source: https://github.com/uber/horovod

Horovod is one of the Uber ML stacks that has become extremely popular within the community and has been adopted by research teams at AI powerhouses like DeepMind or OpenAI. Conceptually, Horovod is a framework for running distributed deep learning training jobs at scale.

Horovod leverages message passing interface (MPI) stacks such as OpenMPI to enable a training job to run on a highly parallel and distributed infrastructure without any modifications. Running a distributed TensorFlow training job in Horovod is accomplished in four simple steps:

  1. hvd.init() initializes Horovod.

  2. config.gpu_options.visible_device_list = str(hvd.local_rank()) assigns a GPU to each of the TensorFlow processes.

  3. opt = hvd.DistributedOptimizer(opt) wraps any regular TensorFlow optimizer with the Horovod optimizer, which takes care of averaging gradients using ring-allreduce.

  4. hvd.BroadcastGlobalVariablesHook(0) broadcasts variables from the first process to all other processes to ensure consistent initialization.
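The ring-allreduce in step 3 is the heart of Horovod’s scalability. Below is an illustrative single-process simulation of the idea, not Horovod’s implementation (which runs across real workers over MPI or NCCL): each worker owns one chunk of the gradient, partial sums travel around the ring in a reduce-scatter phase, and the completed chunks circulate back in an allgather phase, so every worker ends with the same averaged gradient while only ever talking to its neighbor.

```python
def ring_allreduce_average(worker_grads):
    # Toy setup: n workers, gradient split into n chunks of one value each,
    # so this sketch assumes len(grad) == number of workers.
    n = len(worker_grads)
    grads = [list(g) for g in worker_grads]

    # Phase 1, reduce-scatter: after n-1 steps, worker i holds the
    # complete sum for chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot sends first so all workers "transmit" simultaneously.
        sends = [(i, (i - step) % n, grads[i][(i - step) % n]) for i in range(n)]
        for i, chunk, val in sends:
            grads[(i + 1) % n][chunk] += val

    # Phase 2, allgather: circulate each completed chunk around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, grads[i][(i + 1 - step) % n]) for i in range(n)]
        for i, chunk, val in sends:
            grads[(i + 1) % n][chunk] = val

    # Every worker now holds the same summed gradient; average it.
    return [[v / n for v in g] for g in grads]

avg = ring_allreduce_average([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(avg[0])  # every worker ends with [4.0, 5.0, 6.0]
```

Because each worker only exchanges one chunk per step with its neighbor, the bandwidth used per worker stays roughly constant as workers are added, which is why the approach scales so well.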

Uber AI Research: A Regular Source of AI Research

Last but not least, we should mention Uber’s active contributions to AI research. Many of Uber’s open source releases are inspired by their research efforts. The Uber AI Research website is a phenomenal catalog of papers that highlights Uber’s latest efforts in AI research.

These are some of the contributions of the Uber engineering team that have seen regular adoption by the AI research and development community. As Uber continues implementing AI solutions at scale, we should see new and innovative frameworks that simplify the adoption of machine learning by data scientists and researchers.

Translated from: https://medium.com/dataseries/an-overview-of-ubers-impressive-contributions-to-open-source-machine-learning-cfb6eabd12ac
