TensorFlow: Google's Machine Learning Whitepaper, Translated (Part 1)

TensorFlow:
Large-Scale Machine Learning on Heterogeneous Distributed Systems


TensorFlow [1] is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms.
A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems,
ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds
of machines and thousands of computational devices such as GPU cards.
The system is flexible and can be used to express a wide variety of algorithms, including training and inference
algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and
computational drug discovery.

This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google.
The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org

1: Introduction

The Google Brain project started in 2011 to explore the use of very-large-scale deep neural networks, both for research and for use in Google's products. As part of the early work in this project, we built DistBelief, our first-generation scalable distributed training and inference system [14], and this system has served us well.
We and others at Google have performed a wide variety of research using DistBelief including work on unsupervised learning [31], language representation [35, 52], models for image classification and object detection [16, 48], video classification [27], speech recognition [56, 21, 20], sequence prediction [47], move selection for Go [34], pedestrian detection [2], reinforcement learning [38], and other areas [17, 5].
In addition, often in close collaboration with the Google Brain team, more than 50 teams at Google and other Alphabet companies have deployed
deep neural networks using DistBelief in a wide variety of products, including Google Search [11], our advertising products, our speech recognition systems [50, 6, 46], Google Photos [43], Google Maps and StreetView [19], Google Translate [18], YouTube, and many others.

——————————————————————————————————————————————————————————

Based on our experience with DistBelief and a more complete understanding of the desirable system properties and requirements for training and using neural networks, we have built TensorFlow, our second-generation system for the implementation and deployment of large-scale machine learning models.
TensorFlow takes computations described using a dataflow-like model and maps them onto a wide variety of different hardware platforms, ranging from running inference on mobile device platforms such as Android and iOS to modest sized training and inference systems using single machines containing one or many GPU cards to large-scale training systems running on hundreds of specialized machines with thousands of GPUs.
Having a single system that can span such a broad range of platforms significantly simplifies the real-world use of machine learning systems, as we have found that having separate systems for large-scale training and small-scale deployment leads to significant maintenance burdens and leaky abstractions.
TensorFlow computations are expressed as stateful dataflow graphs (described in more detail in Section 2), and we have focused on making the system both flexible enough for quickly experimenting with new models for research purposes and sufficiently high performance and robust for production training and deployment of machine learning models.
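The stateful-dataflow idea can be sketched with a toy graph in plain Python. This is not TensorFlow's actual API; the `Node` and `Variable` classes below are invented purely for illustration. Operations are graph nodes whose values are computed on demand, while a variable node holds state that persists across executions of the graph.

```python
# Toy sketch of a stateful dataflow graph (illustrative only; this is
# NOT TensorFlow's API -- all class names here are invented).

class Node:
    """A graph node: computes its value from its input nodes."""
    def __init__(self, fn, inputs=()):
        self.fn = fn
        self.inputs = inputs

    def run(self):
        return self.fn(*(n.run() for n in self.inputs))

class Variable(Node):
    """A stateful node: its value persists across graph executions."""
    def __init__(self, value):
        super().__init__(fn=lambda: self.value)
        self.value = value

    def assign(self, new_value):
        self.value = new_value

# Build a tiny graph computing y = w * x, then mutate w between runs.
w = Variable(2.0)
x = Node(lambda: 3.0)
y = Node(lambda a, b: a * b, inputs=(w, x))

first = y.run()   # 2.0 * 3.0 = 6.0
w.assign(4.0)     # state held in the graph changes...
second = y.run()  # ...so the same graph now yields 4.0 * 3.0 = 12.0
print(first, second)
```

The point of the sketch is the separation between describing the computation (building nodes) and executing it (`run`), with `Variable` carrying state from one execution to the next.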
For scaling neural network training to larger deployments, TensorFlow allows clients to easily express various kinds of parallelism through replication and parallel execution of a core model dataflow graph, with many different computational devices all collaborating to update a set of shared parameters or other
state.
Modest changes in the description of the computation allow a wide variety of different approaches to parallelism to be achieved and tried with low effort [14, 29, 42].
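A minimal sketch of what such replication can look like, assuming synchronous data parallelism over a single scalar parameter (the `grad` helper and the toy dataset are invented for illustration, not TensorFlow code): each replica evaluates the same computation on its own data shard, and the resulting gradients are averaged into one update of the shared parameter.

```python
# Sketch of synchronous data parallelism (illustrative, not the
# TensorFlow implementation): each "replica" computes a gradient on its
# shard of data; the shared parameter is updated with the average.

def grad(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

# Data generated by y = 3x, split across two replicas.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]

w = 0.0
for _ in range(200):
    # Each replica runs the same dataflow graph on its own shard...
    grads = [grad(w, shard) for shard in shards]
    # ...then the replicas collaborate to update the shared parameter.
    w -= 0.01 * sum(grads) / len(grads)

print(round(w, 3))  # converges to 3.0
```

Only the data changes per replica; the computation description is identical, which is what makes this kind of parallelism a "modest change" to express.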
Some TensorFlow uses allow some flexibility in terms of the consistency of parameter updates, and we can easily express and take advantage of these relaxed
synchronization requirements in some of our larger deployments.
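One way to picture these relaxed consistency semantics is an asynchronous-style update schedule, simulated sequentially here (again an invented toy, not TensorFlow's actual mechanism): replicas apply their updates one after another against a snapshot of the parameter, so the later gradients are computed from a stale value, yet training still converges.

```python
# Sketch of relaxed (asynchronous-style) parameter updates (illustrative
# only): each replica reads a snapshot of the shared parameter, computes
# its gradient, and writes back without waiting for the others, so some
# gradients are based on slightly stale values.

def grad(w, shard):
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]

w = 0.0
for _ in range(300):
    snapshot = w  # both replicas read the same snapshot of w...
    for shard in shards:
        # ...but apply updates one after another, so the second
        # replica's gradient is based on a stale value of w.
        w -= 0.01 * grad(snapshot, shard)

print(round(w, 3))  # still converges to 3.0 despite staleness
```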
Compared to DistBelief, TensorFlow’s programming model is more flexible, its performance is significantly better, and it supports training and using a
broader range of models on a wider variety of heterogeneous hardware platforms.
——————————————————————————————————————————

Dozens of our internal clients of DistBelief have already switched to TensorFlow. These clients rely on TensorFlow for research and production, with tasks as diverse as running inference for computer vision models on mobile phones to large-scale training of deep neural networks with hundreds of billions of parameters on hundreds of billions of example records using many hundreds of machines [11, 47, 48, 18, 53, 41].
Although these applications have concentrated on machine learning and deep neural networks in particular, we expect that TensorFlow's abstractions will be useful in a variety of other domains, including other kinds of machine learning algorithms, and possibly other kinds of numerical computations.
We have open-sourced the TensorFlow API and a reference implementation under the Apache 2.0 license in November, 2015, available at www.tensorflow.org.
————————————————————————————————————
The rest of this paper describes TensorFlow in more detail. Section 2 describes the programming model and basic concepts of the TensorFlow interface, and Section 3 describes both our single machine and distributed implementations.
Section 4 describes several extensions to the basic programming model, and Section 5 describes several optimizations to the basic implementations.
Section 6 describes some of our experiences in using TensorFlow, Section 7 describes several programming idioms we have found helpful when using TensorFlow, and Section 9 describes several auxiliary tools we have built around the core TensorFlow system.
Sections 10 and 11 discuss future and related work, respectively, and Section 12 offers concluding thoughts.
—————————————————————————————
