Machine learning: Trends, perspectives, and prospects


Abstract


Machine learning addresses the question of how to build computers that improve
automatically through experience. It is one of today’s most rapidly growing technical fields,
lying at the intersection of computer science and statistics, and at the core of artificial
intelligence and data science. Recent progress in machine learning has been driven both by
the development of new learning algorithms and theory and by the ongoing explosion in the
availability of online data and low-cost computation. The adoption of data-intensive
machine-learning methods can be found throughout science, technology and commerce,
leading to more evidence-based decision-making across many walks of life, including
health care, manufacturing, education, financial modeling, policing, and marketing.


Machine learning is a discipline focused on two interrelated questions: How can one construct computer systems that automatically improve through experience? And what are the fundamental statistical-computational-information-theoretic laws that govern all learning systems, including computers, humans, and organizations? The study of machine learning is important both for addressing these fundamental scientific and engineering questions and for the highly practical computer software it has produced and fielded across many applications.


Machine learning has progressed dramatically over the past two decades, from laboratory curiosity to a practical technology in widespread commercial use. Within artificial intelligence (AI), machine learning has emerged as the method of choice for developing practical software for computer vision, speech recognition, natural language processing, robot control, and other applications. Many developers of AI systems now recognize that, for many applications, it can be far easier to train a system by showing it examples of desired input-output behavior than to program it manually by anticipating the desired response for all possible inputs. The effect of machine learning has also been felt broadly across computer science and across a range of industries concerned with data-intensive issues, such as consumer services, the diagnosis of faults in complex systems, and the control of logistics chains. There has been a similarly broad range of effects across empirical sciences, from biology to cosmology to social science, as machine-learning methods have been developed to analyze high-throughput experimental data in novel ways. See Fig. 1 for a depiction of some recent areas of application of machine learning.



Fig. 1. Applications of machine learning. Machine learning is having a substantial effect on many areas of technology and science; examples of recent applied success stories include robotics and autonomous vehicle control (top left), speech processing and natural language processing (top right), neuroscience research (middle), and applications in computer vision (bottom). [The middle panel is adapted from (29). The images in the bottom panel are from the ImageNet database; object recognition annotation is by R. Girshick.]


A learning problem can be defined as the problem of improving some measure of performance when executing some task, through some type of training experience. For example, in learning to detect credit-card fraud, the task is to assign a label of "fraud" or "not fraud" to any given credit-card transaction. The performance metric to be improved might be the accuracy of this fraud classifier, and the training experience might consist of a collection of historical credit-card transactions, each labeled in retrospect as fraudulent or not. Alternatively, one might define a different performance metric that assigns a higher penalty when "fraud" is labeled "not fraud" than when "not fraud" is incorrectly labeled "fraud." One might also define a different type of training experience, for example by including unlabeled credit-card transactions along with labeled examples.
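
The asymmetric-penalty idea above can be sketched as a small cost-sensitive evaluation function. The specific cost values (10 for a missed fraud, 1 for a false alarm) are illustrative assumptions, not taken from the paper:

```python
def cost_sensitive_loss(y_true, y_pred, miss_cost=10.0, false_alarm_cost=1.0):
    """Average penalty over predictions, charging more for labeling actual
    fraud as "not fraud" (a miss) than for the reverse mistake."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == "fraud" and pred == "not fraud":
            total += miss_cost         # costly miss
        elif truth == "not fraud" and pred == "fraud":
            total += false_alarm_cost  # cheaper false alarm
    return total / len(y_true)

# A classifier that never flags fraud is penalized heavily per missed fraud.
truth = ["fraud", "not fraud", "not fraud", "fraud"]
preds = ["not fraud"] * 4
print(cost_sensitive_loss(truth, preds))  # (10 + 10) / 4 = 5.0
```

Under such a metric, the learner is pushed toward classifiers that tolerate more false alarms in exchange for fewer missed frauds.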


A diverse array of machine-learning algorithms has been developed to cover the wide variety of data and problem types exhibited across different machine-learning problems (1, 2). Conceptually, machine-learning algorithms can be viewed as searching through a large space of candidate programs, guided by training experience, to find a program that optimizes the performance metric. Machine-learning algorithms vary greatly, in part by the way in which they represent candidate programs (e.g., decision trees, mathematical functions, and general programming languages) and in part by the way in which they search through this space of programs (e.g., optimization algorithms with well-understood convergence guarantees and evolutionary search methods that evaluate successive generations of randomly mutated programs). Here, we focus on approaches that have been particularly successful to date.


Many algorithms focus on function approximation problems, where the task is embodied in a function (e.g., given an input transaction, output a "fraud" or "not fraud" label), and the learning problem is to improve the accuracy of that function, with experience consisting of a sample of known input-output pairs of the function. In some cases, the function is represented explicitly as a parameterized functional form; in other cases, the function is implicit and obtained via a search process, a factorization, an optimization procedure, or a simulation-based procedure. Even when implicit, the function generally depends on parameters or other tunable degrees of freedom, and training corresponds to finding values for these parameters that optimize the performance metric.
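
A minimal sketch of this idea, assuming the simplest parameterized functional form (a line y = w*x + b) and squared error as the performance metric; training searches the parameter space by gradient descent over known input-output pairs:

```python
def fit_line(pairs, lr=0.05, steps=2000):
    """Find parameters (w, b) minimizing mean squared error over (x, y) pairs."""
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Training pairs generated from y = 3x + 1; training recovers the parameters.
data = [(x, 3 * x + 1) for x in [0, 1, 2, 3]]
w, b = fit_line(data)
print(round(w, 2), round(b, 2))  # ≈ 3.0 1.0
```

The same template (parameterized form, performance metric, search over parameter values) underlies far richer function classes than a line.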


Whatever the learning algorithm, a key scientific and practical goal is to theoretically characterize the capabilities of specific learning algorithms and the inherent difficulty of any given learning problem: How accurately can the algorithm learn from a particular type and volume of training data? How robust is the algorithm to errors in its modeling assumptions or to errors in the training data? Given a learning problem with a given volume of training data, is it possible to design a successful algorithm or is this learning problem fundamentally intractable? Such theoretical characterizations of machine-learning algorithms and problems typically make use of the familiar frameworks of statistical decision theory and computational complexity theory. In fact, attempts to characterize machine-learning algorithms theoretically have led to blends of statistical and computational theory in which the goal is to simultaneously characterize the sample complexity (how much data are required to learn accurately) and the computational complexity (how much computation is required) and to specify how these depend on features of the learning algorithm such as the representation it uses for what it learns (3–6). A specific form of computational analysis that has proved particularly useful in recent years has been that of optimization theory, with upper and lower bounds on rates of convergence of optimization procedures merging well with the formulation of machine-learning problems as the optimization of a performance metric (7, 8).


As a field of study, machine learning sits at the crossroads of computer science, statistics, and a variety of other disciplines concerned with automatic improvement over time, and inference and decision-making under uncertainty. Related disciplines include the psychological study of human learning, the study of evolution, adaptive control theory, the study of educational practices, neuroscience, organizational behavior, and economics. Although the past decade has seen increased crosstalk with these other fields, we are just beginning to tap the potential synergies and the diversity of formalisms and experimental methods used across these multiple fields for studying systems that improve with experience.


Drivers of machine-learning progress


The past decade has seen rapid growth in the ability of networked and mobile computing systems to gather and transport vast amounts of data, a phenomenon often referred to as "Big Data." The scientists and engineers who collect such data have often turned to machine learning for solutions to the problem of obtaining useful insights, predictions, and decisions from such data sets. Indeed, the sheer size of the data makes it essential to develop scalable procedures that blend computational and statistical considerations, but the issue is more than the mere size of modern data sets; it is the granular, personalized nature of much of these data. Mobile devices and embedded computing permit large amounts of data to be gathered about individual humans, and machine-learning algorithms can learn from these data to customize their services to the needs and circumstances of each individual. Moreover, these personalized services can be connected, so that an overall service emerges that takes advantage of the wealth and diversity of data from many individuals while still customizing to the needs and circumstances of each. Instances of this trend toward capturing and mining large quantities of data to improve services and productivity can be found across many fields of commerce, science, and government. Historical medical records are used to discover which patients will respond best to which treatments; historical traffic data are used to improve traffic control and reduce congestion; historical crime data are used to help allocate local police to specific locations at specific times; and large experimental data sets are captured and curated to accelerate progress in biology, astronomy, neuroscience, and other data-intensive empirical sciences. We appear to be at the beginning of a decades-long trend toward increasingly data-intensive, evidence-based decision-making across many aspects of science, commerce, and government.


With the increasing prominence of large-scale data in all areas of human endeavor has come a wave of new demands on the underlying machine-learning algorithms. For example, huge data sets require computationally tractable algorithms, highly personal data raise the need for algorithms that minimize privacy effects, and the availability of huge quantities of unlabeled data raises the challenge of designing learning algorithms to take advantage of it. The next sections survey some of the effects of these demands on recent work in machine-learning algorithms, theory, and practice.


Core methods and recent progress


The most widely used machine-learning methods are supervised learning methods (1). Supervised learning systems, including spam classifiers of e-mail, face recognizers over images, and medical diagnosis systems for patients, all exemplify the function approximation problem discussed earlier, where the training data take the form of a collection of (x, y) pairs and the goal is to produce a prediction y in response to a query x. The inputs x may be classical vectors or they may be more complex objects such as documents, images, DNA sequences, or graphs. Similarly, many different kinds of output y have been studied. Much progress has been made by focusing on the simple binary classification problem, in which y takes on one of two values (for example, "spam" or "not spam"), but there has also been abundant research on problems such as multiclass classification (where y takes on one of K labels), multilabel classification (where y is labeled simultaneously by several of the K labels), ranking problems (where y provides a partial order on some set), and general structured prediction problems (where y is a combinatorial object such as a graph, whose components may be required to satisfy some set of constraints). An example of the latter problem is part-of-speech tagging, where the goal is to simultaneously label every word in an input sentence x as being a noun, verb, or some other part of speech. Supervised learning also includes cases in which y has real-valued components or a mixture of discrete and real-valued components.
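
As a minimal concrete instance of learning from (x, y) pairs, a one-nearest-neighbor rule predicts, for a new query x, the label y of the closest stored training input. The two-feature "spam" data below are invented for illustration:

```python
def nearest_neighbor_predict(training_pairs, query):
    """Return the label y of the training input x closest to the query."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    x_best, y_best = min(training_pairs, key=lambda pair: sq_dist(pair[0], query))
    return y_best

# A collection of (x, y) pairs: two-feature inputs with binary labels.
train = [((0.9, 0.8), "spam"), ((0.8, 0.95), "spam"),
         ((0.1, 0.2), "not spam"), ((0.2, 0.1), "not spam")]
print(nearest_neighbor_predict(train, (0.85, 0.9)))  # → spam
print(nearest_neighbor_predict(train, (0.15, 0.1)))  # → not spam
```

The same interface (train on (x, y) pairs, answer queries x with predictions y) carries over to the multiclass, multilabel, ranking, and structured-output variants described above.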


Supervised learning systems generally form their predictions via a learned mapping f(x), which produces an output y for each input x (or a probability distribution over y given x). Many different forms of mapping f exist, including decision trees, decision forests, logistic regression, support vector machines, neural networks, kernel machines, and Bayesian classifiers (1). A variety of learning algorithms has been proposed to estimate these different types of mappings, and there are also generic procedures such as boosting and multiple kernel learning that combine the outputs of multiple learning algorithms. Procedures for learning f from data often make use of ideas from optimization theory or numerical analysis, with the specific form of machine-learning problems (e.g., that the objective function or function to be integrated is often the sum over a large number of terms) driving innovations. This diversity of learning architectures and algorithms reflects the diverse needs of applications, with different architectures capturing different kinds of mathematical structures, offering different levels of amenability to post-hoc visualization and explanation, and providing varying trade-offs between computational complexity, the amount of data, and performance.
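
The idea of combining the outputs of multiple learned mappings can be sketched as a simple majority vote over independently obtained classifiers. This is a far simpler combiner than boosting or multiple kernel learning; the one-feature "stump" classifiers and thresholds below are invented for illustration:

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Combine several learned mappings f(x) by voting on their outputs."""
    votes = Counter(f(x) for f in classifiers)
    return votes.most_common(1)[0][0]

# Three illustrative threshold classifiers over a single feature.
stumps = [lambda x: "fraud" if x > 0.4 else "not fraud",
          lambda x: "fraud" if x > 0.6 else "not fraud",
          lambda x: "fraud" if x > 0.8 else "not fraud"]
print(majority_vote(stumps, 0.7))  # two of three vote "fraud" → fraud
print(majority_vote(stumps, 0.5))  # one of three votes "fraud" → not fraud
```

Boosting goes further by weighting both the training examples and the individual classifiers, but the combine-many-mappings structure is the same.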


One high-impact area of progress in supervised learning in recent years involves deep networks, which are multilayer networks of threshold units, each of which computes some simple parameterized function of its inputs (9, 10). Deep learning systems make use of gradient-based optimization algorithms to adjust parameters throughout such a multilayered network based on errors at its output. Exploiting modern parallel computing architectures, such as graphics processing units originally developed for video gaming, it has been possible to build deep learning systems that contain billions of parameters and that can be trained on the very large collections of images, videos, and speech samples available on the Internet. Such large-scale deep learning systems have had a major effect in recent years in computer vision (11) and speech recognition (12), where they have yielded major improvements in performance over previous approaches (see Fig. 2). Deep network methods are being actively pursued in a variety of additional applications, from natural language translation to collaborative filtering.
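
As an illustrative toy sketch (nothing like the billion-parameter systems described above), a network with one hidden layer of sigmoid units can be trained by gradient-based error backpropagation in a few dozen lines. Here it is applied to XOR, a function no single-layer network can represent; the architecture size, learning rate, and epoch count are arbitrary choices:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(w_h, w_o, x1, x2):
    """One hidden layer of sigmoid units feeding a sigmoid output unit.
    Each hidden row is [w1, w2, bias]; w_o ends with the output bias."""
    h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w_h]
    return h, sigmoid(sum(wo * hj for wo, hj in zip(w_o, h)) + w_o[-1])

def loss(w_h, w_o):
    return sum((forward(w_h, w_o, *x)[1] - y) ** 2 for x, y in XOR) / len(XOR)

def train(epochs=4000, lr=0.5, hidden=3, seed=1):
    rng = random.Random(seed)
    w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for (x1, x2), y in XOR:
            h, out = forward(w_h, w_o, x1, x2)
            d_out = (out - y) * out * (1 - out)  # error signal at the output
            for j in range(hidden):              # backpropagate to hidden units
                d_h = d_out * w_o[j] * h[j] * (1 - h[j])
                w_h[j][0] -= lr * d_h * x1
                w_h[j][1] -= lr * d_h * x2
                w_h[j][2] -= lr * d_h
                w_o[j] -= lr * d_out * h[j]
            w_o[-1] -= lr * d_out
    return w_h, w_o

w_h, w_o = train()
print(round(loss(w_h, w_o), 3))  # typically near 0 once XOR has been learned
```

Scaled up to many layers, millions of units, and GPU-parallel matrix arithmetic, this same gradient-based parameter adjustment is the core of the deep learning systems described above.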



Fig. 2. Automatic generation of text captions for images with deep networks. A convolutional neural network is trained to interpret images, and its output is then used by a recurrent neural network trained to generate a text caption (top). The sequence at the bottom shows the word-by-word focus of the network on different parts of the input image while it generates the caption word by word. [Adapted with permission from (30).]



Fig. 3. Topic models. Topic modeling is a methodology for analyzing documents, where a document is viewed as a collection of words, and the words in the document are viewed as being generated by an underlying set of topics (denoted by the colors in the figure). Topics are probability distributions across words (leftmost column), and each document is characterized by a probability distribution across topics (histogram). These distributions are inferred based on the analysis of a collection of documents and can be viewed to classify, index, and summarize the content of documents. [From (31). Copyright 2012, Association for Computing Machinery, Inc. Reprinted with permission.]


The internal layers of deep networks can be viewed as providing learned representations of the input data. While much of the practical success in deep learning has come from supervised learning methods for discovering such representations, efforts have also been made to develop deep learning algorithms that discover useful representations of the input without the need for labeled training data (13). The general problem is referred to as unsupervised learning, a second paradigm in machine-learning research (2).


Broadly, unsupervised learning generally involves the analysis of unlabeled data under assumptions about structural properties of the data (e.g., algebraic, combinatorial, or probabilistic). For example, one can assume that data lie on a low-dimensional manifold and aim to identify that manifold explicitly from data. Dimension reduction methods, including principal components analysis, manifold learning, factor analysis, random projections, and autoencoders (1, 2), make different specific assumptions regarding the underlying manifold (e.g., that it is a linear subspace, a smooth nonlinear manifold, or a collection of submanifolds). Another example of dimension reduction is the topic modeling framework depicted in Fig. 3. A criterion function is defined that embodies these assumptions, often making use of general statistical principles such as maximum likelihood, the method of moments, or Bayesian integration, and optimization or sampling algorithms are developed to optimize the criterion. As another example, clustering is the problem of finding a partition of the observed data (and a rule for predicting future data) in the absence of explicit labels indicating a desired partition. A wide range of clustering procedures has been developed, all based on specific assumptions regarding the nature of a "cluster." In both clustering and dimension reduction, the concern with computational complexity is paramount, given that the goal is to exploit the particularly large data sets that are available if one dispenses with supervised labels.
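
A minimal clustering sketch using the familiar k-means procedure, one of the many procedures alluded to above; its implicit assumption is that a "cluster" is a compact group around a centroid. The 1D data and choice of k are invented for illustration:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Partition 1D points into k clusters by alternating an assignment
    step and a centroid-update step; returns the sorted centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest centroid
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):  # recompute each centroid
            if members:
                centroids[j] = sum(members) / len(members)
    return sorted(centroids)

data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
print(kmeans(data, 2))  # centroids near 1.0 and 5.0
```

No labels are supplied; the structure is recovered purely from the assumption built into the criterion (here, within-cluster squared distance).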


A third major machine-learning paradigm is reinforcement learning (14, 15). Here, the information available in the training data is intermediate between supervised and unsupervised learning. Instead of training examples that indicate the correct output for a given input, the training data in reinforcement learning are assumed to provide only an indication as to whether an action is correct or not; if an action is incorrect, there remains the problem of finding the correct action. More generally, in the setting of sequences of inputs, it is assumed that reward signals refer to the entire sequence; the assignment of credit or blame to individual actions in the sequence is not directly provided. Indeed, although simplified versions of reinforcement learning known as bandit problems are studied, where it is assumed that rewards are provided after each action, reinforcement learning problems typically involve a general control-theoretic setting in which the learning task is to learn a control strategy (a "policy") for an agent acting in an unknown dynamical environment, where that learned strategy is trained to choose actions for any given state, with the objective of maximizing its expected reward over time. The ties to research in control theory and operations research have increased over the years, with formulations such as Markov decision processes and partially observed Markov decision processes providing points of contact (15, 16). Reinforcement-learning algorithms generally make use of ideas that are familiar from the control-theory literature, such as policy iteration, value iteration, rollouts, and variance reduction, with innovations arising to address the specific needs of machine learning (e.g., large-scale problems, few assumptions about the unknown dynamical environment, and the use of supervised learning architectures to represent policies).
It is also worth noting the strong ties between reinforcement learning and many decades of work on learning in psychology and neuroscience, one notable example being the use of reinforcement learning algorithms to predict the response of dopaminergic neurons in monkeys learning to associate a stimulus light with subsequent sugar reward (17).
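
The control setting above can be illustrated with tabular Q-learning on a toy problem. The three-state chain environment, rewards, and hyperparameters here are an invented example, not from the paper:

```python
import random

ACTIONS = ("left", "right")

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a 3-state chain: start in state 0; 'right'
    moves toward state 2, which ends the episode with reward +1."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(3) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != 2:
            if rng.random() < eps:                        # explore occasionally
                a = rng.choice(ACTIONS)
            else:                                         # otherwise act greedily
                a = max(ACTIONS, key=lambda b: q[(s, b)])
            s2 = min(s + 1, 2) if a == "right" else max(s - 1, 0)
            r = 1.0 if s2 == 2 else 0.0
            # Temporal-difference update: credit propagates back along the sequence.
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
            s = s2
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in (0, 1)]
print(policy)  # the learned policy heads right, toward the reward
```

Note that the agent is never told which action was correct; only the delayed reward signal, propagated backward through the value estimates, assigns credit to the individual actions in the sequence.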


Although these three learning paradigms help to organize ideas, much current research involves blends across these categories. For example, semi-supervised learning makes use of unlabeled data to augment labeled data in a supervised learning context, and discriminative training blends architectures developed for unsupervised learning with optimization formulations that make use of labels. Model selection is the broad activity of using training data not only to fit a model but also to select from a family of models, and the fact that training data do not directly indicate which model to use leads to the use of algorithms developed for bandit problems and to Bayesian optimization procedures. Active learning arises when the learner is allowed to choose data points and query the trainer to request targeted information, such as the label of an otherwise unlabeled example. Causal modeling is the effort to go beyond simply discovering predictive relations among variables, to distinguish which variables causally influence others (e.g., a high white-blood-cell count can predict the existence of an infection, but it is the infection that causes the high white-cell count). Many issues influence the design of learning algorithms across all of these paradigms, including whether data are available in batches or arrive sequentially over time, how data have been sampled, requirements that learned models be interpretable by users, and robustness issues that arise when data do not fit prior modeling assumptions.


Emerging trends


The field of machine learning is sufficiently young that it is still rapidly expanding, often by inventing new formalizations of machine-learning problems driven by practical applications. (An example is the development of recommendation systems, as described in Fig. 4.) One major trend driving this expansion is a growing concern with the environment in which a machine-learning algorithm operates. The word "environment" here refers in part to the computing architecture; whereas a classical machine-learning system involved a single program running on a single machine, it is now common for machine-learning systems to be deployed in architectures that include many thousands or tens of thousands of processors, such that communication constraints and issues of parallelism and distributed processing take center stage. Indeed, as depicted in Fig. 5, machine-learning systems are increasingly taking the form of complex collections of software that run on large-scale parallel and distributed computing platforms and provide a range of algorithms and services to data analysts.



Fig. 4. Recommendation systems. A recommendation system is a machine-learning system that is based on data that indicate links between a set of users (e.g., people) and a set of items (e.g., products). A link between a user and a product means that the user has indicated an interest in the product in some fashion (perhaps by purchasing that item in the past). The machine-learning problem is to suggest other items to a given user that he or she may also be interested in, based on the data across all users.
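
A minimal sketch of this idea, scoring unseen items by how many of a user's co-purchasers link to them; the users and items below are invented for illustration, and real systems use far richer models such as matrix factorization:

```python
def recommend(links, user, top_n=2):
    """Score items the user has not seen by how many users who share at least
    one item with them also link to those items; return the top_n items.
    `links` maps each user to the set of items they have shown interest in."""
    mine = links[user]
    scores = {}
    for other, items in links.items():
        if other == user or not (mine & items):
            continue  # only users sharing an item with `user` get a vote
        for item in items - mine:
            scores[item] = scores.get(item, 0) + 1
    return sorted(scores, key=lambda it: (-scores[it], it))[:top_n]

links = {
    "ann": {"book", "lamp"},
    "bob": {"book", "pen", "mug"},
    "cho": {"lamp", "pen"},
    "dia": {"desk"},  # shares nothing with ann, so casts no vote for her
}
print(recommend(links, "ann"))  # → ['pen', 'mug']
```

Here "pen" outranks "mug" because two of ann's co-linked users point to it rather than one, which is exactly the use of data across all users that the caption describes.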



Fig. 5. Data analytics stack. Scalable machine-learning systems are layered architectures that are built on parallel and distributed computing platforms. The architecture depicted here, an open-source data analysis stack developed in the Algorithms, Machines and People (AMP) Laboratory at the University of California, Berkeley, includes layers that interface to underlying operating systems; layers that provide distributed storage, data management, and processing; and layers that provide core machine-learning competencies such as streaming, subsampling, pipelines, graph processing, and model serving.


The word "environment" also refers to the source of the data, which ranges from a set of people who may have privacy or ownership concerns, to the analyst or decision-maker who may have certain requirements on a machine-learning system (for example, that its output be visualizable), and to the social, legal, or political framework surrounding the deployment of a system. The environment also may include other machine-learning systems or other agents, and the overall collection of systems may be cooperative or adversarial. Broadly speaking, environments provide various resources to a learning algorithm and place constraints on those resources. Increasingly, machine-learning researchers are formalizing these relationships, aiming to design algorithms that are provably effective in various environments and explicitly allow users to express and control trade-offs among resources.


As an example of resource constraints, let us suppose that the data are provided by a set of individuals who wish to retain a degree of privacy. Privacy can be formalized via the notion of "differential privacy," which defines a probabilistic channel between the data and the outside world such that an observer of the output of the channel cannot infer reliably whether particular individuals have supplied data or not (18). Classical applications of differential privacy have involved ensuring that queries (e.g., "what is the maximum balance across a set of accounts?") to a privatized database return an answer that is close to that returned on the nonprivate data. Recent research has brought differential privacy into contact with machine learning, where queries involve predictions or other inferential assertions (e.g., "given the data I've seen so far, what is the probability that a new transaction is fraudulent?") (19, 20). Placing the overall design of a privacy-enhancing machine-learning system within a decision-theoretic framework provides users with a tuning knob whereby they can choose a desired level of privacy that takes into account the kinds of questions that will be asked of the data and their own personal utility for the answers. For example, a person may be willing to reveal most of their genome in the context of research on a disease that runs in their family but may ask for more stringent protection if information about their genome is being used to set insurance rates.
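
The probabilistic channel described above can be sketched with the standard Laplace mechanism for count queries. The account data, the predicate, and the choice of epsilon are illustrative assumptions:

```python
import math, random

def laplace_sample(scale, rng):
    """Inverse-transform sample from a Laplace(0, scale) distribution."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon, rng):
    """Release a count query under epsilon-differential privacy.
    One individual changes a count by at most 1, so the noise scale is 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_sample(1.0 / epsilon, rng)

balances = [120, 950, 30, 400, 700]  # illustrative account data
rng = random.Random(42)
noisy = private_count(balances, lambda b: b > 300, epsilon=0.5, rng=rng)
print(round(noisy, 2))  # randomized answer near the true count of 3
```

Smaller epsilon means more noise and stronger privacy; that single parameter is the "tuning knob" the decision-theoretic framing exposes to users.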

作为资源约束的一个例子,假设数据由一组希望保留一定程度隐私的个人提供。隐私可以通过"差分隐私"的概念来形式化:它在数据与外部世界之间定义一个概率信道,使得信道输出的观察者无法可靠地推断特定个人是否提供了数据(18)。差分隐私的经典应用是确保对隐私化数据库的查询(例如"一组账户中的最大余额是多少?")返回的答案与在非隐私数据上返回的答案接近。最近的研究将差分隐私与机器学习联系起来,此时查询涉及预测或其他推断性断言(例如"根据我目前看到的数据,一笔新交易是欺诈的概率是多少?")(19, 20)。将隐私增强型机器学习系统的总体设计置于决策理论框架之内,就为用户提供了一个调节旋钮,用户可以据此选择所需的隐私级别,同时考虑将对数据提出的问题类型以及答案对其个人的效用。例如,一个人可能愿意在研究其家族遗传疾病的背景下公开自己的大部分基因组,但如果其基因组信息被用于设定保险费率,则可能要求更严格的保护。
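The Laplace mechanism is a standard way to realize the "probabilistic channel" described above. The sketch below is illustrative only (the function name, data, and bounds are invented for this example): it answers a mean query while adding noise calibrated to the query's sensitivity, with epsilon acting as the privacy tuning knob.

```python
import numpy as np

def private_mean(data, epsilon, lower, upper):
    """Release the mean of `data` with epsilon-differential privacy
    via the Laplace mechanism. Smaller epsilon = stronger privacy."""
    clipped = np.clip(data, lower, upper)
    true_mean = clipped.mean()
    # Sensitivity: the most the mean can change if one individual's
    # record is added or altered, given values bounded in [lower, upper].
    sensitivity = (upper - lower) / len(clipped)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_mean + noise

# Hypothetical account balances; a tight epsilon yields a noisier answer.
balances = np.array([120.0, 85.0, 300.0, 42.0, 510.0])
print(private_mean(balances, epsilon=0.5, lower=0.0, upper=1000.0))
```

Turning the knob toward smaller epsilon trades answer accuracy for stronger protection, which is precisely the decision-theoretic trade-off the text describes.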

Communication is another resource that needs to be managed within the overall context of a distributed learning system. For example, data may be distributed across distinct physical locations because their size does not allow them to be aggregated at a single site or because of administrative boundaries. In such a setting, we may wish to impose a bit-rate communication constraint on the machine-learning algorithm. Solving the design problem under such a constraint will generally show how the performance of the learning system degrades under decrease in communication bandwidth, but it can also reveal how the performance improves as the number of distributed sites (e.g., machines or processors) increases, trading off these quantities against the amount of data (21, 22). Much as in classical information theory, this line of research aims at fundamental lower bounds on achievable performance and specific algorithms that achieve those lower bounds.

通信是另一种需要在分布式学习系统的整体背景下加以管理的资源。例如,数据可能分布在不同的物理位置,原因是其规模不允许在单一站点汇聚,或者存在管理边界。在这种情形下,我们可能希望对机器学习算法施加比特率通信约束。在此类约束下求解设计问题,通常可以揭示学习系统的性能如何随通信带宽的降低而退化,但也能揭示性能如何随分布式站点(例如机器或处理器)数量的增加而提升,并将这些量与数据量进行权衡(21, 22)。与经典信息论类似,这一研究方向旨在给出可达性能的基本下界,以及达到这些下界的具体算法。
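A bit-rate constraint of this kind can be made concrete with a toy simulation (all names and parameters below are invented for illustration): each site summarizes its local data as a single quantized message of `bits` bits, and a central server averages the messages. Varying `bits` shows the bandwidth/accuracy trade-off the text describes.

```python
import numpy as np

def quantize(x, bits, lo, hi):
    """Uniformly quantize a scalar in [lo, hi] to one of 2**bits levels."""
    levels = 2 ** bits
    x = np.clip(x, lo, hi)
    idx = np.round((x - lo) / (hi - lo) * (levels - 1))
    return lo + idx * (hi - lo) / (levels - 1)

def distributed_mean(site_data, bits, lo=-1.0, hi=1.0):
    """Each site transmits only a `bits`-bit summary of its local mean;
    the server averages the quantized messages."""
    messages = [quantize(np.mean(d), bits, lo, hi) for d in site_data]
    return np.mean(messages)

# Ten sites, each holding 100 noisy observations of a common quantity.
rng = np.random.default_rng(0)
sites = [rng.normal(0.3, 0.1, size=100) for _ in range(10)]
for b in (2, 4, 8):
    print(f"{b}-bit messages -> estimate {distributed_mean(sites, bits=b):.4f}")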

A major goal of this general line of research is to bring the kinds of statistical resources studied in machine learning (e.g., number of data points, dimension of a parameter, and complexity of a hypothesis class) into contact with the classical computational resources of time and space. Such a bridge is present in the "probably approximately correct" (PAC) learning framework, which studies the effect of adding a polynomial-time computation constraint on this relationship among error rates, training data size, and other parameters of the learning algorithm (3). Recent advances in this line of research include various lower bounds that establish fundamental gaps in performance achievable in certain machine-learning problems (e.g., sparse regression and sparse principal components analysis) via polynomial-time and exponential-time algorithms (23). The core of the problem, however, involves time-data trade-offs that are far from the polynomial/exponential boundary. The large data sets that are increasingly the norm require algorithms whose time and space requirements are linear or sublinear in the problem size (number of data points or number of dimensions). Recent research focuses on methods such as subsampling, random projections, and algorithm weakening to achieve scalability while retaining statistical control (24, 25). The ultimate goal is to be able to supply time and space budgets to machine-learning systems in addition to accuracy requirements, with the system finding an operating point that allows such requirements to be realized.

这一总体研究方向的一个主要目标,是使机器学习中研究的各类统计资源(例如数据点数量、参数维度和假设类的复杂度)与时间和空间这两种经典计算资源联系起来。"概率近似正确"(PAC)学习框架中就存在这样一座桥梁,它研究了在错误率、训练数据规模和学习算法其他参数之间的关系上增加多项式时间计算约束所产生的影响(3)。该方向的最新进展包括多种下界结果,它们确立了在某些机器学习问题(例如稀疏回归和稀疏主成分分析)中,多项式时间算法与指数时间算法可达性能之间的本质差距(23)。然而,问题的核心涉及远离多项式/指数边界的时间与数据的权衡。日益成为常态的大型数据集要求算法的时间和空间需求相对于问题规模(数据点数量或维度数量)是线性或亚线性的。最近的研究侧重于子采样、随机投影和算法弱化等方法,以在保持统计控制的同时实现可扩展性(24, 25)。最终目标是除了精度要求之外,还能为机器学习系统提供时间和空间预算,由系统找到能够满足这些要求的工作点。
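Random projection, one of the scalability methods named above, can be sketched in a few lines (the dimensions and random seed here are arbitrary choices for illustration): multiplying the data by a random Gaussian matrix scaled by 1/sqrt(k) maps points from d dimensions down to k while approximately preserving pairwise Euclidean distances, in the sense of the Johnson-Lindenstrauss lemma.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 200, 10_000, 400  # n points in d dimensions, projected to k

X = rng.normal(size=(n, d))
# Random Gaussian projection; the 1/sqrt(k) scaling makes projected
# distances an approximately unbiased estimate of the originals.
R = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ R

# Compare one pairwise distance before and after projection.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(f"original {orig:.1f}, projected {proj:.1f}, ratio {proj/orig:.3f}")
```

Downstream computations on `Y` cost a factor of roughly d/k less in time and space than on `X`, at the price of a controlled, quantifiable distortion, which is exactly the kind of time/space-versus-accuracy operating point the text envisions.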

Opportunities and challenges

机遇与挑战

Despite its practical and commercial successes, machine learning remains a young field with many underexplored research opportunities. Some of these opportunities can be seen by contrasting current machine-learning approaches to the types of learning we observe in naturally occurring systems such as humans and other animals, organizations, economies, and biological evolution. For example, whereas most machine-learning algorithms are targeted to learn one specific function or data model from one single data source, humans clearly learn many different skills and types of knowledge, from years of diverse training experience, supervised and unsupervised, in a simple-to-more-difficult sequence (e.g., learning to crawl, then walk, then run). This has led some researchers to begin exploring the question of how to construct computer lifelong or never-ending learners that operate nonstop for years, learning thousands of interrelated skills or functions within an overall architecture that allows the system to improve its ability to learn one skill based on having learned another (26–28). Another aspect of the analogy to natural learning systems suggests the idea of team-based, mixed-initiative learning. For example, whereas current machine-learning systems typically operate in isolation to analyze the given data, people often work in teams to collect and analyze data (e.g., biologists have worked as teams to collect and analyze genomic data, bringing together diverse experiments and perspectives to make progress on this difficult problem). New machine-learning methods capable of working collaboratively with humans to jointly analyze complex data sets might bring together the abilities of machines to tease out subtle statistical regularities from massive data sets with the abilities of humans to draw on diverse background knowledge to generate plausible explanations and suggest new hypotheses. Many theoretical results in machine learning apply to all learning systems, whether they are computer algorithms, animals, organizations, or natural evolution. As the field progresses, we may see machine-learning theory and algorithms increasingly providing models for understanding learning in neural systems, organizations, and biological evolution and see machine learning benefit from ongoing studies of these other types of learning systems.

尽管机器学习取得了实用和商业上的成功,它仍然是一个年轻的领域,拥有许多尚未充分探索的研究机会。将当前的机器学习方法与我们在自然系统(如人类和其他动物、组织、经济以及生物进化)中观察到的学习类型加以对比,可以看出其中的一些机会。例如,大多数机器学习算法旨在从单一数据源学习某个特定函数或数据模型,而人类显然是在多年多样化的训练经历中,通过有监督和无监督的方式,按照由易到难的顺序(例如先学爬、再学走、然后学跑)学会许多不同的技能和知识类型。这促使一些研究人员开始探索如何构建终身学习或永不停止学习的计算机:它们常年不间断地运行,在一个总体架构内学习数千种相互关联的技能或功能,使系统能够在学会一项技能的基础上提升学习另一项技能的能力(26-28)。与自然学习系统类比的另一个方面提出了基于团队的混合主动学习的思想。例如,当前的机器学习系统通常独立运行以分析给定数据,而人们往往以团队形式收集和分析数据(例如,生物学家以团队方式收集和分析基因组数据,汇集不同的实验和视角,以便在这一难题上取得进展)。能够与人类协作、共同分析复杂数据集的新型机器学习方法,或许能把机器从海量数据中提取细微统计规律的能力,与人类利用多样背景知识生成合理解释并提出新假设的能力结合起来。机器学习中的许多理论结果适用于所有学习系统,无论它们是计算机算法、动物、组织还是自然进化。随着该领域的发展,我们可能会看到机器学习理论和算法日益为理解神经系统、组织和生物进化中的学习提供模型,同时机器学习也会从对这些其他类型学习系统的持续研究中受益。

As with any powerful technology, machine learning raises questions about which of its potential uses society should encourage and discourage. The push in recent years to collect new kinds of personal data, motivated by its economic value, leads to obvious privacy issues, as mentioned above. The increasing value of data also raises a second ethical issue: Who will have access to, and ownership of, online data, and who will reap its benefits? Currently, much data are collected by corporations for specific uses leading to improved profits, with little or no motive for data sharing. However, the potential benefits that society could realize, even from existing online data, would be considerable if those data were to be made available for public good.

与任何强大的技术一样,机器学习也提出了这样一个问题:社会应当鼓励其哪些潜在用途,又应当抑制哪些用途。如上所述,近年来在经济价值驱动下收集新型个人数据的潮流带来了明显的隐私问题。数据价值的不断增长还引发了第二个伦理问题:谁将有权访问和拥有在线数据,谁又将从中获益?目前,大量数据由公司为特定用途收集以提高利润,几乎没有数据共享的动机。然而,如果这些数据能够被用于公共利益,那么社会即使仅从现有的在线数据中也能获得可观的潜在收益。

To illustrate, consider one simple example of how society could benefit from data that is already online today by using this data to decrease the risk of global pandemic spread from infectious diseases. By combining location data from online sources (e.g., location data from cell phones, from credit-card transactions at retail outlets, and from security cameras in public places and private buildings) with online medical data (e.g., emergency room admissions), it would be feasible today to implement a simple system to telephone individuals immediately if a person they were in close contact with yesterday was just admitted to the emergency room with an infectious disease, alerting them to the symptoms they should watch for and precautions they should take. Here, there is clearly a tension and trade-off between personal privacy and public health, and society at large needs to make the decision on how to make this trade-off. The larger point of this example, however, is that, although the data are already online, we do not currently have the laws, customs, culture, or mechanisms to enable society to benefit from them, if it wishes to do so. In fact, much of these data are privately held and owned, even though they are data about each of us. Considerations such as these suggest that machine learning is likely to be one of the most transformative technologies of the 21st century. Although it is impossible to predict the future, it appears essential that society begin now to consider how to maximize its benefits.


为了说明这一点,考虑一个简单的例子:社会可以利用如今已在线的数据来降低传染病在全球大流行中传播的风险。通过将来自在线来源的位置数据(例如来自手机、零售店信用卡交易以及公共场所和私人建筑安全摄像头的位置数据)与在线医疗数据(例如急诊室入院记录)相结合,今天就有可能实现这样一个简单的系统:如果某人昨天密切接触过的人刚刚因传染病被急诊室收治,系统会立即致电此人,提醒其应注意的症状和应采取的预防措施。显然,这里在个人隐私与公共卫生之间存在张力和权衡,整个社会需要就如何进行这种权衡做出决定。然而,这个例子更重要的意义在于:尽管这些数据已经在线,我们目前却还没有相应的法律、习俗、文化或机制,使社会在愿意的情况下能够从中受益。事实上,这些数据大多为私人持有和拥有,即便它们是关于我们每个人的数据。诸如此类的考虑表明,机器学习很可能成为21世纪最具变革性的技术之一。尽管无法预测未来,但社会现在就开始考虑如何使其收益最大化,似乎是至关重要的。
