Turing Award Laureate Richard S. Sutton's "The Bitter Lesson" | Revisiting an AI Classic

Introduction to The Bitter Lesson

The Bitter Lesson is a short essay Richard Sutton wrote in March 2019, reflecting on a recurring lesson from past AI research: in designing AI algorithms, people have tried to raise the level of intelligence by adding more expert knowledge, without fully weighing what computation itself can contribute, namely the gains in intelligence that come from leveraging computing power.
Reviewing 70 years of AI research, Sutton shows how, in computer chess, Go, speech recognition, computer vision and other fields, methods that leverage computation eventually overtook feature-based methods built on expert knowledge. His conclusion: AI researchers have tried to reach state-of-the-art performance by embedding domain expertise in their agents, but what ultimately worked were search and learning methods that scale with computation.
From this bitter lesson Sutton draws two points for AI research: first, appreciate the power of general methods that scale with computation, of which search and learning are the two major examples; second, the human mind is immensely complex, so we should avoid human-centric thinking, stop embedding expert knowledge in agents, and instead seek meta-methods that can capture this complexity, letting agents discover as humans do rather than contain what humans have discovered.
The original text of The Bitter Lesson follows:

The Original Essay

Lessons from 70 Years of AI Research

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore’s law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers’ belated learning of this bitter lesson, and it is instructive to review some of the most prominent.


In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that "brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.

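The "massive, deep search" Sutton refers to is, at its core, game-tree search of the minimax family. Below is a minimal alpha-beta sketch, not a description of Deep Blue itself: the `children` and `evaluate` callbacks are hypothetical stand-ins for move generation and a leaf-evaluation heuristic.

```python
# Minimal alpha-beta search over an abstract game tree.
# `children(state)` and `evaluate(state)` are hypothetical callbacks
# supplied by the caller: move generation and leaf evaluation.
def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:  # prune: the minimizing player avoids this branch
                break
        return value
    value = float("inf")
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, children, evaluate))
        beta = min(beta, value)
        if beta <= alpha:  # prune: the maximizing player avoids this branch
            break
    return value
```

The essay's point is that this kind of method scales: deeper search buys playing strength directly from more computation, where hand-coded chess knowledge does not.
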
A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers’ initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.

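Learning a value function by self play can be sketched in a few lines with tabular TD(0). The update rule is standard reinforcement learning; `ALPHA` and `GAMMA` are assumed hyperparameters, not values from the essay or from any Go program.

```python
# Tabular TD(0): nudge the value of a state toward the bootstrapped
# target r + gamma * V(next_state). Repeated over many self-play
# games, V moves toward the expected outcome from each state.
ALPHA, GAMMA = 0.1, 1.0  # assumed step size and discount

def td0_update(V, state, reward, next_state):
    old = V.get(state, 0.0)
    target = reward + GAMMA * V.get(next_state, 0.0)
    V[state] = old + ALPHA * (target - old)
    return V
```

Because each update touches only positions actually visited, more computation means more self-play games and a better value function, with no game-specific knowledge needed beyond the rules.
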
In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge—knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked—they tried to put that knowledge in their systems—but it proved ultimately counterproductive, and a colossal waste of researcher’s time, when, through Moore’s law, massive computation became available and a means was found to put it to good use.

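The HMM approach Sutton credits can be illustrated with the forward algorithm, which computes the likelihood of an observation sequence under the model; everything domain-specific (words, phonemes, the vocal tract) is replaced by probabilities estimated from data. The two-state model in the example below is invented for illustration.

```python
# Forward algorithm for a discrete hidden Markov model: alpha[s] is
# the probability of the observations so far, ending in hidden state s.
def hmm_forward(obs, states, start_p, trans_p, emit_p):
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[p] * trans_p[p][s] for p in states)
                 for s in states}
    return sum(alpha.values())  # total likelihood of the sequence
```

Since the transition and emission tables are fitted from data rather than hand-built, the method improves with more data and more computation, which is exactly why it displaced the knowledge-engineered systems.
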

In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.

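The "notion of convolution" is a small, general operation: slide one learned kernel over every image position. A pure-Python sketch with "valid" padding and stride 1 (real networks stack many such layers and learn the kernel weights):

```python
# 2-D convolution (cross-correlation, as the term is used in deep
# learning): the same kernel is applied at every position, so nothing
# about edges or SIFT-style features is built in by hand.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]
```

With a hand-picked kernel this reproduces classic feature detectors (a `[[1, -1]]` kernel responds to vertical edges), but the deep-learning move is to let training discover the kernels instead of specifying them.
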

Expert Knowledge vs. Search and Learning

This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.


Two Takeaways from the Bitter Lesson

One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.


The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.


About Richard Sutton

In March 2025, ACM announced that the 2024 A.M. Turing Award goes to Richard S. Sutton and his doctoral advisor Andrew G. Barto, for their pioneering work on the conceptual and algorithmic foundations of reinforcement learning. In a series of papers beginning in the 1980s, they introduced the main ideas of reinforcement learning, built its mathematical foundations, and developed the field's key algorithms. Reinforcement learning is one of the key approaches to building intelligent systems.

  • Reinforcement Learning: An Introduction (second edition) by Richard S. Sutton and Andrew G. Barto, available at: http://www.incompleteideas.net/book/RLbook2020.pdf
    http://www.incompleteideas.net/book/the-book.html
  • Richard Sutton's homepage: http://www.incompleteideas.net/

References

http://www.incompleteideas.net/IncIdeas/BitterLesson.html
https://awards.acm.org/about/2024-turing
http://www.incompleteideas.net/
