Introduction
There has been much recent discussion about AI risk, meaning specifically the potential pitfalls (both short-term and long-term) that AI with improved capabilities could create for society. Discussants include AI researchers such as Stuart Russell and Eric Horvitz and Tom Dietterich, entrepreneurs such as Elon Musk and Bill Gates, and research institutes such as the Machine Intelligence Research Institute (MIRI) and Future of Humanity Institute (FHI); the director of the latter institute, Nick Bostrom, has even written a bestselling book on this topic. Finally, ten million dollars in funding have been earmarked towards research on ensuring that AI will be safe and beneficial. Given this, I think it would be useful for AI researchers to discuss the nature and extent of risks that might be posed by increasingly capable AI systems, both short-term and long-term. As a PhD student in machine learning and artificial intelligence, this essay will describe my own views on AI risk, in the hopes of encouraging other researchers to detail their thoughts, as well.
For the purposes of this essay, I will define “AI” to be technology that can carry out tasks with limited or no human guidance, “advanced AI” to be technology that performs substantially more complex and domain-general tasks than are possible today, and “highly capable AI” to be technology that can outperform humans in all or almost all domains. As the primary target audience of this essay is other researchers, I have used technical terms (e.g. weakly supervised learning, inverse reinforcement learning) whenever they were useful, though I have also tried to make the essay more generally accessible when possible.
Outline
I think it is important to distinguish between two questions. First, does artificial intelligence merit the same degree of engineering safety considerations as other technologies (such as bridges)? Second, does artificial intelligence merit additional precautions, beyond those that would be considered typical? I will argue that the answer is yes to the first, even in the short term, and that current engineering methodologies in the field of machine learning do not provide even a typical level of safety or robustness. Moreover, I will argue that the answer to the second question in the long term is likely also yes — namely, that there are important ways in which highly capable artificial intelligence could pose risks which are not addressed by typical engineering concerns.
The point of this essay is not to be alarmist; indeed, I think that AI is likely to be net-positive for humanity. Rather, the point of this essay is to encourage a discussion about the potential pitfalls posed by artificial intelligence, since I believe that research done now can mitigate many of these pitfalls. Without such a discussion, we are unlikely to understand which pitfalls are most important or likely, and thus unable to design effective research programs to prevent them.
A common objection to discussing risks posed by AI is that it seems somewhat early on to worry about such risks, and the discussion is likely to be more germane if we wait to have it until after the field of AI has advanced further. I think this objection is quite reasonable in the abstract; however, as I will argue below, I think we do have a reasonable understanding of at least some of the risks that AI might pose, that some of these will be realized even in the medium term, and that there are reasonable programs of research that can address these risks, which in many cases would also have the advantage of improving the usability of existing AI systems.
Ordinary Engineering
There are many issues related to AI safety that are just a matter of good engineering methodology. For instance, we would ideally like systems that are transparent, modular, robust, and work under well-understood assumptions. Unfortunately, machine learning as a field has not developed very good methodologies for obtaining any of these things, and so this is an important issue to remedy. In other words, I think we should put at least as much thought into building an AI as we do into building a bridge.
Just to be very clear, I do not think that machine learning researchers are bad engineers; looking at any of the open source tools such as Torch, Caffe, MLlib, and others make it clear that many machine learning researchers are also good software engineers. Rather, I think that as a field our methodologies are not mature enough to address the specific engineering desiderata of statistical models (in contrast to the algorithms that create them). In particular, the statistical models obtained from machine learning algorithms tend to be:
- Opaque: Many machine learning models consist of hundreds of thousands of parameters, making it difficult to understand how predictions are made. Typically, practitioners resort to error analysis examining the covariates that most strongly influence each incorrect prediction. However, this is not a very sustainable long-term solution, as it requires substantial effort even for relatively narrow-domain systems.
- Monolithic: In part due to their opacity, models act as a black box, with no modularity or encapsulation of behavior. Though machine learning systems are often split into pipelines of smaller models, the lack of encapsulation can make these pipelines even harder to manage than a single large model; indeed, since machine learning models are by design optimized for a particular input distribution (i.e. whatever distribution they are trained on), we end up in a situation where “Changing Anything Changes Everything” [1].
- Fragile: As another consequence of being optimized for a particular training distribution, machine learning models can have arbitrarily poor performance when that distribution shifts. For instance, Daumé and Marcu [2] show that a named entity classifier with 92% accuracy on one dataset drops to 58% accuracy on a superficially similar dataset. Though such issues are partially addressed by work on transfer learning and domain adaptation [3], these areas are not very developed compared to supervised learning. (A minimal synthetic illustration of this failure mode appears just after this list.)
- Poorly understood: Beyond their fragility, understanding when a machine learning model will work is difficult. We know that a model will work if it is tested on the same distribution it is trained on, and have some extensions beyond this case (e.g. based on robust optimization [4]), but we have very little in the way of practically relevant conditions under which a model trained in one situation will work well in another situation. Although they are related, this issue differs from the opacity issue above in that it relates to making predictions about the system’s future behavior (in particular, generalization to new situations), versus understanding the internal workings of the current system.
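To make the fragility point concrete, here is a minimal synthetic sketch (Python with NumPy and scikit-learn; the two-feature setup, the "spurious cue", and all numbers are invented for illustration rather than taken from [2]). A classifier that leans on a dataset-specific cue looks accurate on held-out data from the training distribution and degrades sharply once that cue no longer holds.

```python
# Illustrative only: a classifier that relies on an unstable, dataset-specific
# cue is accurate in-distribution and degrades badly under distribution shift.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

def sample(n, spurious_corr):
    """x0 is a stable signal; x1 is a dataset-specific cue that agrees with
    the label with probability `spurious_corr` (a stand-in for the surface
    regularities a narrow-domain classifier tends to latch onto)."""
    y = rng.randint(0, 2, size=n)
    x0 = y + 0.8 * rng.randn(n)                           # noisy but stable
    agrees = rng.rand(n) < spurious_corr
    x1 = np.where(agrees, y, 1 - y) + 0.3 * rng.randn(n)  # cleaner, but unstable
    return np.column_stack([x0, x1]), y

X_train, y_train = sample(5000, spurious_corr=0.95)       # cue holds at training time
X_test,  y_test  = sample(5000, spurious_corr=0.95)       # same distribution
X_shift, y_shift = sample(5000, spurious_corr=0.05)       # cue reverses on new data

clf = LogisticRegression().fit(X_train, y_train)
print("in-distribution accuracy:", clf.score(X_test, y_test))
print("shifted-domain accuracy :", clf.score(X_shift, y_shift))
```

The point is not the particular numbers but that nothing in the standard training-and-validation pipeline warns us about the second number.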
That these issues plague machine learning systems is likely uncontroversial among machine learning researchers. However, in comparison to research focused on extending capabilities, very little is being done to address them. Research in this area therefore seems particularly impactful, especially given the desire to deploy machine learning systems in increasingly complex and safety-critical situations.
Extraordinary Engineering
Does AI merit additional safety precautions, beyond those that are considered standard engineering practice in other fields? Here I am focusing only on the long-term impacts of advanced or highly capable AI systems.
My tentative answer is yes; there seem to be a few different ways in which AI could have bad effects, each of which seems individually unlikely but not implausible. Even if each of the risks identified so far is unlikely, (i) the total risk might be large, especially if there are additional unidentified risks, and (ii) the existence of multiple “near-misses” motivates closer investigation, as it may suggest some underlying principle that makes AI risk-laden. In the sequel I will focus on so-called “global catastrophic” risks, meaning risks that could affect a large fraction of the earth’s population in a material way. I have chosen to focus on these risks because I think there is an important difference between an AI system messing up in a way that harms a few people (which would be a legal liability but perhaps should not motivate a major effort in terms of precautions) and an AI system that could cause damage on a global scale. The latter would justify substantial precautions, and I want to make it clear that this is the bar I am setting for myself.
With that in place, below are a few ways in which advanced or highly capable AI could have specific global catastrophic risks.
Cyber-attacks. There are two trends which taken together make the prospect of AI-aided cyber-attacks seem worrisome. The first trend is simply the increasing prevalence of cyber-attacks; even this year we have seen Russia attack Ukraine, North Korea attack Sony, and China attack the U.S. Office of Personnel Management. Secondly, the “Internet of Things” means that an increasing number of physical devices will be connected to the internet. Assuming that software exists to autonomously control them, many internet-enabled devices such as cars could be hacked and then weaponized, leading to a decisive military advantage in a short span of time. Such an attack could be enacted by a small group of humans aided by AI technologies, which would make it hard to detect in advance. Unlike other weaponizable technology such as nuclear fission or synthetic biology, it would be very difficult to control the distribution of AI since it does not rely on any specific raw materials. Finally, note that even a team with relatively small computing resources could potentially “bootstrap” to much more computing power by first creating a botnet with which to do computations; to date, the largest botnet has spanned 30 million computers and several other botnets have exceeded 1 million.
Autonomous weapons. Beyond cyber-attacks, improved autonomous robotics technology combined with ubiquitous access to miniature UAVs (“drones”) could allow both terrorists and governments to wage a particularly pernicious form of remote warfare by creating weapons that are both cheap and hard to detect or defend against (due to their small size and high maneuverability). Beyond direct malicious intent, if autonomous weapons systems or other powerful autonomous systems malfunction then they could cause a large amount of damage.
Mis-optimization. A highly capable AI could acquire a large amount of power but pursue an overly narrow goal, and end up harming humans or human value while optimizing for this goal. This may seem implausible at face value, but as I will argue below, it is easier to improve AI capabilities than to improve AI values, making such a mishap possible in theory.
Unemployment. It is already the case that increased automation is decreasing the number of available jobs, to the extent that some economists and policymakers are discussing what to do if the number of jobs is systematically smaller than the number of people seeking work. If AI systems allow a large number of jobs to be automated over a relatively short time period, then we may not have time to plan or implement policy solutions, and there could then be a large unemployment spike. In addition to the direct effects on the people who are unemployed, such a spike could also have indirect consequences by decreasing social stability on a global scale.
Opaque systems. It is also already the case that increasingly many tasks are being delegated to autonomous systems, from trades in financial markets to aggregation of information feeds. The opacity of these systems has led to issues such as the 2010 Flash Crash and will likely lead to larger issues in the future. In the long term, as AI systems become increasingly complex, humans may lose the ability to meaningfully understand or intervene in such systems, which could lead to a loss of sovereignty if autonomous systems are employed in executive-level functions (e.g. government, economy).
Beyond these specific risks, it seems clear that, eventually, AI will be able to outperform humans in essentially every domain. At that point, it seems doubtful that humanity will continue to have direct causal influence over its future unless specific measures are put in place to ensure this. While I do not think this day will come soon, I think it is worth thinking now about how we might meaningfully control highly capable AI systems, and I also think that many of the risks posed above (as well as others that we haven’t thought of yet) will occur on a somewhat shorter time scale.
Let me end with some specific ways in which control of AI may be particularly difficult compared to other human-engineered systems:
- AI may be “agent-like”, which means that the space of possible behaviors is much larger; our intuitions about how AI will act in pursuit of a given goal may not account for this and so AI behavior could be hard to predict.
- Since an AI would presumably learn from experience, and will likely run at a much faster serial processing speed than humans, its capabilities may change rapidly, ruling out the usual process of trial-and-error.
- AI will act in a much more open-ended domain. In contrast, our existing tools for specifying the necessary properties of a system only work well in narrow domains. For instance, for a bridge, safety relates to the ability to successfully accomplish a small number of tasks (e.g. not falling over). For these, it suffices to consider well-characterized engineering properties such as tensile strength. For AI, the number of tasks we would potentially want it to perform is large, and it is unclear how to obtain a small number of well-characterized properties that would ensure safety.
- Existing machine learning frameworks make it very easy for AI to acquire knowledge, but hard to acquire values. For instance, while an AI’s model of reality is flexibly learned from data, its goal/utility function is hard-coded in almost all situations; an exception is some work on inverse reinforcement learning [5], but this is still a very nascent framework. Importantly, the asymmetry between knowledge (and hence capabilities) and values is fundamental, rather than simply a statement about existing technologies. This is because knowledge is something that is regularly informed by reality, whereas values are only weakly informed by reality: an AI which learns incorrect facts could notice that it makes wrong predictions, but the world might never “tell” an AI that it learned the “wrong values”. At a technical level, while many tasks in machine learning are fully supervised or at least semi-supervised, value acquisition is a weakly supervised task.
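The knowledge/values asymmetry in the last bullet can be seen in a toy simulation (plain Python; the chain world, the `proxy_reward` function, and every constant are invented for illustration): each observation corrects the agent's estimate of the dynamics, but nothing in its experience ever signals that the hard-coded reward omits something the designer cared about.

```python
# Toy contrast between learning knowledge and learning values. The world model
# gets a corrective signal from every transition; the hard-coded reward never
# does, even though it leaves out an intended constraint (avoid HAZARD).
import random
random.seed(0)

GOAL, HAZARD = 4, 2                          # five-cell chain: states 0..4
P_RIGHT = 0.8                                # true prob. that "move right" succeeds

def true_step(s):
    return min(s + 1, GOAL) if random.random() < P_RIGHT else s

def proxy_reward(s):                         # hard-coded value: "reach the goal";
    return 1.0 if s == GOAL else 0.0         # avoiding HAZARD was intended but omitted

successes, attempts = 0, 0                   # knowledge: estimated from experience
hazard_visits, total_return = 0, 0.0

for episode in range(2000):
    s = 0
    while s != GOAL:
        s_next = true_step(s)
        attempts += 1
        successes += (s_next != s)           # the world corrects the dynamics estimate
        hazard_visits += (s_next == HAZARD)  # ...but never flags the missing value
        total_return += proxy_reward(s_next)
        s = s_next

print("learned dynamics estimate:", round(successes / attempts, 3), "(true:", P_RIGHT, ")")
print("proxy return per episode :", total_return / 2000)
print("hazard cells traversed   :", hazard_visits)
```

The dynamics estimate improves with every step because reality pushes back on wrong predictions; the reward never changes, because nothing in the data identifies it as wrong.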
In summary: there are several concrete global catastrophic risks posed by highly capable AI, and there are also several reasons to believe that highly capable AI would be difficult to control. Together, these suggest to me that the control of highly capable AI systems is an important problem posing unique research challenges.
Long-term Goals, Near-term Research
Above I presented an argument for why AI, in the long term, may require substantial precautionary efforts. Beyond this, I also believe that there is important research that can be done right now to reduce long-term AI risks. In this section I will elaborate on some specific research projects, though my list is not meant to be exhaustive.
- Value learning: In general, it seems important in the long term (and also in the short term) to design algorithms for learning values / goal systems / utility functions, rather than requiring them to be hand-coded. One framework for this is inverse reinforcement learning [5], though developing additional frameworks would also be useful. (A toy sketch of this direction appears just after this list.)
- Weakly supervised learning: As argued above, inferring values, in contrast to beliefs, is an at most weakly supervised problem, since humans themselves are often incorrect about what they value and so any attempt to provide fully annotated training data about values would likely contain systematic errors. It may be possible to infer values indirectly through observing human actions; however, since humans often act immorally and human values change over time, current human actions are not consistent with our ideal long-term values, and so learning from actions in a naive way could lead to problems. Therefore, a better fundamental understanding of weakly supervised learning — particularly regarding guaranteed recovery of indirectly observed parameters under well-understood assumptions — seems important.
- Formal specification / verification: One way to make AI safer would be to formally specify desiderata for its behavior, and then prove that these desiderata are met. A major open challenge is to figure out how to meaningfully specify formal properties for an AI system. For instance, even if a speech transcription system did a near-perfect job of transcribing speech, it is unclear what sort of specification language one might use to state this property formally. Beyond this, though there is much existing work in formal verification, it is still extremely challenging to verify large systems.
- Transparency: To the extent that the decision-making process of an AI is transparent, it should be relatively easy to ensure that its impact will be positive. To the extent that the decision-making process is opaque, it should be relatively difficult to do so. Unfortunately, transparency seems difficult to obtain, especially for AIs that reach decisions through complex series of serial computations. Therefore, better techniques for rendering AI reasoning transparent seem important.
- Strategic assessment and planning: Better understanding of the likely impacts of AI will allow a better response. To this end, it seems valuable to map out and study specific concrete risks; for instance, better understanding ways in which machine learning could be used in cyber-attacks, or forecasting the likely effects of technology-driven unemployment, and determining useful policies around these effects. It would also be clearly useful to identify additional plausible risks beyond those of which we are currently aware. Finally, thought experiments surrounding different possible behaviors of advanced AI would help inform intuitions and point to specific technical problems. Some of these tasks are most effectively carried out by AI researchers, while others should be done in collaboration with economists, policy experts, security experts, etc.
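As a concrete, deliberately tiny instance of the value-learning direction, the sketch below is a brute-force stand-in for the spirit of inverse reinforcement learning [5], not the formulation from that paper: given known dynamics and a demonstrated policy, it searches for reward functions under which that policy is optimal. The three-state chain, the candidate reward grid, and the discount factor are illustrative assumptions. Its output also makes the weak-supervision point tangible: many distinct rewards explain the same behavior.

```python
# Illustrative "inverse RL by enumeration": keep every candidate reward vector
# under which the demonstrated policy is greedy-optimal in a tiny chain MDP.
import itertools
import numpy as np

GAMMA, N = 0.9, 3                                   # discount factor, number of states
ACTIONS = (-1, +1)                                  # move left / move right

def step(s, a):                                     # deterministic chain with walls
    return min(max(s + a, 0), N - 1)

def value_iteration(reward, iters=200):
    V = np.zeros(N)
    for _ in range(iters):
        V = np.array([max(reward[step(s, a)] + GAMMA * V[step(s, a)] for a in ACTIONS)
                      for s in range(N)])
    return V

def greedy_policy(reward):
    V = value_iteration(reward)
    return tuple(max(ACTIONS, key=lambda a: reward[step(s, a)] + GAMMA * V[step(s, a)])
                 for s in range(N))

demo = (+1, +1, +1)                                 # demonstrator always moves right

candidates = itertools.product([0.0, 0.5, 1.0], repeat=N)
consistent = [r for r in candidates if greedy_policy(np.array(r)) == demo]
print(len(consistent), "of", 3 ** N, "candidate rewards explain the demonstration")
print("examples:", consistent[:3])
```

Even in this toy setting several distinct reward vectors rationalize the same demonstration, which is one reason value acquisition is treated above as an at most weakly supervised problem.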
The above constitute at least five concrete directions of research on which I think important progress can be made today, which would meaningfully improve the safety of advanced AI systems and which in many cases would likely have ancillary benefits in the short term, as well.
Related Work
At a high level, while I have implicitly provided a program of research above, there are other proposed research programs as well. Perhaps the earliest proposed program is from MIRI [6], which has focused on AI alignment problems that arise even in simplified settings (e.g. with unlimited computing power or easy-to-specify goals) in hopes of later generalizing to more complex settings. The Future of Life Institute (FLI) has also published a research priorities document [7, 8] with a broader focus, including non-technical topics such as regulation of autonomous weapons and economic shifts induced by AI-based technologies. I do not necessarily endorse either document, but think that both represent a big step in the right direction. Ideally, MIRI, FLI, and others will all justify why they think their problems are worth working on and we can let the best arguments and counterarguments rise to the top. This is already happening to some extent [9, 10, 11] but I would like to see more of it, especially from academics with expertise in machine learning and AI [12, 13].
In addition, several specific arguments I have advanced are similar to those already advanced by others. The issue of AI-driven unemployment has been studied by Brynjolfsson and McAfee [14], and is also discussed in the FLI research document. The problem of AI pursuing narrow goals has been elaborated through Bostrom’s “paperclipping argument” [15] as well as the orthogonality thesis [16], which states that beliefs and values are independent of each other. While I disagree with the orthogonality thesis in its strongest form, the arguments presented above for the difficulty of value learning can in many cases reach similar conclusions.
Omohundro [17] has argued that advanced agents would pursue certain instrumentally convergent drives under almost any value system, which is one way in which agent-like systems differ from systems without agency. Good [18] was the first to argue that AI capabilities could improve rapidly. Yudkowsky has argued that it would be easy for an AI to acquire power given few initial resources [19], though his example assumes the creation of advanced biotechnology.
Christiano has argued for the value of transparent AI systems, and proposed the “advisor games” framework as a potential operationalization of transparency [20].
Conclusion
To ensure the safety of AI systems, additional research is needed, both to meet ordinary short-term engineering desiderata as well as to make the additional precautions specific to highly capable AI systems. In both cases, there are clear programs of research that can be undertaken today, which in many cases seem to be under-researched relative to their potential societal value. I therefore think that well-directed research towards improving the safety of AI systems is a worthwhile undertaking, with the additional benefit of motivating interesting new directions of research.
Acknowledgments
Thanks to Paul Christiano, Holden Karnofsky, Percy Liang, Luke Muehlhauser, Nick Beckstead, Nate Soares, and Howie Lempel for providing feedback on a draft of this essay.
References
[1] D. Sculley, et al. Machine Learning: The High-Interest Credit Card of Technical Debt. 2014.
[2] Hal Daumé III and Daniel Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101–126, 2006.
[3] Sinno J. Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.
[4] Dimitris Bertsimas, David B. Brown, and Constantine Caramanis. Theory and applications of robust optimization. SIAM Review, 53(3):464–501, 2011.
[5] Andrew Ng and Stuart Russell. Algorithms for inverse reinforcement learning. In International Conference on Machine Learning, pages 663–670, 2000.
[6] Nate Soares and Benja Fallenstein. Aligning Superintelligence with Human Interests: A Technical Research Agenda. 2014.
[7] Stuart Russell, Daniel Dewey, and Max Tegmark. Research priorities for robust and beneficial artificial intelligence. 2015.
[8] Daniel Dewey, Stuart Russell, and Max Tegmark. A survey of research questions for robust and beneficial AI. 2015.
[9] Paul Christiano. The Steering Problem. 2015.
[10] Paul Christiano. Stable self-improvement as an AI safety problem. 2015.
[11] Luke Muehlhauser. How to study superintelligence strategy. 2014.
[12] Stuart Russell. Of Myths and Moonshine. 2014.
[13] Tom Dietterich and Eric Horvitz. Benefits and Risks of Artificial Intelligence. 2015.
[14] Erik Brynjolfsson and Andrew McAfee. The second machine age: work, progress, and prosperity in a time of brilliant technologies. WW Norton & Company, 2014.
[15] Nick Bostrom (2003). Ethical Issues in Advanced Artificial Intelligence. Cognitive, Emotive and Ethical Aspects of Decision Making in Humans and in Artificial Intelligence.
[16] Nick Bostrom. “The superintelligent will: Motivation and instrumental rationality in advanced artificial agents.” Minds and Machines 22.2 (2012): 71-85.
[17] Stephen M. Omohundro (2008). The Basic AI Drives. Frontiers in Artificial Intelligence and Applications (IOS Press).
[18] Irving J. Good. “Speculations concerning the first ultraintelligent machine.” Advances in computers 6.99 (1965): 31-83.
[19] Eliezer Yudkowsky. “Artificial intelligence as a positive and negative factor in global risk.” Global catastrophic risks 1 (2008): 303.
[20] Paul Christiano. Advisor Games. 2015.