The Interplay Between Artificial Intelligence and Uncertainty
In this first article, we highlight how intelligence and rationality are tightly coupled with the uncertainty present in the world. We also discuss how uncertainty plays a critical role in designing beneficial general-purpose artificial intelligence (AI), as described in the work of Stuart Russell and Peter Norvig on modern AI [1][2].
Human intelligence, both social and individual, has driven the advances achieved by human civilization. Access to even greater intelligence in the form of machine artificial intelligence (AI) could lead to further advances still, helping us solve major problems such as eliminating poverty and disease, settling open scientific and mathematical questions, and offering personalized assistance to billions of people worldwide. This is subject, of course, to the finite land and raw materials available on Earth.
Scientists distinguish between narrow AI, which is designed to perform a single narrow task and may outperform humans at that task, and general-purpose AI, which outperforms humans at nearly every cognitive task. An example of narrow AI is the family of deep learning techniques that, starting in 2011, produced huge advances in speech recognition, visual object recognition, and machine translation; machines now match or exceed human capabilities in these areas. General-purpose AI, by contrast, can be imagined as having access to all the knowledge and skills of the human race, with its embodiments in the real world differing only in physical capabilities depending on the application.
Narrow AI is becoming a pervasive part of our lives, making headlines on a weekly, even daily, basis. It is hard to predict when super-intelligent general-purpose AI will arrive, but we must nevertheless plan for the possibility that machines will far exceed human capacity for decision making in the real world. Predicting the arrival of super-intelligent AI is difficult because, as we know from other scientific fields (nuclear physics, for example), scientific breakthroughs are hard to foresee; perhaps this is why such predictions have so long a history of going wrong. Still, we should not dismiss the possibility of success when our future is at stake. We are working on developing entities far more powerful than humans, so we need to ensure they never have power over us.
Super-human intelligence could be the biggest event in human history, and possibly its last. As an example of how current narrow AI technologies, which are not particularly intelligent, can still affect billions of people around the world, consider the content selection algorithms used on social media platforms, whose objective is to maximize user click-through (which is proportional to monetary revenue) by presenting the user with items to click on. When user preferences are hard to predict, a reinforcement learning algorithm, instead of presenting users with items they like, will try to make their preferences more predictable by changing the preferences themselves. As we know, extreme preferences are easier to predict (think of the extreme left or right in politics), so the algorithm will potentially attempt to "radicalize" the user's mind in order to maximize its click-through reward. Here there is a mismatch between the human's intended objective of increasing revenue and the AI's realized objective of maximizing clicks by biasing people's behavior.
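To make the mechanism concrete, here is a toy simulation, entirely made up for illustration and not any platform's actual algorithm. The recommender never observes the user's true preference, only a noisy estimate; a policy that drags the user toward a predictable extreme can end up collecting more clicks than one that honestly serves its best guess:

```python
# Toy model of the objective mismatch above. All numbers (noise level,
# preference-shift rate, click model) are illustrative assumptions.
import random

random.seed(0)

def simulate(policy, steps=3000):
    pref = 0.5                       # true preference, hidden from the policy
    clicks = 0
    for _ in range(steps):
        # The policy only sees a noisy, clipped estimate of the preference.
        estimate = min(1.0, max(0.0, pref + random.gauss(0, 0.25)))
        item = policy(estimate)
        # Click probability falls off with distance from the true preference.
        p_click = max(0.0, 1.0 - 2.0 * abs(item - pref))
        if random.random() < p_click:
            clicks += 1
            pref += 0.03 * (item - pref)  # assumption: consumed items shift taste
    return clicks, round(pref, 2)

serve_best_guess = lambda est: est             # show what we believe they like
radicalize = lambda est: min(1.0, est + 0.15)  # always nudge toward an extreme

print("best guess :", simulate(serve_best_guess))
print("radicalize :", simulate(radicalize))
# In runs of this toy, the nudging policy tends to collect more total clicks:
# once the preference saturates at the extreme it becomes perfectly
# predictable, and nearly every shown item gets clicked.
```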
Humans are intelligent to the extent that our actions (based on what we perceive) can be expected to achieve our objectives. In the example above, we designed the machine using the same notion: its actions can be expected to achieve its objectives, which are fed into the machine by a human in the form of an optimization problem. Under this definition of intelligence, the problem in the example arose because the purpose put into the machine was not the purpose we really desired. As humans, we are often uncertain about our own objectives and knowledge, so if we put the wrong objective into a machine more intelligent than us, it will achieve that objective, possibly with catastrophic results. Nor can we keep relying on trial and error to iron out major mistakes in an objective function, especially for machines of increasing intelligence and increasingly global impact. It therefore seems necessary to drop the assumption that machines should have a definite objective, as we discuss below.
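One common way to make this definition precise (standard notation, not specific to [1] or [2]): write the designer's intended objective as a utility U, and the machine's behavior as a policy chosen to maximize expected utility. The click-through example is then a case of optimizing the wrong U:

```latex
% The "intelligent agent" reading: behavior is chosen to maximize the
% expected value of the objective the designer wrote down.
\[
  \pi^{*} \;=\; \arg\max_{\pi}\, \mathbb{E}\left[\, U \mid \pi \,\right]
\]
% The failure mode above: the encoded proxy objective \tilde{U} (clicks)
% is optimized perfectly, yet its optimizer differs from that of the
% intended objective U (revenue without manipulating users).
\[
  \arg\max_{\pi}\, \mathbb{E}[\tilde{U} \mid \pi] \;\neq\; \arg\max_{\pi}\, \mathbb{E}[U \mid \pi]
\]
```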
We focus hereafter on understanding intelligence, human intelligence in particular, since explaining how the mind works is a step towards developing beneficial artificial intelligence. The first cornerstone of intelligence is learning, because it lets us adapt to a range of circumstances. There is much we still do not understand about the human brain, but one aspect of learning that we are beginning to understand is the brain's reward system: an internal signaling system, mediated by dopamine, that connects positive and negative stimuli to behavior. Since it is extremely difficult for an organism to decide which actions are most likely, in the long run, to lead to successful propagation of its genes, evolution provided us with breadcrumbs. This is closely related to the method of reinforcement learning developed in AI.
However, it is important to notice that learning and evolution do not necessarily point in the same direction: reward can be obtained by taking drugs or playing video games all day, which reduces the likelihood that one's genes will propagate. Similarly, our understanding of intelligence was first based on the assumption that what a person wants is fixed and known, and that a rational action is one that most easily and surely produces the desired goal. But this left out uncertainty: in the real world, few actions or sequences of actions are truly guaranteed to achieve the intended end. Here probability and gambling play a central role in explaining the trade-off between the certainty of success and the cost of ensuring that degree of certainty.

In the eighteenth century, the Swiss mathematician Daniel Bernoulli explained that bets should be evaluated according to expected utility rather than expected monetary value, so as to reflect what is actually useful or beneficial to a person. Utility is distinct from monetary value and exhibits diminishing returns with respect to money. Moreover, the utility values of bets are not directly observable; they are inferred from the preferences an individual exhibits.

In the middle of the twentieth century, John von Neumann and Oskar Morgenstern published an axiomatic basis for utility theory, which states the following: as long as the preferences exhibited by an individual satisfy certain basic axioms that any rational agent should satisfy, the choices made by that individual can necessarily be described as maximizing the expected value of some utility function. In short, a rational agent acts to maximize expected utility. Note that maximizing expected utility may not require calculating any expectations or any utilities; it is a purely external description of a rational entity. There is much debate about whether human beings are rational. It can be argued that our preferences only seem irrational because we are trying to compensate for the mismatch between our small brains and the complexity of the decision problems we face all the time.
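A quick arithmetic illustration of Bernoulli's point (log utility is one standard diminishing-returns curve, and the specific bet is made up): a gamble can have positive expected monetary value yet negative expected utility, so a rational agent declines it:

```python
# Expected monetary value vs. expected utility for a single bet.
import math

wealth = 1000.0
# The bet: 50% chance wealth grows by 50%, 50% chance it shrinks by 40%.
outcomes = [(0.5, 1.5 * wealth), (0.5, 0.6 * wealth)]

emv = sum(p * w for p, w in outcomes)         # 1050.0 -- money says "take it"
eu_take = sum(p * math.log(w) for p, w in outcomes)
eu_decline = math.log(wealth)

print(f"expected money if taken:   {emv:.2f}")
print(f"expected utility if taken: {eu_take:.4f}")    # ~6.8551
print(f"utility of declining:      {eu_decline:.4f}") # ~6.9078
# Diminishing returns make the possible loss outweigh the equal-probability
# gain, so the log-utility maximizer declines a monetarily favorable bet.
```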
Moreover, in the presence of other humans and machines with objectives different from ours, an agent needs yet another way to make rational decisions. This is where game theory plays a big role, attempting to extend the notion of rationality to situations with multiple agents. Here, just as in gambling, the trick is that each agent chooses not a single action but a randomized strategy: each agent mentally tosses a suitably biased coin (its bias depending on the strategy) just before picking an action, so as not to give away its intentions. By acting unpredictably, even if a competing agent figures out our randomized strategy, there is not much it can do about it without a crystal ball.
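The "biased coin" idea is easiest to see in matching pennies, the textbook two-player example; here is a minimal sketch (standard game, payoffs as usually stated). If we randomize fifty-fifty, the opponent gains nothing in expectation even when our strategy is public:

```python
# Matching pennies: the opponent wins (+1) if both coins show the same face,
# and loses (-1) otherwise. Our strategy is a publicly known biased coin.
our_mix = {"H": 0.5, "T": 0.5}  # a fair coin is the equilibrium strategy here

def opponent_value(opponent_move):
    """Opponent's expected payoff against our randomized strategy."""
    return sum(p * (1 if move == opponent_move else -1)
               for move, p in our_mix.items())

print(opponent_value("H"))  # 0.0
print(opponent_value("T"))  # 0.0 -- knowing our mix gives no exploitable edge
```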
Based on this new notion of intelligence, AI researchers began adopting the tools of probability theory and utility theory, thereby connecting AI to other fields such as statistics, control theory, economics, and operations research. This change marked the beginning of what some observers call modern AI. However, the way we build intelligent agents depends on the nature of the problem we face. The following factors can change the nature of the problem an agent faces (a minimal sketch of these dimensions as a data structure follows the list):
1. the nature of the environment the agent will operate in: whether it is fully observable or only partially observable.
2. whether the environment and actions are discrete or effectively continuous.
3. whether the environment contains other agents or not.
4. whether the outcomes of actions are predictable or unpredictable.
5. whether the rules or “physics” of the environment are known or unknown.
6. whether the environment changes dynamically, so that the time available to make decisions is tightly constrained.
7. the length of the horizon over which decision quality is measured according to the objective; this may be short, of intermediate duration, or very long.
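As promised above, here is a minimal sketch of these seven dimensions as a data structure; the field names and the two sample instances are our own illustrative choices:

```python
# The seven problem dimensions from the list above, as a simple record type.
from dataclasses import dataclass

@dataclass
class TaskProperties:
    fully_observable: bool      # 1. full vs. partial observability
    discrete: bool              # 2. discrete vs. effectively continuous
    multi_agent: bool           # 3. other agents present?
    predictable_outcomes: bool  # 4. deterministic vs. stochastic actions
    known_rules: bool           # 5. "physics" of the environment known?
    time_pressured: bool        # 6. dynamic environment, tight decision time
    horizon: str                # 7. "short", "intermediate", or "very long"

# Two illustrative instances: a board game vs. driving in traffic.
chess = TaskProperties(True, True, True, True, True, False, "intermediate")
driving = TaskProperties(False, False, True, False, False, True, "short")
```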
Building an AI system for any of these problems requires a great deal of problem-specific engineering. The goal of general-purpose AI, on the other hand, would be a method applicable across all problem types. Such an agent would learn whatever it needs to learn from all the available resources, ask questions when necessary, and begin formulating and executing plans that work. Again, just because such a general-purpose method does not yet exist does not mean we are not moving closer; much of the progress towards general AI results from research on narrow AI. Currently, instead of building one agent with general-purpose AI, we build a group of agents, each addressing a different type of problem.
For each of these agents to deal with uncertainty, modern AI uses a utility function, rather than a goal, to describe the desirability of different outcomes or sequences of states. The utility is expressed as a sum of rewards, one for each state in the sequence. The machine therefore aims to produce behavior that maximizes its expected sum of rewards, averaged over the possible outcomes weighted by their probabilities. For this purpose, researchers have developed a variety of algorithms for decision making under uncertainty. One example is the family of "dynamic programming" algorithms, the probabilistic cousins of lookahead search and planning. For the case where the number of states is enormous and the reward comes only at the end of the game, AI researchers developed a method called reinforcement learning, which learns from direct experience of reward signals in the environment.
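To ground the "dynamic programming" remark, here is a minimal value-iteration sketch on a made-up three-state problem (the states, probabilities, and rewards are all assumptions for illustration). It computes the expected-sum-of-rewards value of each state and the behavior that maximizes it:

```python
# Value iteration: repeatedly back up expected rewards through the
# transition model until state values stabilize.

# P[s][a] = list of (probability, next_state, reward) triples -- a tiny MDP.
P = {
    "start": {"safe":  [(1.0, "start", 1.0)],
              "risky": [(0.5, "good", 10.0), (0.5, "bad", -5.0)]},
    "good":  {"safe":  [(1.0, "good", 2.0)]},
    "bad":   {"safe":  [(1.0, "bad", 0.0)]},
}
gamma = 0.9  # discount factor weighting future rewards

def q(s, a, V):
    """Expected reward of action a in state s, plus discounted future value."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

V = {s: 0.0 for s in P}
for _ in range(200):  # Bellman backups to (numerical) convergence
    V = {s: max(q(s, a, V) for a in P[s]) for s in P}

policy = {s: max(P[s], key=lambda a: q(s, a, V)) for s in P}
print(V)       # e.g. V["good"] -> 2 / (1 - 0.9) = 20.0
print(policy)  # "risky" beats "safe" at the start state in expectation
```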
In this article, we summarized how uncertainty shapes intelligence and the design of AI agents. In the next article, we discuss the potential dangers and misuse of super-intelligent AI.
[1] Russell, S., Norvig, P. Artificial Intelligence: A Modern Approach. Pearson, 2020.
[2] Russell, S. Human Compatible: Artificial Intelligence and the Problem of Control. Penguin Random House, 2019.
Translated from: https://medium.com/swlh/the-interplay-between-artificial-intelligence-and-uncertainty-a19e01197230