Multi-Agent Reinforcement Learning and the Future of AI

The future of AI is reinforcement learning

Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris. Jeremie helps run a data science mentorship startup called SharpestMinds. You can listen to the podcast below:

Reinforcement learning has gotten a lot of attention recently, thanks in large part to systems like AlphaGo and AlphaZero, which have highlighted its immense potential in dramatic ways. And while the RL systems we’ve developed have accomplished some impressive feats, they’ve done so in a fairly naive way. Specifically, they haven’t tended to confront multi-agent problems, which require collaboration and competition. But even when multi-agent problems have been tackled, they’ve been addressed using agents that just assume other agents are an uncontrollable part of the environment, rather than entities with rich internal structures that can be reasoned and communicated with.

That’s all finally changing, with new research into the field of multi-agent RL led in part by OpenAI, Oxford, and Google alum and current FAIR research scientist Jakob Foerster. Jakob’s research is aimed specifically at understanding how reinforcement learning agents can learn to collaborate better and navigate complex environments that include other agents, whose behavior they try to model. In essence, Jakob is working on giving RL agents a theory of mind.

Our conversation spanned a mix of fundamental and philosophical topics, but here were some of my favourite take-homes:

  • When I asked Jakob what his fundamental definition of “learning” was, he answered in terms of sample complexity — the number of samples needed in order to train a machine learning model. The true goal of learning, he argues, is to “learn how to learn” — to find the algorithms and strategies that reduce sample complexity fastest. It’s in that sense that the evolutionary process that gave rise to human beings was a worse “learner” than the cognitive process that human beings use to understand the world and make predictions: whereas it takes millions of individuals’ deaths to allow a species’ genome to “learn” something, a human brain can do so with a minuscule number of data points (and sometimes, with none at all).
  • Jakob argues that RL agents can benefit from explicitly recognizing other agents not as parts of the environment over which they can’t have any control, but as, well, agents — complete with a thought process of their own. In a circumstance where all agents are identical, any given agent can construct a fairly accurate model of its fellow agents, and that model can serve as the basis for effective collaboration and coordination among groups of agents. (A minimal sketch of this idea appears just after this list.)
  • One of the challenges that arises when modeling agents in this way is communication. In order for agents to communicate, they have to develop a common language, but that’s not as easy as it may seem: one agent may develop a way of expressing itself, but if the other doesn’t happen to have developed the same exact method, communication will be fruitless. So one important constraint, Jakob suggests, is that agents need to learn language together — so that, if they decide to try to improve their communication method, they do so together to maintain their ability to understand one another. (See the second sketch after this list.)
  • When I asked Jakob whether he believes that the conceptual tools that we already have — like deep learning and reinforcement learning — are going to be sufficient to build a fully general artificial intelligence, he said no. In fact, he’s among the more pessimistic guests I’ve had when it comes to this question: in his view, that kind of development could be a century away.

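The point about learning language together can be illustrated the same way. The following is a hedged sketch of a Lewis-style signaling game (again my own construction; the speaker/listener tables and the simple win-reinforcement update are assumptions, not the method discussed in the episode): both agents are rewarded from the same episodes, so their conventions co-evolve.

```python
# Hedged sketch (my own toy, not the episode's method): a Lewis-style
# signaling game where a speaker and a listener learn a protocol jointly,
# reinforcing both tables from the same shared success signal.
import numpy as np

rng = np.random.default_rng(1)
N = 4                        # concepts, messages, and guesses all range 0..3
speaker = np.ones((N, N))    # speaker[concept, message] preference weights
listener = np.ones((N, N))   # listener[message, guess] preference weights

def sample(weights) -> int:
    p = weights / weights.sum()
    return int(rng.choice(len(p), p=p))

for _ in range(20000):
    concept = int(rng.integers(N))
    msg = sample(speaker[concept])
    guess = sample(listener[msg])
    if guess == concept:
        # Shared reward: both sides reinforce the *same* episode, so the
        # emerging word meanings stay mutually intelligible.
        speaker[concept, msg] += 1.0
        listener[msg, guess] += 1.0

# After joint training the protocol is usually close to a bijection:
trials = [int(rng.integers(N)) for _ in range(1000)]
hits = sum(sample(listener[sample(speaker[c])]) == c for c in trials)
print(f"coordination accuracy: {hits / 1000:.2f}")
```

If one side were frozen while the other kept revising its half of the code, the mapping would drift out of alignment; updating both from the same episodes is what keeps the protocol mutually understood.
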
You can also follow Jakob on Twitter here, and me here.

Translated from: https://towardsdatascience.com/multi-agent-reinforcement-learning-and-the-future-of-ai-524fc1b5e25
