WordCraft —

WordCraft is a framework for testing RL agents' commonsense knowledge of the world. The environment is based on the video game Little Alchemy 2. Its main strengths are that it is lightweight and grounded in real-world semantics. The researchers evaluated representation learning methods in the environment and also proposed a method for integrating knowledge graphs with an RL agent.

Description of the problem

Quickly solving a wide range of real-world problems requires a sound understanding of the world. However, integrating knowledge from natural language corpora into RL agents remains an open problem. This is partly due to the lack of lightweight simulation environments that adequately reflect real-world semantics. The researchers created WordCraft to make it easier to train agents that learn and use general knowledge about the world.


How the environment works

The WordCraft RL environment is based on the video game Little Alchemy 2, a simple association game. The player starts with a set of four items and must create as many new items as possible from this starter kit. Each new item is created by combining two existing items. For example, combining “moon” and “butterfly” gives “moth”. There are 700 items in total and 3,417 possible item combinations.
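The combination rule described above can be sketched as a lookup table keyed by unordered pairs of items. This is only a minimal illustration: the `RECIPES` table, its three entries, and the `combine` helper are hypothetical and are not taken from the game data or from the WordCraft code.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy recipe table (illustrative entries only).
# Keys are alphabetically sorted pairs so that item order does not matter.
RECIPES: Dict[Tuple[str, str], str] = {
    ("earth", "water"): "mud",
    ("air", "fire"): "smoke",
    ("fire", "mud"): "brick",
}


def combine(a: str, b: str, recipes: Dict[Tuple[str, str], str] = RECIPES) -> Optional[str]:
    """Return the item produced by combining a and b, or None if no recipe exists."""
    key = tuple(sorted((a, b)))
    return recipes.get(key)


if __name__ == "__main__":
    print(combine("water", "earth"))  # same result as combine("earth", "water")
    print(combine("earth", "air"))    # no such recipe in the toy table
```

Sorting the pair before the lookup keeps the table half the size it would otherwise need and makes `combine` symmetric in its arguments.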

Solving the game without trying every possible combination requires knowing how everyday concepts relate to one another. WordCraft is a simplified version of Little Alchemy 2:

  1. The interface is text-based rather than graphical, unlike Little Alchemy 2.

  2. Instead of one open-ended task, WordCraft has many simple tasks. Each task is created by randomly sampling a target item, the items that combine to form it, and distractor items.

The agent’s job is to choose the items that combine to form the target item. The difficulty of a task is controlled by two parameters:

  • The number of distractors

  • The number of intermediate items that must be crafted before the target item can be crafted
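The task construction described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the WordCraft implementation: the `RECIPES` table, the `Task` tuple, and `sample_task` are hypothetical names, and only the first difficulty parameter (the number of distractors) is modelled. Every target here is crafted in a single step, so there are no intermediate items.

```python
import random
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical toy recipe table (illustrative entries only).
RECIPES: Dict[Tuple[str, str], str] = {
    ("earth", "water"): "mud",
    ("air", "fire"): "smoke",
    ("fire", "mud"): "brick",
}


class Task(NamedTuple):
    goal: str                     # item the agent must craft
    choices: List[str]            # ingredients mixed with distractors
    ingredients: Tuple[str, str]  # the correct pair (hidden from the agent)


def sample_task(recipes: Dict[Tuple[str, str], str],
                num_distractors: int,
                rng: random.Random) -> Task:
    """Sample a goal, its two ingredients, and some distractor items."""
    ingredients, goal = rng.choice(sorted(recipes.items()))
    all_items = {item for pair in recipes for item in pair} | set(recipes.values())
    # Distractors are any items that are neither an ingredient nor the goal.
    pool = sorted(all_items - set(ingredients) - {goal})
    distractors = rng.sample(pool, num_distractors)
    choices = list(ingredients) + distractors
    rng.shuffle(choices)
    return Task(goal=goal, choices=choices, ingredients=ingredients)


if __name__ == "__main__":
    task = sample_task(RECIPES, num_distractors=3, rng=random.Random(0))
    print(task.goal, task.choices)
```

Raising `num_distractors` enlarges the set the agent has to choose from, which is one way the text's first difficulty parameter could be realised. With this toy table at most four distractors are available.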

Testing algorithms in the environment

The researchers tested representation learning algorithms in the environment on a zero-shot generalization problem. The set of recipes was divided into a training set (80%) and a test set (20%). Agents were trained using the TorchBeast implementation of IMPALA.
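An 80/20 split of the recipe set, as described above, might look like the sketch below. The text does not say how the recipes were assigned to each set, so the uniform random shuffle, the fixed seed, and the `split_recipes` helper are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_recipes(recipes: Sequence[T],
                  train_frac: float = 0.8,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the recipes and split them into disjoint train and test lists."""
    shuffled = list(recipes)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Placeholder recipes: (ingredient, ingredient, result) triples.
    recipes = [(f"a{i}", f"b{i}", f"c{i}") for i in range(10)]
    train, test = split_recipes(recipes)
    print(len(train), len(test))
```

Because the test recipes never appear during training, an agent can only solve them by generalizing from what it has learned about related items, which is the zero-shot setting the text refers to.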


Translated from: https://medium.com/deep-learning-digest/wordcraft-reinforcement-learning-environment-for-common-sense-testing-7066242bb1ae
