Deep Surrogate Assisted Generation of Environments (NeurIPS 2022)

  • Deep surrogate assisted generation of environments
  • Evolutionary algorithms

What problem does it address?

Quality and diversity: QD optimization aims to optimize for both simultaneously. This means not only searching for high-quality solutions, but also ensuring that those solutions are diverse across the problem space, covering different types of solutions. [Informally: the goal is to generate diverse levels; "quality" can, for now, be understood as a generated level satisfying its constraints and being completable.]

Abstract

Recent progress in reinforcement learning (RL) has started producing generally capable agents that can solve a distribution of complex environments. These agents are typically tested on fixed, human-authored environments. On the other hand, quality diversity (QD) optimization has been proven to be an effective component of environment generation algorithms, which can generate collections of high-quality environments that are diverse in the resulting agent behaviors. However, these algorithms require potentially expensive simulations of agents on newly generated environments. We propose Deep Surrogate Assisted Generation of Environments (DSAGE), a sample-efficient QD environment generation algorithm that maintains a deep surrogate model for predicting agent behaviors in new environments. Results in two benchmark domains show that DSAGE significantly outperforms existing QD environment generation algorithms in discovering collections of environments that elicit diverse behaviors of a state-of-the-art RL agent and a planning agent. Our source code and videos are available at https://dsagepaper.github.io/
The main problem addressed is that existing QD environment generation algorithms ultimately require an agent to simulate every generated environment (i.e., the agent must actually play each level to verify that it is completable, and its trajectory is used to judge the diversity of the generated levels), which is expensive. The authors instead rely on a predictive model that estimates the agent's trajectory, so behavior trajectories are obtained directly from predictions rather than from real agent runs. Diversity is then measured by the diversity of the agent's trajectories across the generated levels.

Introduction

We make the following contributions: (1) We propose the use of deep surrogate models to predict
agent performance in new environments. Our algorithm, Deep Surrogate Assisted Generation of
Environments (DSAGE) (Fig. 1), integrates deep surrogate models into quality diversity optimization
to efficiently generate diverse environments. (2) We show in two benchmark domains from previous
work, a Maze domain [3, 4] with a trained ACCEL agent [4] and a Mario domain [21, 16] with an
A* agent [22], that DSAGE outperforms state-of-the-art QD algorithms in discovering diverse agent
behaviors. (3) We show with ablation studies that training the surrogate model with ancillary agent
behavior data and downsampling a subset of solutions from the surrogate archive results in substantial
improvements in performance, compared to the surrogate models of previous work [19].

Problem Definition

QD for environment generation. We assume a single agent acting in an environment parameterized
by θ ∈ Rn. The environment parameters can be locations of different objects or latent variables that
are passed as inputs to a generative model [27]. A QD algorithm generates new solutions θ and
evaluates them by simulating the agent on the environment parameterized by θ. The evaluation returns
an objective value f and measure values m. The QD algorithm attempts to generate environments
that maximize f but are diverse with respect to the measures m.
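In code, the interface a QD algorithm assumes reduces to a single evaluation call that maps environment parameters θ to an objective f and measures m. The sketch below is a toy stand-in: the function name, objective, and measures are illustrative assumptions, not the paper's actual evaluation.

```python
import numpy as np

def evaluate(theta: np.ndarray) -> tuple[float, np.ndarray]:
    """Stand-in for simulating the agent on the environment theta.

    Returns the objective value f and the measure values m.  In the
    paper both come from an actual agent simulation; here they are
    toy computations for illustration only.
    """
    f = float(-np.sum(theta ** 2))              # toy objective value
    m = np.array([theta.mean(), theta.std()])   # toy 2-D behavior measures
    return f, m

f, m = evaluate(np.array([0.5, -0.5, 1.0]))
print(f, m.shape)
```

The QD algorithm only ever sees `(f, m)` pairs, which is what makes it possible to later swap the expensive simulation for a learned predictor.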

This passage describes how a Quality Diversity (QD) algorithm generates environments. A single agent acts in an environment parameterized by θ. The parameters θ can represent the positions of different objects or latent variables passed to a generative model, and thereby control features of the environment such as a game's level layout or object placement.
The QD algorithm proceeds as follows:

  1. Generate new environment parameters θ: the algorithm proposes new parameters θ describing different kinds of environments, e.g. new level layouts or adjusted object positions.
  2. Evaluate the generated environments: the agent is simulated in the environment parameterized by θ, returning an objective value f plus additional measure values m that describe how the agent performs in that environment.
  3. Assess quality and diversity: the goal is a collection of environments with distinct characteristics that are nevertheless all high quality. Quality is scored by the objective value f; diversity is scored by the measure values m, which can encode different features of the environments so that the collection is diverse as well as good.
  4. Iterate: the algorithm repeatedly generates new parameters θ, simulates the agent, and evaluates quality and diversity, searching for a set of environments that excels at both.
    Here f represents quality; per the note earlier, it can be read as whether the generated level satisfies its constraints and is completable by the agent.
    m represents diversity: it compares the trained agent's behavior on a level with its behavior on previously generated levels.
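The four steps above can be sketched as a minimal MAP-Elites-style loop over a discretized measure space. Everything here (the toy evaluation, cell count, mutation scale, iteration budget) is an illustrative assumption, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate(theta):
    # Toy evaluation standing in for an agent simulation:
    # objective f, plus a 1-D behavior measure m in [0, 1].
    f = float(-np.sum(theta ** 2))
    m = float(np.abs(np.sin(theta.sum())))
    return f, m

NUM_CELLS = 10
archive = {}  # measure cell index -> (theta, f)

for it in range(500):
    if archive and it >= 50:
        # Mutate an elite chosen uniformly from the archive.
        parent, _ = archive[rng.choice(list(archive))]
        theta = parent + 0.1 * rng.standard_normal(3)
    else:
        theta = rng.standard_normal(3)  # random bootstrap solutions
    f, m = evaluate(theta)
    cell = min(int(m * NUM_CELLS), NUM_CELLS - 1)  # discretize the measure
    # Keep the solution only if its cell is empty or it improves f there.
    if cell not in archive or f > archive[cell][1]:
        archive[cell] = (theta, f)

print(len(archive))  # number of distinct behaviors discovered
```

In the paper, `evaluate` would be an expensive agent simulation; DSAGE's contribution is replacing most of these calls with predictions from a deep surrogate model.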

Background

In the procedural content generation (PCG) field [46], an environment generator produces video game levels that result in player enjoyment. Since diversity of player experience and game mechanics is valued in games, many level generation systems incorporate QD optimization [47, 16, 48–52].

Environment generation methods have also been proposed by the scenario generation community in
robotics. Early work explored automated methods for generating road layouts, vehicle arrangements,
and vehicle behaviors for testing autonomous vehicles [66–72]. Outside of autonomous vehicles,
prior work [73] evaluates robot motion planning algorithms by generating environments that target
specific motion planning behaviors. In human-robot interaction, QD algorithms have been applied as
environment generators to find failures in shared autonomy systems [17] and human-aware planners
tested in the collaborative Overcooked domain [15].

Model


Algorithm

  1. Train a deep surrogate model to predict the performance of a fixed agent (e.g., the A* agent in the Mario domain) on newly generated environments;
  2. Run QD optimization against the surrogate model's predictions to build a surrogate archive of candidate environments;
  3. Downsample a subset of solutions from the surrogate archive and evaluate them with real agent simulations;
  4. Add the ground-truth results, together with ancillary agent behavior data, to the dataset and retrain the surrogate model.
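As a minimal illustration of the surrogate idea in step 1, the sketch below fits a model that predicts an agent-derived outcome directly from the environment parameters, without running the agent again. DSAGE trains a deep network; the linear least-squares fit and the toy `simulate` function here are stand-in assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta):
    # Ground-truth agent simulation stand-in (expensive in reality).
    return float(theta @ np.array([1.0, -2.0, 0.5]))

# Collect a small dataset of (environment parameters, agent outcome).
X = rng.standard_normal((200, 3))
y = np.array([simulate(x) for x in X])

# Linear surrogate fit by least squares; DSAGE uses a deep network,
# but the role is identical: predict outcomes without agent runs.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

theta_new = np.array([0.2, 0.1, -0.3])
pred = theta_new @ w          # cheap surrogate prediction
true = simulate(theta_new)    # expensive ground truth
print(abs(pred - true))
```

Because the toy simulation is exactly linear and noiseless, the surrogate recovers it perfectly; a real agent's behavior is far messier, which is why the paper periodically re-evaluates a downsampled subset with real simulations.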

References

"Illuminating Mario scenes in the latent space of a generative adversarial network" — poses the QD environment generation problem.
A. Gaier, A. Asteroth, and J.-B. Mouret, "Data-efficient design exploration through surrogate-assisted illumination," Evolutionary Computation, 2018 — improves evaluation efficiency with surrogate models.

Further reading

Quality-diversity problems

Quality-Diversity

Evolutionary algorithms
