Literature Notes - Learning to Fly—a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control

This post is a set of notes I took while reading the paper; it is only a rough translation and reorganization, shared for personal study and reference.

If the original authors find anything inappropriate here, please contact me and I will address it promptly.

I am not the author of the paper; the citation is given below, and the original paper is far more valuable.

J. Panerati, H. Zheng, S. Zhou, J. Xu, A. Prorok, and A. P. Schoellig, "Learning to Fly—a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control," in Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 7512-7519, doi: 10.1109/IROS51168.2021.9635857.

Abstract

Robot simulation plays a crucial role in academic research, education, and safety-critical applications.

Reinforcement learning environments—simple simulations that expose a problem through a reward function—are equally essential for learning algorithms.

However, full-scale simulators typically lack portability and parallelizability; conversely, many reinforcement learning environments trade realism for high sample throughput, making them little more than toy problems.

While public datasets have greatly benefited deep learning and computer vision, we still lack software tools that allow control algorithms and reinforcement learning methods to be developed side by side and compared fairly.

This paper proposes an open-source, OpenAI Gym-like reinforcement learning environment for simulating multiple quadcopters, built on the Bullet physics engine.

Its multi-agent and vision-based reinforcement learning interfaces, together with support for realistic collisions and aerodynamic effects, make it, to the best of our knowledge, the first environment of its kind.

We demonstrate its use through several examples, either for control (PID trajectory tracking, multi-robot flight under downwash) or for reinforcement learning (single- and multi-agent stabilization tasks), hoping to inspire future research that combines control theory and machine learning.

I. INTRODUCTION

Machine learning, and deep learning in particular, has made remarkable progress.

Both learning and robot development rely on simulation software, much of it built around ROS, such as Gazebo and Webots. However, these tools lack portability and do not fit machine learning workflows, which often need to run on remote, highly parallel compute clusters.

To support RL research, RL environments have developed rapidly in recent years, with OpenAI's Gym as a prominent example. Yet some of them trade realism for high data throughput, and simulation environments that sacrifice realism may ultimately hinder progress in robot reinforcement learning.

This paper aims to provide an open-source, Gym-like simulation environment that supports multiple learning tasks (multi-agent RL, vision-based RL, etc.) for a practical robotic application: the control of one or more nano quadcopters.

The software presented here, gym-pybullet-drones, is intended to help roboticists and ML engineers develop model-free or model-based, end-to-end RL control of quadcopters. Its main features are:

1) Realism: support for realistic collisions, aerodynamic effects, and extensible dynamics via Bullet Physics [10].

2) RL Flexibility: availability of Gym-style environments for both vision-based RL and multi-agent RL—simultaneously, if desired.

3) Parallelizability: multiple environments can be easily executed, with a GUI or headless, with or without a GPU, with minimal installation requirements.

4) Ease-of-use: pre-implemented PID control, as well as Stable Baselines3 [11] and RLlib workflows [12].

II. RELATED WORK

A. Reinforcement Learning Environments

Notable examples include the OpenAI Gym toolkit [5], the MuJoCo physics engine [13], and DeepMind's dm_control [14].

However, because of contact smoothing and other simplifications, even locomotion policies trained successfully in these environments do not necessarily produce gaits that transfer easily to physical robots [3].

Open-source alternatives include Georgia Tech/CMU's DART and Google's Bullet Physics [10] (with its Python binding, PyBullet).

One of the most popular Gym environments for quadcopters is gymfc [18].

To the best of our knowledge, gym-pybullet-drones is the first general purpose multi-agent Gym environment for quadcopters.

B. Quadcopter Simulators

RotorS [20] is a popular quadcopter simulator based on ROS and Gazebo.

It includes models of several AscTec multirotors and simulated sensors, but it cannot be used directly for RL, and its dependence on Gazebo makes it hard to parallelize for learning applications.

CrazyS [9] is an extension of RotorS that is specifically targeted to the Bitcraze Crazyflie 2.x nanoquadcopter.

It shares the same limitations as RotorS.

Microsoft's AirSim [8] is one of the best known simulators supporting multiple vehicles—cars and quadcopters—and photorealistic rendering through Unreal Engine 4.

It is well suited to developing autonomous vehicles, but its high computational demands and simplified collisions (the quadcopter model uses FastPhysicsEngine) make it less suitable for learning control. It also lacks a Gym interface.

The most recent and closely related work to ours is ETH’s Unity-based Flightmare [7].

It can simultaneously provide photorealistic rendering and fast, highly parallel dynamics, and it offers a Gym API together with RL workflows.

Unlike our gym-pybullet-drones, however, it does not provide vision observations through the Gym interface and cannot be used for multi-agent learning.

III. METHODS

A. Gym Environment Classes

The environment classes follow the standard Gym interface: reset() returns an initial observation, and each step() call takes an action and returns an observation, a reward, a done flag, and an info dictionary.
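As a concrete illustration, a minimal interaction loop might look like the sketch below; the HoverAviary class name, its import path, and the gui/SIM_FREQ names are assumptions for illustration and may differ across versions of the library.

```python
# Minimal sketch of the standard Gym interaction loop with a gym-pybullet-drones
# environment. Import path, class name, and attribute names are assumptions;
# check the repository for the exact API of the version you use.
from gym_pybullet_drones.envs.single_agent_rl.HoverAviary import HoverAviary  # assumed path

env = HoverAviary(gui=False)                      # headless simulation (assumed keyword)
obs = env.reset()
for _ in range(10 * env.SIM_FREQ):                # ~10 simulated seconds (assumed attribute)
    action = env.action_space.sample()            # random action, for illustration only
    obs, reward, done, info = env.step(action)    # classic Gym 4-tuple
    if done:
        obs = env.reset()
env.close()
```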

B. Bullet Physics

The work in this paper is based on the open-source Bullet Physics engine [10].

C. Quadcopter Dynamics

We use PyBullet to model the forces and torques acting on each quadcopter in our Gym and leverage the physics engine to compute and update the kinematics of all vehicles.

1) Quadcopter models:

The default quadcopter model in gym-pybullet-drones is the Bitcraze Crazyflie 2.x.

2) PyBullet-based Physics Update:

The forces F_i applied to each of the 4 motors and the torque T induced around the drone's z-axis are proportional to the squared motor speeds P_i (in RPM). These are linearly related to the input PWMs, and we assume they can be controlled near-instantaneously [25]:
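The notes do not reproduce the equation itself; a hedged reconstruction of the standard motor model referred to here (k_F and k_T are the thrust and torque coefficients; the sign pattern of the yaw torque depends on the motors' spin directions):

```latex
F_i = k_F \, P_i^2, \qquad
T   = k_T \left( -P_1^2 + P_2^2 - P_3^2 + P_4^2 \right)
```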

a) Explicit Python Dynamics Update:

An explicit dynamics update, not based on Bullet, is also implemented in Python; it is mainly useful for debugging.

3) Aerodynamic Effects:

Equation (1) does not account for ground effect or downwash; these are modeled separately below, and PyBullet allows the models to be applied together.

a) Drag:

Our model is based on [23], which shows that drag is proportional to the quadcopter's velocity ẋ, the rotors' angular velocities, and a coefficient matrix k_D identified experimentally in [23]:
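The equation is not reproduced in the notes; a hedged reconstruction consistent with the description above (rotor speeds P_i converted from RPM to rad/s):

```latex
F_{drag} = - k_D \left( \sum_{i=1}^{4} \frac{2\pi P_i}{60} \right) \dot{x}
```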

b) Ground Effect:

The increase in thrust experienced near the ground, caused by the interaction between the rotor airflow and the ground surface, is called the ground effect.

Based on [26] and real-world experiments with Crazyflie hardware (see Figure 4), we model contributions G_i for each motor that are proportional to the propeller radius r_P, speeds P_i, altitudes h_i, and a constant k_G:
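A hedged reconstruction of this per-motor contribution; the exact grouping of constants (e.g., whether the thrust coefficient k_F of Eq. (1) appears explicitly) is an assumption:

```latex
G_i = k_G \, k_F \, P_i^2 \left( \frac{r_P}{4\, h_i} \right)^2
```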

c) Downwash:

When two quadcopters cross paths at different altitudes, the downwash effect causes a reduction in the lift of the bottom one.

For simplicity, we model it as a single contribution applied to the center of mass of the quadcopter, whose magnitude W depends on the distances in x, y, and z between the two vehicles (δx, δy, δz) and on constants k_D1, k_D2, k_D3 that we identified experimentally:
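A hedged reconstruction of the downwash magnitude W, with the horizontal separation written as δ_xy = sqrt(δx² + δy²):

```latex
W = k_{D_1} \left( \frac{r_P}{4\,\delta_z} \right)^{2}
    \exp\!\left( -\frac{1}{2} \left( \frac{\delta_{xy}}{k_{D_2}\,\delta_z + k_{D_3}} \right)^{2} \right)
```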

D. Observation Spaces

The code uses a dictionary that holds the information of each drone.

In our code base, we provide several implementations.

Yet, they all include the following kinematic information: a dictionary whose keys are the drone indices n ∈ [0..N] and whose values contain positions and the rest of each drone's kinematic state (orientation, linear and angular velocities).
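For illustration, the observation dictionary might be structured as sketched below; the key names ("state", "neighbors", "rgb") and the exact state layout are assumptions based on the description above, not verbatim library output.

```python
# Sketch of the per-drone observation dictionary, filled with placeholder arrays.
import numpy as np

pos, quat, rpy = np.zeros(3), np.array([0., 0., 0., 1.]), np.zeros(3)
vel, ang_vel, last_action = np.zeros(3), np.zeros(3), np.zeros(4)

obs = {
    0: {
        "state": np.hstack([pos, quat, rpy, vel, ang_vel, last_action]),  # kinematic state (assumed layout)
        "neighbors": np.zeros(2),                      # optional row of the adjacency matrix (next subsection)
        "rgb": np.zeros((48, 64, 4), dtype=np.uint8),  # optional camera frame for vision-based RL
    },
    # ... one such entry per drone index n in [0..N]
}
```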

1) Adjacency Matrix of Multi-Robot Systems:

An adjacency matrix of neighbouring drones is also provided.
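A minimal sketch of how such an adjacency matrix can be built from drone positions and a neighbourhood radius; the radius-based rule is an assumption for illustration, not necessarily the library's exact criterion.

```python
import numpy as np

positions = np.array([[0.0, 0.0, 1.0],     # one row per drone
                      [0.5, 0.0, 1.0],
                      [3.0, 0.0, 1.0]])
radius = 1.0                               # neighbourhood radius (assumed)

# Pairwise distances, then 1 if two distinct drones are within the radius.
dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
adjacency = ((dists < radius) & (dists > 0)).astype(int)
print(adjacency)  # drones 0 and 1 are neighbours; drone 2 is isolated
```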

2) Vision and Rendering:

The on-board camera output of each drone can also be retrieved as part of the observation.

E. Action Spaces

Multiple implementations are provided: drones can be commanded through motor speeds (RPMs) or through desired body velocities.

1) Propellers’ RPMs:

The default action space of gym-pybullet-drones is a dictionary whose keys are the drone indices n ∈ [0..N] and the values contain the corresponding 4 motor speeds, in RPMs, for each drone:
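A sketch of this action format; the RPM value is illustrative only (roughly a hover speed for a Crazyflie-sized vehicle), not a constant taken from the library.

```python
import numpy as np

N = 2  # number of drones
action = {n: np.array([14000.0, 14000.0, 14000.0, 14000.0]) for n in range(N)}  # 4 RPMs per drone
```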

2) Desired Velocity Input:

Alternatively, drones can be controlled through a dictionary of desired velocity vectors, in the following format:
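A sketch of the desired-velocity format; whether the per-drone vector is purely a 3D velocity or carries an extra speed-scaling entry depends on the library version, so treat the layout as an assumption.

```python
import numpy as np

action = {0: np.array([0.0, 0.0, 0.5]),    # climb at 0.5 m/s (illustrative)
          1: np.array([0.2, 0.0, 0.0])}    # move along x at 0.2 m/s (illustrative)
```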

3) Other Control Modes:

The inputs of class DynAviary are the desired thrust and torques—from which it derives feasible RPMs using non-negative least squares.
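A hedged sketch of the thrust/torque-to-RPM mapping via non-negative least squares; the X-configuration mixer matrix and the Crazyflie-like constants below are textbook assumptions for illustration, not the library's actual values.

```python
import numpy as np
from scipy.optimize import nnls

k_F, k_T, L = 3.16e-10, 7.94e-12, 0.0397     # thrust/torque coefficients and arm length (illustrative)
a = k_F * L / np.sqrt(2)                     # moment arm term for an X configuration (assumed)

# Rows: total thrust, roll torque, pitch torque, yaw torque; columns: motors 1..4.
A = np.array([[ k_F,  k_F,  k_F,  k_F],
              [   a,    a,   -a,   -a],
              [  -a,    a,    a,   -a],
              [-k_T,  k_T, -k_T,  k_T]])

desired = np.array([0.30, 0.0, 0.0, 0.0])    # desired [thrust (N), tau_x, tau_y, tau_z]
sq_rpm, _ = nnls(A, desired)                 # non-negative squared rotor speeds
rpm = np.sqrt(sq_rpm)
print(rpm)                                   # feasible RPMs realizing the command
```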

F. Learning Workflows

Having understood how gym-pybullet-drones's dynamics and observation/action spaces work, using it in an RL workflow only requires a few more steps.

1) Reward Functions and Episode Termination:

Reward functions are very much task-dependent and one must be implemented.

2) Stable Baselines3 Workflow:

We provide a complete training workflow for single agent RL based on Stable Baselines3 [11]. This is a collection of RL algorithms— including A2C, DDPG, PPO, SAC, and TD3—implemented in PyTorch.
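A hedged sketch of such a workflow with PPO; the HoverAviary import path and constructor are assumptions (the repository ships its own, more complete training scripts).

```python
from stable_baselines3 import PPO
from gym_pybullet_drones.envs.single_agent_rl.HoverAviary import HoverAviary  # assumed path

env = HoverAviary(gui=False)                 # single-agent hover task (assumed constructor)
model = PPO("MlpPolicy", env, verbose=1)     # any SB3 algorithm (A2C, SAC, TD3, ...) fits here
model.learn(total_timesteps=100_000)
model.save("ppo_hover")                      # reload later with PPO.load("ppo_hover")
```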

3) RLlib Workflow:

We also provide an example training workflow for multi-agent RL based on RLlib [12]. RLlib is a library built on top of Ray’s API for distributed applications, which includes TensorFlow and PyTorch implementations of many popular RL (e.g., PPO, DDPG, DQN) and MARL (e.g., MADDPG, QMIX) algorithms. 
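A hedged sketch of a multi-agent run launched through Ray Tune; the FlockAviary import path, its constructor arguments, and the registered environment name are assumptions for illustration.

```python
import ray
from ray import tune
from ray.tune.registry import register_env
from gym_pybullet_drones.envs.multi_agent_rl.FlockAviary import FlockAviary  # assumed path

register_env("flock-aviary-v0", lambda cfg: FlockAviary(num_drones=2))       # assumed constructor

ray.init()
tune.run("PPO",
         stop={"timesteps_total": 100_000},
         config={"env": "flock-aviary-v0", "framework": "torch", "num_workers": 2})
ray.shutdown()
```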

G. ROS2 Wrapper Node

We also implemented a minimalist wrapper for gym-pybullet-drones's environments as a ROS2 Python node.

IV. COMPUTATIONAL PERFORMANCE

In short: the simulation runs "reasonably fast."

V. EXAMPLES

Six examples are given.

A. Control

1) Trajectory tracking with PID Control:

2) Desired Velocity Input:

3) Ground Effect:

4) Downwash:

B. Reinforcement Learning

1) Single Agent Take-off and Hover:

The reward function is simply the negation of the squared Euclidean distance from the set point:
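Written out, with x_t the drone's position and x_des the target set point:

```latex
r_t = -\,\lVert x_{des} - x_t \rVert_2^{2}
```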

We choose MLP models with ReLU activation and 4 hidden layers with 512, 512, 256, and 128 units, respectively.
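In Stable Baselines3 terms, this architecture could be specified roughly as follows; whether the paper used exactly this mechanism is an assumption.

```python
import torch

# 4 hidden layers of 512, 512, 256, and 128 units with ReLU activations.
policy_kwargs = dict(activation_fn=torch.nn.ReLU,
                     net_arch=[512, 512, 256, 128])
# Passed to the SB3 constructor, e.g. PPO("MlpPolicy", env, policy_kwargs=policy_kwargs).
```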

2) Multi-agent Leader-follower:

VI. EXTENSIONS

Planned extensions include support for different multirotor models, more elaborate aerodynamics (e.g., the downwash model), symbolic dynamics, MARL support beyond RLlib, Google Colaboratory support, and Jupyter Notebook examples.

VII. CONCLUSIONS

In this paper, an open-source, OpenAI Gym-like multi-quadcopter simulator was presented.
