[Paper Reading] Day 2 Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion

Title: Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion
Authors: Roland Hafner, Tim Hertweck, et al.
Affiliation: Google DeepMind
Paper link: [link]
Video link: [Video - YouTube]

Abstract


Modern Reinforcement Learning (RL) algorithms promise to solve difficult motor control problems directly from raw sensory inputs. Their attraction is due in part to the fact that they can represent a general class of methods that allow to learn a solution with a reasonably set reward and minimal prior knowledge, even in situations where it is difficult or expensive for a human expert. For RL to truly make good on this promise, however, we need algorithms and learning setups that can work across a broad range of problems with minimal problem specific adjustments or engineering.
In this paper, we study this idea of generality in the locomotion domain. We develop a learning framework that can learn sophisticated locomotion behavior for a wide spectrum of legged robots, such as bipeds, tripeds, quadrupeds and hexapods, including wheeled variants. Our learning framework relies on a data-efficient, off-policy multi-task RL algorithm and a small set of reward functions that are semantically identical across robots.
To underline the general applicability of the method, we keep the hyper-parameter settings and reward definitions constant across experiments and rely exclusively on on-board sensing. For nine different types of robots, including a real-world quadruped robot, we demonstrate that the same algorithm can rapidly learn diverse and reusable locomotion skills without any platform specific adjustments or additional instrumentation of the learning setup.

Key Points

  • The goal of the paper is to build a general learning framework that can be applied to robots with many different morphologies.

  • The simulation environments, built in MuJoCo, are designed to reflect each robot's mechanical and dynamic properties. Seven robot types are constructed in simulation, and the algorithm is additionally validated on two types of real robots provided by HEBI Robotics.

  • One of the core ideas is to define sensible rewards for a small set of key basic skills, such as StandUpright, TurnLeft, TurnRight, WalkForward, WalkBackward, WalkLeft, WalkRight, and Lift Foot (a minimal sketch of such reward definitions follows this list).

  • All tasks are trained simultaneously with multi-task RL based on SAC-X (see the sketch below).
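
To make the two points above more concrete, here is a minimal Python sketch of what semantically identical reward functions plus SAC-X-style multi-task data collection could look like. This is not the authors' code: the observation keys (torso_up_vector, torso_linear_velocity, torso_yaw_rate), the clipping ranges, and the env, policies, and rng objects are illustrative assumptions; the paper specifies the rewards only at the semantic level and relies on the SAC-X algorithm for the multi-task learning itself.

```python
import numpy as np

# A minimal sketch (not the authors' implementation) of the two ingredients
# summarized above: a small set of reward functions that are semantically
# identical across robots, and SAC-X-style multi-task data collection where
# every transition is labelled with the rewards of ALL tasks.
# Observation keys, clipping ranges, `env`, and `policies` are assumptions.

def stand_upright(obs):
    # Reward how well the torso's up-axis points against gravity.
    return float(np.clip(obs["torso_up_vector"][2], 0.0, 1.0))

def walk(obs, direction):
    # Reward torso velocity projected onto a desired heading, expressed in
    # the robot's own frame so the definition transfers across morphologies.
    return float(np.clip(np.dot(obs["torso_linear_velocity"][:2], direction), 0.0, 1.0))

def turn(obs, sign):
    # Reward yaw rate with the desired sign (+1 for left, -1 for right).
    return float(np.clip(sign * obs["torso_yaw_rate"], 0.0, 1.0))

REWARDS = {
    "StandUpright": stand_upright,
    "WalkForward":  lambda o: walk(o, np.array([1.0, 0.0])),
    "WalkBackward": lambda o: walk(o, np.array([-1.0, 0.0])),
    "WalkLeft":     lambda o: walk(o, np.array([0.0, 1.0])),
    "WalkRight":    lambda o: walk(o, np.array([0.0, -1.0])),
    "TurnLeft":     lambda o: turn(o, +1.0),
    "TurnRight":    lambda o: turn(o, -1.0),
}

def collect_episode(env, policies, rng, switches=5, steps_per_task=100):
    """SAC-U-style scheduling: pick tasks uniformly at random, run the
    corresponding task policy, and record all task rewards for every
    transition so each task head can be trained off-policy from shared data.
    `env.step` is assumed to return only the next observation dict, and
    `rng` is assumed to be a numpy Generator (np.random.default_rng())."""
    trajectory = []
    obs = env.reset()
    for _ in range(switches):
        task = rng.choice(list(REWARDS))
        for _ in range(steps_per_task):
            action = policies[task](obs)
            next_obs = env.step(action)
            rewards = {name: fn(next_obs) for name, fn in REWARDS.items()}
            trajectory.append((obs, action, rewards, task))
            obs = next_obs
    return trajectory
```

The point of the sketch is that nothing in it refers to a particular robot: the same reward dictionary and the same task scheduler can be reused unchanged for a biped, a quadruped, or a hexapod, which is exactly the kind of generality the paper argues for.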
