RL
Average article quality score: 84
Hazekiah
[OpenAI SpinningUp] Key Concepts and Terminology
Key Concepts and Terminology; overview; agent, environment, reward, return; states and observations; state; ob.; representations; difference; in practice; action spaces; discrete; continuous; representations; why ...
Original · 2018-11-21 14:30:42 · 190 views · 0 comments
Robust Adversarial Reinforcement Learning
Motivation. Current RL methods fail to generalize due to two issues. Test generalization: data is scarce, especially real-world data, so RL models often overfit to the training scenarios...
Original · 2018-11-24 02:46:37 · 1123 views · 0 comments
One-Shot Visual Imitation Learning via Meta-Learning
Introduction. The goal of this work is to enable a robotic generalist to learn from only a few demonstrations, which may even be raw videos. This problem setting immediately brings us into the setting...
Original · 2018-11-25 20:14:07 · 869 views · 3 comments
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Motivation. Model-based approaches enjoy 1) sample efficiency (meaning they learn quickly) and 2) a reward-independent dynamics model (consider that model-free approaches require the reward function to...
Original · 2018-11-24 17:27:02 · 668 views · 0 comments
[cs294-112 notes] lecture 6 actor-critic
p4: recapping policy gradients. The gradient is computed on a sample-based estimate of the original objective. The estimate is averaged across n trajectories, each of T time steps. 'Reward to go' is the su...
Original · 2018-12-12 16:09:32 · 171 views · 0 comments
One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning
Introduction. The goal is to enable a robot to learn a new task from one raw video of a human demonstration, with the help of prior knowledge from some old tasks, where both human demonstrations ...
Original · 2018-12-11 00:43:25 · 462 views · 0 comments
Learning to Adapt: Meta-Learning for Model-Based Control
Sudden changes in the environment cause failure; if perturbations were encountered in past experience, one can in principle learn to adapt; study model-based online adaptation; more sample efficient than model-free; alleviate a challeng...
Original · 2018-12-11 01:02:25 · 648 views · 0 comments
[CS294-112] model-based RL
Control and Planning; open-loop trajectory optimization methods; assumptions: a (learned) dynamics model in hand; objective: find the optimal action sequence that maximizes the expected return of the tra...
Original · 2018-12-28 18:19:54 · 1303 views · 1 comment