Reinforcement Learning
止于至玄
May there be no years I look back on with regret.
Actor-Critic Algorithms: A Brief Note
Actor-Critic. Actor-critic methods are a class of algorithms that predate Q-Learning and SARSA.
Original · 2017-05-02 21:13:13 · 1552 views · 0 comments
A Review of Guided Policy Search (GPS)
Guided Policy Search (GPS) is presented in Sergey Levine's 2014 PhD thesis: Levine S., "Motor skill learning with local trajectory methods," PhD thesis, Stanford University, 2014. GPS splits policy search into two phases: a control phase and a supervised phase. What is the advantage of this? This...
Original · 2018-03-15 23:41:15 · 5126 views · 4 comments
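A Brief Discussion of Function Approximation in RL
Below we briefly discuss function approximation in reinforcement learning. We do not cover the basic principles of reinforcement learning, its common algorithms, or the mathematical foundations of convex optimization, and we assume you have a basic understanding of reinforcement learning. Contents: Overview; Value Function Approximation; Incremental/Gradient-Descent Methods; Batch Methods; A Brief Look at Deep Reinforcement Learning (DQN); Double DQN; Double DQN with Prioritized Replay; Dueling D...
Original · 2018-03-10 21:41:44 · 6088 views · 0 comments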
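As one concrete instance of a batch method for value function approximation, here is a minimal sketch of least-squares TD, LSTD(0), with a linear value function $\hat v(s) = w^\top x(s)$. Instead of taking incremental gradient steps, it solves $A w = b$ with $A = \sum_t x_t (x_t - \gamma x_{t+1})^\top$ and $b = \sum_t r_{t+1} x_t$ over a stored batch of transitions. The function name, the `(x, r, x_next)` transition layout, and the optional ridge term `reg` are illustrative assumptions.

```python
import numpy as np

def lstd(transitions, n_features, gamma=1.0, reg=0.0):
    """LSTD(0): solve A w = b for linear value weights from a batch of transitions.

    transitions: iterable of (x, r, x_next); x_next is the zero vector when the
    next state is terminal, so that its value is 0.
    """
    A = reg * np.eye(n_features)
    b = np.zeros(n_features)
    for x, r, x_next in transitions:
        A += np.outer(x, x - gamma * x_next)   # accumulates x_t (x_t - gamma x_{t+1})^T
        b += r * x                             # accumulates r_{t+1} x_t
    return np.linalg.solve(A, b)

# Usage: a deterministic chain s0 -> s1 -> s2 -> terminal with reward 1 on the last step.
X = np.eye(3)   # hypothetical one-hot features (the tabular special case)
batch = [(X[0], 0.0, X[1]), (X[1], 0.0, X[2]), (X[2], 1.0, np.zeros(3))]
w = lstd(batch, n_features=3)
print(w)  # approximately [1, 1, 1]: every state's undiscounted value is 1
```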
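On PILCO, a Model-Based Reinforcement Learning Method - Probabilistic Inference for Learning Control
The biggest problem for model-based reinforcement learning methods is model error. To address it, the PILCO (Probabilistic Inference for Learning Control) algorithm was proposed, which explicitly takes model error into account. Instead of committing to a single dynamics model, it handles model bias by building a probabilistic dynamics model, that is, a distribution over dynamics models. In other words, the model PILCO builds is not one specific deterministic function but one that can describe all plausible models (...
Original · 2018-03-29 21:01:25 · 8042 views · 7 comments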
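PILCO's probabilistic dynamics model is a Gaussian process (GP), which returns a predictive mean and a predictive variance for every query, so the model's own uncertainty is available to the planner. The sketch below shows only plain GP regression with a squared-exponential kernel and fixed, hypothetical hyperparameters (`lengthscale`, `signal_var`, `noise_var`). PILCO itself learns these hyperparameters from data and additionally propagates state uncertainty through the GP by moment matching; both steps are omitted here.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel matrix between the rows of A and the rows of B."""
    d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_predict(X, y, Xs, lengthscale=1.0, signal_var=1.0, noise_var=1e-2):
    """GP posterior mean and variance of the latent function at the test inputs Xs."""
    K = rbf_kernel(X, X, lengthscale, signal_var) + noise_var * np.eye(len(X))
    Ks = rbf_kernel(X, Xs, lengthscale, signal_var)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K^{-1} y
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = signal_var - np.sum(v ** 2, axis=0)              # k(x*, x*) - k*^T K^{-1} k*
    return mean, var

# Hypothetical 1-D data standing in for (state, action) -> next-state observations.
X = np.linspace(-2.0, 2.0, 9)[:, None]
y = np.sin(X).ravel()
Xs = np.array([[0.0], [0.25], [10.0]])
mean, var = gp_predict(X, y, Xs)
print(mean, var)   # far from the data (x = 10) the variance returns to signal_var = 1
```

The growing variance away from the data is exactly the "the model does not know" signal that a single deterministic model cannot express.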
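A Brief Introduction to Trust Region Policy Optimization (TRPO)
In this article we briefly introduce a reinforcement learning method, TRPO (Trust Region Policy Optimization), known in Chinese as "置信域策略优化". The method was proposed by John Schulman, then a PhD student at Berkeley. TRPO is a stochastic policy search method within the family of policy search methods. It tackles the problem of choosing the step size for gradient updates head-on and yields a policy improvement method with monotonic improvement. This article only sketches its principles; for more details, see: Schulman J., ...
Original · 2018-03-14 14:22:06 · 3594 views · 0 comments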
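The step-size problem TRPO addresses can be made concrete. TRPO maximizes a linearized surrogate $g^\top \Delta\theta$ subject to a quadratic approximation of the KL constraint, $\tfrac12 \Delta\theta^\top F \Delta\theta \le \delta$, where $F$ is the Fisher information matrix; the solution is $\Delta\theta = \sqrt{2\delta / (g^\top F^{-1} g)}\; F^{-1} g$, with $F^{-1} g$ computed by conjugate gradient from Fisher-vector products only. This sketch omits the backtracking line search TRPO runs afterwards on the actual KL and surrogate; the `Fvp` callback and the toy matrix are hypothetical.

```python
import numpy as np

def conjugate_gradient(Avp, b, iters=10, tol=1e-10):
    """Approximately solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b.copy()                      # residual b - A x (x starts at 0)
    p = r.copy()                      # search direction
    rs = r @ r
    for _ in range(iters):
        if rs < tol:
            break
        Ap = Avp(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def trpo_step(g, Fvp, max_kl):
    """Natural-gradient step scaled so that 0.5 * step^T F step equals max_kl."""
    step_dir = conjugate_gradient(Fvp, g)          # approximately F^{-1} g
    shs = step_dir @ Fvp(step_dir)                 # approximately g^T F^{-1} g
    return np.sqrt(2.0 * max_kl / shs) * step_dir

# Hypothetical toy problem: a fixed Fisher matrix and policy gradient.
F = np.array([[2.0, 0.0], [0.0, 1.0]])
g = np.array([2.0, 1.0])
step = trpo_step(g, lambda v: F @ v, max_kl=0.01)
print(step, 0.5 * step @ F @ step)   # the quadratic KL estimate lands on the trust-region boundary
```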
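A Brief Review of Inverse Reinforcement Learning
Below we explore the basic principles and typical methods of inverse reinforcement learning (IRL). We assume you already have some understanding of the basics of reinforcement learning and convex optimization. Contents: Overview; Max-Margin IRL; Apprenticeship Learning; Maximum Margin Planning (MMP); Structured-Classification-Based Methods; Neural IRL; Maximum-Entropy IRL; Maximum Information Entropy IRL; Relative Entropy IRL; Deep IRL. Overview: We first introduce the concept and categories of inverse reinforcement learning: what is...
Original · 2018-04-01 15:17:10 · 27555 views · 2 comments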
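Max-margin methods such as apprenticeship learning assume a reward linear in features, $R(s) = w^\top \phi(s)$, and compare policies through their discounted feature expectations $\mu = \mathbb{E}\left[\sum_t \gamma^t \phi(s_t)\right]$. Since the expected discounted reward is then exactly $w^\top \mu$, a policy whose $\mu$ matches the expert's also matches the expert's value for every such $w$. A minimal sketch of the empirical estimate from $m$ demonstrations follows; the one-hot feature map and the toy trajectories are hypothetical.

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.99):
    """Empirical mu = (1/m) * sum_i sum_t gamma^t * phi(s_t^(i)) over m trajectories."""
    total = None
    for traj in trajectories:
        disc = sum((gamma ** t) * np.asarray(phi(s), dtype=float) for t, s in enumerate(traj))
        total = disc if total is None else total + disc
    return total / len(trajectories)

def one_hot(s, n_states=2):
    """Hypothetical feature map: an indicator feature for each discrete state."""
    v = np.zeros(n_states)
    v[s] = 1.0
    return v

mu_expert = feature_expectations([[0, 1], [1, 1]], one_hot, gamma=0.5)
w = np.array([0.0, 1.0])      # a candidate linear reward R(s) = w . phi(s)
print(w @ mu_expert)          # the expert's estimated value under that reward
```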
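Using OpenAI Gym on Windows - HelloGym
OS: Win10 x64; IDE: Visual Studio 2017 Community; Python: Anaconda3 (v5.0.0, Python 3.6 x64). Below we describe how to use OpenAI Gym on Windows; for usage on Linux, see this article. We start with installation. Contents: Installing OpenAI Gym; A Simple Example. Installing OpenAI Gym: On Win...
Original · 2018-04-04 18:59:09 · 12132 views · 2 comments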
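A typical first Gym program is an episode loop. The sketch below assumes the classic Gym interface of that era, where `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. Note that gym 0.26+ and Gymnasium changed these to `(observation, info)` and a five-tuple with separate `terminated`/`truncated` flags. So that the sketch runs without Gym installed, it is exercised on a hypothetical `CountdownEnv` stand-in; with a classic Gym installed you would instead pass `env = gym.make("CartPole-v0")` and a policy such as `lambda obs: env.action_space.sample()`.

```python
class CountdownEnv:
    """Hypothetical stand-in with the classic Gym interface: n steps, reward 1 per step."""

    def __init__(self, n=3):
        self.n = n
        self.remaining = n

    def reset(self):
        self.remaining = self.n
        return self.remaining                    # observation: steps left

    def step(self, action):
        self.remaining -= 1
        done = self.remaining == 0
        return self.remaining, 1.0, done, {}     # (observation, reward, done, info)


def run_episode(env, policy, max_steps=1000):
    """Roll out one episode and return the undiscounted total reward."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


print(run_episode(CountdownEnv(3), policy=lambda obs: 0))  # prints 3.0
```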
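A Brief Introduction to Importance Sampling
Importance sampling is a sampling method from statistics, used mainly for distributions that are difficult to sample from directly. Suppose we have a complicated probability density function $p(x)$ and want the expectation of some function $f(x)$ of a random variable distributed according to it, that is, $E_{x\sim p(x)}[f(x)]$. With the analytic approach, $E_{x\sim p(x)}[f(x)] = \int_x p(x) f(x)\,dx$ ...
Reposted · 2018-06-07 14:12:28 · 5708 views · 0 comments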
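Importance sampling rewrites the expectation under a proposal $q(x)$ that is easy to sample from:

$$E_{x\sim p}[f(x)] = E_{x\sim q}\!\left[\frac{p(x)}{q(x)} f(x)\right] \approx \frac{1}{N}\sum_{i=1}^{N} \frac{p(x_i)}{q(x_i)} f(x_i), \qquad x_i \sim q,$$

which is valid whenever $q(x) > 0$ wherever $p(x) f(x) \neq 0$. The sketch below checks this on a toy case with a known answer: $p = \mathcal N(0, 1)$, $q = \mathcal N(0, 2^2)$, $f(x) = x^2$, so the true value is 1. The densities and the sample size are illustrative choices.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated elementwise at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def importance_sampling_estimate(f, p_pdf, q_pdf, q_sample, n, rng):
    """Estimate E_{x~p}[f(x)] using n samples drawn from the proposal q."""
    x = q_sample(rng, n)                 # x_i ~ q
    w = p_pdf(x) / q_pdf(x)              # importance weights p(x_i) / q(x_i)
    return np.mean(w * f(x))

rng = np.random.default_rng(0)
est = importance_sampling_estimate(
    f=lambda x: x ** 2,
    p_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    q_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    q_sample=lambda rng, n: rng.normal(0.0, 2.0, size=n),
    n=100_000,
    rng=rng,
)
print(est)  # close to the true value E_p[x^2] = 1
```

A proposal wider than $p$ is the safe choice here: if $q$ were narrower, the weights $p/q$ in the tails could explode and the estimate's variance would grow sharply.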
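Deep Reinforcement Learning and Deep Q-Learning (DQN)
In this article we explore the principles of deep reinforcement learning (mainly DQN) and work through examples. We assume the reader already has some understanding of the basics of reinforcement learning and of neural networks. Contents: A First Look at Deep Reinforcement Learning; Deep Q-Learning; Double DQN; Double DQN with Prioritized Replay; Dueling DQN. A First Look at Deep Reinforcement Learning. Deep Q-Learning: The D... introduced here...
Reposted · 2018-07-15 21:50:53 · 21516 views · 0 comments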
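DQN and Double DQN differ in how the bootstrap target is built. DQN uses $y = r + \gamma \max_{a'} Q_{\text{target}}(s', a')$. Double DQN lets the online network choose $a^* = \arg\max_{a'} Q_{\text{online}}(s', a')$ and the target network evaluate it, $y = r + \gamma\, Q_{\text{target}}(s', a^*)$, which reduces the overestimation caused by maximizing over noisy estimates. In this minimal NumPy sketch over a batch of transitions, the arrays are hypothetical stand-ins for network outputs.

```python
import numpy as np

def dqn_targets(rewards, dones, q_next_target, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'); no bootstrapping from terminal next states."""
    return rewards + gamma * (1.0 - dones) * q_next_target.max(axis=1)

def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """The online network selects a*, the target network evaluates Q_target(s', a*)."""
    best = q_next_online.argmax(axis=1)
    evaluated = q_next_target[np.arange(len(best)), best]
    return rewards + gamma * (1.0 - dones) * evaluated

# Hypothetical batch of two transitions; the second one ends its episode.
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
q_online = np.array([[1.0, 2.0], [0.0, 5.0]])   # Q_online(s', .) per transition
q_target = np.array([[3.0, 0.5], [1.0, 1.0]])   # Q_target(s', .) per transition
y_dqn = dqn_targets(rewards, dones, q_target, gamma=0.5)
y_double = double_dqn_targets(rewards, dones, q_online, q_target, gamma=0.5)
print(y_dqn, y_double)
```

For the first transition, DQN's max picks the target network's 3.0 and gives $1 + 0.5 \times 3 = 2.5$, while Double DQN follows the online network's choice (action 1) and gets $1 + 0.5 \times 0.5 = 1.25$. The terminal transition's target is just its reward, 0.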
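A Brief Discussion of Policy Gradient Algorithms in Reinforcement Learning
This article mainly introduces reinforcement learning algorithms based on policy gradients. We assume the reader has some understanding of the basics of reinforcement learning. Contents: Policy Gradient Methods; REINFORCE; Actor-Critic. Gradient-based estimation and optimization methods appear in many fields, such as convex optimization and machine learning. In reinforcement learning, we can use gradients either to estimate the value function of a policy or to estimate the policy directly. In this article we discuss only the latter. Policy Gradient Methods. REINFORCE. Actor ...
Original · 2018-07-22 22:15:19 · 4052 views · 2 comments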
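For a softmax policy $\pi_\theta(a) \propto \exp(\theta_a)$, the score function is $\nabla_\theta \log \pi_\theta(a) = \mathbf{e}_a - \pi_\theta$, so REINFORCE updates $\theta \leftarrow \theta + \alpha\, G\, (\mathbf{e}_a - \pi_\theta)$. The sketch below applies this to a hypothetical two-armed bandit with fixed rewards, where every episode is a single step and the return $G$ is just the reward; no baseline is used.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())        # subtract the max for numerical stability
    return z / z.sum()

def reinforce_bandit(rewards, alpha=0.1, episodes=2000, seed=0):
    """REINFORCE on a bandit: each episode is one action, so G is that action's reward."""
    rng = np.random.default_rng(seed)
    rewards = np.asarray(rewards, dtype=float)
    theta = np.zeros(len(rewards))          # policy parameters (action preferences)
    for _ in range(episodes):
        pi = softmax(theta)
        a = rng.choice(len(rewards), p=pi)  # sample A ~ pi_theta
        G = rewards[a]
        grad_log_pi = -pi                   # grad of log pi(a) = e_a - pi
        grad_log_pi[a] += 1.0
        theta += alpha * G * grad_log_pi    # theta <- theta + alpha * G * grad log pi(a)
    return softmax(theta)

probs = reinforce_bandit([1.0, 0.0])        # hypothetical arms: arm 0 pays 1, arm 1 pays 0
print(probs)                                # most probability mass ends up on arm 0
```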
Finite Markov Decision Processes in RL
Thanks to Richard S. Sutton and Andrew G. Barto for their great work, Reinforcement Learning: An Introduction (2nd Edition). Here we summarize some basic notions and formulations in most reinforcement...
Reposted · 2017-12-28 09:51:03 · 778 views · 0 comments
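A finite MDP can be stored as arrays $P[a, s, s'] = p(s' \mid s, a)$ and $R[a, s] = r(s, a)$, and the Bellman optimality equation $v_*(s) = \max_a \big[ r(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, v_*(s') \big]$ can then be solved by value iteration. The two-state MDP below is hypothetical: "stay" earns 1 only in state 1, and "move" switches states for free. With $\gamma = 0.9$ its optimal values are $v_*(1) = 1/(1-0.9) = 10$ and $v_*(0) = 0.9 \times 10 = 9$.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """P[a, s, s'] = p(s' | s, a), R[a, s] = expected reward; returns (v, greedy policy)."""
    v = np.zeros(P.shape[1])
    for _ in range(max_iter):
        q = R + gamma * P @ v               # q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] v[s']
        v_new = q.max(axis=0)               # Bellman optimality backup
        done = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if done:
            break
    return v, q.argmax(axis=0)

# Hypothetical MDP: action 0 = "stay" (reward 1 only in state 1), action 1 = "move" (switch, reward 0).
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0],
              [0.0, 0.0]])
v, policy = value_iteration(P, R, gamma=0.9)
print(v, policy)   # v is close to [9, 10]; the policy moves in state 0 and stays in state 1
```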
A Brief Note about Action Exploration Strategies
Contents: Softmax Exploration Strategy; Upper-Confidence-Bound Action Selection
Reposted · 2017-04-28 15:12:27 · 337 views · 0 comments
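Both strategies fit in a few lines. Softmax (Boltzmann) exploration samples action $a$ with probability proportional to $\exp(Q(a)/\tau)$, where the temperature $\tau$ controls how greedy it is. UCB selects $\arg\max_a \big[Q_t(a) + c\sqrt{\ln t / N_t(a)}\big]$ and treats an untried action ($N_t(a) = 0$) as maximizing, following Sutton and Barto's formulation. Parameter names and defaults here are illustrative.

```python
import numpy as np

def softmax_probs(q, tau=1.0):
    """Boltzmann exploration: P(a) proportional to exp(Q(a) / tau)."""
    q = np.asarray(q, dtype=float)
    z = np.exp((q - q.max()) / tau)      # subtract the max for numerical stability
    return z / z.sum()

def ucb_select(q, counts, t, c=2.0):
    """UCB: argmax_a Q(a) + c * sqrt(ln t / N(a)); untried actions are selected first."""
    q = np.asarray(q, dtype=float)
    counts = np.asarray(counts, dtype=float)
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:
        return int(untried[0])
    return int(np.argmax(q + c * np.sqrt(np.log(t) / counts)))

rng = np.random.default_rng(0)
q = np.array([1.0, 0.9])
a_soft = rng.choice(len(q), p=softmax_probs(q, tau=0.5))   # stochastic softmax choice
a_ucb = ucb_select(q, counts=[100, 1], t=101)               # rarely tried action 1 gets a large bonus
```

Lowering $\tau$ pushes softmax toward greedy selection, and setting $c = 0$ turns UCB into pure greedy selection over $Q$.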
Continuous Multi-Step TD, Eligibility Traces and TD(λ): A brief note
Thanks to Richard S. Sutton and Andrew G. Barto for their great work, Reinforcement Learning: An Introduction. We focus on the episodic case only and deal with continuous state and action spaces. Suppose ...
Reposted · 2017-05-19 10:27:16 · 860 views · 0 comments
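With a linear value function $\hat v(s) = w^\top x(s)$, a common way to handle continuous states, the backward view of TD(λ) keeps an accumulating trace $z_t = \gamma\lambda z_{t-1} + x(S_t)$ and updates $w \leftarrow w + \alpha\, \delta_t z_t$ with $\delta_t = R_{t+1} + \gamma \hat v(S_{t+1}) - \hat v(S_t)$. The episode format and the one-hot test features (the tabular special case) are assumptions made for this sketch.

```python
import numpy as np

def td_lambda_episode(w, features, rewards, alpha=0.1, gamma=1.0, lam=0.8):
    """Run semi-gradient TD(lambda) with accumulating traces over one episode.

    features[t] is x(S_t) for t = 0..T-1, rewards[t] is R_{t+1}, and the episode
    ends in a terminal state whose value is taken to be 0.
    """
    w = np.array(w, dtype=float)
    z = np.zeros_like(w)                                   # eligibility trace
    T = len(features)
    for t in range(T):
        x = features[t]
        v_next = w @ features[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * v_next - w @ x        # TD error
        z = gamma * lam * z + x                            # gradient of w . x is x
        w = w + alpha * delta * z
    return w

# Usage with one-hot features: s0 -> s1 -> s2 -> end, reward 1 on the last step.
X = np.eye(3)
w = np.zeros(3)
for _ in range(500):
    w = td_lambda_episode(w, [X[0], X[1], X[2]], [0.0, 0.0, 1.0])
print(w)  # approaches the true values [1, 1, 1]
```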
Sarsa(λ) and Q(λ) in the Tabular Case
Eligibility Traces in Prediction Problems: In the backward view of TD(λ), there is a memory variable associated with each state, its eligibility trace...
Reposted · 2017-05-23 23:50:02 · 2012 views · 0 comments
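One step of tabular Sarsa(λ) with accumulating traces: compute $\delta = R + \gamma Q(S', A') - Q(S, A)$, increment $E(S, A)$, update every entry $Q \leftarrow Q + \alpha \delta E$, then decay all traces by $\gamma\lambda$. Watkins's Q(λ) differs by bootstrapping from $\max_a Q(S', a)$ and by resetting all traces to zero whenever a non-greedy action is taken; that variant is not shown. The in-place array interface and the default step sizes are assumptions.

```python
import numpy as np

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next, done,
                      alpha=0.5, gamma=0.9, lam=0.8):
    """Apply one Sarsa(lambda) update in place to Q and the eligibility traces E."""
    target = r if done else r + gamma * Q[s_next, a_next]
    delta = target - Q[s, a]                # TD error for the pair just taken
    E[s, a] += 1.0                          # accumulating trace
    Q += alpha * delta * E                  # every pair moves in proportion to its trace
    E *= gamma * lam                        # decay all traces
    if done:
        E[:] = 0.0                          # traces do not carry over to the next episode
    return delta

Q = np.zeros((2, 2))                        # hypothetical 2 states x 2 actions
E = np.zeros_like(Q)
sarsa_lambda_step(Q, E, s=0, a=1, r=1.0, s_next=1, a_next=0, done=False)
sarsa_lambda_step(Q, E, s=1, a=0, r=1.0, s_next=0, a_next=1, done=False)
print(Q)   # the second TD error also updates (0, 1) through its decayed trace
```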
Reinforcement Learning in Continuous State and Action Spaces: A Brief Note
In sequential decision-making problems in continuous domains with delayed reward signals, the main purpose of the algorithms is to...
Reposted · 2017-05-15 09:46:13 · 1080 views · 0 comments
Play with OpenAI Gym in Ubuntu 16.04: Hello World
OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.
Reposted · 2017-07-12 20:13:39 · 920 views · 0 comments
Typical Policy Representation in Policy Search Methods
Thanks to M. P. Deisenroth, G. Neumann, and J. Peters for their great work, A Survey on Policy Search for Robotics. Policy representations may be categorized into time-independent representations $\pi(x)$ and time-dependent...
Reposted · 2017-08-17 23:46:54 · 284 views · 0 comments
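A common time-independent representation is a linear policy on normalized radial basis function (RBF) features, $\pi(x) = \theta^\top \phi(x)$; a time-dependent one would additionally take the time step (or a phase variable) as input. The centers, width, and parameter values below are hypothetical.

```python
import numpy as np

def rbf_features(x, centers, width):
    """Normalized Gaussian RBF features phi(x) of a state vector x."""
    d2 = np.sum((centers - x) ** 2, axis=1)        # squared distance to each center
    phi = np.exp(-0.5 * d2 / width ** 2)
    return phi / phi.sum()

def linear_policy(x, theta, centers, width):
    """Time-independent deterministic policy u = theta^T phi(x)."""
    return theta.T @ rbf_features(x, centers, width)

centers = np.array([[0.0], [1.0]])   # hypothetical 1-D state space with two centers
theta = np.array([[1.0], [3.0]])     # one action dimension, one weight per feature
u = linear_policy(np.array([0.5]), theta, centers, width=0.5)   # midway between the centers
```

Midway between the two centers both features equal 0.5, so the action is the average of the two weights, 2.0.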
Importance Sampling in Reinforcement Learning - An overview
Thanks to Sutton and Barto for their great work, Reinforcement Learning: An Introduction. Almost all off-policy reinforcement learning methods utilize importance sampling, a general technique for...
Reposted · 2017-07-25 14:52:35 · 1829 views · 0 comments
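In off-policy Monte Carlo prediction, each episode's return $G$ (generated by a behavior policy $b$) is reweighted by $\rho = \prod_t \pi(A_t \mid S_t) / b(A_t \mid S_t)$. Ordinary importance sampling averages $\rho G$ over episodes (unbiased, possibly high variance); weighted importance sampling divides by $\sum \rho$ instead (biased, usually much lower variance). The per-step probabilities and returns below are made up for illustration.

```python
import numpy as np

def importance_ratio(target_probs, behavior_probs):
    """rho = prod_t pi(A_t | S_t) / b(A_t | S_t) along one episode."""
    return float(np.prod(np.asarray(target_probs, dtype=float) /
                         np.asarray(behavior_probs, dtype=float)))

def off_policy_mc_estimates(ratios, returns):
    """Return the (ordinary, weighted) importance-sampling estimates of v_pi."""
    ratios = np.asarray(ratios, dtype=float)
    returns = np.asarray(returns, dtype=float)
    ordinary = np.sum(ratios * returns) / len(returns)                  # unbiased
    total = np.sum(ratios)
    weighted = np.sum(ratios * returns) / total if total > 0 else 0.0   # biased, lower variance
    return ordinary, weighted

# Two hypothetical episodes generated by a uniform behavior policy b over two actions.
rho1 = importance_ratio([1.0, 1.0], [0.5, 0.5])   # pi would take both actions: rho = 4
rho2 = importance_ratio([0.0, 1.0], [0.5, 0.5])   # pi never takes the first action: rho = 0
ordinary, weighted = off_policy_mc_estimates([rho1, rho2], [1.0, 3.0])
```

The only episode $\pi$ could have produced has return 1, and the weighted estimate reports exactly that, whereas the ordinary estimate is scaled up to 2 by the large ratio.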
Typical Exploration Strategies in Model-free Policy Search
The exploration strategy is used to generate new trajectory samples $\tau^{[i]}$. All exploration strategies in model-free policy search are local and use stochastic policies to implement exploration...
Original · 2017-08-18 17:13:44 · 306 views · 0 comments
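Two common ways to make a policy stochastic for exploration are to add noise in action space at every time step, or to draw a parameter vector $\theta^{[i]}$ once per rollout from a Gaussian search distribution $\mathcal N(\omega, \Sigma)$ (parameter-space, episode-based exploration). A minimal sketch of both follows; the function names, noise levels, and means are hypothetical.

```python
import numpy as np

def sample_parameters(omega, Sigma, n, rng):
    """Parameter-space exploration: theta^[i] ~ N(omega, Sigma), one vector per rollout."""
    return rng.multivariate_normal(omega, Sigma, size=n)

def perturb_action(mean_action, sigma, rng):
    """Action-space exploration: u_t = mean action + N(0, sigma^2 I) noise at every step."""
    mean_action = np.asarray(mean_action, dtype=float)
    return mean_action + sigma * rng.standard_normal(mean_action.shape)

rng = np.random.default_rng(0)
thetas = sample_parameters(np.array([1.0, -1.0]), 0.01 * np.eye(2), n=5000, rng=rng)
u = perturb_action([0.5, 0.0], sigma=0.1, rng=rng)
```

Both strategies stay local: samples concentrate around the current mean, with the spread set by $\Sigma$ or $\sigma$.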
Typical Policy Evaluation Strategies in Model-free Policy Search
Thanks to M. P. Deisenroth, G. Neumann, and J. Peters for their great work, A Survey on Policy Search for Robotics. Policy evaluation strategies are used to assess the quality of the executed policy. They may be used to transform...
Original · 2017-08-19 00:13:35 · 209 views · 0 comments
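Policy evaluation turns sampled trajectories into learning data. A step-based evaluation gives each time step its reward-to-go $Q_t^{[i]} = \sum_{k \ge t} \gamma^{k-t} r_k^{[i]}$, while an episode-based evaluation assigns a single return $R^{[i]}$ to the whole rollout, and thus to the parameter vector that produced it. A minimal sketch of both follows; the function names and rewards are hypothetical.

```python
import numpy as np

def rewards_to_go(rewards, gamma=1.0):
    """Step-based evaluation: Q_t = sum_{k >= t} gamma^(k - t) * r_k for every step t."""
    q = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        q[t] = running
    return q

def episode_return(rewards, gamma=1.0):
    """Episode-based evaluation: one discounted return R for the whole rollout."""
    return float(rewards_to_go(rewards, gamma)[0]) if len(rewards) > 0 else 0.0

rollout = [1.0, 2.0, 3.0]                  # hypothetical rewards of one rollout
print(rewards_to_go(rollout, gamma=0.5))   # per-step data for step-based updates
print(episode_return(rollout, gamma=0.5))  # a single number for episode-based updates
```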
A Policy Update Strategy in Model-free Policy Search: Policy Gradient
Now let's discuss the different ways of updating the policy in policy search. Typical policy update methods in model-free policy search include policy gradient methods, expectation-maximization-based methods, ...
Reposted · 2017-08-22 14:54:04 · 912 views · 0 comments
A Policy Update Strategy in Model-free Policy Search: Expectation-Maximization
Policy gradient methods require the user to specify a learning rate, which can be problematic and often results in an unstable learning process or slow convergence. By formulating policy search as an inference...
Reposted · 2017-08-25 09:56:28 · 421 views · 0 comments
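An EM-style update for an episode-based Gaussian search distribution $\mathcal N(\omega, \Sigma)$ needs no learning rate: returns are turned into non-negative weights $d^{[i]}$, and the new $\omega$ and $\Sigma$ are the weighted maximum-likelihood estimates from the sampled parameters. The exponential transformation $d^{[i]} = \exp(\beta R^{[i]})$ with temperature $\beta$ is one common choice and an assumption here, as are the function name and the toy data.

```python
import numpy as np

def weighted_ml_update(thetas, returns, beta=1.0):
    """M-step for a Gaussian search distribution N(omega, Sigma) from sampled parameters."""
    thetas = np.asarray(thetas, dtype=float)        # shape (N, dim): one parameter vector per rollout
    returns = np.asarray(returns, dtype=float)      # shape (N,): return of each rollout
    d = np.exp(beta * (returns - returns.max()))    # d_i = exp(beta R_i), shifted for stability
    d /= d.sum()
    omega = d @ thetas                              # weighted mean
    diff = thetas - omega
    Sigma = (d[:, None] * diff).T @ diff            # weighted covariance
    return omega, Sigma

thetas = np.array([[0.0], [1.0]])                   # hypothetical sampled parameters
returns = np.array([0.0, np.log(3.0)])              # the second rollout did better
omega, Sigma = weighted_ml_update(thetas, returns)
```

Here the normalized weights are 0.25 and 0.75, so the mean moves to 0.75 in one step, with no step size to tune.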
A Brief Tutorial on Markov Processes
This article mainly introduces the concept and properties of Markov stochastic processes.
Original · 2017-10-24 10:26:48 · 4249 views · 0 comments
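For a finite Markov chain with transition matrix $P$ (each row sums to 1), the state distribution evolves as $p_{t+1} = p_t P$, and a stationary distribution satisfies $\pi = \pi P$. The NumPy sketch below uses a hypothetical two-state chain. For this chain $\pi = (5/6, 1/6)$, because balancing the probability flow between the two states requires $0.1\,\pi_0 = 0.5\,\pi_1$.

```python
import numpy as np

def propagate(p0, P, n):
    """State distribution after n steps: p_n = p_0 P^n (row-vector convention)."""
    return np.asarray(p0, dtype=float) @ np.linalg.matrix_power(P, n)

def stationary_distribution(P):
    """Solve pi = pi P via the eigenvector of P^T for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])            # hypothetical two-state chain; each row sums to 1
pi = stationary_distribution(P)
p50 = propagate([1.0, 0.0], P, 50)    # starting in state 0, the chain forgets where it started
```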
Dynamic Movement Primitive - My Superficial Review
Let's talk about the Dynamic Movement Primitive (DMP) for robot learning from demonstration. In this article, we assume that readers have a background in control theory and robo...
Original · 2019-03-09 17:34:03 · 1083 views · 1 comment
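Here is a minimal sketch of a one-dimensional discrete DMP in the commonly used Ijspeert-style form. The canonical system is $\tau \dot x = -\alpha_x x$ and the transformation system is $\tau \dot z = \alpha_z(\beta_z(g - y) - z) + f(x)$, $\tau \dot y = z$, with a forcing term $f(x) = \frac{\sum_i \psi_i(x) w_i}{\sum_i \psi_i(x)}\, x\,(g - y_0)$ built from Gaussian basis functions $\psi_i$. Because $f$ vanishes as $x \to 0$, the system ends at the goal $g$ whatever the weights, and with $\beta_z = \alpha_z / 4$ it is critically damped. The gains, basis placement, width heuristic, and Euler integration are assumptions; learning $w$ from a demonstration is not shown.

```python
import numpy as np

def dmp_rollout(y0, g, weights, tau=1.0, dt=0.001, T=2.0,
                alpha_z=25.0, beta_z=25.0 / 4.0, alpha_x=4.0):
    """Integrate a 1-D discrete DMP with explicit Euler steps and return the positions y."""
    weights = np.asarray(weights, dtype=float)
    n = len(weights)
    centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n))   # basis centers along the phase x
    widths = n ** 1.5 / centers                              # hypothetical width heuristic
    y, z, x = float(y0), 0.0, 1.0                            # position, scaled velocity, phase
    traj = [y]
    for _ in range(int(round(T / dt))):
        psi = np.exp(-widths * (x - centers) ** 2)           # Gaussian basis activations
        f = (psi @ weights) / (psi.sum() + 1e-10) * x * (g - y0)   # forcing term, fades as x -> 0
        dz = (alpha_z * (beta_z * (g - y) - z) + f) / tau    # transformation system
        dy = z / tau
        dx = -alpha_x * x / tau                              # canonical system
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt
        traj.append(y)
    return np.array(traj)

plain = dmp_rollout(0.0, 1.0, np.zeros(10))          # zero weights: a critically damped reach
shaped = dmp_rollout(0.0, 1.0, np.full(10, 50.0))    # hypothetical weights bend the path
print(plain[-1], shaped[-1])                         # both end near the goal g = 1
```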