Reinforcement Learning
wang2008start
Reinforcement Learning_By David Silver Notes 6: Value Function Approximation (2017-12-11)
Reinforcement Learning_By David Silver Notes 7: Policy Gradient Methods (2017-12-11)
Reinforcement Learning_By David Silver Notes 9: Exploration and Exploitation (2017-12-11)
Reinforcement Learning_By David Silver Notes 8: Integrating Learning and Planning (2017-12-11)
Reinforcement Learning_By David Silver Notes 1: Introduction (2017-12-11)
Agent and environment; history and state; agent state, environment state, information state; fully observable environments; partially observable environments; policy.
Reinforcement Learning_By David Silver Notes 2: Markov Decision Processes (2017-12-11)
Markov process; Markov reward process; Markov decision process (a Markov reward process with decisions). A policy is a distribution over actions given states. Given an MDP and a policy, …
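The policy definition quoted here can be written out in the course's standard notation; the induced-process formulas below are the usual completion of the truncated sentence (given an MDP and a policy, the state sequence is a Markov process):

```latex
\pi(a \mid s) = \mathbb{P}\left[\, A_t = a \mid S_t = s \,\right]
```

```latex
\mathcal{P}^{\pi}_{s,s'} = \sum_{a \in \mathcal{A}} \pi(a \mid s)\, \mathcal{P}^{a}_{ss'},
\qquad
\mathcal{R}^{\pi}_{s} = \sum_{a \in \mathcal{A}} \pi(a \mid s)\, \mathcal{R}^{a}_{s}
```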
Reinforcement Learning_By David Silver Notes 3: Planning by Dynamic Programming (2017-12-11)
Policy evaluation; policy iteration; value iteration. Policy iteration: any optimal policy can be subdivided into two components: an optimal first action A, followed by an optimal poli…
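Policy iteration alternates the two steps named above, evaluation then greedy improvement. A minimal sketch, on a hypothetical 2-state, 2-action MDP (the transition tensor `P` and reward matrix `R` are illustrative assumptions, not from the notes):

```python
import numpy as np

# Hypothetical MDP: P[s, a, s'] transition probabilities, R[s, a] expected rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

def policy_iteration(P, R, gamma):
    n_states, n_actions = R.shape
    policy = np.zeros(n_states, dtype=int)  # start with action 0 everywhere
    while True:
        # Policy evaluation: solve v = R_pi + gamma * P_pi v exactly.
        P_pi = P[np.arange(n_states), policy]   # (S, S) under current policy
        R_pi = R[np.arange(n_states), policy]   # (S,)
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily w.r.t. the one-step lookahead q(s, a).
        q = R + gamma * P @ v                   # (S, A)
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):  # stable => optimal
            return policy, v
        policy = new_policy

policy, v = policy_iteration(P, R, gamma)
```

Exact evaluation via a linear solve is fine at this size; on larger state spaces one would iterate the Bellman expectation backup instead.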
Reinforcement Learning_By David Silver Notes 4: Model Free Prediction (2017-12-11)
The dynamic programming above solves MDPs whose model is known; here the problem is estimating the value function when the model/environment is unknown. The main methods: Monte-Carlo (MC) methods, which need no transition or reward matrices and are effective in non-Markov environments; and temporal-difference (TD) methods. Monte-Carlo learning learns directly from episodes of experience, with no need for the MDP's transitions or rewards. Key idea: value = mean return…
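The "value = mean return" idea can be sketched as first-visit Monte-Carlo prediction; the toy episode generator below is an illustrative assumption, not an environment from the notes:

```python
import random

def mc_prediction(sample_episode, gamma=1.0, n_episodes=2000, seed=0):
    """First-visit MC: estimate V(s) as the mean of first-visit returns."""
    random.seed(seed)
    returns = {}                            # state -> list of first-visit returns
    for _ in range(n_episodes):
        episode = sample_episode()          # list of (state, reward) pairs
        G, visited = 0.0, []
        # Walk backwards so G accumulates the discounted return from each step.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            visited.append((state, G))
        seen = set()
        for state, G in reversed(visited):  # forward order -> first visits only
            if state not in seen:
                seen.add(state)
                returns.setdefault(state, []).append(G)
    return {s: sum(g) / len(g) for s, g in returns.items()}

# Toy episodic chain: from state 0, either terminate with reward 1, or move to
# state 1 with reward 0; state 1 always terminates with reward 2.
def sample_episode():
    if random.random() < 0.5:
        return [(0, 1.0)]
    return [(0, 0.0), (1, 2.0)]

V = mc_prediction(sample_episode)
```

Here V(1) is exactly 2.0 and V(0) converges to about 1.5, the mean of the two possible returns, which is the "mean return" estimate the notes describe.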
Reinforcement Learning_By David Silver Notes 5: Model Free Control (2017-12-11)
(Optimise the value function of an unknown MDP.) On-policy learning: learn about policy π from experience sampled from π. Off-policy learning: learn about policy π from experience sampled from μ. On-Po…
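The on-policy vs. off-policy distinction shows up directly in the TD-control update targets: SARSA (on-policy) bootstraps from the action the behaviour policy actually takes next, while Q-learning (off-policy) bootstraps from the greedy action. A minimal sketch with illustrative hyperparameters:

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, eps=0.1):
    """Behaviour policy: mostly greedy on Q, random with probability eps."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    # On-policy: the target uses a2, the action actually chosen next.
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    # Off-policy: the target maxes over actions, regardless of behaviour.
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
```

With an ε-greedy behaviour policy, the two updates differ only when exploration picks a non-greedy next action, which is exactly the π-vs-μ gap described above.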