RL
lh15123as
Chapter 4. Dynamic Programming
Reinforcement Learning, Chapter 4. Dynamic Programming. Equation (4.1):
$$\begin{aligned} v_*(s) &= \max_a \mathbb{E}[R_{t+1} + \gamma v_*(S_{t+1}) \mid S_t = s, A_t = a] \\ &= \max_a \sum_{s', r} p(s', r \mid s, a)\,[r + \gamma v_*(s')] \end{aligned} \tag{4.1}$$ ...
Original · 2019-09-03 11:59:24 · 207 views · 0 comments
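Equation (4.1) can be used directly as an update rule. Repeatedly replacing each $v(s)$ with the right-hand side is value iteration. The sketch below is a minimal illustration, not code from the post. The dictionary encoding of $p(s', r \mid s, a)$, the names `value_iteration` and `EXAMPLE_MDP`, and the toy three-state MDP are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed tabular encoding of p(s', r | s, a):
# dynamics[s][a] = [(probability, next_state, reward), ...]; {} marks a terminal state.
Dynamics = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def value_iteration(dynamics: Dynamics, gamma: float = 0.9,
                    theta: float = 1e-10) -> Dict[str, float]:
    """Sweep the Bellman optimality backup (4.1) until the largest change is below theta."""
    v = {s: 0.0 for s in dynamics}
    while True:
        delta = 0.0
        for s, actions in dynamics.items():
            if not actions:  # terminal state: its value stays 0
                continue
            # max_a sum_{s', r} p(s', r | s, a) * [r + gamma * v(s')]
            best = max(
                sum(p * (r + gamma * v[s_next]) for p, s_next, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best  # in-place update: later states in this sweep see the new value
        if delta < theta:
            return v


# Hypothetical three-state example: from A, "go" pays 1 and leads to B;
# from B, "finish" pays 10 and ends the episode in terminal state T.
EXAMPLE_MDP: Dynamics = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"back": [(1.0, "A", 0.0)], "finish": [(1.0, "T", 10.0)]},
    "T": {},
}

if __name__ == "__main__":
    print(value_iteration(EXAMPLE_MDP, gamma=0.9))
```

With $\gamma = 0.9$ the sweeps settle at $v_*(A) = v_*(B) = 10$ and $v_*(T) = 0$, because $1 + 0.9 \cdot 10 = 10$ beats both staying in A and going back from B. Updating `v` in place is one choice; keeping a separate array for the new values also converges for $\gamma < 1$.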
Chapter 5. Monte Carlo Methods
Contents: 5.1 Monte Carlo Prediction, 5.2 Monte Carlo Estimation of Action Values, 5.3 Monte Carlo Control, 5.4 Monte Carlo Control without Exploring Starts, 5.5 Off-policy Prediction via Importance Sampling, 5.6 Inc...
Original · 2019-09-04 23:56:08 · 199 views · 0 comments
Chapter 6. Temporal-Difference Learning
Contents: 6.1 TD Prediction
Original · 2019-09-05 14:25:50 · 120 views · 0 comments
Chapter 7. n-step Bootstrapping
Contents: 7.1 n-step TD Prediction, 7.2 n-step Sarsa, 7.3 n-step Off-policy Learning by Importance Sampling
7.1 n-step TD Prediction. Input: a policy $\pi$. Algorithm parameters: step size $\alpha \in (0,1]$, a positive integer $n$. For $s \in \mathcal{S}$...
Original · 2019-09-05 17:00:32 · 223 views · 0 comments
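The n-step TD prediction algorithm, whose inputs are listed above, can be sketched as follows. It follows the usual pseudocode structure: store each $S_t$ and $R_t$, then update the state visited at $\tau = t - n + 1$. This is a hedged illustration, not the post's own code. The `step(state, rng)` environment interface, the zero initialisation of $V$, and the toy `chain_step` chain are assumptions.

```python
import math
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
# Assumed environment interface: step(state, rng) picks an action from pi(.|state)
# and returns (reward, next_state, done).
Step = Callable[[State, random.Random], Tuple[float, State, bool]]


def n_step_td(step: Step, start: State, n: int, alpha: float, gamma: float,
              episodes: int, seed: int = 0) -> Dict[State, float]:
    """Tabular n-step TD prediction of v_pi; V(s) starts at 0 for every state (an assumption)."""
    if n < 1 or not 0 < alpha <= 1:
        raise ValueError("need a positive integer n and alpha in (0, 1]")
    rng = random.Random(seed)
    V: Dict[State, float] = defaultdict(float)
    for _ in range(episodes):
        states: List[State] = [start]  # states[t] = S_t
        rewards: List[float] = [0.0]   # rewards[t] = R_t (index 0 unused)
        T = math.inf
        t = 0
        while True:
            if t < T:
                r, s_next, done = step(states[t], rng)
                rewards.append(r)
                states.append(s_next)
                if done:
                    T = t + 1
            tau = t - n + 1  # the time step whose estimate is updated now
            if tau >= 0:
                last = min(tau + n, T)
                # G = sum_{i=tau+1}^{min(tau+n, T)} gamma^(i-tau-1) R_i
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, last + 1))
                if tau + n < T:  # bootstrap from V(S_{tau+n})
                    G += gamma ** n * V[states[tau + n]]
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return dict(V)


def chain_step(state: int, rng: random.Random) -> Tuple[float, State, bool]:
    """Hypothetical deterministic chain 0 -> 1 -> 2 -> terminal, reward 1 on the last step."""
    if state == 2:
        return 1.0, "terminal", True
    return 0.0, state + 1, False


if __name__ == "__main__":
    print(n_step_td(chain_step, start=0, n=2, alpha=1.0, gamma=0.9, episodes=2))
```

On this chain with $n = 2$, $\alpha = 1$ and $\gamma = 0.9$, two episodes give $V(2) = 1$, $V(1) = 0.9$ and $V(0) = 0.81$. After the first episode $V(0)$ is still 0, because it bootstraps from $V(2)$ before state 2 has been updated.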