
RL
lh15123as
Chapter 4. Dynamic Programming
Reinforcement Learning Chapter 4. Dynamic Programming. (4.1) $v_*(s) = \max_a \mathbb{E}[R_{t+1} + \gamma v_*(S_{t+1}) \mid S_t = s, A_t = a] = \max_a \sum_{s',r} p(s',r \mid s,a)\,[r + \gamma v_*(s')]$ ...
Original · 2019-09-03 11:59:24 · 223 views · 0 comments
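As a rough illustration of how the Bellman optimality equation (4.1) in the excerpt can be used, the sketch below applies its right-hand side repeatedly as an update (value iteration) until the values stop changing. The two-state MDP, the table `P`, the discount `GAMMA`, and the function names are all hypothetical and chosen only for this example; they do not come from the original post.

```python
# Hypothetical 2-state, 2-action MDP (illustration only, not from the post).
# P[s][a] is a list of (probability, next_state, reward) triples, i.e. p(s', r | s, a).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.5  # assumed discount factor


def bellman_optimality_backup(v, s):
    """Right-hand side of (4.1): max over a of sum_{s',r} p(s',r|s,a) [r + gamma v(s')]."""
    return max(
        sum(p * (r + GAMMA * v[s_next]) for p, s_next, r in P[s][a])
        for a in P[s]
    )


def value_iteration(tol=1e-10):
    """Sweep the backup over all states until the largest change is below tol."""
    v = {s: 0.0 for s in P}
    while True:
        new_v = {s: bellman_optimality_backup(v, s) for s in P}
        delta = max(abs(new_v[s] - v[s]) for s in P)
        v = new_v
        if delta < tol:
            return v


v_star = value_iteration()
print(v_star)
```

For this toy MDP the fixed point can be checked by hand: staying in state 1 gives v(1) = 2 + 0.5 v(1), so v(1) = 4, and moving from state 0 to state 1 gives v(0) = 1 + 0.5 · 4 = 3.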
Chapter 5. Monte Carlo Methods
Contents: 5.1 Monte Carlo Prediction; 5.2 Monte Carlo Estimation of Action Values; 5.3 Monte Carlo Control; 5.4 Monte Carlo Control without Exploring Starts; 5.5 Off-policy Prediction via Importance Sampling; 5.6 Inc...
Original · 2019-09-04 23:56:08 · 211 views · 0 comments
Chapter 6. Temporal-Difference Learning
Contents: 6.1 TD Prediction. 6.1 TD Prediction
Original · 2019-09-05 14:25:50 · 139 views · 0 comments
Chapter 7. n-step Bootstrapping
Contents: 7.1 n-step TD Prediction; 7.2 n-step Sarsa; 7.3 n-step Off-policy Learning by Importance Sampling. 7.1 n-step TD Prediction. Input: a policy $\pi$. Algorithm parameters: step size $\alpha \in (0,1]$, a positive integer $n$. For $s \in S$...
Original · 2019-09-05 17:00:32 · 231 views · 0 comments
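The excerpt above is cut off right after the inputs (a policy, a step size α, and a positive integer n), so the following is only a minimal sketch of n-step TD prediction in its usual textbook form, not a reconstruction of the post's own pseudocode. It assumes the episode has already been generated by following the policy and is passed in as lists; the function name `n_step_td_episode`, the argument layout, and the three-state chain are hypothetical.

```python
def n_step_td_episode(V, states, rewards, n, alpha, gamma):
    """Apply n-step TD updates for one recorded episode.

    states  = [S_0, ..., S_T], where S_T is terminal and V[S_T] stays 0
    rewards = [R_1, ..., R_T], so rewards[i - 1] is R_i
    """
    T = len(rewards)
    for tau in range(T):  # tau is the time step whose state value is updated
        end = min(tau + n, T)
        # Discounted sum of up to n real rewards following time tau.
        G = sum(gamma ** (i - tau - 1) * rewards[i - 1]
                for i in range(tau + 1, end + 1))
        # Bootstrap from the current estimate if the episode has not ended by tau + n.
        if tau + n < T:
            G += gamma ** n * V[states[tau + n]]
        V[states[tau]] += alpha * (G - V[states[tau]])
    return V


# Hypothetical deterministic chain A -> B -> C -> end with a reward of 1 on the last step.
states = ["A", "B", "C", "end"]
rewards = [0.0, 0.0, 1.0]
V = {s: 0.0 for s in states}

n_step_td_episode(V, states, rewards, n=2, alpha=1.0, gamma=1.0)
V_after_first = dict(V)
n_step_td_episode(V, states, rewards, n=2, alpha=1.0, gamma=1.0)
print(V_after_first, V)
```

With n = 2, the first episode only moves the estimates for B and C, because the two-step return from A bootstraps from the still-zero value of C; the second episode then propagates the reward back to A.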