Original: Chapter 7. n-step Bootstrapping
Contents: 7.1 n-step TD Prediction; 7.2 n-step Sarsa; 7.3 n-step Off-policy Learning by Importance Sampling. 7.1 n-step TD Prediction. Input: a policy $\pi$. Algorithm parameters: step size $\alpha \in (0,1]$, a positive integer $n$. For $s \in S$ ...
2019-09-05 17:00:32 212
Original: Chapter 6. Temporal-Difference Learning
Contents: 6.1 TD Prediction. 6.1 TD Prediction
2019-09-05 14:25:50 109
Original: Chapter 5. Monte Carlo Methods
Contents: 5.1 Monte Carlo Prediction; 5.2 Monte Carlo Estimation of Action Values; 5.3 Monte Carlo Control; 5.4 Monte Carlo Control without Exploring Starts; 5.5 Off-policy Prediction via Importance Sampling; 5.6 Inc...
2019-09-04 23:56:08 189
Original: Chapter 4. Dynamic Programming
Reinforcement Learning. Chapter 4. Dynamic Programming. $$\begin{aligned} v_*(s) &= \max_a \mathbb{E}[R_{t+1} + \gamma v_*(S_{t+1}) \mid S_t = s, A_t = a] \\ &= \max_a \sum_{s', r} p(s', r \mid s, a)\,[r + \gamma v_*(s')] \end{aligned} \tag{4.1}$$ ...
2019-09-03 11:59:24 195