Basic Concepts of Distributional Reinforcement Learning (Distributional RL)


a1424262219


Tags: artificial intelligence

Original article: http://www.cnblogs.com/wangxiaocvpr/p/8283718.html


1. Q-learning

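In Q-learning, we want to minimize the following loss, the squared temporal-difference (TD) error, where r is the immediate reward, s' the next state and γ the discount factor:

$$\mathbb{E}\Big[\big(r + \gamma \max_{a'} Q(s', a') - Q(s, a)\big)^2\Big] \qquad (1)$$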
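A minimal plain-Python sketch of this loss for a tabular Q-function, averaged over a small batch of transitions. The names `td_loss` and `Transition`, the `(s, a, r, s_next, done)` layout, and dropping the bootstrap term when an episode ends are illustrative assumptions, not part of the original text.

```python
from typing import List, Tuple

# Hypothetical transition layout: (state, action, reward, next_state, done),
# with states and actions as integer indices into a tabular Q[s][a].
Transition = Tuple[int, int, float, int, bool]


def td_loss(q: List[List[float]], batch: List[Transition], gamma: float = 0.99) -> float:
    """Mean squared TD error over a batch (a sketch, not a library API)."""
    total = 0.0
    for s, a, r, s_next, done in batch:
        # Bootstrap from the greedy next action; assume no bootstrap after a terminal step.
        bootstrap = 0.0 if done else max(q[s_next])
        target = r + gamma * bootstrap
        td_error = target - q[s][a]
        total += td_error ** 2
    return total / len(batch)


q = [[0.0, 1.0], [2.0, 0.5]]
batch = [(0, 1, 1.0, 1, False), (1, 0, 0.0, 0, True)]
print(td_loss(q, batch, gamma=0.9))
```

Only the scalar Q(s, a) enters this loss, so the regression target is a single number per state-action pair.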

The main idea of Distributional RL is to work directly with the full distribution of the return rather than with its expectation.

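Let the random variable Z(s, a) be the return obtained after taking action a in state s. Then Q(s, a) = E[Z(s, a)]. Instead of minimizing the error in Eq. (1), which is a distance between expectations, we can directly minimize a distance between the full distributions:

$$\sup_{s, a} \operatorname{dist}\big(R(s, a) + \gamma Z(S', A'),\ Z(s, a)\big)$$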
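To see why a distance between expectations can hide information, here is a small plain-Python sketch. The distance between distributions is not pinned down here; this example assumes the 1-Wasserstein distance (the metric analysed in the first reference paper) and uses hypothetical helpers `mean` and `wasserstein1` on discrete distributions stored as `{return value: probability}` dictionaries.

```python
from typing import Dict

Dist = Dict[float, float]  # return value -> probability (assumed representation)


def mean(dist: Dist) -> float:
    # Expectation of a discrete return distribution, i.e. Q = E[Z].
    return sum(z * p for z, p in dist.items())


def wasserstein1(p: Dist, q: Dist) -> float:
    """1-Wasserstein distance between two discrete distributions on the real line.

    Uses W1 = integral of |F_p(x) - F_q(x)| dx; both CDFs are constant
    between consecutive support points.
    """
    support = sorted(set(p) | set(q))
    cdf_p = cdf_q = 0.0
    total = 0.0
    for left, right in zip(support, support[1:]):
        # CDF values on the interval [left, right).
        cdf_p += p.get(left, 0.0)
        cdf_q += q.get(left, 0.0)
        total += abs(cdf_p - cdf_q) * (right - left)
    return total


z1 = {1.0: 1.0}            # deterministic return of 1
z2 = {0.0: 0.5, 2.0: 0.5}  # return 0 or 2 with equal probability
print(abs(mean(z1) - mean(z2)))  # 0.0, distance between expectations
print(wasserstein1(z1, z2))      # 1.0, distance between full distributions
```

Both distributions give Q = 1.0, so the expectation-based distance is zero, while the distributional distance still separates them.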

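Here R(s, a) is the random variable for the immediate reward, and sup denotes the supremum (the least upper bound of a function's values), taken over all state-action pairs. In addition:

$$S' \sim p(\cdot \mid s, a), \qquad A' = \arg\max_{a'} Q(S', a')$$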

Note that we still use Q(s, a), but now we optimize the distributions themselves rather than their expectations.
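Putting the pieces together, a plain-Python sketch of the distributional Bellman target R(s, a) + γZ(S', A') for small discrete distributions. It assumes the greedy next action A' = argmax_{a'} Q(S', a') with Q computed as E[Z], that the reward R is independent of S', and that no terminal states occur; `bellman_target`, `greedy_action` and `expectation` are hypothetical names for illustration.

```python
from collections import defaultdict
from typing import Dict, Tuple

Dist = Dict[float, float]  # return value -> probability (assumed representation)


def expectation(dist: Dist) -> float:
    # Q(s, a) = E[Z(s, a)]
    return sum(z * p for z, p in dist.items())


def greedy_action(z: Dict[Tuple[int, int], Dist], state: int, n_actions: int) -> int:
    # A' = argmax_a' Q(S', a'), with Q derived from the return distribution.
    return max(range(n_actions), key=lambda a: expectation(z[(state, a)]))


def bellman_target(reward: Dist, next_state_probs: Dict[int, float],
                   z: Dict[Tuple[int, int], Dist], n_actions: int, gamma: float) -> Dist:
    """Distribution of R(s, a) + gamma * Z(S', A'), assuming R is independent of S'."""
    target: Dist = defaultdict(float)
    for s_next, p_s in next_state_probs.items():
        a_next = greedy_action(z, s_next, n_actions)
        for r, p_r in reward.items():
            for g, p_g in z[(s_next, a_next)].items():
                # Each (reward, next-state, next-return) outcome adds its joint probability.
                target[r + gamma * g] += p_r * p_s * p_g
    return dict(target)


z = {(1, 0): {0.0: 0.5, 2.0: 0.5},  # Q(1, 0) = 1.0
     (1, 1): {1.5: 1.0}}            # Q(1, 1) = 1.5, so action 1 is greedy
reward = {0.0: 0.5, 1.0: 0.5}
target = bellman_target(reward, {1: 1.0}, z, n_actions=2, gamma=0.5)
print(target)               # {0.75: 0.5, 1.75: 0.5}
print(expectation(target))  # 1.25
```

The mean of the target, 1.25, equals E[R] + γQ(S', A') = 0.5 + 0.5 × 1.5, the ordinary Q-learning target; the distributional version keeps the whole spread of outcomes as well.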

2. Policy Evaluation

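References:

1. A Distributional Perspective on Reinforcement Learning (Bellemare, Dabney, Munos): https://arxiv.org/pdf/1707.06887.pdf

2. Distributional Reinforcement Learning with Quantile Regression (Dabney, Rowland, Bellemare, Munos): https://arxiv.org/pdf/1710.10044.pdf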

