【Original paper】: q-Learning in Continuous Time
【Authors】: Yanwei Jia, Xun Yu Zhou
Available at:
22-0755.pdf (jmlr.org): https://www.jmlr.org/papers/volume24/22-0755/22-0755.pdf
Blogger's keywords: continuous-time reinforcement learning, policy improvement, q-function, martingale, on-policy and off-policy
Abstract:
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced b