Continuous control with deep reinforcement learning

https://arxiv.org/abs/1509.02971

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
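The core idea named in the abstract — an actor-critic method built on the deterministic policy gradient — can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the actor and critic below are linear (the paper uses deep networks with target networks and a replay buffer), and all names and shapes are made up for illustration. It shows only the actor update, which ascends `grad_a Q(s, mu(s)) * grad_theta mu(s)` averaged over states.

```python
import numpy as np

# Toy deterministic-policy-gradient actor update (illustrative only;
# linear actor/critic stand in for the paper's deep networks).
rng = np.random.default_rng(0)
state_dim, action_dim = 3, 2

W_actor = rng.normal(size=(action_dim, state_dim))  # policy mu(s) = W_actor @ s
w_q = rng.normal(size=(state_dim + action_dim,))    # critic Q(s, a) = w_q . [s; a]

def mu(s):
    return W_actor @ s

def grad_a_Q(s, a):
    # For this linear critic, dQ/da is just the action slice of w_q.
    return w_q[state_dim:]

def actor_update(states, lr=1e-2):
    # Deterministic policy gradient: average over states of
    # grad_a Q(s, mu(s)) @ grad_theta mu(s); for mu(s) = W s the
    # per-state gradient w.r.t. W is outer(dQ/da, s).
    global W_actor
    g = np.zeros_like(W_actor)
    for s in states:
        g += np.outer(grad_a_Q(s, mu(s)), s)
    W_actor += lr * g / len(states)  # gradient *ascent* on Q

states = rng.normal(size=(8, state_dim))
q_before = np.mean([w_q @ np.concatenate([s, mu(s)]) for s in states])
actor_update(states)
q_after = np.mean([w_q @ np.concatenate([s, mu(s)]) for s in states])
print(q_after > q_before)  # the step should raise the average critic value
```

Because the toy objective is linear in the actor weights, one ascent step along its gradient is guaranteed to increase the batch-averaged Q value; in the actual algorithm the same update direction is followed with nonlinear function approximators.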
Comments: 10 pages + supplementary
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1509.02971 [cs.LG]
  (or arXiv:1509.02971v5 [cs.LG] for this version)

Submission history

From: Jonathan Hunt
[v1] Wed, 9 Sep 2015 23:01:36 GMT (344kb,D)
[v2] Wed, 18 Nov 2015 17:34:41 GMT (338kb,D)
[v3] Thu, 7 Jan 2016 19:09:07 GMT (338kb,D)
[v4] Tue, 19 Jan 2016 20:30:47 GMT (339kb,D)
[v5] Mon, 29 Feb 2016 18:45:53 GMT (339kb,D)
