Deep RL Bootcamp Lecture 8 Derivative Free Methods

weixin_30564901

Original link: http://www.cnblogs.com/ecoflex/p/8979721.html


In derivative-free optimization (DFO), you do not try to exploit any problem structure.
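To make the "no problem structure" point concrete, here is a minimal sketch of black-box random search. It only ever calls the objective and never looks at gradients or dynamics. The objective `toy_return` is a hypothetical stand-in for an episode return, and `sigma`, `iters`, and `seed` are illustrative choices, not values from the lecture.

```python
import random


def random_search(f, theta, sigma=0.1, iters=200, seed=0):
    """Hill-climbing random search: evaluates f only, never its gradient."""
    rng = random.Random(seed)
    best = f(theta)
    for _ in range(iters):
        # Perturb every parameter with Gaussian noise.
        candidate = [t + sigma * rng.gauss(0.0, 1.0) for t in theta]
        score = f(candidate)
        # Keep the perturbation only if it improved the score.
        if score > best:
            theta, best = candidate, score
    return theta, best


def toy_return(theta):
    # Hypothetical objective (assumption): maximum of 0 at (1, -2).
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)


theta, best = random_search(toy_return, [0.0, 0.0])
```

The optimizer would work unchanged if `toy_return` were replaced by a function that runs a policy in a simulator and returns the total reward, which is what makes the approach derivative-free.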

Low-dimensional policy

30 degrees of freedom

120 parameters to tune

Keep the samples with good results, but weight them smoothly rather than with a hard cutoff.
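One possible reading of this note is a reward-weighted update: instead of keeping a hard set of elite samples, every sample contributes to the new mean with a weight that grows smoothly with its score. The sketch below assumes softmax weights and a fixed sampling width; `softmax_weights`, `weighted_update`, `toy_return`, and all default values are hypothetical and chosen for illustration only.

```python
import math
import random


def softmax_weights(scores, temperature=1.0):
    """Smooth, positive weights that sum to 1; better scores weigh more."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_update(f, mean, sigma=0.5, pop=50, iters=60, seed=0):
    """Repeatedly sample around the mean and move it to the weighted average."""
    rng = random.Random(seed)
    dim = len(mean)
    for _ in range(iters):
        samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                   for _ in range(pop)]
        scores = [f(s) for s in samples]
        w = softmax_weights(scores)
        # New mean is the score-weighted average of all samples.
        mean = [sum(w[i] * samples[i][d] for i in range(pop))
                for d in range(dim)]
    return mean


def toy_return(theta):
    # Hypothetical objective (assumption): maximum of 0 at (1, -2).
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)


final_mean = weighted_update(toy_return, [0.0, 0.0])
```

A lower `temperature` in `softmax_weights` makes the weighting closer to a hard elite cutoff, while a higher one makes it closer to a plain average.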

How can evolutionary methods work well in high-dimensional settings?

If you normalize the data well, evolutionary methods can work well in MuJoCo, even with random search.
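The note does not say which data is normalized. Assuming it means the observations fed to the policy, a common way to do this is to keep running statistics and standardize each observation before use. The class below is a hedged sketch of that idea using Welford's online algorithm; the name `RunningNormalizer` and its interface are hypothetical, not taken from the lecture or from MuJoCo.

```python
import math


class RunningNormalizer:
    """Tracks per-dimension mean and variance online (Welford's algorithm)."""

    def __init__(self, dim, eps=1e-8):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations
        self.eps = eps

    def update(self, obs):
        self.n += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        out = []
        for i, x in enumerate(obs):
            # Population variance; fall back to 1.0 until two samples are seen.
            var = self.m2[i] / self.n if self.n > 1 else 1.0
            out.append((x - self.mean[i]) / math.sqrt(var + self.eps))
        return out


norm = RunningNormalizer(dim=2)
for obs in ([1.0, 10.0], [3.0, 30.0]):
    norm.update(obs)
z = norm.normalize([3.0, 30.0])
```

After normalization both dimensions are on the same scale even though the raw values differ by a factor of ten, so a single perturbation width can be used for all policy inputs.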

They can, however, get stuck in local minima.

The humanoid has 200k parameters to tune, and it is learned by an evolutionary method.

The four videos actually show four different local minima; once the method gets stuck in one, it never gets out.

Evolutionary methods are roughly 10 times worse than action-space policy gradients.

Evolutionary methods are hard to tune, because until recently people had not gotten them to work with deep networks.
