[Hung-yi Lee Deep Reinforcement Learning Notes] 2. Proximal Policy Optimization (PPO) qqqeeevvv 2020-01-15 15:00:01 Categories: Theory, Reinforcement Learning </div> </div> <div class="up-time"><span>Last published: 2020-01-15 15:00:01</span><span>First published: 2020-01-15 15:00:01</span></div> <div class="slide-content-box"> <div class="all-tags-box"> </div> <div class="article-copyright"> <div class="creativecommons"> <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"></a> </div> <div class="creativecommons"> Copyright notice: This is an original article by the blog author, released under the<a href="http://creativecommons.org/licenses/by-sa/4.0/" target="_blank" rel="noopener"> CC 4.0 BY-SA </a>license. When reposting, please include a link to the original source and this notice. </div> <div class="article-source-link"> Article link: <a href="https://blog.csdn.net/ACL_lihan/article/details/103989581">https://blog.csdn.net/ACL_lihan/article/details/103989581</a> </div> </div> </div> <div class="operating"> <a class="href-article-edit slide-toggle">Copyright</a> </div> </div> </div> </div> <!--python安装手册结束--> <article class="baidu_pl"> <div id="article_content" class="article_content clearfix"> <link rel="stylesheet" href="https://csdnimg.cn/release/phoenix/template/css/ck_htmledit_views-211130ba7a.css"> <div class="htmledit_views" id="content_views"> <p><a href="https://blog.csdn.net/ACL_lihan/article/details/104020259">[Hung-yi Lee Deep Reinforcement Learning Notes] 1. Policy Gradient Methods</a></p> <p>[Hung-yi Lee Deep Reinforcement Learning Notes] 2. Proximal Policy Optimization (PPO) (this article)</p> <p>[Hung-yi Lee Deep Reinforcement Learning Notes] 3. Q-learning (Basic Idea)</p> <p>[Hung-yi Lee Deep Reinforcement Learning Notes] 4. Advanced Q-learning Algorithms</p>