[Hung-yi Lee Deep Reinforcement Learning Notes] 1. Policy Gradient qqqeeevvv 2020-01-17 17:55:57 Category: Reinforcement Learning # Theory </div> </div> <div class="up-time" style="left: 120.906px; display: none;"><span>Last published: 2020-01-17 17:55:57</span><span>First published: 2020-01-17 17:55:57</span></div> <div class="slide-content-box"> <div class="all-tags-box"> </div> <div class="article-copyright"> <div class="creativecommons"> <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"></a> </div> <div class="creativecommons"> Copyright notice: This is an original article by the blogger, released under the <a href="http://creativecommons.org/licenses/by-sa/4.0/" target="_blank" rel="noopener"> CC 4.0 BY-SA </a> license. Please include a link to the original source and this notice when reposting. </div> <div class="article-source-link"> Article link: <a href="https://blog.csdn.net/ACL_lihan/article/details/104020259">https://blog.csdn.net/ACL_lihan/article/details/104020259</a> </div> </div> </div> <div class="operating"> <a class="href-article-edit slide-toggle">Copyright</a> </div> </div> </div> </div> <!--python安装手册结束--> <article class="baidu_pl"> <div id="article_content" class="article_content clearfix"> <link rel="stylesheet" href="https://csdnimg.cn/release/phoenix/template/css/ck_htmledit_views-211130ba7a.css"> <div class="htmledit_views" id="content_views"> <p> </p> [Hung-yi Lee Deep Reinforcement Learning Notes] 1. Policy Gradient (this article) [Hung-yi Lee Deep Reinforcement Learning Notes] 2. Proximal Policy Optimization (PPO) Algorithm [Hung-yi Lee Deep Reinforcement Learning Notes] 3. Q-learning (Basic Idea)