The Role of Each Reference in the NFQ Paper

Latest recommended article published on 2024-04-24 19:47:22

Maggie_zhangxin


Categories: Reinforcement Learning Resources, Papers. Tags: reinforcement learning, NFQ, papers

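Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/Maggie_zhangxin/article/details/77163816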


[BM95] Boyan and Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems 7. Morgan Kaufmann, 1995.

Describes the problems that arise when a multilayer perceptron is used to represent the value function.

[EPG05] D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503–556, 2005.

NFQ is a special realisation of the 'Fitted Q Iteration' approach described in this paper.
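To make the idea of Fitted Q Iteration concrete, here is a minimal sketch rather than the procedure from either paper. It assumes a tiny discrete problem, one-hot features, and an ordinary least-squares fit standing in for the regressor ([EPG05] uses tree-based regressors and NFQ uses a multilayer perceptron). The names `features`, `fitted_q_iteration`, and the toy `transitions` are made up for illustration.

```python
import numpy as np


def features(state, action, n_states, n_actions):
    """One-hot encoding of a (state, action) pair (an illustrative choice)."""
    x = np.zeros(n_states * n_actions)
    x[state * n_actions + action] = 1.0
    return x


def fitted_q_iteration(transitions, n_states, n_actions, gamma=0.9, n_iters=50):
    """Batch-mode fitted Q iteration over a fixed set of transitions.

    transitions: list of (s, a, r, s_next, done) tuples collected beforehand.
    Returns weights w such that Q(s, a) is approximated by features(s, a) @ w.
    """
    X = np.array([features(s, a, n_states, n_actions)
                  for s, a, _, _, _ in transitions])
    w = np.zeros(n_states * n_actions)  # start from Q_0 = 0
    for _ in range(n_iters):
        targets = []
        for s, a, r, s_next, done in transitions:
            if done:
                targets.append(r)  # no bootstrapping from a terminal state
            else:
                q_next = max(features(s_next, b, n_states, n_actions) @ w
                             for b in range(n_actions))
                targets.append(r + gamma * q_next)
        # Supervised step: fit the next Q estimate to the whole batch of targets.
        # Least squares is an assumption here; any regressor could be swapped in.
        w, *_ = np.linalg.lstsq(X, np.array(targets), rcond=None)
    return w


# Toy chain: states 0 -> 1 -> 2 (terminal); action 0 stays, action 1 moves right.
transitions = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
]
w = fitted_q_iteration(transitions, n_states=3, n_actions=2)
print(w.reshape(3, 2))  # rows are states, columns are actions
```

The point of the sketch is the structure: the transition set is fixed, and every iteration rebuilds the regression targets from the previous Q estimate and refits on the whole batch.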

[Gor95] G. J. Gordon. Stable function approximation in dynamic programming. In A. Prieditis and S. Russell, editors, Proceedings of the ICML, San Francisco, CA, 1995.

Introduces the fitted value iteration algorithm, on which NFQ is based.
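A rough sketch of fitted value iteration, under assumptions that are not in this note: a small MDP with a known model, Bellman backups computed only at a set of sample states, and an "averager" matrix `A` (non-negative rows summing to 1) that spreads the backed-up values to all states. The function name, the array layout, and the toy chain are all hypothetical.

```python
import numpy as np


def fitted_value_iteration(P, R, A, sample_states, gamma=0.9, n_iters=200):
    """Alternate a Bellman backup at sample states with a fitting step.

    P: (n_actions, n_states, n_states) transition probabilities.
    R: (n_actions, n_states) expected immediate rewards.
    A: (n_states, n_samples) averager matrix; each row is non-negative
       and sums to 1 (an assumed form of function approximator).
    """
    V = np.zeros(P.shape[1])
    for _ in range(n_iters):
        # Backup: best one-step lookahead value at each sample state.
        q = R[:, sample_states] + gamma * (P[:, sample_states, :] @ V)
        backed_up = q.max(axis=0)
        # Fit: the approximator maps sample values to values at all states.
        V = A @ backed_up
    return V


# Toy chain with 4 states; state 3 is absorbing. Action 0 stays, action 1 moves right.
n_states = 4
P = np.zeros((2, n_states, n_states))
P[0] = np.eye(n_states)
for s in range(n_states):
    P[1, s, min(s + 1, n_states - 1)] = 1.0
R = np.zeros((2, n_states))
R[1, 2] = 1.0  # reward for stepping from state 2 into state 3

# With every state sampled and A set to the identity, this is plain value iteration.
V = fitted_value_iteration(P, R, np.eye(n_states), np.arange(n_states))
print(V)
```

Replacing the identity with a coarser averager, for example one that interpolates between a few sample states, gives an approximate value function while keeping each iteration a backup followed by a fit.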

[Lin92] L.-J. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8:293–321, 1992.

A successful example of using a multilayer perceptron to represent the value function;

introduces the 'experience replay' technique.
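The core of experience replay is to store past transitions and reuse them for later updates. The sketch below is one common minimal form, a fixed-capacity buffer with uniform random sampling. It is not necessarily the exact procedure in [Lin92], and the `ReplayBuffer` class and its method names are illustrative only.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.add(i, 0, 0.0, i + 1, False)
batch = buf.sample(2)  # two stored transitions, chosen at random
```

Each sampled batch can then be used for a learning update, so a single interaction with the environment contributes to many updates instead of one.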

[LP03] M. Lagoudakis and R. Parr. Le
