Paper Reading 7: Reinforcement-Learning-Based Recommender Systems. DRN: A Deep Reinforcement Learning Framework for News Recommendation

界限消除者

Tags: deep learning, recommender systems, reinforcement learning, data mining

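Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.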

Article link: https://blog.csdn.net/qq_37227782/article/details/112780579

ABSTRACT

In this paper, we propose a novel Deep Reinforcement Learning framework for news recommendation.

The authors propose a reinforcement learning (RL) method for news recommendation.

Online personalized news recommendation is a highly challenging problem due to the dynamic nature of news features and user preferences. Although some online recommendation models have been proposed to address the dynamic nature of news recommendation, these methods have three major issues.

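News recommendation is highly challenging because both news features and user preferences change rapidly. Existing recommendation methods have the following shortcomings.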

First, they only try to model current reward (e.g., Click Through Rate).

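1. They only model the current (immediate) reward. This sets up the RL approach introduced later, since RL is suited to optimizing long-term reward.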
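To make the difference between current and long-term reward concrete, here is a minimal sketch of a discounted return. The helper name `discounted_return`, the discount factor `gamma`, and both reward sequences are illustrative assumptions, not values or code from the paper.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of per-step rewards weighted by gamma**t (hypothetical helper)."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Two made-up recommendation strategies; each entry is a per-step reward
# (1 = the user clicked, 0 = no click).
greedy = [1, 0, 0, 0]   # immediate click, then the user loses interest
patient = [0, 1, 1, 1]  # no immediate click, but sustained engagement

# A model that only looks at the current reward compares the first step.
immediate_greedy = greedy[0]
immediate_patient = patient[0]

# An RL-style objective compares the discounted sum over the whole sequence.
long_term_greedy = discounted_return(greedy)
long_term_patient = discounted_return(patient)

print(immediate_greedy, immediate_patient)
print(long_term_greedy, long_term_patient)
```

Under these assumed numbers, the first strategy looks better when only the current reward is considered, while the second wins once future rewards are counted.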

Second, very few studies consider to use user feedback other than click / no click labels (e.g., how frequent user returns) to help improve recommendation.

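2. User feedback is rarely considered, and where it is, it goes no further than click / no click labels. (This feedback is not rich enough; the paper later brings in how often the user returns as an additional signal.)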
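One way to picture feedback that goes beyond click / no click labels is a reward that also scores how soon the user comes back. The sketch below is only an illustration: the function `combined_reward`, the weight `beta`, the cutoff `max_days`, and the linear scoring rule are all assumptions, and this excerpt does not give the paper's actual formulation.

```python
def combined_reward(clicked, days_until_return, beta=0.5, max_days=7.0):
    """Click reward plus a weighted return-time reward (hypothetical)."""
    click_reward = 1.0 if clicked else 0.0

    # Assumed scoring rule: a faster return gives a score closer to 1,
    # a return after max_days or later gives 0, and a user who never
    # returned (None) also gets 0.
    if days_until_return is None:
        return_reward = 0.0
    else:
        return_reward = max(0.0, 1.0 - days_until_return / max_days)

    return click_reward + beta * return_reward


# A user who did not click but came back the next day still yields some reward.
print(combined_reward(clicked=False, days_until_return=1.0))
# A user who clicked but never returned yields only the click reward.
print(combined_reward(clicked=True, days_until_return=None))
```

With a click-only label, the two users above would be scored 0 and 1; adding the return signal lets the reward distinguish users who stay engaged from those who do not.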
