[Paper Notes (4)] ACM 2021: A Survey on Stream-Based Recommender Systems

Ⅰ Paper Information

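"A Survey on Stream-Based Recommender Systems" is a survey on recommender systems for streaming data, published in ACM Computing Surveys in May 2021.
The authors describe it as the first survey on SBRS (Stream-Based Recommender Systems), which makes it a fairly important reference for research in this area.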

Ⅱ Paper Structure

In Section 2, we present the characteristics of SBRS and a corresponding algorithmic description, and in Section 3, we present the relations of SBRS with other families of RS and other areas. In Section 4, we present the classification of existing work that we adopt in this survey. Section 5 reviews existing approaches for SBRS, and Section 6 discusses methodologies for evaluating SBRS. Finally, Section 7 concludes the article and gives directions for future research.

1 INTRODUCTION

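In the introduction, the authors outline how RS have developed and point out two limitations of RS that operate in batch mode:

  1. They cannot promptly take recent observations into account, and so miss the latest changes in user behavior.
  2. As the dataset keeps growing, periodic retraining becomes computationally very expensive, and it also causes scalability issues, storage included.
    Some approaches try to address these two problems, such as time-aware RS and distributed models, but only SBRS handle both limitations at the same time.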

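SBRS are required to process continuous data streams and keep the recommendation model up to date, usually by relying on incremental learning.

Relation to existing surveys:
Only two are related to this one. [140] considers combining CF with the time dimension; [25] briefly presents the topic of SBRS and mentions the corresponding evaluation methods.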

2 CHARACTERISTICS OF SBRS

2.1 - general information about RS, as a contrast with the settings of SBRS
2.2 - characteristics of SBRS
2.3 - algorithmic description

2.1 General Information About RS

  1. Definition of RS
    The recommender algorithm predicts the utility of each item, i.e., the rating or the relevance score used to rank items, for a target user. Items chosen then for recommendation are those maximizing the predicted utility.
  2. RS have been developed under the batch setting, which leaves a gap with some real-world environments.
  3. SBRS are necessary in dynamically changing domains such as social media and news.

2.2 Online Adaptive Learning in the Streaming Setting

  1. How the stream setting differs from earlier settings
  • Observations arrive online.
  • The system does not control the order nor the rate in which observations arrive.
  • Data streams are possibly unbounded.
  • Once an observation from the stream is processed, it is either discarded or archived in a memory which size is limited and relatively small in comparison with the size of the stream. Only observations stored in the memory can be retrieved when needed.
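  2. Requirements for SBRS
  • First, the data is seen in a single pass. The main idea is that the model learns incrementally, processing one arriving observation at a time.
  • Second, an SBRS must process data faster than it arrives and be able to produce recommendations in real time. SBRS can borrow methods from distributed data stream processing.
  • Third, an SBRS must also evolve over time, because the underlying concepts in the data change over time. Models that can handle and adapt to such changes perform online adaptive learning.
  3. The three steps of online adaptive learning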
  • Predict: Recommend N items to the user u that is observed.
  • Evaluate: After observing the true interaction of the user u, evaluate the quality of the list of items that was recommended.
  • Update: The true interaction may be used to update the recommendation model.
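
The predict / evaluate / update loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the survey's algorithm: the `PopularityRecommender` class, the `run_stream` helper, and the toy stream are hypothetical names and data introduced here, and a popularity counter stands in for a real incremental model. The bounded `deque` plays the role of the small memory mentioned in the stream-setting properties.

```python
from collections import Counter, deque


class PopularityRecommender:
    """Toy incremental model (hypothetical): recommends the most frequent
    items among the observations currently held in a bounded memory."""

    def __init__(self, memory_size=1000):
        # Bounded memory: only the most recent observations are kept.
        self.memory = deque(maxlen=memory_size)
        self.counts = Counter()

    def recommend(self, user, n):
        # Popularity ignores the user; a real model would personalize.
        return [item for item, _ in self.counts.most_common(n)]

    def update(self, user, item):
        if len(self.memory) == self.memory.maxlen:
            # The oldest observation is about to be evicted: forget it.
            _, old_item = self.memory[0]
            self.counts[old_item] -= 1
            if self.counts[old_item] == 0:
                del self.counts[old_item]
        self.memory.append((user, item))
        self.counts[item] += 1


def run_stream(stream, model, n=2):
    """Single pass over (user, item) pairs: predict, evaluate, update."""
    hits = []
    for user, item in stream:
        recommended = model.recommend(user, n)          # Predict
        hits.append(1 if item in recommended else 0)    # Evaluate
        model.update(user, item)                        # Update
    return hits


# Toy stream of (user, item) interactions in arrival order.
stream = [("u1", "a"), ("u2", "a"), ("u1", "b"), ("u3", "a"), ("u2", "b")]
hits = run_stream(stream, PopularityRecommender(memory_size=3))
```

Each observation is read once, scored against the model as it stood before that observation, and only then used for learning, which is what makes the single-pass constraint compatible with continuous evaluation.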

2.3 Algorithmic Description of SBRS


3 RELATIONS WITH OTHER AREAS

This section describes how SBRS relate to other research areas.
3.1 - time dimension in RS used to model the evolution of concepts
3.2 - concept drifts in data stream mining

3.1 Time Dimension in RS

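Differences between TARS (time-aware RS) and SBRS:
→ TARS cannot keep the model dynamically updated in real time.
→ TARS assume that the whole dataset is available at training time and that multiple passes over it are possible, which does not match the SBRS setting.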

3.2 Data Stream Mining and Online Adaptive Learning

Compared to data stream mining problems that focus mainly on modeling one concept, SBRS track several entities evolving over time, i.e., users and items.

4 CLASSIFICATION OF EXISTING WORK ON SBRS

The SBRS framework adopted in the survey is borrowed, with modifications, from the online adaptive learning algorithm framework proposed in the data stream survey [59].
The framework we adopt is constituted of four modules, i.e., components [13]: the memory module that selects the data that will be fed to the model, the learning module that defines how we build the recommendation model, the change detection module that is responsible for detecting concept drifts, and the retrieval module that handles the retrieval of recommendations for individual users.

5 APPROACHES FOR SBRS

5.1 Memory Module

5.2 Learning Module

The notes for these two subsections were taken on an iPad and attached to the original post as images (six images, not reproduced here).

5.3 Change Detection Module

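The change detection module is responsible for the active detection of drifts.

In fact, continuously updating the model over time is also a way to adapt to changes passively. But the reaction to changes is then too slow: the model is updated at a fixed pace, whereas changes occur at random times.

Therefore, an active approach is used to detect drifts explicitly and trigger the learning mechanism to update the model.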

Drift detection is handled by the change detection module, and once a drift is reported, several strategies with regard to learning and adaptation can be performed.
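
To make "active detection" concrete, here is a minimal sketch of one classical change detector from the data stream mining literature, the Page-Hinkley test, applied to a stream of per-recommendation errors (1 for a miss, 0 for a hit). It is a swapped-in illustration: these notes do not say which detectors the surveyed papers use, and the `PageHinkley` class, its `delta` and `threshold` defaults, and the synthetic `errors` stream are assumptions made here.

```python
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a stream of values."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # magnitude of change that is tolerated
        self.threshold = threshold  # alarm threshold (often written lambda)
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0        # running mean of the values seen so far
        self.cumulative = 0.0  # m_t: cumulative deviation from the mean
        self.minimum = 0.0     # M_t: smallest m_t seen so far

    def add(self, x):
        """Feed one value; return True if a drift is signaled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cumulative += x - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return self.cumulative - self.minimum > self.threshold


# Synthetic error stream: the model is always right, then always wrong.
errors = [0] * 200 + [1] * 50
detector = PageHinkley()
first_alarm = None
for t, e in enumerate(errors):
    if detector.add(e):
        first_alarm = t  # here an SBRS would trigger its adaptation strategy
        break
```

In this toy run the alarm fires a few observations after the change at position 200. What happens next (retraining, resetting part of the model, reweighting the memory) is the adaptation strategy, which is separate from detection.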

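Papers:

  • [14]: introduces dynamic local models to adapt to drifts
    • ⭐ Process:
      • Initially, users are divided into clusters; a neighborhood-based local model is learned for each cluster and used to generate recommendations
    • IDEA: automatically detect drifts in a user's preferences that make the user behave like users from another cluster, and update the corresponding models
  • [15]: hybrid RS & topic modelling & news rec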

5.4 Retrieval Module

The retrieval module is responsible for generating recommendations for a user, which usually includes computing the relevance score of each item, ordering them, and selecting the N items scoring the highest.

The most common formulation of efficient recommendation retrieval is Maximum Inner Product Search (MIPS). Given a query vector $p$ for a user and a set of $m$ items $I=\{q_i\in\mathbb{R}^K : i=1,\dots,m\}$, the goal of MIPS is to identify the set of items having the largest values of the inner product with $p$.
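
As a baseline, MIPS can be solved exactly by scoring every item, which is what the faster methods try to avoid. The sketch below uses only the Python standard library; `inner`, `mips_top_n`, and the toy vectors are hypothetical names and data chosen for illustration.

```python
import heapq


def inner(p, q):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(p, q))


def mips_top_n(p, items, n):
    """Indices of the n item vectors with the largest inner product with p.

    Brute force: one inner product per item, so the cost grows linearly
    with the number of items m and the dimension K.
    """
    return heapq.nlargest(n, range(len(items)), key=lambda i: inner(p, items[i]))


# Toy example with K = 2 latent dimensions and m = 4 items.
items = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
p = [1.0, 0.5]
top2 = mips_top_n(p, items, 2)  # scores are 1.0, 0.5, 3.0, 0.5
```

Note that the item with the largest inner product is not necessarily the one closest to the query in angle or distance, since long vectors are favored. This is why MIPS is treated as its own problem rather than as plain nearest neighbor search.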

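Approaches to solving MIPS:

  1. Hashing methods
  • Hashing reduces the amount of computation, at the cost of approximate results
  • There is a trade-off between computation speed and recommendation quality
  • Main idea: partition the space into several parts; elements (users and items) from the same part get the same hashing code
  • Given a specific user and its corresponding bucket, we only consider the items from the same bucket as candidates for recommendation.
  • Papers
    • [110], [122]
      • Locality Sensitive Hashing (LSH)
      • Well suited to the nearest neighbor search problem
    • [110]
      • symmetric LSH
  2. Tree-based methods
  • metric trees etc.
  3. Sequential scan
  • An efficient solution is the LEMP framework introduced in [130], [131]
    • Idea: use the Cauchy-Schwarz inequality to filter out item vectors that cannot qualify as candidates
    • Items are grouped into buckets of similar length, and a smaller cosine similarity search problem is solved within each bucket
    • Partial inner products computed over a limited number of dimensions also provide upper bounds used to prune the search space incrementally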
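
The Cauchy-Schwarz filtering idea can be illustrated with a much simpler scan than LEMP itself. Since the inner product of p and q is at most the product of their lengths, items sorted by decreasing length can be scanned until that bound can no longer beat the current n-th best score. The sketch below shows only this length-based pruning, without LEMP's bucketing or partial inner products; `build_index`, `pruned_top_n`, and the toy data are hypothetical.

```python
import heapq
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def inner(p, q):
    return sum(a * b for a, b in zip(p, q))


def build_index(items):
    """(length, item id) pairs sorted by decreasing length; built once."""
    return sorted(((norm(q), i) for i, q in enumerate(items)), reverse=True)


def pruned_top_n(p, items, index, n):
    """Top-n by inner product, skipping items that cannot qualify."""
    p_norm = norm(p)
    heap = []      # min-heap of (score, item id) holding the current top-n
    scanned = 0    # number of inner products actually computed
    for q_norm, i in index:
        # Cauchy-Schwarz: p.q <= |p| * |q|. Items are sorted by length, so
        # if this bound cannot beat the n-th best score, no later item can.
        if len(heap) == n and p_norm * q_norm <= heap[0][0]:
            break
        scanned += 1
        score = inner(p, items[i])
        if len(heap) < n:
            heapq.heappush(heap, (score, i))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, i))
    top = [i for _, i in sorted(heap, reverse=True)]
    return top, scanned


items = [[3.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.1, 0.1], [0.2, 0.0]]
index = build_index(items)
top, scanned = pruned_top_n([1.0, 0.0], items, index, n=2)
```

On this toy input the scan stops after three of the five items, because the two shortest vectors cannot reach the current second-best score. The result matches a full scan up to ties.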

6 EVALUATION OF SBRS

SBRS introduces a setting where the chronological order of observations should be considered and where the models continuously evolve, leading to a different framework for evaluation.

Online evaluation provides the strongest evidence regarding the value of an RS.

6.1 Offline Evaluation of RS

Offline experiments cannot capture the effect of the RS on user behavior.
The time dimension is ignored, and the original order of the data is shuffled.

6.2 Offline Evaluation of SBRS

To address the problems of batch offline evaluation, most approaches rely on the prequential methodology.

6.2.1 Evaluation Methodologies

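Prequential evaluation applies a test-then-learn procedure to every observation received.

The evaluation of an SBRS does not take place entirely in the streaming setting either: it needs an initial training phase.

[100] proposes splitting the dataset as follows:
(1) The Batch Train part: the first 30% of the dataset
(2) The Batch-Test–Stream Train part: the next 20% of the dataset, which can be seen as a validation set
(3) The Stream Test and Train part: the last 50% of the dataset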
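
A minimal sketch of this three-way split, assuming the interactions are already sorted chronologically. The `split_stream` helper and its percentage parameters are hypothetical names; only the 30/20/50 proportions come from the notes above.

```python
def split_stream(interactions, batch_train_pct=30, batch_test_pct=20):
    """Split a chronologically ordered list into three consecutive parts.

    Returns (batch_train, batch_test_stream_train, stream_test_and_train).
    Integer arithmetic avoids floating-point rounding at the boundaries.
    """
    n = len(interactions)
    a = n * batch_train_pct // 100
    b = a + n * batch_test_pct // 100
    return interactions[:a], interactions[a:b], interactions[b:]


# Ten interactions, represented here only by their arrival order.
events = list(range(10))
batch_train, batch_test_stream_train, stream_part = split_stream(events)
```

The split is by position, never random: shuffling would destroy the chronological order that the streaming evaluation depends on.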

6.2.2 Evaluation Metrics

The main characteristic in the streaming setting is that we are evaluating against a single item: the number of relevant items for the user is always one item, given that we evaluate the recommendation quality every time we receive a new interaction. This is in contrast to batch RS where we are evaluating once for every user against the whole test set: the number of relevant items could thus be greater than 1, given that the test set could contain several interactions per user.

The survey introduces several metrics: precision, recall, DCG, and MRR.
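
With exactly one relevant item per evaluation step, these metrics take a simple form. The sketch below uses the common textbook definitions, which may differ in detail from the formulas in the survey; the function names and the example list are hypothetical.

```python
import math


def rank_of(recommended, target):
    """1-based position of target in the list, or None if it is absent."""
    return recommended.index(target) + 1 if target in recommended else None


def recall_at_n(recommended, target):
    # One relevant item, so recall is 1 on a hit and 0 otherwise.
    return 1.0 if target in recommended else 0.0


def precision_at_n(recommended, target):
    # At most one of the N recommended items can be relevant.
    if not recommended:
        return 0.0
    return recall_at_n(recommended, target) / len(recommended)


def reciprocal_rank(recommended, target):
    # Averaging this value over the stream gives MRR.
    rank = rank_of(recommended, target)
    return 1.0 / rank if rank else 0.0


def dcg_at_n(recommended, target):
    # Single relevant item with gain 1, discounted by log2(rank + 1).
    rank = rank_of(recommended, target)
    return 1.0 / math.log2(rank + 1) if rank else 0.0


recommended = ["a", "b", "c", "d"]   # list of N = 4 recommended items
target = "c"                         # the item the user actually interacted with
scores = {
    "recall": recall_at_n(recommended, target),
    "precision": precision_at_n(recommended, target),
    "rr": reciprocal_rank(recommended, target),
    "dcg": dcg_at_n(recommended, target),
}
```

Here the target sits at rank 3 of 4, so recall@4 is 1, precision@4 is 0.25, the reciprocal rank is 1/3, and DCG@4 is 1/log2(4) = 0.5. In a prequential run these per-step values are averaged over the stream or over a sliding window.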
