Recommender Systems Notes 1: Overview

Motivated by Dr. Wu, I briefly summarize the paper here for future reference.

This overview is based on my understanding of the paper: Zeynep Batmaz, Ali Yurekli, Alper Bilge, Cihan Kaleli, A review on deep learning for recommender systems: challenges and remedies, 2018.

Why do we need RS?
To solve the information overload problem.

Classification:

  • Collaborative filtering RS

    • Memory-based
      Use the entire user-item matrix to identify similar entities (users or items). Once the nearest neighbors are located, their past ratings are used to make recommendations.
      User-based: use the past preferences of the nearest neighbors of a target user a.
      Item-based: use the ratings of the items most similar to a target item q.

    • Model-based
      Aims to build a model offline by applying machine learning and data mining techniques. Once trained, such a model is used to estimate predictions for online CF tasks.

  • Content-based RS
    The main purpose is to recommend items that are similar to those a user liked in the past. For instance, if a user likes a website that contains keywords such as “stack”, “queue”, and “sorting”, a content-based recommender system would suggest pages related to data structures and algorithms.

  • Hybrid RS
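
To make the memory-based, user-based idea from the list above concrete, here is a minimal sketch. The toy ratings, the choice of cosine similarity, and the function names `cosine_similarity` and `predict_user_based` are my own assumptions for illustration; the paper does not prescribe them.

```python
import math

# Toy user-item ratings (hypothetical data); missing keys mean "not rated".
ratings = {
    "alice": {"i1": 5.0, "i2": 3.0, "i3": 4.0},
    "bob":   {"i1": 4.0, "i2": 3.0, "i3": 5.0, "i4": 4.0},
    "carol": {"i1": 1.0, "i2": 5.0, "i4": 2.0},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_user_based(ratings, user, item, k=2):
    """Similarity-weighted average of the k nearest neighbors' ratings."""
    # Only users who actually rated the item can act as neighbors.
    neighbors = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    total_sim = sum(sim for sim, _ in top)
    if total_sim == 0:
        return None  # no usable neighbor: a tiny cold-start case
    return sum(sim * rating for sim, rating in top) / total_sim

# alice has not rated i4; the estimate falls between bob's and carol's
# ratings and sits closer to bob's, because bob is the more similar user.
print(predict_user_based(ratings, "alice", "i4"))
```

An item-based variant would transpose the roles: compare item rating vectors instead of user rating vectors, then average the target user's own ratings of the most similar items.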

The main difference between collaborative filtering (CF) and content-based RS is the data they rely on: CF uses the history of user behavior, i.e. the ratings users gave to items, while content-based RS uses item or user attributes, i.e. the content itself.
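
By contrast, a content-based recommender needs no ratings from other users. The sketch below reuses the “stack”, “queue”, “sorting” example; the page names, their keyword sets, and the use of Jaccard similarity are assumptions made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Keywords of content the user liked in the past (from the example above).
liked_keywords = {"stack", "queue", "sorting"}

# Candidate pages with hypothetical keyword sets.
pages = {
    "data-structures-intro": {"stack", "queue", "linked-list"},
    "sorting-algorithms": {"sorting", "quicksort", "heap"},
    "cooking-basics": {"recipe", "oven", "pasta"},
}

def recommend_by_content(profile, candidates, top_n=2):
    """Rank candidates by keyword overlap with the user's profile."""
    scored = sorted(
        candidates.items(),
        key=lambda kv: jaccard(profile, kv[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:top_n]]

print(recommend_by_content(liked_keywords, pages))
```

Note that only item attributes are consulted here, which is why content-based methods can still rank a brand-new item that nobody has rated yet.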

Challenges and solutions

  • Accuracy: usually judged in three ways: the accuracy of rating predictions, of usage predictions, and of item rankings.
    Solution: use an ML model to extract hidden features and to jointly combine information from varying sources.
  • Sparsity or cold start: lack of data, e.g. too few user ratings, or no information about a new user.
    Solution: use an ML model to extract a denser feature representation from high-dimensional data / use an ML model to extract features from heterogeneous data sources / combine with content-based RS to handle the cold-start problem.
  • Scalability: balance between model complexity and response time.
    Solution: use an ML model to reduce high-dimensional data to fewer dimensions / modify the ML model to accelerate the training process / use parallel computing.
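
The three ways of judging accuracy listed above can each be paired with a commonly used metric. The pairing below (RMSE for rating predictions, precision@k for usage predictions, NDCG@k for rankings) is my own choice of representative metrics, not a list taken from the paper, and the function names are illustrative.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items the user actually used."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: rewards placing relevant items near the top."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

print(rmse([4.0, 3.0], [5.0, 3.0]))
print(precision_at_k(["a", "b", "c"], {"a", "c"}, k=2))
print(ndcg_at_k(["x", "a"], {"a"}, k=2))
```

Precision@k ignores the order within the top k, whereas NDCG@k penalizes a relevant item for appearing lower in the list, which is why the two can disagree on the same recommendation list.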

Accuracy in a CF system is not simply the prediction accuracy of an ordinary machine learning task. A good model should return both relevant items and surprising items that might attract users, i.e. it should balance exploitation and exploration.
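
One simple way to picture this balance is an epsilon-greedy rule: mostly recommend the top-scored item, but occasionally recommend a random one. This is a generic technique I am using for illustration, not something taken from the paper; the scores, the `epsilon` value, and the function name `epsilon_greedy_pick` are assumptions.

```python
import random

def epsilon_greedy_pick(scores, epsilon=0.1, rng=random):
    """Pick one item: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Exploration: any candidate, regardless of its predicted score.
        return rng.choice(list(scores))
    # Exploitation: the item with the highest predicted score.
    return max(scores, key=scores.get)

# Hypothetical predicted scores for three candidate items.
predicted_scores = {"i1": 4.6, "i2": 3.9, "i3": 2.1}
print(epsilon_greedy_pick(predicted_scores, epsilon=0.2))
```

With `epsilon=0` the rule always returns the best-scored item; raising `epsilon` trades some short-term accuracy for the chance to surface items the model would otherwise never show.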
