A Summary of Recommender Systems

Latest recommended article published on 2024-10-05 09:59:27

oucpowerman

Reposted from http://blog.csdn.net/lzt1983/article/details/38884435

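I have not worked on recommender systems for a long time, but I still follow related papers and materials now and then. Lately I have been organizing my thoughts, so I am taking the chance to summarize how I have come to understand this field over the past few years.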

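What is a recommender system:

A system that automatically or passively pushes to users the target units (products, people, events, and so on) that they may be interested in.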

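The purpose of a recommender system:

To serve the site's business goals by building a bridge between users and target units (called items below): recommending suitable items to users, and finding suitable users for items. Its main functions include:

a Information filtering. Online information and products are so abundant that users cannot go through them one by one. A recommender system filters out what the user does not care about and picks out what may interest them.

b Shortening the user's decision path and improving the user experience.

c Increasing item clicks and conversions, and therefore transaction volume.

d Providing an effective channel for the site's business needs, for example increasing the exposure of long-tail items to help build a healthier ecosystem.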

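Recommendation and personalization:

Personalization: on the same page and with the same click, different users get different information, processed specifically according to each user's characteristics.

a. Precisely targeted advertising is also called ad recommendation.

b. Personalization is an important direction for search systems.

c. Recommender systems have many personalized scenarios, such as the personal center page. They also have non-personalized ones, such as popular-item recommendations and the usual recommendations in the right-hand column of an item page.

Advertising and recommendation overlap: off-site recommendation = personalized retargeting ads.

There is also the question of how to allocate placements. On a video site, for example, should a good slot go to recommendations to grow traffic, or to ads to monetize? The answer depends on a trade-off among short-term revenue, market share, and long-term user experience.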

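Ways to classify recommender systems:

a Recommendation when the user's intent is clear: related-item recommendation, personalized search;

b Recommendation when the user's intent is unclear: personal center recommendation, "guess what you like".

A Recommending items to people;

B Recommending people to people;

C Recommending items to items, and so on.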

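Common methods for recommender systems:

a Recommendation based on collective intelligence, also called collaborative filtering. The underlying assumption resembles that of market economics: users deliberate before making a decision and choose only what benefits them most, so their judgment is more reliable than a machine's. We can therefore collect and mine the decision records of a large number of users, and use them as a reference for users who need to make decisions later. (item cf, user cf, generalized FM)

b Recommendation based on content analysis. Analyze an item's category, function, price, appearance, and so on, and recommend items close to the user's past consumption history. (keyword, classification)

c Recommendation based on user analysis. Use the user's gender, age, interests, spending power, historical search queries, and similar information to recommend items that interest people in the same group. (demographic, clustering)

d Recommendation based on friend relationships. Recommend based on what the user's friends are interested in, or let the friends recommend directly. (trust based rec)

e Editor recommendation and expert recommendation. Experienced users make the decisions, which serve as a reference for ordinary users. (expert detection)

f Recommendation based on geographic location.

g Recommendation based on the current scenario, also called context-aware recommendation. For example, recommend according to the video currently being watched, the current time, or the viewing situation (such as watching with family). (similar item, tensor factorization)

h Recommendation based on search behavior. Recommend by relevance, or by statistics on what users do after a search.

i Recommendation based on recommendation reasons. First derive recommendation reasons from user behavior, then generate the recommendation list from those reasons.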
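To make the collaborative filtering idea in item a more concrete, here is a minimal sketch of item-based CF using cosine similarity over binary user-item data. The toy `interactions` log and the function names `item_similarities` and `recommend` are hypothetical, chosen only for illustration. A production system would also need popularity normalization, freshness handling, and a sparse or distributed implementation.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy log: user -> set of items the user clicked or bought.
interactions = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c", "d"},
    "u4": {"a", "d"},
}

def item_similarities(interactions):
    """Cosine similarity between items, treating each item as a binary user vector."""
    item_users = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            item_users[item].add(user)
    sims = defaultdict(dict)
    for i, users_i in item_users.items():
        for j, users_j in item_users.items():
            if i == j:
                continue
            common = len(users_i & users_j)
            if common:
                sims[i][j] = common / sqrt(len(users_i) * len(users_j))
    return sims

def recommend(user, interactions, sims, top_n=3):
    """Score unseen items by summing their similarity to the items the user has seen."""
    seen = interactions[user]
    scores = defaultdict(float)
    for item in seen:
        for other, sim in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += sim
    # Sort by descending score, breaking ties by item id for determinism.
    return sorted(scores, key=lambda x: (-scores[x], x))[:top_n]

sims = item_similarities(interactions)
print(recommend("u2", interactions, sims))  # ['c', 'd']
```

User `u2` has seen `a` and `b`. Item `c` ranks first because it co-occurs strongly with `b`, which is the "other users' decisions as a reference" idea in its simplest form.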

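Optimization objectives:

a Optimizing click-through rate:

b Optimizing conversion rate:

c Downstream optimization: optimize over the whole series of subsequent user actions, and evaluate online through A/B tests on revenue.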
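As an illustration of online evaluation, the following sketch compares the click-through rate of two buckets with a two-proportion z-test. It is an assumption for illustration, not the method the text prescribes. The text mentions A/B tests on revenue, which would usually need a test on per-user revenue instead, and CTR is used here only because it keeps the arithmetic short. The function names and traffic numbers are hypothetical.

```python
from math import sqrt, erf

def rate(successes, trials):
    """Click-through or conversion rate; 0.0 when there is no traffic."""
    return successes / trials if trials else 0.0

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two rates (B minus A)."""
    p_a, p_b = rate(success_a, n_a), rate(success_b, n_b)
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are degenerate")
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)

# Hypothetical numbers: control bucket A vs. new ranking model B.
z, p_value = two_proportion_z(500, 10_000, 600, 10_000)
print(f"CTR A={rate(500, 10_000):.3f}, CTR B={rate(600, 10_000):.3f}, "
      f"z={z:.2f}, p={p_value:.4f}")
```

The same function applies to conversion rate by passing orders and clicks instead of clicks and impressions.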

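Main techniques:

a index, search, match

b learning to rank

c ensemble, diversity, filtering

d feature engineering: clicks, purchases, content; hometown and region, friends, life stage, special states (breakup, marriage, childbirth, illness), consumption cycles (diapers), time periods (season, workday or not, day or night, Double 11), environment (weather, pollution), external events (ad campaigns, political events), emotional state (excited, anxious)

e rule base: collective intelligence vs. expert intelligence (data vs. experience), models vs. rules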
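The time-period features listed in item d can be derived from a timestamp alone. Below is a minimal sketch. The function name `time_features`, the Northern Hemisphere season boundaries, the 06:00 to 18:00 definition of daytime, and the Monday-to-Friday definition of a workday (which ignores public holidays) are all assumptions made for illustration.

```python
from datetime import datetime

def time_features(ts: datetime) -> dict:
    """Derive simple time-context features from an event timestamp."""
    # Assumed meteorological seasons for the Northern Hemisphere.
    if ts.month in (3, 4, 5):
        season = "spring"
    elif ts.month in (6, 7, 8):
        season = "summer"
    elif ts.month in (9, 10, 11):
        season = "autumn"
    else:
        season = "winter"
    return {
        "season": season,
        "is_workday": ts.weekday() < 5,      # Monday=0 ... Sunday=6; holidays ignored
        "is_daytime": 6 <= ts.hour < 18,     # assumed daytime window
        "is_double_11": (ts.month, ts.day) == (11, 11),
    }

print(time_features(datetime(2014, 11, 11, 21, 30)))
```

Features such as weather, life stage, or emotional state need external data sources and cannot be computed this way.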

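Below is a checklist of the concrete recommender-system methods as I understand them, along with the main challenges faced when putting them into practice.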

- item cf (item based collaborative filtering)
  - different similarity measurements (cosine, L2-distance, Jaccard, Pearson)
  - item popularity and freshness
- user cf
  - user-user similarity calculation
  - topic constrained user cf (for news recommendation)
  - expert cf (firstly detect domain experts, and rec items to users based on similar experts)
- factorization machines
  - svd++
  - libfm
- keyword/topic based methods
  - variable combinations of keywords, like N-gram
  - Bayesian inference of p(topic|user) and topic popularities
- random walk
  - transition matrix
- SNS based methods
  - friend cf (user-friend relationships instead of user-user similarities in user cf)
- LBS based methods
  - local hot reranking
- demographic based methods
  - rec by demographic segmentation
- context-aware methods
  - tensor factorization
  - sequential pattern based prediction
  - learning using context of user choice
- other methods
  - editor's choice
  - rec by top list
  - rec by commodity consumption cycle
- recommend items to items
  - click co-occurrence based methods
  - keyword, topic based methods
  - personalized relevance model
  - recommend products by similar images
- recommend users to users
  - EdgeRank
  - Twitter's algorithm
- merging
  - blending (weights of algorithms, weights of items of algorithms, user feedback)
  - ensemble (LR, RBM, GBM, random forest)
  - switching (switch methods by context)
  - cascading (multi-level model)
- evaluation
  - offline metrics (MAP, nDCG, AUC, diversity...)
  - online metrics (CTR, conversion rate, user activity level, long tail item exploration)
- challenges
  - page optimization: by explanation, by method, clustering (k-means, AP clustering, HDP)
  - long-term interest vs. short-term action
  - exploitation vs. exploration: multi-armed bandits
  - accurate vs. diverse
  - freshness vs. stability
  - navigation vs. attention
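
The "exploitation vs. exploration" challenge in the list above names multi-armed bandits. As a rough sketch of the simplest variant, the following implements an epsilon-greedy bandit over recommendation candidates. The class name `EpsilonGreedy`, the `simulate` helper, and the click probabilities are hypothetical, and real systems often prefer UCB or Thompson sampling.

```python
import random

class EpsilonGreedy:
    """Epsilon-greedy bandit: explore with probability epsilon, otherwise exploit."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {a: 0 for a in self.arms}
        self.rewards = {a: 0.0 for a in self.arms}

    def mean(self, arm):
        """Observed average reward of an arm (0.0 before any pull)."""
        n = self.pulls[arm]
        return self.rewards[arm] / n if n else 0.0

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)      # explore: random arm
        return max(self.arms, key=self.mean)       # exploit: best arm so far

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.rewards[arm] += reward

def simulate(true_ctr, rounds=1000, epsilon=0.1, seed=0):
    """Run the bandit against hypothetical fixed click probabilities."""
    bandit = EpsilonGreedy(true_ctr, epsilon, seed)
    env = random.Random(seed + 1)                  # separate RNG for simulated clicks
    for _ in range(rounds):
        arm = bandit.select()
        clicked = env.random() < true_ctr[arm]
        bandit.update(arm, 1.0 if clicked else 0.0)
    return bandit

bandit = simulate({"popular": 0.05, "long_tail": 0.08})
print(bandit.pulls)
```

A larger `epsilon` spends more impressions on exploration, which helps long-tail items get exposure at the cost of short-term clicks.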