University of Minnesota Introduction to Recommender Systems: course notes, week 1

1. Introduction to Recommender Systems

  • Understand what a recommender system is
  • Some history and background

A Bit of History

  • Ants, Cavemen, and Early Recommender Systems
    – The emergence of critics, after which people simply follow the critics
  • Information Retrieval and Filtering
  • Manual Collaborative Filtering
  • Automated Collaborative Filtering
  • The Commercial Era

Ants, Cavemen, and Early Recommender Systems

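Recommendation means helping people choose. Social navigation, i.e. the herd effect, is everywhere: it helps ants find food (the marks left by others), helps cavemen work out which foods are safe to eat (other people's results: dead or not), and helps people in a stadium find the exit (follow the crowd, i.e. other people's choices). All of these are typical cases of using information from others. In a recommender system, that information usually comes from critics.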

Information Retrieval

Like a search engine.
- Static content base
– Invest time in indexing content
For example, a book index.
- Dynamic information need
– Queries presented in “real time”
The queries are dynamic.
- Common approach: TFIDF
– Rank documents by term overlap
– Rank terms by frequency
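
To make the TFIDF bullets concrete, here is a minimal sketch that ranks a few documents against a query. The weighting used (raw term count times log(N / document frequency)) is only one common variant and is an assumption here; the lecture notes do not say which variant is meant. The function name `tf_idf_scores` and the toy documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf_scores(query, documents):
    """Score each document against the query with a simple TF-IDF sum."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency inside this document
        score = 0.0
        for term in query.lower().split():
            if term in tf:
                # Rare terms get a large weight, ubiquitous terms get ~0
                idf = math.log(n_docs / df[term])
                score += tf[term] * idf
        scores.append(score)
    return scores

# Hypothetical toy corpus (the static content base)
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "recommender systems filter the news",
]
scores = tf_idf_scores("cat news", docs)
# Rank document indices from best to worst match
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The third document ranks first because "news" occurs in only one document, so it carries more weight than "cat", which occurs in two.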

Information Filtering

  • Reverse assumptions from IR
    – Static information need
    – Dynamic content base
    In IR the content is static and the queries are dynamic; here the content is dynamic and the user's taste/preference is static.
  • Invest effort in modeling user need
    – Hand-created “profile”
    For example, the tags a user follows (science fiction / biography)?
    – Machine learned profile
    – Feedback/updates
  • Pass new content through filters
    Filter out the content the user is likely to enjoy and push it to them, or do the inverse and filter out content they do not want (spam email).
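
The filtering idea above can be sketched with a hand-created profile. Everything here is hypothetical: the profile layout (`liked` / `blocked` tag sets), the function name `matches_profile`, and the sample stream are assumptions made for illustration, not something taken from the course.

```python
def matches_profile(item_tags, profile):
    """True if the item carries a liked tag and no blocked tag."""
    tags = set(item_tags)
    return bool(tags & profile["liked"]) and not (tags & profile["blocked"])

# Static information need: a hand-created profile
profile = {"liked": {"sci-fi", "biography"}, "blocked": {"spam"}}

# Dynamic content base: new items keep arriving
stream = [
    ("New space opera released", ["sci-fi", "fiction"]),
    ("Win a free phone now", ["spam"]),
    ("Life of a physicist", ["biography"]),
    ("Cheap sci-fi ebooks!!!", ["sci-fi", "spam"]),
    ("Local football results", ["sports"]),
]

# Pass new content through the filter
delivered = [title for title, tags in stream if matches_profile(tags, profile)]
print(delivered)
```

A machine-learned profile would replace the fixed tag sets with weights updated from user feedback, but the "pass each new item through the filter" loop stays the same.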

Manual Collaborative Filtering

  • Premise
    – Information needs more complex than keywords or topics: quality and taste
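    Complex: for example, I like science fiction, but the science must not be too far-fetched.
    Taste: sometimes hard to put into words, for example what kind of girl I like.
    It’s easy to figure out if something’s about a topic, but it’s harder to figure out if it matches your taste.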
  • Small Community: Manual
    – Tapestry – database of content & comments
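    Use other people's comments to filter for the content you want, for example items that someone has described as “interesting”. Being picky, when I shop on Taobao I only choose items whose reviews include “no odor”.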
    – Active CF – easy mechanisms for forwarding content to relevant readers
    For example, we often share things with friends over WeChat; that is filtering information for a friend (matching the content to his or her taste).

Automated CF

The GroupLens Project (CSCW ’94)
Predicts which articles you might like to read, based on a personalized match between you and other people who share your taste.

  • ACF for Usenet News
    users rate items
    users are correlated with other users
    personal predictions for unrated items
  • Nearest-Neighbor Approach
    find people with history of agreement
    assume stable tastes
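
One way to make "users are correlated with other users" concrete is the Pearson correlation computed over the items two users have both rated. This is a minimal sketch under that assumption; the user names and ratings are invented, and computing the means over co-rated items only is a design choice of this sketch.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # not enough overlap to say anything
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user who rates everything the same carries no signal
    return cov / math.sqrt(var_a * var_b)

# Hypothetical article ratings on a 1-5 scale
alice = {"a1": 5, "a2": 3, "a3": 1}
bob = {"a1": 4, "a2": 3, "a3": 2, "a4": 5}
carol = {"a1": 1, "a2": 3, "a3": 5}

print(pearson(alice, bob))    # history of agreement
print(pearson(alice, carol))  # history of disagreement
```

Bob agrees with Alice on every co-rated article (correlation 1.0), so his rating of the unrated article `a4` is a useful signal for her; Carol's tastes run opposite (correlation -1.0).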

It Works Meaningfully Well!
Personalized prediction is better at predicting whether somebody would like an article than simply looking at the average of what everybody said.

I am not sure exactly how this result was obtained.

  • Usenet trial: rating/prediction correlation
    • rec.humor: 0.62 (personalized) vs. 0.49 (avg.)
    • comp.os.linux.system: 0.55 (pers.) vs. 0.41 (avg.)
    • rec.food.recipes: 0.33 (pers.) vs. 0.05 (avg.)
    Individual tastes generally differ a lot.
  • Significantly more accurate than predicting average or modal rating.
  • Higher accuracy when partitioned by newsgroup
    ???
  • Relationship with User Behavior
    Twice as likely to read 4/5 than 1/2/3
  • Users Like GroupLens
    Some users stayed 12 months after the trial!

The Commercial Era

This is the era we are in now: NetEase Cloud Music, Amazon, and so on.
[Figure: 2016-02-28_161210.png]
Weight the ratings of users with similar taste.
[Figure: 2016-02-28_161432.png]

User-User Collaborative Filtering

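[Figure: 2016-02-28_162034.png]
First measure the distance between the target user and everyone else; the shaded region in the figure marks the similar users. Then take a weighted combination of the similar users' ratings {2, 3} to obtain the target's score, 3.

When the user base explodes, this becomes very slow!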
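
The procedure described above can be sketched as follows. The similarity used here (1 / (1 + mean absolute rating difference)) is only an assumed stand-in for whatever distance the figure uses, and the names `similarity`, `predict_rating`, and the toy ratings are hypothetical. The numbers do not reproduce the figure.

```python
def similarity(a, b):
    """Turn the mean absolute rating difference on co-rated items into a similarity."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    mad = sum(abs(a[i] - b[i]) for i in common) / len(common)
    return 1.0 / (1.0 + mad)

def predict_rating(target, item, ratings, k=2):
    """Similarity-weighted average of the k nearest neighbours' ratings for item."""
    candidates = [
        (similarity(ratings[target], ratings[u]), ratings[u][item])
        for u in ratings  # scans every user: this is the slow part
        if u != target and item in ratings[u]
    ]
    # Keep the k most similar users (the "shaded region")
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    if not neighbours:
        return None
    total_weight = sum(w for w, _ in neighbours)
    return sum(w * r for w, r in neighbours) / total_weight

# Hypothetical movie ratings
ratings = {
    "target": {"m1": 4, "m2": 2},
    "u1": {"m1": 4, "m2": 2, "m3": 3},
    "u2": {"m1": 5, "m2": 3, "m3": 2},
    "u3": {"m1": 1, "m2": 5, "m3": 5},
}
prediction = predict_rating("target", "m3", ratings)
print(prediction)
```

Each prediction compares the target against every other user, so the cost grows with the size of the user base, which is the slowdown noted above.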

Recommenders

Tools to help identify worthwhile stuff

  • Filtering interfaces
    E-mail filters, clipping services
  • Recommendation interfaces
    Suggestion lists, “top-n,” offers and promotions
  • Prediction interfaces
    Evaluate candidates, predicted ratings

A Little Vocabulary

  • Rating – expression of preference
    – Explicit rating (direct from the user)
    – Implicit rating (inferred from user activity)
  • Prediction – estimate of preference
  • Recommendation – selected items for user
  • Content – attributes, text, etc.
  • Collaborative – using data from other users

Historical Challenges

  • Collecting Opinion and Experience Data
  • Finding the Relevant Data for a Purpose
  • Computing the Recommendations
  • Presenting the Data in a Useful Way

Your First Assignment

  • We are building a class ratings dataset using
    the MovieLens infrastructure
    – This will be used for several of the assignments
  • Your assignment is to rate movies through
    our interface:
    http://mooc.grouplens.org/ratemovies/

Welcome to the Course!

This section can be skipped...

Software Environment

easy….

Taxonomy of Recommender Systems (part 1 of 2)

Learning Objectives

  • To understand the different types of recommender systems
    – A framework for analyzing recommender systems in general
    – A specific overview of different recommendation algorithms
  • To acquire a roadmap for the rest of the course, based on the algorithms studied

Analytical Framework

Dimensions of Analysis

  • Domain
  • Purpose
  • Recommendation Context
  • Whose Opinions
  • Personalization Level
  • Privacy and Trustworthiness
  • Interfaces
  • Recommendation Algorithms

Domains of Recommendation

  • Content to Commerce and Beyond
    – News, information, “text”
    – Products, vendors, bundles (promotional combos)
    – Matchmaking (other people, e.g. blind dates?)
    – Sequences (e.g., music playlists)
  • One particularly interesting property
    – New items (e.g., movies, books, …)
    – Re-recommend old ones (e.g., groceries, music)

Google can also be viewed as a recommender system for the web.

Purposes of Recommendation

  • The recommendations themselves
    – Sales
    – Information
  • Education of user/customer
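    Recommending software commands (in this case the evaluation metric should not be how readily users adopt the recommended commands. I think this is because whether a command gets used also depends on how well the command itself is designed.)
  • Build a community of users/customers around products or content
    TripAdvisor
    [Figure: 2016-02-29_204334.png]
    Feels just like Dianping...
    ReferralWeb
    [Figure: 2016-02-29_204757.png]
    It found technical expertise using keywords by mining the network of collaborators that you had.
    You could then look for an expert in something, perhaps recommender systems,
    and it would find experts who were close to you in your social graph.
    There does not seem to be much recommendation technique here, but using the social network is kind of interesting.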

Recommendation Context

  • What is the User doing at the time of recommendation?
    – Shopping
    – Listening to Music
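    If I skip songs often, the recommender may suggest more unfamiliar songs.
    – Hanging out with other people
    In this case it makes sense to recommend for the group rather than for one individual.
  • How does the context constrain the recommender?
    – Groups, automatic consumption (vs. suggestion), level of attention, level of interruption (does this mean not recommending too often, so as not to pester users?)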

Whose Opinion?

  • “Experts”
    [Figure: 2016-02-29_212313.png]
  • Ordinary “phoaks”
    Everyone.
  • People like you
    [Figure: 2016-02-29_212405.png]
    ???

Personalization Level

  • Generic / Non-Personalized
    – Everyone receives same recommendations
    [Figure: 2016-02-29_213057.png]
    I'm a guy, and this is what it recommends to me...
  • Demographic
    – Matches a target group
    For example, men and women differ.
    [Figure: 2016-02-29_213143.png]
    I fit into the casual men’s demographic.
  • Ephemeral
    – Matches current activity
    For example, I want to buy a book right now.
    [Figure: 2016-02-29_221903.png]
    Enter a singer's name and get recommendations based on the books other people bought.
  • Persistent
    – Matches long-term interests
    It had a model of his favorite artists.
    [Figure: 2016-02-29_223518.png]

Privacy and Trustworthiness

  • Who knows what about me?
    – Personal information revealed
    – Identity
    – Deniability of preferences
  • Is the recommendation honest?
    – Biases built-in by operator
    “business rules”
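    For example, only recommending items that are still in stock; NetEase Cloud Music likewise now recommends only tracks it holds the rights to.
    – Vulnerability to external manipulation
    The instructor described an odd phenomenon in MovieLens: when a movie has just been released, its ratings tend to be very high. Some people suspected manipulation, with attackers creating large numbers of MovieLens accounts and flooding a newly released movie with positive ratings in order to boost its box-office takings.
    The instructor said that is not actually what happens. Ratings start out high because the people who go to see a movie as soon as it opens are generally those who were looking forward to it and already like it, such as die-hard Transformers fans, and they tend to give high ratings.
    – Transparency of “recommenders”; Reputation
    Taking the raters' reputation into account can partly mitigate the honesty problem.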

Interfaces

  • Types of Output
    • Predictions
      score
    • Recommendations
      set of items
    • Filtering
    • Organic vs. explicit presentation
      Agent/Discussion Interface
  • Types of Input
    • Explicit
      For example, ratings.
    • Implicit
      For example, whether you ended up buying, or how often you return to look at a page.

Recommendation Algorithms

  • Non-Personalized Summary Statistics
  • Content-Based Filtering
    – Information Filtering
    – Knowledge-Based
  • Collaborative Filtering
    – User-User
    – Item-Item
    – Dimensionality Reduction
  • Others
    – Critique / Interview Based Recommendations
    – Hybrid Techniques

Taxonomy of Recommender Systems (part 2 of 2)

Linking these together

[Figure: 2016-03-01_220941.png]

Non-Personalized Summary Stats

  • External Community Data
    – Best-seller; Most popular; Trending Hot
    What does “External” mean here?
  • Summary of Community Ratings
    – Best-liked
  • Examples
    – Zagat restaurant ratings
    – Billboard music rankings
    – TripAdvisor hotel ratings
    [Figure: 2016-03-01_221335.png]
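
A minimal sketch of the two kinds of summary statistic listed above: "most popular" as a rating count and "best-liked" as a mean rating. The hotel names and ratings are invented, and `summary_stats` is a hypothetical helper.

```python
from collections import defaultdict

def summary_stats(ratings):
    """Return {item: (rating_count, mean_rating)} from (user, item, rating) triples."""
    totals = defaultdict(lambda: [0, 0.0])  # item -> [count, sum of ratings]
    for _user, item, rating in ratings:
        totals[item][0] += 1
        totals[item][1] += rating
    return {item: (n, s / n) for item, (n, s) in totals.items()}

# Hypothetical community ratings
ratings = [
    ("u1", "hotel_a", 5), ("u2", "hotel_a", 4), ("u3", "hotel_a", 3),
    ("u1", "hotel_b", 5),
    ("u2", "hotel_c", 2), ("u3", "hotel_c", 4),
]
stats = summary_stats(ratings)
most_popular = max(stats, key=lambda i: stats[i][0])  # most ratings
best_liked = max(stats, key=lambda i: stats[i][1])    # highest mean
print(most_popular, best_liked)
```

Everyone sees the same two answers, which is what makes these non-personalized. Note that `hotel_b` tops the best-liked list on the strength of a single rating, so a real system would probably want some minimum count or damping before trusting a mean.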

Content-Based Filtering

  • User Ratings x Item Attributes => Model
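    Item attributes: for example, if your ratings show that you like action movies, the system recommends popular action movies to you.
    – Model applied to new items via attributes
    [Figure: 2016-03-01_222325.png]
    The user's preference vector is combined with the item's attribute vector by a dot product; see lecture 1 of National Taiwan University's Machine Learning Foundations course.
  • Alternative: knowledge-based
    – Item attributes form model of item space
    For example, news topics.
    – Users navigate/browse that space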
  • Examples
    – Personalized news feeds
    – Artist or Genre music feeds
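
The "User Ratings x Item Attributes => Model" bullet can be sketched as below. Building the profile from mean-centred ratings is an assumption of this sketch (one simple option among several), and the item names and attribute values are made up.

```python
def build_profile(user_ratings, item_attributes):
    """User model: attribute vectors summed with mean-centred ratings as weights."""
    mean = sum(user_ratings.values()) / len(user_ratings)
    profile = {}
    for item, rating in user_ratings.items():
        for attr, value in item_attributes[item].items():
            # Above-average ratings push an attribute up, below-average push it down
            profile[attr] = profile.get(attr, 0.0) + (rating - mean) * value
    return profile

def score(profile, attributes):
    """Dot product of the user's preferences with an item's attributes."""
    return sum(profile.get(attr, 0.0) * value for attr, value in attributes.items())

# Hypothetical catalogue: the last two items are new and unrated
item_attributes = {
    "action_1": {"action": 1, "comedy": 0},
    "action_2": {"action": 1, "comedy": 0},
    "comedy_1": {"action": 0, "comedy": 1},
    "action_3": {"action": 1, "comedy": 0},
    "comedy_2": {"action": 0, "comedy": 1},
}
user_ratings = {"action_1": 5, "action_2": 4, "comedy_1": 1}

profile = build_profile(user_ratings, item_attributes)
print(score(profile, item_attributes["action_3"]))
print(score(profile, item_attributes["comedy_2"]))
```

The new action title gets a positive score and the new comedy a negative one, even though nobody has rated either yet: the model only needs the items' attributes.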

Personalized Collaborative Filtering

  • Use opinions of others to predict/recommend
  • User model – set of ratings
  • Item model – set of ratings
  • Common core: sparse matrix of ratings
    If the matrix were not sparse, we would not need to recommend anything...
    – Fill in missing values (predict)
    – Select promising cells (recommend)
  • Several different techniques

Techniques

  • User-user
    – Select neighborhood of similar-taste people
      • Variant: select people you know/trust
    – Use their opinions
  • Item-item
    – Pre-compute similarity among items via ratings
    – Use own ratings to triangulate for recommendations
  • Dimensionality reduction
    – Intuition: taste yields a lower-dimensionality matrix
    – Compress and use a taste representation
    [Figure: 2016-03-01_223340.png]
    Matrix compression; it sounds magical.
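
As an illustration of the item-item bullets above, here is a minimal sketch that measures similarity among items via ratings and then uses the user's own ratings to predict an unrated item. Cosine similarity on raw ratings is an assumed choice, the names `item_cosine` and `predict` are hypothetical, and a real system would pre-compute and cache the similarities rather than recompute them per request.

```python
import math

def item_cosine(ratings, item_a, item_b):
    """Cosine similarity between two items over the users who rated both."""
    common = [u for u in ratings if item_a in ratings[u] and item_b in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in common)
    norm_a = math.sqrt(sum(ratings[u][item_a] ** 2 for u in common))
    norm_b = math.sqrt(sum(ratings[u][item_b] ** 2 for u in common))
    return dot / (norm_a * norm_b)

def predict(ratings, user, target):
    """Weight the user's own ratings by each rated item's similarity to target."""
    pairs = [(item_cosine(ratings, target, i), r) for i, r in ratings[user].items()]
    total = sum(s for s, _ in pairs)
    if total == 0:
        return None
    return sum(s * r for s, r in pairs) / total

# Hypothetical ratings; "me" has not rated item y
ratings = {
    "u1": {"x": 5, "y": 5, "z": 1},
    "u2": {"x": 4, "y": 5, "z": 2},
    "u3": {"x": 1, "y": 2, "z": 5},
    "me": {"x": 5, "z": 1},
}
sim_yx = item_cosine(ratings, "y", "x")
sim_yz = item_cosine(ratings, "y", "z")
pred = predict(ratings, "me", "y")
print(sim_yx, sim_yz, pred)
```

Item y is rated much like x and unlike z, so the prediction for y leans toward the user's rating of x.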

Note on Evaluation

  • To properly understand relative merits of each approach, we will spend significant time on evaluation
    – Accuracy of predictions
    – Usefulness of recommendations
      • Correctness
      • Non-obviousness
      • Diversity
    – Computational performance
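
For the "accuracy of predictions" item, two widely used error measures are mean absolute error (MAE) and root mean squared error (RMSE). The notes do not say which metrics the course uses, so treat this as an assumed illustration; the rating lists are invented.

```python
import math

def mae(predicted, actual):
    """Mean absolute error: average size of the prediction miss."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error: like MAE but penalises large misses more."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical held-out ratings and two sets of predictions for them
actual = [4, 2, 5, 3]
personalized = [4, 3, 5, 2]
average = [3.5, 3.5, 3.5, 3.5]  # predicting the overall mean for everything

print(mae(personalized, actual), rmse(personalized, actual))
print(mae(average, actual), rmse(average, actual))
```

On this toy data the personalized predictions have lower error on both measures than always predicting the average. Usefulness properties such as non-obviousness and diversity need different measurements and are not captured by either number.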

Other Approaches

  • Interactive recommenders
    – Critique-based, dialog-based
  • Hybrids of various techniques

A Tour of Amazon.com

Learning Objectives

  • To explore a wide range of recommender systems in the context of a large, professional site
  • To understand how to review a recommender-enabled site
    Not logged in
    [Figure: 2016-03-03_210842.png]
    [Figure: 2016-03-03_211045.png]
    Searching for a product
    [Figure: 2016-03-03_211407.png]
    [Figure: 2016-03-03_211624.png]
    The focus is on the current session; the item-item product association is rather interesting.
    With purchase history / browsing history
    [Figure: 2016-03-03_212059.png]
    [Figure: 2016-03-03_212217.png]