1. Introduction to Recommender Systems
- Understand what a recommender system is
- Some history and background
A Bit of History
- Ants, Cavemen, and Early Recommender Systems
– The emergence of critics (and then people simply follow the critics)
- Information Retrieval and Filtering
- Manual Collaborative Filtering
- Automated Collaborative Filtering
- The Commercial Era
Ants, Cavemen, and Early Recommender Systems
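Recommendation means helping people choose. Social navigation, i.e. following the crowd, is everywhere: it helps ants find food (the marks left by others), helped cavemen work out which foods were safe to eat (others' results: dead or not), and helps people in a stadium find the exit (follow the crowd, i.e. others' choices). All of these are typical cases of using information from others. In recommender systems that information usually comes from critics.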
Information Retrieval
e.g. search engines
- Static content base
– Invest time in indexing content
For example, a book index
- Dynamic information need
– Queries presented in “real time”
Queries are dynamic
- Common approach: TF-IDF
– Rank documents by term overlap
– Rank terms by frequency
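A minimal sketch of the TF-IDF idea above, for my own understanding. The weighting used here (term frequency divided by document length, times log(N / document frequency)) is one common variant and an assumption on my part, not necessarily the one from the lecture; `tfidf_rank` and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tfidf_rank(query, docs):
    """Rank documents by the summed TF-IDF weight of the query terms they contain."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term]:
                # Rare terms (low document frequency) get a higher weight.
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(tokens)) * idf
        scores.append((score, idx))
    # Highest score first; ties keep the original document order.
    return [idx for score, idx in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Hypothetical static content base and one dynamic query.
docs = [
    "ants follow trails to food",
    "recommender systems filter information",
    "information retrieval ranks documents for a query",
]
ranking = tfidf_rank("information retrieval", docs)
```

The third document ranks first because it matches both query terms, and "retrieval" is rarer than "information" so it counts for more.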
Information Filtering
- Reverse assumptions from IR
– Static information need
– Dynamic content base
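In IR the content is static and the queries are dynamic; here the content is dynamic and the user's taste/preference is static.
- Invest effort in modeling user need
– Hand-created “profile”
For example, the tags a user follows (science fiction / biography)?
– Machine learned profile
– Feedback/updates
- Pass new content through filters
Filter in the content the user is likely to enjoy and push it to them, or inversely filter out the content they do not want (spam email)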
Manual Collaborative Filtering
- Premise
– Information needs more complex than keywords or topics: quality and taste
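Complex: for example, I like science fiction, but the science must not be too far-fetched
Taste: sometimes hard to put into words, for example what kind of girl I am attracted to
It’s easy to figure out if something’s about a topic, but it’s harder to figure out if it matches your taste.
- Small Community: Manual
– Tapestry – database of content & comments
Use other people's comments to filter for the content you want, for example finding items that someone has described as “interesting”. For instance, picky as I am, when I shop on Taobao I only pick items whose reviews contain “no odor”
– Active CF – easy mechanisms for forwarding content to relevant readers
For example, we often share things with friends on WeChat; that is filtering information for a friend (matching the content to her/his taste)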
Automated CF
The GroupLens Project (CSCW ’94)
Predicts which articles you might like to read, based on a personalized match between you and other people who share your taste.
- ACF for Usenet News
users rate items
users are correlated with other users
personal predictions for unrated items
- Nearest-Neighbor Approach
find people with history of agreement
assume stable tastes
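A small sketch of "users are correlated with other users" and "find people with history of agreement". I am assuming Pearson correlation over the items both users rated as the correlation measure; the user names, article ids, and the choice to return 0.0 when the correlation is undefined are all hypothetical.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # not enough shared history to judge agreement
    a = [ratings_a[i] for i in common]
    b = [ratings_b[i] for i in common]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user who rates everything the same carries no signal
    return cov / math.sqrt(var_a * var_b)


# Hypothetical ratings of news articles on a 1-5 scale.
alice = {"art1": 5, "art2": 1, "art3": 4}
bob = {"art1": 4, "art2": 2, "art3": 5, "art4": 3}
carol = {"art1": 1, "art2": 5, "art3": 2}

sim_bob = pearson(alice, bob)      # strongly positive: similar taste
sim_carol = pearson(alice, carol)  # strongly negative: opposite taste
```

Under this sketch Bob would be a good neighbor for Alice and Carol would not, which is the "history of agreement" idea.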
It Works Meaningfully Well!
Better at predicting whether somebody would like an article than simply looking at the average of what everybody said.
I am not quite sure how this result was obtained
- Usenet trial: rating/prediction correlation
• rec.humor: 0.62 (personalized) vs. 0.49 (avg.)
• comp.os.linux.system: 0.55 (pers.) vs. 0.41 (avg.)
• rec.food.recipes: 0.33 (pers.) vs. 0.05 (avg.)
Individual tastes usually differ a lot
- Significantly more accurate than predicting average or modal rating.
- Higher accuracy when partitioned by newsgroup
???
- Relationship with User Behavior
Twice as likely to read 4/5 than 1/2/3
- Users Like GroupLens
Some users stayed 12 months after the trial!
The Commercial Era
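This is the era we are in now: NetEase Cloud Music, Amazon, and so on
Weight the ratings of users with similar taste
User-User Collaborative Filtering
First measure the distance between the target user and everyone else; the shaded region in the figure marks the similar users. Then combine the similar users' ratings {2, 3} with weights to get the target's score of 3
When the user base explodes, this gets very slow!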
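A sketch of the weighting step in user-user CF, assuming a plain similarity-weighted average. The similarity values 0.2 and 0.8 are invented; the notes only give the neighbor ratings {2, 3} and the resulting score 3, so this is one way those numbers could fit together, not necessarily the computation behind the figure.

```python
def predict_rating(target_sims, neighbor_ratings):
    """Similarity-weighted average of the neighbors' ratings for one item."""
    num = 0.0
    den = 0.0
    for user, rating in neighbor_ratings.items():
        sim = target_sims.get(user, 0.0)
        if sim > 0:  # only users inside the "similar" neighborhood count
            num += sim * rating
            den += sim
    return num / den if den else None


# Hypothetical similarities between the target and two neighbors.
sims = {"u1": 0.2, "u2": 0.8}
# The neighbors' ratings for the item, as in the notes: {2, 3}.
ratings = {"u1": 2, "u2": 3}

prediction = predict_rating(sims, ratings)
rounded = round(prediction)
```

The slow part is not this function but computing `sims`: the target has to be compared against every other user, so the cost grows with the size of the user base.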
Recommenders
Tools to help identify worthwhile stuff
- Filtering interfaces
E-mail filters, clipping services
- Recommendation interfaces
Suggestion lists, “top-n,” offers and promotions
- Prediction interfaces
Evaluate candidates, predicted ratings
A Little Vocabulary
- Rating – expression of preference
– Explicit rating (direct from the user)
– Implicit rating (inferred from user activity)
- Prediction – estimate of preference
- Recommendation – selected items for user
- Content – attributes, text, etc.
- Collaborative – using data from other users
Historical Challenges
- Collecting Opinion and Experience Data
- Finding the Relevant Data for a Purpose
- Computing the Recommendations
- Presenting the Data in a Useful Way
Your First Assignment
- We are building a class ratings dataset using the MovieLens infrastructure
– This will be used for several of the assignments
- Your assignment is to rate movies through our interface:
– http://mooc.grouplens.org/ratemovies/
Welcome to the Course!
This section can be skipped...
Software Environment
Easy.
Taxonomy of Recommender Systems (part 1 of 2)
Learning Objectives
- To understand the different types of recommender systems
– A framework for analyzing recommender systems in general
– A specific overview of different recommendation algorithms
- To acquire a roadmap for the rest of the course, based on the algorithms studied
Analytical Framework
Dimensions of Analysis
- Domain
- Purpose
- Recommendation Context
- Whose Opinions
- Personalization Level
- Privacy and Trustworthiness
- Interfaces
- Recommendation Algorithms
Domains of Recommendation
- Content to Commerce and Beyond
– News, information, “text”
– Products, vendors, bundles (promotional bundles)
– Matchmaking (other people, e.g. blind dates?)
– Sequences (e.g., music playlists)
- One particularly interesting property
– New items (e.g., movies, books, …)
– Re-recommend old ones (e.g., groceries, music)
Google can also be viewed as a recommender system for the web
Purposes of Recommendation
- The recommendations themselves
– Sales
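– Information
- Education of user/customer
Software command recommendation (in this case the evaluation metric should not be how readily the recommended commands are adopted; I think this is because whether a command gets used also depends on how well the command itself is designed)
- Build a community of users/customers around products or content
TripAdvisor
Feels just like Dianping...
ReferralWeb
To find technical expertise by keyword, it mined the network of collaborators that you had.
Then, when you were looking for an expert in something, perhaps recommender systems, it would find experts who were close to you in your social graph.
There does not seem to be much recommendation technology here, but using the social network is kind of interesting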
Recommendation Context
- What is the User doing at the time of recommendation?
– Shopping
– Listening to Music
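If I skip songs often, the RS may recommend more unfamiliar songs
– Hanging out with other people
In this case it makes sense to recommend things suited to a group rather than to one person
- How does the context constrain the recommender?
– Groups, automatic consumption (vs. suggestion), level of attention, level of interruption (does this mean not recommending too often, so as not to pester the user?)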
Whose Opinion?
- “Experts”
- Ordinary “phoaks”
Everyone
- People like you
???
Personalization Level
- Generic / Non-Personalized
– Everyone receives same recommendations
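I am a guy, and this is what it recommends to me...
- Demographic
– Matches a target group
For example, men and women differ
I fit into the casual men’s demographic.
- Ephemeral
– Matches current activity
For example, I want to buy a book right now
Type in a singer's name and get recommendations based on the books other people bought
- Persistent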
– Matches long-term interests
It had a model of his favorite artists
Privacy and Trustworthiness
- Who knows what about me?
– Personal information revealed
– Identity
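– Deniability of preferences
- Is the recommendation honest?
– Biases built-in by operator
“business rules”
For example, only recommend items that are still in stock; NetEase Cloud Music likewise now recommends only tracks it holds the rights to
– Vulnerability to external manipulation
The instructor described an odd pattern in MovieLens: when a movie has just been released, its ratings tend to be very high. Some people assumed the system had been attacked, with someone creating a large number of MovieLens accounts and flooding a newly released movie with positive ratings to boost its box office.
The instructor said that is not actually what happens. Early ratings are high because the people who go to see a movie the moment it opens are usually the ones who were looking forward to it and already like it, such as die-hard Transformers fans, and they tend to give high ratings.
– Transparency of “recommenders”; Reputation
Take the rater's reputation into account; this can partly mitigate the recommendation-honesty problem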
Interfaces
- Types of Output
– Predictions
score
– Recommendations
set of items
– Filtering
– Organic vs. explicit presentation
Agent/Discussion Interface
- Types of Input
– Explicit
For example, ratings
– Implicit
For example, whether you ended up buying, or how often you return to look at a page
Recommendation Algorithms
- Non-Personalized Summary Statistics
- Content-Based Filtering
– Information Filtering
– Knowledge-Based
- Collaborative Filtering
– User-User
– Item-Item
– Dimensionality Reduction
- Others
– Critique / Interview Based Recommendations
– Hybrid Techniques
Taxonomy of Recommender Systems (part 2 of 2)
Linking these together
Non-Personalized Summary Stats
- External Community Data
– Best-seller; Most popular; Trending Hot
What does “External” refer to here?
- Summary of Community Ratings
– Best-liked
- Examples
– Zagat restaurant ratings
– Billboard music rankings
– TripAdvisor hotel ratings
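A quick sketch of the two kinds of summary statistics above: "most popular" (how many ratings an item has) and "best-liked" (mean rating). The rating tuples and the `summarize` helper are made up for illustration; real sites such as Zagat or TripAdvisor certainly use something more elaborate.

```python
from collections import defaultdict


def summarize(ratings):
    """Return {item: (rating_count, mean_rating)} from (user, item, rating) tuples."""
    totals = defaultdict(lambda: [0, 0.0])
    for _user, item, rating in ratings:
        totals[item][0] += 1       # how many people rated the item
        totals[item][1] += rating  # running sum of its ratings
    return {item: (n, total / n) for item, (n, total) in totals.items()}


# Hypothetical community ratings on a 1-5 scale.
ratings = [
    ("u1", "A", 5), ("u2", "A", 4), ("u3", "A", 3),
    ("u1", "B", 5), ("u2", "B", 5),
    ("u3", "C", 2),
]
stats = summarize(ratings)
most_popular = max(stats, key=lambda item: stats[item][0])  # most ratings
best_liked = max(stats, key=lambda item: stats[item][1])    # highest mean
```

Everyone sees the same two answers, which is exactly what makes this non-personalized. Note that "most popular" and "best-liked" can disagree, as they do here.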
Content-Based Filtering
- User Ratings x Item Attributes => Model
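Item attributes: for example, if your ratings show that you like action movies, you get recommended popular action movies
Model applied to new items via attributes
Dot product of the user’s preferences and the item’s attributes; see lecture 1 of NTU's Machine Learning Foundations course
- Alternative: knowledge-based
– Item attributes form model of item space
For example, news topics
Users navigate/browse that space
- Examples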
– Personalized news feeds
– Artist or Genre music feeds
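A sketch of "User Ratings x Item Attributes => Model" followed by the dot product with a new item's attributes. Centering ratings on a neutral value of 3 is my own assumption so that disliked items push the profile negative; the movie ids and genre attributes are hypothetical.

```python
def build_profile(user_ratings, item_attrs, neutral=3.0):
    """Sum each rated item's attribute vector, weighted by (rating - neutral)."""
    profile = {}
    for item, rating in user_ratings.items():
        for attr, value in item_attrs[item].items():
            profile[attr] = profile.get(attr, 0.0) + (rating - neutral) * value
    return profile


def score(profile, attrs):
    """Dot product of the user profile with one item's attribute vector."""
    return sum(profile.get(attr, 0.0) * value for attr, value in attrs.items())


# Hypothetical rated movies with binary genre attributes.
item_attrs = {
    "m1": {"action": 1, "scifi": 1},
    "m2": {"romance": 1},
    "m3": {"action": 1},
}
user_ratings = {"m1": 5, "m2": 1, "m3": 4}
profile = build_profile(user_ratings, item_attrs)

# New, unrated items are scored purely through their attributes.
new_items = {
    "n1": {"action": 1, "scifi": 1},
    "n2": {"romance": 1, "action": 1},
}
scores = {item: score(profile, attrs) for item, attrs in new_items.items()}
```

No other user's data is involved, which is the difference from collaborative filtering.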
Personalized Collaborative Filtering
- Use opinions of others to predict/recommend
- User model – set of ratings
- Item model – set of ratings
- Common core: sparse matrix of ratings
If the matrix were not sparse, there would be nothing left to recommend...
– Fill in missing values (predict)
– Select promising cells (recommend)
- Several different techniques
Techniques
- User-user
– Select neighborhood of similar-taste people
• Variant: select people you know/trust
– Use their opinions
- Item-item
– Pre-compute similarity among items via ratings
– Use own ratings to triangulate for recommendations
- Dimensionality reduction
– Intuition: taste yields a lower-dimensionality matrix
– Compress and use a taste representation
Matrix compression; sounds like magic
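A sketch of the item-item technique from the list above: compute similarity among items from the ratings, then use the user's own ratings to predict an unrated item. Plain cosine similarity (treating missing ratings as 0) is assumed here as the similarity measure; the ratings matrix is invented.

```python
import math


def item_vectors(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    vectors = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors


def cosine(vec_a, vec_b):
    """Cosine similarity of two sparse rating vectors (missing rating = 0)."""
    common = set(vec_a) & set(vec_b)
    dot = sum(vec_a[u] * vec_b[u] for u in common)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(user_items, target, vectors):
    """Weight the user's own ratings by each rated item's similarity to target."""
    num = den = 0.0
    for item, rating in user_items.items():
        sim = cosine(vectors[target], vectors[item])
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


# Hypothetical sparse ratings matrix; u4 has not rated item B.
ratings = {
    "u1": {"A": 5, "B": 5, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "C": 1},
}
vectors = item_vectors(ratings)
sim_ba = cosine(vectors["B"], vectors["A"])
sim_bc = cosine(vectors["B"], vectors["C"])
prediction = predict(ratings["u4"], "B", vectors)
```

In a real system the item similarities would be pre-computed offline, since items change less often than users. The prediction here is pulled toward the middle because raw cosine is not mean-centered.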
Note on Evaluation
- To properly understand relative merits of each approach, we will spend significant time on evaluation
– Accuracy of predictions
– Usefulness of recommendations
• Correctness
• Non-obviousness
• Diversity
– Computational performance
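For the "accuracy of predictions" item, a sketch of two error metrics. MAE and RMSE are my assumption of what will be used; the notes do not name a metric at this point, and the predicted and actual ratings below are invented.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; penalizes large misses more than MAE does."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical held-out ratings and the system's predictions for them.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5, 3, 4, 1]

error_mae = mae(predicted, actual)
error_rmse = rmse(predicted, actual)
```

Neither metric says anything about non-obviousness or diversity, which need separate measures.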
Other Approaches
- Interactive recommenders
– Critique-based, dialog-based
- Hybrids of various techniques
A Tour of Amazon.com
Learning Objectives
- To explore a wide range of recommender systems in the context of a large, professional site
- To understand how to review a recommender-enabled site
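Not logged in
Searching for a specific product
The focus is on the current context; the item-item product association is the interesting part
With purchase history / browsing history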