Google News Recommendation
We combine the information filtering mechanism using learned user profiles with an existing collaborative filtering mechanism to generate personalized news recommendations.
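Based on an analysis of click logs, the paper builds a Bayesian framework that predicts users' interests and tracks news trends.
Users' interests change over time and follow the overall trend of news events.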
We found that their interests do vary over time but follow the aggregate trend of news events.
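The paper addresses three problems:
Analyzing how consistent users' interests are across a large volume of click logs
Predicting users' interest in news events from their click logs (which reflect both the users' genuine interests and the news trend)
Combining information filtering with collaborative filtering to improve recommendation accuracy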
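The previous Google News recommender was based on collaborative filtering, which has two problems.
Cold start: news recommendation is time-sensitive and must be refreshed constantly, but collaborative filtering needs a long time to collect user click logs before it can produce recommendations.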
The system cannot recommend stories that have not yet been read by other users. For news recommendations, this is a serious problem, as news service websites strive to present the most updated information to users in a timely manner.
Users differ in their interests: not all users are equal to each other, and the collaborative filtering method may not account for the individual variability between users. For example, entertainment stories get recommended even to users who never click on entertainment news. The reason is that entertainment news stories are generally very popular.
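To address these two problems, the system builds a user profile that describes the user's genuine interests, so that stories the user does not care about, such as the popular entertainment stories mentioned above, can be filtered out.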
The short-term interest usually is related to hot news events and changes quickly. In contrast, long term interest often reflects actual
user interest.
1 Log analysis of user interests
Assumption: The basic assumption of personalization is that users have reasonably consistent interests
Dataset
We examine the anonymized click logs of those Google News users who were signed into their Google account and explicitly enabled history tracking over a 12-month period, from 2007/7/1 to 2008/6/30. From users who made at least 10 clicks per month in that period, we randomly sampled 16,848 users. These users are from more than 10 different countries and regions.
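Click distribution
Topic categories: C = {c1, c2, ..., cn}
Ni is the number of clicks user u made in month t on news articles in category ci, and Ntotal is the user's total number of clicks in that month.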
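The formula for the click distribution itself seems to be missing from these notes. From the definitions of Ni and Ntotal, a natural reading is D(u, t) = (N1/Ntotal, N2/Ntotal, ..., Nn/Ntotal). The sketch below computes it under that assumption; the function name, category names, and sample clicks are hypothetical.

```python
from collections import Counter

def click_distribution(clicked_categories, categories):
    """Return D(u, t): the share of one user's clicks per category in one month.

    clicked_categories holds one category label per click (hypothetical input format).
    """
    counts = Counter(clicked_categories)
    n_total = sum(counts[c] for c in categories)  # Ntotal
    if n_total == 0:
        # No clicks in this month: return an all-zero distribution.
        return {c: 0.0 for c in categories}
    return {c: counts[c] / n_total for c in categories}  # Ni / Ntotal

# Made-up example: four clicks by one user in one month.
categories = ["world", "sports", "entertainment"]
clicks = ["sports", "sports", "world", "entertainment"]
dist = click_distribution(clicks, categories)
```

Each value is the fraction of the user's clicks that fell into that category, so the values sum to 1 whenever the user clicked at least once.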
Change in user interests over time
Comparison between the click distribution of the month to be predicted and those of previous months
The figure shows that users’ news interests do change over time and their clicks in older history become less useful in predicting their
future interests.
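As a rough illustration of this comparison, the sketch below measures how similar each earlier month's click distribution is to that of the month being predicted. The notes do not say which measure the paper uses, so cosine similarity is an assumption here, and the monthly distributions are made-up numbers rather than data from the paper.

```python
import math

def cosine_similarity(p, q):
    """Cosine similarity between two click distributions given as equal-length lists."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def similarity_to_target(monthly, target_index):
    """Map "months before the target" to similarity with the target month."""
    target = monthly[target_index]
    return {
        target_index - i: cosine_similarity(monthly[i], target)
        for i in range(target_index)
    }

# Made-up distributions over three categories for four consecutive months.
monthly = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],  # month to be predicted
]
sims = similarity_to_target(monthly, 3)
```

In this toy data the similarity drops as the gap grows (sims[1] > sims[2] > sims[3]), which is the pattern the figure describes for real users.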
News trends
Influence of the general news trend on individual users' news interests
an individual user’s click distribution is more similar to the click distribution of the general public in the same location than to a randomly selected location
Summary of the log analysis
Users' news preferences change over time
The click distributions of the general public reflect the news trend, which correspond to the big news events
News trends differ from one location to another
To a certain extent, the individual user’s news interests correspond with the news trend in the location that the user belongs to
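2 Predicting user interests with a Bayesian method
Short-term interest: represented by the click patterns shared by the general public
The Bayesian method: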
(1)predicts user’s genuine news interests regardless of the news trend, using the user’s clicks in each past time period
(2)the predictions made with data in a series of past time periods are combined to gain an accurate prediction of the user’s genuine news interests
(3)predicts the user’s current interests by combining her genuine news interests and the current news trend in her location
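Predicting the user's genuine interests
The user's interest in category ci
Combining the user's interests across time periods
Assumption: the prior probability that the user is interested in (clicks on) a news article does not change over time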
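The formulas for these two steps seem to be missing from the notes. One reconstruction consistent with the description applies Bayes' rule per period, interest_t(ci) = p_t(ci | click) × p_t(click) / p_t(ci), and then averages the periods weighted by the number of clicks N_t in each. The sketch below follows that reading. It assumes that p_t(ci | click) is the user's click distribution, that p_t(ci) is approximated by the general public's click distribution, and that p(click) is a constant; all names and numbers are hypothetical.

```python
def genuine_interest(periods, p_click=1.0):
    """Combine per-period Bayesian estimates into one genuine-interest score per category.

    periods: list of (n_clicks, user_dist, public_dist), one tuple per past period, where
      user_dist[c]   approximates p_t(category = c | click)  (the user's click distribution)
      public_dist[c] approximates p_t(category = c)          (the public's click distribution)
    p_click: the constant prior p(click); it only rescales all scores (assumed value).
    """
    total_clicks = sum(n for n, _, _ in periods)
    if total_clicks == 0:
        return {}
    categories = periods[0][1].keys()
    interest = {}
    for c in categories:
        # Click-weighted sum of the per-period ratio p_t(c | click) / p_t(c).
        weighted = sum(
            n * user[c] / public[c]
            for n, user, public in periods
            if public[c] > 0
        )
        interest[c] = p_click * weighted / total_clicks
    return interest

# Made-up example with two past periods.
periods = [
    (20, {"sports": 0.5, "entertainment": 0.5}, {"sports": 0.25, "entertainment": 0.75}),
    (10, {"sports": 0.4, "entertainment": 0.6}, {"sports": 0.2, "entertainment": 0.8}),
]
interest = genuine_interest(periods)
```

With p_click set to 1, a score above 1 means the user clicks on that category more than the general public does, and a score below 1 means less. In the toy data the user clicks on entertainment often, yet still less often than the public, so the entertainment score comes out below the sports score.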
Predicting the user's current news interests
G is a number of virtual clicks that acts as a smoothing factor
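The formula for this step also seems to be missing. One form consistent with the description is p_0(ci | click) ∝ p_0(ci) × (Σ_t N_t × p_t(ci | click) / p_t(ci) + G) / (Σ_t N_t + G), where p_0(ci) is the current news trend in the user's location. The sketch below implements that form as an assumption; the function name, the default G = 10, and the sample numbers are hypothetical.

```python
def current_interest(current_trend, periods, virtual_clicks=10):
    """Score each category by combining the current news trend with the user's history.

    current_trend: dict category -> p_0(category), the current public click distribution.
    periods: list of (n_clicks, user_dist, public_dist) for past periods.
    virtual_clicks: G, the smoothing count; must be > 0 if periods may be empty.
    """
    total_clicks = sum(n for n, _, _ in periods)
    scores = {}
    for c, trend in current_trend.items():
        # Click-weighted sum of the per-period ratio p_t(c | click) / p_t(c).
        weighted = sum(
            n * user[c] / public[c]
            for n, user, public in periods
            if public[c] > 0
        )
        # G pulls the ratio toward 1, i.e. toward the plain news trend.
        scores[c] = trend * (weighted + virtual_clicks) / (total_clicks + virtual_clicks)
    return scores

# Made-up example.
trend = {"sports": 0.3, "entertainment": 0.7}
history = [
    (20, {"sports": 0.5, "entertainment": 0.5}, {"sports": 0.25, "entertainment": 0.75}),
    (10, {"sports": 0.4, "entertainment": 0.6}, {"sports": 0.2, "entertainment": 0.8}),
]
scores = current_interest(trend, history)
new_user_scores = current_interest(trend, [])
```

For a user with no click history the score falls back to the current trend, and the more clicks a user has, the less G matters.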
3 News recommendation
Rec(article) = IF(article) × CF(article)
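IF(article): the information filtering score, computed with the formula above
CF(article): the collaborative filtering score, taken from the paper "Google News Personalization: Scalable Online Collaborative Filtering"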
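A minimal sketch of how the two scores could be combined for ranking. It assumes that IF(article) is the user's predicted current interest in the article's category and that CF(article) comes from an existing collaborative filtering system as an opaque number; the function name and all values are hypothetical.

```python
def rank_articles(articles, category_interest, cf_score):
    """Rank candidate articles by Rec(article) = IF(article) * CF(article).

    articles: list of (article_id, category) pairs.
    category_interest: dict category -> IF score for this user (assumed precomputed).
    cf_score: dict article_id -> CF score (assumed to come from another system).
    """
    scored = []
    for article_id, category in articles:
        if_score = category_interest.get(category, 0.0)
        rec = if_score * cf_score.get(article_id, 0.0)
        scored.append((article_id, rec))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up example: a2 is more popular overall, but this user prefers sports.
articles = [("a1", "sports"), ("a2", "entertainment")]
category_interest = {"sports": 0.525, "entertainment": 0.2}
cf_score = {"a1": 0.4, "a2": 0.9}
ranked = rank_articles(articles, category_interest, cf_score)
```

Because the scores are multiplied, a popular story in a category the user rarely clicks is pushed down, which is how the user profile filters out stories such as popular entertainment news.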
References:
(1) Personalized News Recommendation Based on Click Behavior
(2) Google News Personalization: Scalable Online Collaborative Filtering