Personalized News Recommendation Based on User Click Behavior

Latest recommended article published on 2023-03-02 15:29:40

weixin_34232363

Tags: data structures and algorithms, databases, artificial intelligence

Original link: https://my.oschina.net/manmao/blog/1517650

Abstract

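This article is a translated summary of the paper "Personalized News Recommendation Based on Click Behavior", which the Google News team published at IUI 2010. The paper focuses on two problems: recommendation accuracy and the cold start of newly published news. Its idea is simple and natural: model click behavior with Bayesian theory. The authors assume that a user's interest has two components: the user's own evolving interests and the current news trend. Before building the model, they ran a statistical analysis of historical click data to check this assumption and reached the following basic conclusions: user interests change over time, and news trends also change over time (see Figure 1). Another interesting finding is that the news trend at a given time differs from region to region (see Figure 2).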

Figure 1. Click rate by category in different time periods. (Image from the paper.)

The figure shows that users' click rates on the sport, health, national, and entertainment categories differ across time periods, which indicates that users' interests shift over time.

Figure 2. Click rate on the sport category in Spain, the US, and the UK in different time periods. (Image from the paper.)

The figure shows that the three regions have different click rates on sport news in each time period.

Algorithm model

C = {c1, c2, ..., cn}

The set of news categories, such as "world", "sports", and "entertainment".

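D(u,t)

The distribution of user u's clicks over the categories in C during time period t (one month):

$$D(u,t) = \left(\frac{N_1}{N_{total}}, \frac{N_2}{N_{total}}, \dots, \frac{N_n}{N_{total}}\right), \qquad N_{total} = \sum_{i} N_i$$

Here N_i is the number of clicks user u made on articles in category ci during month t, and N_total is the total number of clicks user u made in that period across all categories.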

Then, for each user u , we computed the distribution of her clicks in every month t , D(u,t)
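As a minimal sketch of the definition above, the per-user distribution D(u,t) can be computed from a list of the categories a user clicked in one month. The function name `user_click_distribution`, the three-category list, and the sample clicks are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

# Illustrative subset of the category set C (assumption, not the paper's full list).
CATEGORIES = ["world", "sports", "entertainment"]

def user_click_distribution(clicked_categories, categories=CATEGORIES):
    """Return D(u, t) as a dict mapping each category c_i to N_i / N_total.

    clicked_categories: one entry per click made by user u in period t,
    holding the category of the clicked article.
    """
    counts = Counter(clicked_categories)
    n_total = sum(counts[c] for c in categories)
    if n_total == 0:
        # No clicks in this period: return an all-zero distribution.
        return {c: 0.0 for c in categories}
    return {c: counts[c] / n_total for c in categories}

# Hypothetical click log for one user in one month.
clicks = ["sports", "sports", "world", "entertainment"]
d_ut = user_click_distribution(clicks)
```

With this sample log, half of the clicks fall in "sports" and a quarter each in the other two categories.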

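D(t)

The distribution of clicks over the categories for all users in a region during a past time period t. For each category, it is the number of clicks by all users on that category divided by the total number of clicks by all users.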

For each country, the general interests can be represented by the distribution of all the clicks made by the users from that country in a past time period t , represented as D(t) .
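The general distribution D(t) is the same ratio computed over the pooled clicks of every user in a region. The sketch below assumes the clicks are available as a dict from user ID to clicked categories; the function name, the user IDs, and the data are hypothetical.

```python
from collections import Counter

def general_click_distribution(clicks_by_user, categories):
    """Return D(t): each category's share of all clicks by all users in period t.

    clicks_by_user: dict mapping a user ID to the list of categories
    that user clicked during period t (one entry per click).
    """
    counts = Counter()
    for user_clicks in clicks_by_user.values():
        counts.update(user_clicks)  # pool clicks across users
    n_total = sum(counts[c] for c in categories)
    if n_total == 0:
        return {c: 0.0 for c in categories}
    return {c: counts[c] / n_total for c in categories}

# Hypothetical click logs for two users from the same country.
clicks_by_user = {
    "u1": ["sports", "sports", "world"],
    "u2": ["world"],
}
d_t = general_click_distribution(clicks_by_user, ["world", "sports", "entertainment"])
```

Pooling clicks rather than averaging per-user distributions means heavy clickers weigh more, which matches the "all clicks made by the users from that country" wording in the quoted passage.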

Bayesian probabilistic recommendation model

A model for predicting user interest.
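For orientation, the three probabilities described in this section are the ingredients of Bayes' rule. Assuming the model scores a user's interest in category ci as the probability of a click given that an article belongs to ci (the name "interest" is used here for illustration), the relationship can be written as:

```latex
\mathrm{interest}_t(\text{category} = c_i)
  = p_t(\text{click} \mid \text{category} = c_i)
  = \frac{p_t(\text{category} = c_i \mid \text{click}) \; p_t(\text{click})}
         {p_t(\text{category} = c_i)}
```

The numerator combines the user's personal click distribution with the user's overall click probability, and the denominator normalizes by how prominent the category is in the news of that period.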

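pt(category = ci | click): the share of user u's clicks in time period t that fall in category ci (the personal interest distribution).

= clicks by user u on articles in category ci / total clicks by user u on all articles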

Original English text:

pt (category = ci | click) is the probability that the user’s clicks being in category ci . It can be estimated by the click distribution D(u,t) observed in time period t , as computed in Equation

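pt(category = ci): approximated by D(t), the interest distribution of the general public. It is the share of all clicks made by users in the region during time period t that fall in category ci.

= clicks by all users on articles in category ci during time period t / total article clicks by all users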

Original English text:

pt (category = ci ) is the prior probability of an article being about category ci . This is the proportion of news articles published about that category in the time period, which correlates with the news trend in the location. As more news events happen in a given topic category, more news articles will be written in that category. Thus, we can approximate this probability with the click distribution of the general public D(t)

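pt(click): the rate at which user u clicked on the news published during time period t, regardless of article category.

= number of articles user u clicked / total number of articles published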

Original English text:

pt (click) is the prior probability of the user clicking on any news article, regardless of the article category.
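To see how the three quantities interact, here is a small sketch that assumes they are combined with Bayes' rule, i.e. pt(category = ci | click) * pt(click) / pt(category = ci). The function name `click_interest`, the zero-prior fallback, and all numbers are illustrative assumptions rather than the paper's implementation.

```python
def click_interest(d_ut, d_t, p_click):
    """Estimate p_t(click | category = c_i) for every category via Bayes' rule.

    d_ut:    dict category -> p_t(category = c_i | click), the user's D(u, t)
    d_t:     dict category -> p_t(category = c_i), approximated by D(t)
    p_click: p_t(click), the user's overall click probability
    """
    interest = {}
    for category, p_cat_given_click in d_ut.items():
        prior = d_t.get(category, 0.0)
        if prior > 0:
            interest[category] = p_cat_given_click * p_click / prior
        else:
            # Assumption: a category nobody clicked gets a score of zero
            # instead of dividing by zero.
            interest[category] = 0.0
    return interest

# Hypothetical distributions: the user favors sports more than the public does.
d_ut = {"world": 0.25, "sports": 0.5, "entertainment": 0.25}
d_t = {"world": 0.5, "sports": 0.25, "entertainment": 0.25}
scores = click_interest(d_ut, d_t, p_click=0.1)
```

In this example the user's sports share is twice the public's, so sports gets the highest score, while "world" is scored down because its high click share is explained by the general trend rather than by this user's own preference.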