126 2015 ,51(13) Computer Engineering and Applications 计算机工程与应用
Research on Micro-blog User Interest Mining Based on the AT Model
王永贵,张 旭,刘宪国
WANG Yonggui, ZHANG Xu, LIU Xianguo
辽宁工程技术大学 软件学院,辽宁 葫芦岛 125105
College of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
WANG Yonggui, ZHANG Xu, LIU Xianguo. Research on micro-blog user's interest mining based on author-topic
model. Computer Engineering and Applications, 2015, 51(13): 126-130.
Abstract: As micro-blogging grows more popular and widely used, micro-blogging sites such as Sina have become a huge
source of information. Although traditional text topic mining methods have been widely studied and applied, they do not
work well on micro-blog posts, which are a specially structured kind of text. To compensate for the deficiencies of
current topic mining methods on micro-blogging platforms, and considering the sparsity and multidimensionality of
micro-blog data, this paper proposes a targeted preprocessing method and combines users' micro-blog data with the
author-topic (AT) model. Micro-blog topics are mined by Gibbs sampling, and users' interests are obtained by further
extracting the authors' topics. Experiments on a real data set, together with a comparison against the LDA model, show
that the model can obtain micro-blog topics effectively.
Key words: micro-blog; topic mining; author-topic model; Gibbs sampling
Document code: A    CLC number: TP181    doi: 10.3778/j.issn.1002-8331.1307-0086
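1 Introduction
As a social networking site where users communicate and share information with one another, micro-blogging, with its simplicity, convenience, and fast propagation,
the modes are following, being followed, and mutual following. The system achieves the above aims through features such as the "Recommended for you" box and "People you may be interested in".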