RankSvm: A Search Ranking Algorithm Based on Click Data

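Preface: I have started my graduation project, which studies algorithms for person re-identification. I expect to run into many problems along the way, so I am recording the problems and their solutions here for later review. The math formulas in the original post intermittently fail to render, so screenshots are used instead for easier reading.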

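For the similarity-matching problem, I plan to bring in RankSvm from the family of recommendation and ranking algorithms. I came across a fairly detailed post on it and am reposting it here to study.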

Original post: http://kubicode.me/2016/03/30/Machine%20Learning/RankSvm-Optimizing-Search-Engines-using-Clickthrough-Data/?utm_source=tuicool&utm_medium=referral, by Kubi Code.


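RankSvm is one of the earliest and best-known pairwise learning-to-rank algorithms. It mainly addresses the difficulty of constructing training samples for traditional pointwise methods, and training samples built from pairs are also closer to the notion of ranking itself.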


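Basic Introduction
RankSvm was proposed in 2002. Earlier work on LTR (learning to rank) appears to have been pointwise only, for example PRanking. For such a ranking algorithm to work, it needs training samples with graded relevance labels, which are generally obtained in one of the following ways:
1. Manual labeling by humans/experts.

2. Inducing users to give feedback on the displayed search results.
These approaches suffer from high cost, poor sustainability, and strong dependence on the annotators.
RankSvm, by contrast, only needs to build preference pairs from the search engine's click logs, which makes it far more practical than the earlier work.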
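As a rough illustration of how preference pairs might be extracted from a click log, here is a minimal sketch. It assumes the common heuristic that a clicked result is preferred over any non-clicked result that was shown above it. The function name `click_pairs` and the toy document IDs are made up for this example and do not come from the original post.

```python
from typing import List, Set, Tuple


def click_pairs(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) document pairs for one query.

    Assumed heuristic: a clicked document is preferred over every
    non-clicked document that appeared above it in the result list.
    """
    pairs = []
    for pos, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        # Only documents ranked above the clicked one are compared against it.
        for above in ranking[:pos]:
            if above not in clicked:
                pairs.append((doc, above))
    return pairs


# Hypothetical log entry: five results shown, the user clicked d1, d3 and d5.
ranking = ["d1", "d2", "d3", "d4", "d5"]
clicked = {"d1", "d3", "d5"}
print(click_pairs(ranking, clicked))
# [('d3', 'd2'), ('d5', 'd2'), ('d5', 'd4')]
```

Note that the pairs say nothing about d1 versus the others: it was clicked, but nothing was ranked above it, so this heuristic yields no evidence for it. The output is a set of relative preferences rather than absolute relevance grades, which is why no manual labeling is needed.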


Training Sample Design
    

Basic Idea



RankSVM Ranking
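To make the pairwise idea concrete, the sketch below trains a linear scoring function on preference pairs using a hinge loss on feature differences, which is the usual way the RankSVM objective is written. Everything here is an assumption for illustration: the function names, the toy feature vectors, the hyperparameters, and the plain stochastic subgradient solver. A real setup would more likely rely on a dedicated SVM solver.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Linear score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(
    pairs: List[Tuple[Vector, Vector]],
    dim: int,
    epochs: int = 200,
    lr: float = 0.1,
    reg: float = 0.01,
    seed: int = 0,
) -> List[float]:
    """Fit w so that score(better) exceeds score(worse) by a margin.

    For each pair, takes one subgradient step on
    reg/2 * ||w||^2 + max(0, 1 - w . (better - worse)).
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(pairs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            better, worse = pairs[i]
            diff = [b - c for b, c in zip(better, worse)]
            margin = dot(w, diff)
            for k in range(dim):
                grad = reg * w[k]
                if margin < 1.0:  # pair is violated or inside the margin
                    grad -= diff[k]
                w[k] -= lr * grad
    return w


# Hypothetical 2-feature documents; the first element of each pair is preferred.
pairs = [
    ([1.0, 0.2], [0.0, 0.9]),
    ([0.8, 0.5], [0.3, 0.4]),
    ([0.9, 0.1], [0.2, 0.3]),
]
w = train_pairwise(pairs, dim=2)
for better, worse in pairs:
    print(dot(w, better) > dot(w, worse))
```

At ranking time, the learned `w` scores each candidate document on its own, and the results are sorted by that score; the pairs are only needed for training.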













This paper focuses on the problem of Question Routing (QR) in Community Question Answering (CQA), which aims to route newly posted questions to the potential answerers who are most likely to answer them. Traditional methods to solve this problem only consider the text similarity features between the newly posted question and the user profile, while ignoring the important statistical features, including the question-specific statistical features and the user-specific statistical features. Moreover, traditional methods are based on unsupervised learning, which makes it hard to introduce rich features into them. This paper proposes a general framework for QR based on learning-to-rank concepts. Training sets consisting of triples (q, asker, answerers) are first collected. Then, by introducing the intrinsic relationships between the asker and the answerers in each CQA session to capture the intrinsic labels/orders of the users with respect to their degree of expertise on the question q, two different methods, an SVM-based method and a RankingSVM-based method, are presented to learn the models with different example creation processes from the training set. Finally, the potential answerers are ranked using the trained models. Extensive experiments conducted on a real-world CQA dataset from Stack Overflow show that both proposed methods outperform the traditional query likelihood language model (QLLM) as well as the state-of-the-art Latent Dirichlet Allocation based model (LDA). Specifically, the RankingSVM-based method achieves statistically significant improvements over the SVM-based method and attains the best performance.