Introduction to Ranking SVM

Learning to Rank

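Learning to Rank (LTR) applies machine learning to the ranking problem (for an overview of Learning to Rank, see "(Translation) An Introduction to Learning to Rank"). LTR has three main approaches: PointWise, PairWise, and ListWise. Ranking SVM is a PairWise method proposed by R. Herbrich et al. in 2000. T. Joachims later presented a method that uses Ranking SVM to rank results based on users' clickthrough data (SIGKDD, 2002).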

Ranking SVM

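We can learn a classifier, such as an SVM, that classifies the relative order of object pairs, and then apply that classifier to the ranking task. This is the idea underlying Herbrich et al.'s Ranking SVM method.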
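To make the idea concrete, here is a minimal Python sketch, assuming a linear pairwise classifier has already been trained. The weight vector `W` and the helper names `prefers`, `compare`, and `rank` are hypothetical illustrations, not part of Herbrich et al.'s formulation. The classifier decides whether object `a` should precede object `b` from the sign of $\langle W, a - b \rangle$, and that decision is plugged into a sort as a comparator.

```python
from functools import cmp_to_key

# Hypothetical weight vector of an already-trained linear pairwise classifier.
W = [1.0, -0.5]


def prefers(a, b, w=W):
    """Pairwise classifier: True if object a should be ranked above object b.

    It classifies the difference vector a - b by the sign of <w, a - b>.
    """
    diff = [ai - bi for ai, bi in zip(a, b)]
    return sum(wi * di for wi, di in zip(w, diff)) > 0


def compare(a, b):
    """Comparator for sorted(): a negative result means a comes first."""
    if prefers(a, b):
        return -1
    if prefers(b, a):
        return 1
    return 0


def rank(objects):
    """Rank feature vectors by applying the pairwise classifier to pairs."""
    return sorted(objects, key=cmp_to_key(compare))


# Toy feature vectors (hypothetical).
docs = [[0.2, 0.1], [0.9, 0.3], [0.5, 0.8]]
print(rank(docs))
```

Using the classifier as a comparator gives a consistent ordering here only because a linear classifier applied to difference vectors is transitive; a general nonlinear pairwise classifier could predict cyclic preferences, in which case sorting would not be well defined.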

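Fig. 1 shows an example of a ranking problem. Suppose there are two groups of objects in the feature space (documents associated with two queries), and suppose further that there are three grades (levels). For example, the objects $x_1$, $x_2$, and $x_3$ in the first group are at three different grades. The linear function $f(x) = \langle \omega, x \rangle$ with weight vector $\omega$ can score and rank the objects. Ranking objects with this function is equivalent to projecting them onto the vector $\omega$ and sorting them by their projections. If the ranking function is good, objects at grade 3 should be ranked ahead of objects at grade 2, and so on. Note that objects belonging to different groups cannot be compared with each other.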
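A minimal sketch of scoring and ranking with the linear function, assuming toy 2-D feature vectors and a hand-picked weight vector `w` (both hypothetical, as are the helper names `score` and `rank_by_group`). Objects are grouped by query and sorted only within their own group, mirroring the rule that objects from different groups are not compared.

```python
from collections import defaultdict


def score(w, x):
    """Linear ranking function f(x) = <w, x> (projection of x onto w, up to scale)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def rank_by_group(w, objects):
    """Rank objects within each group (query) by descending score.

    objects: list of (group_id, object_id, feature_vector).
    Objects from different groups are never compared with each other.
    """
    groups = defaultdict(list)
    for gid, oid, x in objects:
        groups[gid].append((score(w, x), oid))
    return {
        gid: [oid for _, oid in sorted(items, key=lambda t: t[0], reverse=True)]
        for gid, items in groups.items()
    }


# Hypothetical weight vector and toy feature vectors for two queries.
w = [1.0, 0.5]
objects = [
    ("q1", "x1", [3.0, 2.0]),
    ("q1", "x2", [2.0, 1.0]),
    ("q1", "x3", [1.0, 0.5]),
    ("q2", "y1", [0.5, 4.0]),
    ("q2", "y2", [1.5, 1.0]),
]
print(rank_by_group(w, objects))
```

In this toy data `x2` and `y1` both score 2.5, which is harmless: they belong to different groups and are never ranked against each other.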

Fig. 1 Example of Ranking Problem

Fig. 2 Transformation to Pairwise Classification

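Fig. 2 shows how the ranking problem described in Fig. 1 can be transformed into a linear SVM classification problem. The difference between two feature vectors in the same group is treated as a new feature vector, e.g., $x_1 - x_2$, $x_1 - x_3$, and $x_2 - x_3$. Labels are then assigned to the new feature vectors; for example, $x_1 - x_2$, $x_1 - x_3$, and $x_2 - x_3$ are labeled positive. Feature vectors at the same grade, or belonging to different groups, are not combined into new feature vectors. A linear SVM classifier can then be trained to classify the new feature vectors shown in Fig. 2.

Geometrically, the margin of the SVM model represents the closest distance between the projections of object pairs at two grades. Note that the separating hyperplane of the SVM classifier passes through the origin, and the positive and negative instances form corresponding pairs: for example, $x_1 - x_2$ and $x_2 - x_1$ are a positive and a negative instance, respectively. The weight vector $\omega$ of the SVM classifier corresponds to the ranking function $f(x) = \langle \omega, x \rangle$ and is used directly to score and rank objects.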
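The pairwise transformation and training step can be sketched in plain Python as below. This is an illustration under stated assumptions: the toy data, the helper names `make_pairs` and `train_linear_svm`, and the choice of a Pegasos-style stochastic subgradient solver for the primal objective $\frac{\lambda}{2}\|\omega\|^2 + \frac{1}{N}\sum_k \max(0, 1 - y_k \langle \omega, z_k \rangle)$ are all hypothetical stand-ins, not the solver used by Herbrich et al. or Joachims. As described above, the model has no bias term, so the hyperplane passes through the origin, and each positive difference vector $x_i - x_j$ is paired with the negative instance $x_j - x_i$.

```python
import random


def make_pairs(data):
    """Turn (group_id, grade, x) triples into pairwise training examples.

    For every pair in the same group with different grades, emit
    (x_i - x_j, +1) when x_i has the higher grade, plus the mirrored
    example (x_j - x_i, -1). Same-grade and cross-group pairs are skipped.
    """
    examples = []
    for i, (gi, ri, xi) in enumerate(data):
        for gj, rj, xj in data[i + 1:]:
            if gi != gj or ri == rj:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            label = 1 if ri > rj else -1
            examples.append((diff, label))
            examples.append(([-d for d in diff], -label))
    return examples


def train_linear_svm(examples, lam=0.01, epochs=200, seed=0):
    """Primal linear SVM without a bias term (hyperplane through the origin).

    Minimizes lam/2 * ||w||^2 + mean hinge loss with Pegasos-style
    stochastic subgradient descent (step size 1 / (lam * t)).
    """
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)
        for k in order:
            t += 1
            eta = 1.0 / (lam * t)
            z, y = examples[k]
            margin = y * sum(wi * zi for wi, zi in zip(w, z))
            # Regularization shrink, then a hinge-loss step if the margin is violated.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * y * zi for wi, zi in zip(w, z)]
    return w


# Hypothetical toy data: (query id, grade, feature vector).
data = [
    ("q1", 3, [3.0, 1.0]),
    ("q1", 2, [2.0, 1.5]),
    ("q1", 1, [1.0, 0.5]),
    ("q2", 2, [2.5, 0.0]),
    ("q2", 1, [1.0, 1.0]),
]
pairs = make_pairs(data)
w = train_linear_svm(pairs)
print("weights:", w)
for query in ("q1", "q2"):
    items = [(grade, x) for gid, grade, x in data if gid == query]
    items.sort(key=lambda it: sum(wi * xi for wi, xi in zip(w, it[1])), reverse=True)
    print(query, "grades in predicted order:", [grade for grade, _ in items])
```

The learned `w` plays the role of $\omega$: each object in a group is scored with $\langle w, x \rangle$ and the group is sorted by that score. Mirroring every pair is redundant for a bias-free linear model, but it keeps the training set balanced and matches the construction in Fig. 2.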

This paper focuses on the problem of Question Routing (QR) in Community Question Answering (CQA), which aims to route newly posted questions to the potential answerers who are most likely to answer them. Traditional methods for this problem consider only the text similarity features between the newly posted question and the user profile, while ignoring important statistical features, including the question-specific statistical feature and the user-specific statistical features. Moreover, traditional methods are based on unsupervised learning, into which it is not easy to introduce rich features. This paper proposes a general framework for QR based on learning-to-rank concepts. Training sets consisting of triples (q, asker, answerers) are first collected. Then, by introducing the intrinsic relationships between the asker and the answerers in each CQA session to capture the intrinsic labels/orders of the users with respect to their degree of expertise on question q, two different methods, an SVM-based method and a RankingSVM-based method, are presented to learn models from the training set with different example-creation processes. Finally, the potential answerers are ranked using the trained models. Extensive experiments conducted on a real-world CQA dataset from Stack Overflow show that both proposed methods outperform the traditional query likelihood language model (QLLM) as well as the state-of-the-art Latent Dirichlet Allocation based model (LDA). Specifically, the RankingSVM-based method achieves statistically significant improvements over the SVM-based method and achieves the best performance.