A few thoughts on RankSVM (not fully worked out; pointers from more experienced readers welcome)

The primal form of RankSVM:

$$\min_{w}\ \tfrac12\|w\|^2 + C\sum_{(i,j)\in P}\xi_{ij}\quad\text{s.t.}\quad w^\top\bigl(\phi(x_i)-\phi(x_j)\bigr)\ \ge\ 1-\xi_{ij},\ \ \xi_{ij}\ge 0,\ \ \forall\,(i,j)\in P$$ [1]

For comparison, the primal form of the standard SVM:

$$\min_{w,b}\ \tfrac12\|w\|^2 + C\sum_{i}\xi_{i}\quad\text{s.t.}\quad y_i\bigl(w^\top\phi(x_i)+b\bigr)\ \ge\ 1-\xi_{i},\ \ \xi_{i}\ge 0,\ \ \forall\,i$$ [2]

Suppose y_i = 1. Then RankSVM differs from the SVM only in the feature-map part of the constraint: the former uses φ(x_i) − φ(x_j) where the latter uses φ(x_i) alone. Throughout, h denotes the decision function obtained by training the model.

The question I kept coming back to is: exactly which distance is RankSVM maximizing? The paper An efficient method for learning nonlinear ranking SVM functions gives this answer: RankSVM maximizes the distance from the closest sample pairs to the hyperplane [3]. This pushes adjacent values of f(x) as far apart as possible, so their ordering is preserved over a wide range.
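That claim can be made precise with the same margin computation as in the standard SVM. A sketch of the missing step, using φ for the feature map and ξ_ij for the pair slacks as in the primal form [1]: the distance from the pair-difference vector φ(x_i) − φ(x_j) to the hyperplane wᵀz = 0 is

$$\frac{w^\top\bigl(\phi(x_i)-\phi(x_j)\bigr)}{\|w\|}\ \ge\ \frac{1-\xi_{ij}}{\|w\|},$$

so a pair that meets its constraint exactly with ξ_ij = 0 sits at distance 1/‖w‖ from the hyperplane. Minimizing ½‖w‖² therefore maximizes the distance of the closest pair-difference vector, and since wᵀ(φ(x_i) − φ(x_j)) = f(x_i) − f(x_j), this is the same thing as forcing adjacent scores f(x_i), f(x_j) to stay at least 1 apart (before rescaling by ‖w‖).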

Here is how I understand that conclusion: treat φ(x_i) − φ(x_j) as a single vector, and the whole setup can be plugged into the familiar SVM picture. The figure below makes this easier to see.

Since P in the RankSVM formulation above is the set of pairs with y_i − y_j > 0, we only ever have positive examples. The x_1 − x_2 in the figure can be read as φ(x_1) − φ(x_2), treated as a new vector, so RankSVM looks like an SVM trained on positive examples alone (which, admittedly, would not actually train an SVM...). We can therefore rewrite the RankSVM primal in the following form, exactly mirroring the SVM primal, with y_ij = sgn(y_i − y_j), which produces both positive and negative examples:

$$\min_{w}\ \tfrac12\|w\|^2 + C\sum_{i,j}\xi_{ij}\quad\text{s.t.}\quad y_{ij}\,w^\top\bigl(\phi(x_i)-\phi(x_j)\bigr)\ \ge\ 1-\xi_{ij},\ \ \xi_{ij}\ge 0$$ [4]

The margin RankSVM maximizes is the distance from the hyperplane to the nearest φ(x_i) − φ(x_j), i.e. x_1 − x_2 in the figure above, along with the two other rectangle markers near the dashed lines. Once a w satisfying these constraints is found, the decision function h(x) = wᵀφ(x) follows immediately.
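To make the pairwise-transform reading concrete, here is a minimal sketch (my own illustration, not code from any of the cited papers) of a linear RankSVM trained exactly as described: every pair with y_i ≠ y_j becomes an example x_i − x_j with label y_ij = sgn(y_i − y_j), and a plain subgradient descent on the hinge loss stands in for a real SVM solver. φ is taken to be the identity for simplicity.

```python
import numpy as np

def make_pairs(X, y):
    """Build examples (x_i - x_j, sgn(y_i - y_j)) for all pairs with y_i != y_j."""
    diffs, labels = [], []
    n = len(y)
    for i in range(n):
        for j in range(n):
            if y[i] != y[j]:
                diffs.append(X[i] - X[j])
                labels.append(np.sign(y[i] - y[j]))
    return np.array(diffs), np.array(labels)

def train_ranksvm(X, y, C=1.0, lr=0.01, epochs=200, seed=0):
    """Subgradient descent on 1/2||w||^2 + C * sum_ij hinge(y_ij * w.(x_i - x_j))."""
    D, t = make_pairs(X, y)
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    for _ in range(epochs):
        viol = t * (D @ w) < 1                # pairs violating the margin
        grad = w - C * (t[viol, None] * D[viol]).sum(axis=0)
        w -= lr * grad
    return w

# Toy ranking data: higher y means more relevant.
X = np.array([[3.0, 1.0], [2.0, 0.5], [1.0, 0.2], [0.5, 0.1]])
y = np.array([3, 2, 1, 0])
w = train_ranksvm(X, y)
scores = X @ w                                # decision function h(x) = w . phi(x)
print(np.argsort(-scores))                    # → [0 1 2 3]
```

Note that every positive pair also appears flipped with label −1, so the training set is symmetric; this is exactly why the y_ij = sgn(y_i − y_j) rewrite yields both classes.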

References:

[1] T.-M. Kuo, C.-P. Lee, and C.-J. Lin. Large-scale Kernel RankSVM. SIAM International Conference on Data Mining, 2014.

[2] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines (Chinese translation). Publishing House of Electronics Industry, 2004.

[3] H. Yu, J. Kim, Y. Kim, et al. An efficient method for learning nonlinear ranking SVM functions. Information Sciences, 2012, 209:37–48.

[4] T. P. Runarsson. Ordinal Regression in Evolutionary Computation. 2006: 1048–1057.

This paper focuses on the problem of Question Routing (QR) in Community Question Answering (CQA), which aims to route newly posted questions to the potential answerers who are most likely to answer them. Traditional methods for this problem consider only the text similarity features between the newly posted question and the user profile, while ignoring important statistical features, including the question-specific statistical features and the user-specific statistical features. Moreover, traditional methods are based on unsupervised learning, into which it is not easy to introduce rich features. This paper proposes a general framework based on learning-to-rank concepts for QR. Training sets consisting of triples (q, asker, answerers) are first collected. Then, by exploiting the intrinsic relationships between the asker and the answerers in each CQA session to capture the intrinsic labels/orders of the users with respect to their degree of expertise on the question q, two methods, one SVM-based and one RankingSVM-based, are presented to learn models with different example-creation processes from the training set. Finally, the potential answerers are ranked using the trained models. Extensive experiments on a real-world CQA dataset from Stack Overflow show that both proposed methods outperform the traditional query likelihood language model (QLLM) as well as the state-of-the-art Latent Dirichlet Allocation based model (LDA). In particular, the RankingSVM-based method achieves statistically significant improvements over the SVM-based method and attains the best performance.