RankSVM

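The goal of a ranking problem is to produce a sensible ordering of results.
The simplest approach is to cast ranking as classification or regression: the features of each URL, together with its relevance label, form one training sample (for example, for binary classification). This is the pointwise approach. A typical model is logistic regression, whose score is a probability between 0 and 1. However, this approach ignores the relative order among the ranked results.
To take the relative order between items into account, we want f(xi) > f(xj) whenever di > dj, that is, whenever document di is more relevant than dj. This is the pairwise approach.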
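As a minimal sketch of the pointwise idea, the snippet below scores each URL independently with a logistic function and sorts by that score. The weights, bias, feature values, and function names are hypothetical and only serve as an illustration; in practice the weights would be learned from labeled samples.

```python
import math

def pointwise_score(w, b, x):
    """Logistic-regression score: a probability in (0, 1) that the URL is relevant."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def rank_pointwise(w, b, docs):
    """Sort (url, features) items by descending score; each URL is scored on its own."""
    return sorted(docs, key=lambda d: pointwise_score(w, b, d[1]), reverse=True)

# Hypothetical weights and features, for illustration only.
w, b = [2.0, -1.0], 0.0
docs = [("url_a", [0.1, 0.9]), ("url_b", [0.8, 0.2]), ("url_c", [0.5, 0.5])]
ranked = [url for url, _ in rank_pointwise(w, b, docs)]
```

Note that no URL's score depends on any other URL, which is exactly the limitation the pairwise approach addresses.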
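One way to make the pairwise requirement concrete is to count how many pairs a scoring function orders correctly. The sketch below assumes integer relevance labels and a list of model scores; the function name and the toy data are hypothetical.

```python
from itertools import combinations

def pairwise_accuracy(scores, labels):
    """Fraction of pairs with different labels that the scores order correctly."""
    correct = total = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue  # ties carry no ordering constraint
        total += 1
        # The pair is correct when the score gap has the same sign as the label gap.
        if (scores[i] - scores[j]) * (labels[i] - labels[j]) > 0:
            correct += 1
    return correct / total if total else 1.0

labels = [2, 1, 0]          # relevance of three documents for one query
scores = [0.9, 0.3, 0.5]    # hypothetical model scores f(x)
acc = pairwise_accuracy(scores, labels)
```

Here two of the three pairs are ordered correctly; the pair with labels 1 and 0 is inverted.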
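As a classification method, SVM rests on the theory of structural risk minimization and has been shown to work well in practice.
To solve a ranking problem with a linear SVM, the samples must be turned into pairs of URLs under the same query, and the feature vector of each pair becomes the difference between the two samples' feature vectors.
When the relationship between the features and relevance is nonlinear, the features need to be transformed (shaped) during preprocessing.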
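The pair construction can be sketched as follows, assuming each sample is a tuple of (query id, feature vector, relevance label). The tuple layout and the function name are assumptions made for this illustration. Pairs with equal relevance are skipped because they impose no ordering.

```python
from collections import defaultdict
from itertools import combinations

def make_pair_differences(samples):
    """samples: iterable of (query_id, features, relevance).
    Returns xi - xj for every same-query pair with di > dj."""
    by_query = defaultdict(list)
    for qid, x, rel in samples:
        by_query[qid].append((x, rel))
    diffs = []
    for docs in by_query.values():
        for (xa, ra), (xb, rb) in combinations(docs, 2):
            if ra == rb:
                continue  # equal relevance: no preference to learn
            hi, lo = (xa, xb) if ra > rb else (xb, xa)
            diffs.append([h - l for h, l in zip(hi, lo)])
    return diffs

samples = [
    ("q1", [1.0, 0.0], 2),
    ("q1", [0.0, 1.0], 1),
    ("q1", [0.5, 0.5], 1),
    ("q2", [0.2, 0.2], 0),
    ("q2", [0.9, 0.1], 1),
]
diffs = make_pair_differences(samples)
```

Documents from different queries are never paired, since their relevance labels are not comparable.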
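The pairwise problem is solved as follows:
1. Initialize w^0 (it can be all zeros).
2. for t = 1 to T do
a) Randomly draw a pair (di, dj) with di > dj.
b) w^t <- update(w^(t-1), xi - xj)
3. Return the weight vector w.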
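The steps above can be sketched as a short stochastic training loop. This is only an illustration under stated assumptions: the update is taken to be a subgradient step on the L2-regularized hinge loss with step size 1/(λt), pairs are drawn uniformly, and the function names, default values, and toy data are hypothetical.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def train_ranksvm_sgd(diffs, lam=0.1, T=1000, seed=0):
    """diffs: difference vectors xi - xj, each built from a pair with di > dj."""
    rng = random.Random(seed)
    w = [0.0] * len(diffs[0])          # step 1: initialize w0 to all zeros
    for t in range(1, T + 1):          # step 2: T stochastic updates
        d = rng.choice(diffs)          # 2a: draw one pair uniformly at random
        eta = 1.0 / (lam * t)          # step size eta = 1 / (lambda * t)
        violated = dot(w, d) < 1.0     # hinge loss is active when the margin is below 1
        w = [(1.0 - eta * lam) * wk for wk in w]   # shrinkage from the regularizer
        if violated:
            w = [wk + eta * dk for wk, dk in zip(w, d)]  # 2b: move towards xi - xj
    return w                           # step 3: return the learned weights

# Hypothetical difference vectors for three preference pairs.
diffs = [[1.0, 0.0], [0.5, 0.2], [1.0, -0.1]]
w = train_ranksvm_sgd(diffs, lam=0.1, T=200)
```

A learned w that gives a positive inner product with every difference vector ranks each preferred document above its partner.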
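When sampling pairs at random, the pairs can be drawn uniformly, or they can be drawn according to sample weights so that the penalty is folded into the sampling.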
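Both sampling schemes can be sketched with the standard library. The weighting used here, the relevance gap between the two documents, is an assumption chosen only for illustration; any non-negative per-pair penalty could be used instead. The function name and the pairs are hypothetical.

```python
import random

def sample_pair(pairs, weights=None, rng=random):
    """Draw one pair: uniform when weights is None, else proportional to weights."""
    if weights is None:
        return rng.choice(pairs)
    return rng.choices(pairs, weights=weights, k=1)[0]

# Hypothetical pairs: (features of di, features of dj, relevance gap di - dj).
pairs = [([1.0, 0.0], [0.0, 1.0], 1), ([0.9, 0.1], [0.2, 0.2], 3)]
weights = [gap for _, _, gap in pairs]   # assumption: larger gap, sampled more often
rng = random.Random(0)
draws = [sample_pair(pairs, weights, rng) for _ in range(1000)]
share_big_gap = sum(1 for p in draws if p[2] == 3) / len(draws)
```

With these weights the second pair is expected to be drawn about three times as often as the first, so mistakes on it contribute more updates.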
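The simplest parameter update is a direct (sub)gradient step. When $d_i > d_j$ and the margin is violated, i.e. $\langle w^{(t-1)}, x_i - x_j \rangle < 1$:
$$w^{(t)} = w^{(t-1)} - \eta_t \nabla_w = w^{(t-1)} - \eta_t \left( \lambda w^{(t-1)} - (x_i - x_j) \right)$$
with step size
$$\eta_t = \frac{1}{\lambda t}$$
When the margin is not violated, only the regularization term $\lambda w^{(t-1)}$ remains in the gradient.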
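A small worked step may help make the update concrete. The numbers are chosen only for illustration, and the update is written as a descent step on the regularized hinge loss: take $\lambda = 0.1$, $t = 2$, $w^{(1)} = (1, 0)$ and $x_i - x_j = (0.5, 0.2)$ with $d_i > d_j$.

```latex
\begin{aligned}
\langle w^{(1)}, x_i - x_j \rangle &= 1 \cdot 0.5 + 0 \cdot 0.2 = 0.5 < 1
  \quad \text{(margin violated, so the pair contributes)} \\
\eta_2 &= \frac{1}{\lambda t} = \frac{1}{0.1 \cdot 2} = 5 \\
\nabla_w &= \lambda w^{(1)} - (x_i - x_j) = (0.1, 0) - (0.5, 0.2) = (-0.4, -0.2) \\
w^{(2)} &= w^{(1)} - \eta_2 \nabla_w = (1, 0) + (2, 1) = (3, 1) \\
\langle w^{(2)}, x_i - x_j \rangle &= 3 \cdot 0.5 + 1 \cdot 0.2 = 1.7 \ge 1
\end{aligned}
```

After this single step the pair satisfies the margin, so drawing it again immediately would only apply the regularization shrinkage.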
To speed up convergence, $w$ is rescaled so that $\|w\| \le 1/\sqrt{\lambda}$.
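The rescaling can be sketched as a projection onto the ball of radius 1/√λ. This is a minimal illustration; the function name is hypothetical, and applying it after every update is an assumption rather than something the text specifies.

```python
import math

def project_to_ball(w, lam):
    """Rescale w so that ||w|| <= 1 / sqrt(lam); leave it unchanged if already inside."""
    norm = math.sqrt(sum(wk * wk for wk in w))
    radius = 1.0 / math.sqrt(lam)
    if norm <= radius:
        return list(w)
    return [wk * radius / norm for wk in w]

# A vector of norm 50 is pulled back to the ball of radius 10 (lam = 0.01).
w = project_to_ball([30.0, 40.0], lam=0.01)
```

The projection changes only the length of w, not its direction, so the ranking induced by w is unaffected.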

