SIGIR 2016 Improving Language Estimation with the Paragraph Vector Model for Ad-hoc Retrieval

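Summary: This paper analyzes how to improve the ad-hoc retrieval task using the Paragraph Vector (PV) model, and proposes three improvements to the PV model tailored to the IR setting. Experiments show that retrieval with the improved model outperforms language models enhanced with topic models.
Source: SIGIR'16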

Abstract: Incorporating topic level estimation into language models has been shown to be beneficial for information retrieval (IR) models such as cluster-based retrieval and LDA-based document representation. Neural embedding models, such as paragraph vector (PV) models, on the other hand have shown their effectiveness and efficiency in learning semantic representations of documents and words in multiple Natural Language Processing (NLP) tasks. However, their effectiveness in information retrieval is mostly unknown. In this paper, we study how to effectively use the PV model to improve ad-hoc retrieval. We propose three major improvements over the original PV model to adapt it for the IR scenario: (1) we use a document frequency-based rather than the corpus frequency-based negative sampling strategy so that the importance of frequent words will not be suppressed excessively; (2) we introduce regularization over the document representation to prevent the model overfitting short documents along with the learning iterations; and (3) we employ a joint learning objective which considers both the document-word and word-context associations to produce better word probability estimation. By incorporating this enhanced PV model into the language modeling framework, we show that it can significantly outperform the state-of-the-art topic enhanced language models.
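
To make improvement (1) more concrete, here is a minimal sketch contrasting a corpus frequency-based noise distribution with a document frequency-based one. It assumes the simplest proportional form (sampling probability proportional to the raw count, with no smoothing exponent); the exact formulation in the paper may differ. The function name `noise_distributions` and the toy corpus are purely illustrative and do not come from the paper.

```python
from collections import Counter


def noise_distributions(docs):
    """Return (corpus-frequency, document-frequency) noise distributions.

    Assumption: probability is simply proportional to the raw count.
    This is an illustrative simplification, not the paper's exact formula.
    """
    cf = Counter()  # total occurrences of each word across the corpus
    df = Counter()  # number of documents that contain each word
    for doc in docs:
        cf.update(doc)
        df.update(set(doc))  # count each word at most once per document
    cf_total = sum(cf.values())
    df_total = sum(df.values())
    p_cf = {w: c / cf_total for w, c in cf.items()}
    p_df = {w: c / df_total for w, c in df.items()}
    return p_cf, p_df


# Toy corpus: "retrieval" is repeated often, but only inside one document.
docs = [
    ["retrieval", "retrieval", "retrieval", "model"],
    ["language", "model"],
    ["paragraph", "vector", "model"],
]
p_cf, p_df = noise_distributions(docs)
for word in sorted(p_cf):
    print(f"{word:10s} cf-based={p_cf[word]:.3f} df-based={p_df[word]:.3f}")
```

In this toy corpus, "retrieval" is drawn as a negative sample less often under the document frequency-based distribution than under the corpus frequency-based one, because repeated occurrences inside a single document no longer increase its sampling probability.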

Download link: https://ciir-publications.cs.umass.edu/pub/web/getpdf.php?id=1227
