[Paper Reading, CIKM 2021] Learning Multiple Intent Representations for Search Queries

Original Paper

Learning Multiple Intent Representations for Search Queries

More related papers can be found in:

Motivation

The typical use of representation models has a major limitation: they generate only a single representation per query, even though a query may have multiple intents or facets.

  • propose NMIR (Neural Multiple Intent Representations) to support multiple intent representations for each query

Method

Task Description and Problem Formulation

  • training query set: $Q = \{q_1, \cdots, q_n\}$
  • $D_i = \{d_{i1}, \cdots, d_{im}\}$: the top $m$ documents retrieved in response to the query $q_i$
  • $F_i = \{f_{i1}, \cdots, f_{ik_i}\}$: the set of all textual intent descriptions associated with the query $q_i$
    • $k_i$ is the number of intents of $q_i$
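To make the notation concrete, here is a toy instantiation of $Q$, $D_i$, and $F_i$ (all strings are invented for illustration; they are not from the paper):

```python
# Toy instantiation of the notation above (all strings are illustrative).
Q = ["jaguar"]                                   # training queries q_1..q_n
D = {0: ["jaguar cars official site",            # D_i: top-m retrieved docs
         "jaguar big cat habitat",
         "jaguar xf review",
         "jaguar rainforest predator"]}
F = {0: ["jaguar the car brand",                 # F_i: intent descriptions
         "jaguar the animal"]}
k = {i: len(F[i]) for i in F}                    # k_i = number of intents
```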

NMIR Framework: A High-Level Overview

  • one straightforward solution:
    • use an encoder-decoder architecture
      • input: the query $q_i$
      • output: generate multiple intent descriptions by taking the top $k_i$ most likely predictions
    • drawback: these generations are often synonyms or refer to the same concept
  • another straightforward solution:
    • cast the task as a sequence-to-sequence problem (like translation)
      • input: the query $q_i$
      • output: generate all the query intent descriptions concatenated with each other
    • drawbacks:
      • different intent representations are not distinguishable in the last layer of the model
      • most existing effective text encoders cannot represent long token sequences, such as a concatenation of the top $m$ retrieved documents

NMIR Framework:

  • $\phi(\cdot)$ and $\psi(\cdot)$ denote a text encoder and decoder pair

Step 1: NMIR assigns each learned document representation to one of the query intent descriptions $f_{ij} \in F_i$ using a document-intent matching algorithm $\gamma$:

$$C_i^* = \{C_{i1}^*, \cdots, C_{ik_i}^*\} = \gamma(\{\phi(d) : d \in D_i\},\ F_i)$$

  • $C_i^*$ is a set of document clusters; each $C_{ij}^*$ is the subset of $D_i$ assigned to $f_{ij}$ by $\gamma$.
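A minimal sketch of the interface of $\gamma$, using bag-of-words overlap in place of the learned encoder $\phi$ (the paper instead clusters document representations and then maps clusters to intents; `phi`, `sim`, and `gamma` here are illustrative stand-ins):

```python
from collections import Counter

def phi(text):
    """Stand-in for the learned encoder: a bag-of-words vector."""
    return Counter(text.lower().split())

def sim(u, v):
    """Unnormalized dot product between two sparse bag-of-words vectors."""
    return sum(u[w] * v[w] for w in u)

def gamma(docs, intents):
    """Simplified document-intent matching: each document joins the
    cluster of its most similar intent description."""
    clusters = {j: [] for j in range(len(intents))}
    intent_vecs = [phi(f) for f in intents]
    for d in docs:
        dv = phi(d)
        j = max(range(len(intents)), key=lambda j: sim(dv, intent_vecs[j]))
        clusters[j].append(d)
    return clusters
```

Calling `gamma(D_i, F_i)` returns the clusters $C_i^*$ keyed by intent index.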

Step 2: NMIR then transforms the encoded general query representation into its intent representations through a query intent encoder $\zeta$.

  • the representation of the $j^{th}$ query intent is obtained as $\zeta(q_i, C_{ij}^*; \phi)$.

Train: training on a mini-batch $b$ is a gradient descent-based minimization of

$$\sum_{q_i \in b} \sum_{j=1}^{k_i} L_{CE}\big(\psi(\zeta(q_{ij}^*, C_{ij}^*; \phi)),\ f_{ij}\big)$$

  • $q_{ij}^*$ is a concatenation of the query string, the first $j-1$ intent descriptions, and $k_i - j$ mask tokens:

$$q_{ij}^* = q_i \oplus f_{i1} \oplus \cdots \oplus f_{i(j-1)} \oplus \underbrace{\langle \text{mask} \rangle \oplus \cdots \oplus \langle \text{mask} \rangle}_{k_i - j}$$

    • so the $j^{th}$ intent description is generated given the associated cluster $C_{ij}^*$ and the encoded query text plus the past $j-1$ intent descriptions
    • this helps the model avoid regenerating the previous intent descriptions and learn widely distributed representations

where $L_{CE}$ is the cross-entropy loss:

$$L_{CE} = -\sum_{t} \log p\big(f_{ijt} \mid f_{ij,<t},\ \zeta(q_{ij}^*, C_{ij}^*; \phi)\big)$$

  • $f_{ijt}$ is the $t^{th}$ token of the intent description $f_{ij}$.
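The construction of the training input $q_{ij}^*$ described above can be sketched as follows (the mask token string and the whitespace separator are assumptions; BART's actual special tokens may differ):

```python
def build_q_star(query, intents, j, mask_token="<mask>", sep=" "):
    """Input for predicting the j-th intent (1-indexed): the query,
    the first j-1 gold intent descriptions, and k_i - j mask tokens."""
    k = len(intents)
    parts = [query] + intents[:j - 1] + [mask_token] * (k - j)
    return sep.join(parts)
```

For $k_i = 2$, `j = 1` yields the query followed by one mask token, and `j = 2` yields the query followed by the first gold intent description.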

Inference: the $q_{ij}^*$s are constructed differently, since gold intent descriptions are unavailable.

  • first feed "$q_i$ ⟨mask⟩ … ⟨mask⟩" to the model and apply beam search to the decoder's output to obtain the first intent description $f_{i1}'$
  • then use the model's output to iteratively create the input for the next step, "$q_i$ $f_{i1}'$ ⟨mask⟩ … ⟨mask⟩", and repeat this process
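The iterative inference procedure can be sketched as below, where `decode` is a stand-in for the encoder-decoder plus beam search:

```python
def generate_intents(query, k, decode, mask_token="<mask>"):
    """Iteratively generate k intent descriptions: at step j, feed the
    query, the j-1 intents generated so far, and k-j mask tokens, then
    append the decoder's prediction (decode stands in for beam search)."""
    intents = []
    for j in range(1, k + 1):
        q_star = " ".join([query] + intents + [mask_token] * (k - j))
        intents.append(decode(q_star))
    return intents
```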

Model Implementation and Training

[Figure 1 omitted]

Figure 1(a) shows the model architecture.

  • Transformer encoder and decoder architectures (pre-trained BART) are used to implement $\phi$ and $\psi$, respectively

The intent encoding component $\zeta$: an $N'$-layer Guided Transformer model

  • Guided Transformer influences an input representation under the guidance of some external information.
    • here, $\phi(q_{ij})$ is the input representation and $\{\phi(d) : d \in C_{ij}^*\}$ is the external information.

The document-intent matching component $\gamma$: a clustering algorithm

  • encode all the top retrieved documents and create $k_i$ clusters using a clustering algorithm (K-Means):

    $$C_i, M_i = \text{K-Means}(\{\phi(d) : d \in D_i\},\ k_i)$$

    • $C_i = \{C_{i1}, \cdots, C_{ik_i}\}$ denotes the set of clusters; each $C_{ij}$ contains all documents in the $j^{th}$ cluster associated with the query $q_i$.
    • $M_i = \{\mu_{i1}, \cdots, \mu_{ik_i}\}$ is the set of all cluster centroids, with $\mu_{ij} = \text{centroid}(C_{ij})$.
  • K-Means requires the number of clusters as input; two cases are considered at inference time:
    • assume the number of clusters equals a tuned hyper-parameter $k^*$ for all queries
    • replace K-Means with a non-parametric version of K-Means
  • Issue: $\gamma$ requires a one-to-one assignment between the cluster centroids and the query intents in the training data; otherwise all clusters may be assigned to the single most dominant query intent. So an intent identification function $I$ is used:

    • my view: the problem is how to assign centroids to query intents after clustering.

    [equation image omitted: definition of the intent identification function $I$]

  • output:

    [equation image omitted: the clusters $C_{ij}^*$ after re-assignment by $I$]
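Putting the pieces together, a self-contained sketch of this component: plain K-Means over document vectors, followed by a greedy one-to-one matching of centroids to intents (the paper's exact identification function $I$ may differ; raw vectors stand in for the learned $\phi$ here):

```python
def sqdist(u, v):
    """Squared Euclidean distance between two dense vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iters=10):
    """Plain K-Means; initial centroids are simply the first k points."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda j: sqdist(p, centroids[j]))
        # update step: mean of each non-empty cluster
        for j in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == j]
            if members:
                centroids[j] = [sum(x) / len(members) for x in zip(*members)]
    return assign, centroids

def identify(centroids, intent_vecs):
    """Greedy one-to-one matching of centroids to intents, so that no
    single dominant intent can absorb every cluster."""
    mapping, used = {}, set()
    for j, mu in enumerate(centroids):
        candidates = [j2 for j2 in range(len(intent_vecs)) if j2 not in used]
        best = min(candidates, key=lambda j2: sqdist(mu, intent_vecs[j2]))
        mapping[j] = best
        used.add(best)
    return mapping
```

`mapping[j]` gives the intent index assigned to cluster $j$, from which the re-labeled clusters $C_{ij}^*$ follow directly.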

$\gamma$ is not differentiable, so it cannot be part of the network for gradient descent-based optimization; it is therefore moved to an asynchronous process, as in Figure 1(b).

Asynchronous training: clustering the document representations is an efficiency bottleneck, so an asynchronous training method (Figure 1(b)) is used to speed training up.
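A minimal sketch of the asynchronous idea: the slow, non-differentiable clustering runs in a background thread and publishes assignments through a queue, while the training loop consumes the latest snapshot (this threading setup is an illustration, not the paper's implementation):

```python
import threading
import queue

def clustering_worker(doc_batches, recluster, out):
    """Background worker: recomputes document-intent assignments with a
    frozen copy of the encoder while the main loop keeps training."""
    for batch in doc_batches:
        out.put(recluster(batch))

assignments = queue.Queue()
worker = threading.Thread(
    target=clustering_worker,
    args=([["d1", "d2"], ["d3"]],               # toy document batches
          lambda batch: {d: 0 for d in batch},  # stand-in for K-Means + I
          assignments),
)
worker.start()
worker.join()  # a real trainer would poll the queue instead of joining
latest = assignments.get()
```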

Data

  • training data: a weak-supervision approach based on the MIMICS-Click dataset, recently released by Zamani et al. (MIMICS: A Large-Scale Data Collection for Search Clarification)
  • evaluation data: the Qulac dataset