Stochastic Optimization of Text Set Generation for Learning Multiple Query Intent Representations
Foreword
- This is a short paper from CIKM 2022, so some details are not covered in the paper itself.
- Many related papers are not covered here. More papers can be found in: ShiyuNee/Awesome-Conversation-Clarifying-Questions-for-Information-Retrieval: Papers about Conversation and Clarifying Questions (github.com)
Abs
Learning multiple intent representations for queries has potential applications in facet generation, document ranking, search result diversification, and search explanation. The state-of-the-art model for this task assumes that there is a sequence of intent representations. In this paper, we argue that the model should not be penalized as long as it generates an accurate and complete set of intent representations. Based on this intuition, we propose a stochastic permutation invariant approach for optimizing such networks. We extrinsically evaluate the proposed approach on a facet generation task and demonstrate significant improvements compared to competitive baselines. Our analysis shows that the proposed permutation invariant approach has the highest impact on queries with more potential intents.
Intro
NMIR ignores the permutation-invariant nature of query intents (in its loss function)
- it assumes that the query intents should be generated as a sequence

We propose PINMIR, which treats the query intents as a set rather than a sequence (using a permutation invariant loss)
- permutation invariant losses often consider all possible permutations of the predicted output
- this is computationally inefficient
- so the paper proposes a stochastic variation of our permutation invariant loss
The Permutation Invariant NMIR
Limitations of NMIR
- uses the cross-entropy loss function of seq2seq model, thus expects that the predictions follow the same order as the ground truth.
- uses a greedy algorithm for assigning each cluster to a ground truth query intent during training. Therefore, the model’s performance depends on this heuristic cluster-intent assignment algorithm
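To make the second limitation concrete, here is a minimal sketch of the kind of greedy cluster-intent matching NMIR relies on. The function name, cosine similarity, and centroid inputs are illustrative assumptions, not the paper's exact implementation: it repeatedly pairs the still-unassigned cluster and intent with the highest similarity.

```python
import math

def greedy_cluster_intent_assignment(cluster_vecs, intent_vecs):
    """Greedily pair document clusters with ground-truth intents.

    Hypothetical sketch of NMIR's heuristic: pick the (cluster, intent)
    pair with the highest cosine similarity among unassigned ones,
    remove both, and repeat. Returns {cluster_index: intent_index}.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    unassigned_c = set(range(len(cluster_vecs)))
    unassigned_i = set(range(len(intent_vecs)))
    assignment = {}
    while unassigned_c and unassigned_i:
        # best remaining pair by similarity
        c, i = max(((c, i) for c in unassigned_c for i in unassigned_i),
                   key=lambda p: cos(cluster_vecs[p[0]], intent_vecs[p[1]]))
        assignment[c] = i
        unassigned_c.remove(c)
        unassigned_i.remove(i)
    return assignment
```

Because this matching is a fixed heuristic, any mistake it makes directly misleads the training signal, which is exactly the dependence PINMIR removes.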
In PINMIR, we no longer need the intent-cluster matching algorithm, since the order of generated intents does not matter.
- A side benefit: in reality, documents sometimes address more than one query intent, and assigning only one intent to a document would be sub-optimal (if we also use one cluster to generate one intent description, I think there is no such benefit)

Note: we just no longer need intent-cluster matching, not clustering itself! (my personal view)
- we don't care about the order of generated facet descriptions, so we can use one document cluster to generate any facet description
Loss Function
First, we need to define a permutation invariant loss function for training the model
Common permutation invariant loss functions include Chamfer loss and Hungarian loss
- Chamfer loss is based on Chamfer distance, and it is not applicable to our work due to the design of the decoder for text generation
We extend the Hungarian loss for text set generation. The proposed loss function for a query $q_i$ is (reconstructed from the symbol definitions below):

$$\mathcal{L}(q_i) = \min_{F' \in \pi(F_i)} L_{CE}(v, F')$$

- $\pi(F_i)$ denotes all permutations of ground truth intents for query $q_i$
- $v$ denotes the encoder representation
- $L_{CE}$ is the average seq2seq loss for generating each facet description
- quite expensive to compute
The paper proposes a stochastic variation of this loss: instead of iterating over all possible permutations, it takes $s$ samples from the permutation set and computes the loss based on the sampled query intent sequences
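The stochastic variant can be sketched as follows. Here `pair_loss[j][k]` is a stand-in for the seq2seq cross-entropy of decoding ground-truth intent `k` at output slot `j`; the precomputed matrix form and function names are illustrative assumptions, not the paper's code.

```python
import itertools
import random

def stochastic_permutation_loss(pair_loss, num_intents, s, seed=0):
    """Sample s permutations and return the minimum average loss.

    Avoids iterating over all num_intents! permutations by drawing
    s random assignments of ground-truth intents to output slots.
    """
    rng = random.Random(seed)
    indices = list(range(num_intents))
    best = float("inf")
    for _ in range(s):
        perm = indices[:]
        rng.shuffle(perm)  # one sampled intent-to-slot assignment
        avg = sum(pair_loss[j][k] for j, k in enumerate(perm)) / num_intents
        best = min(best, avg)
    return best

def exact_permutation_loss(pair_loss, num_intents):
    """Exact (expensive) version: minimum over all permutations."""
    return min(
        sum(pair_loss[j][k] for j, k in enumerate(perm)) / num_intents
        for perm in itertools.permutations(range(num_intents))
    )
```

The sampled minimum upper-bounds the exact minimum and approaches it as $s$ grows, which is the efficiency/accuracy trade-off the paper exploits.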
Position Resetting:
Although the order does not matter between intent descriptions, it does matter within each intent description
we modify the standard decoder architecture in the transformer.
- The decoder generates tokens one by one, and each generated token becomes the decoder's input for generating the next token. (modify?)
- we reset the position embedding of the decoder for every new intent description (the position embeddings of the decoder are identical for every intent description)
- Thus, the decoder representations for every permutation of a given set of intents are identical
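The position-resetting idea can be sketched by how position ids would be computed for a decoded token stream. The separator token and function name are illustrative assumptions: the counter restarts at 0 after every separator, so each intent description receives the same position embeddings regardless of where it appears in the set.

```python
def reset_position_ids(token_ids, sep_id):
    """Position ids that restart at 0 for each new intent description.

    A standard decoder numbers positions 0..n-1 across the whole
    sequence; here several facet descriptions are joined by `sep_id`
    (a hypothetical separator token), and the position counter is
    reset whenever a separator is emitted.
    """
    position_ids = []
    pos = 0
    for tok in token_ids:
        position_ids.append(pos)
        if tok == sep_id:
            pos = 0  # next intent starts again at position 0
        else:
            pos += 1
    return position_ids
```

With positions reset this way, swapping two intent descriptions in the target sequence leaves every token's (token, position) pair unchanged, which is why the decoder representations are permutation invariant.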
Experiment
Data: MIMICS-Click
- The top retrieved documents in response to each query are obtained via Bing's public web search API.
- Only the documents' snippets are used to represent a document
variable / max: two settings for the number of facets generated for each query
- The improvements in terms of exact match are marginal, while we observe significant improvements for term overlap F1, BLEU 4-gram, and Set BERT-Score (variable)
- The improvements are statistically significant in nearly all cases, except for term overlap recall and Set BERT-Score recall (max)
- The permutation invariant model has a higher impact on queries with more intents.