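Preface
Text matching is widely used in information retrieval, question answering, and dialogue systems; each of these tasks can be framed as matching a query against candidate docs. I picked up the relevant models and methods piecemeal at work, but I still think it is well worth organizing them systematically.
Overall pipeline of a web search engine:
The part highlighted in red is where text matching sits; it is arguably the core of the whole search engine.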
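Traditional methods
Traditional methods rely mainly on hand-crafted features, so the central question is how to design a suitable learning algorithm that learns the best matching model from those features.
Common methods include BM25, TF-IDF, partial least squares (PLS), regularized mapping to latent structures (RMLS), supervised semantic indexing (SSI), the bilingual topic model (BLTM), and statistical machine translation (SMT) models.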
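To make the traditional, hand-crafted side concrete, here is a minimal sketch of BM25 scoring over a toy corpus. The function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the whitespace-tokenized example documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1.0" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus (assumed for the example).
docs = [
    "deep learning for text matching".split(),
    "cooking recipes for pasta".split(),
    "text matching in web search".split(),
]
scores = bm25_scores("text matching".split(), docs)
```

The second document shares no term with the query, so its score is zero; the other two contain both query terms once and have the same length, so they score equally. No learning is involved, which is exactly the limitation the deep models below try to address.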
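Deep text matching
Compared with traditional machine learning methods, deep learning methods improve on four fronts:
- neural networks yield richer semantic representations;
- neural networks make it possible to build more powerful text matching models;
- representations and the matching function are learned end to end;
- multimodal matching: a common semantic space can be learned to represent data from different modalities uniformly.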
DSSM
Learning Deep Structured Semantic Models for Web Search using Clickthrough Data
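DSSM feeds the network letter-trigram "word hashing" features rather than whole words and compares query and doc by cosine similarity. The sketch below shows only those two pieces. It skips the neural layers entirely and computes cosine directly on the trigram counts, which is a simplification; the helper names and example strings are assumptions.

```python
import math
from collections import Counter

def letter_trigrams(word):
    """Split a word into letter trigrams after adding '#' boundary markers."""
    marked = f"#{word}#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]

def hash_text(text):
    """Bag of letter trigrams for a lower-cased, whitespace-tokenized text."""
    return Counter(t for w in text.lower().split() for t in letter_trigrams(w))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

query = hash_text("web search")
close = cosine(query, hash_text("web searching"))  # shares most trigrams
far = cosine(query, hash_text("pasta recipe"))     # shares none
```

Because the trigram vocabulary is far smaller than the word vocabulary, morphological variants such as "search" and "searching" still overlap heavily, which is the motivation for hashing before the dense layers.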
CDSSM
A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval
ARC II
Convolutional Neural Network Architectures for Matching Natural Language Sentences
CNTN
Convolutional neural tensor network architecture for community-based question answering
LSTM-RNN
MV-LSTM
A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations
MatchPyramid
Text Matching as Image Recognition
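The "image" in MatchPyramid is a word-by-word matching matrix between the two texts, which convolutional layers then process. A minimal sketch of building that matrix with an exact-match indicator follows. The row-wise max pooling is only a stand-in for the learned CNN, and the function names and example sentences are assumptions.

```python
def matching_matrix(query_tokens, doc_tokens):
    """Word-by-word interaction matrix using an exact-match indicator."""
    return [[1.0 if q == d else 0.0 for d in doc_tokens] for q in query_tokens]

def max_pool_rows(matrix):
    """Strongest match per query token (stand-in for the learned CNN layers)."""
    return [max(row) for row in matrix]

query = "text matching model".split()
doc = "a model for matching images".split()
matrix = matching_matrix(query, doc)   # 3 rows (query) x 5 columns (doc)
pooled = max_pool_rows(matrix)
```

Swapping the indicator for cosine similarity or a dot product between word embeddings gives a soft version of the same matrix.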
Match-SRNN
Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN
KNRM
End-to-End Neural Ad-hoc Ranking with Kernel Pooling
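Kernel pooling turns a query-doc similarity matrix into a handful of soft match counts, one per RBF kernel. The sketch below assumes the similarity matrix is already given (rows are query terms, columns are doc terms). The kernel means, the shared `sigma`, and the `1e-10` clamp before the log are illustrative assumptions, and the final learned ranking layer is omitted.

```python
import math

def kernel_pooling(sim_matrix, mus, sigma=0.1):
    """Return one log soft-TF feature per kernel mean in `mus`."""
    features = []
    for mu in mus:
        total = 0.0
        for row in sim_matrix:
            # Soft count of doc terms whose similarity lies near `mu`.
            soft_tf = sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) for s in row)
            # Clamp before the log to avoid log(0) when nothing matches.
            total += math.log(max(soft_tf, 1e-10))
        features.append(total)
    return features

# Two query terms, each exactly matching one doc term (toy values).
sims = [[1.0, 0.0],
        [0.0, 1.0]]
features = kernel_pooling(sims, mus=[1.0, 0.5])
```

The kernel centered at 1.0 picks up the exact matches, while the kernel at 0.5 finds almost nothing in this toy matrix and yields a strongly negative feature.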
Conv-KNRM
Convolutional neural networks for soft-matching N-grams in ad-hoc search
DRMM
A Deep Relevance Matching Model for Ad-hoc Retrieval
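DRMM summarizes, for each query term, its similarities to all doc terms as a fixed-length matching histogram, with exact matches kept in a bin of their own. A minimal count-based sketch, assuming equal-width bins over [-1, 1) and illustrative similarity values:

```python
def matching_histogram(sims, n_bins=4):
    """Count similarities in `n_bins` equal-width bins over [-1, 1),
    plus one final bin reserved for exact matches (similarity 1.0)."""
    hist = [0] * (n_bins + 1)
    width = 2.0 / n_bins
    for s in sims:
        if s >= 1.0:
            hist[n_bins] += 1
        else:
            idx = int((s + 1.0) / width)
            # Clamp to guard against out-of-range or rounding edge cases.
            hist[min(max(idx, 0), n_bins - 1)] += 1
    return hist

# Similarities of one query term to six doc terms (toy values).
hist = matching_histogram([1.0, 0.2, 0.7, 0.3, -0.1, 0.1])
```

The histogram length does not depend on document length, so a feed-forward network can consume it directly; normalized or log-count variants are common alternatives to raw counts.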
Siamese-LSTM
Siamese Recurrent Architectures for Learning Sentence Similarity
Learning Text Similarity with Siamese Recurrent Networks
DAM
A Decomposable Attention Model for Natural Language Inference
ESIM
Enhanced LSTM for Natural Language Inference
DUET
Learning to Match using Local and Distributed Representations of Text for Web Search
BiMPM
Bilateral Multi-Perspective Matching for Natural Language Sentences
DIIN
Natural Language Inference over Interaction Space
DRCN
Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information
RE2
Simple and Effective Text Matching with Richer Alignment Features
DUA
Modeling Multi-turn Conversation with Deep Utterance Aggregation
BERT
Text matching as one of its basic tasks: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
An approach to long-text matching: Simple Applications of BERT for Ad Hoc Document Retrieval
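One way to handle documents longer than BERT's input limit, in the spirit of the paper above, is to score individual sentences and interpolate the top sentence scores with a document-level retrieval score. The sketch below shows only that aggregation step. The function name, the weights, `alpha`, and the example numbers are all assumptions; in practice the sentence scores would come from a fine-tuned BERT model and the document score from a first-stage ranker such as BM25.

```python
def aggregate_document_score(doc_score, sentence_scores,
                             weights=(1.0, 0.5, 0.25), alpha=0.5):
    """Interpolate a document-level score with the top sentence-level scores."""
    # Keep as many top sentences as there are weights.
    top = sorted(sentence_scores, reverse=True)[:len(weights)]
    sentence_part = sum(w * s for w, s in zip(weights, top))
    return alpha * doc_score + (1 - alpha) * sentence_part

# Hypothetical scores: one first-stage document score, three sentence scores.
final = aggregate_document_score(2.0, [0.1, 0.9, 0.4], weights=(1.0, 0.5))
```

Scoring sentences separately keeps every model input short, at the cost of ignoring context that spans sentence boundaries.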