ESIM, the Enhanced Sequential Inference Model, comes from the paper "Enhanced LSTM for Natural Language Inference". As the name suggests, it is an enhanced, LSTM-based model built specifically for natural language inference. How exactly it enhances the LSTM is explained below:
Unlike the previous top models that use very complicated network architectures, we first demonstrate that carefully designing sequential inference models based on chain LSTMs can outperform all previous models. Based on this, we further show that by explicitly considering recursive architectures in both local inference modeling and inference composition, we achieve additional improvement.
The passage above is excerpted from the abstract of the ESIM paper. In short, ESIM outperforms other short-text classification algorithms mainly on two counts:
- A carefully designed sequential inference structure.
- Explicit modeling of both local inference and global inference.
Concretely, the authors use an attention mechanism between the two sentences (inter-sentence attention) to carry out local inference, and then build on those local results to perform global inference.
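A minimal sketch of this soft-alignment step (not the authors' code): given encoded token vectors ā and b̄ for the two sentences, score every pair with a dot product, then summarize the second sentence for each token of the first via a softmax-weighted sum. The 2-d toy vectors below are stand-ins for real BiLSTM states.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# toy encoded vectors (stand-ins for the BiLSTM outputs a_bar, b_bar)
a_bar = [[1.0, 0.0], [0.0, 1.0]]               # premise, 2 tokens
b_bar = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]   # hypothesis, 3 tokens

# e[i][j] = a_bar[i] . b_bar[j] : unnormalized alignment scores
e = [[dot(ai, bj) for bj in b_bar] for ai in a_bar]

# a_tilde[i] = sum_j softmax_j(e[i]) * b_bar[j]:
# a hypothesis summary softly aligned to premise token i
a_tilde = []
for row in e:
    w = softmax(row)
    a_tilde.append([sum(wj * bj[k] for wj, bj in zip(w, b_bar))
                    for k in range(len(b_bar[0]))])
```

Each `a_tilde[i]` is a convex combination of the hypothesis vectors; comparing it with `a_bar[i]` is what gives the "local inference" signal.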
ESIM consists of three parts: input encoding, local inference modeling, and inference composition. As shown in the figure below, ESIM is the left half.
input encoding
Not much to say here: each of the two sentences goes through an embedding layer followed by a BiLSTM. As for why the recently popular BiGRU is not used instead, the author's explanation is that it performed worse in experiments.
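The encoding step above can be sketched in PyTorch (a minimal illustration, not the authors' code; the vocabulary size and dimensions are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embed_dim, hidden = 100, 16, 8

# embedding + BiLSTM, shared by premise and hypothesis
emb = nn.Embedding(vocab_size, embed_dim)
bilstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)

# a batch of 2 token-id sequences of length 5 (stand-in for real sentences)
a = torch.randint(0, vocab_size, (2, 5))
a_bar, _ = bilstm(emb(a))
# a_bar: (batch, seq_len, 2 * hidden) — forward and backward
# hidden states concatenated per token
```

The same `emb` and `bilstm` are applied to both sentences, so the premise and hypothesis are encoded in the same representation space before the attention step.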