论文链接:https://aclanthology.org/2021.acl-short.107.pdf
Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling
Abstract
Transformers struggle with long documents because self-attention has quadratic complexity in the input length.
To address this, the paper proposes Hi-Transformer, a hierarchical interactive Transformer for efficient and effective long document modeling.
Model
The overall architecture is shown in the figure:
First, a sentence Transformer learns the semantic representation of each sentence. Then, after adding sentence position embeddings, a document Transformer models the whole document and produces document context-aware sentence representations. Next, another sentence Transformer feeds this global context back into token-level sentence modeling, yielding global context-aware sentence embeddings. Finally, pooling over these sentence embeddings produces the document embedding.
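The four-stage pipeline above can be sketched in plain Python. This is only a shape-and-data-flow illustration: the toy `mix` function below is a hypothetical stand-in for a real Transformer layer (it mixes each position with the mean of all positions), and sentence position embeddings are omitted.

```python
# Stdlib-only sketch of the Hi-Transformer data flow. Assumption: each
# Transformer stage is replaced by a toy mean-based mixer so the
# three-stage pipeline can be followed without a deep learning library.
from statistics import mean

def mix(vectors):
    """Toy stand-in for a Transformer layer: every position is updated
    toward the mean of all positions (a crude form of global mixing)."""
    d = len(vectors[0])
    ctx = [mean(v[i] for v in vectors) for i in range(d)]
    return [[(x + c) / 2 for x, c in zip(v, ctx)] for v in vectors]

def pool(vectors):
    """Mean-pool a list of vectors into one vector."""
    return [mean(col) for col in zip(*vectors)]

def hi_transformer(doc):
    """doc: list of sentences, each a list of token embeddings (lists of floats)."""
    # 1) Sentence Transformer: contextualize tokens within each sentence.
    sents = [mix(tokens) for tokens in doc]
    sent_embs = [pool(tokens) for tokens in sents]
    # 2) Document Transformer: mix sentence vectors across the document
    #    (sentence position embeddings omitted in this sketch).
    doc_aware = mix(sent_embs)
    # 3) Second sentence Transformer: prepend each document-aware sentence
    #    vector to its tokens so global context reaches every token.
    enhanced = [mix([g] + tokens)[1:] for g, tokens in zip(doc_aware, sents)]
    # 4) Pool global context-aware sentence embeddings into a doc embedding.
    return pool([pool(tokens) for tokens in enhanced])

# A document with 2 sentences of 3 tokens each, embedding dim 2.
doc = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
       [[0.5, 0.5], [1.0, 0.0], [0.0, 0.0]]]
emb = hi_transformer(doc)
print(len(emb))  # embedding dim preserved: 2
```

The key design point this mirrors is that tokens never attend across sentence boundaries directly; cross-sentence information travels only through the pooled sentence vectors and back.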
Experiments
Datasets
Experiments use three benchmark document modeling datasets:
The first one is Amazon Electronics (He and McAuley, 2016)(denoted as Amazon), which is for product review rating prediction.
The second one is IMDB (Diao et al., 2014), a widely used dataset for movie review rating prediction.
The third one is the MIND dataset (Wu et al., 2020c), which is a large-scale dataset for news intelligence.
The paper also studies how text length affects model performance and computational cost, comparing the vanilla Transformer with Hi-Transformer.
Experiments show that Hi-Transformer is both more effective and more efficient, with its advantage growing as sequences get longer.
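The efficiency gap can be seen with a back-of-the-envelope self-attention cost comparison. The numbers below are illustrative assumptions, not figures from the paper; cost is taken as proportional to sequence_length squared.

```python
# Illustrative attention-cost comparison for a document of S sentences
# with L tokens each (cost ~ sequence_length ** 2 per attention pass).
def flat_cost(S, L):
    # Vanilla Transformer attends over all S*L tokens at once.
    return (S * L) ** 2

def hi_cost(S, L):
    # Hi-Transformer: two sentence-level passes of L tokens per sentence,
    # plus one document-level pass over S sentence vectors.
    return 2 * S * L ** 2 + S ** 2

S, L = 16, 32           # 512 tokens total
print(flat_cost(S, L))  # 262144
print(hi_cost(S, L))    # 33024
```

Because the quadratic term is applied per sentence rather than over the whole document, the savings grow as documents get longer, which matches the reported trend.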