- Hard Attention / Soft Attention: [attention model survey](https://arxiv.org/abs/1601.06823) (soft/hard sketch below)
- Self-Attention [1](https://arxiv.org/abs/1705.04304) [2](https://arxiv.org/abs/1703.03130): similar to [coverage](https://arxiv.org/pdf/1601.04811.pdf) in NMT and to [ARSG](https://arxiv.org/abs/1506.07503) in ASR; in all of these the attention is autoregressive. [1] applies attention inside the decoder (intra-decoder attention) and suppresses repeated trigrams in the output (self-attention sketch below)
- [Attention Is All You Need](https://arxiv.org/abs/1706.03762) (scaled dot-product sketch below)
- Global Attention / Local Attention (global/local sketch below)
- [Monotonic Online Attention](https://arxiv.org/abs/1704.00784) (monotonic sketch below)
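
A minimal numpy sketch of the soft vs. hard distinction covered by the survey above: soft attention takes a differentiable weighted average over all encoder states, while hard attention samples a single state (and so typically needs REINFORCE-style training). The function and variable names (`attend_soft`, `attend_hard`, `encoder_states`) are illustrative, not from any of the cited papers.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_soft(query, encoder_states):
    """Soft attention: differentiable weighted average over all states."""
    scores = encoder_states @ query              # (T,) dot-product scores
    weights = softmax(scores)                    # attention distribution
    return weights @ encoder_states              # expected context vector

def attend_hard(query, encoder_states, rng=np.random.default_rng(0)):
    """Hard attention: sample one state; non-differentiable, needs REINFORCE."""
    scores = encoder_states @ query
    weights = softmax(scores)
    idx = rng.choice(len(weights), p=weights)    # stochastic choice of one position
    return encoder_states[idx]

T, d = 5, 8                                      # 5 source positions, hidden size 8
H = np.random.randn(T, d)                        # toy encoder states
q = np.random.randn(d)                           # toy decoder query
print(attend_soft(q, H).shape, attend_hard(q, H).shape)
```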
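A rough sketch of the structured self-attention of [2] (Lin et al.): the attention weights are computed from the hidden states themselves rather than from a decoder query, giving a sentence embedding matrix A = softmax(W_s2 tanh(W_s1 H^T)). Shapes follow the paper; the variable names are mine.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def structured_self_attention(H, W_s1, W_s2):
    """Self-attention of Lin et al.: A = softmax(W_s2 tanh(W_s1 H^T)).

    H: (T, 2u) bi-LSTM hidden states; returns the (r, 2u) embedding matrix M = A H.
    """
    A = softmax(W_s2 @ np.tanh(W_s1 @ H.T), axis=-1)   # (r, T) attention rows
    return A @ H                                        # (r, 2u) sentence embedding

T, two_u, d_a, r = 6, 10, 8, 3
H = np.random.randn(T, two_u)
W_s1 = np.random.randn(d_a, two_u)
W_s2 = np.random.randn(r, d_a)
print(structured_self_attention(H, W_s1, W_s2).shape)   # (3, 10)
```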
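A minimal sketch of the scaled dot-product attention at the core of "Attention Is All You Need", Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; multi-head projections and masking are omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (T_q, T_k) query-key similarities
    return softmax(scores, axis=-1) @ V      # (T_q, d_v) attended values

T_q, T_k, d_k, d_v = 4, 6, 8, 8
Q, K, V = np.random.randn(T_q, d_k), np.random.randn(T_k, d_k), np.random.randn(T_k, d_v)
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```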
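A sketch of the global vs. local distinction in the Luong et al. sense (assuming that is the intended reference): global attention scores every source position, while local attention restricts the softmax to a window around an aligned position p_t. The window selection shown (fixed p_t passed in, half-width D) corresponds to the simpler monotonic "local-m" variant.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention(query, H):
    """Global: attend over all T source states."""
    weights = softmax(H @ query)
    return weights @ H

def local_attention(query, H, p_t, D=2):
    """Local-m: attend only to the window [p_t - D, p_t + D] around position p_t."""
    lo, hi = max(0, p_t - D), min(len(H), p_t + D + 1)
    window = H[lo:hi]
    weights = softmax(window @ query)
    return weights @ window

T, d = 10, 8
H = np.random.randn(T, d)
q = np.random.randn(d)
print(global_attention(q, H).shape, local_attention(q, H, p_t=4).shape)
```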
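A sketch of the test-time behaviour described in the monotonic online attention paper: at each output step the decoder scans the encoder states left-to-right from the previously attended index and stops at the first position whose selection probability passes 0.5, so attention can only move forward and decoding is online. Training uses the differentiable expected alignment, which is omitted here; the plain dot-product-plus-sigmoid energy below is a stand-in for the paper's learned energy network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def monotonic_attend(query, H, prev_index):
    """Test-time hard monotonic attention (greedy, online decoding).

    Scan memory left-to-right from prev_index; stop at the first position whose
    selection probability exceeds 0.5. Returns (context, new_index).
    """
    for j in range(prev_index, len(H)):
        p_select = sigmoid(H[j] @ query)      # stand-in for the learned energy
        if p_select >= 0.5:
            return H[j], j                    # attend here; the next step starts at j
    return np.zeros_like(H[0]), len(H)        # never selected: empty context

T, d = 8, 4
H = np.random.randn(T, d)
idx = 0
for step in range(3):                         # the attended index can only move right
    ctx, idx = monotonic_attend(np.random.randn(d), H, idx)
    print(step, idx)
```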