Before reading Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, it is recommended to first read Character-Level Language Modeling with Deeper Self-Attention.
Reference reading notes:
For background: https://blog.csdn.net/ZY_miao/article/details/112699941
This one explains it well: https://blog.csdn.net/pingpingsunny/article/details/105056297
Then read the XL paper itself.
This one is more detailed: https://blog.csdn.net/Magical_Bubble/article/details/89060213
Two more write-ups:
Paper notes — Transformer-XL [a better encoder for long text]: https://blog.csdn.net/sinat_34611224/article/details/93718378
Literature reading notes on Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context: https://blog.csdn.net/ljp1919/article/details/94577523
Background: the paper is from CMU and Google Brain, by Zihang Dai, Zhilin Yang, et al., posted on arXiv; task: Language Understanding.
Paper: https://arxiv.org/abs/1901.02860
Code: https://github.com/kimiyoung/transformer-xl
Abstract (excerpt): Transformer has the potential to learn long-range dependencies...
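To make the paper's core idea concrete before diving into the links above, here is a minimal sketch of Transformer-XL's segment-level recurrence, assuming PyTorch. The function name attend_with_memory is mine, and the single attention head, the omitted causal mask, and the absence of the paper's relative positional encoding are simplifications for illustration, not the authors' implementation:

import torch

def attend_with_memory(h, mem, w_q, w_k, w_v):
    # h:   hidden states of the current segment, shape (seg_len, d_model)
    # mem: cached hidden states of the previous segment, shape (mem_len, d_model);
    #      detached so no gradient flows into the cache (the paper's stop-gradient).
    h_ext = torch.cat([mem.detach(), h], dim=0)      # (mem_len + seg_len, d_model)
    q = h @ w_q                                      # queries: current segment only
    k = h_ext @ w_k                                  # keys/values: memory + current segment
    v = h_ext @ w_v
    attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
    return attn @ v                                  # (seg_len, d_head)

# Toy usage: process two consecutive segments, carrying memory across them.
d_model, d_head, seg_len = 8, 8, 4
w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
mem = torch.zeros(seg_len, d_model)                  # empty memory for the first segment
for _ in range(2):
    h = torch.randn(seg_len, d_model)
    out = attend_with_memory(h, mem, w_q, w_k, w_v)
    mem = h                                          # cache this segment for the next one

The point of the detach is that attention can look back into the previous segment at evaluation time without backpropagating through it, which is how Transformer-XL extends the effective context beyond a single fixed-length segment.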