Linformer: Self-Attention with Linear Complexity
Abstract
The standard self-attention mechanism of the Transformer requires $O(n^2)$ time and space with respect to sequence length. This paper demonstrates that the self-attention matrix can be approximated by a low-rank matrix, and proposes a new self-attention mechanism that introduces two projection matrices, reducing the overall complexity to $O(n)$.

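A minimal single-head PyTorch sketch of this idea (not the authors' reference implementation): two learned projection parameters E and F compress the sequence-length dimension of the keys and values from n down to a fixed k, so the attention map has shape n×k instead of n×n. The class name `LinformerLikeAttention` and the parameters `seq_len` and `k` are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LinformerLikeAttention(nn.Module):
    """Single-head attention with length-wise key/value projections (sketch)."""

    def __init__(self, d_model: int, seq_len: int, k: int = 256):
        super().__init__()
        self.d_model = d_model
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # E and F project the length dimension n -> k for keys and values,
        # which is what brings the cost down from O(n^2) to O(n * k).
        self.E = nn.Parameter(torch.randn(k, seq_len) / seq_len ** 0.5)
        self.F = nn.Parameter(torch.randn(k, seq_len) / seq_len ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, d)
        q = self.q_proj(x)                    # (batch, n, d)
        kp = self.E @ self.k_proj(x)          # (batch, k, d): keys compressed along length
        vp = self.F @ self.v_proj(x)          # (batch, k, d): values compressed along length
        scores = q @ kp.transpose(-2, -1) / self.d_model ** 0.5  # (batch, n, k)
        attn = torch.softmax(scores, dim=-1)
        return attn @ vp                      # (batch, n, d)


# Usage: the output keeps the input shape while attention scores stay n x k.
x = torch.randn(2, 512, 64)
out = LinformerLikeAttention(d_model=64, seq_len=512, k=128)(x)
print(out.shape)  # torch.Size([2, 512, 64])
```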