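Complexity
Self-Attention complexity
$$\mathrm{Attention}(Q,K,V) = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)V$$
For a sequence of n tokens, QK^T is an n × n matrix. Computing it and multiplying the result by V therefore costs O(n^2 d) time, and storing the attention matrix needs O(n^2) memory, where d is the key dimension.
Linear Attention
Transformer applications
MSA, W-MSA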
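A minimal NumPy sketch of the attention formula above can make the quadratic cost concrete. It assumes single-head attention with Q and K of shape (n, d) and V of shape (n, d_v). The names `attention` and `attention_macs` and the symbols n and d_v are illustrative, not taken from the original. The count covers only the two matrix products and ignores the softmax and scaling.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Softmax(QK^T / sqrt(d)) V for Q, K of shape (n, d) and V of shape (n, d_v)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n, n): n^2 * d multiply-adds, n^2 memory
    return softmax(scores, axis=-1) @ V  # (n, d_v): another n^2 * d_v multiply-adds

def attention_macs(n, d, d_v=None):
    # Multiply-adds for QK^T and (weights @ V) only; softmax and scaling are ignored.
    d_v = d if d_v is None else d_v
    return n * n * d + n * n * d_v
```

Doubling n quadruples `attention_macs`, which is the O(n^2 d) growth stated above. For example, n = 1024 and d = 64 gives 2 · 1024² · 64 = 134,217,728 multiply-adds.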
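The original does not say which linear-attention variant it means. The sketch below assumes one common kernel-based form, the one described in Katharopoulos et al., "Transformers are RNNs" (2020). It replaces the softmax with a positive feature map φ(x) = elu(x) + 1 and uses associativity: it computes φ(K)^T V once, a d × d_v matrix, instead of the n × n matrix φ(Q)φ(K)^T. This lowers the cost to O(n d d_v), but the result is not numerically equal to softmax attention. All function names here are hypothetical. `linear_attention_quadratic` exists only to check that both orderings agree.

```python
import numpy as np

def elu_plus_one(x):
    # phi(x) = elu(x) + 1: x + 1 for x > 0, exp(x) for x <= 0, so always positive.
    # np.minimum keeps exp() from overflowing on large positive inputs.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(Q, K, V, eps=1e-6):
    """O(n * d * d_v): keys and values are summarized once into a (d, d_v) matrix."""
    phi_q, phi_k = elu_plus_one(Q), elu_plus_one(K)  # (n, d) each
    kv = phi_k.T @ V                   # (d, d_v); no n x n matrix is ever formed
    z = phi_q @ phi_k.sum(axis=0)      # (n,) per-row normalizers
    return (phi_q @ kv) / (z[:, None] + eps)

def linear_attention_quadratic(Q, K, V, eps=1e-6):
    """Same result computed the O(n^2) way, for checking only."""
    phi_q, phi_k = elu_plus_one(Q), elu_plus_one(K)
    weights = phi_q @ phi_k.T          # (n, n) similarity matrix
    return (weights @ V) / (weights.sum(axis=1, keepdims=True) + eps)
```

Both functions return the same values up to floating-point rounding. Only the order of the matrix products differs, and that order determines whether the n² term appears.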