A Concise Illustrated Guide to Attention Mechanism Optimizations
1. Multi-Head Attention (MHA)
Diagram:
Input --> [Attention Head 1]
--> [Attention Head 2]
--> [Attention Head 3]
--> ...
--> [Attention Head N]
--> [Concatenate] --> Output
Formulas:
\text{Output} = \text{Concat}(\text{head}_1, \text{head}_2, \ldots, \text{head}_N)
\text{head}_i = \text{Attention}(Q_i, K_i, V_i)
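A minimal PyTorch sketch of the formulas above, assuming single projection matrices that are split across heads; the names (Wq, Wk, Wv, Wo) and toy sizes are illustrative, not a reference implementation.

```python
import torch
import torch.nn.functional as F

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Every head gets its own slice of the Q/K/V projections."""
    B, T, d_model = x.shape
    d_head = d_model // n_heads
    # Project once, then split the channel dimension into n_heads heads.
    q = (x @ Wq).reshape(B, T, n_heads, d_head).transpose(1, 2)  # (B, H, T, d_head)
    k = (x @ Wk).reshape(B, T, n_heads, d_head).transpose(1, 2)
    v = (x @ Wv).reshape(B, T, n_heads, d_head).transpose(1, 2)
    # Scaled dot-product attention per head: softmax(Q K^T / sqrt(d_head)) V
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5             # (B, H, T, T)
    heads = F.softmax(scores, dim=-1) @ v                        # (B, H, T, d_head)
    # Concat(head_1, ..., head_N), then the output projection.
    return heads.transpose(1, 2).reshape(B, T, d_model) @ Wo

B, T, d_model, n_heads = 2, 8, 64, 4  # hypothetical toy sizes
x = torch.randn(B, T, d_model)
Wq, Wk, Wv, Wo = (torch.randn(d_model, d_model) for _ in range(4))
print(multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads).shape)   # torch.Size([2, 8, 64])
```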
2. Multi-Query Attention (MQA)
Diagram:
Input --> [Shared Keys & Values]
--> [Attention Head 1]
--> [Attention Head 2]
--> [Attention Head 3]
--> ...
--> [Concatenate] --> Output
Formulas:
\text{Output} = \text{Concat}(\text{head}_1, \text{head}_2, \ldots, \text{head}_N)
\text{head}_i = \text{Attention}(Q_i, K_{\text{shared}}, V_{\text{shared}})
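A sketch of MQA under the same assumptions as the MHA example: K and V are projected to a single head and broadcast across all query heads, which is what makes the shared KV cache so small.

```python
import torch
import torch.nn.functional as F

def multi_query_attention(x, Wq, Wk, Wv, n_heads):
    """N query heads, but a single K/V head shared by all of them."""
    B, T, d_model = x.shape
    d_head = d_model // n_heads
    q = (x @ Wq).reshape(B, T, n_heads, d_head).transpose(1, 2)  # (B, H, T, d_head)
    # K and V are projected once to one head; broadcasting shares them across heads.
    k = (x @ Wk).reshape(B, T, 1, d_head).transpose(1, 2)        # (B, 1, T, d_head)
    v = (x @ Wv).reshape(B, T, 1, d_head).transpose(1, 2)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5             # broadcasts to (B, H, T, T)
    heads = F.softmax(scores, dim=-1) @ v
    return heads.transpose(1, 2).reshape(B, T, d_model)

B, T, d_model, n_heads = 2, 8, 64, 4
x = torch.randn(B, T, d_model)
Wq = torch.randn(d_model, d_model)
Wk = torch.randn(d_model, d_model // n_heads)  # single-head K projection
Wv = torch.randn(d_model, d_model // n_heads)  # single-head V projection
print(multi_query_attention(x, Wq, Wk, Wv, n_heads).shape)       # torch.Size([2, 8, 64])
```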
3. Grouped-Query Attention (GQA)
Diagram:
Input --> [Attention Group 1]
--> [Attention Group 2]
--> ...
--> [Concatenate] --> Output
Formulas:
\text{Output} = \text{Concat}(\text{group}_1, \text{group}_2, \ldots, \text{group}_M)
\text{group}_j = \text{Attention}(Q_{\text{group}_j}, K_{\text{group}_j}, V_{\text{group}_j})
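A sketch of GQA, again with illustrative names and sizes: K and V get one head per group, and each group's K/V head is duplicated across the query heads that belong to it. With n_groups = n_heads this reduces to MHA; with n_groups = 1 it reduces to MQA.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, Wq, Wk, Wv, n_heads, n_groups):
    """Query heads are split into n_groups; each group shares one K/V head."""
    B, T, d_model = x.shape
    d_head = d_model // n_heads
    q = (x @ Wq).reshape(B, T, n_heads, d_head).transpose(1, 2)   # (B, H, T, d_head)
    k = (x @ Wk).reshape(B, T, n_groups, d_head).transpose(1, 2)  # (B, G, T, d_head)
    v = (x @ Wv).reshape(B, T, n_groups, d_head).transpose(1, 2)
    # Duplicate each group's K/V for the query heads belonging to that group.
    k = k.repeat_interleave(n_heads // n_groups, dim=1)           # (B, H, T, d_head)
    v = v.repeat_interleave(n_heads // n_groups, dim=1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5
    heads = F.softmax(scores, dim=-1) @ v
    return heads.transpose(1, 2).reshape(B, T, d_model)

B, T, d_model, n_heads, n_groups = 2, 8, 64, 4, 2
x = torch.randn(B, T, d_model)
Wq = torch.randn(d_model, d_model)
Wk = torch.randn(d_model, n_groups * (d_model // n_heads))  # one K head per group
Wv = torch.randn(d_model, n_groups * (d_model // n_heads))  # one V head per group
print(grouped_query_attention(x, Wq, Wk, Wv, n_heads, n_groups).shape)
```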
4. Multi-Head Latent Attention (MLA)
Diagram:
Input --> [Compressed Keys & Values]
--> [Attention Head 1]
--> [Attention Head 2]
--> [Attention Head 3]
--> ...
--> [Concatenate] --> Output
Formulas:
\text{Output} = \text{Concat}(\text{head}_1, \text{head}_2, \ldots, \text{head}_N)
\text{head}_i = \text{Attention}(Q_i, K_{\text{compressed}}, V_{\text{compressed}})
Low-rank joint key-value compression:
K_{\text{compressed}} = U_K \cdot S_K \cdot V_K^T
V_{\text{compressed}} = U_V \cdot S_V \cdot V_V^T
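The formulas above write the compression as an SVD-style factorization. The sketch below uses the equivalent down-projection/up-projection view of low-rank compression: the input is jointly compressed once into a small latent c_kv (the part that would be cached at inference), and K and V are reconstructed from it. All names here (W_down, W_up_k, W_up_v, d_latent) are illustrative assumptions, not the original notation.

```python
import torch
import torch.nn.functional as F

def multi_head_latent_attention(x, Wq, W_down, W_up_k, W_up_v, n_heads):
    """K and V are rebuilt from one shared low-rank latent; only the small
    latent c_kv needs to be cached at inference time."""
    B, T, d_model = x.shape
    d_head = d_model // n_heads
    # Joint low-rank compression of keys and values (rank d_latent << d_model).
    c_kv = x @ W_down                                                   # (B, T, d_latent)
    # Up-project the latent back to full-width K and V for all heads.
    k = (c_kv @ W_up_k).reshape(B, T, n_heads, d_head).transpose(1, 2)  # (B, H, T, d_head)
    v = (c_kv @ W_up_v).reshape(B, T, n_heads, d_head).transpose(1, 2)
    q = (x @ Wq).reshape(B, T, n_heads, d_head).transpose(1, 2)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5
    heads = F.softmax(scores, dim=-1) @ v
    return heads.transpose(1, 2).reshape(B, T, d_model)

B, T, d_model, n_heads, d_latent = 2, 8, 64, 4, 16  # hypothetical toy sizes
x = torch.randn(B, T, d_model)
Wq = torch.randn(d_model, d_model)
W_down = torch.randn(d_model, d_latent)   # joint compression: the low-rank factor
W_up_k = torch.randn(d_latent, d_model)
W_up_v = torch.randn(d_latent, d_model)
print(multi_head_latent_attention(x, Wq, W_down, W_up_k, W_up_v, n_heads).shape)
```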
Overview
- MHA: every head operates independently with its own Q/K/V; the results are concatenated.
- MQA: all heads share one set of keys and values, which is computed only once, shrinking the KV cache and the K/V computation.
- GQA: query heads are split into groups, each sharing one set of keys and values, a middle ground between MHA and MQA.
- MLA: keys and values are jointly compressed into a low-rank latent, cutting memory and compute requirements (see the size comparison below).
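To make the memory comparison concrete, here is a back-of-the-envelope count of cached elements per token per layer; the dimensions are hypothetical, chosen only to show the relative sizes.

```python
# Cached elements per token per layer, with hypothetical dimensions.
n_heads, d_head, n_groups, d_latent = 32, 128, 8, 512

mha_cache = 2 * n_heads * d_head   # full K and V for every head -> 8192
mqa_cache = 2 * d_head             # one shared K/V head         ->  256
gqa_cache = 2 * n_groups * d_head  # one K/V head per group      -> 2048
mla_cache = d_latent               # only the joint latent       ->  512
print(mha_cache, mqa_cache, gqa_cache, mla_cache)
```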
These methods optimize the attention mechanism through different strategies, improving computational efficiency and reducing memory consumption, making Transformer models more efficient in practice.