Mathematical Formula and Code
RMSNorm is a refinement of Layer Norm: it drops the re-centering step (mean subtraction), giving up centering invariance in exchange for a lower computational cost.
$$\bar{a}_i = \frac{a_i}{\mathrm{RMS}(a)}\, g_i, \quad \text{where} \quad \mathrm{RMS}(a) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} a_i^2}$$
class LlamaRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        """
        LlamaRMSNorm is equivalent to T5LayerNorm
        """
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        # compute RMS(a)^2, i.e. the mean of squares, in float32 for numerical stability
        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
        # compute a_i / RMS(a)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        # convert into half-precision if necessary
        if self.weight.dtype in [torch.float16, torch.bfloat16]:
            hidden_states = hidden_states.to(self.weight.dtype)
        # compute a_i / RMS(a) * g_i
        return self.weight * hidden_states
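A minimal sketch of how this module might be exercised (the input shape, batch size, and tolerance below are illustrative assumptions, not from the original). With the weight `g` initialized to ones, the output's RMS over the last dimension should be approximately 1:

```python
import torch
import torch.nn as nn

class LlamaRMSNorm(nn.Module):
    """Same definition as above."""
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        if self.weight.dtype in [torch.float16, torch.bfloat16]:
            hidden_states = hidden_states.to(self.weight.dtype)
        return self.weight * hidden_states

norm = LlamaRMSNorm(hidden_size=8)
x = torch.randn(2, 4, 8)              # (batch, seq_len, hidden_size)
with torch.no_grad():
    y = norm(x)

# after normalization, the RMS over the last dimension is ~1
rms = y.pow(2).mean(-1).sqrt()
print(torch.allclose(rms, torch.ones_like(rms), atol=1e-3))  # True
```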
The torch.rsqrt(input, *, out=None) function
Returns a new tensor with the reciprocal of the square root of each element of input.
$$out_i = \frac{1}{\sqrt{input_i}}$$
- Example
x = torch.tensor([1., 4., 9., 16.])
torch.rsqrt(x)
- Result
tensor([1.0000, 0.5000, 0.3333, 0.2500])
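As a sanity check on the definition above, torch.rsqrt(x) should match 1 / torch.sqrt(x) elementwise (a small sketch, assuming float inputs):

```python
import torch

x = torch.tensor([1., 4., 9., 16.])
print(torch.rsqrt(x))  # tensor([1.0000, 0.5000, 0.3333, 0.2500])
print(torch.allclose(torch.rsqrt(x), 1.0 / torch.sqrt(x)))  # True
```

Computing a reciprocal square root in one call is why the RMSNorm forward pass multiplies by torch.rsqrt(variance + eps) instead of dividing by torch.sqrt(...).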