Mathematical Formula and Code
RMSNorm is a refinement of Layer Norm: it drops the re-centering step (mean subtraction), giving up centering invariance in exchange for a lower computational cost.
$$\bar{a}_i = \frac{a_i}{\mathrm{RMS}(a)}\, g_i, \quad \text{where} \quad \mathrm{RMS}(a) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} a_i^2}$$
class LlamaRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        """
        LlamaRMSNorm is equivalent to T5LayerNorm
        """
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        # compute RMS(a)^2, i.e. the mean of squares, in float32 for numerical stability
        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
        # compute a_i / RMS(a)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        # convert into half-precision if necessary
        if self.weight.dtype in [torch.float16, torch.bfloat16]:
            hidden_states = hidden_states.to(self.weight.dtype)
        # compute a_i / RMS(a) * g_i
        return self.weight * hidden_states
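A minimal sketch of how this module might be exercised (the input shape, batch size, and tolerance below are illustrative assumptions, not from the original). With the weight `g` initialized to ones, the output's RMS over the last dimension should be approximately 1:

```python
import torch
import torch.nn as nn

class LlamaRMSNorm(nn.Module):
    """Same definition as above."""
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        if self.weight.dtype in [torch.float16, torch.bfloat16]:
            hidden_states = hidden_states.to(self.weight.dtype)
        return self.weight * hidden_states

norm = LlamaRMSNorm(hidden_size=8)
x = torch.randn(2, 4, 8)              # (batch, seq_len, hidden_size)
with torch.no_grad():
    y = norm(x)

# after normalization, the RMS over the last dimension is ~1
rms = y.pow(2).mean(-1).sqrt()
print(torch.allclose(rms, torch.ones_like(rms), atol=1e-3))  # True
```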
The torch.rsqrt(input, *, out=None) function
Returns a new tensor with the reciprocal of the square root of each element of input.
$$out_i = \frac{1}{\sqrt{input_i}}$$
- Example
x = torch.tensor([1., 4., 9., 16.])
torch.rsqrt(x)
- Result
tensor([1.0000, 0.5000, 0.3333, 0.2500])
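As a sanity check on the definition above, torch.rsqrt(x) should match 1 / torch.sqrt(x) elementwise (a small sketch, assuming float inputs):

```python
import torch

x = torch.tensor([1., 4., 9., 16.])
print(torch.rsqrt(x))  # tensor([1.0000, 0.5000, 0.3333, 0.2500])
print(torch.allclose(torch.rsqrt(x), 1.0 / torch.sqrt(x)))  # True
```

Computing a reciprocal square root in one call is why the RMSNorm forward pass multiplies by torch.rsqrt(variance + eps) instead of dividing by torch.sqrt(...).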