Preface
For the principle behind NTK-ALiBi, see: "NTK-ALiBi: Long-Text Extrapolation of ALiBi Position Encoding for Large Models via Interpolation"
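As a rough illustration of the dynamic scaling that the implementation in this post relies on, here is a minimal sketch in plain Python. It only reproduces the two quantities computed at the start of build_dynamically_alibi_tensor (the factor a and the per-head coefficient derived from it). The helper name dynamic_ntk_scale and the head count of 32 are assumptions made for this example and do not come from the Baichuan code.

```python
from typing import Tuple


def dynamic_ntk_scale(seq_len: int, num_heads: int,
                      train_seq_len: int = 4096,
                      a0: float = 1.0) -> Tuple[float, float]:
    """Hypothetical helper: return (a, scale) for a single sequence length."""
    # The factor a grows linearly with the actual sequence length and is
    # clamped to 1.0, so sequences within the training length are unchanged.
    a = max(a0 * seq_len / train_seq_len, 1.0)
    # Per-head coefficient: the (num_heads - 1)-th root of a.
    scale = a ** (1.0 / (num_heads - 1))
    return a, scale


if __name__ == "__main__":
    # num_heads=32 is an illustrative value only.
    for seq_len in (2048, 4096, 8192, 16384):
        a, scale = dynamic_ntk_scale(seq_len, num_heads=32)
        print(f"seq_len={seq_len:6d}  a={a:.2f}  scale={scale:.4f}")
```

For sequences no longer than the training length (4096 here), both values stay at 1.0, so nothing changes for short inputs; scaling only kicks in once the input exceeds the training length.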
Code implementation
Open modeling_baichuan.py in the Baichuan model folder.
1. Add the build_dynamically_alibi_tensor function:
def build_dynamically_alibi_tensor(num_heads, max_pos) -> torch.Tensor:
    """Pseudo code for Dynamic NTK-ALiBi."""
    # dynamic ntk factor according to actual sequence length
    a0 = 1.0
    train_seq_len = 4096
    a = a0 * torch.tensor(max_pos) / train_seq_len  # [batch, 1]
    a = a.masked_fill(a < 1.0, 1.0)  # dynamic step 1: dynamic ntk scaling factor
    scale = a ** (1.0 / (num_heads-1))  # dynamic step 2: coefficient b, for computation convenience
    closest_power_of_2 = 2 ** math.floor(math.log2(num_heads))
    base = torch.tensor(
        2 ** (-(2