Understanding relative position encoding in Swin Transformer

十末.

Last modified: 2024-08-22 21:47:24

First published: 2024-08-22 21:46:43

Article link: https://blog.csdn.net/qq_44646994/article/details/141438783

import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    # simplified stand-in for Swin's WindowAttention: only the relative position bias setup is kept
    def __init__(self, window_size, num_heads):
        super().__init__()
        self.window_size = window_size  # (Wh, Ww)

        # define a parameter table of relative position bias
        self.relative_position_bias_table = nn.Parameter(
            torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), num_heads))  # (2*Wh-1)*(2*Ww-1), nH

        # the relative position bias is added to the attention scores (see https://blog.csdn.net/weixin_40723264/article/details/127632545)
        # get pair-wise relative position index for each token inside the window
        coords_h = torch.arange(self.window_size[0])  # vertical (row) coordinates
        coords_w = torch.arange(self.window_size[1])  # horizontal (column) coordinates
        coords = torch.stack(torch.meshgrid(coords_h, coords_w, indexing="ij"))  # 2, Wh, Ww
        coords_flatten = torch.flatten(coords, 1)  # 2, Wh*Ww
        # row offsets and column offsets between every pair of tokens
        relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :]  # 2, Wh*Ww, Wh*Ww
        relative_coords = relative_coords.permute(1, 2, 0).contiguous()  # Wh*Ww, Wh*Ww, 2
        # shift to start from 0: row offsets go from [-(Wh-1), Wh-1] to [0, 2(Wh-1)]
        relative_coords[:, :, 0] += self.window_size[0] - 1
        # column offsets go from [-(Ww-1), Ww-1] to [0, 2(Ww-1)]
        relative_coords[:, :, 1] += self.window_size[1] - 1
        # both shifted offsets start at 0, so plain addition would collide, e.g. (1, 0) and (0, 1);
        # scaling the row offset by 2*Ww-1 turns the sum into a unique row-major index
        relative_coords[:, :, 0] *= 2 * self.window_size[1] - 1
        # values run from 0 to 2(Wh-1)(2Ww-1) + 2(Ww-1), i.e. (2Wh-1)(2Ww-1) distinct indices,
        # which equals relative_position_bias_table.shape[0]
        relative_position_index = relative_coords.sum(-1)  # Wh*Ww, Wh*Ww
        self.register_buffer("relative_position_index", relative_position_index)
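The index arithmetic described in the code comments can be written out explicitly. Here token i sits at (h_i, w_i) and token j at (h_j, w_j) inside a W_h × W_w window; these symbols are introduced only for this derivation and correspond to Wh and Ww in the code:

$$
\begin{aligned}
\Delta h &= h_i - h_j \in [-(W_h - 1),\; W_h - 1], \qquad
\Delta w = w_i - w_j \in [-(W_w - 1),\; W_w - 1] \\
\mathrm{idx}(i, j) &= (\Delta h + W_h - 1)\,(2W_w - 1) + (\Delta w + W_w - 1) \\
\mathrm{idx}_{\min} &= 0, \qquad
\mathrm{idx}_{\max} = 2(W_h - 1)(2W_w - 1) + 2(W_w - 1) \\
\mathrm{idx}_{\max} + 1 &= (2W_h - 2)(2W_w - 1) + (2W_w - 1) = (2W_h - 1)(2W_w - 1)
\end{aligned}
$$

The column term Δw + W_w - 1 is at most 2W_w - 2, which is smaller than the stride 2W_w - 1, so it never spills into the row term: distinct offset pairs always get distinct indices, much like a two-digit number in base 2W_w - 1. Without the stride, the shifted offsets (1, 0) and (0, 1) would both sum to 1. For a square m × m window the table therefore has (2m - 1)^2 rows, e.g. 169 for m = 7.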
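To see the index in action outside a module, here is a standalone sketch that repeats the same steps for a small 2 × 3 window and then uses the index the way Swin's attention does: look up one bias per head for every token pair and add it to the attention logits. The helper name `relative_position_index`, the window and head sizes, and the random table and logits are illustrative assumptions; the lookup-and-add step reflects my understanding of the official Swin forward pass rather than a verbatim copy of it.

```python
import torch


def relative_position_index(wh: int, ww: int) -> torch.Tensor:
    """Rebuild a Swin-style relative position index for a wh x ww window (hypothetical helper)."""
    coords = torch.stack(torch.meshgrid(torch.arange(wh), torch.arange(ww), indexing="ij"))  # 2, wh, ww
    coords_flatten = torch.flatten(coords, 1)  # 2, N with N = wh*ww, tokens in row-major order
    rel = coords_flatten[:, :, None] - coords_flatten[:, None, :]  # 2, N, N
    rel = rel.permute(1, 2, 0).contiguous()  # N, N, 2: (row offset, column offset)
    rel[:, :, 0] += wh - 1  # row offset -> [0, 2(wh-1)]
    rel[:, :, 1] += ww - 1  # column offset -> [0, 2(ww-1)]
    rel[:, :, 0] *= 2 * ww - 1  # row-major stride keeps (row, col) pairs distinct
    return rel.sum(-1)  # N, N


wh, ww, num_heads = 2, 3, 4  # small example window and head count (arbitrary)
n = wh * ww
table_size = (2 * wh - 1) * (2 * ww - 1)
idx = relative_position_index(wh, ww)

print(idx.shape)                           # torch.Size([6, 6])
print(idx.min().item(), idx.max().item())  # 0 14
print(idx.unique().numel())                # 15, one entry per table row

# Look up one bias per head for every token pair and add it to the attention logits.
table = torch.randn(table_size, num_heads)  # stand-in for relative_position_bias_table
bias = table[idx.view(-1)].view(n, n, -1).permute(2, 0, 1)  # nH, N, N
attn = torch.randn(1, num_heads, n, n)  # random logits for one window (hypothetical)
attn = attn + bias.unsqueeze(0)
print(attn.shape)                          # torch.Size([1, 4, 6, 6])
```

Token k sits at row k // 3, column k % 3 in this window, so tokens 0 and 1 and tokens 3 and 4 are both horizontally adjacent pairs in the same row; both pairs map to index 6 and therefore read the same learned bias. Every diagonal entry maps to the table's center row 7. This sharing across equal offsets is what makes the bias relative rather than absolute.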