pos_embedding is a three-dimensional tensor, and when I later ran into pos_embedding[:, :(N + 1)] I was pretty confused, so let's write some code and compare.
import torch
from torch import nn
pos_embedding = nn.Parameter(torch.randn(1, 64 + 1, 768))
# N is 64; nn.Parameter wraps the tensor as a trainable parameter
print(pos_embedding)
b = pos_embedding[:, :(64 + 1)]  # slice the first N + 1 entries along dim 1
print(b)
The output:
Parameter containing:
tensor([[[ 1.5456, 1.1878, 0.6838, ..., 0.5616, -0.0925, -0.9745],
[ 0.8832, -0.9164, -0.0560, ..., -0.5402, -0.0292, -0.2129],
[ 1.2765, 0.7601, 1.4828, ..., 1.2544, -0.4517, -2.1844],
...,
[-0.4512, 0.3073, 0.0606, ..., -1.3688, 0.3718, 0.7764],
[ 0.9134, -1.5110, 0.3739, ..., -1.4267, -0.9200, 2.0078],
[ 0.1594, 1.4138, -1.7815, ..., 0.2668, 1.0543, -0.7126]]],
requires_grad=True)
tensor([[[ 1.5456, 1.1878, 0.6838, ..., 0.5616, -0.0925, -0.9745],
[ 0.8832, -0.9164, -0.0560, ..., -0.5402, -0.0292, -0.2129],
[ 1.2765, 0.7601, 1.4828, ..., 1.2544, -0.4517, -2.1844],
...,
[-0.4512, 0.3073, 0.0606, ..., -1.3688, 0.3718, 0.7764],
[ 0.9134, -1.5110, 0.3739, ..., -1.4267, -0.9200, 2.0078],
[ 0.1594, 1.4138, -1.7815, ..., 0.2668, 1.0543, -0.7126]]],
grad_fn=<SliceBackward>)
To the eye there's no change at all, but we can't be sure the parts hidden by the ellipses are identical too, so let's verify:
print(pos_embedding.equal(b))  # True iff the two tensors have the same shape and values
Result: numerically there really is no change. If anything differs, it's the autograd attributes rather than the values: equal() compares only shape and contents, and the printout above already shows b carrying grad_fn=<SliceBackward> while pos_embedding is a plain leaf Parameter.
True
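To poke at that attribute difference directly: slicing produces a view that autograd tracks, so b gets a grad_fn and stops being a leaf tensor, while pos_embedding stays a leaf Parameter. A quick check, reusing the tensors from above:
print(pos_embedding.grad_fn)  # None: a Parameter is a leaf tensor
print(b.grad_fn)              # <SliceBackward ...>: b came from a slicing op
print(pos_embedding.is_leaf)  # True
print(b.is_leaf)              # False
So equal() says True because it only looks at shape and values; the autograd bookkeeping is where the two objects differ.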
Conclusion: no change. The second dimension is exactly 64 + 1 = 65, so [:, :(64 + 1)] selects everything; here the slice is pure filler.
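For contrast, the slice only does real work when its endpoint is smaller than the dimension. A minimal sketch (the 16 + 1 length here is made up purely for illustration):
c = pos_embedding[:, :(16 + 1)]  # keep only the first 17 position embeddings
print(pos_embedding.shape)       # torch.Size([1, 65, 768])
print(c.shape)                   # torch.Size([1, 17, 768])
print(pos_embedding.equal(c))    # False: the shapes no longer match
Presumably that's why ViT implementations write pos_embedding[:, :(n + 1)] at all: the same parameter can serve a shorter token sequence, and when n equals the full patch count, as in our case, the slice degenerates into the no-op we just observed.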