nn.Conv1d()
A link that may be helpful (on the difference between conv1d and conv2d; a VPN may be needed to access it without problems)
Below is a simple worked example showing how nn.Conv1d() (kernel_size=1, stride=1) carries out its computation, and how it differs from nn.Linear().
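The REPL session below assumes the standard imports; here is a minimal setup sketch (the values of `a` match the printout that follows):

```python
import torch
import torch.nn as nn

# batch of 1, 3 tokens, 6 features per token
a = torch.tensor([[[ 0.7,  0.2, -0.5,  0.6,  0.3, -0.5],
                   [ 0.4, -0.1,  0.4,  0.3, -0.1,  0.4],
                   [ 0.5,  0.1,  0.1,  0.5,  0.2,  0.1]]])
```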
>>> a  # the input tensor: 3 tokens, each token's dimension is 6 (6 features)
tensor([[[ 0.7000,  0.2000, -0.5000,  0.6000,  0.3000, -0.5000],
         [ 0.4000, -0.1000,  0.4000,  0.3000, -0.1000,  0.4000],
         [ 0.5000,  0.1000,  0.1000,  0.5000,  0.2000,  0.1000]]])
>>> a.shape
torch.Size([1, 3, 6])
>>> conv1 = nn.Conv1d(6, 8, kernel_size=1, bias=False)
>>> conv1.weight.shape
torch.Size([8, 6, 1])
>>> conv1.weight
Parameter containing:
tensor([[[ 0.2758],
         [-0.0141],
         [ 0.1317],
         [ 0.0359],
         [-0.3211],
         [-0.0772]],

        [[ 0.2688],
         [ 0.3090],
         [ 0.3472],
         [-0.2065],
         [ 0.3341],
         [ 0.3242]],

        [[-0.3243],
         [-0.0492],
         [ 0.0593],
         [-0.2374],
         [-0.1905],
         [ 0.2047]],

        [[-0.3278],
         [ 0.3429],
         [-0.2414],
         [-0.0237],
         [ 0.2726],
         [ 0.3205]],

        [[-0.0105],
         [ 0.2715],
         [-0.3571],
         [-0.2210],
         [-0.3023],
         [-0.1792]],

        [[-0.3150],
         [ 0.3326],
         [-0.3559],
         [-0.1205],
         [ 0.2597],
         [ 0.3533]],

        [[ 0.3359],
         [-0.2941],
         [-0.3177],
         [-0.3294],
         [ 0.2743],
         [ 0.0651]],

        [[-0.2750],
         [-0.2631],
         [-0.2080],
         [ 0.1378],
         [ 0.3089],
         [-0.2562]]], requires_grad=True)
>>> conv1(a.transpose(1, 2)).transpose(1, 2)  # note the transposes; this is where conv1d differs from linear: [1, 3, 6] --> [1, 3, 8]
tensor([[[ 0.0882, -0.1094, -0.5685, -0.1329,  0.0917, -0.1470,  0.1873,
           0.1623],
         [ 0.1764,  0.2498, -0.0714, -0.1681, -0.2819, -0.2224, -0.0635,
          -0.2589],
         [ 0.0957,  0.1960, -0.2975, -0.0790, -0.2027, -0.1328,  0.0035,
          -0.0796]]], grad_fn=<TransposeBackward0>)
>>> conv1(a.transpose(1, 2)).transpose(1, 2).shape
torch.Size([1, 3, 8])
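The two transposes are needed because nn.Conv1d expects its input as (batch, channels, length), whereas `a` is laid out as (batch, tokens, features). A sketch of the shape bookkeeping, reusing `a` and `conv1` from the session above:

```python
x = a.transpose(1, 2)     # [1, 3, 6] -> [1, 6, 3]: features become channels
y = conv1(x)              # [1, 8, 3]: 8 output channels, length (tokens) 3
out = y.transpose(1, 2)   # [1, 3, 8]: back to (batch, tokens, features)
```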
Below, using the first element of the output tensor as an example, here is how the computation is carried out.

>> The first row of a (the first token):

| 0.7000 | 0.2000 | -0.5000 | 0.6000 | 0.3000 | -0.5000 |
| --- | --- | --- | --- | --- | --- |

>> The corresponding weights in conv1.weight (the first output channel, conv1.weight[0]):

| 0.2758 | -0.0141 | 0.1317 | 0.0359 | -0.3211 | -0.0772 |
| --- | --- | --- | --- | --- | --- |

Multiplying these two elementwise and summing gives the first output element, 0.0882 (bias=0), as verified below:
>>> 0.7 * 0.2758 + 0.2 * (-0.0141) + (-0.5) * 0.1317 + 0.6 * 0.0359 + 0.3 * (-0.3211) + (-0.5) * (-0.0772)
0.08819999999999997
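Because kernel_size=1, every output element is such a dot product between one token and one output channel's weights, so the whole convolution collapses to a matrix multiply. A quick sanity-check sketch, reusing `a` and `conv1` from the session (`torch.allclose` absorbs float rounding):

```python
W = conv1.weight[:, :, 0]                        # [8, 6]: drop the kernel dimension
manual = a @ W.T                                 # [1, 3, 6] @ [6, 8] -> [1, 3, 8]
ref = conv1(a.transpose(1, 2)).transpose(1, 2)
print(torch.allclose(manual, ref))               # True
```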
Now compare conv1d with linear. Conclusion first: with kernel_size=1 and stride=1, conv1d can produce exactly the same result as linear (note that the transpose is still required). Comparing the outputs of linear and conv1d below, the same weights applied to the same input yield identical outputs, so the two layers are interchangeable in, e.g., the feed-forward module of a Transformer.
>>> fc = nn.Linear(6, 8, bias=False)  # initialize a linear layer
# then assign the conv1 weights from above (kernel dimension squeezed out) to fc.weight.data
>>> fc.weight.data = conv1.weight[:, :, 0]
>>> fc.weight.data
tensor([[ 0.2758, -0.0141,  0.1317,  0.0359, -0.3211, -0.0772],
        [ 0.2688,  0.3090,  0.3472, -0.2065,  0.3341,  0.3242],
        [-0.3243, -0.0492,  0.0593, -0.2374, -0.1905,  0.2047],
        [-0.3278,  0.3429, -0.2414, -0.0237,  0.2726,  0.3205],
        [-0.0105,  0.2715, -0.3571, -0.2210, -0.3023, -0.1792],
        [-0.3150,  0.3326, -0.3559, -0.1205,  0.2597,  0.3533],
        [ 0.3359, -0.2941, -0.3177, -0.3294,  0.2743,  0.0651],
        [-0.2750, -0.2631, -0.2080,  0.1378,  0.3089, -0.2562]])
>>> fc.weight.data.shape
torch.Size([8, 6])
>>> fc(a)  # computes a @ W.T: [1, 3, 6] --> [1, 3, 8]
tensor([[[ 0.0882, -0.1094, -0.5685, -0.1329,  0.0917, -0.1470,  0.1873,
           0.1623],
         [ 0.1764,  0.2498, -0.0714, -0.1681, -0.2819, -0.2224, -0.0635,
          -0.2589],
         [ 0.0957,  0.1960, -0.2975, -0.0790, -0.2027, -0.1328,  0.0035,
          -0.0796]]], grad_fn=<UnsafeViewBackward>)
>>> fc(a).shape
torch.Size([1, 3, 8])
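For reference, here is the whole comparison condensed into one self-contained script (a sketch; the sizes 6 and 8 and the random input are arbitrary, mirroring the session above):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
a = torch.randn(1, 3, 6)                            # (batch, tokens, features)

conv = nn.Conv1d(6, 8, kernel_size=1, bias=False)
fc = nn.Linear(6, 8, bias=False)
fc.weight.data.copy_(conv.weight.data[:, :, 0])     # give both layers identical weights

out_conv = conv(a.transpose(1, 2)).transpose(1, 2)  # [1, 3, 8]
out_fc = fc(a)                                      # [1, 3, 8]
print(torch.allclose(out_conv, out_fc, atol=1e-6))  # True
```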