These notes record my own understanding of functions I am encountering for the first time, so they may contain mistakes; I would be very grateful if anyone passing by could point out any errors.
The basic operation of unfold is similar to convolution; the difference is that unfold extracts the region the kernel covers at each step instead of performing the convolution computation.
As in the following code:
import torch
import torch.nn as nn

a = torch.rand(1, 2, 2, 3)  # (batch, channel, H, W)
print(a)
unfold = nn.Unfold(kernel_size=(2, 2), padding=0, stride=1, dilation=1)
b = unfold(a)
print(b)
print(b.shape)  # (1, 8, 2): 8 = channel * kH * kW, 2 = number of patches
print(b.permute(0, 2, 1))
The output is:
tensor([[[[0.7707, 0.4548, 0.6241],
[0.2297, 0.3549, 0.5207]],
[[0.8333, 0.0889, 0.6373],
[0.2227, 0.5256, 0.2432]]]])
tensor([[[0.7707, 0.4548],
[0.4548, 0.6241],
[0.2297, 0.3549],
[0.3549, 0.5207],
[0.8333, 0.0889],
[0.0889, 0.6373],
[0.2227, 0.5256],
[0.5256, 0.2432]]])
torch.Size([1, 8, 2])
tensor([[[0.7707, 0.4548, 0.2297, 0.3549, 0.8333, 0.0889, 0.2227, 0.5256],
[0.4548, 0.6241, 0.3549, 0.5207, 0.0889, 0.6373, 0.5256, 0.2432]]])
Before permuting the dimensions the result may be a little hard to read, but after the permute it is clear how the unfold was performed: the 2x2 kernel first sweeps over the first region and yields the vector 0.7707~0.5256 (its length of 8 comes from concatenating the two original length-4 per-channel vectors in channel order). This is also why the output shape is
[batch_size, channel * kernel_size0 * kernel_size1, L]
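The patch count L follows the same formula as a Conv2d output size, so it can be predicted without running unfold. A small sketch checking this against the example above (the helper `num_patches` is my own name, not a PyTorch function):

```python
import math
import torch
import torch.nn as nn

def num_patches(size, k, pad=0, stride=1, dilation=1):
    # Per-dimension patch count, identical to Conv2d's output-size formula.
    return math.floor((size + 2 * pad - dilation * (k - 1) - 1) / stride) + 1

a = torch.rand(1, 2, 2, 3)
b = nn.Unfold(kernel_size=(2, 2))(a)

L = num_patches(2, 2) * num_patches(3, 2)  # 1 patch along H, 2 along W
assert b.shape == (1, 2 * 2 * 2, L)        # (batch, channel * kH * kW, L)
```

For the 2x3 input here, the height admits only one 2x2 placement and the width admits two, giving L = 2, matching torch.Size([1, 8, 2]) above.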
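Since each column of the unfold output is exactly one kernel-sized region flattened in channel order, a convolution can be reproduced as unfold followed by a matrix multiply. A sketch of this equivalence (the shapes here are my own illustrative choices):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.rand(1, 2, 4, 5)
conv = nn.Conv2d(2, 3, kernel_size=(2, 2), bias=False)

# Each 2x2 patch becomes a column of length channel * kH * kW = 8.
patches = F.unfold(x, kernel_size=(2, 2))    # (1, 8, L) with L = 3 * 4 = 12
w = conv.weight.view(conv.out_channels, -1)  # flatten weights to (3, 8)

# Matrix multiply replaces the sliding dot products, then reshape
# the L columns back onto the 3x4 output grid.
out = (w @ patches).view(1, 3, 3, 4)

assert torch.allclose(out, conv(x), atol=1e-6)
```

This is also how im2col-based convolution implementations work internally: extract patches once, then do one large matmul.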