Verifying that nn.Linear and a 1x1 nn.Conv2d are equivalent

weixin_45703452

Last modified 2024-03-18 20:46:59

Tags: pytorch, ai, computer vision, python, convolutional neural networks, transformer

First published 2024-01-17 14:00:55

Article link: https://blog.csdn.net/weixin_45703452/article/details/135648685

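Some vision transformers now treat every single pixel as a token and run the transformer block at that granularity. One of the key operations there is nn.Linear applied along the channel dimension, which is equivalent to nn.Conv2d with a 1x1 kernel as long as both share the same weights and bias. The following code verifies this: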

import torch
import torch.nn as nn

b,c,h,w = 1,3,10,10
in_ch,out_ch = c,5


a = torch.randn(b,c,h,w)           # input: [b,c,h,w]

params = torch.randn(out_ch,in_ch) # preset weight, shape [out_ch, in_ch]
bias = torch.randn(out_ch)         # preset bias, shape [out_ch]

# linear layer: load the preset parameters
linear = nn.Linear(in_ch,out_ch)
linear.weight.data = params.data
linear.bias.data = bias.data
# 1x1 conv layer: load the same parameters, reshaped to [out_ch, in_ch, 1, 1]
conv = nn.Conv2d(in_ch,out_ch,1)
conv.weight.data = params.data.unsqueeze(-1).unsqueeze(-1)
conv.bias.data = bias.data

# apply both modules to a
# linear path: [b,c,h,w] -> [b,h*w,c] -> linear -> [b,h*w,out_ch] -> [b,out_ch,h,w]
a_l = linear(a.flatten(2).permute(0,2,1)).permute(0,2,1).view(b,out_ch,h,w)
a_c = conv(a)

print(torch.allclose(a_l,a_c, atol=1e-10))
# output: True

Convolution on an image is, at its core, also a matrix multiplication, so the result is as expected.
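To make the matrix-multiplication view concrete, here is a minimal sketch that reproduces a 1x1 convolution with an explicit per-pixel matrix product over the channel dimension. The shapes, the seed, and the variable names (`y_matmul`, `y_conv`, `same`) are illustrative assumptions, not part of the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                      # arbitrary seed, only for repeatability
x = torch.randn(2, 3, 4, 5)               # [b, c_in, h, w], illustrative sizes
conv = nn.Conv2d(3, 6, kernel_size=1)

# conv.weight has shape [c_out, c_in, 1, 1]; indexing away the two
# trailing singleton dims leaves an ordinary [c_out, c_in] matrix.
w = conv.weight[:, :, 0, 0]

# For every pixel (h, w), multiply the channel vector by the weight matrix,
# then add the bias broadcast over batch and spatial dims.
y_matmul = torch.einsum("oc,bchw->bohw", w, x) + conv.bias.view(1, -1, 1, 1)
y_conv = conv(x)

# float32 results agree up to rounding error
same = torch.allclose(y_matmul, y_conv, atol=1e-5)
print(same)
```

The einsum only contracts the channel index, which is exactly what nn.Linear does on a [b, h*w, c] token layout.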
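In my opinion, some existing vision transformers do not need to be written so elaborately: self-attention and the MLP can be implemented with convolutions directly in the original space, together with a modified LayerNorm. Written this way, within the transformer block, self-attention = non-local and mlp = conv_1x1 Group.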
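As a rough illustration of the MLP half of that idea, the sketch below builds a token-style MLP from nn.Linear and a spatial MLP from 1x1 convolutions, copies the weights across, and checks that the outputs match. The class names `TokenMLP` and `ConvMLP`, the helper `copy_linear_to_conv`, the GELU activation, and the 4x hidden width are all assumptions made for this example; the post does not specify them, and the attention and LayerNorm parts are not covered here.

```python
import torch
import torch.nn as nn

class TokenMLP(nn.Module):
    """Hypothetical MLP acting on tokens of shape [b, n, c]."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

class ConvMLP(nn.Module):
    """Hypothetical counterpart acting on feature maps of shape [b, c, h, w]."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.fc1 = nn.Conv2d(dim, hidden, kernel_size=1)
        self.act = nn.GELU()
        self.fc2 = nn.Conv2d(hidden, dim, kernel_size=1)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

def copy_linear_to_conv(linear, conv):
    # [out, in] -> [out, in, 1, 1]; the bias has the same shape in both layers
    with torch.no_grad():
        conv.weight.copy_(linear.weight[:, :, None, None])
        conv.bias.copy_(linear.bias)

torch.manual_seed(0)
b, c, h, w = 2, 8, 6, 6                    # illustrative sizes
token_mlp, conv_mlp = TokenMLP(c, 4 * c), ConvMLP(c, 4 * c)
copy_linear_to_conv(token_mlp.fc1, conv_mlp.fc1)
copy_linear_to_conv(token_mlp.fc2, conv_mlp.fc2)

x = torch.randn(b, c, h, w)
tokens = x.flatten(2).transpose(1, 2)      # [b, h*w, c], one token per pixel
y_tokens = token_mlp(tokens).transpose(1, 2).reshape(b, c, h, w)
y_conv = conv_mlp(x)

mlp_match = torch.allclose(y_tokens, y_conv, atol=1e-5)
print(mlp_match)
```

The convolutional version skips the flatten/transpose round trip entirely, which is the practical appeal of staying in the [b, c, h, w] layout.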