Day 10: Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNe

ttppss

Article link: https://blog.csdn.net/ttppss/article/details/117172625

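This is a very short report from Oxford, released around May 6, 2021, so it is quite recent. Its main point is that, in the Transformer architecture, attention may not be what does most of the work. Something else may be responsible, for example (and quite possibly) the inductive bias introduced by the patch embedding, together with carefully chosen training augmentation.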

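The report is brief, and as it stands the evidence does not look very strong. Replacing attention with an MLP in ViT-Base costs only a small drop in accuracy, which may partly suggest that attention is not the main reason Transformers perform well, but the ablation study still feels insufficient. That said, the report describes itself as "inspirational": the hope is that researchers will follow this direction and pin down the real cause.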

Method

Use a feed-forward network to project the patch dimension into a higher-dimensional space
Apply a nonlinearity
Project back to the original space
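
The three steps above can be sketched in a few lines of PyTorch. This is only an illustration under assumed toy sizes (`dim=32`, `expand=4`, and the names `token_mixer` and `changed_elsewhere` are not from the report); it checks that the shape is preserved and that the layer really mixes information across patches.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch, n_tokens, dim = 2, 197, 32   # dim=32 is a toy value chosen for this sketch
expand = 4                          # expansion ratio (assumed, mirrors mlp_ratio=4)

# Steps 1-3: expand the patch dimension, apply a nonlinearity, project back.
token_mixer = nn.Sequential(
    nn.Linear(n_tokens, n_tokens * expand),
    nn.GELU(),
    nn.Linear(n_tokens * expand, n_tokens),
)

x = torch.randn(batch, n_tokens, dim)
# nn.Linear acts on the last axis, so move the patch axis there and back.
y = token_mixer(x.transpose(-2, -1)).transpose(-2, -1)

# Perturb only the first token and see whether other tokens' outputs change.
x2 = x.clone()
x2[:, 0, :] += 1.0
y2 = token_mixer(x2.transpose(-2, -1)).transpose(-2, -1)
changed_elsewhere = not torch.allclose(y[:, 1:, :], y2[:, 1:, :])

print(y.shape, changed_elsewhere)
```

Because the linear layers act along the patch axis, every output patch depends on every input patch, which is the role attention would otherwise play.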

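The feed-forward layer applied over the patch dimension can be viewed as an unusual kind of convolution with a global receptive field and a single channel. Since the feed-forward layer over the feature dimension can in turn be viewed as a $1 \times 1$ convolution, the whole network can be seen as a convolutional network in disguise. Structurally, however, it is closer to a Transformer than to a conventional convolutional network such as ResNet or VGG.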
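
Both convolution views can be checked numerically. The sketch below is not from the report: it uses made-up toy sizes and copies the weights of an `nn.Linear` into an `nn.Conv1d`, once with `kernel_size=1` (feed-forward over features) and once with a single input channel and a kernel spanning all tokens (feed-forward over patches).

```python
import torch
from torch import nn

torch.manual_seed(0)
B, N, D = 2, 5, 8                      # batch, tokens, features (toy sizes)
x = torch.randn(B, N, D)

# (1) Feed-forward over features == 1x1 convolution along the token axis.
linear_feat = nn.Linear(D, 16)
conv_1x1 = nn.Conv1d(D, 16, kernel_size=1)
with torch.no_grad():
    conv_1x1.weight.copy_(linear_feat.weight.unsqueeze(-1))   # (16, D) -> (16, D, 1)
    conv_1x1.bias.copy_(linear_feat.bias)
out_linear = linear_feat(x)                                   # (B, N, 16)
out_conv = conv_1x1(x.transpose(1, 2)).transpose(1, 2)        # (B, N, 16)
same_feat = torch.allclose(out_linear, out_conv, atol=1e-5)

# (2) Feed-forward over patches == convolution with one input channel
#     whose kernel covers all N tokens (a global receptive field).
linear_tok = nn.Linear(N, 7)
conv_full = nn.Conv1d(1, 7, kernel_size=N)
with torch.no_grad():
    conv_full.weight.copy_(linear_tok.weight.unsqueeze(1))    # (7, N) -> (7, 1, N)
    conv_full.bias.copy_(linear_tok.bias)
xt = x.transpose(1, 2)                                        # (B, D, N)
out_lin_tok = linear_tok(xt)                                  # (B, D, 7)
# Treat every feature channel as its own single-channel sequence.
out_conv_tok = conv_full(xt.reshape(B * D, 1, N)).reshape(B, D, 7)
same_tok = torch.allclose(out_lin_tok, out_conv_tok, atol=1e-5)

print(same_feat, same_tok)
```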

from torch import nn
# DropPath (stochastic depth) is not defined in this snippet;
# it is assumed here to come from the timm library.
from timm.models.layers import DropPath

class LinearBlock(nn.Module):
    def __init__(self, dim, mlp_ratio=4., drop=0., drop_path=0., act=nn.GELU,
                 norm=nn.LayerNorm, n_tokens=197):  # 197 = 14**2 + 1 (patches + class token)
        super().__init__()
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        # FF over features
        self.mlp1 = Mlp(in_features=dim, hidden_features=int(dim * mlp_ratio),
                        act_layer=act, drop=drop)
        self.norm1 = norm(dim)
        # FF over patches
        self.mlp2 = Mlp(in_features=n_tokens, hidden_features=int(n_tokens * mlp_ratio),
                        act_layer=act, drop=drop)
        self.norm2 = norm(n_tokens)

    def forward(self, x):
        # x: (batch, n_tokens, dim); residual FF over the feature axis
        x = x + self.drop_path(self.mlp1(self.norm1(x)))
        # swap axes so the next FF acts over the patch axis: (batch, dim, n_tokens)
        x = x.transpose(-2, -1)
        x = x + self.drop_path(self.mlp2(self.norm2(x)))
        # restore the original layout: (batch, n_tokens, dim)
        x = x.transpose(-2, -1)
        return x

class Mlp(nn.Module):
    def __init__(self, in_features, hidden_features, act_layer=nn.GELU, drop=0.):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, in_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.fc1(x)   # expand
        x = self.act(x)   # nonlinearity
        x = self.drop(x)
        x = self.fc2(x)   # project back to in_features
        x = self.drop(x)
        return x
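
To get a feel for the block's size, here is a rough parameter count in plain Python. It follows the layer shapes in the code above with `n_tokens=197` and `mlp_ratio=4`; the feature width `dim=768` is an assumption (the usual ViT-Base width, not stated in this post), and the helper names are made up for the sketch.

```python
def mlp_params(in_features, hidden_features):
    """Weights and biases of two linear layers: in -> hidden -> in."""
    fc1 = in_features * hidden_features + hidden_features
    fc2 = hidden_features * in_features + in_features
    return fc1 + fc2

def layernorm_params(size):
    """LayerNorm has one scale and one shift per normalized element."""
    return 2 * size

dim = 768          # assumed ViT-Base feature width
n_tokens = 197     # 14**2 patches + 1 class token
mlp_ratio = 4.0

feature_mlp = mlp_params(dim, int(dim * mlp_ratio))          # FF over features
token_mlp = mlp_params(n_tokens, int(n_tokens * mlp_ratio))  # FF over patches
block_total = (feature_mlp + layernorm_params(dim)
               + token_mlp + layernorm_params(n_tokens))

print(feature_mlp)   # 4722432
print(token_mlp)     # 311457
print(block_total)   # 5035819
```

Under these assumptions the feed-forward layer over patches accounts for only a small share of the block's parameters; most of them sit in the feed-forward layer over features.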