[Mixer]MLP-Mixer: An all-MLP Architecture forvision

image-20210508160154675

1. Motivation

In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them are necessary.

2. Contribution

  • We propose the MLP-Mixer architecture (or “Mixer” for short), a competitive but conceptually and technically simple alternative, that does not use convolutions or self-attention.
  • Mixer’s architecture is based entirely on multi-layer perceptrons (MLPs) that are repeatedly applied across either spatial locations or feature channels.
  • Mixer relies only on basic matrix multiplication routines, changes to data layout (reshapes and transpositions), and scalar non-linearities.

3. Mixer Architecture

如图1所示,为MLP-Mixer的宏观结构,大体上来说,MLP-Mixer由per-patch linear embedding, Mixer layers 以及classifier head组成。对于Mixer layer来说,它由token-mixing MLP以及channel-mixing MLP组成,每一个MLP由2个FC层以及一个GELU非线性激活层组成。其余的成分包括skip-connection,dropout,layer norm,linear classifier head。

现在混合特征的方式有2种:

① at a given spatial location, 在特定空间位置

② between different spatial locations 在不同的空间位置

在CNN中,NxN卷积核和池化操作可以完成②,1x1卷积核可以完成①,更大的kernels①和②都可以perform;在VIT中,self-attention可以完成①,MLP-blocks可以完成②。

这些idea都分离了per-location(channel-mixing)以及cross-location(token-mixing)操作。

MLP-Mixer Layer的输入是a sequence of linearly projected image patches(文中也称为tokens),输入的维度为pathches x channel。

chennel-mixing MLPs以及token-mixing的作用如下:

The channel-mixing MLPs allow communication between different channels; they operate on each token independently and take individual rows of the table as inputs.

The token-mixing MLPs allow communication between different spatial locations (tokens); they operate on each channel independently and take individual columns of the table as inputs. These

image-20210508170625091

具体来说,Mixer采取的输入为S个image patches,每一个patch的维度为C,也就是二维的real-value input table, X ∈ R ( S × C

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值