1. Motivation
In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them is necessary.
2. Contribution
- We propose the MLP-Mixer architecture (or “Mixer” for short), a competitive but conceptually and technically simple alternative that does not use convolutions or self-attention.
- Mixer’s architecture is based entirely on multi-layer perceptrons (MLPs) that are repeatedly applied across either spatial locations or feature channels.
- Mixer relies only on basic matrix multiplication routines, changes to data layout (reshapes and transpositions), and scalar non-linearities.
3. Mixer Architecture
As shown in Figure 1, at a high level MLP-Mixer consists of a per-patch linear embedding, a stack of Mixer layers, and a classifier head. Each Mixer layer contains a token-mixing MLP and a channel-mixing MLP, and each MLP consists of two fully-connected (FC) layers and a GELU non-linearity. The remaining components are skip-connections, dropout, layer normalization, and the linear classifier head.
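Below is a minimal sketch of one Mixer layer, assuming PyTorch (the paper's own pseudocode is in JAX/Flax). The names `MlpBlock` and `MixerLayer` follow the paper; `tokens_mlp_dim` and `channels_mlp_dim` are the hidden widths of the two MLPs.

```python
import torch
import torch.nn as nn

class MlpBlock(nn.Module):
    """Two fully-connected layers with a GELU in between."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

class MixerLayer(nn.Module):
    """Token-mixing MLP followed by channel-mixing MLP, each with a
    pre-LayerNorm and a skip-connection."""
    def __init__(self, num_tokens, num_channels, tokens_mlp_dim, channels_mlp_dim):
        super().__init__()
        self.norm1 = nn.LayerNorm(num_channels)
        self.token_mlp = MlpBlock(num_tokens, tokens_mlp_dim)
        self.norm2 = nn.LayerNorm(num_channels)
        self.channel_mlp = MlpBlock(num_channels, channels_mlp_dim)

    def forward(self, x):                       # x: (batch, S tokens, C channels)
        # Token mixing: transpose so the MLP acts across the S dimension.
        y = self.norm1(x).transpose(1, 2)       # (batch, C, S)
        y = self.token_mlp(y).transpose(1, 2)   # back to (batch, S, C)
        x = x + y                               # skip-connection
        # Channel mixing: the MLP acts across the C dimension of each token.
        return x + self.channel_mlp(self.norm2(x))
```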
Modern architectures mix features in two ways:
① at a given spatial location;
② between different spatial locations.
In CNNs, N×N convolutions (for N > 1) and pooling perform ②, 1×1 convolutions perform ①, and larger kernels perform both ① and ②; in ViT, the self-attention layers perform both ① and ②, while the MLP blocks perform ①.
These designs all separate the per-location (channel-mixing) operations from the cross-location (token-mixing) operations, as the small experiment below illustrates.
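To make the CNN case concrete, here is a small check, again assuming PyTorch: perturb one spatial location and see which outputs change. A 1×1 convolution only mixes channels at each location (①), while a 3×3 convolution also mixes across neighbouring locations (②).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 4, 8, 8)                 # (batch, channels, H, W)
x2 = x.clone()
x2[0, :, 3, 3] = 0.0                        # perturb a single spatial location

conv1x1 = nn.Conv2d(4, 4, kernel_size=1)
conv3x3 = nn.Conv2d(4, 4, kernel_size=3, padding=1)

# 1x1 conv: only the perturbed location changes -> per-location mixing only.
diff1 = (conv1x1(x) - conv1x1(x2)).abs().sum(dim=1)[0]
print((diff1 > 1e-6).nonzero())             # just (3, 3)

# 3x3 conv: the whole 3x3 neighbourhood changes -> cross-location mixing too.
diff3 = (conv3x3(x) - conv3x3(x2)).abs().sum(dim=1)[0]
print((diff3 > 1e-6).nonzero())             # the nine positions around (3, 3)
```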
The input to the Mixer layers is a sequence of linearly projected image patches (also referred to as tokens), arranged as a patches × channels table.
The roles of the channel-mixing MLPs and the token-mixing MLPs are as follows:
The channel-mixing MLPs allow communication between different channels; they operate on each token independently and take individual rows of the table as inputs.
The token-mixing MLPs allow communication between different spatial locations (tokens); they operate on each channel independently and take individual columns of the table as inputs.
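Continuing the sketch above, the row/column distinction reduces to which axis of the (S × C) table the shared MLP is applied along; token mixing is implemented by transposing. Shapes only, batch dimension omitted; the sizes are assumed for illustration.

```python
import torch
import torch.nn as nn

S, C = 196, 512                      # e.g. 196 patches, 512 channels (assumed)
X = torch.randn(S, C)                # the input table

channel_mlp = nn.Linear(C, C)        # acts on rows: each token's C channels
token_mlp = nn.Linear(S, S)          # acts on columns: each channel's S tokens

rows_mixed = channel_mlp(X)          # (S, C): every row transformed independently
cols_mixed = token_mlp(X.T).T        # (S, C): every column transformed independently
print(rows_mixed.shape, cols_mixed.shape)
```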
Specifically, Mixer takes as input S image patches, each of dimension C, i.e. a two-dimensional real-valued input table X ∈ R^{S×C}.
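A minimal per-patch linear embedding that produces this table, again assuming PyTorch: for an H × W image with patch resolution P, the number of patches is S = HW/P², and each patch is projected to C hidden channels. The sizes below are assumed for illustration.

```python
import torch
import torch.nn as nn

H = W = 224
P = 16                               # patch resolution (assumed, as in ViT)
C = 512                              # hidden dimension (assumed)
S = (H // P) * (W // P)              # 196 patches

img = torch.randn(1, 3, H, W)        # (batch, RGB, H, W)

# A Conv2d with kernel_size = stride = P is equivalent to splitting the image
# into non-overlapping P x P patches and applying one shared linear projection.
patch_embed = nn.Conv2d(3, C, kernel_size=P, stride=P)

X = patch_embed(img)                 # (1, C, H/P, W/P)
X = X.flatten(2).transpose(1, 2)     # (1, S, C): the input table
print(X.shape)                       # torch.Size([1, 196, 512])
```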