[AI Expert Training Camp] Reproducing the S2MLP Paper


Abstract

        Recently, vision Transformer (ViT) and its follow-up works abandoned convolution in favor of the self-attention operation, attaining accuracy comparable to or even higher than CNNs. More recently, MLP-Mixer abandoned both convolution and self-attention, proposing an architecture containing only MLP layers. To achieve cross-patch communication, it devises an additional token-mixing MLP alongside the channel-mixing MLP. It achieves promising results when trained on extremely large-scale datasets, but it cannot match the outstanding performance of CNNs and ViT when trained on medium-scale datasets such as ImageNet-1K and ImageNet-21K. The performance drop of MLP-Mixer motivates us to rethink the token-mixing MLP. We find that the token-mixing MLP is a variant of depthwise convolution with a global receptive field and a spatial-specific configuration, but the global receptive field and spatial specificity make the token-mixing MLP prone to over-fitting. In this paper, we propose a novel pure-MLP architecture, spatial-shift MLP (S2-MLP). Unlike MLP-Mixer, our S2-MLP contains only channel-mixing MLPs. We exploit a spatial-shift operation for communication between patches; it has a local receptive field and is spatially agnostic, and it is parameter-free and computationally efficient. The proposed S2-MLP attains higher recognition accuracy than MLP-Mixer when trained on the ImageNet-1K dataset. Meanwhile, S2-MLP matches the excellent performance of ViT on ImageNet-1K with a considerably simpler architecture and fewer FLOPs and parameters.

1. S2MLP

1.1 Preface

        S2MLP is a spatial-shift MLP architecture for vision proposed by Baidu. The appeal of MLP-Mixer is that it further removes inductive biases (the locality of CNNs and the attention mechanism of Transformers) and learns with a pure-MLP architecture; trained on extremely large-scale data, it can match or even beat CNN and Transformer architectures. However, when trained and tested only on ImageNet-1K or ImageNet-21K, its performance is actually not that good. Why? Although MLP-Mixer increases the freedom of learning by imposing no constraints such as locality, that very freedom makes it easier to overfit ("But the freedom from breaking a chain is accompanied by the risk of over-fitting"), so it only becomes broadly applicable when trained on extremely large-scale data. We therefore still need some constraints or guidance to help the model train better on small- and medium-scale data.
        S2MLP removes the token-mixing MLP of MLP-Mixer and keeps only the channel-mixing MLP. A spatial-shift operation moves features from different positions into alignment within the same channels, so the channel-mixing MLP gains a local receptive field (the token-mixing MLP, by contrast, has a global receptive field).
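        To make the "local receptive field" point concrete, here is a minimal sketch (my own illustration, not from the paper's code) showing that the four-direction spatial shift can equivalently be written as a fixed 3×3 depthwise convolution, where each channel group gets a one-hot kernel that copies one neighboring pixel. Note this convolution version zero-pads the border, whereas the assignment version used later keeps the original border values:

import paddle
import paddle.nn.functional as F

def shift_as_depthwise_conv(x):
    # x: (b, c, h, w) with c divisible by 4
    b, c, h, w = x.shape
    g = c // 4
    # One fixed one-hot 3x3 kernel per channel: parameter-free, local receptive field.
    kernel = paddle.zeros([c, 1, 3, 3], dtype=x.dtype)
    kernel[:g, 0, 1, 0] = 1.           # group 1: copy left neighbor   -> shift right
    kernel[g:2 * g, 0, 1, 2] = 1.      # group 2: copy right neighbor  -> shift left
    kernel[2 * g:3 * g, 0, 0, 1] = 1.  # group 3: copy top neighbor    -> shift down
    kernel[3 * g:, 0, 2, 1] = 1.       # group 4: copy bottom neighbor -> shift up
    return F.conv2d(x, kernel, padding=1, groups=c)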

1.2 Overall Architecture

        S2MLP is similar to MLP-Mixer; the overall network structure is shown in the figure below, where the S2MLP block is repeated multiple times. Let us first walk through how S2MLP works at a high level:

  1. For an input RGB image of size $3 \times H \times W$, slice it into patches of size $p \times p$, giving $\frac{H}{p} \times \frac{W}{p}$ patches, and flatten each patch into a vector of dimension $3p^2$. A patch-wise fully-connected layer (effectively a $1 \times 1$ convolution over the flattened patches) projects $3p^2$ down to $c$, followed by an $\mathrm{LN}$ layer for normalization. Letting $h=\frac{H}{p}$ and $w=\frac{W}{p}$, we now have an $h \times w \times c$ matrix. The authors use $p=16$, so for a $224 \times 224$ input $h \times w = 196$ (see the shape check after this list).
  2. Four fully-connected layers, two residual connections, two GELU activations, two LN layers, and one Spatial-shift operation; this configuration matches MLP-Mixer. The only difference is that the two fully-connected layers of the token-mixing MLP are replaced by a channel-wise fully-connected layer, the Spatial-shift operation, and then another channel-wise fully-connected layer. Note that the fully-connected 3 layer usually expands the number of nodes, and fully-connected 4 projects back; the expansion ratio of fully-connected 3 is set to 4 (consistent with ViT and related work).
  3. Finally, the output passes through global average pooling and a fully-connected layer to produce the prediction.
    (Figure: overall architecture of S2MLP)
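    A quick shape check of step 1 (an illustrative sketch, not part of the training code): the patch-wise fully-connected layer is a Conv2D whose kernel size and stride both equal the patch size, so with $p=16$ on a $224 \times 224$ input we get $h = w = 14$ and $h \times w = 196$ patches:

import paddle
from paddle import nn

p, c = 16, 384                                           # patch size and embedding dim
img = paddle.randn([1, 3, 224, 224])                     # dummy RGB input
patch_embed = nn.Conv2D(3, c, kernel_size=p, stride=p)   # patch-wise FC layer
print(patch_embed(img).shape)  # [1, 384, 14, 14] -> 14 * 14 = 196 patches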

1.3 S2MLP Block

        Now that we have an overall picture of the S2MLP network, let us look at how a single block is designed:

  1. When the feature map enters the block, a fully-connected layer is first applied along the channel dimension, exchanging information at each individual position; this is effectively a $1 \times 1$ convolution. It is followed by a GELU activation.
  2. Next, the Spatial-shift operation is applied to the feature map. It is a fixed operation requiring no parameters, and because it covers all four directions it is isotropic. Concretely, the feature map is split into 4 groups along the channel dimension (as in grouped convolution); the first group is shifted right by one unit, the second left by one, the third down by one, and the fourth up by one. This can be implemented with simple assignments, with padding that keeps the original border values (AS-MLP later showed that zero-padding also works very well). In this way, features from different positions are aligned within the same channels. Another fully-connected layer ($1 \times 1$ convolution) then fuses the local receptive-field information, realizing communication between different patches. This is followed by an LN normalization and a residual connection (note there is no activation here; if you look closely, the positions of the activations and normalizations match MLP-Mixer).
            The pseudocode of the Spatial-shift operation is given below.
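    A Python transcription of the paper's pseudocode figure (my rendition; the batch dimension is omitted as in the paper, and border positions keep their original values):

def spatial_shift(x):
    # x: (w, h, c); channels are split into 4 groups, one per direction
    w, h, c = x.shape
    x[1:, :, :c//4] = x[:w-1, :, :c//4]              # group 1: shift right
    x[:w-1, :, c//4:c//2] = x[1:, :, c//4:c//2]      # group 2: shift left
    x[:, 1:, c//2:3*c//4] = x[:, :h-1, c//2:3*c//4]  # group 3: shift down
    x[:, :h-1, 3*c//4:] = x[:, 1:, 3*c//4:]          # group 4: shift up
    return x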

2. Code Reproduction

2.1 Install and Import the Required Libraries

!pip install einops-0.3.0-py3-none-any.whl
!pip install paddlex
%matplotlib inline
import paddle
import numpy as np
import matplotlib.pyplot as plt
from paddle.vision.datasets import Cifar10
from paddle.vision.transforms import Transpose
from paddle.io import Dataset, DataLoader
from paddle import nn
import paddle.nn.functional as F
import paddle.vision.transforms as transforms
import os
from matplotlib.pyplot import figure
import paddlex
from einops.layers.paddle import Rearrange, Reduce
from einops import rearrange

2.2 Create the Dataset

train_tfm = transforms.Compose([
    transforms.Resize((230, 230)),
    transforms.ColorJitter(brightness=0.2,contrast=0.2, saturation=0.2),
    paddlex.transforms.MixupImage(),
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.RandomHorizontalFlip(0.5),
    transforms.RandomRotation(20),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

test_tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
paddle.vision.set_image_backend('cv2')
# Use the Cifar10 dataset
train_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='train', transform = train_tfm, )
val_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='test',transform = test_tfm)
print("train_dataset: %d" % len(train_dataset))
print("val_dataset: %d" % len(val_dataset))
train_dataset: 50000
val_dataset: 10000
batch_size=128
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, drop_last=False, num_workers=4)

2.3 Build the Model

2.3.1 Label Smoothing
class LabelSmoothingCrossEntropy(nn.Layer):
    def __init__(self, smoothing=0.1):
        super().__init__()
        self.smoothing = smoothing

    def forward(self, pred, target):

        confidence = 1. - self.smoothing
        log_probs = F.log_softmax(pred, axis=-1)
        idx = paddle.stack([paddle.arange(log_probs.shape[0]), target], axis=1)
        nll_loss = paddle.gather_nd(-log_probs, index=idx)
        smooth_loss = paddle.mean(-log_probs, axis=-1)
        loss = confidence * nll_loss + self.smoothing * smooth_loss

        return loss.mean()
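A quick sanity check of this loss (illustrative only): with smoothing=0 it should reduce to standard cross-entropy, and with smoothing=0.1 it should differ slightly:

logits = paddle.randn([4, 10])
targets = paddle.randint(0, 10, [4])
print(LabelSmoothingCrossEntropy(smoothing=0.)(logits, targets))   # equals CE
print(F.cross_entropy(logits, targets))                            # reference value
print(LabelSmoothingCrossEntropy(smoothing=0.1)(logits, targets))  # smoothed loss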
2.3.2 DropPath
def drop_path(x, drop_prob=0.0, training=False):
    """
    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ...
    """
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = paddle.to_tensor(1 - drop_prob)
    shape = (paddle.shape(x)[0],) + (1,) * (x.ndim - 1)
    random_tensor = keep_prob + paddle.rand(shape, dtype=x.dtype)
    random_tensor = paddle.floor(random_tensor)  # binarize
    output = x.divide(keep_prob) * random_tensor
    return output


class DropPath(nn.Layer):
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)
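An illustrative check of the behavior (not part of the training script): in train mode roughly drop_prob of the samples are zeroed while the survivors are scaled by 1/keep_prob, preserving the expectation; in eval mode the layer is the identity:

dp = DropPath(drop_prob=0.5)
x = paddle.ones([8, 4])
dp.train()
print(dp(x))  # about half the rows become 0, the rest are scaled to 2.0
dp.eval()
print(dp(x))  # identity: all ones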
2.3.3 Build the S2MLP Model
class PostNormResidual(nn.Layer):
    def __init__(self, dim, fn, dpr=0.):
        super().__init__()
        self.fn = fn
        self.norm = nn.LayerNorm(dim)
        self.droppath = DropPath(dpr) if dpr>0. else nn.Identity()

    def forward(self, x):
        return x + self.droppath(self.norm(self.fn(x)))
class Spatial_Shift(nn.Layer):
    def __init__(self):
        super().__init__()
    
    def forward(self, x):
        b, w, h, c = x.shape
        # Split the channels into 4 groups and shift each group by one unit
        # in a different direction; border positions keep their original values.
        x[:,1:,:,:c//4] = x[:,:w-1,:,:c//4]              # group 1: shift right
        x[:,:w-1,:,c//4:c//2] = x[:,1:,:,c//4:c//2]      # group 2: shift left
        x[:,:,1:,c//2:c*3//4] = x[:,:,:h-1,c//2:c*3//4]  # group 3: shift down
        x[:,:,:h-1,3*c//4:] = x[:,:,1:,3*c//4:]          # group 4: shift up
        return x
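A small sanity check of the layer (illustrative): push a single one-hot pixel through the shift and watch two of the four channel groups move in different directions:

x = paddle.zeros([1, 5, 5, 8])  # (b, w, h, c); c = 8 -> 4 groups of 2 channels
x[:, 2, 2, :] = 1.
y = Spatial_Shift()(x)
print(paddle.nonzero(y[0, :, :, 0]))  # group 1: [[3, 2]], moved +1 along w
print(paddle.nonzero(y[0, :, :, 7]))  # group 4: [[2, 1]], moved -1 along h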
class S2Block(nn.Layer):
    def __init__(self, d_model, expansion_factor = 4, dropout = 0., dpr=0.):
        super().__init__()

        self.token_mixer = PostNormResidual(d_model,
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), Spatial_Shift(), nn.Linear(d_model, d_model)),
            dpr=dpr)
        
        self.channel_mixer = PostNormResidual(d_model, 
            nn.Sequential(nn.Linear(d_model, d_model * expansion_factor), nn.GELU(), nn.Linear(d_model * expansion_factor, d_model)),
            dpr=dpr)

    def forward(self, x):
        x = x.transpose([0, 2, 3, 1]) # b h w c
        x = self.token_mixer(x)
        x = self.channel_mixer(x)
        x = x.transpose([0, 3, 1, 2]) # b c h w
        return x
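A shape sanity check (illustrative): an S2Block maps a (b, c, h, w) feature map to the same shape, mixing tokens through the spatial shift and channels through the MLPs:

blk = S2Block(d_model=384)
feat = paddle.randn([2, 384, 14, 14])
print(blk(feat).shape)  # [2, 384, 14, 14]: shape is preserved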
class S2MLP(nn.Layer):
    def __init__(self, image_size=224, patch_size=16, in_channel=3, num_classes=1000, 
        d_model=384, depth=36, expansion_factor=4, dropout=0., dpr=0.):
        super().__init__()

        assert image_size % patch_size == 0, 'image must be divisible by patch size'

        self.patcher = nn.Sequential(nn.Conv2D(in_channel, d_model, patch_size, patch_size), Rearrange('b c h w -> b h w c'),
            nn.LayerNorm(d_model), Rearrange('b h w c -> b c h w'))

        self.stage = nn.Sequential(*[S2Block(d_model, expansion_factor, dropout=dropout, dpr=dpr) for i in range(depth)])

        self.mlphead = nn.Sequential(nn.AdaptiveAvgPool2D(1), nn.Flatten(1), nn.Linear(d_model, num_classes))

        self.apply(self._init_weights)

    def _init_weights(self, m):
        zeros_ = nn.initializer.Constant(value=0.)
        ones_ = nn.initializer.Constant(value=1.)
        xavier_normal_ = nn.initializer.XavierNormal()
        if isinstance(m, (nn.Linear, nn.Conv2D)):
            xavier_normal_(m.weight)  # initializers are applied by calling them on the parameter
            if m.bias is not None:
                zeros_(m.bias)
        elif isinstance(m, (nn.LayerNorm, nn.GroupNorm, nn.BatchNorm, nn.BatchNorm2D)):
            zeros_(m.bias)
            ones_(m.weight)

    def forward(self, x):

        x = self.patcher(x)
        x = self.stage(x)
        x = self.mlphead(x)

        return x
2.3.4 Model Parameters
# S2MLP-deep
model = S2MLP(image_size=224, patch_size=16, in_channel=3, num_classes=10, 
        d_model=384, depth=36, expansion_factor=4, dropout=0., dpr=0.1)
paddle.summary(model, (batch_size, 3, 224, 224))


# S2MLP-wide
model = S2MLP(image_size=224, patch_size=16, in_channel=3, num_classes=10, 
        d_model=768, depth=12, expansion_factor=4, dropout=0., dpr=0.)
paddle.summary(model, (batch_size, 3, 224, 224))


# S2MLP-ours
model = S2MLP(image_size=224, patch_size=16, in_channel=3, num_classes=10, 
        d_model=384, depth=12, expansion_factor=4, dropout=0., dpr=0.)
paddle.summary(model, (batch_size, 3, 224, 224))


2.4 Training

Since the models in the original paper are too large, they are scaled down here; the S2MLP-ours variant is used.

learning_rate = 0.001
n_epochs = 100
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model'

# S2MLP-ours
model = S2MLP(image_size=224, patch_size=16, in_channel=3, num_classes=10, 
        d_model=384, depth=12, expansion_factor=4, dropout=0., dpr=0.)

criterion = LabelSmoothingCrossEntropy()

scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=scheduler, weight_decay=1e-5)

best_acc = 0.0
val_acc = 0.0
loss_record = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}}   # for recording loss
acc_record = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}}      # for recording accuracy

loss_iter = 0
acc_iter = 0

for epoch in range(n_epochs):
    # ---------- Training ----------
    model.train()
    train_num = 0.0
    train_loss = 0.0

    val_num = 0.0
    val_loss = 0.0
    accuracy_manager = paddle.metric.Accuracy()
    val_accuracy_manager = paddle.metric.Accuracy()
    print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
    for batch_id, data in enumerate(train_loader):
        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)

        logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        accuracy_manager.update(acc)
        if batch_id % 10 == 0:
            loss_record['train']['loss'].append(loss.numpy())
            loss_record['train']['iter'].append(loss_iter)
            loss_iter += 1

        loss.backward()

        optimizer.step()
        scheduler.step()
        optimizer.clear_grad()
        
        train_loss += loss
        train_num += len(y_data)

    total_train_loss = (train_loss / train_num) * batch_size
    train_acc = accuracy_manager.accumulate()
    acc_record['train']['acc'].append(train_acc)
    acc_record['train']['iter'].append(acc_iter)
    acc_iter += 1
    # Print the information.
    print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))

    # ---------- Validation ----------
    model.eval()

    for batch_id, data in enumerate(val_loader):

        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)
        with paddle.no_grad():
          logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        val_accuracy_manager.update(acc)

        val_loss += loss
        val_num += len(y_data)

    total_val_loss = (val_loss / val_num) * batch_size
    loss_record['val']['loss'].append(total_val_loss.numpy())
    loss_record['val']['iter'].append(loss_iter)
    val_acc = val_accuracy_manager.accumulate()
    acc_record['val']['acc'].append(val_acc)
    acc_record['val']['iter'].append(acc_iter)
    
    print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))

    # ===================save====================
    if val_acc > best_acc:
        best_acc = val_acc
        paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
        paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))

print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))


2.5 Results Analysis

def plot_learning_curve(record, title='loss', ylabel='CE Loss'):
    ''' Plot learning curve of your CNN '''
    maxtrain = max(map(float, record['train'][title]))
    maxval = max(map(float, record['val'][title]))
    ymax = max(maxtrain, maxval) * 1.1
    mintrain = min(map(float, record['train'][title]))
    minval = min(map(float, record['val'][title]))
    ymin = min(mintrain, minval) * 0.9

    total_steps = len(record['train'][title])
    x_1 = list(map(int, record['train']['iter']))
    x_2 = list(map(int, record['val']['iter']))
    figure(figsize=(10, 6))
    plt.plot(x_1, record['train'][title], c='tab:red', label='train')
    plt.plot(x_2, record['val'][title], c='tab:cyan', label='val')
    plt.ylim(ymin, ymax)
    plt.xlabel('Training steps')
    plt.ylabel(ylabel)
    plt.title('Learning curve of {}'.format(title))
    plt.legend()
    plt.show()
2.5.1 Loss and Accuracy Curves
plot_learning_curve(loss_record, title='loss', ylabel='CE Loss')


plot_learning_curve(acc_record, title='acc', ylabel='Accuracy')


import time
work_path = 'work/model'
model = S2MLP(image_size=224, patch_size=16, in_channel=3, num_classes=10, 
        d_model=384, depth=12, expansion_factor=4, dropout=0., dpr=0.)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):

    x_data, y_data = data
    labels = paddle.unsqueeze(y_data, axis=1)
    with paddle.no_grad():
        logits = model(x_data)
bb = time.time()
print("Throughout:{}".format(int(len(val_dataset)//(bb - aa))))
Throughout:545
2.5.2 Predictions vs. Ground-Truth Labels
def get_cifar10_labels(labels):  
    """返回CIFAR10数据集的文本标签。"""
    text_labels = [
        'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog',
        'horse', 'ship', 'truck']
    return [text_labels[int(i)] for i in labels]
def show_images(imgs, num_rows, num_cols, pred=None, gt=None, scale=1.5):  
    """Plot a list of images."""
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if paddle.is_tensor(img):
            ax.imshow(img.numpy())
        else:
            ax.imshow(img)
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if pred or gt:
            ax.set_title("pt: " + pred[i] + "\ngt: " + gt[i])
    return axes
work_path = 'work/model'
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
model = S2MLP(image_size=224, patch_size=16, in_channel=3, num_classes=10, 
        d_model=384, depth=12, expansion_factor=4, dropout=0., dpr=0.)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
logits = model(X)
y_pred = paddle.argmax(logits, -1)
X = paddle.transpose(X, [0, 2, 3, 1])
axes = show_images(X.reshape((18, 224, 224, 3)), 1, 18, pred=get_cifar10_labels(y_pred), gt=get_cifar10_labels(y))
plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).


2.5.3 Visualization
!pip install interpretdl
import interpretdl as it
work_path = 'work/model'
model = S2MLP(image_size=224, patch_size=16, in_channel=3, num_classes=10, 
        d_model=384, depth=12, expansion_factor=4, dropout=0., dpr=0.)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
lime = it.LIMECVInterpreter(model, use_cuda=True)
lime_weights = lime.interpret(X.numpy()[3], interpret_class=y.numpy()[3], batch_size=100, num_samples=10000, visual=True)
100%|██████████| 10000/10000 [00:52<00:00, 161.77it/s]


lime_weights = lime.interpret(X.numpy()[13], interpret_class=y.numpy()[13], batch_size=100, num_samples=10000, visual=True)
100%|██████████| 10000/10000 [00:56<00:00, 176.81it/s]


Summary

        S2MLP uses an extremely simple spatial-shift operation in place of the token-mixing MLP, keeping only channel-mixing MLPs while still enabling communication between patches.

        Future work: further test the classification performance on larger datasets.

Open-source link: https://aistudio.baidu.com/aistudio/projectdetail/4284111?shared=1
