EMO: Rethinking Mobile Block for Efficient Neural Models


Abstract

        This paper focuses on designing efficient models with low parameter counts and FLOPs for dense prediction tasks. Although CNN-based lightweight methods have achieved impressive results over years of research, the trade-off between model accuracy and limited resources still leaves room for improvement. The paper rethinks the essential unity between the efficient inverted residual block in MobileNetV2 and the effective Transformer in ViT, abstracts from them a general Meta-Mobile Block concept, and argues that, while instances share the same framework, the specific instantiation is crucial to model performance. Motivated by this, the authors propose a simple yet efficient modern inverted residual mobile block (iRMB), which inherits the efficiency of CNNs for modeling short-range dependencies and the dynamic modeling capability of Transformers for learning long-range interactions. They further design a ResNet-like, 4-stage efficient model, EMO, built solely from a series of iRMBs for dense prediction tasks. Extensive experiments on the ImageNet-1K, COCO 2017, and ADE20K benchmarks show that EMO outperforms existing methods; for example, EMO-1M/2M/5M reach 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SOTA CNN- and Transformer-based models while trading off accuracy and efficiency well.

1. EMO

        The authors revisit the inverted residual block of lightweight CNNs together with the self-attention and feed-forward modules of the Transformer, and abstract from them a new meta-structure, the Meta-Mobile Block. They argue that the earlier modules are all instantiations of this meta-structure, and show that the specific instantiation matters greatly for model performance. The Meta-Mobile Block is formulated as follows:

$$
\begin{aligned}
\boldsymbol{X}_{e} &= \operatorname{MLP}_{e}(\boldsymbol{X}) && \left(\in \mathbb{R}^{\lambda C \times H \times W}\right) \\
\boldsymbol{X}_{f} &= \mathcal{F}\left(\boldsymbol{X}_{e}\right) && \left(\in \mathbb{R}^{\lambda C \times H \times W}\right) \\
\boldsymbol{X}_{s} &= \operatorname{MLP}_{s}\left(\boldsymbol{X}_{f}\right) && \left(\in \mathbb{R}^{C \times H \times W}\right) \\
\boldsymbol{Y} &= \boldsymbol{X} + \boldsymbol{X}_{s} && \left(\in \mathbb{R}^{C \times H \times W}\right)
\end{aligned}
$$
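The four equations above compose into one generic block. The following is a minimal numpy sketch; the channel-mixing matrices `W_e`/`W_s` and the choice of `f` are illustrative placeholders, not the paper's trained modules:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, lam = 8, 4, 4, 4                      # channels, height, width, expansion ratio

# 1x1 "MLPs" mix channels at every spatial position; here they are plain matrices.
W_e = rng.standard_normal((lam * C, C)) * 0.1  # expanding MLP_e
W_s = rng.standard_normal((C, lam * C)) * 0.1  # shrinking MLP_s

def meta_mobile_block(x, f):
    """Y = X + MLP_s(F(MLP_e(X))) for x of shape (C, H, W)."""
    x_e = np.einsum('dc,chw->dhw', W_e, x)     # expand: (lam*C, H, W)
    x_f = f(x_e)                               # efficient operator F (shape-preserving)
    x_s = np.einsum('cd,dhw->chw', W_s, x_f)   # shrink back to (C, H, W)
    return x + x_s                             # residual connection

x = rng.standard_normal((C, H, W))
y = meta_mobile_block(x, f=lambda t: np.maximum(t, 0))  # trivial F: a ReLU
print(y.shape)
```

Instantiating `f` as a depth-wise convolution recovers a MobileNetV2-style block, while instantiating it as attention recovers a Transformer-style block, which is exactly the unification the paper draws.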

        Combining the strengths of CNNs and Transformers, the paper proposes a new inverted residual mobile block (iRMB), which has both the efficiency of CNNs in modeling local features and the dynamic ability of Transformers to model long-range interactions. Its instantiation of the operator can be written as:

$$
\mathcal{F}(\cdot) = (\text{DW-Conv, Skip})\left(\text{EW-MHSA}(\cdot)\right)
$$

        As shown on the left of Figure 2, a CMLP first generates Q and K, an expansion convolution then generates V, and window self-attention over Q, K, and V performs long-range interaction; a DWConv then models local features, and finally a shrinking convolution restores the channel count before the result is added to the input. Notably, since the expansion convolution and the self-attention consist only of matrix multiplications, the self-attention can be computed before the expansion convolution, which reduces FLOPs while leaving the computation equivalent.
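The equivalence noted above (ignoring the activation inside the V projection) is just associativity of matrix products, `A @ (X @ W) == (A @ X) @ W`, so attention can act before the channel expansion and touch fewer channels. A small numpy check with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, c_in, c_mid = 49, 16, 64                  # tokens per 7x7 window, input / expanded channels

X = rng.standard_normal((n, c_in))           # tokens of one window
W_v = rng.standard_normal((c_in, c_mid))     # 1x1 expansion conv as a per-token linear map
A = rng.standard_normal((n, n))
A = np.exp(A) / np.exp(A).sum(-1, keepdims=True)  # row-stochastic attention matrix

post = A @ (X @ W_v)                         # expand to c_mid first, then attend
pre = (A @ X) @ W_v                          # attend over c_in channels first, then expand
print(np.allclose(post, pre))                # same result, fewer FLOPs when c_mid > c_in
```

Attending first costs `n*n*c_in` multiply-adds for the attention product instead of `n*n*c_mid`, which is where the FLOPs saving comes from.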

        The authors also summarize several criteria that an efficient model should satisfy, and analyze the strengths and weaknesses of the proposed module and of earlier modules against them (see Table 3):

  1. Usability. Avoid complex operators; simple operators are easy to optimize.
  2. Uniformity. Use as few core modules as possible to reduce model complexity.
  3. Effectiveness. Good performance on both classification and dense prediction.
  4. Efficiency. Few parameters and little computation, well traded off against accuracy.

2. Code Reproduction

2.1 Download and import the required libraries

!pip install einops-0.3.0-py3-none-any.whl
!pip install paddlex
%matplotlib inline
import paddle
import numpy as np
import matplotlib.pyplot as plt
from paddle.vision.datasets import Cifar10
from paddle.vision.transforms import Transpose
from paddle.io import Dataset, DataLoader
from paddle import nn
import paddle.nn.functional as F
import paddle.vision.transforms as transforms
import os
from matplotlib.pyplot import figure
import paddlex
import itertools
from functools import partial
from einops import rearrange, reduce
from einops.layers.paddle import Rearrange

2.2 Create the dataset

train_tfm = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ColorJitter(brightness=0.2,contrast=0.2, saturation=0.2),
    transforms.RandomHorizontalFlip(0.5),
    transforms.RandomRotation(20),
    paddlex.transforms.MixupImage(),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

test_tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
paddle.vision.set_image_backend('cv2')
# Use the Cifar10 dataset
train_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='train', transform = train_tfm, )
val_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='test',transform = test_tfm)
print("train_dataset: %d" % len(train_dataset))
print("val_dataset: %d" % len(val_dataset))
train_dataset: 50000
val_dataset: 10000
batch_size=256
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, drop_last=False, num_workers=4)

2.3 Building the model

2.3.1 Label smoothing
class LabelSmoothingCrossEntropy(nn.Layer):
    def __init__(self, smoothing=0.1):
        super().__init__()
        self.smoothing = smoothing

    def forward(self, pred, target):

        confidence = 1. - self.smoothing
        log_probs = F.log_softmax(pred, axis=-1)
        idx = paddle.stack([paddle.arange(log_probs.shape[0]), target], axis=1)
        nll_loss = paddle.gather_nd(-log_probs, index=idx)
        smooth_loss = paddle.mean(-log_probs, axis=-1)
        loss = confidence * nll_loss + self.smoothing * smooth_loss

        return loss.mean()
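As a Paddle-independent sanity check, the loss above computes `(1 − ε)·NLL(target) + ε·mean(−log p)`. A small numpy version of the same formula, with toy logits:

```python
import numpy as np

def label_smoothing_ce(pred, target, smoothing=0.1):
    """pred: (N, K) logits, target: (N,) integer labels."""
    log_probs = pred - np.log(np.exp(pred).sum(-1, keepdims=True))  # log_softmax
    nll = -log_probs[np.arange(len(target)), target]                # per-sample NLL
    smooth = -log_probs.mean(-1)                                    # uniform-target term
    return ((1 - smoothing) * nll + smoothing * smooth).mean()

pred = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
target = np.array([0, 1])
loss_plain = label_smoothing_ce(pred, target, smoothing=0.0)   # ordinary cross entropy
loss_smooth = label_smoothing_ce(pred, target, smoothing=0.1)  # higher when the model is right
print(loss_plain, loss_smooth)
```

When the model already puts most mass on the correct class, smoothing raises the loss slightly, discouraging over-confident logits.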
2.3.2 DropPath
def drop_path(x, drop_prob=0.0, training=False):
    """
    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ...
    """
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = paddle.to_tensor(1 - drop_prob)
    shape = (paddle.shape(x)[0],) + (1,) * (x.ndim - 1)
    random_tensor = keep_prob + paddle.rand(shape, dtype=x.dtype)
    random_tensor = paddle.floor(random_tensor)  # binarize
    output = x.divide(keep_prob) * random_tensor
    return output


class DropPath(nn.Layer):
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)
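The behavior of `drop_path` can be checked in isolation: each sample in the batch is either zeroed out entirely or rescaled by `1/keep_prob`, so the expectation of the output matches the input. A numpy sketch of the same logic:

```python
import numpy as np

def drop_path_np(x, drop_prob, rng):
    """Stochastic depth: zero whole samples, rescale survivors by 1/keep_prob."""
    keep_prob = 1.0 - drop_prob
    # floor(keep_prob + U[0,1)) is 1 with probability keep_prob, else 0
    mask = np.floor(keep_prob + rng.random((x.shape[0],) + (1,) * (x.ndim - 1)))
    return x / keep_prob * mask

rng = np.random.default_rng(0)
x = np.ones((10000, 4))
out = drop_path_np(x, drop_prob=0.2, rng=rng)
print(sorted(set(out.ravel())))   # each sample is either all zeros or scaled by 1/0.8
print(round(out.mean(), 2))       # the batch mean stays close to 1.0 in expectation
```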
2.3.3 Building the EMO model
class LayerNorm(nn.Layer):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.Sequential(
            Rearrange('b c h w->b h w c'),
            nn.LayerNorm(dim),
            Rearrange('b h w c->b c h w')
        )
    
    def forward(self, x):
        return self.norm(x)
class SE(nn.Layer):
    def __init__(self, dim, se_ratio, act_layer):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(1)
        self.conv1 = nn.Conv2D(dim, int(dim * se_ratio), 1, bias_attr=False)
        self.relu = act_layer()
        self.conv2 = nn.Conv2D(int(dim * se_ratio), dim, 1, bias_attr=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        attn = self.avg_pool(x)
        attn = self.conv1(attn)
        attn = self.relu(attn)
        attn = self.conv2(attn)
        attn = self.sigmoid(attn) 
        return x * attn
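The SE block gates each channel with a weight computed from globally pooled features. A numpy sketch with random placeholder projections (`w1`, `w2` stand in for the two 1×1 convs; they are not trained weights):

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, se_ratio = 16, 8, 8, 0.25
w1 = rng.standard_normal((int(C * se_ratio), C)) * 0.1  # squeeze projection (conv1)
w2 = rng.standard_normal((C, int(C * se_ratio))) * 0.1  # excite projection (conv2)

def se(x):
    s = x.mean(axis=(1, 2))              # global average pool -> (C,)
    s = np.maximum(w1 @ s, 0.0)          # channel reduction + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))  # channel restore + sigmoid gate in (0, 1)
    return x * s[:, None, None]          # rescale each channel by its gate

x = rng.standard_normal((C, H, W))
y = se(x)
print(y.shape)
```

Because every gate lies in (0, 1), the output magnitude never exceeds the input per channel; only the relative channel importance changes.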
class iRMB(nn.Layer):
    def __init__(self, dim_in, dim_out, norm_in=True, has_skip=True, exp_ratio=1.0, v_proj=True, dw_ks=3,
                stride=1, dilation=1, se_ratio=0.0, dim_head=64, window_size=7, attn_s=True, qkv_bias=False,
                attn_drop=0., drop=0., drop_path=0., v_group=False, attn_pre=False, norm_layer=nn.BatchNorm2D, act_layer=nn.Silu):
        super().__init__()
        
        self.norm = nn.BatchNorm2D(dim_in) if norm_in else nn.Identity()
        dim_mid = int(dim_in * exp_ratio)
        self.has_skip = (dim_in == dim_out and stride == 1) and has_skip
        self.attn_s = attn_s

        if self.attn_s:
            assert dim_in % dim_head == 0, 'dim should be divisible by num_heads'
            self.dim_head = dim_head
            self.window_size = window_size
            self.num_head = dim_in // dim_head
            self.scale = self.dim_head ** -0.5
            self.attn_pre = attn_pre
            self.qk = nn.Conv2D(dim_in, int(dim_in * 2), kernel_size=1,
                                    bias_attr=qkv_bias)
            self.v = nn.Sequential(
                nn.Conv2D(dim_in, dim_mid, kernel_size=1, groups=self.num_head if v_group else 1, bias_attr=qkv_bias),
                act_layer()
            )
            self.attn_drop = nn.Dropout(attn_drop)
        else:
            if v_proj:
                self.v = nn.Sequential(
                    nn.Conv2D(dim_in, dim_mid, kernel_size=1),
                    act_layer()
                )
            else:
                self.v = nn.Identity()

        self.conv_local = nn.Sequential(
            nn.Conv2D(dim_mid, dim_mid, kernel_size=dw_ks, stride=stride, padding=dw_ks // 2, dilation=dilation, groups=dim_mid),
            norm_layer(dim_mid),
            act_layer()
        )
        self.se = SE(dim_mid, se_ratio=se_ratio, act_layer=act_layer) if se_ratio > 0.0 else nn.Identity()

        self.proj_drop = nn.Dropout(drop)
        self.proj = nn.Conv2D(dim_mid, dim_out, kernel_size=1)
        self.drop_path = DropPath(drop_path) if drop_path else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.norm(x)
        B, C, H, W = x.shape
        if self.attn_s:
            pad_l, pad_t = 0, 0
            pad_r = (self.window_size - W % self.window_size) % self.window_size
            pad_b = (self.window_size - H % self.window_size) % self.window_size
            x = F.pad(x, [pad_l, pad_r, pad_t, pad_b])
            n1, n2 = (H + pad_b) // self.window_size, (W + pad_r) // self.window_size
            x = rearrange(x, 'b c (h1 n1) (w1 n2) -> (b n1 n2) c h1 w1', n1=n1, n2=n2)

            # attention
            b, c, h, w = x.shape
            qk = self.qk(x)
            qk = rearrange(qk, 'b (qk heads dim_head) h w -> qk b heads (h w) dim_head', qk=2, heads=self.num_head, dim_head=self.dim_head)
            q, k = qk[0], qk[1]
            attn_spa = (q @ k.transpose([0, 1, 3, 2])) * self.scale
            attn_spa = F.softmax(attn_spa, axis=-1)
            attn_spa = self.attn_drop(attn_spa)
            if self.attn_pre:
                x = rearrange(x, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head)
                x_spa = attn_spa @ x
                x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h, w=w)
                x_spa = self.v(x_spa)
            else:
                v = self.v(x)
                v = rearrange(v, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head)
                x_spa = attn_spa @ v
                x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h, w=w)
            # unpadding
            x = rearrange(x_spa, '(b n1 n2) c h1 w1 -> b c (h1 n1) (w1 n2)', n1=n1, n2=n2)
            if pad_r > 0 or pad_b > 0:
                x = x[:, :, :H, :W]
        else:
            x = self.v(x)

        x = x + self.se(self.conv_local(x)) if self.has_skip else self.se(self.conv_local(x))

        # The official code applies dropout before the projection here; judging by earlier models, the order is probably reversed
        x = self.proj_drop(x)
        x = self.proj(x)

        x = (shortcut + self.drop_path(x)) if self.has_skip else x
        return x
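The window partition in `iRMB.forward` (the einops pattern `'b c (h1 n1) (w1 n2) -> (b n1 n2) c h1 w1'`) can be checked in isolation. A pure-numpy round trip with illustrative sizes, including the padding step for inputs that are not multiples of the window size:

```python
import numpy as np

window = 3
B, C, H, W = 2, 4, 7, 5                       # H, W deliberately not multiples of window
x = np.arange(B * C * H * W, dtype=float).reshape(B, C, H, W)

# Pad bottom/right so both spatial dims divide the window size, as in forward().
pad_b = (window - H % window) % window
pad_r = (window - W % window) % window
xp = np.pad(x, ((0, 0), (0, 0), (0, pad_b), (0, pad_r)))
n1, n2 = (H + pad_b) // window, (W + pad_r) // window

# '(h1 n1)' decomposition: h1 (the window dim) is the outer factor, n1 the inner one.
win = (xp.reshape(B, C, window, n1, window, n2)
         .transpose(0, 3, 5, 1, 2, 4)
         .reshape(B * n1 * n2, C, window, window))

# Inverse rearrange, then crop the padding away.
back = (win.reshape(B, n1, n2, C, window, window)
           .transpose(0, 3, 4, 1, 5, 2)
           .reshape(B, C, window * n1, window * n2))

print(win.shape)                              # (B*n1*n2, C, window, window)
print(np.array_equal(back[:, :, :H, :W], x))  # round trip recovers the input
```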
class EMO(nn.Layer):

    def __init__(self, dim_in=3, num_classes=1000, img_size=224,
                depths=[1, 2, 4, 2], stem_dim=16, embed_dims=[64, 128, 256, 512], exp_ratios=[4., 4., 4., 4.],
                dw_kss=[3, 3, 5, 5], se_ratios=[0.0, 0.0, 0.0, 0.0], dim_heads=[32, 32, 32, 32],
                window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True], qkv_bias=True,
                attn_drop=0., drop=0., drop_path=0., v_group=False, attn_pre=False,
                norm_layers=[nn.BatchNorm2D, nn.BatchNorm2D, LayerNorm, LayerNorm],
                act_layers=[nn.Silu, nn.Silu, nn.GELU, nn.GELU]):
        super().__init__()
        self.num_classes = num_classes
        assert num_classes > 0
        dprs = [x.item() for x in paddle.linspace(0, drop_path, sum(depths))]
        self.stage0 = nn.LayerList([
            nn.Sequential(
                nn.Conv2D(dim_in, stem_dim, dw_kss[0], stride=2, padding=dw_kss[0] // 2),
                norm_layers[0](stem_dim)
            ),
            iRMB(
                stem_dim, stem_dim, norm_in=False, has_skip=False, exp_ratio=1,
                norm_layer=norm_layers[0], act_layer=act_layers[0],
                v_proj=False, dw_ks=dw_kss[0], stride=1, dilation=1, se_ratio=1,
                dim_head=dim_heads[0], window_size=window_sizes[0], attn_s=False,
                qkv_bias=qkv_bias, attn_drop=attn_drop, drop=drop, drop_path=0.,
                attn_pre=attn_pre
            )
        ])
        emb_dim_pre = stem_dim
        for i in range(len(depths)):
            layers = []
            dpr = dprs[sum(depths[:i]):sum(depths[:i + 1])]
            for j in range(depths[i]):
                if j == 0:
                    stride, has_skip, attn_s, exp_ratio = 2, False, False, exp_ratios[i] * 2
                else:
                    stride, has_skip, attn_s, exp_ratio = 1, True, attn_ss[i], exp_ratios[i]
                layers.append(iRMB(
                    emb_dim_pre, embed_dims[i], norm_in=True, has_skip=has_skip, exp_ratio=exp_ratio,
                    norm_layer=norm_layers[i], act_layer=act_layers[i],
                    v_proj=True, dw_ks=dw_kss[i], stride=stride, dilation=1, se_ratio=se_ratios[i],
                    dim_head=dim_heads[i], window_size=window_sizes[i], attn_s=attn_s,
                    qkv_bias=qkv_bias, attn_drop=attn_drop, drop=drop, drop_path=dpr[j], v_group=v_group,
                    attn_pre=attn_pre
                ))
                emb_dim_pre = embed_dims[i]
            self.__setattr__(f'stage{i + 1}', nn.LayerList(layers))

        self.norm = nn.BatchNorm2D(embed_dims[-1])
        self.head = nn.Linear(embed_dims[-1], num_classes)
        self.apply(self._init_weights)

    def _init_weights(self, m):
        tn = nn.initializer.TruncatedNormal(std=.02)
        ones = nn.initializer.Constant(1.0)
        zeros = nn.initializer.Constant(0.0)
        if isinstance(m, (nn.Conv2D, nn.Linear)):
            tn(m.weight)
            if m.bias is not None:
                zeros(m.bias)
        elif isinstance(m, (nn.LayerNorm, nn.BatchNorm2D)):
            zeros(m.bias)
            ones(m.weight)

    def forward_features(self, x):
        for blk in self.stage0:
            x = blk(x)
        for blk in self.stage1:
            x = blk(x)
        for blk in self.stage2:
            x = blk(x)
        for blk in self.stage3:
            x = blk(x)
        for blk in self.stage4:
            x = blk(x)
        return x

    def forward(self, x):
        x = self.forward_features(x)
        x = self.norm(x)
        x = reduce(x, 'b c h w -> b c', 'mean')
        x = self.head(x)
        return x
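The drop-path bookkeeping in `__init__` assigns a linearly increasing rate to each block across the whole network and then slices the flat schedule per stage. A quick numpy illustration with EMO-1M-style depths:

```python
import numpy as np

depths = [2, 2, 8, 3]                          # blocks per stage (EMO-1M layout)
drop_path = 0.04
dprs = np.linspace(0, drop_path, sum(depths))  # one linearly increasing rate per block

# Slice the flat schedule into per-stage chunks, mirroring the loop in __init__.
per_stage = [dprs[sum(depths[:i]):sum(depths[:i + 1])] for i in range(len(depths))]
print([len(p) for p in per_stage])             # one rate per block in each stage
print(float(per_stage[-1][-1]))                # the deepest block gets the full rate
```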
2.3.4 Model configurations
num_classes = 10
def EMO_1M():
	model = EMO(
		num_classes=num_classes,
		depths=[2, 2, 8, 3], stem_dim=24, embed_dims=[32, 48, 80, 168], exp_ratios=[2., 2.5, 3.0, 3.5],
		dw_kss=[3, 3, 5, 5], dim_heads=[16, 16, 20, 21], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
		qkv_bias=True, attn_drop=0., drop=0., drop_path=0.04036, v_group=False, attn_pre=True)
	return model


def EMO_2M():
	model = EMO(
		num_classes=num_classes,
		depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[32, 48, 120, 200], exp_ratios=[2., 2.5, 3.0, 3.5],
		dw_kss=[3, 3, 5, 5], dim_heads=[16, 16, 20, 20], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
		qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True)
	return model


def EMO_5M():
	model = EMO(
		num_classes=num_classes,
		depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[48, 72, 160, 288], exp_ratios=[2., 3., 4., 4.],
		dw_kss=[3, 3, 5, 5], dim_heads=[24, 24, 32, 32], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
		qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True)
	return model


def EMO_6M():
	model = EMO(
		num_classes=num_classes,
		depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[48, 72, 160, 320], exp_ratios=[2., 3., 4., 5.],
		dw_kss=[3, 3, 5, 5], dim_heads=[16, 24, 20, 32], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
		qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True)
	return model

model = EMO_1M()
paddle.summary(model, (1, 3, 224, 224))

model = EMO_2M()
paddle.summary(model, (1, 3, 224, 224))

model = EMO_5M()
paddle.summary(model, (1, 3, 224, 224))

model = EMO_6M()
paddle.summary(model, (1, 3, 224, 224))

2.4 Training

learning_rate = 0.001
n_epochs = 100
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model'

# EMO-1M
model = EMO_1M()

criterion = LabelSmoothingCrossEntropy()

scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=scheduler, weight_decay=1e-5)

best_acc = 0.0
val_acc = 0.0
loss_record = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}}   # for recording loss
acc_record = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}}      # for recording accuracy

loss_iter = 0
acc_iter = 0

for epoch in range(n_epochs):
    # ---------- Training ----------
    model.train()
    train_num = 0.0
    train_loss = 0.0

    val_num = 0.0
    val_loss = 0.0
    accuracy_manager = paddle.metric.Accuracy()
    val_accuracy_manager = paddle.metric.Accuracy()
    print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
    for batch_id, data in enumerate(train_loader):
        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)

        logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        accuracy_manager.update(acc)
        if batch_id % 10 == 0:
            loss_record['train']['loss'].append(loss.numpy())
            loss_record['train']['iter'].append(loss_iter)
            loss_iter += 1

        loss.backward()

        optimizer.step()
        scheduler.step()
        optimizer.clear_grad()
        
        train_loss += loss
        train_num += len(y_data)

    total_train_loss = (train_loss / train_num) * batch_size
    train_acc = accuracy_manager.accumulate()
    acc_record['train']['acc'].append(train_acc)
    acc_record['train']['iter'].append(acc_iter)
    acc_iter += 1
    # Print the information.
    print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))

    # ---------- Validation ----------
    model.eval()

    for batch_id, data in enumerate(val_loader):

        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)
        with paddle.no_grad():
          logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        val_accuracy_manager.update(acc)

        val_loss += loss
        val_num += len(y_data)

    total_val_loss = (val_loss / val_num) * batch_size
    loss_record['val']['loss'].append(total_val_loss.numpy())
    loss_record['val']['iter'].append(loss_iter)
    val_acc = val_accuracy_manager.accumulate()
    acc_record['val']['acc'].append(val_acc)
    acc_record['val']['iter'].append(acc_iter)
    
    print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))

    # ===================save====================
    if val_acc > best_acc:
        best_acc = val_acc
        paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
        paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))

print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))
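For reference, `CosineAnnealingDecay` as configured above follows a cosine curve from the base rate down to `eta_min` (0 by default) over `T_max` steps; a closed-form sketch of that schedule with the values used here (50000 // 256 steps per epoch × 100 epochs = 19500):

```python
import math

def cosine_lr(step, base_lr=0.001, t_max=19500, eta_min=0.0):
    """eta_t = eta_min + 0.5 * (base_lr - eta_min) * (1 + cos(pi * step / t_max))."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))

print(cosine_lr(0))        # starts at the base learning rate
print(cosine_lr(9750))     # half-way through, half the base rate
print(cosine_lr(19500))    # decays to eta_min (~0) at the end
```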

2.5 Analyzing the results

def plot_learning_curve(record, title='loss', ylabel='CE Loss'):
    ''' Plot learning curve of your CNN '''
    maxtrain = max(map(float, record['train'][title]))
    maxval = max(map(float, record['val'][title]))
    ymax = max(maxtrain, maxval) * 1.1
    mintrain = min(map(float, record['train'][title]))
    minval = min(map(float, record['val'][title]))
    ymin = min(mintrain, minval) * 0.9

    total_steps = len(record['train'][title])
    x_1 = list(map(int, record['train']['iter']))
    x_2 = list(map(int, record['val']['iter']))
    figure(figsize=(10, 6))
    plt.plot(x_1, record['train'][title], c='tab:red', label='train')
    plt.plot(x_2, record['val'][title], c='tab:cyan', label='val')
    plt.ylim(ymin, ymax)
    plt.xlabel('Training steps')
    plt.ylabel(ylabel)
    plt.title('Learning curve of {}'.format(title))
    plt.legend()
    plt.show()
plot_learning_curve(loss_record, title='loss', ylabel='CE Loss')


plot_learning_curve(acc_record, title='acc', ylabel='Accuracy')


import time
work_path = 'work/model'
model = EMO_1M()
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):

    x_data, y_data = data
    labels = paddle.unsqueeze(y_data, axis=1)
    with paddle.no_grad():
        logits = model(x_data)
bb = time.time()
print("Throughput:{}".format(int(len(val_dataset)//(bb - aa))))
Throughput:1006
def get_cifar10_labels(labels):  
    """Return the text labels of the CIFAR-10 dataset."""
    text_labels = [
        'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog',
        'horse', 'ship', 'truck']
    return [text_labels[int(i)] for i in labels]
def show_images(imgs, num_rows, num_cols, pred=None, gt=None, scale=1.5):  
    """Plot a list of images."""
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if paddle.is_tensor(img):
            ax.imshow(img.numpy())
        else:
            ax.imshow(img)
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if pred or gt:
            ax.set_title("pt: " + pred[i] + "\ngt: " + gt[i])
    return axes
work_path = 'work/model'
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
model = EMO_1M()
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
logits = model(X)
y_pred = paddle.argmax(logits, -1)
X = paddle.transpose(X, [0, 2, 3, 1])
axes = show_images(X.reshape((18, 224, 224, 3)), 1, 18, pred=get_cifar10_labels(y_pred), gt=get_cifar10_labels(y))
plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).


!pip install interpretdl
import interpretdl as it
work_path = 'work/model'
model = EMO_1M()
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
lime = it.LIMECVInterpreter(model)
lime_weights = lime.interpret(X.numpy()[3], interpret_class=y.numpy()[3], batch_size=100, num_samples=10000, visual=True)
100%|██████████| 10000/10000 [00:50<00:00, 197.30it/s]


lime_weights = lime.interpret(X.numpy()[13], interpret_class=y.numpy()[13], batch_size=100, num_samples=10000, visual=True)
100%|██████████| 10000/10000 [00:50<00:00, 199.73it/s]



Summary

        This article analyzed the structure of today's mainstream modules, abstracted the Meta-Mobile Block from them, and introduced a new mobile block, the iRMB, which combines the advantages of CNNs and Transformers and achieves a good trade-off between performance and cost.

References

  1. Paper: Rethinking Mobile Block for Efficient Neural Models
  2. Code: zhangzjn/EMO

This article is a repost.
Original project link
