InceptionNeXt: When Inception Meets ConvNeXt

This article introduces the InceptionNeXt model, which decomposes large-kernel depthwise convolution into a small square kernel, two band kernels, and an identity mapping. This addresses the memory-access cost of large-kernel convolution, improving both the training throughput and the performance of large-kernel CNN models. The InceptionNeXt family runs faster while maintaining high performance and is expected to serve as a baseline for future architecture design.



Abstract

        Inspired by the long-range modeling ability of ViTs, large-kernel convolutions have recently been widely studied and adopted to enlarge the receptive field and improve model performance; for example, the notable work ConvNeXt uses 7×7 depthwise convolution. Although such a depthwise operator consumes only a few FLOPs, it significantly harms model efficiency on powerful computing devices because of its high memory access cost. For example, ConvNeXt-T has FLOPs similar to ResNet-50, but reaches only about 60% of its throughput when trained on A100 GPUs in full precision. Although reducing the kernel size of ConvNeXt improves speed, it causes a significant performance drop, and it remains unclear how to speed up large-kernel-based CNN models while preserving their performance. To solve this problem, inspired by prior work, the authors propose decomposing large-kernel depthwise convolution into four parallel branches: a small square kernel, two orthogonal band kernels, and an identity mapping. With this new Inception depthwise convolution, they build a series of networks, namely InceptionNeXt, which enjoy high throughput while maintaining competitive performance. For example, InceptionNeXt-T achieves 1.6× higher training throughput than ConvNeXt-T and a 0.2% accuracy improvement on ImageNet-1K. InceptionNeXt is expected to serve as an economical baseline for future architecture design, helping to reduce carbon emissions.

1. InceptionNeXt

1.1 MetaNeXt

        ConvNeXt is a structurally simple modern CNN. In each ConvNeXt block, the input X is first processed by a depthwise convolution to propagate information along the spatial dimensions. Following MetaFormer, the depthwise convolution is abstracted as a token mixer responsible for spatial information interaction; ConvNeXt is thus abstracted into MetaNeXt, as shown in Figure 2. Formally, in a MetaNeXt block, the input X is first processed as:

X^{\prime}=\operatorname{TokenMixer}(X)

        The output of the token mixer is then normalized:

Y=\operatorname{Norm}\left(X^{\prime}\right)

        After normalization, the resulting feature is fed into an MLP module composed of two fully connected layers, and a shortcut connection is added:

Y=\operatorname{Conv}_{1 \times 1}^{r C \rightarrow C}\left\{\sigma\left[\operatorname{Conv}_{1 \times 1}^{C \rightarrow r C}(Y)\right]\right\}+X
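Taken together, these three formulas define one MetaNeXt block. The flow can be sketched with plain NumPy (an illustrative sketch only: the token mixer is left as an identity placeholder, the norm is a simple per-channel standardization, and σ is taken as ReLU; none of these choices come from the paper's code, which appears in Section 2.3):

```python
import numpy as np

def metanext_block(X, W1, b1, W2, b2, eps=1e-5):
    """One MetaNeXt block on a (C, H, W) feature map.
    Token mixer = identity placeholder; Norm = per-channel
    standardization; MLP = two 1x1 convs, i.e. per-pixel matmuls."""
    Xp = X                                       # TokenMixer(X), identity here
    mu = Xp.mean(axis=(1, 2), keepdims=True)
    var = Xp.var(axis=(1, 2), keepdims=True)
    Y = (Xp - mu) / np.sqrt(var + eps)           # Norm(X')
    H1 = np.einsum('rc,chw->rhw', W1, Y) + b1    # Conv1x1: C -> rC
    H1 = np.maximum(H1, 0.0)                     # sigma (ReLU here)
    out = np.einsum('cr,rhw->chw', W2, H1) + b2  # Conv1x1: rC -> C
    return out + X                               # shortcut connection

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
X = rng.standard_normal((C, H, W))
W1 = rng.standard_normal((r * C, C)) * 0.1
W2 = rng.standard_normal((C, r * C)) * 0.1
b1 = np.zeros((r * C, 1, 1)); b2 = np.zeros((C, 1, 1))
out = metanext_block(X, W1, b1, W2, b2)
print(out.shape)  # (8, 4, 4)
```

With zero MLP weights the block reduces to the shortcut alone, which is exactly the residual behaviour the last equation describes.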

1.2 Inception depthwise convolution

        As shown in Figure 1, traditional depthwise convolution with a large kernel significantly slows the model down. Inspired by ShuffleNetV2, the authors observe that processing only part of the channels is sufficient for a single depthwise convolution layer, so the remaining channels are kept unchanged as an identity-mapping branch. For the processed channels, an Inception-style decomposition of the depthwise operation is used. Inception employs several branches with small and large kernels; similarly, a 3×3 kernel is adopted as one branch, while large square kernels are avoided because of their slow practical speed. Instead, inspired by Inception v3, the large kernel k × k is decomposed into 1 × k and k × 1 band kernels. Concretely, the input X is split into four groups along the channel dimension:

\begin{aligned} X_{\mathrm{hw}}, X_{\mathrm{w}}, X_{\mathrm{h}}, X_{\mathrm{id}} &= \operatorname{Split}(X) \\ &= X_{:,:g},\; X_{:,g:2g},\; X_{:,2g:3g},\; X_{:,3g:} \end{aligned}

        Each group is then processed with a different convolution kernel:

\begin{aligned} X_{\mathrm{hw}}^{\prime} &= \mathrm{DWConv}_{k_{s} \times k_{s}}^{g \rightarrow g}\left(X_{\mathrm{hw}}\right), \\ X_{\mathrm{w}}^{\prime} &= \mathrm{DWConv}_{1 \times k_{b}}^{g \rightarrow g}\left(X_{\mathrm{w}}\right), \\ X_{\mathrm{h}}^{\prime} &= \mathrm{DWConv}_{k_{b} \times 1}^{g \rightarrow g}\left(X_{\mathrm{h}}\right), \\ X_{\mathrm{id}}^{\prime} &= X_{\mathrm{id}}. \end{aligned}

        Finally, the outputs are concatenated:

X^{\prime}=\operatorname{Concat}\left(X_{\mathrm{hw}}^{\prime}, X_{\mathrm{w}}^{\prime}, X_{\mathrm{h}}^{\prime}, X_{\mathrm{id}}^{\prime}\right)
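The channel bookkeeping of the split and concatenation can be checked with a short NumPy sketch (shapes only, no actual convolution; branch ratio 1/8 follows the code in Section 2.3, which orders the identity group first while the equation lists it last):

```python
import numpy as np

def inception_dw_split(X, g):
    """Split a (C, H, W) map into the four branch groups of Inception
    depthwise convolution: square kernel, horizontal band, vertical
    band, and identity (channels left untouched)."""
    return X[:g], X[g:2 * g], X[2 * g:3 * g], X[3 * g:]

C, H, W = 64, 7, 7
g = int(C * 0.125)  # branch_ratio = 1/8  ->  g = 8
X = np.arange(C * H * W, dtype=float).reshape(C, H, W)
X_hw, X_w, X_h, X_id = inception_dw_split(X, g)
print([p.shape[0] for p in (X_hw, X_w, X_h, X_id)])  # [8, 8, 8, 40]
# concatenating the (processed) groups restores the original shape
assert np.concatenate([X_hw, X_w, X_h, X_id], axis=0).shape == X.shape
```

A 1 × k_b plus k_b × 1 pair costs 2·k_b weights per channel instead of k_b² for a full k_b × k_b kernel (22 vs. 121 for k_b = 11), which is where the savings come from.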

2. Code Reproduction

2.1 Download and import the required libraries

!pip install paddlex
%matplotlib inline
import paddle
import numpy as np
import matplotlib.pyplot as plt
from paddle.vision.datasets import Cifar10
from paddle.vision.transforms import Transpose
from paddle.io import Dataset, DataLoader
from paddle import nn
import paddle.nn.functional as F
import paddle.vision.transforms as transforms
import os
from matplotlib.pyplot import figure
import paddlex
from functools import partial

2.2 Create the dataset

train_tfm = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.ColorJitter(brightness=0.2,contrast=0.2, saturation=0.2),
    transforms.RandomHorizontalFlip(0.5),
    transforms.RandomRotation(20),
    paddlex.transforms.MixupImage(),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

test_tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
paddle.vision.set_image_backend('cv2')
# Use the Cifar10 dataset
train_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='train', transform = train_tfm, )
val_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='test',transform = test_tfm)
print("train_dataset: %d" % len(train_dataset))
print("val_dataset: %d" % len(val_dataset))
train_dataset: 50000
val_dataset: 10000
batch_size=256
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, drop_last=False, num_workers=4)

2.3 Building the model

2.3.1 Label smoothing
class LabelSmoothingCrossEntropy(nn.Layer):
    def __init__(self, smoothing=0.1):
        super().__init__()
        self.smoothing = smoothing

    def forward(self, pred, target):

        confidence = 1. - self.smoothing
        log_probs = F.log_softmax(pred, axis=-1)
        idx = paddle.stack([paddle.arange(log_probs.shape[0]), target], axis=1)
        nll_loss = paddle.gather_nd(-log_probs, index=idx)
        smooth_loss = paddle.mean(-log_probs, axis=-1)
        loss = confidence * nll_loss + self.smoothing * smooth_loss

        return loss.mean()
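The loss above can be cross-checked against a pure-NumPy reference (`label_smoothing_ce` is a hypothetical helper written for this article, not part of the model code): the smoothed loss is (1 − s) times the NLL of the true class plus s times the mean negative log-probability over all classes.

```python
import numpy as np

def label_smoothing_ce(logits, target, smoothing=0.1):
    """NumPy reference for LabelSmoothingCrossEntropy above."""
    z = logits - logits.max(axis=-1, keepdims=True)  # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(target)), target]      # true-class NLL
    smooth = -log_probs.mean(axis=-1)                     # uniform-target term
    return float(((1 - smoothing) * nll + smoothing * smooth).mean())

logits = np.array([[2.0, 0.5, 0.1], [0.2, 3.0, 0.4]])
target = np.array([0, 1])
plain_ce = label_smoothing_ce(logits, target, smoothing=0.0)
smoothed = label_smoothing_ce(logits, target, smoothing=0.1)
# smoothing=0 reduces to ordinary cross entropy; mixing in a uniform
# target increases the loss when the model is confidently correct
print(plain_ce < smoothed)  # True
```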
2.3.2 DropPath
def drop_path(x, drop_prob=0.0, training=False):
    """
    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ...
    """
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = paddle.to_tensor(1 - drop_prob)
    shape = (paddle.shape(x)[0],) + (1,) * (x.ndim - 1)
    random_tensor = keep_prob + paddle.rand(shape, dtype=x.dtype)
    random_tensor = paddle.floor(random_tensor)  # binarize
    output = x.divide(keep_prob) * random_tensor
    return output


class DropPath(nn.Layer):
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)
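DropPath's behaviour is easy to verify in NumPy (a standalone sketch, not the Paddle code above): each sample is either dropped entirely or rescaled by 1/keep_prob, so the expected output equals the input.

```python
import numpy as np

def drop_path_np(x, drop_prob, rng):
    """NumPy sketch of stochastic depth: zero out whole samples with
    probability drop_prob and rescale survivors by 1/keep_prob so the
    output is unbiased in expectation."""
    if drop_prob == 0.0:
        return x
    keep_prob = 1.0 - drop_prob
    # one Bernoulli(keep_prob) draw per sample, broadcast over other dims
    mask = np.floor(keep_prob + rng.random((x.shape[0],) + (1,) * (x.ndim - 1)))
    return x / keep_prob * mask

rng = np.random.default_rng(0)
x = np.ones((20000, 3, 2, 2))
y = drop_path_np(x, drop_prob=0.2, rng=rng)
# each sample is either fully dropped (0) or rescaled to 1/0.8 = 1.25
print(np.unique(y).tolist())  # [0.0, 1.25]
```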
2.3.3 Building the InceptionNeXt model
class InceptionDWConv2D(nn.Layer):
    def __init__(self, in_channels, square_kernel_size=3, band_kernel_size=11, branch_ratio=0.125):
        super().__init__()
        self.in_channels = in_channels
        self.gc = int(in_channels * branch_ratio)
        self.dwconvhw = nn.Conv2D(self.gc, self.gc, square_kernel_size, padding=square_kernel_size // 2, groups=self.gc)
        self.dwconvh = nn.Conv2D(self.gc, self.gc, (band_kernel_size, 1), padding=(band_kernel_size // 2, 0), groups=self.gc)
        self.dwconvw = nn.Conv2D(self.gc, self.gc, (1, band_kernel_size), padding=(0, band_kernel_size // 2), groups=self.gc)

    def forward(self, x):
        x_id, x_hw, x_h, x_w = paddle.split(x, [self.in_channels - 3 * self.gc, self.gc, self.gc, self.gc], axis=1)

        x_hw = self.dwconvhw(x_hw)
        x_h = self.dwconvh(x_h)
        x_w = self.dwconvw(x_w)

        out = paddle.concat([x_id, x_hw, x_h, x_w], axis=1)
        return out

class ConvMlp(nn.Layer):
    """ MLP using 1x1 convs that keeps spatial dims
    copied from timm: https://github.com/huggingface/pytorch-image-models/blob/v0.6.11/timm/models/layers/mlp.py
    """
    def __init__(
            self, in_features, hidden_features=None, out_features=None, act_layer=nn.ReLU,
            norm_layer=None, bias=True, drop=0.):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features

        self.fc1 = nn.Conv2D(in_features, hidden_features, kernel_size=1, bias_attr=bias)
        self.norm = norm_layer(hidden_features) if norm_layer else nn.Identity()
        self.act = act_layer()
        self.drop = nn.Dropout(drop)
        self.fc2 = nn.Conv2D(hidden_features, out_features, kernel_size=1, bias_attr=bias)

    def forward(self, x):
        x = self.fc1(x)
        x = self.norm(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        return x
class MlpHead(nn.Layer):
    """ MLP classification head
    """
    def __init__(self, dim, num_classes=1000, mlp_ratio=3, act_layer=nn.GELU,
        norm_layer=partial(nn.LayerNorm, epsilon=1e-6), drop=0., bias=True):
        super().__init__()
        hidden_features = int(mlp_ratio * dim)
        self.fc1 = nn.Linear(dim, hidden_features, bias_attr=bias)
        self.act = act_layer()
        self.norm = norm_layer(hidden_features)
        self.fc2 = nn.Linear(hidden_features, num_classes, bias_attr=bias)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = x.mean([2, 3]) # global average pooling
        x = self.fc1(x)
        x = self.act(x)
        x = self.norm(x)
        x = self.drop(x)
        x = self.fc2(x)
        return x
class MetaNeXtBlock(nn.Layer):
    """ MetaNeXtBlock Block
    Args:
        dim (int): Number of input channels.
        drop_path (float): Stochastic depth rate. Default: 0.0
        ls_init_value (float): Init value for Layer Scale. Default: 1e-6.
    """

    def __init__(
            self,
            dim,
            token_mixer=InceptionDWConv2D,
            norm_layer=nn.BatchNorm2D,
            mlp_layer=ConvMlp,
            mlp_ratio=4,
            act_layer=nn.GELU,
            ls_init_value=1e-6,
            drop_path=0.,

    ):
        super().__init__()
        self.token_mixer = token_mixer(dim)
        self.norm = norm_layer(dim)
        self.mlp = mlp_layer(dim, int(mlp_ratio * dim), act_layer=act_layer)
        self.gamma = self.create_parameter([1, dim, 1, 1],
            default_initializer=nn.initializer.Constant(ls_init_value)) if ls_init_value else None
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.token_mixer(x)
        x = self.norm(x)
        x = self.mlp(x)
        if self.gamma is not None:
            x = x * self.gamma
        x = self.drop_path(x) + shortcut
        return x
class MetaNeXtStage(nn.Layer):
    def __init__(
            self,
            in_chs,
            out_chs,
            ds_stride=2,
            depth=2,
            drop_path_rates=None,
            ls_init_value=1.0,
            act_layer=nn.GELU,
            norm_layer=None,
            mlp_ratio=4,
    ):
        super().__init__()
        if ds_stride > 1:
            self.downsample = nn.Sequential(
                norm_layer(in_chs),
                nn.Conv2D(in_chs, out_chs, kernel_size=ds_stride, stride=ds_stride),
            )
        else:
            self.downsample = nn.Identity()

        drop_path_rates = drop_path_rates or [0.] * depth
        stage_blocks = []
        for i in range(depth):
            stage_blocks.append(MetaNeXtBlock(
                dim=out_chs,
                drop_path=drop_path_rates[i],
                ls_init_value=ls_init_value,
                act_layer=act_layer,
                norm_layer=norm_layer,
                mlp_ratio=mlp_ratio,
            ))
            in_chs = out_chs
        self.blocks = nn.Sequential(*stage_blocks)

    def forward(self, x):
        x = self.downsample(x)
        x = self.blocks(x)
        return x
class MetaNeXt(nn.Layer):
    r""" MetaNeXt
        A Paddle impl of : `InceptionNeXt: When Inception Meets ConvNeXt`  - https://arxiv.org/pdf/2203.xxxxx.pdf
    Args:
        in_chans (int): Number of input image channels. Default: 3
        num_classes (int): Number of classes for classification head. Default: 1000
        depths (tuple(int)): Number of blocks at each stage. Default: (3, 3, 9, 3)
        dims (tuple(int)): Feature dimension at each stage. Default: (96, 192, 384, 768)
        token_mixers: Token mixer function. Default: nn.Identity
        norm_layer: Normalization layer. Default: nn.BatchNorm2d
        act_layer: Activation function for MLP. Default: nn.GELU
        mlp_ratios (int or tuple(int)): MLP ratios. Default: (4, 4, 4, 3)
        head_fn: classifier head
        drop_rate (float): Head dropout rate
        drop_path_rate (float): Stochastic depth rate. Default: 0.
        ls_init_value (float): Init value for Layer Scale. Default: 1e-6.
    """

    def __init__(
            self,
            in_chans=3,
            num_classes=1000,
            depths=(3, 3, 9, 3),
            dims=(96, 192, 384, 768),
            token_mixers=nn.Identity,
            norm_layer=nn.BatchNorm2D,
            act_layer=nn.GELU,
            mlp_ratios=(4, 4, 4, 3),
            head_fn=MlpHead,
            drop_rate=0.,
            drop_path_rate=0.,
            ls_init_value=1e-6,
            **kwargs,
    ):
        super().__init__()

        num_stage = len(depths)
        if not isinstance(token_mixers, (list, tuple)):
            token_mixers = [token_mixers] * num_stage
        if not isinstance(mlp_ratios, (list, tuple)):
            mlp_ratios = [mlp_ratios] * num_stage


        self.num_classes = num_classes
        self.drop_rate = drop_rate
        self.stem = nn.Sequential(
            nn.Conv2D(in_chans, dims[0], kernel_size=4, stride=4),
            norm_layer(dims[0])
        )

        self.stages = nn.Sequential()
        dp_rates = [x.tolist() for x in paddle.linspace(0, drop_path_rate, sum(depths)).split(depths)]
        stages = []
        prev_chs = dims[0]
        # feature resolution stages, each consisting of multiple residual blocks
        for i in range(num_stage):
            out_chs = dims[i]
            stages.append(MetaNeXtStage(
                prev_chs,
                out_chs,
                ds_stride=2 if i > 0 else 1,
                depth=depths[i],
                drop_path_rates=dp_rates[i],
                ls_init_value=ls_init_value,
                act_layer=act_layer,
                norm_layer=norm_layer,
                mlp_ratio=mlp_ratios[i],
            ))
            prev_chs = out_chs
        self.stages = nn.Sequential(*stages)
        self.num_features = prev_chs
        self.head = head_fn(self.num_features, num_classes, drop=drop_rate)
        self.apply(self._init_weights)

    def _init_weights(self, m):
        tn = nn.initializer.TruncatedNormal(std=.02)
        ones = nn.initializer.Constant(1.0)
        zeros = nn.initializer.Constant(0.0)
        if isinstance(m, (nn.Conv2D, nn.Linear)):
            tn(m.weight)
            if m.bias is not None:
                zeros(m.bias)
        elif isinstance(m, (nn.LayerNorm, nn.BatchNorm2D)):
            zeros(m.bias)
            ones(m.weight)

    def forward_features(self, x):
        x = self.stem(x)
        x = self.stages(x)
        return x

    def forward_head(self, x):
        x = self.head(x)
        return x

    def forward(self, x):
        x = self.forward_features(x)
        x = self.forward_head(x)
        return x
num_classes = 10

def inceptionnext_tiny():
    model = MetaNeXt(depths=(3, 3, 9, 3), dims=(96, 192, 384, 768),
                      token_mixers=InceptionDWConv2D, num_classes=num_classes
    )
    return model


def inceptionnext_small():
    model = MetaNeXt(depths=(3, 3, 27, 3), dims=(96, 192, 384, 768),
                      token_mixers=InceptionDWConv2D, num_classes=num_classes
    )
    return model


def inceptionnext_base():
    model = MetaNeXt(depths=(3, 3, 27, 3), dims=(128, 256, 512, 1024),
                      token_mixers=InceptionDWConv2D, num_classes=num_classes
    )
    return model
2.3.4 Model parameters
model = inceptionnext_tiny()
paddle.summary(model, (1, 3, 224, 224))

model = inceptionnext_small()
paddle.summary(model, (1, 3, 224, 224))

model = inceptionnext_base()
paddle.summary(model, (1, 3, 224, 224))

2.4 Training

learning_rate = 0.00025
n_epochs = 100
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model'

# InceptionNext-T
model = inceptionnext_tiny()

criterion = LabelSmoothingCrossEntropy()

scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=scheduler, weight_decay=1e-5)

gate = 0.0
threshold = 0.0
best_acc = 0.0
val_acc = 0.0
loss_record = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}}   # for recording loss
acc_record = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}}      # for recording accuracy

loss_iter = 0
acc_iter = 0

for epoch in range(n_epochs):
    # ---------- Training ----------
    model.train()
    train_num = 0.0
    train_loss = 0.0

    val_num = 0.0
    val_loss = 0.0
    accuracy_manager = paddle.metric.Accuracy()
    val_accuracy_manager = paddle.metric.Accuracy()
    print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
    for batch_id, data in enumerate(train_loader):
        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)

        logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        accuracy_manager.update(acc)
        if batch_id % 10 == 0:
            loss_record['train']['loss'].append(loss.numpy())
            loss_record['train']['iter'].append(loss_iter)
            loss_iter += 1

        loss.backward()

        optimizer.step()
        scheduler.step()
        optimizer.clear_grad()

        train_loss += loss
        train_num += len(y_data)

    total_train_loss = (train_loss / train_num) * batch_size
    train_acc = accuracy_manager.accumulate()
    acc_record['train']['acc'].append(train_acc)
    acc_record['train']['iter'].append(acc_iter)
    acc_iter += 1
    # Print the information.
    print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))

    # ---------- Validation ----------
    model.eval()

    for batch_id, data in enumerate(val_loader):

        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)
        with paddle.no_grad():
          logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        val_accuracy_manager.update(acc)

        val_loss += loss
        val_num += len(y_data)

    total_val_loss = (val_loss / val_num) * batch_size
    loss_record['val']['loss'].append(total_val_loss.numpy())
    loss_record['val']['iter'].append(loss_iter)
    val_acc = val_accuracy_manager.accumulate()
    acc_record['val']['acc'].append(val_acc)
    acc_record['val']['iter'].append(acc_iter)

    print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))

    # ===================save====================
    if val_acc > best_acc:
        best_acc = val_acc
        paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
        paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))

print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))

2.5 Analysis of results

def plot_learning_curve(record, title='loss', ylabel='CE Loss'):
    ''' Plot learning curve of your CNN '''
    maxtrain = max(map(float, record['train'][title]))
    maxval = max(map(float, record['val'][title]))
    ymax = max(maxtrain, maxval) * 1.1
    mintrain = min(map(float, record['train'][title]))
    minval = min(map(float, record['val'][title]))
    ymin = min(mintrain, minval) * 0.9

    total_steps = len(record['train'][title])
    x_1 = list(map(int, record['train']['iter']))
    x_2 = list(map(int, record['val']['iter']))
    figure(figsize=(10, 6))
    plt.plot(x_1, record['train'][title], c='tab:red', label='train')
    plt.plot(x_2, record['val'][title], c='tab:cyan', label='val')
    plt.ylim(ymin, ymax)
    plt.xlabel('Training steps')
    plt.ylabel(ylabel)
    plt.title('Learning curve of {}'.format(title))
    plt.legend()
    plt.show()
plot_learning_curve(loss_record, title='loss', ylabel='CE Loss')

(figure: training/validation loss curve)

plot_learning_curve(acc_record, title='acc', ylabel='Accuracy')

(figure: training/validation accuracy curve)

import time
work_path = 'work/model'
model = inceptionnext_tiny()
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):

    x_data, y_data = data
    labels = paddle.unsqueeze(y_data, axis=1)
    with paddle.no_grad():
        logits = model(x_data)
bb = time.time()
print("Throughput:{}".format(int(len(val_dataset)//(bb - aa))))
Throughput:724
def get_cifar10_labels(labels):
    """Return the text labels for the CIFAR10 dataset."""
    text_labels = [
        'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog',
        'horse', 'ship', 'truck']
    return [text_labels[int(i)] for i in labels]
def show_images(imgs, num_rows, num_cols, pred=None, gt=None, scale=1.5):
    """Plot a list of images."""
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if paddle.is_tensor(img):
            ax.imshow(img.numpy())
        else:
            ax.imshow(img)
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if pred or gt:
            ax.set_title("pt: " + pred[i] + "\ngt: " + gt[i])
    return axes
work_path = 'work/model'
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
model = inceptionnext_tiny()
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
logits = model(X)
y_pred = paddle.argmax(logits, -1)
X = paddle.transpose(X, [0, 2, 3, 1])
axes = show_images(X.reshape((18, 224, 224, 3)), 1, 18, pred=get_cifar10_labels(y_pred), gt=get_cifar10_labels(y))
plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

(figure: predictions vs. ground truth on 18 validation images)

!pip install interpretdl
import interpretdl as it
work_path = 'work/model'
model = inceptionnext_tiny()
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
lime = it.LIMECVInterpreter(model)
lime_weights = lime.interpret(X.numpy()[3], interpret_class=y.numpy()[3], batch_size=100, num_samples=10000, visual=True)
100%|██████████| 10000/10000 [00:57<00:00, 174.10it/s]


(figure: LIME interpretation result)

Summary

        This article presents a scheme that partitions the channels: one part is passed through as an identity mapping, while the rest are processed by depthwise convolutions with large band kernels and an ordinary small square kernel. This alleviates the slow memory access of traditional large-kernel convolution; the model is simple and effective.

References

  1. InceptionNeXt: When Inception Meets ConvNeXt
  2. sail-sg/inceptionnext

This article is a repost.
Original project link
