Abstract

Inspired by the long-range modeling ability of ViTs, large-kernel convolutions have recently been widely studied and adopted to enlarge the receptive field and improve model performance; for example, the well-known ConvNeXt uses 7×7 depthwise convolution. Although this depthwise operator consumes only a few FLOPs, its high memory-access cost greatly harms model efficiency on powerful computing devices. For instance, ConvNeXt-T has FLOPs similar to ResNet-50, but reaches only about 60% of its throughput when trained on an A100 GPU with full precision. Shrinking ConvNeXt's kernel size improves speed but causes a significant performance drop, and it remains unclear how to speed up large-kernel CNNs while preserving accuracy. To address this, inspired by Inception, the authors decompose large-kernel depthwise convolution into four parallel branches: a small square kernel, two orthogonal band kernels, and an identity mapping. With this new Inception depthwise convolution, they build a family of networks named InceptionNeXt that enjoy both high throughput and competitive performance. For example, InceptionNeXt-T achieves 1.6× higher training throughput than ConvNeXt-T and a 0.2% top-1 accuracy improvement on ImageNet-1K. InceptionNeXt is expected to serve as an economical baseline for future architecture design and help reduce carbon emissions.
1. InceptionNeXt
1.1 MetaNeXt
ConvNeXt is a structurally simple modern CNN. In each ConvNeXt block, the input X is first processed by a depthwise convolution that propagates information along the spatial dimensions. Following MetaFormer, this depthwise convolution is abstracted as a token mixer responsible for spatial information interaction; ConvNeXt is thereby abstracted into MetaNeXt, as shown in Figure 2. Formally, in a MetaNeXt block the input X is first processed as:
$$X' = \operatorname{TokenMixer}(X)$$
The output of the token mixer is then normalized:
$$Y = \operatorname{Norm}(X')$$
After normalization, the resulting features are fed into an MLP module composed of two fully connected layers, and a shortcut connection is added:
$$Y = \operatorname{Conv}_{1 \times 1}^{rC \rightarrow C}\left\{\sigma\left[\operatorname{Conv}_{1 \times 1}^{C \rightarrow rC}(Y)\right]\right\} + X$$
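As a sanity check of the three equations above, here is a minimal NumPy sketch of one MetaNeXt block, with an identity token mixer, per-channel standardization as the Norm, and the two 1×1 convolutions written as channel-wise matrix multiplies. All shapes and the expansion ratio r = 4 are illustrative assumptions, not the paper's training configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
B, C, H, W, r = 2, 8, 4, 4, 4

X = rng.standard_normal((B, C, H, W))

# Token mixer: identity for this sketch (ConvNeXt would use a depthwise conv)
X_prime = X

# Norm: standardize each channel over batch and spatial dims
mu = X_prime.mean(axis=(0, 2, 3), keepdims=True)
var = X_prime.var(axis=(0, 2, 3), keepdims=True)
Y = (X_prime - mu) / np.sqrt(var + 1e-5)

# MLP: two 1x1 convs are per-pixel linear maps over the channel dimension
W1 = rng.standard_normal((r * C, C)) * 0.02   # C -> rC
W2 = rng.standard_normal((C, r * C)) * 0.02   # rC -> C
h = np.einsum('oc,bchw->bohw', W1, Y)
h = np.maximum(h, 0)                          # sigma: ReLU here for simplicity
out = np.einsum('co,bohw->bchw', W2, h) + X   # shortcut connection

print(out.shape)  # (2, 8, 4, 4)
```

Note that the output has the same shape as the input, which is what allows blocks to be stacked and the shortcut to be a plain addition.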
1.2 Inception depthwise convolution
As shown in Figure 1, conventional depthwise convolution with a large kernel size significantly slows the model down. Inspired by ShuffleNetV2, the authors observe that processing only part of the channels is already sufficient for a single depthwise convolution layer, so some channels are left untouched and treated as an identity-mapping branch. For the processed channels, an Inception-style decomposition of the depthwise operation is proposed. Inception uses several branches with small and large kernels; similarly, a 3×3 branch is kept, while large square kernels are avoided because of their slow practical speed. Instead, inspired by Inception v3, the large kernel k×k is decomposed into 1×k and k×1 band kernels. Concretely, the input X is split into four groups along the channel dimension:
$$X_{\mathrm{hw}}, X_{\mathrm{w}}, X_{\mathrm{h}}, X_{\mathrm{id}} = \operatorname{Split}(X) = X_{:g},\ X_{g:2g},\ X_{2g:3g},\ X_{3g:}$$
Each group is then processed by a different kernel:
$$\begin{aligned} X_{\mathrm{hw}}' &= \operatorname{DWConv}_{k_s \times k_s}^{g \rightarrow g}(X_{\mathrm{hw}}), \\ X_{\mathrm{w}}' &= \operatorname{DWConv}_{1 \times k_b}^{g \rightarrow g}(X_{\mathrm{w}}), \\ X_{\mathrm{h}}' &= \operatorname{DWConv}_{k_b \times 1}^{g \rightarrow g}(X_{\mathrm{h}}), \\ X_{\mathrm{id}}' &= X_{\mathrm{id}}. \end{aligned}$$
Finally, the branch outputs are concatenated:
$$X' = \operatorname{Concat}\left(X_{\mathrm{hw}}', X_{\mathrm{w}}', X_{\mathrm{h}}', X_{\mathrm{id}}'\right)$$
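A rough way to see the saving: for band length k_b, a pair of 1×k_b and k_b×1 depthwise kernels costs 2·k_b parameters per channel instead of k_b² for a full square kernel, and only a fraction of the channels are convolved at all. A small back-of-the-envelope sketch, using the default branch ratio of 1/8 and k_b = 11; the stage width C = 96 is taken from InceptionNeXt-T, the rest is plain arithmetic:

```python
# Per-channel depthwise kernel parameters: full square vs. two band kernels
k_b = 11                      # band kernel size
full_square = k_b * k_b       # an 11x11 depthwise kernel
two_bands = 2 * k_b           # a 1x11 + 11x1 pair

print(full_square, two_bands)  # 121 22

# With branch_ratio = 1/8, only 3/8 of the channels are convolved
# (one 3x3 group and two band groups); the remaining 5/8 are identity.
C = 96                        # first-stage width of InceptionNeXt-T
gc = C // 8                   # channels per convolved group
ks = 3                        # small square kernel
params = gc * ks * ks + gc * k_b + gc * k_b   # 3x3 + 1x11 + 11x1 groups
params_full = C * k_b * k_b                   # plain 11x11 DW conv on all channels
print(params, params_full)    # 372 11616
```

The kernel-parameter count is a proxy for compute here; the paper's actual speedup comes mainly from reduced memory access, which this arithmetic does not capture.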
2. Code Reproduction
2.1 Download and import the required libraries
!pip install paddlex
%matplotlib inline
import paddle
import numpy as np
import matplotlib.pyplot as plt
from paddle.vision.datasets import Cifar10
from paddle.vision.transforms import Transpose
from paddle.io import Dataset, DataLoader
from paddle import nn
import paddle.nn.functional as F
import paddle.vision.transforms as transforms
import os
from matplotlib.pyplot import figure
import paddlex
from functools import partial
2.2 Create the dataset
train_tfm = transforms.Compose([
transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
transforms.ColorJitter(brightness=0.2,contrast=0.2, saturation=0.2),
transforms.RandomHorizontalFlip(0.5),
transforms.RandomRotation(20),
paddlex.transforms.MixupImage(),
transforms.ToTensor(),
transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
test_tfm = transforms.Compose([
transforms.Resize((224, 224)),
transforms.ToTensor(),
transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
paddle.vision.set_image_backend('cv2')
# Use the CIFAR-10 dataset
train_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='train', transform = train_tfm, )
val_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='test',transform = test_tfm)
print("train_dataset: %d" % len(train_dataset))
print("val_dataset: %d" % len(val_dataset))
train_dataset: 50000
val_dataset: 10000
batch_size=256
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, drop_last=False, num_workers=4)
2.3 Building the model
2.3.1 Label smoothing
class LabelSmoothingCrossEntropy(nn.Layer):
def __init__(self, smoothing=0.1):
super().__init__()
self.smoothing = smoothing
def forward(self, pred, target):
confidence = 1. - self.smoothing
log_probs = F.log_softmax(pred, axis=-1)
idx = paddle.stack([paddle.arange(log_probs.shape[0]), target], axis=1)
nll_loss = paddle.gather_nd(-log_probs, index=idx)
smooth_loss = paddle.mean(-log_probs, axis=-1)
loss = confidence * nll_loss + self.smoothing * smooth_loss
return loss.mean()
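The smoothed loss above mixes the NLL of the target class with the mean NLL over all classes, weighted by `confidence` and `smoothing`. The same formula can be checked with a small NumPy sketch (the logits and targets here are arbitrary values, not model outputs):

```python
import numpy as np

smoothing = 0.1
confidence = 1.0 - smoothing

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 1.2, 0.3]])
target = np.array([0, 1])

# Numerically stable log-softmax
z = logits - logits.max(axis=-1, keepdims=True)
log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

nll = -log_probs[np.arange(len(target)), target]  # per-sample NLL of the target
smooth = -log_probs.mean(axis=-1)                 # mean NLL over all classes
loss = (confidence * nll + smoothing * smooth).mean()
print(float(loss))
```

With `smoothing = 0` this reduces to ordinary cross-entropy; increasing `smoothing` pulls the objective toward the uniform distribution, discouraging overconfident logits.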
2.3.2 DropPath
def drop_path(x, drop_prob=0.0, training=False):
"""
Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ...
"""
if drop_prob == 0.0 or not training:
return x
keep_prob = paddle.to_tensor(1 - drop_prob)
shape = (paddle.shape(x)[0],) + (1,) * (x.ndim - 1)
random_tensor = keep_prob + paddle.rand(shape, dtype=x.dtype)
random_tensor = paddle.floor(random_tensor) # binarize
output = x.divide(keep_prob) * random_tensor
return output
class DropPath(nn.Layer):
def __init__(self, drop_prob=None):
super(DropPath, self).__init__()
self.drop_prob = drop_prob
def forward(self, x):
return drop_path(x, self.drop_prob, self.training)
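DropPath zeroes entire samples with probability `drop_prob` and rescales survivors by `1 / keep_prob`, so the expected activation is unchanged between train and eval. The same logic in NumPy, separate from the Paddle layer above (batch size and tolerances are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
drop_prob = 0.2
keep_prob = 1.0 - drop_prob

x = np.ones((10000, 16))                             # 10k "samples"
mask = np.floor(keep_prob + rng.random((10000, 1)))  # 1 with prob keep_prob, else 0
out = x / keep_prob * mask

# Expectation is preserved: survivors are scaled up by 1/keep_prob
print(out.mean())  # close to 1.0
```

The `floor(keep_prob + uniform)` trick is the same binarization used in the `drop_path` function above: the sum lands in [keep_prob, 1 + keep_prob), so flooring yields 1 exactly with probability keep_prob.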
2.3.3 Building the InceptionNeXt model
class InceptionDWConv2D(nn.Layer):
def __init__(self, in_channels, square_kernel_size=3, band_kernel_size=11, branch_ratio=0.125):
super().__init__()
self.in_channels = in_channels
self.gc = int(in_channels * branch_ratio)
self.dwconvhw = nn.Conv2D(self.gc, self.gc, square_kernel_size, padding=square_kernel_size // 2, groups=self.gc)
self.dwconvh = nn.Conv2D(self.gc, self.gc, (band_kernel_size, 1), padding=(band_kernel_size // 2, 0), groups=self.gc)
self.dwconvw = nn.Conv2D(self.gc, self.gc, (1, band_kernel_size), padding=(0, band_kernel_size // 2), groups=self.gc)
def forward(self, x):
x_id, x_hw, x_h, x_w = paddle.split(x, [self.in_channels - 3 * self.gc, self.gc, self.gc, self.gc], axis=1)
x_hw = self.dwconvhw(x_hw)
x_h = self.dwconvh(x_h)
x_w = self.dwconvw(x_w)
out = paddle.concat([x_id, x_hw, x_h, x_w], axis=1)
return out
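Note that this implementation puts the identity group first in the split, whereas the paper's equations list it last; since the concatenation follows the same order as the split, the bookkeeping stays consistent. The channel arithmetic can be verified with a quick NumPy check, using the defaults C = 64 and branch_ratio = 0.125:

```python
import numpy as np

C, branch_ratio = 64, 0.125
gc = int(C * branch_ratio)           # channels per convolved branch

x = np.zeros((1, C, 8, 8))
sections = [C - 3 * gc, gc, gc, gc]  # identity, square, band-h, band-w
splits = np.split(x, np.cumsum(sections)[:-1], axis=1)

print([s.shape[1] for s in splits])  # [40, 8, 8, 8]
assert sum(sections) == C            # concatenation restores the original width
```

So with the default ratio, 5/8 of the channels bypass convolution entirely, which is where much of the memory-access saving comes from.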
class ConvMlp(nn.Layer):
""" MLP using 1x1 convs that keeps spatial dims
copied from timm: https://github.com/huggingface/pytorch-image-models/blob/v0.6.11/timm/models/layers/mlp.py
"""
def __init__(
self, in_features, hidden_features=None, out_features=None, act_layer=nn.ReLU,
norm_layer=None, bias=True, drop=0.):
super().__init__()
out_features = out_features or in_features
hidden_features = hidden_features or in_features
self.fc1 = nn.Conv2D(in_features, hidden_features, kernel_size=1, bias_attr=bias)
self.norm = norm_layer(hidden_features) if norm_layer else nn.Identity()
self.act = act_layer()
self.drop = nn.Dropout(drop)
self.fc2 = nn.Conv2D(hidden_features, out_features, kernel_size=1, bias_attr=bias)
def forward(self, x):
x = self.fc1(x)
x = self.norm(x)
x = self.act(x)
x = self.drop(x)
x = self.fc2(x)
return x
class MlpHead(nn.Layer):
""" MLP classification head
"""
def __init__(self, dim, num_classes=1000, mlp_ratio=3, act_layer=nn.GELU,
norm_layer=partial(nn.LayerNorm, epsilon=1e-6), drop=0., bias=True):
super().__init__()
hidden_features = int(mlp_ratio * dim)
self.fc1 = nn.Linear(dim, hidden_features, bias_attr=bias)
self.act = act_layer()
self.norm = norm_layer(hidden_features)
self.fc2 = nn.Linear(hidden_features, num_classes, bias_attr=bias)
self.drop = nn.Dropout(drop)
def forward(self, x):
x = x.mean([2, 3]) # global average pooling
x = self.fc1(x)
x = self.act(x)
x = self.norm(x)
x = self.drop(x)
x = self.fc2(x)
return x
class MetaNeXtBlock(nn.Layer):
""" MetaNeXtBlock Block
Args:
dim (int): Number of input channels.
drop_path (float): Stochastic depth rate. Default: 0.0
ls_init_value (float): Init value for Layer Scale. Default: 1e-6.
"""
def __init__(
self,
dim,
token_mixer=InceptionDWConv2D,
norm_layer=nn.BatchNorm2D,
mlp_layer=ConvMlp,
mlp_ratio=4,
act_layer=nn.GELU,
ls_init_value=1e-6,
drop_path=0.,
):
super().__init__()
self.token_mixer = token_mixer(dim)
self.norm = norm_layer(dim)
self.mlp = mlp_layer(dim, int(mlp_ratio * dim), act_layer=act_layer)
self.gamma = self.create_parameter([1, dim, 1, 1],
default_initializer=nn.initializer.Constant(ls_init_value)) if ls_init_value else None
self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
def forward(self, x):
shortcut = x
x = self.token_mixer(x)
x = self.norm(x)
x = self.mlp(x)
if self.gamma is not None:
x = x * self.gamma
x = self.drop_path(x) + shortcut
return x
class MetaNeXtStage(nn.Layer):
def __init__(
self,
in_chs,
out_chs,
ds_stride=2,
depth=2,
drop_path_rates=None,
ls_init_value=1.0,
act_layer=nn.GELU,
norm_layer=None,
mlp_ratio=4,
):
super().__init__()
if ds_stride > 1:
self.downsample = nn.Sequential(
norm_layer(in_chs),
nn.Conv2D(in_chs, out_chs, kernel_size=ds_stride, stride=ds_stride),
)
else:
self.downsample = nn.Identity()
drop_path_rates = drop_path_rates or [0.] * depth
stage_blocks = []
for i in range(depth):
stage_blocks.append(MetaNeXtBlock(
dim=out_chs,
drop_path=drop_path_rates[i],
ls_init_value=ls_init_value,
act_layer=act_layer,
norm_layer=norm_layer,
mlp_ratio=mlp_ratio,
))
in_chs = out_chs
self.blocks = nn.Sequential(*stage_blocks)
def forward(self, x):
x = self.downsample(x)
x = self.blocks(x)
return x
class MetaNeXt(nn.Layer):
r""" MetaNeXt
A Paddle implementation of `InceptionNeXt: When Inception Meets ConvNeXt` - https://arxiv.org/pdf/2203.xxxxx.pdf
Args:
in_chans (int): Number of input image channels. Default: 3
num_classes (int): Number of classes for classification head. Default: 1000
depths (tuple(int)): Number of blocks at each stage. Default: (3, 3, 9, 3)
dims (tuple(int)): Feature dimension at each stage. Default: (96, 192, 384, 768)
token_mixers: Token mixer function. Default: nn.Identity
norm_layer: Normalization layer. Default: nn.BatchNorm2D
act_layer: Activation function for MLP. Default: nn.GELU
mlp_ratios (int or tuple(int)): MLP ratios. Default: (4, 4, 4, 3)
head_fn: classifier head
drop_rate (float): Head dropout rate
drop_path_rate (float): Stochastic depth rate. Default: 0.
ls_init_value (float): Init value for Layer Scale. Default: 1e-6.
"""
def __init__(
self,
in_chans=3,
num_classes=1000,
depths=(3, 3, 9, 3),
dims=(96, 192, 384, 768),
token_mixers=nn.Identity,
norm_layer=nn.BatchNorm2D,
act_layer=nn.GELU,
mlp_ratios=(4, 4, 4, 3),
head_fn=MlpHead,
drop_rate=0.,
drop_path_rate=0.,
ls_init_value=1e-6,
**kwargs,
):
super().__init__()
num_stage = len(depths)
if not isinstance(token_mixers, (list, tuple)):
token_mixers = [token_mixers] * num_stage
if not isinstance(mlp_ratios, (list, tuple)):
mlp_ratios = [mlp_ratios] * num_stage
self.num_classes = num_classes
self.drop_rate = drop_rate
self.stem = nn.Sequential(
nn.Conv2D(in_chans, dims[0], kernel_size=4, stride=4),
norm_layer(dims[0])
)
self.stages = nn.Sequential()
dp_rates = [x.tolist() for x in paddle.linspace(0, drop_path_rate, sum(depths)).split(depths)]
stages = []
prev_chs = dims[0]
# feature resolution stages, each consisting of multiple residual blocks
for i in range(num_stage):
out_chs = dims[i]
stages.append(MetaNeXtStage(
prev_chs,
out_chs,
ds_stride=2 if i > 0 else 1,
depth=depths[i],
drop_path_rates=dp_rates[i],
ls_init_value=ls_init_value,
act_layer=act_layer,
norm_layer=norm_layer,
mlp_ratio=mlp_ratios[i],
))
prev_chs = out_chs
self.stages = nn.Sequential(*stages)
self.num_features = prev_chs
self.head = head_fn(self.num_features, num_classes, drop=drop_rate)
self.apply(self._init_weights)
def _init_weights(self, m):
tn = nn.initializer.TruncatedNormal(std=.02)
ones = nn.initializer.Constant(1.0)
zeros = nn.initializer.Constant(0.0)
if isinstance(m, (nn.Conv2D, nn.Linear)):
tn(m.weight)
if m.bias is not None:
zeros(m.bias)
elif isinstance(m, (nn.LayerNorm, nn.BatchNorm2D)):
zeros(m.bias)
ones(m.weight)
def forward_features(self, x):
x = self.stem(x)
x = self.stages(x)
return x
def forward_head(self, x):
x = self.head(x)
return x
def forward(self, x):
x = self.forward_features(x)
x = self.forward_head(x)
return x
num_classes = 10
def inceptionnext_tiny():
model = MetaNeXt(depths=(3, 3, 9, 3), dims=(96, 192, 384, 768),
token_mixers=InceptionDWConv2D, num_classes=num_classes
)
return model
def inceptionnext_small():
model = MetaNeXt(depths=(3, 3, 27, 3), dims=(96, 192, 384, 768),
token_mixers=InceptionDWConv2D, num_classes=num_classes
)
return model
def inceptionnext_base():
model = MetaNeXt(depths=(3, 3, 27, 3), dims=(128, 256, 512, 1024),
token_mixers=InceptionDWConv2D, num_classes=num_classes
)
return model
2.3.4 Model parameters
model = inceptionnext_tiny()
paddle.summary(model, (1, 3, 224, 224))
model = inceptionnext_small()
paddle.summary(model, (1, 3, 224, 224))
model = inceptionnext_base()
paddle.summary(model, (1, 3, 224, 224))
2.4 Training
learning_rate = 0.00025
n_epochs = 100
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model'
# InceptionNext-T
model = inceptionnext_tiny()
criterion = LabelSmoothingCrossEntropy()
scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=scheduler, weight_decay=1e-5)
gate = 0.0
threshold = 0.0
best_acc = 0.0
val_acc = 0.0
loss_record = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}} # for recording loss
acc_record = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}} # for recording accuracy
loss_iter = 0
acc_iter = 0
for epoch in range(n_epochs):
# ---------- Training ----------
model.train()
train_num = 0.0
train_loss = 0.0
val_num = 0.0
val_loss = 0.0
accuracy_manager = paddle.metric.Accuracy()
val_accuracy_manager = paddle.metric.Accuracy()
print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
for batch_id, data in enumerate(train_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
logits = model(x_data)
loss = criterion(logits, y_data)
acc = paddle.metric.accuracy(logits, labels)
accuracy_manager.update(acc)
if batch_id % 10 == 0:
loss_record['train']['loss'].append(loss.numpy())
loss_record['train']['iter'].append(loss_iter)
loss_iter += 1
loss.backward()
optimizer.step()
scheduler.step()
optimizer.clear_grad()
train_loss += loss
train_num += len(y_data)
total_train_loss = (train_loss / train_num) * batch_size
train_acc = accuracy_manager.accumulate()
acc_record['train']['acc'].append(train_acc)
acc_record['train']['iter'].append(acc_iter)
acc_iter += 1
# Print the information.
print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))
# ---------- Validation ----------
model.eval()
for batch_id, data in enumerate(val_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
with paddle.no_grad():
logits = model(x_data)
loss = criterion(logits, y_data)
acc = paddle.metric.accuracy(logits, labels)
val_accuracy_manager.update(acc)
val_loss += loss
val_num += len(y_data)
total_val_loss = (val_loss / val_num) * batch_size
loss_record['val']['loss'].append(total_val_loss.numpy())
loss_record['val']['iter'].append(loss_iter)
val_acc = val_accuracy_manager.accumulate()
acc_record['val']['acc'].append(val_acc)
acc_record['val']['iter'].append(acc_iter)
print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))
# ===================save====================
if val_acc > best_acc:
best_acc = val_acc
paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))
print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))
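The CosineAnnealingDecay schedule used above follows η(t) = η_min + (η₀ − η_min)·(1 + cos(πt/T_max))/2, assuming Paddle's default η_min = 0. A quick plain-Python check of its endpoints, with the constants copied from the setup above:

```python
import math

base_lr = 0.00025
T_max = 50000 // 256 * 100   # steps per epoch * epochs, as configured above

def cosine_lr(step):
    # eta_min = 0 assumed (Paddle's default)
    return base_lr * (1 + math.cos(math.pi * step / T_max)) / 2

print(cosine_lr(0))          # base_lr at the start
print(cosine_lr(T_max // 2)) # half of base_lr midway
print(cosine_lr(T_max))      # 0 at the end
```

Note that `scheduler.step()` is called once per batch in the loop above, so T_max is measured in optimizer steps, not epochs.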
2.5 Result analysis
def plot_learning_curve(record, title='loss', ylabel='CE Loss'):
''' Plot learning curve of your CNN '''
maxtrain = max(map(float, record['train'][title]))
maxval = max(map(float, record['val'][title]))
ymax = max(maxtrain, maxval) * 1.1
mintrain = min(map(float, record['train'][title]))
minval = min(map(float, record['val'][title]))
ymin = min(mintrain, minval) * 0.9
total_steps = len(record['train'][title])
x_1 = list(map(int, record['train']['iter']))
x_2 = list(map(int, record['val']['iter']))
figure(figsize=(10, 6))
plt.plot(x_1, record['train'][title], c='tab:red', label='train')
plt.plot(x_2, record['val'][title], c='tab:cyan', label='val')
plt.ylim(ymin, ymax)
plt.xlabel('Training steps')
plt.ylabel(ylabel)
plt.title('Learning curve of {}'.format(title))
plt.legend()
plt.show()
plot_learning_curve(loss_record, title='loss', ylabel='CE Loss')
plot_learning_curve(acc_record, title='acc', ylabel='Accuracy')
import time
work_path = 'work/model'
model = inceptionnext_tiny()
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):
x_data, y_data = data
labels = paddle.unsqueeze(y_data, axis=1)
with paddle.no_grad():
logits = model(x_data)
bb = time.time()
print("Throughput:{}".format(int(len(val_dataset)//(bb - aa))))
Throughput:724
def get_cifar10_labels(labels):
"""Return the text labels of the CIFAR-10 dataset."""
text_labels = [
'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog',
'horse', 'ship', 'truck']
return [text_labels[int(i)] for i in labels]
def show_images(imgs, num_rows, num_cols, pred=None, gt=None, scale=1.5):
"""Plot a list of images."""
figsize = (num_cols * scale, num_rows * scale)
_, axes = plt.subplots(num_rows, num_cols, figsize=figsize)
axes = axes.flatten()
for i, (ax, img) in enumerate(zip(axes, imgs)):
if paddle.is_tensor(img):
ax.imshow(img.numpy())
else:
ax.imshow(img)
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)
if pred or gt:
ax.set_title("pt: " + pred[i] + "\ngt: " + gt[i])
return axes
work_path = 'work/model'
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
model = inceptionnext_tiny()
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
logits = model(X)
y_pred = paddle.argmax(logits, -1)
X = paddle.transpose(X, [0, 2, 3, 1])
axes = show_images(X.reshape((18, 224, 224, 3)), 1, 18, pred=get_cifar10_labels(y_pred), gt=get_cifar10_labels(y))
plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
!pip install interpretdl
import interpretdl as it
work_path = 'work/model'
model = inceptionnext_tiny()
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
lime = it.LIMECVInterpreter(model)
lime_weights = lime.interpret(X.numpy()[3], interpret_class=y.numpy()[3], batch_size=100, num_samples=10000, visual=True)
100%|██████████| 10000/10000 [00:57<00:00, 174.10it/s]
Summary
This work splits the channels into groups: part of them pass through an identity mapping, while the rest are processed by depthwise convolutions with band-shaped large kernels and an ordinary small square kernel. This alleviates the slow memory access of conventional large-kernel convolutions, and the resulting model is simple and effective.
References
This article is reposted from the original project.
Original project link