SimAM: A Parameter-Free Attention Module

We introduce SimAM, an attention mechanism that requires no extra parameters. Grounded in linear separability and neuroscience theory, it determines each neuron's importance by optimizing an energy function, improving the performance of convolutional neural networks. We apply it to AlexNet and validate its effectiveness on the CIFAR10 dataset.



Abstract

In this paper, we propose a conceptually simple but highly effective attention module for convolutional neural networks. Unlike existing channel-wise and spatial-wise attention modules, ours infers 3-D attention weights for the feature map within a layer without adding any parameters to the original network. Specifically, drawing on well-established neuroscience theories, we propose optimizing an energy function to discover the importance of each neuron. We further derive a fast closed-form solution to this energy function and show that the solution can be implemented in fewer than ten lines of code. Another advantage of the module is that most of its operators are dictated by the solution of the defined energy function, avoiding excessive effort on architecture tuning. Quantitative evaluations on various vision tasks demonstrate the flexibility and effectiveness of the module, which improves the representational power of many ConvNets.

1. SimAM

1.1 Comparison with existing attention mechanisms


Existing attention modules are typically embedded in each block to refine the output of the previous layer. This refinement step usually operates along either the channel dimension or the spatial dimension: these methods generate 1-D or 2-D weights and treat all neurons within each channel, or at each spatial position, equally:

  1. Channel attention: 1-D attention that treats different channels differently but all spatial positions equally;
  2. Spatial attention: 2-D attention that treats different positions differently but all channels equally.

This may limit their ability to learn more discriminative cues. Full 3-D weights are therefore preferable to the conventional 1-D and 2-D attention weights.
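The distinction above can be sketched in a few lines (my own illustration, not from the paper's code), by looking at how each kind of weight broadcasts over a feature map of shape (C, H, W):

```python
import numpy as np

# Illustrative sketch (not from the paper's code): how 1-D channel,
# 2-D spatial, and full 3-D attention weights broadcast over a
# feature map X of shape (C, H, W).
C, H, W = 4, 8, 8
X = np.random.rand(C, H, W)

w_channel = np.random.rand(C, 1, 1)   # 1-D: one weight per channel
w_spatial = np.random.rand(1, H, W)   # 2-D: one weight per position
w_full = np.random.rand(C, H, W)      # 3-D: one weight per neuron (SimAM)

# All three broadcast to the full feature-map shape, but only the 3-D
# weights can treat every individual neuron differently.
for w in (w_channel, w_spatial, w_full):
    assert (w * X).shape == (C, H, W)
```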

1.2 SimAM

Prior works such as BAM and CBAM combine spatial attention and channel attention in parallel or in series. In the human brain, however, these two forms of attention work together, so we propose an attention module with unified weights. To implement attention well, we need to estimate the importance of each neuron. In neuroscience, information-rich neurons usually exhibit firing patterns distinct from those of surrounding neurons; moreover, an active neuron typically suppresses its neighbors, a phenomenon known as spatial suppression. In other words, a neuron showing spatial suppression should be assigned higher importance. The simplest way to find important neurons is to measure the linear separability between neurons. We therefore define the following energy function:

$$e_t(w_t, b_t, \mathbf{y}, x_i) = (y_t - \hat{t})^2 + \frac{1}{M-1}\sum_{i=1}^{M-1}(y_o - \hat{x}_i)^2$$

where $\hat{t} = w_t t + b_t$ and $\hat{x}_i = w_t x_i + b_t$ are linear transforms of the target neuron $t$ and the other neurons $x_i$ in the same channel, and $M$ is the number of neurons in that channel.
Minimizing the above is equivalent to training the linear separability between neuron $t$ and the other neurons in the same channel. For simplicity, we adopt binary labels ($y_t = 1$, $y_o = -1$) and add a regularizer, giving the final energy function:

$$e_t(w_t, b_t, \mathbf{y}, x_i) = \frac{1}{M-1}\sum_{i=1}^{M-1}\left(-1-(w_t x_i + b_t)\right)^2 + \left(1-(w_t t + b_t)\right)^2 + \lambda w_t^2$$
This has the closed-form solution:

$$w_t = -\frac{2(t-\mu_t)}{(t-\mu_t)^2 + 2\sigma_t^2 + 2\lambda}, \qquad b_t = -\frac{1}{2}(t+\mu_t)\,w_t$$

where $\mu_t$ and $\sigma_t^2$ are the mean and variance computed over the other neurons in the channel.
Since all neurons in a channel follow the same distribution, the mean and variance can be computed once over the H and W dimensions of the input feature and reused for every neuron, avoiding redundant computation. The minimal energy is then:

$$e_t^* = \frac{4(\hat{\sigma}^2 + \lambda)}{(t-\hat{\mu})^2 + 2\hat{\sigma}^2 + 2\lambda}$$

The lower the energy $e_t^*$, the more distinct neuron $t$ is from its neighbors, so its importance is given by $1/e_t^*$.
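As a quick numerical check (my own sketch, not part of the paper or the original code): the energy above is a weighted ridge regression in $(w_t, b_t)$, so we can minimize it exactly via the normal equations and confirm that the minimum equals the closed-form $e_t^*$:

```python
import numpy as np

# Numerical sanity check (my own sketch): the energy e_t is a weighted
# ridge regression in (w_t, b_t), so we minimize it exactly via the
# normal equations and compare the minimum against the closed-form e_t*.
rng = np.random.default_rng(0)
M = 64
x = rng.normal(size=M)
t = x[0]                  # target neuron
others = x[1:]            # the remaining M - 1 neurons in the channel
lam = 1e-4

# Regression targets: -1 for the other neurons, +1 for the target; the
# "others" term is averaged over M - 1 samples via per-sample weights.
A = np.stack([np.append(others, t), np.ones(M)], axis=1)   # columns [x, 1]
y = np.append(-np.ones(M - 1), 1.0)
W = np.diag(np.append(np.full(M - 1, 1.0 / (M - 1)), 1.0))

# Normal equations for weighted ridge (penalty on w only, not on b).
reg = np.diag([lam, 0.0])
w_opt, b_opt = np.linalg.solve(A.T @ W @ A + reg, A.T @ W @ y)

def energy(w, b):
    return (np.mean((-1 - (w * others + b)) ** 2)
            + (1 - (w * t + b)) ** 2 + lam * w ** 2)

mu_t = others.mean()
sigma2_t = others.var()
e_star = 4 * (sigma2_t + lam) / ((t - mu_t) ** 2 + 2 * sigma2_t + 2 * lam)
assert np.isclose(energy(w_opt, b_opt), e_star)
```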
The whole refinement process can then be written as:

$$\widetilde{\mathbf{X}} = \operatorname{sigmoid}\!\left(\frac{1}{\mathbf{E}}\right) \odot \mathbf{X}$$

where $\mathbf{E}$ groups all $e_t^*$ across the channel and spatial dimensions, and the sigmoid restrains overly large values without changing the relative importance of each neuron.

2. Code Reproduction

2.1 Install and import the required packages

!pip install paddlex
%matplotlib inline
import os
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import paddle
from paddle import nn, ParamAttr
import paddle.nn.functional as F
import paddle.vision.transforms as transforms
from paddle.vision.datasets import Cifar10
from paddle.vision.transforms import Transpose
from paddle.io import Dataset, DataLoader
import paddlex

2.2 Create the dataset

train_tfm = transforms.Compose([
    transforms.Resize((130, 130)),
    transforms.ColorJitter(brightness=0.2,contrast=0.2, saturation=0.2),
    paddlex.transforms.MixupImage(),
    transforms.RandomResizedCrop(128, scale=(0.6, 1.0)),
    transforms.RandomHorizontalFlip(0.5),
    transforms.RandomRotation(20),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

test_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
paddle.vision.set_image_backend('cv2')
# Use the Cifar10 dataset
train_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='train', transform = train_tfm, )
val_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='test',transform = test_tfm)
print("train_dataset: %d" % len(train_dataset))
print("val_dataset: %d" % len(val_dataset))
train_dataset: 50000
val_dataset: 10000
batch_size=128
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, drop_last=False, num_workers=2)

2.3 Label smoothing

class LabelSmoothingCrossEntropy(nn.Layer):
    def __init__(self, smoothing=0.1):
        super().__init__()
        self.smoothing = smoothing

    def forward(self, pred, target):

        confidence = 1. - self.smoothing
        log_probs = F.log_softmax(pred, axis=-1)
        idx = paddle.stack([paddle.arange(log_probs.shape[0]), target], axis=1)
        nll_loss = paddle.gather_nd(-log_probs, index=idx)
        smooth_loss = paddle.mean(-log_probs, axis=-1)
        loss = confidence * nll_loss + self.smoothing * smooth_loss

        return loss.mean()
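To make the computation above concrete, here is a small numpy sketch (my own illustration, mirroring the Paddle class) of what label smoothing evaluates to for a single prediction vector:

```python
import numpy as np

# A numpy sketch (my illustration, mirroring the Paddle class above) of
# what label smoothing computes for one prediction vector.
def label_smoothing_ce(logits, target, smoothing=0.1):
    shifted = logits - logits.max()                      # numerically stable
    log_probs = shifted - np.log(np.exp(shifted).sum())  # log-softmax
    nll = -log_probs[target]                             # standard CE term
    smooth = -log_probs.mean()                           # uniform-target term
    return (1.0 - smoothing) * nll + smoothing * smooth

logits = np.array([2.0, 0.5, -1.0])
log_probs = logits - np.log(np.exp(logits).sum())
# With smoothing=0 this reduces to ordinary cross-entropy.
assert np.isclose(label_smoothing_ce(logits, 0, smoothing=0.0), -log_probs[0])
# With smoothing=0.1 it interpolates between CE and the uniform target.
expected = 0.9 * (-log_probs[0]) + 0.1 * (-log_probs.mean())
assert np.isclose(label_smoothing_ce(logits, 0, smoothing=0.1), expected)
```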

2.4 AlexNet-SimAM

2.4.1 SimAM
class SimAM(nn.Layer):
    def __init__(self, lamda=1e-5):
        super().__init__()
        self.lamda = lamda
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w - 1
        # Per-channel mean and variance over the spatial dimensions
        mean = paddle.mean(x, axis=[-2, -1], keepdim=True)
        var = paddle.sum(paddle.pow((x - mean), 2), axis=[-2, -1], keepdim=True) / n
        # Neuron importance 1 / e_t*: expanding the closed-form minimal
        # energy gives (x - mean)^2 / (4 * (var + lambda)) + 0.5
        e_t = paddle.pow((x - mean), 2) / (4 * (var + self.lamda)) + 0.5
        # Scale the features by the sigmoid of the importance
        out = self.sigmoid(e_t) * x
        return out
model = SimAM()
paddle.summary(model, (1, 64, 224, 224))
---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
===========================================================================
  Sigmoid-28    [[1, 64, 224, 224]]   [1, 64, 224, 224]          0       
===========================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
---------------------------------------------------------------------------
Input size (MB): 12.25
Forward/backward pass size (MB): 24.50
Params size (MB): 0.00
Estimated Total Size (MB): 36.75
---------------------------------------------------------------------------

{'total_params': 0, 'trainable_params': 0}
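A quick numpy check (my own sketch, not from the original post) that the expression in `SimAM.forward` really is the importance $1/e_t^*$ from the closed-form energy; it holds for any consistently computed variance (the Paddle code divides the spatial sum of squares by $hw-1$, while `x.var()` below divides by $hw$):

```python
import numpy as np

# Sanity check (my own sketch): the expression used in SimAM.forward is
# exactly the neuron importance 1 / e_t* from the closed-form energy,
# since 1/e_t* = (x - mu)^2 / (4 * (var + lambda)) + 0.5.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))          # one channel's activations
lam = 1e-5
mu = x.mean()
var = x.var()

importance = (x - mu) ** 2 / (4 * (var + lam)) + 0.5      # as in forward()
e_star = 4 * (var + lam) / ((x - mu) ** 2 + 2 * var + 2 * lam)
assert np.allclose(importance, 1.0 / e_star)
```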
2.4.2 AlexNet-SimAM
class AlexNet_SimAM(nn.Layer):
    def __init__(self,num_classes=10):
        super().__init__()
        self.features=nn.Sequential(
            nn.Conv2D(3,48, kernel_size=11, stride=4, padding=11//2),
            SimAM(),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3,stride=2),
            nn.Conv2D(48,128, kernel_size=5, padding=2),
            SimAM(),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3,stride=2),
            nn.Conv2D(128, 192,kernel_size=3,stride=1,padding=1),
            SimAM(),
            nn.ReLU(),
            nn.Conv2D(192,192,kernel_size=3,stride=1,padding=1),
            SimAM(),
            nn.ReLU(),
            nn.Conv2D(192,128,kernel_size=3,stride=1,padding=1),
            SimAM(),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3,stride=2),
        )
        self.classifier=nn.Sequential(
            nn.Linear(3 * 3 * 128,2048),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(2048,2048),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(2048,num_classes),
        )
 
 
    def forward(self,x):
        x = self.features(x)
        x = paddle.flatten(x, 1)
        x=self.classifier(x)
 
        return x
model = AlexNet_SimAM(num_classes=10)
paddle.summary(model, (1, 3, 128, 128))


2.5 Training

learning_rate = 0.0005
n_epochs = 50
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model'

model = AlexNet_SimAM(num_classes=10)

criterion = LabelSmoothingCrossEntropy()

scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=scheduler, weight_decay=1e-5)

gate = 0.0
threshold = 0.0
best_acc = 0.0
val_acc = 0.0
loss_record = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}}   # for recording loss
acc_record = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}}      # for recording accuracy

loss_iter = 0
acc_iter = 0

for epoch in range(n_epochs):
    # ---------- Training ----------
    model.train()
    train_num = 0.0
    train_loss = 0.0

    val_num = 0.0
    val_loss = 0.0
    accuracy_manager = paddle.metric.Accuracy()
    val_accuracy_manager = paddle.metric.Accuracy()
    print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
    for batch_id, data in enumerate(train_loader):
        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)

        logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        accuracy_manager.update(acc)
        if batch_id % 10 == 0:
            loss_record['train']['loss'].append(loss.numpy())
            loss_record['train']['iter'].append(loss_iter)
            loss_iter += 1

        loss.backward()

        optimizer.step()
        scheduler.step()
        optimizer.clear_grad()
        
        train_loss += loss
        train_num += len(y_data)

    total_train_loss = (train_loss / train_num) * batch_size
    train_acc = accuracy_manager.accumulate()
    acc_record['train']['acc'].append(train_acc)
    acc_record['train']['iter'].append(acc_iter)
    acc_iter += 1
    # Print the information.
    print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))

    # ---------- Validation ----------
    model.eval()

    for batch_id, data in enumerate(val_loader):

        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)
        with paddle.no_grad():
          logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        val_accuracy_manager.update(acc)

        val_loss += loss
        val_num += len(y_data)

    total_val_loss = (val_loss / val_num) * batch_size
    loss_record['val']['loss'].append(total_val_loss.numpy())
    loss_record['val']['iter'].append(loss_iter)
    val_acc = val_accuracy_manager.accumulate()
    acc_record['val']['acc'].append(val_acc)
    acc_record['val']['iter'].append(acc_iter)
    
    print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))

    # ===================save====================
    if val_acc > best_acc:
        best_acc = val_acc
        paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
        paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))

print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))
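For reference, the CosineAnnealingDecay scheduler configured above follows the standard cosine curve; a minimal sketch, assuming Paddle's default `eta_min` of 0:

```python
import math

# Sketch of the cosine annealing curve (assuming eta_min = 0, Paddle's
# default): lr(step) = 0.5 * base_lr * (1 + cos(pi * step / T_max)).
base_lr = 0.0005
T_max = 50000 // 128 * 50            # steps per epoch * number of epochs

def cosine_lr(step):
    return 0.5 * base_lr * (1 + math.cos(math.pi * step / T_max))

assert math.isclose(cosine_lr(0), base_lr)                 # starts at base_lr
assert math.isclose(cosine_lr(T_max), 0.0, abs_tol=1e-12)  # decays to ~0
assert math.isclose(cosine_lr(T_max / 2), base_lr / 2)     # halfway point
```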


2.6 Results

def plot_learning_curve(record, title='loss', ylabel='CE Loss'):
    ''' Plot learning curve of your CNN '''
    maxtrain = max(map(float, record['train'][title]))
    maxval = max(map(float, record['val'][title]))
    ymax = max(maxtrain, maxval) * 1.1
    mintrain = min(map(float, record['train'][title]))
    minval = min(map(float, record['val'][title]))
    ymin = min(mintrain, minval) * 0.9

    total_steps = len(record['train'][title])
    x_1 = list(map(int, record['train']['iter']))
    x_2 = list(map(int, record['val']['iter']))
    figure(figsize=(10, 6))
    plt.plot(x_1, record['train'][title], c='tab:red', label='train')
    plt.plot(x_2, record['val'][title], c='tab:cyan', label='val')
    plt.ylim(ymin, ymax)
    plt.xlabel('Training steps')
    plt.ylabel(ylabel)
    plt.title('Learning curve of {}'.format(title))
    plt.legend()
    plt.show()
plot_learning_curve(loss_record, title='loss', ylabel='CE Loss')


plot_learning_curve(acc_record, title='acc', ylabel='Accuracy')


import time
work_path = 'work/model'
model = AlexNet_SimAM(num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):

    x_data, y_data = data
    labels = paddle.unsqueeze(y_data, axis=1)
    with paddle.no_grad():
        logits = model(x_data)
bb = time.time()
print("Throughput: {}".format(int(len(val_dataset)//(bb - aa))))
Throughput: 938
def get_cifar10_labels(labels):  
    """Return the text labels of the CIFAR10 dataset."""
    text_labels = [
        'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog',
        'horse', 'ship', 'truck']
    return [text_labels[int(i)] for i in labels]
def show_images(imgs, num_rows, num_cols, pred=None, gt=None, scale=1.5):  
    """Plot a list of images."""
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if paddle.is_tensor(img):
            ax.imshow(img.numpy())
        else:
            ax.imshow(img)
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if pred or gt:
            ax.set_title("pt: " + pred[i] + "\ngt: " + gt[i])
    return axes
work_path = 'work/model'
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
model = AlexNet_SimAM(num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
logits = model(X)
y_pred = paddle.argmax(logits, -1)
X = paddle.transpose(X, [0, 2, 3, 1])
axes = show_images(X.reshape((18, 128, 128, 3)), 1, 18, pred=get_cifar10_labels(y_pred), gt=get_cifar10_labels(y))
plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).


3. AlexNet

3.1 AlexNet

class AlexNet(nn.Layer):
    def __init__(self,num_classes=10):
        super().__init__()
        self.features=nn.Sequential(
            nn.Conv2D(3,48, kernel_size=11, stride=4, padding=11//2),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3,stride=2),
            nn.Conv2D(48,128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3,stride=2),
            nn.Conv2D(128, 192,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.Conv2D(192,192,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.Conv2D(192,128,kernel_size=3,stride=1,padding=1),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3,stride=2),
        )
        self.classifier=nn.Sequential(
            nn.Linear(3 * 3 * 128,2048),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(2048,2048),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(2048,num_classes),
        )
 
 
    def forward(self,x):
        x = self.features(x)
        x = paddle.flatten(x, 1)
        x=self.classifier(x)
 
        return x
model = AlexNet(num_classes=10)
paddle.summary(model, (1, 3, 128, 128))


3.2 Training

learning_rate = 0.0005
n_epochs = 50
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model1'

model = AlexNet(num_classes=10)

criterion = LabelSmoothingCrossEntropy()

scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=scheduler, weight_decay=1e-5)

gate = 0.0
threshold = 0.0
best_acc = 0.0
val_acc = 0.0
loss_record1 = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}}   # for recording loss
acc_record1 = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}}      # for recording accuracy

loss_iter = 0
acc_iter = 0

for epoch in range(n_epochs):
    # ---------- Training ----------
    model.train()
    train_num = 0.0
    train_loss = 0.0

    val_num = 0.0
    val_loss = 0.0
    accuracy_manager = paddle.metric.Accuracy()
    val_accuracy_manager = paddle.metric.Accuracy()
    print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
    for batch_id, data in enumerate(train_loader):
        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)

        logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        accuracy_manager.update(acc)
        if batch_id % 10 == 0:
            loss_record1['train']['loss'].append(loss.numpy())
            loss_record1['train']['iter'].append(loss_iter)
            loss_iter += 1

        loss.backward()

        optimizer.step()
        scheduler.step()
        optimizer.clear_grad()
        
        train_loss += loss
        train_num += len(y_data)

    total_train_loss = (train_loss / train_num) * batch_size
    train_acc = accuracy_manager.accumulate()
    acc_record1['train']['acc'].append(train_acc)
    acc_record1['train']['iter'].append(acc_iter)
    acc_iter += 1
    # Print the information.
    print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc*100))

    # ---------- Validation ----------
    model.eval()

    for batch_id, data in enumerate(val_loader):

        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)
        with paddle.no_grad():
          logits = model(x_data)

        loss = criterion(logits, y_data)

        acc = paddle.metric.accuracy(logits, labels)
        val_accuracy_manager.update(acc)

        val_loss += loss
        val_num += len(y_data)

    total_val_loss = (val_loss / val_num) * batch_size
    loss_record1['val']['loss'].append(total_val_loss.numpy())
    loss_record1['val']['iter'].append(loss_iter)
    val_acc = val_accuracy_manager.accumulate()
    acc_record1['val']['acc'].append(val_acc)
    acc_record1['val']['iter'].append(acc_iter)
    
    print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc*100))

    # ===================save====================
    if val_acc > best_acc:
        best_acc = val_acc
        paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
        paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))

print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))


3.3 Results

plot_learning_curve(loss_record1, title='loss', ylabel='CE Loss')


plot_learning_curve(acc_record1, title='acc', ylabel='Accuracy')


import time
work_path = 'work/model1'
model = AlexNet(num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):

    x_data, y_data = data
    labels = paddle.unsqueeze(y_data, axis=1)
    with paddle.no_grad():
        logits = model(x_data)
bb = time.time()
print("Throughput: {}".format(int(len(val_dataset)//(bb - aa))))
Throughput: 986
work_path = 'work/model1'
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
model = AlexNet(num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
logits = model(X)
y_pred = paddle.argmax(logits, -1)
X = paddle.transpose(X, [0, 2, 3, 1])
axes = show_images(X.reshape((18, 128, 128, 3)), 1, 18, pred=get_cifar10_labels(y_pred), gt=get_cifar10_labels(y))
plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).


4. Comparison of Results

| Model | Train Acc | Val Acc | Parameters |
| --- | --- | --- | --- |
| AlexNet w/o SimAM | 0.8422 | 0.84286 | 7,524,042 |
| AlexNet w/ SimAM | 0.8556 | 0.84355 | 7,524,042 |

Summary

Based on neuroscience theory and the principle of linear separability, this article presents an attention mechanism with no trainable parameters, which improves AlexNet's validation accuracy on CIFAR10 by 0.069 percentage points (0.84355 vs. 0.84286).

Disclaimer

This project is a repost.
Original project link
