EfficientNet: A Concrete PyTorch Implementation

For the underlying principles, see: EfficientNet: Compound Scaling of Model Depth, Width, and Resolution (CSDN blog).
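
The code below is shown in excerpts; the original post does not list its imports, so the following minimal set (an assumption that matches the calls used throughout) is taken as given:

import math

import torch
import torch.nn as nn
import torch.optim as optim
from torch.amp import autocast, GradScaler
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from tqdm import tqdm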

I. Activation Function

EfficientNet uses the Swish activation function instead of the more common ReLU.

1. Definition

  • Swish(x) = x * sigmoid(x)
  • A smooth, non-linear activation function

2. Advantages

  • Compared with ReLU:
    • Smooth and differentiable everywhere
    • Lets small negative values pass instead of zeroing them out
    • Avoids the "dying ReLU" problem of neurons that stop updating
  • Compared with Sigmoid:
    • No upper bound on the output
    • Does not saturate for large positive inputs
    • More stable gradients in deep networks

3. Implementation

class Swish(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)  # self-gating: the input is gated by its own sigmoid

4. Performance Characteristics

  • Works better in deep networks
  • Makes training more stable
  • Helps the model converge
  • Well suited to large networks such as EfficientNet

5. Adaptive Behavior

  • For strongly negative x the output approaches 0 (with a small negative dip just below zero)
  • For large positive x the output is approximately linear (≈ x)
  • The gate thus regulates the information flow dynamically (see the quick numeric check below)
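
As a quick numeric check (the values follow directly from x * sigmoid(x)):

x = torch.tensor([-5.0, 0.0, 5.0])
print(x * torch.sigmoid(x))  # tensor([-0.0335, 0.0000, 4.9665])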

II. Adaptive Feature Recalibration: the SE Module

1. Purpose

  • Adaptively learn the importance of each feature channel
  • Strengthen useful features and suppress unimportant ones
  • Improve how efficiently the model uses its features

2. Core Steps

  • Squeeze: compress the spatial information into a per-channel descriptor
  • Excitation: learn the interdependencies between channels
  • Scale: recalibrate the original features with the learned weights

3. Implementation

class SEModule(nn.Module):
    def __init__(self, channels, se_ratio):
        super(SEModule, self).__init__()
        # Squeeze: compress the spatial dimensions to 1x1, [B,C,H,W] -> [B,C,1,1]
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Two fully connected layers (implemented as 1x1 convolutions):
        # reduce the channel dimension by se_ratio, then restore it
        self.fc1 = nn.Conv2d(channels, int(channels * se_ratio), kernel_size=1, bias=False)
        # Sparsify the bottleneck features: negative values are zeroed out
        self.relu = nn.ReLU(inplace=True)
        self.fc2 = nn.Conv2d(int(channels * se_ratio), channels, kernel_size=1, bias=False)
        # Recalibration: squash each channel's importance score into the range 0~1
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # out holds the per-channel importance scores
        out = self.avg_pool(x)
        out = self.fc1(out)
        out = self.relu(out)  # max(0,x)
        out = self.fc2(out)
        out = self.sigmoid(out)  # 1/(1+e^(-x))
        return x * out
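
A quick shape check (a minimal usage sketch; the channel count and input size are arbitrary examples):

se = SEModule(channels=64, se_ratio=0.25)
x = torch.randn(2, 64, 32, 32)
print(se(x).shape)  # torch.Size([2, 64, 32, 32]): same shape, each channel rescaled by a weight in (0, 1)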

III. MBConvBlock

1. Implementation

class MBConvBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, expand_ratio, se_ratio, drop_rate):
        super(MBConvBlock, self).__init__()
        self.expand_ratio = expand_ratio
        self.stride = stride
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.swish = Swish()

        # Expansion phase: a 1x1 convolution widens the channels by expand_ratio
        self.expanded_channels = int(in_channels * expand_ratio)
        if expand_ratio > 1:
            self.expand_conv = nn.Conv2d(in_channels, self.expanded_channels, kernel_size=1, bias=False)
            self.expand_bn = nn.BatchNorm2d(self.expanded_channels)

        # Depthwise convolution phase:
        # each channel is convolved spatially on its own (groups == channels)
        self.depthwise_conv = nn.Conv2d(self.expanded_channels, self.expanded_channels, kernel_size=kernel_size,
                                        stride=stride, groups=self.expanded_channels, bias=False, padding=kernel_size//2)
        self.depthwise_bn = nn.BatchNorm2d(self.expanded_channels)

        # SE attention module
        self.se = SEModule(self.expanded_channels, se_ratio)

        # Projection (output) phase: a 1x1 convolution maps back to out_channels
        self.project_conv = nn.Conv2d(self.expanded_channels, out_channels, kernel_size=1, bias=False)
        self.project_bn = nn.BatchNorm2d(out_channels)
        self.dropout = nn.Dropout(drop_rate)

    def forward(self, x):
        out = x
        if self.expand_ratio > 1:
            out = self.expand_conv(out)
            out = self.expand_bn(out)
            out = self.swish(out)

        out = self.depthwise_conv(out)
        out = self.depthwise_bn(out)
        out = self.swish(out)

        out = self.se(out)

        out = self.project_conv(out)
        out = self.project_bn(out)
        out = self.dropout(out)

        if self.in_channels == self.out_channels and self.stride == 1:
            out = out + x
        return out

2. Convolution Layer Breakdown

2.1 Expansion phase (when expand_ratio > 1)

  • 1x1 convolution: self.expand_conv

2.2 Depthwise convolution phase

  • KxK depthwise convolution: self.depthwise_conv

2.3 Convolutions inside the SE module

  • 1x1 reduction convolution: self.fc1
  • 1x1 expansion convolution: self.fc2

2.4 Projection phase

  • 1x1 projection convolution: self.project_conv

2.5 Summary

  • When expand_ratio > 1: 5 convolution layers in total
    • 1 expansion convolution
    • 1 depthwise convolution
    • 2 SE-module convolutions
    • 1 projection convolution
  • When expand_ratio = 1: 4 convolution layers in total
    • No expansion convolution
    • All other layers unchanged (both cases are verified in the small check below)
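
Both cases can be confirmed by counting the nn.Conv2d modules in a block (a small check with arbitrary example sizes; drop_rate is set to 0 for simplicity):

block = MBConvBlock(in_channels=24, out_channels=24, kernel_size=3, stride=1,
                    expand_ratio=6, se_ratio=0.25, drop_rate=0.0)
print(sum(1 for m in block.modules() if isinstance(m, nn.Conv2d)))  # 5
print(block(torch.randn(1, 24, 32, 32)).shape)  # torch.Size([1, 24, 32, 32]), residual connection active

block = MBConvBlock(24, 24, kernel_size=3, stride=1, expand_ratio=1, se_ratio=0.25, drop_rate=0.0)
print(sum(1 for m in block.modules() if isinstance(m, nn.Conv2d)))  # 4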

IV. Width and Depth Scaling

1. Depth Scaling

Rounding up ensures that a scaled stage keeps at least its original number of blocks (a short numeric check follows the code).

def _round_repeats(self, repeats, depth_multiplier):
    """Compute how many times a stage's block is repeated"""
    return int(math.ceil(depth_multiplier * repeats))
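
For example (plain arithmetic, using the depth multipliers of B4 and B1 as illustrations):

import math
print(int(math.ceil(1.8 * 3)))  # 6: a stage with 3 blocks in B0 gets 6 blocks at depth_multiplier=1.8
print(int(math.ceil(1.1 * 1)))  # 2: even a single-block stage grows once the multiplier exceeds 1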

2. Width Scaling

Channel counts are rounded to multiples of 8, which is friendlier to hardware.

def _round_filters(self, filters, width_multiplier):
    """Compute a convolution layer's output channel count"""
    filters *= width_multiplier
    new_filters = max(8, int(filters + 4) // 8 * 8)
    # Make sure rounding down does not remove more than 10% of the channels
    if new_filters < 0.9 * filters:
        new_filters += 8
    return int(new_filters)
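
For example, with width_multiplier = 1.1 (B2's width factor) the rounding works out as follows:

filters = 112 * 1.1                              # 123.2
new_filters = max(8, int(filters + 4) // 8 * 8)
print(new_filters)                               # 120: the nearest multiple of 8 below, still within 10% of 123.2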

V. The Full Model

1. Implementation

class EfficientNet(nn.Module):
    def __init__(self, width_multiplier=1.0, depth_multiplier=1.0, dropout=0.2, num_classes=10):
        super(EfficientNet, self).__init__()
        self.num_classes = num_classes

        # Base (B0) configuration: one row per stage
        settings = [
            # expand_ratio, channels, repeats, stride, kernel_size
            [1, 16, 1, 1, 3],
            [6, 24, 2, 2, 3],
            [6, 40, 2, 2, 5],
            [6, 80, 3, 2, 3],
            [6, 112, 3, 1, 5],
            [6, 192, 4, 2, 5],
            [6, 320, 1, 1, 3]
        ]
        stem_channels = self._round_filters(32, width_multiplier)
        # Stem: initial 3x3 convolution with stride 2
        self.stem = nn.Sequential(
            nn.Conv2d(3, stem_channels, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(stem_channels),
            Swish()
        )

        # Build the backbone of inverted residual (MBConv) blocks
        self.blocks = nn.Sequential()
        input_channels = stem_channels
        total_blocks = sum(setting[2] for setting in settings)
        block_idx = 0

        for idx, (expand_ratio, channels, repeats, stride, kernel_size) in enumerate(settings):
            output_channels = self._round_filters(channels, width_multiplier)
            repeats = self._round_repeats(repeats, depth_multiplier)

            for i in range(repeats):
                drop_rate = dropout * block_idx / total_blocks
                self.blocks.add_module(f'block_{block_idx}',
                                       MBConvBlock(
                                           in_channels=input_channels,
                                           out_channels=output_channels,
                                           kernel_size=kernel_size,
                                           stride=stride if i == 0 else 1,
                                           expand_ratio=expand_ratio,
                                           se_ratio=0.25,
                                           drop_rate=drop_rate
                                       ))
                input_channels = output_channels
                block_idx += 1

        # Head
        head_channels = self._round_filters(1280, width_multiplier)
        self.head = nn.Sequential(
            nn.Conv2d(input_channels, head_channels, 1, bias=False),
            nn.BatchNorm2d(head_channels),
            Swish()
        )

        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(head_channels, num_classes)
        )

    def _round_filters(self, filters, width_multiplier):
        """Compute a convolution layer's output channel count after width scaling"""
        filters *= width_multiplier
        new_filters = max(8, int(filters + 4) // 8 * 8)
        # Make sure rounding down does not remove more than 10% of the channels
        if new_filters < 0.9 * filters:
            new_filters += 8
        return int(new_filters)

    def _round_repeats(self, repeats, depth_multiplier):
        """Compute how many times a stage's block is repeated after depth scaling"""
        return int(math.ceil(depth_multiplier * repeats))

    def forward(self, x):
        x = self.stem(x)
        x = self.blocks(x)
        x = self.head(x)
        x = self.avgpool(x)
        x = x.flatten(1)
        x = self.classifier(x)
        return x
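
A quick smoke test (a minimal sketch; 224x224 is the B0 input resolution):

model = EfficientNet(width_multiplier=1.0, depth_multiplier=1.0, dropout=0.2, num_classes=10)
model.eval()
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 10])
print(sum(p.numel() for p in model.parameters()), 'parameters')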

2. Convolution Layer Count

# Base (B0) configuration
settings = [
    # expand_ratio, channels, repeats, stride, kernel_size
    [1, 16, 1, 1, 3],
    [6, 24, 2, 2, 3],
    [6, 40, 2, 2, 5],
    [6, 80, 3, 2, 3],
    [6, 112, 3, 1, 5],
    [6, 192, 4, 2, 5],
    [6, 320, 1, 1, 3]
]

These are the stage settings of the B0 model's MBConvBlocks. Under this configuration, the first stage has an expansion ratio of 1, so its block has no expansion convolution and contains only 4 convolution layers; from the second stage onward every MBConvBlock contains 5 convolution layers. The blocks of the second and third stages are repeated 2 times each, those of the fourth and fifth stages 3 times each, the sixth stage's block 4 times, and the seventh stage's block once. Adding the stem convolution at the start and the head convolution at the end, the network passes through 81 convolution layers in total.
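
The figure of 81 can be verified by walking the module tree of the B0 model (assuming the EfficientNet class above):

model = EfficientNet(width_multiplier=1.0, depth_multiplier=1.0)
print(sum(1 for m in model.modules() if isinstance(m, nn.Conv2d)))  # 81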

VI. Configurations of the Different Model Variants

def efficient_config(model_name):
    """EfficientNet 配置"""
    params_dict = {
        # (width, depth, resolution, dropout)
        'efficientnet-b0': (1.0, 1.0, 224, 0.2),
        'efficientnet-b1': (1.0, 1.1, 240, 0.2),
        'efficientnet-b2': (1.1, 1.2, 260, 0.3),
        'efficientnet-b3': (1.2, 1.4, 300, 0.3),
        'efficientnet-b4': (1.4, 1.8, 380, 0.4),
        'efficientnet-b5': (1.6, 2.2, 456, 0.4),
        'efficientnet-b6': (1.8, 2.6, 528, 0.5),
        'efficientnet-b7': (2.0, 3.1, 600, 0.5)
    }
    return params_dict[model_name]

def build_efficientnet(model_name, num_classes=10):
    config = efficient_config(model_name)
    return EfficientNet(
        width_multiplier=config[0],
        depth_multiplier=config[1],
        dropout=config[3],
        num_classes=num_classes
    )

 

With these multipliers, the per-stage output channels and block repeats produced by _round_filters and _round_repeats for B1 through B7 are:

B1 (1.0, 1.1):
- channels unchanged: [16, 24, 40, 80, 112, 192, 320]
- repeats: [2, 3, 3, 4, 4, 5, 2]

B2 (1.1, 1.2):
- channels: [16, 24, 48, 88, 120, 208, 352]
- repeats: [2, 3, 3, 4, 4, 5, 2]

B3 (1.2, 1.4):
- channels: [24, 32, 48, 96, 136, 232, 384]
- repeats: [2, 3, 3, 5, 5, 6, 2]

B4 (1.4, 1.8):
- channels: [24, 32, 56, 112, 160, 272, 448]
- repeats: [2, 4, 4, 6, 6, 8, 2]

B5 (1.6, 2.2):
- channels: [24, 40, 64, 128, 176, 304, 512]
- repeats: [3, 5, 5, 7, 7, 9, 3]

B6 (1.8, 2.6):
- channels: [32, 40, 72, 144, 200, 344, 576]
- repeats: [3, 6, 6, 8, 8, 11, 3]

B7 (2.0, 3.1):
- channels: [32, 48, 80, 160, 224, 384, 640]
- repeats: [4, 7, 7, 10, 10, 13, 4]
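
These values can be reproduced with a small standalone script that mirrors the _round_filters / _round_repeats logic and the multipliers from efficient_config above:

import math

def round_filters(filters, width_multiplier):
    filters *= width_multiplier
    new_filters = max(8, int(filters + 4) // 8 * 8)
    if new_filters < 0.9 * filters:
        new_filters += 8
    return int(new_filters)

def round_repeats(repeats, depth_multiplier):
    return int(math.ceil(depth_multiplier * repeats))

base_channels = [16, 24, 40, 80, 112, 192, 320]
base_repeats = [1, 2, 2, 3, 3, 4, 1]
multipliers = {  # (width, depth)
    'b1': (1.0, 1.1), 'b2': (1.1, 1.2), 'b3': (1.2, 1.4), 'b4': (1.4, 1.8),
    'b5': (1.6, 2.2), 'b6': (1.8, 2.6), 'b7': (2.0, 3.1),
}
for name, (w, d) in multipliers.items():
    print(name,
          [round_filters(c, w) for c in base_channels],
          [round_repeats(r, d) for r in base_repeats])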

Counting convolutions the same way as for B0 (4 per block in the first stage, 5 per block in the other stages, plus the stem and head convolutions), each variant passes through:

- B1: 115 convolution layers
- B2: 115 convolution layers
- B3: 130 convolution layers
- B4: 160 convolution layers
- B5: 194 convolution layers
- B6: 224 convolution layers
- B7: 273 convolution layers
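
As a cross-check, the same counts can be read off the instantiated models (assuming build_efficientnet from above; building the larger variants just to count layers is slow but harmless):

for i in range(1, 8):
    name = f'efficientnet-b{i}'
    model = build_efficientnet(name)
    print(name, sum(1 for m in model.modules() if isinstance(m, nn.Conv2d)))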

VII. Data Processing

The CIFAR-10 dataset is used:

# Data augmentation
def create_transforms(image_size):
    """Create the train/test transforms for the model's input resolution"""
    transform_train = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomRotation(15),
        transforms.RandomCrop(image_size, padding=4),
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.2010))
    ])

    transform_test = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.2010))
    ])

    return transform_train, transform_test


# Load the dataset
def load_data(data_path, model_name, num_gpus=4):
    config = efficient_config(model_name)
    image_size = config[2]
    transform_train, transform_test = create_transforms(image_size)
    train_dataset = datasets.CIFAR10(root=data_path, train=True, download=False, transform=transform_train)
    test_dataset = datasets.CIFAR10(root=data_path, train=False, download=False, transform=transform_test)

    batch_size = 32 * num_gpus

    train_loader = DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=4 * num_gpus,  # 4 worker processes per GPU
        pin_memory=True  # speeds up host-to-GPU transfer
    )

    test_loader = DataLoader(
        test_dataset,
        batch_size=512,
        shuffle=False,
        num_workers=4 * num_gpus,
        pin_memory=True
    )
    return train_loader, test_loader
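
A minimal usage sketch (note that download=False above assumes CIFAR-10 is already present under data_path; set it to True for a first run, and treat the path here as an example):

train_loader, test_loader = load_data('./data', 'efficientnet-b0', num_gpus=1)
images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([32, 3, 224, 224]) for B0 with num_gpus=1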

VIII. Training and Testing

# Training and evaluation loops
def train(model, device, train_loader, optimizer, epoch, criterion, scaler):
    model.train()
    total_loss = 0
    total_samples = 0

    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()

        with autocast('cuda'):
            output = model(data)
            loss = criterion(output, target)

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        # Accumulate the loss weighted by the actual batch size
        total_loss += loss.item() * data.size(0)
        total_samples += data.size(0)

        if batch_idx % 100 == 0:
            current_avg_loss = total_loss / total_samples
            print(
                f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} '
                f'({100. * batch_idx / len(train_loader):.0f}%)]\tAVG_Loss: {current_avg_loss:.6f}'
            )

def test(model, device, test_loader, criterion):
    model.eval()
    test_loss = 0
    correct = 0
    total = len(test_loader.dataset)
    seen = 0  # samples processed so far, used for the running averages

    with torch.no_grad():
        pbar = tqdm(test_loader, desc='Testing')
        for data, target in pbar:
            data, target = data.to(device), target.to(device)
            output = model(data)

            # Per-batch loss, accumulated with the actual batch size
            loss = criterion(output, target)
            test_loss += loss.item() * data.size(0)
            seen += data.size(0)

            # Accuracy
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

            # Update the progress bar with running averages over the samples seen so far
            acc = 100. * correct / seen
            pbar.set_postfix({'Loss': f'{test_loss / seen:.4f}', 'Acc': f'{acc:.2f}%'})

    avg_loss = test_loss / total
    accuracy = 100. * correct / total

    print(f'\nTest set: Average loss: {avg_loss:.4f}, '
          f'Accuracy: {correct}/{total} ({accuracy:.2f}%)\n')

    return accuracy

IX. Model Training Configuration

if __name__ == '__main__':
    data_path = r'your_data_path'
    model_save_path = 'your_model_save_path'
    model_name = 'efficientnet-b3'  # which variant to train
    # Multi-GPU setup
    devices = [3, 4, 6]
    device = torch.device(f"cuda:{devices[0]}")  # primary GPU

    epoch = 40
    num_classes = 10
    print('Building model...')
    model = build_efficientnet(model_name=model_name, num_classes=num_classes)
    # Wrap the model with DataParallel
    model = nn.DataParallel(model, device_ids=devices)
    model.to(device)

    print('==> Preparing data...')
    train_loader, test_loader = load_data(data_path, model_name, num_gpus=len(devices))

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
    scaler = GradScaler()  # scales the loss for mixed-precision training
    print('Start training...')
    best_acc = 0
    best_epoch = 0
    for epoch in range(1, epoch + 1):
        train(model, device, train_loader, optimizer, epoch, criterion, scaler)
        acc = test(model, device, test_loader, criterion)
        if acc > best_acc:
            best_acc = acc
            best_epoch = epoch
            torch.save({
                'epoch': epoch,
                'model_state_dict': model.module.state_dict(),  # use .module to unwrap DataParallel and save the underlying model
                'accuracy': acc,
            }, model_save_path)
    print('Finished Training')
    print('Start evaluating...')
    print('train_data:')
    test(model, device, train_loader, criterion)
    print(f'Best Epoch: {best_epoch}   Best Accuracy: {best_acc:.2f}%')
    print('Finished!')

X. Training Results

1. Experimental Environment

PyTorch: 2.4.0
Python: 3.12.7
CUDA: 12.0
CPU: AMD EPYC 7402 24-Core Processor
GPU: A30 (24 GB)
OS: Linux

2. Training Configuration

optimizer = optim.Adam(model.parameters(), lr=1e-5, weight_decay=0.001)
criterion = nn.CrossEntropyLoss()
epoch=40
num_classes = 10

3. Results

3.1 B0 model:

3.2 B1 model:

3.3 B2 model:

3.4 B3 model:

3.5 B4 model:

Because the CIFAR-10 dataset is quite small, there is little performance difference between these models; if anything, the more complex variants performed slightly worse.
