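Deep Learning, Week J6: A Hands-on Analysis of ResNeXt-50

Article link: https://blog.csdn.net/weixin_43397208/article/details/131988767

🍨 This article is an internal, limited-time free article of the [🔗365-Day Deep Learning Training Camp] (copyright belongs to *K同学啊*)
🍖 Author: [K同学啊]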

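Preface

ResNeXt is an image classification network proposed by Kaiming He's team at CVPR 2017. In the paper "Aggregated Residual Transformations for Deep Neural Networks", the authors start from a question that was common at the time: how can a model's accuracy be improved?

The usual answer was to make the network deeper or wider. However, simply increasing depth or width makes the architecture harder to design and raises the computational cost. The team therefore introduced cardinality, the number of parallel transformation paths inside a block, as a further dimension besides depth and width. The paper reports that, when adding capacity, increasing cardinality is more effective than going deeper or wider. In practice, cardinality is implemented by splitting the convolution channels into groups and convolving each group separately.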
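In the paper's formulation, a ResNeXt block sums C transformations that share the same topology and adds the shortcut, where C is the cardinality. Restated here in LaTeX (the notation follows the paper; it does not appear in the original post):

```latex
\mathcal{F}(\mathbf{x}) = \sum_{i=1}^{C} \mathcal{T}_i(\mathbf{x}),
\qquad
\mathbf{y} = \mathbf{x} + \sum_{i=1}^{C} \mathcal{T}_i(\mathbf{x})
```

For ResNeXt-50 (32x4d), C = 32 and each T_i is a 1×1 → 3×3 → 1×1 path of width 4. The paper shows this is equivalent to a single bottleneck whose 3×3 convolution is a grouped convolution with 32 groups, which is the form the code in this post builds on.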

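Comparing ResNet and ResNeXt

ResNeXt uses grouped convolution: the feature maps are split into groups along the channel dimension and each group is convolved separately, so in a grouped convolution every kernel processes only a subset of the input channels. Structurally, a ResNeXt block keeps the ResNet bottleneck (1×1, 3×3 and 1×1 convolutions plus a shortcut) and turns its 3×3 convolution into a grouped convolution.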
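A quick way to see both what "grouped" means and how many weights it saves is to compare a dense and a grouped 3×3 convolution in PyTorch. This is an illustrative sketch: 256 channels and 32 groups are chosen to match the 3×3 convolution in ResNeXt-50's second stage, and the manual split-and-concatenate is only there to make the grouping explicit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
in_ch, out_ch, groups = 256, 256, 32

# Ordinary 3x3 conv: every kernel sees all 256 input channels
dense = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
# Grouped 3x3 conv: 32 groups, each kernel sees only 256 / 32 = 8 input channels
grouped = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, groups=groups, bias=False)

dense_params = sum(p.numel() for p in dense.parameters())
grouped_params = sum(p.numel() for p in grouped.parameters())
print(dense_params, grouped_params)  # 589824 18432

# The grouped conv equals: split the input channels into 32 chunks, convolve chunk g
# with the g-th slice of the weights, then concatenate the outputs along channels
x = torch.randn(1, in_ch, 14, 14)
with torch.no_grad():
    out_grouped = grouped(x)
    manual = torch.cat(
        [F.conv2d(xg, wg, padding=1)
         for xg, wg in zip(x.chunk(groups, dim=1), grouped.weight.chunk(groups, dim=0))],
        dim=1,
    )
print(out_grouped.shape, torch.allclose(out_grouped, manual, atol=1e-5))
```

Each kernel has in_ch / groups input channels, so the weight count drops by exactly the number of groups (589,824 / 18,432 = 32).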

1. Grouped convolution

We start from ResNet-50 and modify it, first writing a Bottleneck module that supports grouped convolution:

import torch
import torch.nn as nn


class Bottleneck(nn.Module):  # residual block of ResNet-50/101/152; with groups > 1 it becomes the ResNeXt block
    expansion = 4  # the last 1x1 conv expands the channel count by this factor

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, groups=1, base_width=64):
        super(Bottleneck, self).__init__()
        # Width of the (grouped) 3x3 conv. It must be derived from out_channel, as in torchvision
        # ("planes"), not from in_channel: with groups=32, base_width=4 this gives 128/256/512/1024
        width = int(out_channel * (base_width / 64.0)) * groups
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=width, kernel_size=1, stride=1,
                               bias=False)  # 1x1 conv: reduce channels to `width`
        self.bn1 = nn.BatchNorm2d(num_features=width)
        self.conv2 = nn.Conv2d(in_channels=width, out_channels=width, groups=groups, kernel_size=3, stride=stride,
                               padding=1, bias=False)  # 3x3 conv; a grouped conv when groups > 1 (ResNeXt)
        self.bn2 = nn.BatchNorm2d(num_features=width)
        self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel * self.expansion, kernel_size=1, stride=1,
                               bias=False)  # 1x1 conv: expand to out_channel * expansion
        self.bn3 = nn.BatchNorm2d(num_features=out_channel * self.expansion)

        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample  # optional projection shortcut (1x1 conv + BN)

    def forward(self, x):
        identity = x  # shortcut branch
        if self.downsample is not None:  # shapes differ: project the shortcut
            identity = self.downsample(x)  # x --> x'

        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)

        x = self.conv3(x)
        x = self.bn3(x)

        x += identity  # F(x) + x (or x')
        x = self.relu(x)

        return x
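To see what groups and base_width do, the width formula can be evaluated stage by stage. A sketch using the torchvision-style formula (based on out_channel), comparing ResNet-50 (groups=1, base_width=64) with ResNeXt-50 32x4d (groups=32, base_width=4); bottleneck_width and conv3x3_params are hypothetical helpers written for this illustration:

```python
def bottleneck_width(out_channel, groups, base_width):
    """Channels of the 3x3 conv inside a bottleneck (torchvision-style formula)."""
    return int(out_channel * (base_width / 64.0)) * groups


def conv3x3_params(width, groups):
    """Weights of a bias-free 3x3 conv with `width` input and output channels."""
    return width * (width // groups) * 3 * 3


for out_channel in (64, 128, 256, 512):  # the four stages
    resnet_w = bottleneck_width(out_channel, groups=1, base_width=64)   # ResNet-50
    resnext_w = bottleneck_width(out_channel, groups=32, base_width=4)  # ResNeXt-50 32x4d
    print(out_channel, resnet_w, conv3x3_params(resnet_w, 1),
          resnext_w, conv3x3_params(resnext_w, 32))
# 64 64 36864 128 4608
# 128 128 147456 256 18432
# 256 256 589824 512 73728
# 512 512 2359296 1024 294912
```

The ResNeXt 3×3 convolutions are twice as wide yet carry 8× fewer weights. The 1×1 convolutions around them become wider, and the paper chooses these settings so that ResNeXt-50 (32x4d) has roughly the same parameter count and FLOPs as ResNet-50.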

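Here downsample is the projection shortcut. When a block's stride is not 1, or its input channel count differs from its output channel count, x cannot be added to F(x) directly, so a 1×1 convolution followed by BN projects it to the matching shape. This happens in the first block of every stage: in layer1 it only changes the channel count (64 to 256), while in layer2 to layer4 it also halves the spatial resolution. The relevant code in _make_layer is:

if stride != 1 or self.channel != channel * block.expansion:  # stride is not 1 or the channel count changes: project x
    downsample = nn.Sequential(
        nn.Conv2d(in_channels=self.channel, out_channels=channel * block.expansion, kernel_size=1,  # 1x1 conv
                  stride=stride, bias=False),
        nn.BatchNorm2d(num_features=channel * block.expansion)  # BN
    )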
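Walking this condition through the four stages shows which blocks actually get a projection. A sketch assuming the standard 50-layer configuration of [3, 4, 6, 3] blocks per stage; needs_projection is a hypothetical helper that copies the condition above:

```python
import torch
import torch.nn as nn

expansion = 4  # Bottleneck.expansion


def needs_projection(in_channels, channel, stride):
    # Same condition as in _make_layer
    return stride != 1 or in_channels != channel * expansion


in_channels = 64  # channels coming out of the stem
plan = []
for channel, blocks, stride in [(64, 3, 1), (128, 4, 2), (256, 6, 2), (512, 3, 2)]:
    for b in range(blocks):
        plan.append(needs_projection(in_channels, channel, stride if b == 0 else 1))
        in_channels = channel * expansion  # every block outputs channel * expansion
print([i for i, p in enumerate(plan) if p])  # [0, 3, 7, 13]: the first block of each stage

# Shape check for the first block of layer2: 256 x 56 x 56 -> 512 x 28 x 28
shortcut = nn.Sequential(
    nn.Conv2d(256, 512, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(512),
)
out = shortcut(torch.randn(1, 256, 56, 56))
print(out.shape)  # torch.Size([1, 512, 28, 28])
```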

2. The ResNeXt50_Model model

class ResNeXt50_Model(nn.Module):
    def __init__(self, in_channel=3, N_classes=1000):
        super(ResNeXt50_Model, self).__init__()
        self.in_channels = in_channel
        self.layers = [3, 4, 6, 3]  # bottleneck blocks per stage in the 50-layer design
        # ============= stem
        # padding=3 already gives 112x112 for a 224x224 input; an extra nn.ZeroPad2d(3)
        # in front would pad twice and enlarge every later feature map
        self.cov0 = nn.Conv2d(self.in_channels, out_channels=64, kernel_size=7, stride=2, padding=3,
                              bias=False)  # no bias: BatchNorm follows
        self.bn0 = nn.BatchNorm2d(num_features=64)
        self.relu0 = nn.ReLU(inplace=False)
        self.maxpool0 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.channel = 64      # input channels of the next block (updated by _make_layer)
        self.groups = 32       # cardinality C; groups=1 would give a plain ResNet-50
        self.base_width = 4    # channels per group, the "4d" in 32x4d

        self.layer1 = self._make_layer(Bottleneck, 64, self.layers[0])  # stage 1, 56x56
        self.layer2 = self._make_layer(Bottleneck, 128, self.layers[1], stride=2)  # stage 2, 28x28
        self.layer3 = self._make_layer(Bottleneck, 256, self.layers[2], stride=2)  # stage 3, 14x14
        self.layer4 = self._make_layer(Bottleneck, 512, self.layers[3], stride=2)  # stage 4, 7x7

        # output head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # global average pooling -> 2048 values per image
        # classification layer: returns raw logits, because nn.CrossEntropyLoss applies log-softmax itself
        self.fc = nn.Linear(2048, N_classes)

        for m in self.modules():  # walk over all submodules
            if isinstance(m, nn.Conv2d):  # Kaiming initialisation for every convolution
                nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")

    def basic_layer1(self, x):
        '''
        input:  x = tensor(3, 224, 224).unsqueeze(0)
         Layer (type)               Output Shape         Param #
      ================================================================
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
      ================================================================
        '''
        x = self.cov0(x)
        x = self.bn0(x)
        x = self.relu0(x)
        x = self.maxpool0(x)

        return x

    def _make_layer(self, block, channel, blocks, stride=1):  # build one stage of `blocks` bottlenecks
        downsample = None  # by default the shortcut is the identity

        if stride != 1 or self.channel != channel * block.expansion:  # stride is not 1 or channels change: project x
            downsample = nn.Sequential(
                nn.Conv2d(in_channels=self.channel, out_channels=channel * block.expansion, kernel_size=1,  # 1x1 conv
                          stride=stride, bias=False),
                nn.BatchNorm2d(num_features=channel * block.expansion)  # BN
            )
        layers = []  # blocks of this stage

        layers.append(block(self.channel, channel, downsample=downsample, stride=stride, groups=self.groups,
                            base_width=self.base_width))  # first block: may downsample and project
        self.channel = channel * block.expansion  # later blocks take the expanded channel count as input
        for _ in range(1, blocks):  # the other (blocks - 1) blocks use stride 1 and no projection
            layers.append(block(self.channel, channel, groups=self.groups, base_width=self.base_width))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.basic_layer1(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)  # logits

        return x
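nn.CrossEntropyLoss applies log-softmax internally, so the network should hand it raw logits; the original post's head (Linear followed by nn.Softmax) makes the loss apply softmax twice. A small sketch with made-up logits (not values from this experiment) shows the equivalence and the effect of the extra softmax:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical logits for 2 samples and 2 classes
logits = torch.tensor([[4.0, -2.0], [0.5, 1.5]])
labels = torch.tensor([0, 1])
loss_func = nn.CrossEntropyLoss()

# CrossEntropyLoss on raw logits == negative log-likelihood of log_softmax
ce = loss_func(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

# Feeding probabilities instead (Linear -> Softmax -> CrossEntropyLoss) applies softmax twice
double = loss_func(F.softmax(logits, dim=1), labels)

# With two classes the probabilities differ by at most 1, so the double-softmax loss
# can never fall below log(1 + e^-1), even for perfectly confident predictions
floor = math.log(1 + math.exp(-1))
print(f"logits: {ce.item():.4f}  manual: {manual.item():.4f}  "
      f"softmax first: {double.item():.4f}  floor: {floor:.4f}")
```

Probabilities are still available at inference time via torch.softmax(model(x), dim=1), and since softmax is monotonic the argmax (predicted class) is the same either way.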

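3. Testing the model

Training runs for up to epochs=100 with early stopping at early_stop=10: training stops once validation accuracy has not improved for 10 consecutive epochs, and the weights from the best epoch are restored at the end.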
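The stopping rule in train_and_test can be isolated as a small pure-Python function. A sketch with a hypothetical helper early_stopping_epoch and made-up accuracy values (not results from this experiment):

```python
def early_stopping_epoch(val_accs, patience):
    """Return (best_epoch, last_epoch_run) for a list of per-epoch validation accuracies.

    Mirrors train_and_test: an epoch counts as an improvement only if its accuracy is
    strictly greater than the best so far; training stops after `patience` epochs in a
    row without improvement.
    """
    best_acc, best_epoch, stop_steps = 0.0, 0, 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best_acc:
            best_acc, best_epoch, stop_steps = acc, epoch, 0
        else:
            stop_steps += 1
            if stop_steps >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_accs)


# Made-up accuracies: the best value is reached at epoch 3; the tie at epoch 6 does not count
print(early_stopping_epoch([0.60, 0.70, 0.72, 0.71, 0.70, 0.72, 0.69], patience=3))  # (3, 6)
```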

import torch
from torchvision import datasets, transforms
import torch.nn as nn
import time
import numpy as np
import matplotlib.pyplot as plt
import torch.nn.functional as F 
import torchsummary as summary
import copy
import os

data_dir = './J3-data'

def random_split_imagefolder(data_dir, transforms, random_split_rate=0.8):
    
    _total_data = datasets.ImageFolder(data_dir, transform=transforms)
    
    train_size = int(random_split_rate * len(_total_data))
    test_size = len(_total_data) - train_size
    
    _train_datasets, _test_datasets =  torch.utils.data.random_split(_total_data, [train_size, test_size])

    return _total_data, _train_datasets, _test_datasets

N_classes=2
batch_size = 32
mean = [0.4958, 0.4984, 0.4068]
std = [0.2093, 0.2026, 0.2170]
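# load the data again, normalised with the per-channel mean/std measured on this dataset
real_transforms = transforms.Compose(
        [
        transforms.Resize([224, 224]),  # resize to 224x224 (Resize rescales the image; it does not center-crop)
        transforms.ToTensor(),  # PIL image -> float tensor in [0, 1]
        transforms.Normalize(mean, std)
])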
total_data, train_datasets, test_datasets = random_split_imagefolder(data_dir, real_transforms, 0.8)

# batched data loaders
train_data = torch.utils.data.DataLoader(train_datasets, batch_size=batch_size, shuffle=True, num_workers=8)
test_data = torch.utils.data.DataLoader(test_datasets, batch_size=batch_size, shuffle=False, num_workers=8)  # no need to shuffle for evaluation

train_data_size = len(train_datasets)
test_data_size = len(test_datasets)

def train_and_test(model, loss_func, optimizer, epochs=100, early_stop=10):
    
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)
    summary.summary(model, (3, 224, 224))
    
    record = []
    best_acc = 0.0
    best_epoch = 0
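    stop_steps = 0
    best_param = copy.deepcopy(model.state_dict())  # fallback in case validation accuracy never improves
    for epoch in range(epochs):  # train for at most `epochs` epochs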
            epoch_start = time.time()
            print("Epoch: {}/{}".format(epoch + 1, epochs))
    
            model.train()  # training mode
    
            train_loss = 0.0
            train_acc = 0.0
            valid_loss = 0.0
            valid_acc = 0.0
    
            for i, (inputs, labels) in enumerate(train_data):
                inputs = inputs.to(device)
                labels = labels.to(device)
                # clear the gradients left over from the previous batch
                optimizer.zero_grad()
    
                outputs = model(inputs)
    
                loss = loss_func(outputs, labels)
    
                loss.backward()
    
                optimizer.step()
    
                train_loss += loss.item() * inputs.size(0)
                if i%10==0:
                    print("train data: {:01d} / {:03d} outputs: {}".format(i, len(train_data), outputs.data[0]))
                ret, predictions = torch.max(outputs.data, 1)
                correct_counts = predictions.eq(labels.data.view_as(predictions))
    
                acc = torch.mean(correct_counts.type(torch.FloatTensor))
    
                train_acc += acc.item() * inputs.size(0)

            with torch.no_grad():
                model.eval()  # evaluation mode: BatchNorm uses its running statistics
    
                for j, (inputs, labels) in enumerate(test_data):
                    inputs = inputs.to(device)
                    labels = labels.to(device)
    
                    outputs = model(inputs)
    
                    loss = loss_func(outputs, labels)
    
                    valid_loss += loss.item() * inputs.size(0)
                    if j%10==0:
                        print("val data: {:01d} / {:03d} outputs: {}".format(j, len(test_data), outputs.data[0]))
                    ret, predictions = torch.max(outputs.data, 1)
                    correct_counts = predictions.eq(labels.data.view_as(predictions))
    
                    acc = torch.mean(correct_counts.type(torch.FloatTensor))
    
                    valid_acc += acc.item() * inputs.size(0)
    
            avg_train_loss = train_loss / train_data_size
            avg_train_acc = train_acc / train_data_size
    
            avg_valid_loss = valid_loss / test_data_size
            avg_valid_acc = valid_acc / test_data_size
    
    
            record.append([avg_train_loss, avg_valid_loss, avg_train_acc, avg_valid_acc])
    
            if avg_valid_acc > best_acc:  # keep the weights of the most accurate model so far
                best_acc = avg_valid_acc
                stop_steps = 0
                best_epoch = epoch + 1
                best_param = copy.deepcopy(model.state_dict())
            else:
                stop_steps += 1
                if stop_steps >=  early_stop:
                    break
            
            epoch_end = time.time()
    
            print("Epoch: {:03d}, Training: Loss: {:.4f}, Accuracy: {:.4f}%, \n\t\tValidation: Loss: {:.4f}, Accuracy: {:.4f}%, Time: {:.4f}s".format(
                    epoch + 1, avg_train_loss, avg_train_acc * 100, avg_valid_loss, avg_valid_acc * 100,
                    epoch_end - epoch_start))
            print("Best Accuracy for validation : {:.4f} at epoch {:03d}".format(best_acc, best_epoch))    
    
    model.load_state_dict(best_param)
    
    return model, record
#%%
if __name__=='__main__':
    
    early_stop=10
    epochs = 100
    model = ResNeXt50_Model(3, N_classes)
    
    loss_func = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(),lr=0.0001)
    model, record = train_and_test(model, loss_func, optimizer, epochs, early_stop)

    torch.save(model.state_dict(), './Best_ResNeXt50.pth')

    record = np.array(record)
    plt.figure()  # start a new figure so the two plots never share axes
    plt.plot(record[:, 0:2])
    plt.legend(['Train Loss', 'Valid Loss'])
    plt.xlabel('Epoch Number')
    plt.ylabel('Loss')
    plt.ylim(0, 1.5)
    plt.savefig('Loss_J3_1.png')
    plt.show()

    plt.figure()  # start a new figure so the two plots never share axes
    plt.plot(record[:, 2:4])
    plt.legend(['Train Accuracy', 'Valid Accuracy'])
    plt.xlabel('Epoch Number')
    plt.ylabel('Accuracy')
    plt.ylim(0, 1)
    plt.savefig('Accuracy_J3_1.png')
    plt.show()
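The script normalises with hard-coded mean and std values that the original describes as measured on this dataset, but the post does not show how they were obtained. A minimal sketch of one way to compute them, assuming a loader that yields images after Resize and ToTensor but before Normalize; channel_mean_std is a hypothetical helper, and the demo uses random tensors rather than the real data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def channel_mean_std(loader):
    """Per-channel mean and population std over every pixel of every image."""
    n_pixels = 0
    channel_sum = None
    channel_sq_sum = None
    for images, _ in loader:  # images: [B, C, H, W], values in [0, 1]
        images = images.double()  # float64 accumulators limit round-off
        if channel_sum is None:
            channel_sum = torch.zeros(images.size(1), dtype=torch.float64)
            channel_sq_sum = torch.zeros(images.size(1), dtype=torch.float64)
        n_pixels += images.size(0) * images.size(2) * images.size(3)
        channel_sum += images.sum(dim=(0, 2, 3))
        channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
    mean = channel_sum / n_pixels
    std = (channel_sq_sum / n_pixels - mean ** 2).clamp(min=0).sqrt()  # sqrt(E[x^2] - E[x]^2)
    return mean, std


# Demo on random "images"; with the real data the loader would wrap
# datasets.ImageFolder(data_dir, transform=Resize + ToTensor) without Normalize
torch.manual_seed(0)
fake_images = torch.rand(20, 3, 8, 8)
fake_labels = torch.zeros(20, dtype=torch.long)
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=4)

est_mean, est_std = channel_mean_std(loader)
print(est_mean, est_std)
```

Accumulating sums rather than storing every image keeps memory constant, which matters once the whole dataset no longer fits in RAM.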

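4. Results

Model summary (the tail of the torchsummary output). This printout was produced by the code as originally run, i.e. with layers=[2, 3, 5, 2], groups=1, base_width=64, the Bottleneck width computed from in_channel, an extra ZeroPad2d(3) in the stem and a Softmax after the final Linear layer. That is why the 3×3 convolutions are dense, the feature maps are 58/29/15/8 pixels wide instead of 56/28/14/7, and the total reaches about 118M parameters, far more than the roughly 25M the paper reports for ResNeXt-50 (32x4d).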

Bottleneck-49 [-1, 512, 29, 29] 0
Conv2d-50 [-1, 512, 29, 29] 262,144
BatchNorm2d-51 [-1, 512, 29, 29] 1,024
ReLU-52 [-1, 512, 29, 29] 0
Conv2d-53 [-1, 512, 29, 29] 2,359,296
BatchNorm2d-54 [-1, 512, 29, 29] 1,024
ReLU-55 [-1, 512, 29, 29] 0
Conv2d-56 [-1, 512, 29, 29] 262,144
BatchNorm2d-57 [-1, 512, 29, 29] 1,024
ReLU-58 [-1, 512, 29, 29] 0
Bottleneck-59 [-1, 512, 29, 29] 0
Conv2d-60 [-1, 1024, 15, 15] 524,288
BatchNorm2d-61 [-1, 1024, 15, 15] 2,048
Conv2d-62 [-1, 512, 29, 29] 262,144
BatchNorm2d-63 [-1, 512, 29, 29] 1,024
ReLU-64 [-1, 512, 29, 29] 0
Conv2d-65 [-1, 512, 15, 15] 2,359,296
BatchNorm2d-66 [-1, 512, 15, 15] 1,024
ReLU-67 [-1, 512, 15, 15] 0
Conv2d-68 [-1, 1024, 15, 15] 524,288
BatchNorm2d-69 [-1, 1024, 15, 15] 2,048
ReLU-70 [-1, 1024, 15, 15] 0
Bottleneck-71 [-1, 1024, 15, 15] 0
Conv2d-72 [-1, 1024, 15, 15] 1,048,576
BatchNorm2d-73 [-1, 1024, 15, 15] 2,048
ReLU-74 [-1, 1024, 15, 15] 0
Conv2d-75 [-1, 1024, 15, 15] 9,437,184
BatchNorm2d-76 [-1, 1024, 15, 15] 2,048
ReLU-77 [-1, 1024, 15, 15] 0
Conv2d-78 [-1, 1024, 15, 15] 1,048,576
BatchNorm2d-79 [-1, 1024, 15, 15] 2,048
ReLU-80 [-1, 1024, 15, 15] 0
Bottleneck-81 [-1, 1024, 15, 15] 0
Conv2d-82 [-1, 1024, 15, 15] 1,048,576
BatchNorm2d-83 [-1, 1024, 15, 15] 2,048
ReLU-84 [-1, 1024, 15, 15] 0
Conv2d-85 [-1, 1024, 15, 15] 9,437,184
BatchNorm2d-86 [-1, 1024, 15, 15] 2,048
ReLU-87 [-1, 1024, 15, 15] 0
Conv2d-88 [-1, 1024, 15, 15] 1,048,576
BatchNorm2d-89 [-1, 1024, 15, 15] 2,048
ReLU-90 [-1, 1024, 15, 15] 0
Bottleneck-91 [-1, 1024, 15, 15] 0
Conv2d-92 [-1, 1024, 15, 15] 1,048,576
BatchNorm2d-93 [-1, 1024, 15, 15] 2,048
ReLU-94 [-1, 1024, 15, 15] 0
Conv2d-95 [-1, 1024, 15, 15] 9,437,184
BatchNorm2d-96 [-1, 1024, 15, 15] 2,048
ReLU-97 [-1, 1024, 15, 15] 0
Conv2d-98 [-1, 1024, 15, 15] 1,048,576
BatchNorm2d-99 [-1, 1024, 15, 15] 2,048
ReLU-100 [-1, 1024, 15, 15] 0
Bottleneck-101 [-1, 1024, 15, 15] 0
Conv2d-102 [-1, 1024, 15, 15] 1,048,576
BatchNorm2d-103 [-1, 1024, 15, 15] 2,048
ReLU-104 [-1, 1024, 15, 15] 0
Conv2d-105 [-1, 1024, 15, 15] 9,437,184
BatchNorm2d-106 [-1, 1024, 15, 15] 2,048
ReLU-107 [-1, 1024, 15, 15] 0
Conv2d-108 [-1, 1024, 15, 15] 1,048,576
BatchNorm2d-109 [-1, 1024, 15, 15] 2,048
ReLU-110 [-1, 1024, 15, 15] 0
Bottleneck-111 [-1, 1024, 15, 15] 0
Conv2d-112 [-1, 2048, 8, 8] 2,097,152
BatchNorm2d-113 [-1, 2048, 8, 8] 4,096
Conv2d-114 [-1, 1024, 15, 15] 1,048,576
BatchNorm2d-115 [-1, 1024, 15, 15] 2,048
ReLU-116 [-1, 1024, 15, 15] 0
Conv2d-117 [-1, 1024, 8, 8] 9,437,184
BatchNorm2d-118 [-1, 1024, 8, 8] 2,048
ReLU-119 [-1, 1024, 8, 8] 0
Conv2d-120 [-1, 2048, 8, 8] 2,097,152
BatchNorm2d-121 [-1, 2048, 8, 8] 4,096
ReLU-122 [-1, 2048, 8, 8] 0
Bottleneck-123 [-1, 2048, 8, 8] 0
Conv2d-124 [-1, 2048, 8, 8] 4,194,304
BatchNorm2d-125 [-1, 2048, 8, 8] 4,096
ReLU-126 [-1, 2048, 8, 8] 0
Conv2d-127 [-1, 2048, 8, 8] 37,748,736
BatchNorm2d-128 [-1, 2048, 8, 8] 4,096
ReLU-129 [-1, 2048, 8, 8] 0
Conv2d-130 [-1, 2048, 8, 8] 4,194,304
BatchNorm2d-131 [-1, 2048, 8, 8] 4,096
ReLU-132 [-1, 2048, 8, 8] 0
Bottleneck-133 [-1, 2048, 8, 8] 0
AvgPool2d-134 [-1, 2048, 1, 1] 0
Linear-135 [-1, 2] 4,098
Softmax-136 [-1, 2] 0
================================================================
Total params: 118,185,090
Trainable params: 118,185,090
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 363.40
Params size (MB): 450.84
Estimated Total Size (MB): 814.81
----------------------------------------------------------------