J3: DenseNet in Practice — Implementation and Analysis

I. Objectives:

  1. Understand and study the differences between DenseNet and ResNetV2
  2. Explore whether these improvement ideas can be transferred elsewhere (open-ended exploration)

II. Environment:

  • Language: Python 3.8
  • Editor: Jupyter Notebook
  • Deep-learning framework: PyTorch
    • torch==2.4.0+cu124
    • torchvision==0.19.0+cu124

III. The DenseNet Model

Original paper: Densely Connected Convolutional Networks

1. Background

In computer vision, convolutional neural networks (CNNs) have become the dominant approach, with models such as GoogLeNet, VGG-16 and Inception. A milestone in the history of CNNs was the appearance of ResNet: by building "shortcut connections" (skip connections) between earlier and later layers, ResNet made it possible to train much deeper CNNs and thereby reach higher accuracy.

The model introduced here, DenseNet, follows the same basic idea as ResNet, but instead builds dense connections between every layer and all of its preceding layers, which is where its name comes from. Another hallmark of DenseNet is feature reuse: features from different layers are combined by concatenation along the channel dimension. These properties allow DenseNet to outperform ResNet with fewer parameters and lower computational cost, and earned it the CVPR 2017 Best Paper Award.
Figure: a 5-layer dense block with a growth rate of k = 4

2. Design Philosophy

Compared with ResNet, DenseNet proposes a more aggressive dense connectivity scheme: all layers are connected to one another, and specifically every layer takes the outputs of all preceding layers as additional inputs.

Figure 1 shows ResNet's residual (shortcut) connection scheme; Figure 2 shows DenseNet's dense connection scheme for comparison. In ResNet, each layer is short-circuited to an earlier layer (typically 2 to 4 layers back) and the two are combined by element-wise addition. In DenseNet, each layer is concatenated with all preceding layers along the channel dimension (channel-wise concatenation rather than element-wise addition) and the result becomes the input of the next layer.

For an L-layer network, DenseNet therefore contains L(L+1)/2 connections in total, a much denser pattern than ResNet. Moreover, DenseNet directly concatenates feature maps from different layers, which enables feature reuse and improves efficiency; this is the main difference between DenseNet and ResNet.

Figure 1. Shortcut connections in ResNet (+ denotes element-wise addition)
Figure 2. Dense connections in DenseNet (c denotes channel-wise concatenation)
Figure 3. The forward pass of DenseNet
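
The difference between the two connection types is easy to see in a couple of lines of PyTorch. This is a minimal sketch with made-up tensor shapes, not part of the original post:

import torch

a = torch.randn(1, 64, 56, 56)   # output of an earlier layer
b = torch.randn(1, 64, 56, 56)   # output of the current layer

res_style = a + b                        # ResNet: element-wise addition, still 64 channels
dense_style = torch.cat([a, b], dim=1)   # DenseNet: channel-wise concatenation, now 128 channels
print(res_style.shape, dense_style.shape)  # torch.Size([1, 64, 56, 56]) torch.Size([1, 128, 56, 56])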

3. Network Architecture

The concrete implementation details of the network are shown in Figure 4.

Figure 4. The DenseNet architecture

A CNN normally reduces the spatial size of its feature maps with pooling or stride > 1 convolutions, whereas DenseNet's dense connectivity requires feature maps of the same size. DenseNet therefore adopts a DenseBlock + Transition structure: a DenseBlock is a module containing many layers whose feature maps all share the same spatial size and are densely connected, while a Transition layer sits between two adjacent DenseBlocks and reduces the feature-map size via pooling. Figure 5 shows the resulting network: four DenseBlocks connected by Transition layers (the channel bookkeeping for this configuration is sketched after the figure caption).

Figure 5. A DenseNet built from DenseBlocks and Transition layers
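
To make the channel bookkeeping concrete, the following sketch (my own back-of-the-envelope calculation, using the DenseNet-121 configuration implemented below: growth rate 32, block sizes (6, 12, 24, 16), compression 0.5) traces how the number of channels evolves through the four DenseBlocks and three Transition layers:

# DenseNet-121 channel bookkeeping: growth_rate=32, blocks=(6, 12, 24, 16), compression=0.5
num_features = 64                          # channels after the initial 7x7 convolution
for i, num_layers in enumerate((6, 12, 24, 16)):
    num_features += num_layers * 32        # every layer in the block adds growth_rate channels
    print(f"after denseblock{i + 1}: {num_features}")
    if i != 3:                             # no transition after the last block
        num_features //= 2                 # the transition layer halves the channel count
        print(f"after transition{i + 1}: {num_features}")
# 256 -> 128 -> 512 -> 256 -> 1024 -> 512 -> 1024; the final 1024 channels feed the classifier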

IV. Implementing DenseNet-121 in PyTorch


Setting up the GPU, loading the data and splitting the dataset follow the same steps as in earlier posts in this series.

1. The layer inside a DenseBlock

import torch
import torch.nn as nn
import torch.nn.functional as F

# Bottleneck layer: BN + ReLU + 1x1 Conv + BN + ReLU + 3x3 Conv, with optional dropout during training
class _DenseLayer(nn.Sequential):
    """Basic unit of DenseBlock (using bottleneck layer) """
    def __init__(self, num_input_features, growth_rate, bn_size, drop_rate):
        super(_DenseLayer, self).__init__()
        self.add_module("norm1", nn.BatchNorm2d(num_input_features))
        self.add_module("relu1", nn.ReLU(inplace=True))
        self.add_module("conv1", nn.Conv2d(num_input_features, bn_size*growth_rate,
                                           kernel_size=1, stride=1, bias=False))
        self.add_module("norm2", nn.BatchNorm2d(bn_size*growth_rate))
        self.add_module("relu2", nn.ReLU(inplace=True))
        self.add_module("conv2", nn.Conv2d(bn_size*growth_rate, growth_rate,
                                           kernel_size=3, stride=1, padding=1, bias=False))
        self.drop_rate = drop_rate

    def forward(self, x):
        new_features = super(_DenseLayer, self).forward(x)
        if self.drop_rate > 0:
            new_features = F.dropout(new_features, p=self.drop_rate, training=self.training)
        return torch.cat([x, new_features], 1)
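
As a quick sanity check (a small sketch of my own, with arbitrary shapes), a dense layer keeps the spatial size and outputs the input channels plus growth_rate new ones:

layer = _DenseLayer(num_input_features=64, growth_rate=32, bn_size=4, drop_rate=0)
x = torch.randn(1, 64, 56, 56)
print(layer(x).shape)   # expected: torch.Size([1, 96, 56, 56]) -> 64 input + 32 new channels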

2. The DenseBlock module

# DenseBlock module
# internally densely connected, so the number of input features grows linearly (by growth_rate per layer)
class _DenseBlock(nn.Sequential):
    """DenseBlock"""
    def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate):
        super(_DenseBlock, self).__init__()
        for i in range(num_layers):
            layer = _DenseLayer(num_input_features+i*growth_rate, growth_rate, bn_size,
                                drop_rate)
            self.add_module("denselayer%d" % (i+1,), layer)

3. The Transition module

# Transition module
# implements the Transition layer: a 1x1 convolution to compress the channels, followed by average pooling
class _Transition(nn.Sequential):
    """Transition layer between two adjacent DenseBlock"""
    def __init__(self, num_input_feature, num_output_features):
        super(_Transition, self).__init__()
        self.add_module("norm", nn.BatchNorm2d(num_input_feature))
        self.add_module("relu", nn.ReLU(inplace=True))
        self.add_module("conv", nn.Conv2d(num_input_feature, num_output_features,
                                          kernel_size=1, stride=1, bias=False))
        self.add_module("pool", nn.AvgPool2d(2, stride=2))

4. The full DenseNet network

# Full DenseNet network
from collections import OrderedDict
class DenseNet(nn.Module):
    "DenseNet-BC model"
    def __init__(self, growth_rate=32, block_config=(6, 12, 24, 16), num_init_features=64,
                 bn_size=4, compression_rate=0.5, drop_rate=0, num_classes=1000):
        """
        :param growth_rate: (int) number of filters used in DenseLayer, `k` in the paper
        :param block_config: (list of 4 ints) number of layers in each DenseBlock
        :param num_init_features: (int) number of filters in the first Conv2d
        :param bn_size: (int) the factor using in the bottleneck layer
        :param compression_rate: (float) the compression rate used in Transition Layer
        :param drop_rate: (float) the drop rate after each DenseLayer
        :param num_classes: (int) number of classes for classification
        """
        super(DenseNet, self).__init__()
        # first Conv2d
        self.features = nn.Sequential(OrderedDict([
            ("conv0", nn.Conv2d(3, num_init_features, kernel_size=7, stride=2, padding=3, bias=False)),
            ("norm0", nn.BatchNorm2d(num_init_features)),
            ("relu0", nn.ReLU(inplace=True)),
            ("pool0", nn.MaxPool2d(3, stride=2, padding=1))
        ]))

        # DenseBlock
        num_features = num_init_features
        for i, num_layers in enumerate(block_config):
            block = _DenseBlock(num_layers, num_features, bn_size, growth_rate, drop_rate)
            self.features.add_module("denseblock%d" % (i + 1), block)
            num_features += num_layers*growth_rate
            if i != len(block_config) - 1:
                transition = _Transition(num_features, int(num_features*compression_rate))
                self.features.add_module("transition%d" % (i + 1), transition)
                num_features = int(num_features * compression_rate)

        # final bn+ReLU
        self.features.add_module("norm5", nn.BatchNorm2d(num_features))
        self.features.add_module("relu5", nn.ReLU(inplace=True))

        # classification layer
        self.classifier = nn.Linear(num_features, num_classes)

        # params initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.bias, 0)
                nn.init.constant_(m.weight, 1)
            elif isinstance(m, nn.Linear):
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        features = self.features(x)
        out = F.avg_pool2d(features, 7, stride=1).view(features.size(0), -1)
        out = self.classifier(out)
        return out
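
Note that the fixed 7x7 average pooling in forward assumes 224x224 inputs (224 / 32 = 7); for other input sizes, F.adaptive_avg_pool2d(features, (1, 1)) would be a size-agnostic alternative. A quick end-to-end shape check (a sketch of my own, assuming 224x224 inputs and an arbitrary class count of 4):

net = DenseNet(num_classes=4)
net.eval()
print(net(torch.randn(1, 3, 224, 224)).shape)   # expected: torch.Size([1, 4])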

5. DenseNet-121

import re
import torch.utils.model_zoo as model_zoo

# URL of torchvision's pretrained DenseNet-121 weights (used only when pretrained=True)
model_urls = {
    'densenet121': 'https://download.pytorch.org/models/densenet121-a639ec97.pth',
}

def densenet121(pretrained=False, **kwargs):
    """DenseNet-121"""
    model = DenseNet(num_init_features=64, growth_rate=32, block_config=(6, 12, 24, 16),
                     **kwargs)

    if pretrained:
        # '.'s are no longer allowed in module names, but the previous _DenseLayer
        # version had keys 'norm.1', 'relu.1', 'conv.1', 'norm.2', 'relu.2', 'conv.2'.
        # They are also in the checkpoints in model_urls. This pattern is used
        # to find such keys.
        pattern = re.compile(
            r'^(.*denselayer\d+\.(?:norm|relu|conv))\.((?:[12])\.(?:weight|bias|running_mean|running_var))$')
        state_dict = model_zoo.load_url(model_urls['densenet121'])
        for key in list(state_dict.keys()):
            res = pattern.match(key)
            if res:
                new_key = res.group(1) + res.group(2)
                state_dict[new_key] = state_dict[key]
                del state_dict[key]
        model.load_state_dict(state_dict)
    return model

model = densenet121().to(device)
# print a layer-by-layer summary and parameter counts
from torchsummary import summary
summary(model, (3, 224, 224))

Output of the summary (screenshot omitted).
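
If torchsummary is not installed, a plain parameter count (a quick sketch of my own) gives the headline number; for the standard 1000-class DenseNet-121 it should come to roughly 8.0 million parameters:

num_params = sum(p.numel() for p in model.parameters())
print(f"total parameters: {num_params:,}")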

6. Training and test functions

# Training function
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)

    train_acc, train_loss = 0, 0

    for X, y in dataloader:
        X, y = X.to(device), y.to(device)

        pred = model(X)
        loss = loss_fn(pred, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()

    train_loss /= num_batches
    train_acc /= size

    return train_acc, train_loss
# Test function
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)  # size of the test set
    num_batches = len(dataloader)  # number of batches (size / batch_size, rounded up)
    test_loss, test_acc = 0, 0

    # disable gradient tracking during evaluation to save memory and compute
    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)

            # compute the loss
            target_pred = model(imgs)
            loss = loss_fn(target_pred, target)

            test_loss += loss.item()
            test_acc += (target_pred.argmax(1) == target).type(torch.float).sum().item()

    test_acc /= size
    test_loss /= num_batches

    return test_acc, test_loss

7. Loss function and learning rate

import copy

loss_fn = nn.CrossEntropyLoss()
learn_rate = 1e-4
# choose either the SGD or the Adam optimizer
# opt = torch.optim.SGD(model.parameters(), lr=learn_rate)
opt = torch.optim.Adam(model.parameters(), lr=learn_rate)

scheduler = torch.optim.lr_scheduler.StepLR(opt, step_size=1, gamma=0.9)  # learning-rate scheduler

epochs = 100  # maximum number of training epochs; early stopping may end training earlier
patience = 10  # early-stopping patience: stop if test accuracy does not improve for 10 consecutive epochs

train_loss = []
train_acc = []
test_loss = []
test_acc = []
best_acc = 0  # best test accuracy so far, used to select the best model
no_improve_epoch = 0  # counts consecutive epochs without an accuracy improvement
epoch = 0  # records the number of epochs actually completed (used later for plotting, since training may stop before epochs = 100)
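
With step_size=1 and gamma=0.9, StepLR multiplies the learning rate by 0.9 after every epoch, i.e. lr_n = 1e-4 x 0.9^n after n scheduler steps; for example 1e-4 x 0.9^10 ≈ 3.49e-5, which matches the Lr column at epoch 10 in the log below. A tiny sketch of that decay (my own addition):

# StepLR decay with step_size=1, gamma=0.9 -- compare against the Lr column in the training log
for n in (1, 2, 10, 20):
    print(n, f"{1e-4 * 0.9 ** n:.2E}")   # 9.00E-05, 8.10E-05, 3.49E-05, 1.22E-05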

8. Training

# start training
for epoch in range(epochs):

    model.train()
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, opt)

    model.eval()
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)

    if epoch_test_acc > best_acc:
        best_acc = epoch_test_acc
        best_model = copy.deepcopy(model)
        no_improve_epoch = 0  # reset the counter
        # save a checkpoint of the best model
        PATH = 'J3_best_model.pth'
        torch.save({
            'epoch': epoch,
            'model_state_dict': best_model.state_dict(),
            'optimizer_state_dict': opt.state_dict(),
            'loss': epoch_test_loss,
        }, PATH)
    else:
        no_improve_epoch += 1

    if no_improve_epoch >= patience:
        print(f"Early stop triggered at epoch {epoch + 1}")
        break  # early stopping

    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)

    scheduler.step()  # update the learning rate
    lr = opt.state_dict()['param_groups'][0]['lr']

    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}, Lr:{:.2E}')
    print(
        template.format(epoch + 1, epoch_train_acc * 100, epoch_train_loss, epoch_test_acc * 100, epoch_test_loss, lr))

Output:

Epoch: 1, Train_acc:57.1%, Train_loss:4.055, Test_acc:70.8%, Test_loss:1.865, Lr:9.00E-05
Epoch: 2, Train_acc:74.8%, Train_loss:1.207, Test_acc:74.3%, Test_loss:1.041, Lr:8.10E-05
Epoch: 3, Train_acc:77.4%, Train_loss:0.728, Test_acc:67.3%, Test_loss:0.993, Lr:7.29E-05
Epoch: 4, Train_acc:84.3%, Train_loss:0.523, Test_acc:77.9%, Test_loss:0.708, Lr:6.56E-05
Epoch: 5, Train_acc:88.9%, Train_loss:0.403, Test_acc:85.8%, Test_loss:0.512, Lr:5.90E-05
Epoch: 6, Train_acc:88.5%, Train_loss:0.401, Test_acc:81.4%, Test_loss:0.669, Lr:5.31E-05
Epoch: 7, Train_acc:93.4%, Train_loss:0.271, Test_acc:81.4%, Test_loss:0.668, Lr:4.78E-05
Epoch: 8, Train_acc:90.3%, Train_loss:0.286, Test_acc:85.8%, Test_loss:0.475, Lr:4.30E-05
Epoch: 9, Train_acc:93.6%, Train_loss:0.236, Test_acc:87.6%, Test_loss:0.481, Lr:3.87E-05
Epoch:10, Train_acc:94.9%, Train_loss:0.204, Test_acc:86.7%, Test_loss:0.402, Lr:3.49E-05
Epoch:11, Train_acc:95.1%, Train_loss:0.191, Test_acc:86.7%, Test_loss:0.509, Lr:3.14E-05
Epoch:12, Train_acc:96.2%, Train_loss:0.168, Test_acc:86.7%, Test_loss:0.448, Lr:2.82E-05
Epoch:13, Train_acc:96.7%, Train_loss:0.155, Test_acc:86.7%, Test_loss:0.412, Lr:2.54E-05
Epoch:14, Train_acc:96.0%, Train_loss:0.139, Test_acc:86.7%, Test_loss:1.107, Lr:2.29E-05
Epoch:15, Train_acc:97.8%, Train_loss:0.121, Test_acc:87.6%, Test_loss:0.444, Lr:2.06E-05
Epoch:16, Train_acc:97.6%, Train_loss:0.116, Test_acc:90.3%, Test_loss:0.390, Lr:1.85E-05
Epoch:17, Train_acc:96.7%, Train_loss:0.133, Test_acc:90.3%, Test_loss:0.387, Lr:1.67E-05
Epoch:18, Train_acc:98.5%, Train_loss:0.098, Test_acc:89.4%, Test_loss:0.449, Lr:1.50E-05
Epoch:19, Train_acc:99.3%, Train_loss:0.086, Test_acc:88.5%, Test_loss:0.448, Lr:1.35E-05
Epoch:20, Train_acc:97.3%, Train_loss:0.112, Test_acc:88.5%, Test_loss:0.413, Lr:1.22E-05
Epoch:21, Train_acc:97.1%, Train_loss:0.133, Test_acc:88.5%, Test_loss:0.414, Lr:1.09E-05
Epoch:22, Train_acc:98.7%, Train_loss:0.089, Test_acc:90.3%, Test_loss:0.432, Lr:9.85E-06
Epoch:23, Train_acc:98.0%, Train_loss:0.096, Test_acc:89.4%, Test_loss:0.376, Lr:8.86E-06
Epoch:24, Train_acc:99.6%, Train_loss:0.062, Test_acc:91.2%, Test_loss:0.348, Lr:7.98E-06
Epoch:25, Train_acc:97.8%, Train_loss:0.089, Test_acc:90.3%, Test_loss:0.641, Lr:7.18E-06
Epoch:26, Train_acc:97.1%, Train_loss:0.110, Test_acc:89.4%, Test_loss:0.391, Lr:6.46E-06
Epoch:27, Train_acc:99.3%, Train_loss:0.074, Test_acc:89.4%, Test_loss:0.363, Lr:5.81E-06
Epoch:28, Train_acc:98.2%, Train_loss:0.099, Test_acc:89.4%, Test_loss:0.389, Lr:5.23E-06
Epoch:29, Train_acc:98.7%, Train_loss:0.080, Test_acc:87.6%, Test_loss:0.454, Lr:4.71E-06
Epoch:30, Train_acc:98.0%, Train_loss:0.093, Test_acc:90.3%, Test_loss:0.354, Lr:4.24E-06
Epoch:31, Train_acc:99.1%, Train_loss:0.077, Test_acc:90.3%, Test_loss:0.586, Lr:3.82E-06
Epoch:32, Train_acc:98.9%, Train_loss:0.082, Test_acc:91.2%, Test_loss:0.339, Lr:3.43E-06
Epoch:33, Train_acc:98.0%, Train_loss:0.109, Test_acc:89.4%, Test_loss:0.383, Lr:3.09E-06
Early stop triggered at epoch 34

9. Visualising the results

# Visualise the results
# Accuracy and loss curves

import matplotlib.pyplot as plt
# suppress warnings
import warnings
warnings.filterwarnings("ignore")  # ignore warning messages
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # render minus signs correctly
plt.rcParams['figure.dpi'] = 100  # figure resolution

epochs_range = range(len(train_acc))  # number of epochs actually completed

plt.figure(figsize=(12, 3))
plt.subplot(1, 2, 1)

plt.plot(epochs_range, train_acc, label='Training Accuracy')
plt.plot(epochs_range, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label='Training Loss')
plt.plot(epochs_range, test_loss, label='Test Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()


10. Prediction

from PIL import Image

classes = list(total_data.class_to_idx)


def predict_one_image(image_path, model, transform, classes):
    test_img = Image.open(image_path).convert('RGB')
    plt.imshow(test_img)  # show the image being predicted

    test_img = transform(test_img)
    img = test_img.to(device).unsqueeze(0)

    model.eval()
    with torch.no_grad():
        output = model(img)

    _, pred = torch.max(output, 1)
    pred_class = classes[pred.item()]
    print(f'Predicted class: {pred_class}')
    
import os
from pathlib import Path
import random

# pick one image at random from the whole dataset

image = []
def image_path(data_dir):
    file_list = os.listdir(data_dir)                  # list the class sub-directories (the four bird classes)
    data_file_dir = file_list
    data_dir = Path(data_dir)
    for i in data_file_dir:
        i = Path(i)
        image_file_path = data_dir.joinpath(i)        # join the data directory with the class name
        data_file_paths = image_file_path.iterdir()   # iterate over the contents of the class folder
        data_file_paths = list(data_file_paths)       # convert the iterator to a list
        image.append(data_file_paths)
    file = random.choice(image)                       # randomly pick one class
    file = random.choice(file)                        # randomly pick one image from that class
    return file

data_dir = './bird_photos'
img_path = image_path(data_dir)

# predict one image from the dataset
predict_one_image(image_path=img_path,
                  model=model,
                  transform=train_transforms,
                  classes=classes)

Predicted class: Cockatoo

# Model evaluation: load the best checkpoint back into the model
checkpoint = torch.load(PATH, map_location=device)
best_model.load_state_dict(checkpoint['model_state_dict'])
epoch_test_acc, epoch_test_loss = test(test_dl, best_model, loss_fn)
epoch_test_acc, epoch_test_loss
(0.8849557522123894, 0.3951910634835561)
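
Because the checkpoint also stores the optimizer state and the epoch index, training can be resumed from it. A minimal sketch (my own addition, not part of the original post):

# Hypothetical resume-from-checkpoint sketch
checkpoint = torch.load('J3_best_model.pth', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
opt.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1   # continue from the next epoch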

Summary

Comparing this DenseNet-121 run with the previous ResNet50V2 experiment:

DenseNet results:

  • Training accuracy rose from 57.1% to 98.7%, showing that the model fits the training set very well.
  • Training loss fell sharply from 4.055 to 0.089, indicating steadily decreasing error during training.
  • Test accuracy improved from 70.8% to 91.2%, showing good generalisation to unseen data.
  • Test loss dropped from 1.865 to 0.339, so performance on the test set kept improving.

ResNet results:

  • Training accuracy increased from 48.9% to 100%, showing equally strong fitting ability on the training set.
  • Training loss fell sharply from 3.001 to 0.012, again indicating decreasing error during training.
  • Test accuracy increased from 61.9% to 90.3%; despite a dip around epoch 36, overall generalisation was also good.
  • Test loss dropped from 1.048 to 0.335, showing gradually improving test performance.

Comparing the two models:

  1. Training accuracy and loss: both models show accuracy rising and loss falling markedly as training proceeds. DenseNet lags slightly behind ResNet in training accuracy, while the two are comparable in training loss.
  2. Test accuracy and loss: DenseNet's test accuracy is higher than ResNet's for most of training; ResNet catches up in the last few epochs, so the two end up very close. In terms of test loss, DenseNet is better overall.
  3. Learning-rate schedule: both models use learning-rate decay; as the epochs progress the shrinking learning rate helps training stabilise.
  4. Training speed: measured by the average time per epoch, ResNet trains faster than DenseNet in these runs.

Conclusions:

  • Both DenseNet and ResNet show strong learning and generalisation ability.
  • DenseNet is slightly better than ResNet in test accuracy overall, but ResNet catches up in the final epochs.
  • Both models reach very low training loss, i.e. both fit the training set well.
  • Based on these results alone, neither model is definitively better; they show different strengths at different epochs, and the choice depends on the specific application and its requirements.

Notes:

  • Both models triggered early stopping during training, which helps prevent overfitting.
  • Sudden jumps in test loss (e.g. DenseNet at epoch 14 and ResNet at epoch 32) suggest a temporary drop in generalisation on the test set, possibly due to model complexity, shifts in the data distribution, or other factors.

Full DenseNet-121 architecture diagram (image omitted).
