Dive into Deep Learning 14: Fine-Tuning, Hot Dog Recognition Code

love_and_hope

Last modified: 2024-09-28 14:25:34


First published: 2024-09-28 14:24:24

Article link: https://blog.csdn.net/weixin_68930974/article/details/142597853


Reference: 14.2. Fine-Tuning, Dive into Deep Learning 1.0.3 documentation (d2l.ai)

This post fine-tunes a ResNet model, already pretrained on the ImageNet dataset, on the hot dog dataset.

%matplotlib inline
import os
import torch
import torchvision
from torch import nn
from d2l import torch as d2l

I. Reading the Dataset

1. Downloading the dataset

#@save
d2l.DATA_HUB['hotdog'] = (d2l.DATA_URL + 'hotdog.zip',
                         'fba480ffa8aa7e0febbb511d181409f899b9baa5')

data_dir = d2l.download_extract('hotdog')

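The first statement adds a new dataset entry to the d2l.DATA_HUB dictionary. The key is 'hotdog' and the value is a tuple.

d2l.DATA_URL + 'hotdog.zip' is the dataset's URL (d2l.DATA_URL is the base URL for the data, and 'hotdog.zip' is the file name).
'fba480ffa8aa7e0febbb511d181409f899b9baa5' is the file's SHA-1 hash, used to verify that the downloaded file is complete and uncorrupted.

The second statement calls d2l.download_extract to download the hotdog dataset and extract it locally.

download_extract('hotdog') downloads 'hotdog.zip' from the URL registered above and checks the file against the SHA-1 hash.
After the download finishes, the function extracts the archive and returns the path of the extracted folder, which is stored in data_dir.
Folder structure (worth knowing before loading the data): the extracted hotdog folder contains train and test subfolders, and each of them holds one subfolder per class, hotdog and not-hotdog.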
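To make the integrity check concrete, here is a minimal sketch of how a file can be compared against a known SHA-1 hash using only the standard library. The helper names sha1_of_file and is_intact are hypothetical and are not part of d2l; the actual d2l implementation may differ in its details.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1048576):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    sha1 = hashlib.sha1()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sha1.update(chunk)
    return sha1.hexdigest()


def is_intact(path, expected_sha1):
    """True if the file exists and its SHA-1 digest matches the expected one."""
    return os.path.exists(path) and sha1_of_file(path) == expected_sha1


# Demo on a small temporary file (a stand-in for hotdog.zip).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'hello')
    demo_path = tmp.name

expected = hashlib.sha1(b'hello').hexdigest()
ok = is_intact(demo_path, expected)        # digest matches
bad = is_intact(demo_path, '0' * 40)       # digest does not match
os.remove(demo_path)
print(ok, bad)
```

A check like this lets a downloader skip files that are already cached and intact, and re-download files that are missing or corrupted.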

2. Reading the training and test sets

train_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'train'))
test_imgs = torchvision.datasets.ImageFolder(os.path.join(data_dir, 'test'))

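torchvision.datasets.ImageFolder is a dataset class in torchvision that loads images from a directory in which each subfolder holds the images of one class. The subfolder name is the class name, and class indices are assigned in sorted name order.

os.path.join(data_dir, 'train') joins the data_dir path with the 'train' folder name to form the path to the training set.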
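The way folder names turn into labels can be sketched in plain Python. The function find_classes below is a simplified imitation written for this post, not torchvision's own code: it assumes that class folders are sorted by name and numbered from 0, which is why hotdog gets label 0 and not-hotdog gets label 1.

```python
import os
import tempfile


def find_classes(root):
    """Map each subfolder of root to a class index, in sorted name order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


# Build an empty stand-in for the 'train' folder with the two class subfolders.
with tempfile.TemporaryDirectory() as root:
    for name in ('not-hotdog', 'hotdog'):
        os.makedirs(os.path.join(root, name))
    classes, class_to_idx = find_classes(root)

print(classes)
print(class_to_idx)
```

Because samples are grouped by class in this order, the start of the dataset holds hot dog images and the end holds the other images.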

3. Displaying the first 8 positive examples and the last 8 negative images (the images vary in size and aspect ratio)

hotdogs = [train_imgs[i][0] for i in range(8)]
not_hotdogs = [train_imgs[-i - 1][0] for i in range(8)]
d2l.show_images(hotdogs + not_hotdogs, 2, 8, scale=1.4);

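First statement:

train_imgs[i] is the i-th sample in train_imgs, a tuple whose first element is the image and whose second element is the label (class index).
train_imgs[i][0] takes only the image and leaves out the label.

Second statement:

train_imgs[-i - 1] indexes from the end of train_imgs. With -i - 1, i=0 gives the last image, i=1 gives the second-to-last image, and so on.
Likewise, train_imgs[-i - 1][0] takes the image, not the label.

Third statement:

hotdogs + not_hotdogs concatenates the two lists into a single list of 16 images.
d2l.show_images is a d2l function that displays images in a grid.
- The first argument, hotdogs + not_hotdogs, is the list of images.
- The second argument, 2, is the number of rows.
- The third argument, 8, is the number of images per row.
- scale=1.4 sets the display scale, which controls how large the images are drawn.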
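The two indexing patterns can be tried on a toy list. The samples list below is made up for illustration: it stands in for train_imgs, with file names in place of images and with the positive class stored first.

```python
# Toy stand-in for train_imgs: (image, label) tuples, positives first.
samples = ([(f'hotdog_{i}.png', 0) for i in range(10)] +
           [(f'other_{i}.png', 1) for i in range(10)])

# First three items, image part only.
first = [samples[i][0] for i in range(3)]

# Last three items, walking backwards from the end of the list.
last = [samples[-i - 1][0] for i in range(3)]

print(first)  # ['hotdog_0.png', 'hotdog_1.png', 'hotdog_2.png']
print(last)   # ['other_9.png', 'other_8.png', 'other_7.png']
```

Note that the second list comes out in reverse order, since i=0 picks the very last sample.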

4. Data augmentation and preprocessing

# Specify the means and standard deviations of the three RGB channels to
# standardize each channel
normalize = torchvision.transforms.Normalize(
    [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

train_augs = torchvision.transforms.Compose([
    torchvision.transforms.RandomResizedCrop(224),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    normalize])

test_augs = torchvision.transforms.Compose([
    torchvision.transforms.Resize([256, 256]),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    normalize])

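First statement:

torchvision.transforms.Normalize standardizes each channel (R, G, B) of an image tensor. Standardized inputs help the model converge faster and reduce the effect of shifts in the input data on training.
[0.485, 0.456, 0.406] are the per-channel means (R, G, B).
[0.229, 0.224, 0.225] are the per-channel standard deviations.
During standardization, every pixel value is transformed as output[channel] = (input[channel] - mean[channel]) / std[channel]. This keeps the distribution of each input channel stable and consistent during training. The values used here are the ImageNet channel statistics, so the inputs match what the pretrained model saw.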
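As a small worked example of the formula above, the sketch below standardizes single RGB pixels in plain Python. The helper normalize_pixel is hypothetical and works on one pixel in the [0, 1] range, whereas the real transform works on whole image tensors.

```python
MEAN = [0.485, 0.456, 0.406]  # per-channel means (R, G, B)
STD = [0.229, 0.224, 0.225]   # per-channel standard deviations


def normalize_pixel(rgb):
    """Apply (value - mean) / std to each channel of one pixel."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]


mid_gray = normalize_pixel([0.5, 0.5, 0.5])  # small values, slightly above zero
mean_pixel = normalize_pixel(MEAN)           # a pixel equal to the mean maps to zero
white = normalize_pixel([1.0, 1.0, 1.0])     # bright pixels map to positive values

print(mid_gray)
print(mean_pixel)
print(white)
```

After this step the pixel values are no longer confined to [0, 1]: values below the channel mean become negative and values above it become positive.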

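Second statement:

torchvision.transforms.Compose chains several preprocessing or augmentation operations and applies them to an image in order. train_augs is the pipeline used for the training data.

torchvision.transforms.RandomResizedCrop(224): crops a random region of the image and resizes it to 224×224. The randomness makes the model more robust.
torchvision.transforms.RandomHorizontalFlip(): flips the image horizontally with a probability of 50%. This is a common augmentation that makes the model insensitive to horizontal flips.
torchvision.transforms.ToTensor(): converts a PIL image (values in [0, 255]) into a PyTorch float tensor (values in [0, 1]).
normalize: applies the standardization defined above to each channel.

Third statement:

test_augs is the pipeline used for the test data. It contains no random operations, so evaluation is repeatable.

torchvision.transforms.Resize([256, 256]): resizes the image to 256×256.
torchvision.transforms.CenterCrop(224): crops a 224×224 region from the center of the resized image. This keeps the main subject of the image and gives every input the same size.
torchvision.transforms.ToTensor(): converts the image into a tensor.
normalize: applies the same standardization, so the test set and the training set use identical parameters.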
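The geometry of the center crop is simple arithmetic. The sketch below computes which region a 224×224 center crop would take from a 256×256 image; center_crop_box is a hypothetical helper, and the rounding it uses is an assumption rather than a copy of torchvision's code.

```python
def center_crop_box(height, width, size):
    """Return (top, left, bottom, right) of a size x size crop centered in the image."""
    top = int(round((height - size) / 2.0))
    left = int(round((width - size) / 2.0))
    return top, left, top + size, left + size


box = center_crop_box(256, 256, 224)
print(box)  # (16, 16, 240, 240)
```

In other words, the crop drops a 16-pixel border on every side of the resized image.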

II. Defining and Initializing the Model

1. Using ResNet-18 pretrained on the ImageNet dataset as the source model

pretrained_net = torchvision.models.resnet18(pretrained=True)

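pretrained=True loads a model whose weights were pretrained on a large dataset (here, ImageNet). Note that torchvision 0.13 and later deprecate this argument in favor of the weights argument.

2. Displaying the fully connected layer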

pretrained_net.fc

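fc is the final fully connected layer of the ResNet model.

ImageNet has 1000 class labels, so this layer has 1000 outputs.
The input to the fully connected layer is a vector holding the high-level features extracted by the convolutional and pooling layers (512 features for ResNet-18). The output is one score per class, which becomes a probability distribution over the classes after a softmax.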
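The step from raw scores to probabilities can be sketched in plain Python. The softmax function below is a hand-written illustration, and the two-element score list is made up; it is not output from the actual model.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 0.5])  # hypothetical scores for two classes
print(probs)
print(sum(probs))
```

The larger score always receives the larger probability, so the predicted class can be read from the raw scores directly.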

3. Replacing the final fully connected layer with one suited to the new classification task

finetune_net = torchvision.models.resnet18(pretrained=True)
finetune_net.fc = nn.Linear(finetune_net.fc.in_features, 2)
nn.init.xavier_uniform_(finetune_net.fc.weight);

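First statement:

Loading the pretrained model: as before, this loads a ResNet-18 model pretrained on ImageNet.

Second statement:

Replacing the fully connected layer: this replaces the final fully connected layer (fc) of ResNet-18 with a new one. The new layer keeps the same number of input features (finetune_net.fc.in_features) and has 2 outputs, one per class (hot dog and not hot dog).

Third statement:

Weight initialization: this initializes the weights of the newly created fully connected layer (fc).
nn.init.xavier_uniform_: a common weight initialization method based on the Xavier uniform distribution. It samples the weights from a uniform range whose width depends on the layer's input and output sizes, which helps training converge faster. The goal of Xavier initialization is to keep the output variance the same from layer to layer, which helps avoid vanishing or exploding gradients.
- finetune_net.fc.weight: the weights of the newly defined fc layer, which are initialized in place.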
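For a sense of scale, the sketch below computes the Xavier uniform range for a layer with 512 inputs and 2 outputs, using the standard formula bound = gain * sqrt(6 / (fan_in + fan_out)). The helpers are written for illustration in plain Python with nested lists in place of tensors, and the 512 input features are an assumption based on ResNet-18.

```python
import math
import random


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Half-width of the uniform range used by Xavier uniform initialization."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def xavier_uniform(fan_in, fan_out, seed=0):
    """Sample a fan_out x fan_in weight matrix from U(-bound, bound)."""
    rng = random.Random(seed)
    bound = xavier_uniform_bound(fan_in, fan_out)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]


bound = xavier_uniform_bound(512, 2)
weights = xavier_uniform(512, 2)
print(round(bound, 4))
print(len(weights), len(weights[0]))
```

Only the weight matrix is initialized this way in the post's code; the bias of the new layer keeps PyTorch's default initialization.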

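III. Fine-Tuning the Model

1. Defining the training function

When param_group=True, the parameters of the output layer (the fully connected layer fc) are updated with a learning rate ten times higher than that of the other layers.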

# If `param_group=True`, the model parameters in the output layer will be
# updated using a learning rate ten times greater
def train_fine_tuning(net, learning_rate, batch_size=128, num_epochs=5,
                      param_group=True):
    train_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
        os.path.join(data_dir, 'train'), transform=train_augs),
        batch_size=batch_size, shuffle=True)
    test_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
        os.path.join(data_dir, 'test'), transform=test_augs),
        batch_size=batch_size)
    devices = d2l.try_all_gpus()
    loss = nn.CrossEntropyLoss(reduction="none")
    if param_group:
        params_1x = [param for name, param in net.named_parameters()
             if name not in ["fc.weight", "fc.bias"]]
        trainer = torch.optim.SGD([{'params': params_1x},
                                   {'params': net.fc.parameters(),
                                    'lr': learning_rate * 10}],
                                lr=learning_rate, weight_decay=0.001)
    else:
        trainer = torch.optim.SGD(net.parameters(), lr=learning_rate,
                                  weight_decay=0.001)
    d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs,
                   devices)

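(1) Function definition and parameters

net: the neural network model to train.
learning_rate: the base learning rate, which controls the step size of parameter updates.
batch_size=128: the number of samples per mini-batch, 128 by default.
num_epochs=5: the number of training epochs, 5 by default.
param_group=True: if True, the output-layer parameters are trained with a higher learning rate than the other layers.

(2) Loading the training and test datasets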

train_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
    os.path.join(data_dir, 'train'), transform=train_augs),
    batch_size=batch_size, shuffle=True)

test_iter = torch.utils.data.DataLoader(torchvision.datasets.ImageFolder(
    os.path.join(data_dir, 'test'), transform=test_augs),
    batch_size=batch_size)

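torch.utils.data.DataLoader wraps each image dataset in an iterator that yields mini-batches, one for the training set and one for the test set. The training iterator reshuffles the data every epoch (shuffle=True), while the test iterator keeps the original order.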
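What batching means can be sketched without PyTorch. The generator iterate_batches below is a hypothetical, simplified stand-in for a data loader: it only produces lists of sample indices and keeps a smaller final batch, which I believe matches DataLoader's default behavior.

```python
import random


def iterate_batches(num_samples, batch_size, shuffle=False, seed=None):
    """Yield lists of sample indices, batch_size at a time."""
    indices = list(range(num_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]


ordered = list(iterate_batches(10, 4))
shuffled = list(iterate_batches(10, 4, shuffle=True, seed=0))

print(ordered)   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(shuffled)  # same indices, in a shuffled order
```

Shuffling matters for the training set here because the samples are stored class by class; without it, early batches would contain only hot dog images.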

(3) Getting the devices

devices = d2l.try_all_gpus()

devices: d2l.try_all_gpus() returns a list of all available GPU devices. If no GPU is available, it returns a list containing only the CPU device.

(4) Defining the loss function

loss = nn.CrossEntropyLoss(reduction="none")

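The loss is cross-entropy (CrossEntropyLoss), the loss function commonly used for classification, multi-class tasks in particular.
reduction="none": the loss values are not summed or averaged right away. One loss value per sample is returned, so the training loop can decide how to aggregate them.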
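The difference between per-sample losses and a reduced loss can be shown with a hand-written cross-entropy in plain Python. The function and the toy scores below are illustrative only; the summing at the end reflects my understanding of how the d2l training loop aggregates the losses, which is an assumption.

```python
import math


def cross_entropy_per_sample(logits_batch, labels):
    """Return one cross-entropy loss per sample (no sum, no mean)."""
    losses = []
    for logits, label in zip(logits_batch, labels):
        m = max(logits)  # log-sum-exp with the max subtracted for stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_sum_exp - logits[label])
    return losses


logits = [[2.0, 0.0],   # confident and correct for label 0
          [0.0, 0.0]]   # undecided between the two classes
labels = [0, 1]

per_sample = cross_entropy_per_sample(logits, labels)  # like reduction="none"
total = sum(per_sample)                                # like reduction="sum"
mean = total / len(per_sample)                         # like reduction="mean"

print(per_sample)
print(total, mean)
```

The undecided sample has a loss of log(2), while the confident, correct sample has a much smaller loss.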

(5) Setting up the parameter groups and the optimizer

if param_group:
    params_1x = [param for name, param in net.named_parameters()
                 if name not in ["fc.weight", "fc.bias"]]
    trainer = torch.optim.SGD([{'params': params_1x},
                               {'params': net.fc.parameters(),
                                'lr': learning_rate * 10}],
                              lr=learning_rate, weight_decay=0.001)
else:
    trainer = torch.optim.SGD(net.parameters(), lr=learning_rate,
                              weight_decay=0.001)

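if param_group: when param_group is True, the parameters of the fully connected layer (fc) use a higher learning rate (learning_rate * 10).
- params_1x: uses named_parameters() to collect every model parameter except fc.weight and fc.bias (that is, the parameters outside the output layer).
- net.fc.parameters(): the parameters of the fully connected layer fc (fc.weight and fc.bias), which are given the higher learning rate.
- torch.optim.SGD: the stochastic gradient descent (SGD) optimizer that updates the parameters. It receives two parameter groups:
  1. params_1x: these parameters use the base learning rate (learning_rate), taken from the optimizer's lr argument.
  2. net.fc.parameters(): the fully connected layer's parameters use the higher learning rate (learning_rate * 10), set by the group's own 'lr' entry.
else: when param_group=False, all parameters of the model use the same learning rate.

weight_decay is a regularization technique. It penalizes the L2 norm of the parameters, which limits their growth and helps prevent overfitting. In torch.optim.SGD this is done by adding weight_decay times each parameter to its gradient. Here weight_decay=0.001 applies to both parameter groups.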
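The effect of the two parameter groups and of weight decay can be traced by hand on a toy example. The sketch below is plain Python with scalars standing in for tensors; build_param_groups and sgd_step are hypothetical helpers that imitate the idea, not PyTorch's optimizer code, and the parameter values and gradients are made up.

```python
def build_param_groups(named_params, learning_rate,
                       head_names=("fc.weight", "fc.bias")):
    """Split parameter names into a 1x group and a 10x group for the output layer."""
    params_1x = [name for name in named_params if name not in head_names]
    head = [name for name in named_params if name in head_names]
    return [{"params": params_1x, "lr": learning_rate},
            {"params": head, "lr": learning_rate * 10}]


def sgd_step(values, grads, groups, weight_decay=0.001):
    """One plain SGD update: p <- p - lr * (grad + weight_decay * p)."""
    for group in groups:
        for name in group["params"]:
            grad = grads[name] + weight_decay * values[name]
            values[name] -= group["lr"] * grad


# Toy parameters and gradients (scalars instead of tensors).
values = {"conv1.weight": 1.0, "fc.weight": 1.0, "fc.bias": 0.0}
grads = {"conv1.weight": 0.5, "fc.weight": 0.5, "fc.bias": 0.5}

groups = build_param_groups(values, learning_rate=0.01)
sgd_step(values, grads, groups)
print(groups)
print(values)
```

With the same gradient, fc.weight moves ten times as far as conv1.weight. This is the point of the design: the pretrained layers already hold useful features and need only small adjustments, while the new output layer starts from random weights and has to learn from scratch.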

(6) Calling the training function

d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)

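d2l.train_ch13: a training function from the Dive into Deep Learning (d2l) library. It trains the given model with the data, loss function, and optimizer defined above.
train_iter and test_iter: the iterators over the training set and the test set.
loss: the loss function defined above (cross-entropy).
trainer: the optimizer set up above, which updates the model's parameters.
num_epochs: the number of training epochs.
devices: the devices used for training (GPUs if available, otherwise the CPU).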