P6: PyTorch in Practice: Hollywood Celebrity Recognition

Monty _Lee

Last modified: 2023-06-02 22:20:55

First published: 2023-06-01 22:31:22

Article link: https://blog.csdn.net/weixin_62602550/article/details/130994878

🍨 This post is a study log from the 🔗365-Day Deep Learning Training Camp
🍖 Original author: K同学啊 | tutoring and custom projects available

1 Introduction

This post builds a VGG-16 network to recognize Hollywood celebrities.

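1.1 Training camp requirements

Basic
- Save the best model weights found during training
- Call the official VGG-16 network
Advanced
- Reach 50% accuracy on the test set
- Build the VGG-16 network by hand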

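1.2 Training log

Train with the original article's code
Add data augmentation
Change the initial learning rate to 1e-2

The result of each run and the effect of each change are described in the debugging log (section 7).

Result: the final val_accuracy reached 75.3%.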

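2 Preliminary work

This stage covers data processing, dataset splitting, and related steps. Earlier posts explain them in detail, so only the code is given here.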

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision import datasets, transforms

import os, PIL, pathlib,random

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

data_dir = '/Users/montylee/Documents/GitHub/DeepLearning/pytorch/P6/datasets/res_data'
data_dir = pathlib.Path(data_dir)

train_dir = data_dir / 'train'
test_dir = data_dir / 'test'
# print(train_dir)
# print(test_dir)

# Class names are the sub-folders of train/ (the layout ImageFolder expects);
# listing data_dir itself would also return 'train' and 'test'.
classNames = sorted(path.name for path in train_dir.iterdir() if path.is_dir())
# print(classNames)

train_transforms = transforms.Compose([
    transforms.Resize([224,224]),
    transforms.RandomHorizontalFlip(), # random horizontal flip (no rotation is applied)
    transforms.ToTensor(),
    transforms.Normalize(mean = [0.485, 0.456, 0.406], # per-channel mean
                         std = [0.229, 0.224, 0.225]) # per-channel standard deviation
])

test_transforms = transforms.Compose([
    transforms.Resize([224,224]),
    transforms.ToTensor(),
    transforms.Normalize(mean = [0.485, 0.456, 0.406], # per-channel mean
                         std = [0.229, 0.224, 0.225]) # per-channel standard deviation
])

train_data = datasets.ImageFolder(train_dir, transform = train_transforms)
test_data = datasets.ImageFolder(test_dir, transform = test_transforms)

train_data.class_to_idx

batch_size = 32
train_dl = torch.utils.data.DataLoader(train_data,
                                       batch_size = batch_size,
                                       shuffle = True,
                                       num_workers = 1,
                                       drop_last = True)
# Evaluate on every test image in a fixed order: no shuffling, no dropped batch
test_dl = torch.utils.data.DataLoader(test_data,
                                      batch_size = batch_size,
                                      shuffle = False,
                                      num_workers = 1,
                                      drop_last = False)

3 Calling VGG-16

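# Call the official VGG-16 model
from torchvision.models import vgg16

print(f"Using {device} device")

# Note: torchvision >= 0.13 deprecates `pretrained = True` in favor of
# `weights = VGG16_Weights.DEFAULT` (imported from torchvision.models).
model = vgg16(pretrained = True).to(device)
for param in model.parameters():
    param.requires_grad = False # freeze the pretrained weights

# Replace the last classifier layer; the new layer is trainable by default.
# train_data.classes holds one entry per class folder found by ImageFolder.
model.classifier[6] = nn.Linear(4096, len(train_data.classes))
model.to(device)
model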

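3.1 torchvision.models

torchvision.models is the model library of PyTorch's torchvision package. It contains many classic computer vision models for image classification, object detection, semantic segmentation, and other tasks. Weights pretrained on large image datasets are available for these models, which makes them convenient for transfer learning or feature extraction.

Some common models:

AlexNet: an early deep convolutional neural network for image classification. It achieved a major breakthrough in the 2012 ImageNet Large Scale Visual Recognition Challenge.
VGG: a family of deep convolutional neural networks proposed by a research group at the University of Oxford. VGG models are very deep and perform well on image classification.
ResNet: a deep residual network proposed by Microsoft Research. Its residual connections address the vanishing-gradient problem in training deep networks, which makes much deeper architectures possible.
Inception: a family of networks proposed by a Google team, the best known being InceptionV3 and InceptionResNetV2. These models use Inception modules, which capture features at several scales at once.
DenseNet: a densely connected convolutional network. Each layer receives the feature maps of all preceding layers, so features are reused throughout the network.
MobileNet: a lightweight convolutional network suited to mobile devices and embedded systems. It uses depthwise separable convolutions to reduce parameter count and computation.

These are only some of the models. torchvision.models also includes others such as GoogLeNet, ShuffleNet, and SqueezeNet. Each can be instantiated with a single call, and pretrained weights can be loaded for transfer learning or feature extraction.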

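3.2 VGG-16

VGG-16 is a classic deep convolutional neural network proposed by the Visual Geometry Group at the University of Oxford. It performed very well in the 2014 ImageNet Large Scale Visual Recognition Challenge and is widely used for image classification and feature extraction.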

Its architecture is shown in the following figure.

[Figure: VGG-16 architecture diagram]

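The VGG-16 architecture is simple and regular. It consists of 13 convolutional layers and 3 fully connected layers, 16 weight layers in total. In detail:

Input layer: receives the input image data.
Convolutional layers: 13 in total, each with a 3x3 kernel, stride 1, and padding 1, so a convolution keeps the spatial size unchanged. Stacking them raises the level of abstraction of the features, while the pooling layers between the stacks shrink the feature maps.
Pooling layers: 5 max-pooling layers, each with a 2x2 kernel and stride 2. Each one halves the spatial size of the feature map while keeping the strongest responses.
Fully connected layers: 3 in total. The first two have 4096 neurons each; the last has one output per class (1000 for ImageNet). They map the high-dimensional features to class labels.
Softmax layer: the output of the last fully connected layer is normalized by a Softmax function into a probability distribution over the classes.

A key characteristic of VGG-16 is its depth: 16 layers with learnable weights (13 convolutional plus 3 fully connected; the pooling and Softmax layers have no weights and are not counted). Compared with earlier models, VGG-16 uses smaller kernels and more convolutional layers, which increases the depth and nonlinear capacity of the network and improves accuracy.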
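The layer counts above can be checked with a few lines of arithmetic. The following is a minimal sketch in plain Python, not the torchvision implementation: `VGG16_CFG` and `summarize` are names chosen for illustration, and the 224x224 input size is the standard configuration assumed throughout this post.

```python
# Layer layout of VGG-16: an integer is the output channel count of a
# 3x3 convolution (stride 1, padding 1), "M" is a 2x2 max-pool with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

def summarize(cfg, input_size=224, num_fc=3):
    """Count layers and track the feature-map size through the conv stack."""
    size = input_size
    channels = 3          # RGB input
    num_conv = 0
    num_pool = 0
    for item in cfg:
        if item == "M":
            size //= 2    # each pooling layer halves height and width
            num_pool += 1
        else:
            channels = item   # a padded 3x3 conv keeps the spatial size
            num_conv += 1
    return {
        "conv_layers": num_conv,
        "pool_layers": num_pool,
        "weight_layers": num_conv + num_fc,
        "final_feature_map": (channels, size, size),
        "flattened_features": channels * size * size,
    }

summary = summarize(VGG16_CFG)
print(summary)
```

The flattened feature count is the input width of the first fully connected layer, which is why that layer is large compared with the convolutional part.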

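When VGG-16 is used for image classification, pretrained weights can be used for transfer learning. The pretrained VGG-16 model was trained on the ImageNet dataset, so it already extracts useful features. Fine-tuning the fully connected layers, or replacing the final classification layer, adapts the model to a specific task.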
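The freeze-then-replace pattern described above can be sketched on a tiny stand-in network, which avoids downloading the real weights. Everything here is illustrative: the three-layer `nn.Sequential` is not VGG-16, and `num_classes = 17` is an assumed value rather than a figure taken from this dataset.

```python
import torch
import torch.nn as nn

# Tiny stand-in for a pretrained network (hypothetical, not VGG-16 itself).
model = nn.Sequential(
    nn.Linear(8, 16),     # plays the role of the pretrained feature extractor
    nn.ReLU(),
    nn.Linear(16, 1000),  # plays the role of the original 1000-class head
)

# 1) Freeze every existing parameter.
for param in model.parameters():
    param.requires_grad = False

# 2) Replace the last layer; a freshly created layer is trainable by default.
num_classes = 17  # assumed class count, for illustration only
model[2] = nn.Linear(16, num_classes)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
print("trainable:", trainable)
print("frozen:", frozen)

# 3) Optionally hand only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```

Passing all parameters to the optimizer also works, since frozen parameters never receive gradients; filtering them simply makes the intent explicit.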

A more detailed tutorial can be found in the following article:

VGG16学习笔记 (VGG16 Study Notes)

4 Training the model

4.1 Training and test functions

# Training function
def train(dataloader, model, loss_fn, optimizer):
    num_batches = len(dataloader)
    num_samples = 0 # samples actually seen (drop_last = True can skip the last partial batch)

    train_loss, train_acc = 0, 0
    for x, y in dataloader:
        x, y  = x.to(device), y.to(device)

        # Compute prediction error
        pred = model(x) # network output
        loss = loss_fn(pred, y) # compute the loss

        optimizer.zero_grad() # reset the gradients
        loss.backward() # backpropagation
        optimizer.step() # update the parameters

        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()
        num_samples += y.size(0)

    train_acc /= num_samples
    train_loss /= num_batches

    return train_acc, train_loss

# Test function
def test(dataloader, model, loss_fn):
    num_batches = len(dataloader)
    num_samples = 0 # samples actually evaluated

    test_loss, test_acc = 0, 0
    with torch.no_grad(): # no gradients are needed for evaluation
        for x, y in dataloader:
            x, y  = x.to(device), y.to(device)

            pred = model(x)
            loss = loss_fn(pred, y)

            test_loss += loss.item() # reuse the loss instead of computing it twice
            test_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
            num_samples += y.size(0)

    test_acc /= num_samples
    test_loss /= num_batches

    return test_acc, test_loss
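Both functions count correct predictions with the expression `(pred.argmax(1) == y).type(torch.float).sum().item()`. A small worked example may make it easier to read; the logits and labels below are made-up values for three samples and three classes.

```python
import torch

# Made-up network outputs: one row per sample, one column per class.
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.2, 3.0, 0.3],
                       [1.5, 0.4, 0.2]])
labels = torch.tensor([0, 1, 2])

# argmax(1) picks the highest-scoring class in each row.
predicted = logits.argmax(1)

# Comparing with the labels gives a boolean tensor; summing it as float
# counts the correct predictions in the batch.
correct = (predicted == labels).type(torch.float).sum().item()
accuracy = correct / labels.size(0)

print(predicted.tolist(), correct, accuracy)
```

Accumulating `correct` over all batches and dividing by the number of samples seen gives the epoch accuracy.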

4.2 Setting a dynamic learning rate

# Dynamic learning rate

def adjust_learning_rate(optimizer, epoch, start_lr):
    # Multiply the learning rate by 0.92 every two epochs
    lr = start_lr * (0.92 ** (epoch // 2))

    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

# Alternative: the official learning-rate scheduler interface
# from torch.optim.lr_scheduler import LambdaLR
# lambda1 = lambda epoch: 0.98 ** (epoch // 2)
# optimizer = torch.optim.SGD(model.parameters(), lr = 0.1)
# scheduler = LambdaLR(optimizer, lr_lambda = lambda1)

learning_rate = 1e-3 # initial learning rate; the third run in section 1.2 uses 1e-2
optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)
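To see what the step decay does in numbers, the formula can be evaluated on its own. This is a small sketch in plain Python; `decayed_lr` is a helper name introduced here for illustration and mirrors the expression inside `adjust_learning_rate`.

```python
def decayed_lr(start_lr, epoch, factor=0.92, step=2):
    """Learning rate after `epoch` epochs of step decay."""
    # Integer division keeps the rate constant within each block of `step` epochs.
    return start_lr * (factor ** (epoch // step))

# Learning rates for the first six epochs, starting from 1e-3.
schedule = [decayed_lr(1e-3, epoch) for epoch in range(6)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.6f}")
```

Epochs 0 and 1 share the starting rate, epochs 2 and 3 use 0.92 times that, and so on, so the rate shrinks slowly over a long run.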

4.3 Training

from tqdm import tqdm

epoches = 70
loss_fn = nn.CrossEntropyLoss()
train_loss = []
train_acc = []
test_loss = []
test_acc = []

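for epoch in tqdm(range(epoches)):
    # Apply the decay schedule defined in section 4.2
    adjust_learning_rate(optimizer, epoch, learning_rate)

    model.train()
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, optimizer)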

    model.eval()
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)

    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)

    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}')
    tqdm.write(template.format(epoch+1, epoch_train_acc*100, epoch_train_loss, epoch_test_acc*100, epoch_test_loss))

print('Done')

5 Visualizing the results

import matplotlib.pyplot as plt

epochs_range = range(epoches)
# print(epochs_range)

plt.figure(figsize = (12, 3))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, train_acc, label = 'Training Accuracy')
plt.plot(epochs_range, test_acc, label = 'Test Accuracy')
plt.xlim((0, epoches)) # show every epoch, not only the first 20
plt.xticks(range(0,epoches+10,10))
plt.legend()
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label = 'Training Loss')
plt.plot(epochs_range, test_loss, label = 'Test Loss')
plt.xlim((0, epoches)) # show every epoch, not only the first 20
plt.xticks(range(0,epoches+10,10))
plt.legend()
plt.title('Training and Validation Loss')
plt.show()

[Figure: training and test accuracy and loss curves]

5.1 Predicting a specified image

# Predict a specified image
from PIL import Image
classes = list(train_data.class_to_idx)

def predict_image(image_path, model, transform, classes):
    test_img = Image.open(image_path).convert('RGB')
    # plt.imshow(test_img)

    test_img_tensor = transform(test_img)
    img = test_img_tensor.to(device).unsqueeze(0) # add the batch dimension

    model.eval()
    with torch.no_grad(): # no gradients are needed for inference
        output = model(img)

    _,pred = torch.max(output, 1)
    pred_class = classes[pred.item()]
    print(f'Prediction: {pred_class}')

# Use the deterministic test transforms (no random flip) for inference
predict_image(image_path='/Users/montylee/Documents/GitHub/DeepLearning/pytorch/P5/datasets/train/nike/1 (7).jpg',
              model = model,
              transform = test_transforms,
              classes = classes)

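6 Saving and loading the model

# Save the model

PATH = './model.pth' # file name for the saved parameters
torch.save(model.state_dict(), PATH) # write the parameters to disk
model.load_state_dict(torch.load(PATH, map_location=device)) # load the parameters back into model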
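The basic requirement in section 1.1 asks for the best weights seen during training, whereas the code above saves whatever the model holds after the last epoch. One possible way to keep the best weights is sketched below. It is an illustration under stated assumptions, not the code used for the results in this post: the tiny `nn.Linear` model, the `fake_test_acc` list, and the file name in the comment are all hypothetical.

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # tiny stand-in model (hypothetical)
initial_weight = model.weight.detach().clone()

# Made-up per-epoch test accuracies, used only to drive the example.
fake_test_acc = [0.20, 0.35, 0.31, 0.42, 0.40]

best_acc = 0.0
best_epoch = -1
# state_dict() returns references to the live tensors, so a deep copy is
# needed to freeze a snapshot.
best_weights = copy.deepcopy(model.state_dict())

for epoch, epoch_test_acc in enumerate(fake_test_acc):
    # A real loop would call train(...) and test(...) here.
    with torch.no_grad():
        model.weight.add_(1.0)  # pretend that training changed the weights

    if epoch_test_acc > best_acc:
        best_acc = epoch_test_acc
        best_epoch = epoch
        best_weights = copy.deepcopy(model.state_dict())

# torch.save(best_weights, './best_model.pth')  # hypothetical file name
model.load_state_dict(best_weights)  # restore the best snapshot
print(f"best epoch: {best_epoch}, best accuracy: {best_acc}")
```

In the training loop of section 4.3 the same comparison would use `epoch_test_acc` right after the call to `test`.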

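7 Debugging log

Running the original article's code: accuracy converged at only about 20%, which is quite poor.
With data augmentation: accuracy improved markedly. After 70 epochs val_accuracy reached 63% and had not yet converged.

Link to the data augmentation code:

文件处理：数据增强 + 统一重命名图片 + 合并两个文件夹中的图片_Monty _Lee的博客-CSDN博客 (File processing: data augmentation + uniform image renaming + merging the images of two folders, Monty _Lee's blog on CSDN)
After switching to the augmented dataset, the loss fell too slowly in training, so the learning rate was raised to 1e-2. Training became much faster, reaching 70% by epoch 30.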