365-Day Deep Learning Training Camp, Week P6: Hollywood Celebrity Recognition

I. Background and Development Environment

📌 Week P6: Hollywood Celebrity Recognition 📌

  • Difficulty: consolidating the fundamentals
  • Language: Python 3, PyTorch

🍺 Requirements:

  1. Build the VGG-16 network yourself
  2. Use the official VGG-16 network
  3. Show how to inspect the model's parameter count and related metrics

🍻 Stretch goals (optional):

  1. Reach 60% accuracy on the test set (fairly difficult, but you learn a lot in the process)
  2. Build the VGG-16 network manually

Development environment

  • OS: Windows 10
  • Python: 3.8.2
  • IDE: none (scripts run directly in cmd.exe)
  • Deep learning framework: PyTorch
  • GPU and VRAM: NVIDIA GeForce GTX 1660 Ti 12G
  • CUDA version: Release 10.2, V10.2.89 (check with nvcc -V or nvcc --version in cmd)
  • Data: 🔗 K同学啊's Baidu Netdisk

II. Preparation

1. Set up the GPU

Use the GPU if the device has one; otherwise fall back to the CPU.

import torch
import torchvision

if __name__=='__main__':
    ''' Set up the GPU '''
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("Using {} device\n".format(device))
Using cuda device

2. Import the data and split the dataset

import os
import pathlib

import torch
import torchvision

''' Read the local dataset and split it into training and test sets '''
def localDataset(data_dir):
    data_dir = pathlib.Path(data_dir)
    
    # Read the local dataset: every sub-folder of data_dir is one class
    data_paths = list(data_dir.glob('*'))
    # Use the folder name as the class name (portable, unlike splitting the path on "\\");
    # sorting matches the order ImageFolder uses for its class indices
    classeNames = sorted(path.name for path in data_paths if path.is_dir())
    
    # For more on transforms.Compose, see: https://blog.csdn.net/qq_38251616/article/details/124878863
    train_transforms = torchvision.transforms.Compose([
        torchvision.transforms.Resize([224, 224]),  # resize every input image to the same size
        # torchvision.transforms.RandomHorizontalFlip(), # random horizontal flip
        torchvision.transforms.ToTensor(),          # convert a PIL Image or numpy.ndarray to a tensor scaled to [0, 1]
        torchvision.transforms.Normalize(           # standardize each channel as (x - mean) / std, which helps the model converge
            mean=[0.485, 0.456, 0.406], 
            std=[0.229, 0.224, 0.225])  # these are the ImageNet per-channel statistics, matching the pretrained VGG-16
    ])
    
    total_dataset = torchvision.datasets.ImageFolder(data_dir, transform=train_transforms)
    print(total_dataset, '\n')
    print(total_dataset.class_to_idx, '\n')
    
    # Split into training and test sets at a ratio of 80% / 20%
    train_size = int(0.8 * len(total_dataset))
    test_size  = len(total_dataset) - train_size
    print('train_size', train_size, ', test_size', test_size, '\n')
    train_dataset, test_dataset = torch.utils.data.random_split(total_dataset, [train_size, test_size])
    
    return classeNames, train_dataset, test_dataset


if __name__=='__main__':
    ''' Load the data '''
    root = 'data'
    output = 'output'
    data_dir = os.path.join(root, '48-data')
    batch_size = 32
    classeNames, train_ds, test_ds = localDataset(data_dir)
    ''' Number of image classes '''
    num_classes = len(classeNames)
    print('num_classes', num_classes)
Dataset ImageFolder
    Number of datapoints: 1800
    Root location: data\48-data
    StandardTransform
Transform: Compose(
               Resize(size=[224, 224], interpolation=bilinear)
               ToTensor()
               Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
           )

{'Angelina Jolie': 0, 'Brad Pitt': 1, 'Denzel Washington': 2, 'Hugh Jackman': 3, 'Jennifer Lawrence': 4, 'Johnny Depp': 5, 'Kate Winslet': 6, 'Leonardo DiCaprio': 7, 'Megan Fox': 8, 'Natalie Portman': 9, 'Nicole Kidman': 10, 'Robert Downey Jr': 11, 'Sandra Bullock': 12, 'Scarlett Johansson': 13, 'Tom Cruise': 14, 'Tom Hanks': 15, 'Will Smith': 16}

train_size 1440 , test_size 360

num_classes 17
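random_split draws a fresh random permutation on every run, so the 1440/360 split above changes each time the script starts. A minimal sketch, assuming you want a repeatable split, passes a seeded torch.Generator to random_split. The toy TensorDataset of 1800 items and the seed 42 are placeholders standing in for the ImageFolder dataset.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in for the 1800-image ImageFolder dataset
dataset = TensorDataset(torch.arange(1800))

train_size = int(0.8 * len(dataset))       # 1440
test_size = len(dataset) - train_size      # 360

def split(seed):
    # A seeded generator fixes the permutation, and therefore the split
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, [train_size, test_size], generator=gen)

train_a, test_a = split(42)
train_b, test_b = split(42)
print(len(train_a), len(test_a))           # 1440 360
print(train_a.indices == train_b.indices)  # True: same seed, same split
```

The same generator argument can be added to the random_split call inside localDataset to make the train/test assignment stable across runs.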

3. Load the data

''' Load the data and set batch_size '''
def loadData(train_ds, test_ds, batch_size=32, root='', show_flag=False):
    # Build the training-set loader from train_ds (reshuffled every epoch)
    train_dl = torch.utils.data.DataLoader(train_ds,
                                           batch_size=batch_size,
                                           shuffle=True,
                                           num_workers=1)
    # Build the test-set loader from test_ds (evaluation does not need shuffling)
    test_dl  = torch.utils.data.DataLoader(test_ds,
                                           batch_size=batch_size,
                                           shuffle=False,
                                           num_workers=1)
    
    # Take one batch to check the data format
    # The data shape is [batch_size, channel, height, width]
    # batch_size is set by us; channel, height and width are the number of channels, height and width of each image.
    for X, y in test_dl:
        print('Shape of X [N, C, H, W]: ', X.shape)
        print('Shape of y: ', y.shape, y.dtype, '\n')
        break
    
    imgs, labels = next(iter(train_dl))
    print('Image shape: ', imgs.shape, '\n')
    # torch.Size([32, 3, 224, 224])  # after Resize, every image is a 224x224 RGB image
    displayData(imgs, root, show_flag)
    return train_dl, test_dl


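import os

import matplotlib.pyplot as plt
import numpy as np

''' Data visualization '''
def displayData(imgs, root='', flag=False):
    # Figure size: 20 inches wide by 5 inches tall
    plt.figure('Data Visualization', figsize=(20, 5)) 
    mean = np.array([0.485, 0.456, 0.406])
    std  = np.array([0.229, 0.224, 0.225])
    for i, img in enumerate(imgs[:20]):
        # Reorder dimensions [3, 224, 224] -> [224, 224, 3]
        npimg = img.numpy().transpose((1, 2, 0))
        # Undo Normalize so imshow receives values in [0, 1] instead of clipping them
        npimg = np.clip(npimg * std + mean, 0, 1)
        # Split the figure into 2 rows x 10 columns and draw subplot i+1
        plt.subplot(2, 10, i+1)
        plt.imshow(npimg)
        plt.axis('off')
    plt.savefig(os.path.join(root, 'DatasetDisplay.png'))
    if flag:
        plt.show()
    else:
        plt.close('all')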


batch_size = 32
train_dl, test_dl = loadData(train_ds, test_ds, batch_size, root, True)
Shape of X [N, C, H, W]:  torch.Size([32, 3, 224, 224])
Shape of y:  torch.Size([32]) torch.int64

Image shape:  torch.Size([32, 3, 224, 224])
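The Normalize step above uses ImageNet statistics. To see how this dataset's own statistics compare, one rough approach is to reduce a [N, C, H, W] batch (taken after ToTensor(), before Normalize) over every dimension except the channel. A sketch, where the random tensor is only a placeholder for such a batch and the helper name channel_stats is hypothetical:

```python
import torch

def channel_stats(batch):
    # batch: [N, C, H, W] with values in [0, 1] (after ToTensor(), before Normalize)
    # Reduce over batch, height and width, keeping the channel dimension (dim=1)
    mean = batch.mean(dim=(0, 2, 3))
    std = batch.std(dim=(0, 2, 3))
    return mean, std

# Placeholder batch with the same shape as one loader batch: [32, 3, 224, 224]
torch.manual_seed(0)
batch = torch.rand(32, 3, 224, 224)
mean, std = channel_stats(batch)
print(mean.shape, std.shape)  # torch.Size([3]) torch.Size([3])
```

For a whole dataset, accumulate per-channel sums and sums of squares over all batches rather than averaging per-batch standard deviations. When fine-tuning an ImageNet-pretrained model, keeping the ImageNet values is the usual choice.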

Figure: data visualization (the first 20 images of one training batch)


III. Using the official VGG-16 model

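import torch
import torch.nn as nn
from torchinfo import summary   # assumed: the summary table below matches torchinfo's output format
from torchvision.models import vgg16


if __name__=='__main__':
    ''' Set up the GPU '''
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("Using {} device\n".format(device))
    
    ''' Use the official VGG-16 model '''
    # Load the pretrained model and fine-tune it
    # (on torchvision >= 0.13, pretrained=True is deprecated in favor of weights=VGG16_Weights.IMAGENET1K_V1)
    model = vgg16(pretrained = True).to(device)  # load the ImageNet-pretrained VGG-16
    for param in model.parameters():
        param.requires_grad = False # freeze all parameters so that only the final layer (replaced below) is trained
    # Replace layer 6 of the classifier module (originally (6): Linear(in_features=4096, out_features=1000, bias=True))
    # Compare with the model printed below
    model.classifier._modules['6'] = nn.Linear(4096, 17) # new last fully connected layer with one output per target class; trainable by default
    model.to(device)
    
    ''' Alternatively, use the hand-built Model from Section VII and move it to the GPU (all our computation runs on the GPU) '''
    # model = Model().to(device)
    ''' Print the network structure '''
    summary(model)
    print(model)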
Using cuda device

Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to C:\Users\Desktop/.cache\torch\hub\checkpoints\vgg16-397923af.pth
100%|███████████████████████████████████████████████████████████████████████████████| 528M/528M [03:08<00:00, 2.94MB/s]
=================================================================
Layer (type:depth-idx)                   Param #
=================================================================
VGG                                      --
├─Sequential: 1-1                        --
│    └─Conv2d: 2-1                       (1,792)
│    └─ReLU: 2-2                         --
│    └─Conv2d: 2-3                       (36,928)
│    └─ReLU: 2-4                         --
│    └─MaxPool2d: 2-5                    --
│    └─Conv2d: 2-6                       (73,856)
│    └─ReLU: 2-7                         --
│    └─Conv2d: 2-8                       (147,584)
│    └─ReLU: 2-9                         --
│    └─MaxPool2d: 2-10                   --
│    └─Conv2d: 2-11                      (295,168)
│    └─ReLU: 2-12                        --
│    └─Conv2d: 2-13                      (590,080)
│    └─ReLU: 2-14                        --
│    └─Conv2d: 2-15                      (590,080)
│    └─ReLU: 2-16                        --
│    └─MaxPool2d: 2-17                   --
│    └─Conv2d: 2-18                      (1,180,160)
│    └─ReLU: 2-19                        --
│    └─Conv2d: 2-20                      (2,359,808)
│    └─ReLU: 2-21                        --
│    └─Conv2d: 2-22                      (2,359,808)
│    └─ReLU: 2-23                        --
│    └─MaxPool2d: 2-24                   --
│    └─Conv2d: 2-25                      (2,359,808)
│    └─ReLU: 2-26                        --
│    └─Conv2d: 2-27                      (2,359,808)
│    └─ReLU: 2-28                        --
│    └─Conv2d: 2-29                      (2,359,808)
│    └─ReLU: 2-30                        --
│    └─MaxPool2d: 2-31                   --
├─AdaptiveAvgPool2d: 1-2                 --
├─Sequential: 1-3                        --
│    └─Linear: 2-32                      (102,764,544)
│    └─ReLU: 2-33                        --
│    └─Dropout: 2-34                     --
│    └─Linear: 2-35                      (16,781,312)
│    └─ReLU: 2-36                        --
│    └─Dropout: 2-37                     --
│    └─Linear: 2-38                      69,649
=================================================================
Total params: 134,330,193
Trainable params: 69,649
Non-trainable params: 134,260,544
=================================================================
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=17, bias=True)
  )
)
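Requirement 3 asks how to inspect the parameter count. The numbers in the summary can be reproduced by hand: a k×k Conv2d has k·k·C_in·C_out weights plus C_out biases, and a Linear layer has in·out weights plus out biases. A plain-Python sketch using the layer list printed above and 17 output classes (the names CFG and vgg16_param_counts are illustrative):

```python
# VGG-16 convolution widths; 'M' marks a 2x2 max-pool (matches the printed `features` module)
CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

def conv_params(c_in, c_out, k=3):
    return k * k * c_in * c_out + c_out      # weights + biases

def linear_params(n_in, n_out):
    return n_in * n_out + n_out              # weights + biases

def vgg16_param_counts(num_classes=17):
    conv_total, c_in, size = 0, 3, 224
    for v in CFG:
        if v == 'M':
            size //= 2                       # each max-pool halves height and width
        else:
            conv_total += conv_params(c_in, v)
            c_in = v
    # At 224x224 input the feature map is already 7x7, so AdaptiveAvgPool2d((7, 7)) keeps it
    flat = c_in * size * size                # 512 * 7 * 7 = 25088 classifier inputs
    fc1 = linear_params(flat, 4096)
    fc2 = linear_params(4096, 4096)
    head = linear_params(4096, num_classes)  # the only trainable layer after freezing
    total = conv_total + fc1 + fc2 + head
    return {'flat': flat, 'conv': conv_total, 'total': total,
            'trainable': head, 'frozen': total - head}

counts = vgg16_param_counts()
print(counts)
# {'flat': 25088, 'conv': 14714688, 'total': 134330193, 'trainable': 69649, 'frozen': 134260544}
```

In PyTorch itself the same totals come from sum(p.numel() for p in model.parameters()), adding `if p.requires_grad` inside the generator for the trainable count.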

IV. Training the model

1. Write the training function

optimizer.zero_grad()
loss.backward()
optimizer.step()
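I explained these three calls in a previous post, so only briefly here: optimizer.zero_grad() clears the gradients left over from the previous step, loss.backward() back-propagates the loss to compute new gradients, and optimizer.step() updates the parameters using those gradients.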
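The reason zero_grad() comes first is that PyTorch adds each backward() result into .grad instead of overwriting it. A small sketch with a single scalar parameter (a hypothetical toy, not part of the VGG model) makes the accumulation visible:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

def toy_loss():
    return (3.0 * w) ** 2          # d(loss)/dw = 18 * w = 36 at w = 2

toy_loss().backward()
print(w.grad)                      # tensor(36.)
toy_loss().backward()              # no zero_grad(): the new gradient is added to the old one
print(w.grad)                      # tensor(72.)

opt.zero_grad()                    # clear the accumulated gradient
toy_loss().backward()
print(w.grad)                      # tensor(36.)
opt.step()                         # w <- w - lr * grad = 2 - 0.1 * 36
print(w.item())                    # approximately -1.6
```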

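# Training loop
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # size of the training set
    num_batches = len(dataloader)   # number of batches

    train_loss, train_acc = 0, 0  # initialize training loss and accuracy
    
    for X, y in dataloader:  # get a batch of images and their labels
        X, y = X.to(device), y.to(device)
        
        # Compute the prediction error
        pred = model(X)          # network output
        loss = loss_fn(pred, y)  # loss between the network output and the ground-truth labels y
        
        # Back-propagation
        optimizer.zero_grad()  # reset the .grad attributes to zero
        loss.backward()        # back-propagate
        optimizer.step()       # update the parameters
        
        # Accumulate accuracy and loss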
        train_acc  += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()
            
    train_acc  /= size
    train_loss /= num_batches

    return train_acc, train_loss

2. Write the test function

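The test function is largely the same as the training function, but since it does not update the network weights by gradient descent, it does not take an optimizer.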

def test(dataloader, model, loss_fn):
    size        = len(dataloader.dataset)  # size of the test set
    num_batches = len(dataloader)          # number of batches, ceil(size / batch_size)
    test_loss, test_acc = 0, 0
    
    # We are not training, so disable gradient tracking to save memory and computation
    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)
            
            # Compute the loss
            target_pred = model(imgs)
            loss        = loss_fn(target_pred, target)
            
            test_loss += loss.item()
            test_acc  += (target_pred.argmax(1) == target).type(torch.float).sum().item()

    test_acc  /= size
    test_loss /= num_batches

    return test_acc, test_loss
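A quick way to confirm what torch.no_grad() does: outputs computed inside it record no autograd graph, so they have requires_grad=False, while the values themselves are unchanged. A minimal sketch with a throwaway nn.Linear layer (hypothetical sizes):

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
x = torch.randn(3, 4)

out_train = layer(x)
print(out_train.requires_grad)              # True: a graph is recorded for backward()

with torch.no_grad():
    out_eval = layer(x)
print(out_eval.requires_grad)               # False: no graph is kept, saving memory
print(torch.allclose(out_train, out_eval))  # True: same values either way
```

Note that no_grad() does not change how Dropout behaves; switching Dropout to inference mode is the job of model.eval(), which the training loop calls before test().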

3. Set a dynamic learning rate

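''' Custom dynamic learning rate '''
def adjust_learning_rate(optimizer, epoch, start_lr):
    # Multiply the learning rate by 0.92 every 2 epochs
    lr = start_lr * (0.92 ** (epoch // 2))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr


learn_rate = 1e-4 # initial learning rate
optimizer  = torch.optim.SGD(model.parameters(), lr=learn_rate)

for epoch in range(start_epoch, epochs):
    # Update the learning rate (when using the custom schedule)
    adjust_learning_rate(optimizer, epoch, learn_rate)
    # ... train and evaluate for this epoch ...

Alternatively, use the official dynamic learning rate interface (torch.optim.lr_scheduler.LambdaLR):

learn_rate = 1e-4 # initial learning rate
optimizer  = torch.optim.SGD(model.parameters(), lr=learn_rate)

# Used with the official scheduler: multiply the learning rate by 0.92 every 4 epochs
lambda1 = lambda epoch: 0.92 ** (epoch // 4)
scheduler   = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda1)  # choose the adjustment method

for epoch in range(start_epoch, epochs):
    # ... train for this epoch (optimizer.step() runs inside train()) ...
    # Update the learning rate (when using the official scheduler); call it after the epoch's optimizer steps
    scheduler.step()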
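Both schedules are closed-form formulas. LambdaLR sets lr = initial_lr × lr_lambda(n), where n is the number of scheduler.step() calls so far. A plain-Python sketch reproduces the Lr column of the training log below: because the loop logs the rate after scheduler.step(), the value printed for display epoch e is 1e-4 × 0.92^(e // 4). The function names are illustrative.

```python
def custom_lr(epoch, start_lr=1e-4):
    # adjust_learning_rate(): multiply by 0.92 every 2 epochs
    return start_lr * 0.92 ** (epoch // 2)

def lambda_lr(step_count, start_lr=1e-4):
    # LambdaLR with lambda1 = 0.92 ** (epoch // 4): lr = start_lr * lambda1(step_count)
    return start_lr * 0.92 ** (step_count // 4)

# The log prints the rate after scheduler.step(), i.e. after `e` steps for display epoch e
for e in (1, 4, 8, 44, 50):
    print(e, '{:.2E}'.format(lambda_lr(e)))
# 1 1.00E-04
# 4 9.20E-05
# 8 8.46E-05
# 44 4.00E-05
# 50 3.68E-05
```

One consequence: the rate shown next to "Epoch: 4" (9.20E-05) is the one used for epoch 5; epoch 4 itself still trained at 1.00E-04.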

4. Train and save the best model

model.train()
model.eval()

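I explained these two calls in a previous post; in brief, model.train() puts layers such as Dropout and BatchNorm into training mode, and model.eval() switches them to inference mode.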
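For VGG-16 the mode switch matters because of the two Dropout(p=0.5) layers in the classifier. A small sketch with a standalone nn.Dropout of the same probability shows the difference:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()                           # what model.train() does to every submodule
y_train = drop(x)
print(sorted(set(y_train.tolist())))   # [0.0, 2.0]: dropped, or kept and scaled by 1 / (1 - p)

drop.eval()                            # what model.eval() does
y_eval = drop(x)
print(torch.equal(y_eval, x))          # True: identity at inference time
```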

import copy
import os
import time

import torch
import torch.nn as nn

''' Set hyperparameters '''
start_epoch = 0
epochs      = 50
learn_rate  = 1e-4 # initial learning rate
loss_fn     = nn.CrossEntropyLoss()  # create the loss function
optimizer   = torch.optim.SGD(model.parameters(), lr=learn_rate)
#optimizer   = torch.optim.Adam(model.parameters(),lr=learn_rate)
# Used with the official dynamic learning rate interface
lambda1 = lambda epoch: 0.92 ** (epoch // 4)
scheduler   = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda1)  # choose the adjustment method

train_loss  = []
train_acc   = []
test_loss   = []
test_acc    = []
epoch_best_acc = 0

''' Resume from a previously saved model '''
os.makedirs(output, exist_ok=True)  # create the output folder if it does not exist yet
if start_epoch > 0:
    # Expects a checkpoint named epoch<start_epoch>.pkl; note that the loop below only writes best.pkl
    resumeFile = os.path.join(output, 'epoch'+str(start_epoch)+'.pkl')
    if not os.path.exists(resumeFile) or not os.path.isfile(resumeFile):
        start_epoch = 0
    else:
        model.load_state_dict(torch.load(resumeFile))  # load the model parameters

''' Start training the model '''
print('\nStart training...')
best_model = None
for epoch in range(start_epoch, epochs):
    # Update the learning rate (when using the custom schedule)
    # adjust_learning_rate(optimizer, epoch, learn_rate)
    
    model.train()
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, optimizer)
    scheduler.step() # update the learning rate (when using the official scheduler)
    
    model.eval()
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)
    
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)
    
    # Get the current learning rate (read after scheduler.step(), so this is the rate for the next epoch)
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    
    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}, Lr:{:.2E}')
    print(time.strftime('[%Y-%m-%d %H:%M:%S]'), template.format(epoch+1, epoch_train_acc*100, epoch_train_loss, epoch_test_acc*100, epoch_test_loss, lr))
    
    # Save the best model (highest test accuracy so far)
    if epoch_test_acc>epoch_best_acc:
        epoch_best_acc = epoch_test_acc
        best_model = copy.deepcopy(model)
        print(('acc = {:.1f}%, saving model to best.pkl').format(epoch_best_acc*100))
        saveFile = os.path.join(output, 'best.pkl')
        torch.save(model.state_dict(), saveFile)
print('Done\n')
Start training...
[2022-11-02 15:54:51] Epoch: 1, Train_acc:5.3%, Train_loss:2.936, Test_acc:6.7%, Test_loss:2.852, Lr:1.00E-04
acc = 6.7%, saving model to best.pkl
[2022-11-02 15:55:14] Epoch: 2, Train_acc:6.1%, Train_loss:2.897, Test_acc:7.5%, Test_loss:2.833, Lr:1.00E-04
acc = 7.5%, saving model to best.pkl
[2022-11-02 15:55:36] Epoch: 3, Train_acc:7.8%, Train_loss:2.873, Test_acc:8.6%, Test_loss:2.793, Lr:1.00E-04
acc = 8.6%, saving model to best.pkl
[2022-11-02 15:56:00] Epoch: 4, Train_acc:8.5%, Train_loss:2.825, Test_acc:10.3%, Test_loss:2.767, Lr:9.20E-05
acc = 10.3%, saving model to best.pkl
[2022-11-02 15:56:22] Epoch: 5, Train_acc:10.6%, Train_loss:2.784, Test_acc:10.8%, Test_loss:2.732, Lr:9.20E-05
acc = 10.8%, saving model to best.pkl
[2022-11-02 15:56:45] Epoch: 6, Train_acc:10.6%, Train_loss:2.772, Test_acc:14.7%, Test_loss:2.714, Lr:9.20E-05
acc = 14.7%, saving model to best.pkl
[2022-11-02 15:57:07] Epoch: 7, Train_acc:11.0%, Train_loss:2.775, Test_acc:15.6%, Test_loss:2.702, Lr:9.20E-05
acc = 15.6%, saving model to best.pkl
[2022-11-02 15:57:30] Epoch: 8, Train_acc:11.8%, Train_loss:2.725, Test_acc:15.0%, Test_loss:2.672, Lr:8.46E-05
[2022-11-02 15:57:45] Epoch: 9, Train_acc:12.2%, Train_loss:2.717, Test_acc:16.4%, Test_loss:2.671, Lr:8.46E-05
acc = 16.4%, saving model to best.pkl
[2022-11-02 15:58:08] Epoch:10, Train_acc:11.8%, Train_loss:2.715, Test_acc:16.9%, Test_loss:2.642, Lr:8.46E-05
acc = 16.9%, saving model to best.pkl
[2022-11-02 15:58:30] Epoch:11, Train_acc:15.5%, Train_loss:2.671, Test_acc:17.5%, Test_loss:2.632, Lr:8.46E-05
acc = 17.5%, saving model to best.pkl
[2022-11-02 15:58:53] Epoch:12, Train_acc:14.1%, Train_loss:2.661, Test_acc:17.2%, Test_loss:2.618, Lr:7.79E-05
[2022-11-02 15:59:07] Epoch:13, Train_acc:14.6%, Train_loss:2.650, Test_acc:17.2%, Test_loss:2.607, Lr:7.79E-05
[2022-11-02 15:59:22] Epoch:14, Train_acc:16.2%, Train_loss:2.637, Test_acc:17.2%, Test_loss:2.602, Lr:7.79E-05
[2022-11-02 15:59:37] Epoch:15, Train_acc:16.5%, Train_loss:2.610, Test_acc:17.5%, Test_loss:2.583, Lr:7.79E-05
[2022-11-02 15:59:52] Epoch:16, Train_acc:17.1%, Train_loss:2.607, Test_acc:17.8%, Test_loss:2.571, Lr:7.16E-05
acc = 17.8%, saving model to best.pkl
[2022-11-02 16:00:14] Epoch:17, Train_acc:17.1%, Train_loss:2.592, Test_acc:17.8%, Test_loss:2.561, Lr:7.16E-05
[2022-11-02 16:00:29] Epoch:18, Train_acc:15.3%, Train_loss:2.587, Test_acc:18.1%, Test_loss:2.565, Lr:7.16E-05
acc = 18.1%, saving model to best.pkl
[2022-11-02 16:00:52] Epoch:19, Train_acc:16.7%, Train_loss:2.573, Test_acc:18.1%, Test_loss:2.545, Lr:7.16E-05
[2022-11-02 16:01:06] Epoch:20, Train_acc:16.2%, Train_loss:2.574, Test_acc:18.1%, Test_loss:2.545, Lr:6.59E-05
[2022-11-02 16:01:21] Epoch:21, Train_acc:17.2%, Train_loss:2.559, Test_acc:17.5%, Test_loss:2.533, Lr:6.59E-05
[2022-11-02 16:01:35] Epoch:22, Train_acc:17.3%, Train_loss:2.556, Test_acc:17.8%, Test_loss:2.516, Lr:6.59E-05
[2022-11-02 16:01:50] Epoch:23, Train_acc:15.8%, Train_loss:2.553, Test_acc:17.8%, Test_loss:2.497, Lr:6.59E-05
[2022-11-02 16:02:04] Epoch:24, Train_acc:16.7%, Train_loss:2.533, Test_acc:17.8%, Test_loss:2.503, Lr:6.06E-05
[2022-11-02 16:02:19] Epoch:25, Train_acc:17.4%, Train_loss:2.522, Test_acc:17.8%, Test_loss:2.505, Lr:6.06E-05
[2022-11-02 16:02:33] Epoch:26, Train_acc:17.5%, Train_loss:2.517, Test_acc:17.8%, Test_loss:2.494, Lr:6.06E-05
[2022-11-02 16:02:48] Epoch:27, Train_acc:19.4%, Train_loss:2.509, Test_acc:17.8%, Test_loss:2.489, Lr:6.06E-05
[2022-11-02 16:03:03] Epoch:28, Train_acc:18.0%, Train_loss:2.495, Test_acc:18.1%, Test_loss:2.490, Lr:5.58E-05
[2022-11-02 16:03:17] Epoch:29, Train_acc:18.1%, Train_loss:2.515, Test_acc:18.1%, Test_loss:2.477, Lr:5.58E-05
[2022-11-02 16:03:32] Epoch:30, Train_acc:20.1%, Train_loss:2.479, Test_acc:18.1%, Test_loss:2.480, Lr:5.58E-05
[2022-11-02 16:03:47] Epoch:31, Train_acc:17.6%, Train_loss:2.479, Test_acc:18.3%, Test_loss:2.456, Lr:5.58E-05
acc = 18.3%, saving model to best.pkl
[2022-11-02 16:04:11] Epoch:32, Train_acc:18.5%, Train_loss:2.469, Test_acc:18.6%, Test_loss:2.481, Lr:5.13E-05
acc = 18.6%, saving model to best.pkl
[2022-11-02 16:04:34] Epoch:33, Train_acc:18.8%, Train_loss:2.483, Test_acc:18.6%, Test_loss:2.451, Lr:5.13E-05
[2022-11-02 16:04:49] Epoch:34, Train_acc:18.9%, Train_loss:2.483, Test_acc:18.9%, Test_loss:2.461, Lr:5.13E-05
acc = 18.9%, saving model to best.pkl
[2022-11-02 16:05:12] Epoch:35, Train_acc:19.4%, Train_loss:2.483, Test_acc:18.9%, Test_loss:2.432, Lr:5.13E-05
[2022-11-02 16:05:26] Epoch:36, Train_acc:18.5%, Train_loss:2.456, Test_acc:19.2%, Test_loss:2.449, Lr:4.72E-05
acc = 19.2%, saving model to best.pkl
[2022-11-02 16:05:50] Epoch:37, Train_acc:19.4%, Train_loss:2.465, Test_acc:19.2%, Test_loss:2.426, Lr:4.72E-05
[2022-11-02 16:06:04] Epoch:38, Train_acc:18.4%, Train_loss:2.449, Test_acc:19.2%, Test_loss:2.435, Lr:4.72E-05
[2022-11-02 16:06:19] Epoch:39, Train_acc:19.7%, Train_loss:2.430, Test_acc:19.2%, Test_loss:2.444, Lr:4.72E-05
[2022-11-02 16:06:34] Epoch:40, Train_acc:20.7%, Train_loss:2.446, Test_acc:19.4%, Test_loss:2.429, Lr:4.34E-05
acc = 19.4%, saving model to best.pkl
[2022-11-02 16:06:57] Epoch:41, Train_acc:20.1%, Train_loss:2.444, Test_acc:19.4%, Test_loss:2.415, Lr:4.34E-05
[2022-11-02 16:07:11] Epoch:42, Train_acc:18.8%, Train_loss:2.449, Test_acc:20.0%, Test_loss:2.414, Lr:4.34E-05
acc = 20.0%, saving model to best.pkl
[2022-11-02 16:07:33] Epoch:43, Train_acc:20.7%, Train_loss:2.425, Test_acc:20.8%, Test_loss:2.432, Lr:4.34E-05
acc = 20.8%, saving model to best.pkl
[2022-11-02 16:07:56] Epoch:44, Train_acc:18.5%, Train_loss:2.414, Test_acc:20.8%, Test_loss:2.399, Lr:4.00E-05
[2022-11-02 16:08:11] Epoch:45, Train_acc:20.3%, Train_loss:2.419, Test_acc:20.8%, Test_loss:2.425, Lr:4.00E-05
[2022-11-02 16:08:25] Epoch:46, Train_acc:20.9%, Train_loss:2.414, Test_acc:21.4%, Test_loss:2.400, Lr:4.00E-05
acc = 21.4%, saving model to best.pkl
[2022-11-02 16:08:48] Epoch:47, Train_acc:20.8%, Train_loss:2.413, Test_acc:21.4%, Test_loss:2.388, Lr:4.00E-05
[2022-11-02 16:09:02] Epoch:48, Train_acc:19.9%, Train_loss:2.419, Test_acc:21.9%, Test_loss:2.402, Lr:3.68E-05
acc = 21.9%, saving model to best.pkl
[2022-11-02 16:09:25] Epoch:49, Train_acc:21.0%, Train_loss:2.407, Test_acc:21.9%, Test_loss:2.400, Lr:3.68E-05
[2022-11-02 16:09:39] Epoch:50, Train_acc:20.2%, Train_loss:2.396, Test_acc:22.2%, Test_loss:2.384, Lr:3.68E-05
acc = 22.2%, saving model to best.pkl
Done

Final result: the best model (from epoch 50) reached 20.2% accuracy on the training set and 22.2% on the test set.


V. Visualizing the results

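import os
import warnings

import matplotlib.pyplot as plt

''' Result visualization '''
def displayResult(train_acc, test_acc, train_loss, test_loss, start_epoch, epochs, output=''):
    # Suppress warnings
    warnings.filterwarnings("ignore")                # ignore warning messages
    plt.rcParams['font.sans-serif']    = ['SimHei']  # so Chinese labels render correctly
    plt.rcParams['axes.unicode_minus'] = False       # so minus signs render correctly
    plt.rcParams['figure.dpi']         = 100         # resolution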
    
    epochs_range = range(start_epoch, epochs)
    
    plt.figure('Result Visualization', figsize=(12, 3))
    plt.subplot(1, 2, 1)
    
    plt.plot(epochs_range, train_acc, label='Training Accuracy')
    plt.plot(epochs_range, test_acc, label='Test Accuracy')
    plt.legend(loc='lower right')
    plt.title('Training and Validation Accuracy')
    
    plt.subplot(1, 2, 2)
    plt.plot(epochs_range, train_loss, label='Training Loss')
    plt.plot(epochs_range, test_loss, label='Test Loss')
    plt.legend(loc='upper right')
    plt.title('Training and Validation Loss')
    plt.savefig(os.path.join(output, 'AccuracyLoss.png'))
    plt.show()

''' Plot the accuracy and loss curves '''
displayResult(train_acc, test_acc, train_loss, test_loss, start_epoch, epochs, output)

Figure: result visualization (accuracy and loss curves)


VI. Load the model and predict a specified image

import os

import torch
import torch.nn as nn
import torchvision
from PIL import Image
from torchvision.models import vgg16

''' Prediction function '''
def predict(model, img_path):
    img = Image.open(img_path).convert('RGB')  # make sure the image has 3 channels
    test_transforms = torchvision.transforms.Compose([
        torchvision.transforms.Resize([224, 224]),  # resize the input image to the size used in training
        torchvision.transforms.ToTensor(),          # convert a PIL Image to a tensor scaled to [0, 1]
        torchvision.transforms.Normalize(           # same per-channel standardization as in training
            mean=[0.485, 0.456, 0.406], 
            std=[0.229, 0.224, 0.225])  # ImageNet per-channel statistics
    ])
    img = test_transforms(img)
    img = img.to(device).unsqueeze(0)  # add a batch dimension: [1, 3, 224, 224]
    with torch.no_grad():              # inference only, no gradients needed
        output = model(img)
    
    _, indices = torch.max(output, 1)
    percentage = torch.nn.functional.softmax(output, dim=1)[0] * 100
    perc = percentage[int(indices)].item()
    result = classeNames[int(indices)]
    print('predicted:', result, perc)


if __name__=='__main__':
    classeNames = list({'Angelina Jolie': 0, 'Brad Pitt': 1, 'Denzel Washington': 2, 'Hugh Jackman': 3, 'Jennifer Lawrence': 4, 'Johnny Depp': 5, 'Kate Winslet': 6, 'Leonardo DiCaprio': 7, 'Megan Fox': 8, 'Natalie Portman': 9, 'Nicole Kidman': 10, 'Robert Downey Jr': 11, 'Sandra Bullock': 12, 'Scarlett Johansson': 13, 'Tom Cruise': 14, 'Tom Hanks': 15, 'Will Smith': 16})
    num_classes = len(classeNames)
    
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("Using {} device\n".format(device))
    
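    #model = Model().to(device)
    model = vgg16().to(device)  # build the official VGG-16 architecture (weights are loaded from best.pkl below)
    for param in model.parameters():
        param.requires_grad = False # freeze the parameters (not needed for inference, kept to mirror the training setup)
    # Replace layer 6 of the classifier module (originally (6): Linear(in_features=4096, out_features=1000, bias=True))
    # Compare with the model printed earlier
    model.classifier._modules['6'] = nn.Linear(4096,len(classeNames)) # new last fully connected layer with one output per target class
    model.to(device)
    model.load_state_dict(torch.load(os.path.join('output', 'best.pkl'), map_location=device))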
    model.eval()
    
    img_path = 'data/48-data/Robert Downey Jr/013_9e49009e.jpg'
    predict(model, img_path)
Using cuda device

predicted: Leonardo DiCaprio 13.023811340332031

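The prediction is wrong: the input image shows Robert Downey Jr., but the network predicts Leonardo DiCaprio, with a confidence of only about 13%.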
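With a top-1 confidence this low, it can help to look at the next few candidates as well. A minimal sketch using torch.topk on the softmax probabilities; the four class names and the logits are made-up placeholders, not outputs of the trained model, and top_k is a hypothetical helper:

```python
import torch

def top_k(logits, class_names, k=3):
    # logits: [1, num_classes] raw model output for a single image
    probs = torch.softmax(logits, dim=1)[0]
    values, indices = torch.topk(probs, k)   # sorted from most to least likely
    return [(class_names[i], round(v * 100, 1))
            for v, i in zip(values.tolist(), indices.tolist())]

names = ['Brad Pitt', 'Leonardo DiCaprio', 'Robert Downey Jr', 'Tom Hanks']  # hypothetical subset
logits = torch.tensor([[0.5, 2.0, 1.5, -1.0]])
print(top_k(logits, names))
```

In predict(), the same call would take `output` and `classeNames` in place of the placeholders.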


VII. Building the VGG-16 network by hand

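Following the VGG-16 network printed in the log above, here is a hand-written reproduction of the VGG-16 architecture. I checked that its structure matches the official network used earlier.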

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.sequ1=nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),     # 64*224*224
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),    # 64*224*224
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), # 64*112*112
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),   # 128*112*112
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),  # 128*112*112
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), # 128*56*56
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),  # 256*56*56
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),  # 256*56*56
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),  # 256*56*56
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), # 256*28*28
            nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1),  # 512*28*28
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),  # 512*28*28
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),  # 512*28*28
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False), # 512*14*14
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),  # 512*14*14
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),  # 512*14*14
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),  # 512*14*14
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)  # 512*7*7
        )
        self.pool2=nn.AdaptiveAvgPool2d(output_size=(7, 7))    # 512*7*7
        self.sequ3=nn.Sequential(
            nn.Linear(in_features=25088, out_features=4096, bias=True),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5, inplace=False),
            nn.Linear(in_features=4096, out_features=4096, bias=True),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5, inplace=False),
            nn.Linear(in_features=4096, out_features=17, bias=True)
        )
    
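    def forward(self, x):
        x = self.sequ1(x)
        x = self.pool2(x)
        x = torch.flatten(x, 1)  # [N, 512, 7, 7] -> [N, 25088], as the official VGG does before its classifier
        x = self.sequ3(x)
        
        return x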
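The class above spells out every layer. The same architecture can also be generated from a compact layer list, similar in spirit to the configuration lists torchvision uses for its VGG variants. A sketch assuming the same 17-class head; the names CFG and make_vgg16 are hypothetical, and the parameter total should equal the 134,330,193 reported by the summary in Section III. The 64×64 test input is only there to keep the check fast.

```python
import torch
import torch.nn as nn

# 'M' = 2x2 max-pool; numbers = output channels of a 3x3 conv (VGG-16)
CFG = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

def make_vgg16(num_classes=17):
    layers, c_in = [], 3
    for v in CFG:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(c_in, v, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
            c_in = v
    return nn.Sequential(
        nn.Sequential(*layers),            # features
        nn.AdaptiveAvgPool2d((7, 7)),      # avgpool
        nn.Flatten(),                      # [N, 512, 7, 7] -> [N, 25088]
        nn.Sequential(                     # classifier
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        ),
    )

model = make_vgg16().eval()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)                            # 134330193
with torch.no_grad():
    out = model(torch.randn(1, 3, 64, 64))
print(out.shape)                           # torch.Size([1, 17])
```

Because this is a plain nn.Sequential, its state_dict keys differ from the official model's (features.*, classifier.*), so official pretrained weights cannot be loaded into it without renaming the keys.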

VIII. Summary

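  • The stretch goal for this task (60% test accuracy) was not reached. The loss and accuracy curves show that accuracy is rising steadily, but 50 epochs of training are not enough for the model to reach a higher accuracy.

  • Some preliminary reasons why the accuracy is not rising faster (my own view, not necessarily correct):
    – The network is more complex than in previous tasks and has far more parameters, so a short training run does not produce good results
    – There are more classes than in previous tasks, but each class has relatively few samples (about 100 images)
    – During training, every parameter of the official pretrained VGG-16 except the final Linear layer was frozen (134,260,544 of 134,330,193, per the summary above), including the whole feature-extraction part, so these parameters cannot be updated during training

  • I also tried training the simple CNN model from previous weeks: by epoch 50 its accuracy was already stable above 50%. This suggests that more complex networks need considerably more training iterations.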
