P6: Face Recognition with VGG16

My environment:

  • Language: Python 3.8

  • Editor: Jupyter Notebook

  • Deep learning framework: PyTorch

    torch == 2.1.0+cpu

    torchvision == 0.16.0+cpu

I. Preparation

1. Importing libraries

import torch
import torch.nn as nn
import torchvision
from torchvision import transforms, datasets

import os, PIL, pathlib, random

2. Loading the data

● Step 1: Use pathlib.Path() to convert the folder-path string into a pathlib.Path object.

● Step 2: Use the glob() method to collect every file path under data_dir into the list data_paths.

● Step 3: Split each path in data_paths to extract the class name each file belongs to, and store the names in classNames.

● Step 4: Print classNames to show the class names.

data_dir = "D:/BaiduNetdiskDownload/datasets/t6"
data_dir = pathlib.Path(data_dir)
data_paths = list(data_dir.glob('*/'))
classNames = [str(path).split("\\")[4] for path in data_paths]  # index 4 assumes this exact directory depth on Windows
classNames
['Angelina Jolie',
 'Brad Pitt',
 'Denzel Washington',
 'Hugh Jackman',
 'Jennifer Lawrence',
 'Johnny Depp',
 'Kate Winslet',
 'Leonardo DiCaprio',
 'Megan Fox',
 'Natalie Portman',
 'Nicole Kidman',
 'Robert Downey Jr',
 'Sandra Bullock',
 'Scarlett Johansson',
 'Tom Cruise',
 'Tom Hanks',
 'Will Smith']
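A more portable alternative (a sketch, not the author's original code): the hard-coded index in `str(path).split("\\")[4]` only works for this exact directory depth on Windows. pathlib's `.name` attribute returns the last path component regardless of depth or OS:

```python
import pathlib

# .name extracts the final component of any path, on any platform
example = pathlib.PureWindowsPath("D:/BaiduNetdiskDownload/datasets/t6/Brad Pitt")
print(example.name)  # Brad Pitt

# Applied above, the equivalent would be:
# classNames = [path.name for path in data_paths]
```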

3. DataLoader setup (batch size, sampling, etc.)

# For more on transforms.Compose, see: https://blog.csdn.net/qq_38251616/article/details/124878863
train_transforms = transforms.Compose([
    transforms.Resize([224, 224]),  # resize every input image to a uniform size
    transforms.ToTensor(),          # convert a PIL Image or numpy.ndarray to a tensor scaled to [0, 1]
    transforms.Normalize(           # standardize toward a normal (Gaussian) distribution so the model converges more easily
        mean=[0.485, 0.456, 0.406], 
        std=[0.229, 0.224, 0.225])  # these mean/std values are the standard ImageNet statistics, matching the pretrained VGG16 weights
])

total_data = datasets.ImageFolder("D:/BaiduNetdiskDownload/datasets/t6",transform=train_transforms)
total_data
Dataset ImageFolder
    Number of datapoints: 1800
    Root location: D:/BaiduNetdiskDownload/datasets/t6
    StandardTransform
Transform: Compose(
               Resize(size=[224, 224], interpolation=bilinear, max_size=None, antialias=warn)
               ToTensor()
               Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
           )
total_data.class_to_idx
{'Angelina Jolie': 0,
 'Brad Pitt': 1,
 'Denzel Washington': 2,
 'Hugh Jackman': 3,
 'Jennifer Lawrence': 4,
 'Johnny Depp': 5,
 'Kate Winslet': 6,
 'Leonardo DiCaprio': 7,
 'Megan Fox': 8,
 'Natalie Portman': 9,
 'Nicole Kidman': 10,
 'Robert Downey Jr': 11,
 'Sandra Bullock': 12,
 'Scarlett Johansson': 13,
 'Tom Cruise': 14,
 'Tom Hanks': 15,
 'Will Smith': 16}
train_size = int(0.8 * len(total_data))
test_size  = len(total_data) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(total_data, [train_size, test_size])
train_dataset, test_dataset
(<torch.utils.data.dataset.Subset at 0x20f2c83fca0>,
 <torch.utils.data.dataset.Subset at 0x20f2c83fd60>)
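Note that the random_split call above is unseeded, so every run trains on a different 80/20 partition. Passing a seeded generator makes the split reproducible; a minimal sketch on a small synthetic dataset (not the face data):

```python
import torch
from torch.utils.data import TensorDataset, random_split

data = TensorDataset(torch.arange(10))
# identical seeds yield identical partitions
a, _ = random_split(data, [8, 2], generator=torch.Generator().manual_seed(42))
b, _ = random_split(data, [8, 2], generator=torch.Generator().manual_seed(42))
print(a.indices == b.indices)  # True
```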
batch_size = 32

train_dl = torch.utils.data.DataLoader(train_dataset, 
                                       batch_size=batch_size, 
                                       shuffle=True,
                                       num_workers=1)

test_dl  = torch.utils.data.DataLoader(test_dataset, 
                                       batch_size=batch_size,
                                       shuffle=True,
                                       num_workers=1)
for X, y in test_dl:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    break
Shape of X [N, C, H, W]:  torch.Size([32, 3, 224, 224])
Shape of y:  torch.Size([32]) torch.int64

II. CNN Configuration and Training

(Figure: VGG-16 architecture)

VGG16 contains 13 convolutional layers and 3 fully connected layers, plus 5 max-pooling layers.

1. Building the model

from torchvision.models import vgg16

# Load the pretrained model and fine-tune it
model = vgg16(pretrained = True)  # load pretrained VGG16 (on torchvision >= 0.13, weights=VGG16_Weights.DEFAULT is the preferred spelling, as the warning below notes)

for param in model.parameters():
    param.requires_grad = False  # freeze the backbone so that only the final layer is trained

# Replace layer 6 of the classifier module (originally (6): Linear(in_features=4096, out_features=1000, bias=True))
# — compare with the model printout below
model.classifier._modules['6'] = nn.Linear(4096, len(classNames))  # final FC layer now outputs our number of classes
model
c:\users\hejialin\appdata\local\programs\python\python38\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
c:\users\hejialin\appdata\local\programs\python\python38\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)





VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=17, bias=True)
  )
)
import torchinfo
torchinfo.summary(model)  # display the model summary
=================================================================
Layer (type:depth-idx)                   Param #
=================================================================
VGG                                      --
├─Sequential: 1-1                        --
│    └─Conv2d: 2-1                       (1,792)
│    └─ReLU: 2-2                         --
│    └─Conv2d: 2-3                       (36,928)
│    └─ReLU: 2-4                         --
│    └─MaxPool2d: 2-5                    --
│    └─Conv2d: 2-6                       (73,856)
│    └─ReLU: 2-7                         --
│    └─Conv2d: 2-8                       (147,584)
│    └─ReLU: 2-9                         --
│    └─MaxPool2d: 2-10                   --
│    └─Conv2d: 2-11                      (295,168)
│    └─ReLU: 2-12                        --
│    └─Conv2d: 2-13                      (590,080)
│    └─ReLU: 2-14                        --
│    └─Conv2d: 2-15                      (590,080)
│    └─ReLU: 2-16                        --
│    └─MaxPool2d: 2-17                   --
│    └─Conv2d: 2-18                      (1,180,160)
│    └─ReLU: 2-19                        --
│    └─Conv2d: 2-20                      (2,359,808)
│    └─ReLU: 2-21                        --
│    └─Conv2d: 2-22                      (2,359,808)
│    └─ReLU: 2-23                        --
│    └─MaxPool2d: 2-24                   --
│    └─Conv2d: 2-25                      (2,359,808)
│    └─ReLU: 2-26                        --
│    └─Conv2d: 2-27                      (2,359,808)
│    └─ReLU: 2-28                        --
│    └─Conv2d: 2-29                      (2,359,808)
│    └─ReLU: 2-30                        --
│    └─MaxPool2d: 2-31                   --
├─AdaptiveAvgPool2d: 1-2                 --
├─Sequential: 1-3                        --
│    └─Linear: 2-32                      (102,764,544)
│    └─ReLU: 2-33                        --
│    └─Dropout: 2-34                     --
│    └─Linear: 2-35                      (16,781,312)
│    └─ReLU: 2-36                        --
│    └─Dropout: 2-37                     --
│    └─Linear: 2-38                      69,649
=================================================================
Total params: 134,330,193
Trainable params: 69,649
Non-trainable params: 134,260,544
=================================================================
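A quick sanity check on the summary above: with the backbone frozen, the only trainable parameters belong to the replaced Linear(4096, 17) layer.

```python
# trainable parameters = weight matrix + bias vector of the new final layer
num_classes = 17
trainable = 4096 * num_classes + num_classes
print(trainable)  # 69649, matching "Trainable params" above
```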

2. Training functions

2.1 Training function

  1. optimizer.zero_grad(): reset the gradients.
  2. loss.backward(): backpropagate to compute the gradient of each weight.
  3. optimizer.step(): update the parameters via gradient descent.
# Training loop
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # size of the training set (here 1440 images)
    num_batches = len(dataloader)   # number of batches (here 45 = 1440/32)

    train_loss, train_acc = 0, 0  # initialize training loss and accuracy
    
    for X, y in dataloader:  # fetch a batch of images and labels
        #X, y = X.to(device), y.to(device)
        
        # Compute the prediction error
        pred = model(X)          # network output
        loss = loss_fn(pred, y)  # loss between the network output and the ground-truth labels
        
        # Backpropagation
        optimizer.zero_grad()  # reset the gradients
        loss.backward()        # backpropagate
        optimizer.step()       # update the parameters
        
        # Record accuracy and loss
        train_acc  += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()
            
    train_acc  /= size
    train_loss /= num_batches

    return train_acc, train_loss

How train_acc and the loss are computed:

  • pred.argmax(1) returns the index of the maximum value in each row of pred, where each row is one sample's predicted distribution; comparing with == y checks whether each prediction is correct.
  • .type(torch.float) casts the boolean results to floats so they can be summed.
  • .sum() counts the correct predictions.
  • .item() converts the sum to a Python scalar for accumulation.
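A tiny worked example of this accuracy bookkeeping, using synthetic logits rather than real model output:

```python
import torch

pred = torch.tensor([[0.1, 0.9],   # argmax -> class 1
                     [0.8, 0.2],   # argmax -> class 0
                     [0.3, 0.7]])  # argmax -> class 1
y = torch.tensor([1, 0, 0])        # ground-truth labels

# boolean comparison -> float cast -> sum -> Python scalar
correct = (pred.argmax(1) == y).type(torch.float).sum().item()
print(correct)  # 2.0 (two of the three predictions are right)
```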

2.2 Test function

Identical to the training function, minus the backpropagation and weight-update steps.

def test(dataloader, model, loss_fn):
    size        = len(dataloader.dataset)  # size of the test set (here 360 images)
    num_batches = len(dataloader)          # number of batches (here 12; 360/32 = 11.25, rounded up)
    test_loss, test_acc = 0, 0
    
    # Disable gradient tracking while evaluating, saving memory and computation
    with torch.no_grad():
        for imgs, target in dataloader:
            #imgs, target = imgs.to(device), target.to(device)
            
            # Compute the loss
            target_pred = model(imgs)
            loss        = loss_fn(target_pred, target)
            
            test_loss += loss.item()
            test_acc  += (target_pred.argmax(1) == target).type(torch.float).sum().item()

    test_acc  /= size
    test_loss /= num_batches

    return test_acc, test_loss

2.3 Dynamic learning rate

def adjust_learning_rate(optimizer, epoch, start_lr):
    # Decay the learning rate by a factor of 0.92 every 2 epochs
    lr = start_lr * (0.92 ** (epoch // 2))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

learn_rate = 1e-4  # initial learning rate
optimizer  = torch.optim.SGD(model.parameters(), lr=learn_rate)
#opt = torch.optim.Adam(model.parameters(), lr=learn_rate)
# # Equivalent setup using the official scheduler API:
# lambda1 = lambda epoch: 0.92 ** (epoch // 2)
# optimizer = torch.optim.SGD(model.parameters(), lr=learn_rate)
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda1)  # choose the scheduling rule
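A runnable sketch of the official-scheduler route hinted at in the commented lines; the lambda reproduces the custom 0.92-per-two-epochs decay, and the parameter here is a stand-in, not the real model:

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model.parameters()
optimizer = torch.optim.SGD(params, lr=1e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 0.92 ** (epoch // 2))

for epoch in range(5):
    # ... one epoch of training would go here ...
    print(epoch, optimizer.param_groups[0]['lr'])  # 1e-4, 1e-4, then ~9.2e-5, ...
    scheduler.step()  # advance the schedule once per epoch
```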

2.4 Model training

  1. model.train(): training mode; enables dropout and lets BN layers update their running statistics.
  2. model.eval(): evaluation mode; disables dropout and makes BN layers use their stored running statistics.

loss_fn    = nn.CrossEntropyLoss()  # create the loss function
epochs     = 40
train_loss = []
train_acc  = []
test_loss  = []
test_acc   = []
best_test_acc = 0
PATH = './model.pth'
for epoch in range(epochs):
    # Update the learning rate (when using the custom schedule)
    adjust_learning_rate(optimizer, epoch, learn_rate)
    
    model.train()
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, optimizer)
    
    model.eval()
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)
    
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)

    # Read back the current learning rate
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    
    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}, Lr:{:.2E}')
    print(template.format(epoch+1, epoch_train_acc*100, epoch_train_loss, 
                          epoch_test_acc*100, epoch_test_loss, lr))
    if best_test_acc < epoch_test_acc:
        best_test_acc = epoch_test_acc
        torch.save(model.state_dict(), PATH)  # keep the best-performing weights
        print('save model')
print('Done')
Epoch: 1, Train_acc:6.8%, Train_loss:2.935, Test_acc:11.1%, Test_loss:2.829, Lr:1.00E-04
save model
Epoch: 2, Train_acc:6.7%, Train_loss:2.888, Test_acc:14.2%, Test_loss:2.789, Lr:1.00E-04
save model
Epoch: 3, Train_acc:7.7%, Train_loss:2.866, Test_acc:15.0%, Test_loss:2.772, Lr:9.20E-05
save model
Epoch: 4, Train_acc:8.9%, Train_loss:2.833, Test_acc:15.0%, Test_loss:2.750, Lr:9.20E-05
Epoch: 5, Train_acc:10.3%, Train_loss:2.804, Test_acc:15.8%, Test_loss:2.717, Lr:8.46E-05
save model
Epoch: 6, Train_acc:10.2%, Train_loss:2.801, Test_acc:16.7%, Test_loss:2.704, Lr:8.46E-05
save model
Epoch: 7, Train_acc:11.2%, Train_loss:2.760, Test_acc:17.2%, Test_loss:2.694, Lr:7.79E-05
save model
Epoch: 8, Train_acc:13.0%, Train_loss:2.747, Test_acc:17.5%, Test_loss:2.675, Lr:7.79E-05
save model
Epoch: 9, Train_acc:14.4%, Train_loss:2.730, Test_acc:18.3%, Test_loss:2.664, Lr:7.16E-05
save model
Epoch:10, Train_acc:14.7%, Train_loss:2.719, Test_acc:18.9%, Test_loss:2.647, Lr:7.16E-05
save model
Epoch:11, Train_acc:14.3%, Train_loss:2.710, Test_acc:19.2%, Test_loss:2.621, Lr:6.59E-05
save model
Epoch:12, Train_acc:13.8%, Train_loss:2.697, Test_acc:18.9%, Test_loss:2.620, Lr:6.59E-05
Epoch:13, Train_acc:15.1%, Train_loss:2.665, Test_acc:18.6%, Test_loss:2.608, Lr:6.06E-05
Epoch:14, Train_acc:15.5%, Train_loss:2.674, Test_acc:18.9%, Test_loss:2.603, Lr:6.06E-05
Epoch:15, Train_acc:16.0%, Train_loss:2.660, Test_acc:18.9%, Test_loss:2.610, Lr:5.58E-05
Epoch:16, Train_acc:16.7%, Train_loss:2.648, Test_acc:19.4%, Test_loss:2.586, Lr:5.58E-05
save model
Epoch:17, Train_acc:15.8%, Train_loss:2.634, Test_acc:19.4%, Test_loss:2.576, Lr:5.13E-05
Epoch:18, Train_acc:16.9%, Train_loss:2.632, Test_acc:19.4%, Test_loss:2.569, Lr:5.13E-05
Epoch:19, Train_acc:15.5%, Train_loss:2.623, Test_acc:19.4%, Test_loss:2.575, Lr:4.72E-05
Epoch:20, Train_acc:16.6%, Train_loss:2.595, Test_acc:19.4%, Test_loss:2.557, Lr:4.72E-05
Epoch:21, Train_acc:15.6%, Train_loss:2.617, Test_acc:19.7%, Test_loss:2.549, Lr:4.34E-05
save model
Epoch:22, Train_acc:16.7%, Train_loss:2.605, Test_acc:19.7%, Test_loss:2.549, Lr:4.34E-05
Epoch:23, Train_acc:16.6%, Train_loss:2.610, Test_acc:19.7%, Test_loss:2.544, Lr:4.00E-05
Epoch:24, Train_acc:16.9%, Train_loss:2.604, Test_acc:19.7%, Test_loss:2.535, Lr:4.00E-05
Epoch:25, Train_acc:17.2%, Train_loss:2.591, Test_acc:20.0%, Test_loss:2.537, Lr:3.68E-05
save model
Epoch:26, Train_acc:16.7%, Train_loss:2.591, Test_acc:19.7%, Test_loss:2.521, Lr:3.68E-05
Epoch:27, Train_acc:15.8%, Train_loss:2.579, Test_acc:20.0%, Test_loss:2.520, Lr:3.38E-05
Epoch:28, Train_acc:17.4%, Train_loss:2.578, Test_acc:20.0%, Test_loss:2.506, Lr:3.38E-05
Epoch:29, Train_acc:17.4%, Train_loss:2.571, Test_acc:20.0%, Test_loss:2.510, Lr:3.11E-05
Epoch:30, Train_acc:17.6%, Train_loss:2.561, Test_acc:20.0%, Test_loss:2.514, Lr:3.11E-05
Epoch:31, Train_acc:18.3%, Train_loss:2.543, Test_acc:20.0%, Test_loss:2.521, Lr:2.86E-05
Epoch:32, Train_acc:19.0%, Train_loss:2.552, Test_acc:20.0%, Test_loss:2.507, Lr:2.86E-05
Epoch:33, Train_acc:19.1%, Train_loss:2.548, Test_acc:20.0%, Test_loss:2.498, Lr:2.63E-05
Epoch:34, Train_acc:19.2%, Train_loss:2.550, Test_acc:20.0%, Test_loss:2.492, Lr:2.63E-05
Epoch:35, Train_acc:19.4%, Train_loss:2.529, Test_acc:20.3%, Test_loss:2.500, Lr:2.42E-05
save model
Epoch:36, Train_acc:15.4%, Train_loss:2.550, Test_acc:20.3%, Test_loss:2.504, Lr:2.42E-05
Epoch:37, Train_acc:18.4%, Train_loss:2.546, Test_acc:20.3%, Test_loss:2.499, Lr:2.23E-05
Epoch:38, Train_acc:18.8%, Train_loss:2.542, Test_acc:20.3%, Test_loss:2.493, Lr:2.23E-05
Epoch:39, Train_acc:19.0%, Train_loss:2.523, Test_acc:20.3%, Test_loss:2.483, Lr:2.05E-05
Epoch:40, Train_acc:19.0%, Train_loss:2.545, Test_acc:20.3%, Test_loss:2.469, Lr:2.05E-05
Done

III. Visualizing the Training Results

1. Loss and accuracy

import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")               # suppress warnings
plt.rcParams['font.sans-serif']    = ['SimHei'] # render CJK labels correctly
plt.rcParams['axes.unicode_minus'] = False      # render minus signs correctly
plt.rcParams['figure.dpi']         = 100        # figure resolution

epochs_range = range(epochs)

plt.figure(figsize=(12, 3))
plt.subplot(1, 2, 1)

plt.plot(epochs_range, train_acc, label='Training Accuracy')
plt.plot(epochs_range, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label='Training Loss')
plt.plot(epochs_range, test_loss, label='Test Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

(Figure: training and validation accuracy/loss curves)

Test accuracy only reaches about 20%; meeting the stretch goal is left for later work.

2. Predicting a single image

  • torch.squeeze(input, dim=None): removes dimensions of size 1 (all of them, or only dimension dim if given).
  • torch.unsqueeze(input, dim): inserts a new dimension of size 1 at position dim.
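A minimal illustration of the two operations:

```python
import torch

img = torch.zeros(3, 224, 224)  # one image, [C, H, W]
batch = img.unsqueeze(0)        # insert a batch dimension at dim 0 -> [1, C, H, W]
print(batch.shape)              # torch.Size([1, 3, 224, 224])
print(batch.squeeze(0).shape)   # torch.Size([3, 224, 224])
```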
from PIL import Image 

classes = list(total_data.class_to_idx)

def predict_one_image(image_path, model, transform, classes):
    
    test_img = Image.open(image_path).convert('RGB')
    plt.imshow(test_img)  # display the image being predicted

    test_img = transform(test_img)
    img = test_img.unsqueeze(0)
    
    model.eval()
    output = model(img)

    _, pred = torch.max(output, 1)
    pred_class = classes[pred]
    print(f'Predicted class: {pred_class}')

# Load the best checkpoint saved during training, then predict
model.load_state_dict(torch.load(PATH))
predict_one_image(image_path='D:/BaiduNetdiskDownload/datasets/t6/Scarlett Johansson/001_cb004eea.jpg', 
                  model=model, 
                  transform=train_transforms, 
                  classes=classes)
Predicted class: Scarlett Johansson

(Figure: the predicted image)

Summary

  1. Using the pretrained VGG-16 network, training is slow and face-recognition accuracy is still low; the follow-up goal is to push accuracy up to 60%.