PyTorch Notes Ⅹ — Transfer Learning: ResNet18 & VGG16

Check which NVIDIA GPU the session has been assigned

!nvidia-smi
Tue Sep  1 10:29:57 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   45C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Mounting Google Drive and reading the data

from google.colab import drive
drive.mount('/content/drive')
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive

The files are copied onto the VM for faster reads: Google Colab has to talk to Google Drive for every file it opens, so reading many small files straight from Drive incurs high I/O latency.

!cp /content/drive/My\ Drive/pyotrch_dataset/hymenoptera_aug_data.rar /content/

Extracting the archive

!unrar x hymenoptera_aug_data.rar
UNRAR 5.50 freeware      Copyright (c) 1993-2017 Alexander Roshal


Extracting from hymenoptera_aug_data.rar

Creating    hymenoptera_data                                          OK
Creating    hymenoptera_data/train                                    OK
Creating    hymenoptera_data/train/ants                               OK
Extracting  hymenoptera_data/train/ants/0_0.jpg                           0%  OK 
Extracting  hymenoptera_data/train/ants/0_1.jpg                           0%  OK
Extracting  hymenoptera_data/train/ants/0_10.jpg                          0%  OK
Extracting  hymenoptera_data/train/ants/0_2.jpg                           0%  OK 
Extracting  hymenoptera_data/train/ants/0_3.jpg                           0%  OK
Extracting  hymenoptera_data/train/ants/0_4.jpg                           0%  OK 
Extracting  hymenoptera_data/train/ants/0_5.jpg                           0%  OK
Extracting  hymenoptera_data/train/ants/0_6.jpg                           0%  OK 
Extracting  hymenoptera_data/train/ants/0_7.jpg                           0%  OK 
Extracting  hymenoptera_data/train/ants/0_8.jpg                           0%  OK
...............................................
Creating    hymenoptera_data/train/bees                               OK
Extracting  hymenoptera_data/train/bees/0_0.jpg                          28%  OK 
Extracting  hymenoptera_data/train/bees/0_1.jpg                          28%  OK
Extracting  hymenoptera_data/train/bees/0_10.jpg                         28%  OK
Extracting  hymenoptera_data/train/bees/0_2.jpg                          28%  OK
Extracting  hymenoptera_data/train/bees/0_3.jpg                          28%  OK
Extracting  hymenoptera_data/train/bees/0_4.jpg                          28%  OK 
Extracting  hymenoptera_data/train/bees/0_5.jpg                          28%  OK
Extracting  hymenoptera_data/train/bees/0_6.jpg                          28%  OK
Extracting  hymenoptera_data/train/bees/0_7.jpg                          28%  OK 
Extracting  hymenoptera_data/train/bees/0_8.jpg                          28%  OK
...............................................
Extracting  hymenoptera_data/val/bees/9_1.jpg                            99%  OK 
Extracting  hymenoptera_data/val/bees/9_10.jpg                           99%  OK 
Extracting  hymenoptera_data/val/bees/9_2.jpg                            99%  OK 
Extracting  hymenoptera_data/val/bees/9_3.jpg                            99%  OK 
Extracting  hymenoptera_data/val/bees/9_4.jpg                            99%  OK 
Extracting  hymenoptera_data/val/bees/9_5.jpg                            99%  OK 
Extracting  hymenoptera_data/val/bees/9_6.jpg                            99%  OK 
Extracting  hymenoptera_data/val/bees/9_7.jpg                            99%  OK 
Extracting  hymenoptera_data/val/bees/9_8.jpg                            99%  OK 
Extracting  hymenoptera_data/val/bees/9_9.jpg                            99%  OK 
All OK

Importing the required packages

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data import DataLoader
import torchvision
from torchvision.transforms import transforms
from torchvision import models
import numpy as np
import matplotlib.pyplot as plt
import os

torch.__version__
'1.6.0+cu101'

Building the dataset

We reuse the augmented dataset from the earlier data-augmentation notebook.

torchvision.datasets.ImageFolder(root, transform=None, target_transform=None, loader=<function default_loader>, is_valid_file=None)

A generic data loader for images arranged on disk like this:

root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png

root (string) – Root directory path.

transform (callable, optional) – A function/transform applied to each input image, e.g. transforms.RandomCrop

target_transform (callable, optional) – A function/transform applied to each target label

loader (callable, optional) – A function that loads an image given its path

is_valid_file – A function that takes an image file path and checks whether the file is valid (used to skip corrupt or unopenable images)

transforms.RandomResizedCrop(224) takes a random crop and resizes it to a 224×224 square. The crop can be omitted, but it then needs to be replaced by transforms.Resize((224, 224), interpolation=2) so the network still receives 224×224 inputs.
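For the is_valid_file hook described above, a minimal sketch of a checker that skips images PIL cannot decode (a hypothetical helper, not used in this notebook):

import PIL.Image as Image

def is_readable(path):
    # Return False for corrupt files so ImageFolder silently skips them
    try:
        Image.open(path).verify()
        return True
    except Exception:
        return False

# usage: torchvision.datasets.ImageFolder(root=..., is_valid_file=is_readable)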

data_dir = './hymenoptera_data'

train_dataset = torchvision.datasets.ImageFolder(root=os.path.join(data_dir, 'train'),
                                                 transform=transforms.Compose(
                                                     [
                                                         transforms.RandomResizedCrop(224),
                                                         transforms.ToTensor(),
                                                         transforms.Normalize(
                                                             mean=(0.485, 0.456, 0.406),
                                                             std=(0.229, 0.224, 0.225))
                                                     ]))

val_dataset = torchvision.datasets.ImageFolder(root=os.path.join(data_dir, 'val'),
                                               transform=transforms.Compose(
                                                     [
                                                         transforms.RandomResizedCrop(224),
                                                         transforms.ToTensor(),
                                                         transforms.Normalize(
                                                             mean=(0.485, 0.456, 0.406),
                                                             std=(0.229, 0.224, 0.225))
                                                     ]))
# shuffle expects a bool: shuffle the training set each epoch, keep validation order fixed
train_dataloader = DataLoader(dataset=train_dataset, batch_size=100, shuffle=True)
val_dataloader = DataLoader(dataset=val_dataset, batch_size=100, shuffle=False)

Get the class names; a class's index in this list is its numeric label

class_names = train_dataset.classes
print('class_names:{}'.format(class_names))
class_names:['ants', 'bees']
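The explicit name-to-index mapping can also be read straight off the dataset via the class_to_idx attribute of ImageFolder:

# ImageFolder assigns labels alphabetically, so this prints {'ants': 0, 'bees': 1}
print(train_dataset.class_to_idx)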

Getting the device

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

print(device)
cuda:0

Visualizing the data

torchvision.utils.make_grid(tensor: Union[torch.Tensor, List[torch.Tensor]], nrow: int = 8, padding: int = 2, normalize: bool = False, range: Optional[Tuple[int, int]] = None, scale_each: bool = False, pad_value: int = 0) → torch.Tensor

make_grid arranges a batch of images into a neat grid for display and pairs well with matplotlib.

Parameters:

tensor (Tensor or list) – a 4-D mini-batch tensor of shape (B × C × H × W), or a list of images all of the same size

nrow (int, optional) – number of images per row, default 8; the final grid has shape (⌈B / nrow⌉, nrow)

padding (int, optional) – amount of padding between images, in pixels; default 2

normalize (bool, optional) – if True, shift the images into the range (0, 1); the range argument below sets the min and max used for normalization
range (tuple, optional) – tuple (min, max) used together with normalize; if not given, it is computed from the input tensor itself

scale_each (bool, optional) – if True, scale each image in the batch separately instead of over the (min, max) of the whole batch. Default: False

pad_value (float, optional) – value for the padded pixels. Default: 0 (rarely needed in practice, in my experience)

# Helper to display a grid of images
def imshow(img):
    fig = plt.gcf()
    fig.set_size_inches(15, 6)
    
    npimg = img.numpy()
    print(npimg.shape)
    
    # Move the channel axis last; each image is 224x224 and padding=0,
    # so the grid is (224*2) x (224*5) = 448 x 1120
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


# Wrap the DataLoader in an iterator and grab one random batch
dataiter = iter(train_dataloader)
images, labels = next(dataiter)

# Show the images
imshow(torchvision.utils.make_grid(images[0:10], nrow=5, padding=0, normalize=True))
# Print the labels of the 10 displayed images with a generator expression
print(' '.join('%s ' % class_names[labels[j]] for j in range(10)))
(3, 448, 1120)
ants  bees  ants  bees  ants  bees  bees  bees  bees  ants 


ResNet18

Because ResNet18 is small and has relatively few parameters, we do not freeze any layers; instead the whole network is fine-tuned.
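For comparison, had we wanted the feature-extractor style used for VGG16 later, a minimal sketch would freeze the backbone and train only a new head (hypothetical here, not what we do below):

# Hypothetical feature-extractor variant: freeze everything, then replace fc,
# whose fresh parameters require gradients by default
frozen = models.resnet18(pretrained=True)
for param in frozen.parameters():
    param.requires_grad = False
frozen.fc = nn.Linear(frozen.fc.in_features, 2)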

Fetching and configuring ResNet18

With pretrained=True, the pretrained weights are downloaded automatically.

resnet18 = models.resnet18(pretrained=True)

Inspect the ResNet18 architecture

resnet18
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)

Get the input dimension of the final fully connected layer

num_fc_in = resnet18.fc.in_features

num_fc_in
512

Replace the output layer to match our number of classes

resnet18.fc = nn.Linear(num_fc_in, 2)

Move the model to the GPU

resnet18 = resnet18.to(device)

View the model summary

from torchsummary import summary

summary(resnet18, input_size=(3, 224, 224))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64, 56, 56]               0
           Conv2d-15           [-1, 64, 56, 56]          36,864
      BatchNorm2d-16           [-1, 64, 56, 56]             128
             ReLU-17           [-1, 64, 56, 56]               0
       BasicBlock-18           [-1, 64, 56, 56]               0
           Conv2d-19          [-1, 128, 28, 28]          73,728
      BatchNorm2d-20          [-1, 128, 28, 28]             256
             ReLU-21          [-1, 128, 28, 28]               0
           Conv2d-22          [-1, 128, 28, 28]         147,456
      BatchNorm2d-23          [-1, 128, 28, 28]             256
           Conv2d-24          [-1, 128, 28, 28]           8,192
      BatchNorm2d-25          [-1, 128, 28, 28]             256
             ReLU-26          [-1, 128, 28, 28]               0
       BasicBlock-27          [-1, 128, 28, 28]               0
           Conv2d-28          [-1, 128, 28, 28]         147,456
      BatchNorm2d-29          [-1, 128, 28, 28]             256
             ReLU-30          [-1, 128, 28, 28]               0
           Conv2d-31          [-1, 128, 28, 28]         147,456
      BatchNorm2d-32          [-1, 128, 28, 28]             256
             ReLU-33          [-1, 128, 28, 28]               0
       BasicBlock-34          [-1, 128, 28, 28]               0
           Conv2d-35          [-1, 256, 14, 14]         294,912
      BatchNorm2d-36          [-1, 256, 14, 14]             512
             ReLU-37          [-1, 256, 14, 14]               0
           Conv2d-38          [-1, 256, 14, 14]         589,824
      BatchNorm2d-39          [-1, 256, 14, 14]             512
           Conv2d-40          [-1, 256, 14, 14]          32,768
      BatchNorm2d-41          [-1, 256, 14, 14]             512
             ReLU-42          [-1, 256, 14, 14]               0
       BasicBlock-43          [-1, 256, 14, 14]               0
           Conv2d-44          [-1, 256, 14, 14]         589,824
      BatchNorm2d-45          [-1, 256, 14, 14]             512
             ReLU-46          [-1, 256, 14, 14]               0
           Conv2d-47          [-1, 256, 14, 14]         589,824
      BatchNorm2d-48          [-1, 256, 14, 14]             512
             ReLU-49          [-1, 256, 14, 14]               0
       BasicBlock-50          [-1, 256, 14, 14]               0
           Conv2d-51            [-1, 512, 7, 7]       1,179,648
      BatchNorm2d-52            [-1, 512, 7, 7]           1,024
             ReLU-53            [-1, 512, 7, 7]               0
           Conv2d-54            [-1, 512, 7, 7]       2,359,296
      BatchNorm2d-55            [-1, 512, 7, 7]           1,024
           Conv2d-56            [-1, 512, 7, 7]         131,072
      BatchNorm2d-57            [-1, 512, 7, 7]           1,024
             ReLU-58            [-1, 512, 7, 7]               0
       BasicBlock-59            [-1, 512, 7, 7]               0
           Conv2d-60            [-1, 512, 7, 7]       2,359,296
      BatchNorm2d-61            [-1, 512, 7, 7]           1,024
             ReLU-62            [-1, 512, 7, 7]               0
           Conv2d-63            [-1, 512, 7, 7]       2,359,296
      BatchNorm2d-64            [-1, 512, 7, 7]           1,024
             ReLU-65            [-1, 512, 7, 7]               0
       BasicBlock-66            [-1, 512, 7, 7]               0
AdaptiveAvgPool2d-67            [-1, 512, 1, 1]               0
           Linear-68                    [-1, 2]           1,026
================================================================
Total params: 11,177,538
Trainable params: 11,177,538
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 62.79
Params size (MB): 42.64
Estimated Total Size (MB): 106.00
----------------------------------------------------------------

Only 11,177,538 parameters need training in total; on the order of ten million, that still counts as small.

Parameter configuration

Define the loss function and optimizer

loss_fc = nn.CrossEntropyLoss()

optimizer = optim.Adam(resnet18.parameters(), lr=0.0001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)

Define the learning-rate decay

torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False)

Parameters:
optimizer (Optimizer) – the wrapped optimizer

step_size (int) – period of the learning-rate decay

gamma (float) – multiplicative factor of the decay. Default: 0.1

last_epoch (int) – the index of the last epoch. Default: -1

verbose (bool) – if True, prints a message for each update. Default: False

# Halve the learning rate every 10 scheduler steps
# (note that scheduler.step() is called once per batch in the loop below)

scheduler = lr_scheduler.StepLR(optimizer=optimizer, step_size=10, gamma=0.5)
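As a sanity check of the decay schedule, a sketch on a throwaway optimizer (so the real scheduler is not advanced before training):

# After n scheduler steps the LR is 0.0001 * 0.5 ** (n // 10)
_opt = optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.0001)
_sch = lr_scheduler.StepLR(_opt, step_size=10, gamma=0.5)
for _ in range(30):
    _opt.step()
    _sch.step()
print(_sch.get_last_lr())  # [1.25e-05], i.e. 0.0001 halved three times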

model.train(): puts BatchNorm and Dropout layers in training mode
model.eval(): puts BatchNorm and Dropout layers in inference mode

train and eval exist to control the behavior of bn and dropout. These layers are special in that they compute differently in the training and inference phases: for example, bn updates its running_mean and running_var during training, while at inference it does not compute them and instead uses the statistics accumulated over the whole training set rather than per-batch statistics.

Start training

num_epochs = 50

for epoch in range(num_epochs):

    running_loss = 0.0

    for i, sample_batch in enumerate(train_dataloader):
        inputs = sample_batch[0]
        labels = sample_batch[1]

        resnet18.train()
        inputs = inputs.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = resnet18(inputs)
        loss = loss_fc(outputs, labels)
        loss.backward()
        optimizer.step()
        scheduler.step()
        running_loss += loss.item()

        if (i+1) % 20 == 0:
            correct = 0
            total = 0
            resnet18.eval()
            for images_test, labels_test in val_dataloader:
                images_test = images_test.to(device)
                labels_test = labels_test.to(device)

                outputs_test = resnet18(images_test)
                _, prediction = torch.max(outputs_test, 1)
                correct += (torch.sum((prediction == labels_test))).item()
                total += labels_test.size(0)
            print('[{}, {}] val_loss = {:.5f} val_acc = {:.5f}'.format(epoch + 1, i + 1, running_loss / 20,
                                                                        correct / total))
            running_loss = 0.0

print('training finish !')
torch.save(resnet18.state_dict(), '/content/drive/My Drive/pyotrch_dataset/resnet18/resnet18.pth')
[1, 20] val_loss = 0.24572 val_acc = 0.89958
[2, 20] val_loss = 0.08751 val_acc = 0.91444
[3, 20] val_loss = 0.08537 val_acc = 0.90196
[4, 20] val_loss = 0.08147 val_acc = 0.91325
[5, 20] val_loss = 0.08427 val_acc = 0.91325
[6, 20] val_loss = 0.07889 val_acc = 0.89780
[7, 20] val_loss = 0.07812 val_acc = 0.90612
[8, 20] val_loss = 0.07872 val_acc = 0.90850
[9, 20] val_loss = 0.08467 val_acc = 0.90434
[10, 20] val_loss = 0.08600 val_acc = 0.90850
.............................................
[40, 20] val_loss = 0.07911 val_acc = 0.90790
[41, 20] val_loss = 0.08371 val_acc = 0.91147
[42, 20] val_loss = 0.07833 val_acc = 0.90671
[43, 20] val_loss = 0.08826 val_acc = 0.91147
[44, 20] val_loss = 0.07140 val_acc = 0.90969
[45, 20] val_loss = 0.07659 val_acc = 0.91087
[46, 20] val_loss = 0.07821 val_acc = 0.90612
[47, 20] val_loss = 0.08456 val_acc = 0.91266
[48, 20] val_loss = 0.07910 val_acc = 0.90790
[49, 20] val_loss = 0.08175 val_acc = 0.91325
[50, 20] val_loss = 0.07183 val_acc = 0.91384
training finish !

Model loading and prediction

Reading with PIL and Transform

Reading images with PIL, the loader PyTorch's transforms expect, makes preprocessing the most convenient.
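If predicting in a fresh session rather than reusing the model just trained, first restore the saved weights (a sketch; the path matches the torch.save call above):

# Rebuild the architecture with the 2-class head, then load the trained weights
resnet18 = models.resnet18(pretrained=False)
resnet18.fc = nn.Linear(resnet18.fc.in_features, 2)
resnet18.load_state_dict(torch.load('/content/drive/My Drive/pyotrch_dataset/resnet18/resnet18.pth',
                                    map_location=device))
resnet18 = resnet18.to(device)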

from random import shuffle
import PIL.Image as Image


# Build the list of validation file paths and shuffle it
imagelist = ['./hymenoptera_data/val/ants/' + pic for pic in os.listdir('./hymenoptera_data/val/ants')] + \
            ['./hymenoptera_data/val/bees/' + pic for pic in os.listdir('./hymenoptera_data/val/bees')]
shuffle(imagelist)
transform=transforms.Compose([transforms.Resize((224, 224), interpolation=2),
                              transforms.ToTensor(),
                              transforms.Normalize(
                              mean=(0.485, 0.456, 0.406),
                              std=(0.229, 0.224, 0.225))])
fig = plt.gcf()
fig.set_size_inches(18, 18)

shuffle(imagelist)

resnet18.eval()

with torch.no_grad():
    for i in range(9):
        ax_img = plt.subplot(3, 3, i + 1)
        
        img = Image.open(imagelist[i])
        ax_img.imshow(img)
        img = transform(img).unsqueeze(0)
        img = img.to(device)
        
        outputs = resnet18(img)
        prediction = torch.max(outputs, 1)[1]
        ax_img.set_title('Label:' \
                         + imagelist[i].split('/')[-2] \
                         + '    Predict:' \
                         + class_names[prediction.item()],
                         fontsize=15)


VGG16

Fetching and configuring VGG16

vgg16 = models.vgg16(pretrained=True)
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth




Inspecting the model shows how its submodules are grouped, which makes it easy to pull out each part.

vgg16
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

Freeze all layers

for param in vgg16.parameters():
    param.requires_grad = False
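A quick check that the freeze took effect:

# With every parameter frozen, nothing requires gradients yet;
# the new classifier defined below adds the trainable part back
print(sum(p.numel() for p in vgg16.parameters() if p.requires_grad))  # 0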

Get the classifier's input dimension (25088 = 512 × 7 × 7, the flattened output of the adaptive average pool)

num_fc_in = vgg16.classifier[0].in_features

num_fc_in
25088

Redefine the classifier head

vgg16.classifier = nn.Sequential(
              nn.Linear(num_fc_in, 128), 
              nn.ReLU(), 
              nn.Dropout(0.3),
              nn.Linear(128, 32),
              nn.ReLU(), 
              nn.Dropout(0.3),
              nn.Linear(32, 2))

Move the model to the GPU

vgg16 = vgg16.to(device)

View the model summary

from torchsummary import summary

summary(vgg16, input_size=(3, 224, 224))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
              ReLU-2         [-1, 64, 224, 224]               0
            Conv2d-3         [-1, 64, 224, 224]          36,928
              ReLU-4         [-1, 64, 224, 224]               0
         MaxPool2d-5         [-1, 64, 112, 112]               0
            Conv2d-6        [-1, 128, 112, 112]          73,856
              ReLU-7        [-1, 128, 112, 112]               0
            Conv2d-8        [-1, 128, 112, 112]         147,584
              ReLU-9        [-1, 128, 112, 112]               0
        MaxPool2d-10          [-1, 128, 56, 56]               0
           Conv2d-11          [-1, 256, 56, 56]         295,168
             ReLU-12          [-1, 256, 56, 56]               0
           Conv2d-13          [-1, 256, 56, 56]         590,080
             ReLU-14          [-1, 256, 56, 56]               0
           Conv2d-15          [-1, 256, 56, 56]         590,080
             ReLU-16          [-1, 256, 56, 56]               0
        MaxPool2d-17          [-1, 256, 28, 28]               0
           Conv2d-18          [-1, 512, 28, 28]       1,180,160
             ReLU-19          [-1, 512, 28, 28]               0
           Conv2d-20          [-1, 512, 28, 28]       2,359,808
             ReLU-21          [-1, 512, 28, 28]               0
           Conv2d-22          [-1, 512, 28, 28]       2,359,808
             ReLU-23          [-1, 512, 28, 28]               0
        MaxPool2d-24          [-1, 512, 14, 14]               0
           Conv2d-25          [-1, 512, 14, 14]       2,359,808
             ReLU-26          [-1, 512, 14, 14]               0
           Conv2d-27          [-1, 512, 14, 14]       2,359,808
             ReLU-28          [-1, 512, 14, 14]               0
           Conv2d-29          [-1, 512, 14, 14]       2,359,808
             ReLU-30          [-1, 512, 14, 14]               0
        MaxPool2d-31            [-1, 512, 7, 7]               0
AdaptiveAvgPool2d-32            [-1, 512, 7, 7]               0
           Linear-33                  [-1, 128]       3,211,392
             ReLU-34                  [-1, 128]               0
          Dropout-35                  [-1, 128]               0
           Linear-36                   [-1, 32]           4,128
             ReLU-37                   [-1, 32]               0
          Dropout-38                   [-1, 32]               0
           Linear-39                    [-1, 2]              66
================================================================
Total params: 17,930,274
Trainable params: 3,215,586
Non-trainable params: 14,714,688
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 218.59
Params size (MB): 68.40
Estimated Total Size (MB): 287.56
----------------------------------------------------------------

Parameter configuration

# Create the optimizer first, then bind the scheduler to it,
# so the decay actually applies to the VGG16 optimizer
optimizer = optim.Adam(vgg16.parameters(), lr=0.0001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
scheduler = lr_scheduler.StepLR(optimizer=optimizer, step_size=10, gamma=0.5)

loss_fc = nn.CrossEntropyLoss()

Start training

Because a scheduler from the previous network already exists, PyTorch emits a warning about the execution order of scheduler.step() and optimizer.step() (visible in the log below).

num_epochs = 50

for epoch in range(num_epochs):

    running_loss = 0.0

    for i, sample_batch in enumerate(train_dataloader):
        inputs = sample_batch[0]
        labels = sample_batch[1]

        vgg16.train()
        inputs = inputs.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = vgg16(inputs)
        loss = loss_fc(outputs, labels)
        loss.backward()
        optimizer.step()
        scheduler.step()
        running_loss += loss.item()

        if (i+1) % 20 == 0:
            correct = 0
            total = 0
            vgg16.eval()
            for images_test, labels_test in val_dataloader:
                images_test = images_test.to(device)
                labels_test = labels_test.to(device)

                outputs_test = vgg16(images_test)
                _, prediction = torch.max(outputs_test, 1)
                correct += (torch.sum((prediction == labels_test))).item()
                total += labels_test.size(0)
            print('[{}, {}] val_loss = {:.5f} val_acc = {:.5f}'.format(epoch + 1, i + 1, running_loss / 20,
                                                                        correct / total))
            running_loss = 0.0

print('training finish !')
torch.save(vgg16.state_dict(), '/content/drive/My Drive/pyotrch_dataset/vgg16/vgg16.pth')
/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)


[1, 20] val_loss = 0.45805 val_acc = 0.90196
[2, 20] val_loss = 0.21736 val_acc = 0.90612
[3, 20] val_loss = 0.17701 val_acc = 0.90196
[4, 20] val_loss = 0.12621 val_acc = 0.91384
[5, 20] val_loss = 0.12525 val_acc = 0.90909
[6, 20] val_loss = 0.11202 val_acc = 0.90196
[7, 20] val_loss = 0.09616 val_acc = 0.91325
[8, 20] val_loss = 0.09081 val_acc = 0.90671
[9, 20] val_loss = 0.07763 val_acc = 0.90850
[10, 20] val_loss = 0.08697 val_acc = 0.89424
.............................................
[40, 20] val_loss = 0.05327 val_acc = 0.90018
[41, 20] val_loss = 0.05588 val_acc = 0.89780
[42, 20] val_loss = 0.05056 val_acc = 0.90612
[43, 20] val_loss = 0.05349 val_acc = 0.90731
[44, 20] val_loss = 0.05190 val_acc = 0.90255
[45, 20] val_loss = 0.05003 val_acc = 0.90790
[46, 20] val_loss = 0.05013 val_acc = 0.89899
[47, 20] val_loss = 0.04844 val_acc = 0.89067
[48, 20] val_loss = 0.04601 val_acc = 0.90969
[49, 20] val_loss = 0.05328 val_acc = 0.90255
[50, 20] val_loss = 0.04244 val_acc = 0.90196
training finish !

Model loading and prediction

shuffle(imagelist)

Reading with OpenCV

The advantage of reading with OpenCV is that the same pipeline drops straight into camera and video-stream applications (see the sketch after the one-line variant below).

import cv2

transform=transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize(
                              mean=(0.485, 0.456, 0.406),
                              std=(0.229, 0.224, 0.225))
                              ])
fig = plt.gcf()
fig.set_size_inches(18, 18)

vgg16.eval()

with torch.no_grad():
    for i in range(9):
        ax_img = plt.subplot(3, 3, i + 1)
        
        img = cv2.imread(imagelist[i])
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (224, 224))

        IMG = img.copy()
        ax_img.imshow(IMG)
        
        img = transform(img).unsqueeze(0)
        img = img.to(device)
        
        outputs = vgg16(img)
        prediction = torch.max(outputs, 1)[1]
        
        ax_img.set_title('Label:' \
                         + imagelist[i].split('/')[-2] \
                         + '    Predict:' \
                         + class_names[prediction.item()],
                         fontsize=15)


The whole read-and-preprocess step can also be written as the one-liner below. Note the resulting tensor still holds raw 0–255 values, so the normalization (mean subtraction and division by std) still has to be applied; we won't expand on that here.

img = torch.from_numpy(cv2.resize(cv2.cvtColor(cv2.imread(imagelist[0]), cv2.COLOR_BGR2RGB), (224, 224))).unsqueeze(0).permute(0, 3, 1, 2).type(torch.FloatTensor).to(device)  # permute NHWC -> NCHW (transpose(1, 3) would also swap H and W)
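The video-stream use mentioned earlier follows the same pattern; a sketch assuming a local webcam at device index 0 (Colab has no camera, so this would run on a local machine):

# Hypothetical webcam loop: same preprocessing as the still-image example above
cap = cv2.VideoCapture(0)
vgg16.eval()
with torch.no_grad():
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(cv2.resize(frame, (224, 224)), cv2.COLOR_BGR2RGB)
        batch = transform(rgb).unsqueeze(0).to(device)  # ToTensor + Normalize pipeline above
        prediction = torch.max(vgg16(batch), 1)[1]
        print(class_names[prediction.item()])
cap.release()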

Comparing PIL and OpenCV

Normalizing the image via OpenCV

img = cv2.imread(imagelist[0])
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (224, 224))
transform=transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize(
                              mean=(0.485, 0.456, 0.406),
                              std=(0.229, 0.224, 0.225))
                              ])
img = transform(img)
img
tensor([[[-2.0837, -1.8953, -1.8953,  ..., -0.4397, -0.4397, -0.3541],
         [-2.0837, -1.8439, -1.8268,  ..., -0.7993, -0.6281, -0.4397],
         [-2.0837, -1.8268, -1.7925,  ..., -0.7137, -0.5938, -0.5596],
         ...,
         [-2.1179, -2.1179, -2.1179,  ..., -1.4500, -1.6555, -1.7412],
         [-2.1179, -2.1179, -2.1179,  ..., -1.4843, -1.7069, -1.7583],
         [-2.1179, -2.1179, -2.1179,  ..., -1.5699, -1.7412, -1.7754]],

        [[-2.0007, -1.7906, -1.7906,  ..., -0.4776, -0.4776, -0.3901],
         [-2.0007, -1.7206, -1.7206,  ..., -0.8102, -0.6352, -0.4426],
         [-2.0007, -1.7206, -1.6856,  ..., -0.6527, -0.5301, -0.5301],
         ...,
         [-2.0357, -2.0357, -2.0357,  ..., -1.3179, -1.5280, -1.6155],
         [-2.0357, -2.0357, -2.0357,  ..., -1.3529, -1.5805, -1.6331],
         [-2.0357, -2.0357, -2.0357,  ..., -1.4405, -1.6155, -1.6331]],

        [[-1.6999, -1.4907, -1.4907,  ...,  0.2348,  0.2522,  0.3393],
         [-1.6999, -1.4384, -1.4210,  ..., -0.0441,  0.1302,  0.3219],
         [-1.7347, -1.4559, -1.4210,  ...,  0.1128,  0.2348,  0.2696],
         ...,
         [-1.8044, -1.8044, -1.8044,  ..., -1.1421, -1.3513, -1.4384],
         [-1.8044, -1.8044, -1.8044,  ..., -1.1770, -1.4036, -1.4559],
         [-1.8044, -1.8044, -1.8044,  ..., -1.2641, -1.4384, -1.4733]]])
img.size()
torch.Size([3, 224, 224])

Normalizing the image via PIL

img = Image.open(imagelist[0])
transform=transforms.Compose([transforms.Resize((224, 224), interpolation=2),
                              transforms.ToTensor(),
                              transforms.Normalize(
                              mean=(0.485, 0.456, 0.406),
                              std=(0.229, 0.224, 0.225))
                              ])
img = transform(img)
img
tensor([[[-2.0494, -1.8953, -1.8782,  ..., -0.5082, -0.4739, -0.3883],
         [-2.0494, -1.8439, -1.8097,  ..., -0.7650, -0.6109, -0.4568],
         [-2.0494, -1.8268, -1.7754,  ..., -0.7137, -0.5938, -0.5596],
         ...,
         [-2.1179, -2.1179, -2.1179,  ..., -1.4672, -1.6555, -1.7240],
         [-2.1179, -2.1179, -2.1179,  ..., -1.4843, -1.7069, -1.7583],
         [-2.1179, -2.1179, -2.1179,  ..., -1.5699, -1.7412, -1.7583]],

        [[-1.9657, -1.7906, -1.7731,  ..., -0.5301, -0.4951, -0.3901],
         [-1.9482, -1.7381, -1.7031,  ..., -0.7577, -0.6001, -0.4426],
         [-1.9482, -1.7206, -1.6681,  ..., -0.6702, -0.5476, -0.5301],
         ...,
         [-2.0357, -2.0357, -2.0357,  ..., -1.3354, -1.5280, -1.5980],
         [-2.0357, -2.0357, -2.0357,  ..., -1.3529, -1.5805, -1.6331],
         [-2.0357, -2.0357, -2.0357,  ..., -1.4405, -1.6155, -1.6331]],

        [[-1.6650, -1.4907, -1.4733,  ...,  0.1999,  0.2348,  0.3393],
         [-1.6476, -1.4384, -1.4210,  ..., -0.0092,  0.1651,  0.3045],
         [-1.6824, -1.4559, -1.4036,  ...,  0.1128,  0.2348,  0.2522],
         ...,
         [-1.8044, -1.8044, -1.8044,  ..., -1.1596, -1.3513, -1.4210],
         [-1.8044, -1.8044, -1.8044,  ..., -1.1770, -1.4036, -1.4559],
         [-1.8044, -1.8044, -1.8044,  ..., -1.2641, -1.4384, -1.4559]]])

Conclusion: the two tensors differ slightly. Even though the mathematical operations are the same, computers keep only a finite number of bits, so evaluating the operations in a different order can round differently; for example, a×b/c and a/c×b need not produce the same floating-point result.
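The effect is easy to reproduce with plain Python floats (the classic addition example; the cause, finite-precision rounding that depends on evaluation order, is the same):

print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6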

Also, both OpenCV and the PIL-based transforms built into PyTorch default to bilinear interpolation for resize; the defaults can be confirmed in their respective official documentation.
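To put a number on the gap between the two pipelines, keep the two tensors under separate names and compare them (a sketch; img_cv and img_pil are hypothetical names for the two results computed above):

# img_cv: OpenCV-resized tensor, img_pil: PIL-resized tensor, both normalized as above
diff = (img_cv - img_pil).abs()
print(diff.max(), diff.mean())  # small but nonzero, coming from the two resize paths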
