Check the NVIDIA GPU in use
!nvidia-smi
Tue Sep 1 10:29:57 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:00:04.0 Off | 0 |
| N/A 45C P0 27W / 250W | 0MiB / 16280MiB | 0% Default |
| | | ERR! |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Mount Google Drive and read the data
from google.colab import drive
drive.mount('/content/drive')
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly&response_type=code
Enter your authorization code:
··········
Mounted at /content/drive
The files are copied to the Colab VM to speed up reading: when Colab reads files directly from Google Drive it has to communicate with Drive over the network, and with many files the read I/O latency becomes too high.
!cp /content/drive/My\ Drive/pyotrch_dataset/hymenoptera_aug_data.rar /content/
Extract the archive
!unrar x hymenoptera_aug_data.rar
UNRAR 5.50 freeware Copyright (c) 1993-2017 Alexander Roshal
Extracting from hymenoptera_aug_data.rar
Creating hymenoptera_data OK
Creating hymenoptera_data/train OK
Creating hymenoptera_data/train/ants OK
Extracting hymenoptera_data/train/ants/0_0.jpg 0% OK
Extracting hymenoptera_data/train/ants/0_1.jpg 0% OK
Extracting hymenoptera_data/train/ants/0_10.jpg 0% OK
Extracting hymenoptera_data/train/ants/0_2.jpg 0% OK
Extracting hymenoptera_data/train/ants/0_3.jpg 0% OK
Extracting hymenoptera_data/train/ants/0_4.jpg 0% OK
Extracting hymenoptera_data/train/ants/0_5.jpg 0% OK
Extracting hymenoptera_data/train/ants/0_6.jpg 0% OK
Extracting hymenoptera_data/train/ants/0_7.jpg 0% OK
Extracting hymenoptera_data/train/ants/0_8.jpg 0% OK
...............................................
Creating hymenoptera_data/train/bees OK
Extracting hymenoptera_data/train/bees/0_0.jpg 28% OK
Extracting hymenoptera_data/train/bees/0_1.jpg 28% OK
Extracting hymenoptera_data/train/bees/0_10.jpg 28% OK
Extracting hymenoptera_data/train/bees/0_2.jpg 28% OK
Extracting hymenoptera_data/train/bees/0_3.jpg 28% OK
Extracting hymenoptera_data/train/bees/0_4.jpg 28% OK
Extracting hymenoptera_data/train/bees/0_5.jpg 28% OK
Extracting hymenoptera_data/train/bees/0_6.jpg 28% OK
Extracting hymenoptera_data/train/bees/0_7.jpg 28% OK
Extracting hymenoptera_data/train/bees/0_8.jpg 28% OK
...............................................
Extracting hymenoptera_data/val/bees/9_1.jpg 99% OK
Extracting hymenoptera_data/val/bees/9_10.jpg 99% OK
Extracting hymenoptera_data/val/bees/9_2.jpg 99% OK
Extracting hymenoptera_data/val/bees/9_3.jpg 99% OK
Extracting hymenoptera_data/val/bees/9_4.jpg 99% OK
Extracting hymenoptera_data/val/bees/9_5.jpg 99% OK
Extracting hymenoptera_data/val/bees/9_6.jpg 99% OK
Extracting hymenoptera_data/val/bees/9_7.jpg 99% OK
Extracting hymenoptera_data/val/bees/9_8.jpg 99% OK
Extracting hymenoptera_data/val/bees/9_9.jpg 99% OK
All OK
Import the required packages
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data import DataLoader
import torchvision
from torchvision.transforms import transforms
from torchvision import models
import numpy as np
import matplotlib.pyplot as plt
import os
torch.__version__
'1.6.0+cu101'
Build the dataset
The dataset is the augmented dataset produced in the earlier data-augmentation post.
torchvision.datasets.ImageFolder(root, transform=None, target_transform=None, loader=<function default_loader>, is_valid_file=None)
A generic data loader for images arranged on disk like this:
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
root (string) – Root directory path.
transform (callable, optional) – a transform applied to the input image, e.g. transforms.RandomCrop
target_transform (callable, optional) – a transform applied to the target (label)
loader (callable, optional) – a function that loads an image given its path
is_valid_file – a function that takes the path of an image file and checks whether the file is valid (useful for skipping corrupted images that cannot be opened)
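For example, a minimal sketch (not used later in this notebook) of an is_valid_file callback that skips images PIL cannot open might look like this:

from PIL import Image

def is_readable(path):
    # Return True only if PIL can open and verify the image file
    try:
        with Image.open(path) as im:
            im.verify()
        return True
    except Exception:
        return False

# e.g. torchvision.datasets.ImageFolder(root='...', transform=..., is_valid_file=is_readable)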
transforms.RandomResizedCrop(224) crops a random region of the image and resizes it to a 224×224 square. This step can be omitted, but in that case it should be replaced with transforms.Resize((224, 224), interpolation=2).
data_dir = './hymenoptera_data'
train_dataset = torchvision.datasets.ImageFolder(
    root=os.path.join(data_dir, 'train'),
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.485, 0.456, 0.406),
                             std=(0.229, 0.224, 0.225))
    ]))
val_dataset = torchvision.datasets.ImageFolder(
    root=os.path.join(data_dir, 'val'),
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=(0.485, 0.456, 0.406),
                             std=(0.229, 0.224, 0.225))
    ]))
# Shuffle the training set each epoch; the validation order does not matter
train_dataloader = DataLoader(dataset=train_dataset, batch_size=100, shuffle=True)
val_dataloader = DataLoader(dataset=val_dataset, batch_size=100, shuffle=False)
Get the class names; each class's index in this list is its numeric label.
class_names = train_dataset.classes
print('class_names:{}'.format(class_names))
class_names:['ants', 'bees']
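ImageFolder also stores this mapping directly in the class_to_idx attribute; a quick check (expected result shown as a comment):

print(train_dataset.class_to_idx)  # {'ants': 0, 'bees': 1}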
Get the device
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)
cuda:0
Data visualization
torchvision.utils.make_grid(tensor: Union[torch.Tensor, List[torch.Tensor]], nrow: int = 8, padding: int = 2, normalize: bool = False, range: Optional[Tuple[int, int]] = None, scale_each: bool = False, pad_value: int = 0) → torch.Tensor
This is a helper for displaying images: it arranges a batch of images into a neat grid and works well together with matplotlib.
Parameters:
tensor (Tensor or list) – a 4-D mini-batch tensor of shape (B x C x H x W), or a list of images all of the same size
nrow (int, optional) – number of images per row, default 8; the resulting grid has shape (B / nrow, nrow) [rounded up]
padding (int, optional) – amount of padding between images, in pixels, default 2
normalize (bool, optional) – if True, the images are normalized to (0, 1); can be combined with the range parameter to set the normalization range
range (tuple, optional) – tuple (min, max) used together with normalize; if not given, it is computed from the input tensor
scale_each (bool, optional) – if True, scale each image in the batch separately instead of using the (min, max) over all images. Default: False
pad_value (float, optional) – value used for the padding pixels around each image. Default: 0 [in my experience this is rarely needed]
# Function for displaying an image grid
def imshow(img):
    fig = plt.gcf()
    fig.set_size_inches(15, 6)
    npimg = img.numpy()
    print(npimg.shape)
    # Move the channel dimension to the end for matplotlib. With 10 images of
    # 224x224, nrow=5 and padding=0, the grid has shape (3, 448, 1120)
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
# Wrap the dataloader in an iterator and fetch one random batch
dataiter = iter(train_dataloader)
images, labels = next(dataiter)
# Show the images
imshow(torchvision.utils.make_grid(images[0:10], nrow=5, padding=0, normalize=True))
# Print the labels of the 10 displayed images
print(' '.join('%s ' % class_names[labels[j]] for j in range(10)))
(3, 448, 1120)
ants bees ants bees ants bees bees bees bees ants
ResNet18
Since ResNet18 is a small model with relatively few parameters, we do not freeze any layers; instead the entire network is fine-tuned.
Obtaining and configuring the ResNet18 model
With pretrained=True, the pretrained weights are downloaded automatically.
resnet18 = models.resnet18(pretrained=True)
Inspect the structure of the ResNet18 model
resnet18
ResNet(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer2): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer3): Sequential(
(0): BasicBlock(
(conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(layer4): Sequential(
(0): BasicBlock(
(conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Linear(in_features=512, out_features=1000, bias=True)
)
Get the number of input features of the fully connected layer
num_fc_in = resnet18.fc.in_features
num_fc_in
512
Reconfigure the output layer according to the number of classes
resnet18.fc = nn.Linear(num_fc_in, 2)
Move the model to the GPU
resnet18 = resnet18.to(device)
View the model summary
from torchsummary import summary
summary(resnet18, input_size=(3, 224, 224))
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 112, 112] 9,408
BatchNorm2d-2 [-1, 64, 112, 112] 128
ReLU-3 [-1, 64, 112, 112] 0
MaxPool2d-4 [-1, 64, 56, 56] 0
Conv2d-5 [-1, 64, 56, 56] 36,864
BatchNorm2d-6 [-1, 64, 56, 56] 128
ReLU-7 [-1, 64, 56, 56] 0
Conv2d-8 [-1, 64, 56, 56] 36,864
BatchNorm2d-9 [-1, 64, 56, 56] 128
ReLU-10 [-1, 64, 56, 56] 0
BasicBlock-11 [-1, 64, 56, 56] 0
Conv2d-12 [-1, 64, 56, 56] 36,864
BatchNorm2d-13 [-1, 64, 56, 56] 128
ReLU-14 [-1, 64, 56, 56] 0
Conv2d-15 [-1, 64, 56, 56] 36,864
BatchNorm2d-16 [-1, 64, 56, 56] 128
ReLU-17 [-1, 64, 56, 56] 0
BasicBlock-18 [-1, 64, 56, 56] 0
Conv2d-19 [-1, 128, 28, 28] 73,728
BatchNorm2d-20 [-1, 128, 28, 28] 256
ReLU-21 [-1, 128, 28, 28] 0
Conv2d-22 [-1, 128, 28, 28] 147,456
BatchNorm2d-23 [-1, 128, 28, 28] 256
Conv2d-24 [-1, 128, 28, 28] 8,192
BatchNorm2d-25 [-1, 128, 28, 28] 256
ReLU-26 [-1, 128, 28, 28] 0
BasicBlock-27 [-1, 128, 28, 28] 0
Conv2d-28 [-1, 128, 28, 28] 147,456
BatchNorm2d-29 [-1, 128, 28, 28] 256
ReLU-30 [-1, 128, 28, 28] 0
Conv2d-31 [-1, 128, 28, 28] 147,456
BatchNorm2d-32 [-1, 128, 28, 28] 256
ReLU-33 [-1, 128, 28, 28] 0
BasicBlock-34 [-1, 128, 28, 28] 0
Conv2d-35 [-1, 256, 14, 14] 294,912
BatchNorm2d-36 [-1, 256, 14, 14] 512
ReLU-37 [-1, 256, 14, 14] 0
Conv2d-38 [-1, 256, 14, 14] 589,824
BatchNorm2d-39 [-1, 256, 14, 14] 512
Conv2d-40 [-1, 256, 14, 14] 32,768
BatchNorm2d-41 [-1, 256, 14, 14] 512
ReLU-42 [-1, 256, 14, 14] 0
BasicBlock-43 [-1, 256, 14, 14] 0
Conv2d-44 [-1, 256, 14, 14] 589,824
BatchNorm2d-45 [-1, 256, 14, 14] 512
ReLU-46 [-1, 256, 14, 14] 0
Conv2d-47 [-1, 256, 14, 14] 589,824
BatchNorm2d-48 [-1, 256, 14, 14] 512
ReLU-49 [-1, 256, 14, 14] 0
BasicBlock-50 [-1, 256, 14, 14] 0
Conv2d-51 [-1, 512, 7, 7] 1,179,648
BatchNorm2d-52 [-1, 512, 7, 7] 1,024
ReLU-53 [-1, 512, 7, 7] 0
Conv2d-54 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-55 [-1, 512, 7, 7] 1,024
Conv2d-56 [-1, 512, 7, 7] 131,072
BatchNorm2d-57 [-1, 512, 7, 7] 1,024
ReLU-58 [-1, 512, 7, 7] 0
BasicBlock-59 [-1, 512, 7, 7] 0
Conv2d-60 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-61 [-1, 512, 7, 7] 1,024
ReLU-62 [-1, 512, 7, 7] 0
Conv2d-63 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-64 [-1, 512, 7, 7] 1,024
ReLU-65 [-1, 512, 7, 7] 0
BasicBlock-66 [-1, 512, 7, 7] 0
AdaptiveAvgPool2d-67 [-1, 512, 1, 1] 0
Linear-68 [-1, 2] 1,026
================================================================
Total params: 11,177,538
Trainable params: 11,177,538
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 62.79
Params size (MB): 42.64
Estimated Total Size (MB): 106.00
----------------------------------------------------------------
Only 11,177,538 parameters need to be trained in total; a parameter count on the order of ten million is still fairly small.
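As a quick sanity check (a small sketch, not from the original run), the trainable-parameter count in the summary can be reproduced directly from the model:

# Count only the parameters that will receive gradients
trainable = sum(p.numel() for p in resnet18.parameters() if p.requires_grad)
print(trainable)  # 11177538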
Parameter configuration
Define the loss function and optimizer
loss_fc = nn.CrossEntropyLoss()
optimizer = optim.Adam(resnet18.parameters(), lr=0.0001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
Define the learning-rate decay schedule
torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False)
Parameters:
optimizer (Optimizer) – the wrapped optimizer
step_size (int) – period of the learning-rate decay
gamma (float) – multiplicative factor of the decay. Default: 0.1
last_epoch (int) – the index of the last epoch. Default: -1
verbose (bool) – if True, prints a message for each update. Default: False
# Decay the learning rate to 0.5x of its value every 10 steps. Note that scheduler.step()
# is called once per batch in the training loop below, so the decay period is 10 batches here.
scheduler = lr_scheduler.StepLR(optimizer=optimizer, step_size=10, gamma=0.5)
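To see what StepLR actually does, here is a minimal sketch on a throwaway optimizer (the names dummy_opt and dummy_sched are just for illustration): with step_size=10 and gamma=0.5 the learning rate is multiplied by 0.5 after every 10 scheduler steps.

dummy_opt = optim.Adam([torch.zeros(1, requires_grad=True)], lr=0.0001)
dummy_sched = lr_scheduler.StepLR(dummy_opt, step_size=10, gamma=0.5)
for step in range(25):
    dummy_opt.step()      # no gradients here, so this only advances the step counter
    dummy_sched.step()
print(dummy_sched.get_last_lr())  # [2.5e-05]: the LR has been halved twice after 25 steps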
model.train(): enables the training behavior of BatchNormalization and Dropout
model.eval(): switches BatchNormalization and Dropout to evaluation behavior
train and eval control the behavior of the bn and dropout layers. These layers are special: they compute differently in the training and inference phases. For example, during training bn keeps updating its running mean and running var, while during inference these statistics are not updated; instead, the running mean and var accumulated over the whole training set are used.
Start training
num_epochs = 50

for epoch in range(num_epochs):
    running_loss = 0.0
    for i, sample_batch in enumerate(train_dataloader):
        inputs = sample_batch[0]
        labels = sample_batch[1]

        # Switch to training mode and move the batch to the GPU
        resnet18.train()
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Forward pass, loss, backward pass, parameter and LR updates
        optimizer.zero_grad()
        outputs = resnet18(inputs)
        loss = loss_fc(outputs, labels)
        loss.backward()
        optimizer.step()
        scheduler.step()

        running_loss += loss.item()

        # Evaluate on the validation set every 20 batches
        if (i + 1) % 20 == 0:
            correct = 0
            total = 0
            resnet18.eval()
            for images_test, labels_test in val_dataloader:
                images_test = images_test.to(device)
                labels_test = labels_test.to(device)
                outputs_test = resnet18(images_test)
                _, prediction = torch.max(outputs_test, 1)
                correct += torch.sum(prediction == labels_test).item()
                total += labels_test.size(0)
            print('[{}, {}] val_loss = {:.5f} val_acc = {:.5f}'.format(epoch + 1, i + 1,
                                                                       running_loss / 20, correct / total))
            running_loss = 0.0

print('training finish !')
torch.save(resnet18.state_dict(), '/content/drive/My Drive/pyotrch_dataset/resnet18/resnet18.pth')
[1, 20] val_loss = 0.24572 val_acc = 0.89958
[2, 20] val_loss = 0.08751 val_acc = 0.91444
[3, 20] val_loss = 0.08537 val_acc = 0.90196
[4, 20] val_loss = 0.08147 val_acc = 0.91325
[5, 20] val_loss = 0.08427 val_acc = 0.91325
[6, 20] val_loss = 0.07889 val_acc = 0.89780
[7, 20] val_loss = 0.07812 val_acc = 0.90612
[8, 20] val_loss = 0.07872 val_acc = 0.90850
[9, 20] val_loss = 0.08467 val_acc = 0.90434
[10, 20] val_loss = 0.08600 val_acc = 0.90850
.............................................
[40, 20] val_loss = 0.07911 val_acc = 0.90790
[41, 20] val_loss = 0.08371 val_acc = 0.91147
[42, 20] val_loss = 0.07833 val_acc = 0.90671
[43, 20] val_loss = 0.08826 val_acc = 0.91147
[44, 20] val_loss = 0.07140 val_acc = 0.90969
[45, 20] val_loss = 0.07659 val_acc = 0.91087
[46, 20] val_loss = 0.07821 val_acc = 0.90612
[47, 20] val_loss = 0.08456 val_acc = 0.91266
[48, 20] val_loss = 0.07910 val_acc = 0.90790
[49, 20] val_loss = 0.08175 val_acc = 0.91325
[50, 20] val_loss = 0.07183 val_acc = 0.91384
training finish !
Model loading and prediction
Reading with PIL and Transform
torchvision is designed around PIL images, so reading with PIL makes the preprocessing more convenient.
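Since this section is about loading as well as prediction, here is a hedged sketch of how the weights saved above could be reloaded in a fresh session before inference (using the same path as the torch.save call earlier):

# Rebuild the architecture with the 2-class head, then load the saved weights
resnet18 = models.resnet18(pretrained=False)
resnet18.fc = nn.Linear(resnet18.fc.in_features, 2)
resnet18.load_state_dict(torch.load('/content/drive/My Drive/pyotrch_dataset/resnet18/resnet18.pth', map_location=device))
resnet18 = resnet18.to(device)
resnet18.eval()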
from random import shuffle
import PIL.Image as Image

# Build the list of validation image paths and shuffle it
imagelist = ['./hymenoptera_data/val/ants/' + pic for pic in os.listdir('./hymenoptera_data/val/ants')] + \
            ['./hymenoptera_data/val/bees/' + pic for pic in os.listdir('./hymenoptera_data/val/bees')]
shuffle(imagelist)

transform = transforms.Compose([transforms.Resize((224, 224), interpolation=2),
                                transforms.ToTensor(),
                                transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                                     std=(0.229, 0.224, 0.225))])

fig = plt.gcf()
fig.set_size_inches(18, 18)
shuffle(imagelist)

resnet18.eval()
with torch.no_grad():
    for i in range(9):
        ax_img = plt.subplot(3, 3, i + 1)
        img = Image.open(imagelist[i])
        ax_img.imshow(img)
        img = transform(img).unsqueeze(0)
        img = img.to(device)
        outputs = resnet18(img)
        prediction = torch.max(outputs, 1)[1]
        ax_img.set_title('Label:' + imagelist[i].split('/')[-2]
                         + ' Predict:' + class_names[prediction.item()],
                         fontsize=15)
VGG16
Obtaining and configuring the VGG16 model
vgg16 = models.vgg16(pretrained=True)
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth
Inspect the model; the printout exposes the internal wrapper structure, which makes it easy to pull out each part (see the short access example after the printout).
vgg16
VGG(
(features): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(6): ReLU(inplace=True)
(7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(8): ReLU(inplace=True)
(9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(11): ReLU(inplace=True)
(12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(13): ReLU(inplace=True)
(14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(15): ReLU(inplace=True)
(16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(18): ReLU(inplace=True)
(19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(20): ReLU(inplace=True)
(21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(22): ReLU(inplace=True)
(23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(25): ReLU(inplace=True)
(26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(27): ReLU(inplace=True)
(28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(29): ReLU(inplace=True)
(30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
(classifier): Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.5, inplace=False)
(3): Linear(in_features=4096, out_features=4096, bias=True)
(4): ReLU(inplace=True)
(5): Dropout(p=0.5, inplace=False)
(6): Linear(in_features=4096, out_features=1000, bias=True)
)
)
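As mentioned above, the wrapper structure makes the sub-modules easy to pull out by name or index, for example:

vgg16.features        # the convolutional backbone (Sequential)
vgg16.classifier      # the fully connected head (Sequential)
vgg16.classifier[6]   # Linear(in_features=4096, out_features=1000, bias=True)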
Freeze all layers
for param in vgg16.parameters():
    param.requires_grad = False
Get the number of input features of the classifier
num_fc_in = vgg16.classifier[0].in_features
num_fc_in
25088
Redefine the classification layers
vgg16.classifier = nn.Sequential(
    nn.Linear(num_fc_in, 128),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(128, 32),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(32, 2))
Move the model to the GPU
vgg16 = vgg16.to(device)
View the model summary
from torchsummary import summary
summary(vgg16, input_size=(3, 224, 224))
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 1,792
ReLU-2 [-1, 64, 224, 224] 0
Conv2d-3 [-1, 64, 224, 224] 36,928
ReLU-4 [-1, 64, 224, 224] 0
MaxPool2d-5 [-1, 64, 112, 112] 0
Conv2d-6 [-1, 128, 112, 112] 73,856
ReLU-7 [-1, 128, 112, 112] 0
Conv2d-8 [-1, 128, 112, 112] 147,584
ReLU-9 [-1, 128, 112, 112] 0
MaxPool2d-10 [-1, 128, 56, 56] 0
Conv2d-11 [-1, 256, 56, 56] 295,168
ReLU-12 [-1, 256, 56, 56] 0
Conv2d-13 [-1, 256, 56, 56] 590,080
ReLU-14 [-1, 256, 56, 56] 0
Conv2d-15 [-1, 256, 56, 56] 590,080
ReLU-16 [-1, 256, 56, 56] 0
MaxPool2d-17 [-1, 256, 28, 28] 0
Conv2d-18 [-1, 512, 28, 28] 1,180,160
ReLU-19 [-1, 512, 28, 28] 0
Conv2d-20 [-1, 512, 28, 28] 2,359,808
ReLU-21 [-1, 512, 28, 28] 0
Conv2d-22 [-1, 512, 28, 28] 2,359,808
ReLU-23 [-1, 512, 28, 28] 0
MaxPool2d-24 [-1, 512, 14, 14] 0
Conv2d-25 [-1, 512, 14, 14] 2,359,808
ReLU-26 [-1, 512, 14, 14] 0
Conv2d-27 [-1, 512, 14, 14] 2,359,808
ReLU-28 [-1, 512, 14, 14] 0
Conv2d-29 [-1, 512, 14, 14] 2,359,808
ReLU-30 [-1, 512, 14, 14] 0
MaxPool2d-31 [-1, 512, 7, 7] 0
AdaptiveAvgPool2d-32 [-1, 512, 7, 7] 0
Linear-33 [-1, 128] 3,211,392
ReLU-34 [-1, 128] 0
Dropout-35 [-1, 128] 0
Linear-36 [-1, 32] 4,128
ReLU-37 [-1, 32] 0
Dropout-38 [-1, 32] 0
Linear-39 [-1, 2] 66
================================================================
Total params: 17,930,274
Trainable params: 3,215,586
Non-trainable params: 14,714,688
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 218.59
Params size (MB): 68.40
Estimated Total Size (MB): 287.56
----------------------------------------------------------------
Parameter configuration
scheduler = lr_scheduler.StepLR(optimizer=optimizer, step_size=10, gamma=0.5)
optimizer = optim.Adam(vgg16.parameters(), lr=0.0001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
loss_fc = nn.CrossEntropyLoss()
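Note that the optimizer above is handed all of vgg16.parameters(), frozen backbone included; the frozen parameters never receive gradients, so Adam simply skips them, but it is slightly cleaner to pass only the trainable ones. It is also more conventional to create the scheduler after, and from, the optimizer it is supposed to control. A hedged sketch of that ordering:

loss_fc = nn.CrossEntropyLoss()
# Only the new classifier layers still require gradients
optimizer = optim.Adam(filter(lambda p: p.requires_grad, vgg16.parameters()), lr=0.0001)
# Build the scheduler from this optimizer so it decays the correct learning rate
scheduler = lr_scheduler.StepLR(optimizer=optimizer, step_size=10, gamma=0.5)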
Start training
Because the scheduler above was created before the new optimizer, it is still bound to the optimizer from the previous ResNet18 section; this is why PyTorch reports a warning below about the execution order of scheduler.step() and optimizer.step().
num_epochs = 50

for epoch in range(num_epochs):
    running_loss = 0.0
    for i, sample_batch in enumerate(train_dataloader):
        inputs = sample_batch[0]
        labels = sample_batch[1]

        vgg16.train()
        inputs = inputs.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = vgg16(inputs)
        loss = loss_fc(outputs, labels)
        loss.backward()
        optimizer.step()
        scheduler.step()

        running_loss += loss.item()

        if (i + 1) % 20 == 0:
            correct = 0
            total = 0
            vgg16.eval()
            for images_test, labels_test in val_dataloader:
                images_test = images_test.to(device)
                labels_test = labels_test.to(device)
                outputs_test = vgg16(images_test)
                _, prediction = torch.max(outputs_test, 1)
                correct += torch.sum(prediction == labels_test).item()
                total += labels_test.size(0)
            print('[{}, {}] val_loss = {:.5f} val_acc = {:.5f}'.format(epoch + 1, i + 1,
                                                                       running_loss / 20, correct / total))
            running_loss = 0.0

print('training finish !')
torch.save(vgg16.state_dict(), '/content/drive/My Drive/pyotrch_dataset/vgg16/vgg16.pth')
/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
[1, 20] val_loss = 0.45805 val_acc = 0.90196
[2, 20] val_loss = 0.21736 val_acc = 0.90612
[3, 20] val_loss = 0.17701 val_acc = 0.90196
[4, 20] val_loss = 0.12621 val_acc = 0.91384
[5, 20] val_loss = 0.12525 val_acc = 0.90909
[6, 20] val_loss = 0.11202 val_acc = 0.90196
[7, 20] val_loss = 0.09616 val_acc = 0.91325
[8, 20] val_loss = 0.09081 val_acc = 0.90671
[9, 20] val_loss = 0.07763 val_acc = 0.90850
[10, 20] val_loss = 0.08697 val_acc = 0.89424
.............................................
[40, 20] val_loss = 0.05327 val_acc = 0.90018
[41, 20] val_loss = 0.05588 val_acc = 0.89780
[42, 20] val_loss = 0.05056 val_acc = 0.90612
[43, 20] val_loss = 0.05349 val_acc = 0.90731
[44, 20] val_loss = 0.05190 val_acc = 0.90255
[45, 20] val_loss = 0.05003 val_acc = 0.90790
[46, 20] val_loss = 0.05013 val_acc = 0.89899
[47, 20] val_loss = 0.04844 val_acc = 0.89067
[48, 20] val_loss = 0.04601 val_acc = 0.90969
[49, 20] val_loss = 0.05328 val_acc = 0.90255
[50, 20] val_loss = 0.04244 val_acc = 0.90196
training finish !
Model loading and prediction
shuffle(imagelist)
Reading with OpenCV
The advantage of reading with OpenCV is that the same pipeline can easily be applied to cameras and video streams (a webcam sketch follows the code below).
import cv2

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                                     std=(0.229, 0.224, 0.225))])

fig = plt.gcf()
fig.set_size_inches(18, 18)

vgg16.eval()
with torch.no_grad():
    for i in range(9):
        ax_img = plt.subplot(3, 3, i + 1)
        # OpenCV reads BGR; convert to RGB and resize to the network input size
        img = cv2.imread(imagelist[i])
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (224, 224))
        IMG = img.copy()
        ax_img.imshow(IMG)
        img = transform(img).unsqueeze(0)
        img = img.to(device)
        outputs = vgg16(img)
        prediction = torch.max(outputs, 1)[1]
        ax_img.set_title('Label:' + imagelist[i].split('/')[-2]
                         + ' Predict:' + class_names[prediction.item()],
                         fontsize=15)
The same preprocessing can also be written in a single line, as below; note that the channel dimension is moved with permute so that height and width are not swapped, and the resulting tensor still needs the mean/std normalization, which we will not expand on here.
img = torch.from_numpy(cv2.resize(cv2.cvtColor(cv2.imread(imagelist[0]), cv2.COLOR_BGR2RGB), (224, 224))).permute(2, 0, 1).unsqueeze(0).type(torch.FloatTensor).to(device)
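As mentioned earlier, the OpenCV pipeline carries over to video with almost no changes. A minimal sketch (assuming a locally attached webcam, so it will not run inside Colab):

cap = cv2.VideoCapture(0)                                # open the default camera
vgg16.eval()
with torch.no_grad():
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # OpenCV frames are BGR
        frame = cv2.resize(frame, (224, 224))
        x = transform(frame).unsqueeze(0).to(device)     # same transform as above
        prediction = torch.max(vgg16(x), 1)[1]
        print(class_names[prediction.item()])
cap.release()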
Comparison between PIL and OpenCV
Normalizing the image read with OpenCV
img = cv2.imread(imagelist[0])
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (224, 224))
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                                     std=(0.229, 0.224, 0.225))])
img = transform(img)
img
tensor([[[-2.0837, -1.8953, -1.8953, ..., -0.4397, -0.4397, -0.3541],
[-2.0837, -1.8439, -1.8268, ..., -0.7993, -0.6281, -0.4397],
[-2.0837, -1.8268, -1.7925, ..., -0.7137, -0.5938, -0.5596],
...,
[-2.1179, -2.1179, -2.1179, ..., -1.4500, -1.6555, -1.7412],
[-2.1179, -2.1179, -2.1179, ..., -1.4843, -1.7069, -1.7583],
[-2.1179, -2.1179, -2.1179, ..., -1.5699, -1.7412, -1.7754]],
[[-2.0007, -1.7906, -1.7906, ..., -0.4776, -0.4776, -0.3901],
[-2.0007, -1.7206, -1.7206, ..., -0.8102, -0.6352, -0.4426],
[-2.0007, -1.7206, -1.6856, ..., -0.6527, -0.5301, -0.5301],
...,
[-2.0357, -2.0357, -2.0357, ..., -1.3179, -1.5280, -1.6155],
[-2.0357, -2.0357, -2.0357, ..., -1.3529, -1.5805, -1.6331],
[-2.0357, -2.0357, -2.0357, ..., -1.4405, -1.6155, -1.6331]],
[[-1.6999, -1.4907, -1.4907, ..., 0.2348, 0.2522, 0.3393],
[-1.6999, -1.4384, -1.4210, ..., -0.0441, 0.1302, 0.3219],
[-1.7347, -1.4559, -1.4210, ..., 0.1128, 0.2348, 0.2696],
...,
[-1.8044, -1.8044, -1.8044, ..., -1.1421, -1.3513, -1.4384],
[-1.8044, -1.8044, -1.8044, ..., -1.1770, -1.4036, -1.4559],
[-1.8044, -1.8044, -1.8044, ..., -1.2641, -1.4384, -1.4733]]])
img.size()
torch.Size([3, 224, 224])
Normalizing the image read with PIL
img = Image.open(imagelist[0])
transform = transforms.Compose([transforms.Resize((224, 224), interpolation=2),
                                transforms.ToTensor(),
                                transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                                     std=(0.229, 0.224, 0.225))])
img = transform(img)
img
tensor([[[-2.0494, -1.8953, -1.8782, ..., -0.5082, -0.4739, -0.3883],
[-2.0494, -1.8439, -1.8097, ..., -0.7650, -0.6109, -0.4568],
[-2.0494, -1.8268, -1.7754, ..., -0.7137, -0.5938, -0.5596],
...,
[-2.1179, -2.1179, -2.1179, ..., -1.4672, -1.6555, -1.7240],
[-2.1179, -2.1179, -2.1179, ..., -1.4843, -1.7069, -1.7583],
[-2.1179, -2.1179, -2.1179, ..., -1.5699, -1.7412, -1.7583]],
[[-1.9657, -1.7906, -1.7731, ..., -0.5301, -0.4951, -0.3901],
[-1.9482, -1.7381, -1.7031, ..., -0.7577, -0.6001, -0.4426],
[-1.9482, -1.7206, -1.6681, ..., -0.6702, -0.5476, -0.5301],
...,
[-2.0357, -2.0357, -2.0357, ..., -1.3354, -1.5280, -1.5980],
[-2.0357, -2.0357, -2.0357, ..., -1.3529, -1.5805, -1.6331],
[-2.0357, -2.0357, -2.0357, ..., -1.4405, -1.6155, -1.6331]],
[[-1.6650, -1.4907, -1.4733, ..., 0.1999, 0.2348, 0.3393],
[-1.6476, -1.4384, -1.4210, ..., -0.0092, 0.1651, 0.3045],
[-1.6824, -1.4559, -1.4036, ..., 0.1128, 0.2348, 0.2522],
...,
[-1.8044, -1.8044, -1.8044, ..., -1.1596, -1.3513, -1.4210],
[-1.8044, -1.8044, -1.8044, ..., -1.1770, -1.4036, -1.4559],
[-1.8044, -1.8044, -1.8044, ..., -1.2641, -1.4384, -1.4559]]])
Conclusion: there is a noticeable gap between the two results. Floating-point arithmetic keeps only a limited number of digits, so even when the mathematical rules are the same, a different order of operations can produce different values; for example, a×b/c and a/c×b do not necessarily give exactly the same result.
In addition, both OpenCV and the PIL functions wrapped by PyTorch use bilinear interpolation by default for the resize operation; the defaults can be confirmed in their respective official documentation.
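To put a number on the gap (a quick sketch, assuming the two tensors above are kept under separate names, e.g. img_cv for the OpenCV result and img_pil for the PIL result, instead of both being called img):

# Largest and average element-wise difference between the two preprocessed tensors
diff = (img_cv - img_pil).abs()
print(diff.max(), diff.mean())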