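In this tutorial, you will learn how to train a convolutional neural network for image classification using transfer learning. You can read more about transfer learning in the cs231n notes.
Quoting these notes:
In practice, very few people train an entire convolutional network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images in 1000 categories), and then use the ConvNet either as an initialization or as a fixed feature extractor for the task of interest.
These two major transfer learning scenarios look as follows:
- Finetuning the ConvNet: Instead of random initialization, we initialize the network with a pretrained network, such as one trained on the ImageNet 1000-class dataset. The rest of the training proceeds as usual.
- ConvNet as fixed feature extractor: Here, we freeze the weights of the whole network except for the final fully connected layer. This last fully connected layer is replaced with a new one with random weights, and only this layer is trained.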
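The practical difference between the two scenarios is which parameters end up trainable and which ones the optimizer receives. The sketch below shows this on a tiny hypothetical stand-in network (TinyNet, with made-up layer sizes, not ResNet-18); it mirrors the pattern used later in this tutorial, but the model itself is an assumption for illustration only.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained ConvNet (not ResNet-18)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 4)  # plays the role of the pretrained body
        self.fc = nn.Linear(4, 1000)     # plays the role of the 1000-class head

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))


def n_params(params):
    """Count scalar parameters that will receive gradients."""
    return sum(p.numel() for p in params if p.requires_grad)


# Scenario 1: finetuning. Replace the head, then optimize everything.
ft = TinyNet()
ft.fc = nn.Linear(ft.fc.in_features, 2)
opt_ft = optim.SGD(ft.parameters(), lr=0.001, momentum=0.9)

# Scenario 2: fixed feature extractor. Freeze everything first, then
# replace the head; a freshly built layer has requires_grad=True.
fe = TinyNet()
for p in fe.parameters():
    p.requires_grad = False
fe.fc = nn.Linear(fe.fc.in_features, 2)
opt_fe = optim.SGD(fe.fc.parameters(), lr=0.001, momentum=0.9)

print(n_params(opt_ft.param_groups[0]["params"]))  # 46 = (8*4 + 4) + (4*2 + 2)
print(n_params(opt_fe.param_groups[0]["params"]))  # 10 = 4*2 + 2
```

Only the second optimizer is restricted to the new head, which is why the feature-extractor scenario does far less work per update.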
# License: BSD
# Author: Sasank Chilamkurthy

from __future__ import print_function, division

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy

plt.ion()   # interactive mode
Load Data
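We will use the torchvision and torch.utils.data packages to load the data.
The problem we are going to solve today is to train a model to classify ants and bees. We have about 120 training images each for ants and bees, and 75 validation images for each class. Usually, this is a very small dataset to generalize from if a model is trained from scratch. Since we are using transfer learning, we should be able to generalize reasonably well.
This dataset is a very small subset of ImageNet.
Note
Download the data from here and extract it to the current directory. The code below reads it from data/hymenoptera_data, so adjust data_dir if you extract it somewhere else.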
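datasets.ImageFolder derives labels from the directory layout: each split directory (train, val) holds one sub-directory per class, and class indices are assigned in sorted folder-name order. The pure-Python sketch below mimics that rule on a throwaway tree; it assumes the archive uses folders named ants and bees, and the file name example.jpg is a hypothetical placeholder (only the structure matters).

```python
import os
import shutil
import tempfile


def find_classes_like_imagefolder(split_dir):
    """Mimic how ImageFolder derives labels: one class per sub-directory,
    indices assigned in sorted name order."""
    classes = sorted(e.name for e in os.scandir(split_dir) if e.is_dir())
    return classes, {name: i for i, name in enumerate(classes)}


# Build a throwaway tree with the layout the tutorial expects.
root = tempfile.mkdtemp()
for split in ("train", "val"):
    for cls in ("bees", "ants"):  # creation order deliberately unsorted
        d = os.path.join(root, "hymenoptera_data", split, cls)
        os.makedirs(d)
        open(os.path.join(d, "example.jpg"), "w").close()

classes, class_to_idx = find_classes_like_imagefolder(
    os.path.join(root, "hymenoptera_data", "train"))
print(classes)       # ['ants', 'bees']
print(class_to_idx)  # {'ants': 0, 'bees': 1}

shutil.rmtree(root)  # clean up the temporary tree
```

Under this assumption, class_names in the code below would be ['ants', 'bees'], so label 0 means ants and label 1 means bees.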
# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'data/hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
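ToTensor() scales pixel values into [0, 1], and Normalize then standardizes each channel as (x - mean) / std using the ImageNet statistics above, matching the preprocessing the pretrained weights were trained with. The small sketch below applies that per-channel formula by hand to a black and a white pixel, to show the range of values the network actually sees; it is plain Python, not the torchvision implementation.

```python
MEAN = [0.485, 0.456, 0.406]  # ImageNet per-channel means used above
STD = [0.229, 0.224, 0.225]   # ImageNet per-channel stds used above


def normalize_pixel(rgb):
    """Apply (x - mean) / std channel by channel to one RGB pixel that
    ToTensor() has already scaled into [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, MEAN, STD)]


black = normalize_pixel([0.0, 0.0, 0.0])
white = normalize_pixel([1.0, 1.0, 1.0])
print([round(v, 4) for v in black])  # [-2.1179, -2.0357, -1.8044]
print([round(v, 4) for v in white])  # [2.2489, 2.4286, 2.64]
```

So normalized inputs lie roughly in [-2.1, 2.6] rather than [0, 1]; imshow further below undoes this before plotting.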
Visualize a few images
Let's visualize a few training images so as to understand the data augmentations.
def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

imshow(out, title=[class_names[x] for x in classes])
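imshow transposes the tensor from PyTorch's channels-first layout (C, H, W) to the channels-last layout (H, W, C) that plt.imshow expects. The transpose also matters for the un-normalization line: a length-3 mean/std array broadcasts against the last axis, so it only lines up with the channels once they are last. The NumPy sketch below uses a made-up 2x5 image to show both effects.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

# A fake normalized image in channels-first layout (C, H, W) = (3, 2, 5).
chw = np.zeros((3, 2, 5))

# Broadcasting aligns trailing axes: (3,) vs (3, 2, 5) compares 3 with 5.
try:
    STD * chw + MEAN
except ValueError as err:
    print("channels-first does not broadcast:", err)

# imshow() moves channels last: (C, H, W) -> (H, W, C).
hwc = chw.transpose((1, 2, 0))
print(hwc.shape)  # (2, 5, 3)

# Now each channel is un-normalized with its own statistics.
restored = STD * hwc + MEAN
print(restored[0, 0])  # [0.485 0.456 0.406]
```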
Training the model
Now, let's write a general function to train a model. Here, we will illustrate:
- Scheduling the learning rate
- Saving the best model
Below, the parameter scheduler is an LR scheduler object from torch.optim.lr_scheduler.
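"Saving the best model" means snapshotting the weights whenever validation accuracy improves. The tensors in model.state_dict() share storage with the live parameters, so a plain reference would keep changing as training continues; that is why train_model deep-copies it. The pure-Python sketch below reproduces the pattern with a toy dict of lists standing in for a state dict; the weights and accuracies are made-up values.

```python
import copy

# Toy "state dict" whose values are mutable, like the tensors returned by
# model.state_dict(). All numbers here are hypothetical.
state = {"fc.weight": [0.0, 0.0]}
val_accs = [0.81, 0.94, 0.90]  # pretend validation accuracy per epoch

best_acc = 0.0
best_copy = None  # deep copy, as train_model does
best_ref = None   # plain reference, the mistake deepcopy avoids
for epoch, acc in enumerate(val_accs):
    state["fc.weight"][0] = float(epoch + 1)  # pretend in-place weight update
    if acc > best_acc:
        best_acc = acc
        best_copy = copy.deepcopy(state)
        best_ref = state

print(best_acc)   # 0.94
print(best_copy)  # {'fc.weight': [2.0, 0.0]}  weights from the best epoch
print(best_ref)   # {'fc.weight': [3.0, 0.0]}  silently tracks the last epoch
```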
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
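train_model accumulates loss.item() * inputs.size(0) rather than the raw batch loss. Since CrossEntropyLoss returns the mean over the batch by default, weighting by batch size and dividing by the dataset size gives the exact per-sample mean even when the last batch is smaller. The sketch below assumes a split of 153 images served in batches of 4 (38 full batches plus one single-image batch) and uses made-up per-sample losses.

```python
# Hypothetical per-sample losses for 153 images, batched by 4.
losses = [0.1 * (i % 7) for i in range(153)]
batch_size = 4
batches = [losses[i:i + batch_size] for i in range(0, len(losses), batch_size)]

running_loss = 0.0
for batch in batches:
    batch_mean = sum(batch) / len(batch)     # what loss.item() returns
    running_loss += batch_mean * len(batch)  # loss.item() * inputs.size(0)
epoch_loss = running_loss / len(losses)      # / dataset_sizes[phase]

# Averaging batch means directly over-weights the 1-image last batch.
naive = sum(sum(b) / len(b) for b in batches) / len(batches)
true_mean = sum(losses) / len(losses)

print(len(batches), len(batches[-1]))  # 39 1
print(round(epoch_loss, 4), round(true_mean, 4), round(naive, 4))
```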
Visualizing the model predictions
A generic function to display the model's predictions for a few images:
def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images//2, 2, images_so_far)
                ax.axis('off')
                ax.set_title('predicted: {}'.format(class_names[preds[j]]))
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)
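Both functions turn raw model outputs into predictions with torch.max(outputs, 1), which returns the maximum values and their indices along the class dimension; only the indices are kept and then mapped through class_names. The pure-Python sketch below performs the same row-wise argmax on hypothetical logits, assuming the class order ['ants', 'bees'].

```python
# Hypothetical logits for a batch of 3 images over 2 classes, in the same
# (batch, num_classes) shape that model(inputs) returns.
outputs = [[2.3, -1.1], [0.2, 0.9], [-0.5, -0.4]]
class_names = ["ants", "bees"]  # assumed order (ImageFolder sorts folders)

# Row-wise argmax: the index part of torch.max(outputs, 1).
preds = [max(range(len(row)), key=row.__getitem__) for row in outputs]
print(preds)                            # [0, 1, 1]
print([class_names[p] for p in preds])  # ['ants', 'bees', 'bees']
```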
Finetuning the convnet
Load a pretrained model and reset the final fully connected layer.
# Note: torchvision 0.13+ deprecates pretrained=True in favour of
# weights=models.ResNet18_Weights.DEFAULT (ImageNet-pretrained weights).
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
# Here the size of each output sample is set to 2.
# Alternatively, it can be generalized to nn.Linear(num_ftrs, len(class_names)).
model_ft.fc = nn.Linear(num_ftrs, 2)

model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
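With scheduler.step() called once per epoch (as train_model does after the training phase), StepLR(step_size=7, gamma=0.1) makes the learning rate during epoch e equal to base_lr * gamma ** (e // step_size). The sketch below evaluates that closed form for the 25 epochs used here; it is a hand-written formula for illustration, not the library code.

```python
def step_lr(epoch, base_lr=0.001, step_size=7, gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) for StepLR when
    scheduler.step() is called once at the end of each epoch."""
    return base_lr * gamma ** (epoch // step_size)


schedule = [step_lr(e) for e in range(25)]
for start in (0, 7, 14, 21):
    print("epochs from {:2d}: lr = {:.0e}".format(start, schedule[start]))
```

So epochs 0-6 use 1e-3, 7-13 use 1e-4, 14-20 use 1e-5 and 21-24 use 1e-6, which helps explain why the training curves in the logs below settle down in later epochs.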
Train and evaluate
On CPU this should take around 15-25 minutes. On a GPU, though, it takes only about a minute (the run below finished in 1m 9s).
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=25)
Output:
Epoch 0/24
----------
train Loss: 0.5303 Acc: 0.7254
val Loss: 0.1851 Acc: 0.9412
Epoch 1/24
----------
train Loss: 0.6292 Acc: 0.7500
val Loss: 0.2400 Acc: 0.8954
Epoch 2/24
----------
train Loss: 0.5178 Acc: 0.7951
val Loss: 0.4620 Acc: 0.8170
Epoch 3/24
----------
train Loss: 0.6185 Acc: 0.7746
val Loss: 0.4558 Acc: 0.8105
Epoch 4/24
----------
train Loss: 0.5016 Acc: 0.8033
val Loss: 0.2238 Acc: 0.9085
Epoch 5/24
----------
train Loss: 0.5401 Acc: 0.7787
val Loss: 0.2541 Acc: 0.9150
Epoch 6/24
----------
train Loss: 0.6555 Acc: 0.7746
val Loss: 0.7163 Acc: 0.8235
Epoch 7/24
----------
train Loss: 0.6747 Acc: 0.7787
val Loss: 0.4213 Acc: 0.9150
Epoch 8/24
----------
train Loss: 0.3433 Acc: 0.8730
val Loss: 0.4561 Acc: 0.8954
Epoch 9/24
----------
train Loss: 0.3233 Acc: 0.8689
val Loss: 0.3941 Acc: 0.8954
Epoch 10/24
----------
train Loss: 0.2835 Acc: 0.8730
val Loss: 0.3506 Acc: 0.9020
Epoch 11/24
----------
train Loss: 0.2457 Acc: 0.9139
val Loss: 0.3053 Acc: 0.9216
Epoch 12/24
----------
train Loss: 0.3091 Acc: 0.8730
val Loss: 0.3371 Acc: 0.9085
Epoch 13/24
----------
train Loss: 0.2078 Acc: 0.9303
val Loss: 0.3243 Acc: 0.9085
Epoch 14/24
----------
train Loss: 0.3594 Acc: 0.8648
val Loss: 0.3140 Acc: 0.9085
Epoch 15/24
----------
train Loss: 0.2887 Acc: 0.8893
val Loss: 0.3262 Acc: 0.9085
Epoch 16/24
----------
train Loss: 0.2889 Acc: 0.8852
val Loss: 0.3708 Acc: 0.9020
Epoch 17/24
----------
train Loss: 0.2343 Acc: 0.9139
val Loss: 0.3128 Acc: 0.9150
Epoch 18/24
----------
train Loss: 0.2302 Acc: 0.9098
val Loss: 0.3695 Acc: 0.9085
Epoch 19/24
----------
train Loss: 0.2817 Acc: 0.8770
val Loss: 0.3139 Acc: 0.9150
Epoch 20/24
----------
train Loss: 0.3369 Acc: 0.8525
val Loss: 0.3220 Acc: 0.9020
Epoch 21/24
----------
train Loss: 0.2474 Acc: 0.8852
val Loss: 0.3119 Acc: 0.9150
Epoch 22/24
----------
train Loss: 0.3421 Acc: 0.8689
val Loss: 0.3006 Acc: 0.9216
Epoch 23/24
----------
train Loss: 0.3327 Acc: 0.8730
val Loss: 0.3396 Acc: 0.9085
Epoch 24/24
----------
train Loss: 0.2350 Acc: 0.9262
val Loss: 0.3474 Acc: 0.9085
Training complete in 1m 9s
Best val Acc: 0.941176
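Accuracy here is simply correct predictions divided by the split size. The logged values are consistent with a validation split of 153 images (for example 0.9412 * 153 is about 144); treat that size as an assumption inferred from the log. The sketch also shows why "Best val Acc" prints six decimals while the per-epoch lines print four: in '{:4f}' the 4 is a minimum field width, not a precision, so Python falls back to its default of six digits, whereas '{:.4f}' sets the precision.

```python
# Assumption: 144 correct out of 153 validation images (inferred from the log).
corrects, val_size = 144, 153
acc = corrects / val_size  # epoch_acc = running_corrects / dataset_sizes['val']

print('{:.4f}'.format(acc))  # 0.9412    per-epoch line format
print('{:4f}'.format(acc))   # 0.941176  "Best val Acc" format (width 4)
```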
ConvNet as fixed feature extractor
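Here, we need to freeze all of the network except the final layer. We set requires_grad = False on the parameters to freeze them, so that their gradients are not computed in backward().
You can read more about this in the autograd notes of the PyTorch documentation.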
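The effect of requires_grad = False can be checked directly: the forward pass still runs through the frozen layers, but backward() leaves their .grad as None and only fills in gradients for trainable parameters. The sketch below uses a hypothetical two-layer stand-in (body and head are illustrative names, not ResNet-18 modules) and random data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-in: "body" is frozen, "head" stays trainable.
body = nn.Linear(8, 4)
head = nn.Linear(4, 2)
for p in body.parameters():
    p.requires_grad = False

x = torch.randn(5, 8)
target = torch.tensor([0, 1, 0, 1, 1])

# The forward pass still goes through the frozen body...
loss = nn.CrossEntropyLoss()(head(torch.relu(body(x))), target)
# ...but backward() only populates gradients for the trainable head.
loss.backward()

print(body.weight.grad)              # None
print(head.weight.grad is not None)  # True
```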
model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)

model_conv = model_conv.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that only parameters of final layer are being optimized as
# opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
Train and evaluate
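On CPU this will take about half as long as the previous scenario. This is expected, since gradients do not need to be computed for most of the network. The forward pass, however, still has to be computed through the whole network.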
model_conv = train_model(model_conv, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=25)
Output:
Epoch 0/24
----------
train Loss: 0.5981 Acc: 0.6516
val Loss: 0.1949 Acc: 0.9412
Epoch 1/24
----------
train Loss: 0.5211 Acc: 0.7582
val Loss: 0.2598 Acc: 0.9020
Epoch 2/24
----------
train Loss: 0.4479 Acc: 0.8074
val Loss: 0.1734 Acc: 0.9542
Epoch 3/24
----------
train Loss: 0.3378 Acc: 0.8156
val Loss: 0.2151 Acc: 0.9281
Epoch 4/24
----------
train Loss: 0.5440 Acc: 0.7746
val Loss: 0.2629 Acc: 0.8824
Epoch 5/24
----------
train Loss: 0.4266 Acc: 0.7910
val Loss: 0.1794 Acc: 0.9346
Epoch 6/24
----------
train Loss: 0.4519 Acc: 0.8238
val Loss: 0.1495 Acc: 0.9412
Epoch 7/24
----------
train Loss: 0.3560 Acc: 0.8320
val Loss: 0.2136 Acc: 0.9346
Epoch 8/24
----------
train Loss: 0.3114 Acc: 0.8484
val Loss: 0.1568 Acc: 0.9412
Epoch 9/24
----------
train Loss: 0.3586 Acc: 0.8320
val Loss: 0.1704 Acc: 0.9542
Epoch 10/24
----------
train Loss: 0.3900 Acc: 0.8484
val Loss: 0.1714 Acc: 0.9412
Epoch 11/24
----------
train Loss: 0.3598 Acc: 0.8279
val Loss: 0.1672 Acc: 0.9412
Epoch 12/24
----------
train Loss: 0.2808 Acc: 0.8770
val Loss: 0.1608 Acc: 0.9412
Epoch 13/24
----------
train Loss: 0.4493 Acc: 0.7910
val Loss: 0.1614 Acc: 0.9477
Epoch 14/24
----------
train Loss: 0.3615 Acc: 0.8361
val Loss: 0.2065 Acc: 0.9281
Epoch 15/24
----------
train Loss: 0.3940 Acc: 0.8402
val Loss: 0.2140 Acc: 0.9346
Epoch 16/24
----------
train Loss: 0.3014 Acc: 0.8934
val Loss: 0.1559 Acc: 0.9477
Epoch 17/24
----------
train Loss: 0.3259 Acc: 0.8402
val Loss: 0.1603 Acc: 0.9477
Epoch 18/24
----------
train Loss: 0.3714 Acc: 0.8525
val Loss: 0.1664 Acc: 0.9412
Epoch 19/24
----------
train Loss: 0.3091 Acc: 0.8607
val Loss: 0.1965 Acc: 0.9346
Epoch 20/24
----------
train Loss: 0.3468 Acc: 0.8607
val Loss: 0.1600 Acc: 0.9346
Epoch 21/24
----------
train Loss: 0.3002 Acc: 0.8770
val Loss: 0.1672 Acc: 0.9477
Epoch 22/24
----------
train Loss: 0.3168 Acc: 0.8443
val Loss: 0.1595 Acc: 0.9412
Epoch 23/24
----------
train Loss: 0.3108 Acc: 0.8525
val Loss: 0.1531 Acc: 0.9477
Epoch 24/24
----------
train Loss: 0.3812 Acc: 0.8402
val Loss: 0.1546 Acc: 0.9346
Training complete in 0m 34s
Best val Acc: 0.954248

visualize_model(model_conv)
Further Learning
If you would like to learn more about transfer learning, check out the Quantized Transfer Learning for Computer Vision Tutorial.
References:
https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
https://pytorch.org/tutorials/intermediate/quantized_transfer_learning_tutorial.html
https://pytorch.org/docs/master/notes/autograd.html