- What is transfer learning?
When applying deep neural networks to problems with large-scale data, once the network is built we inevitably spend a great deal of compute and time training the model and tuning its parameters, and the model obtained at such cost can solve only that one problem, which is a poor return on the resources invested. If a model trained with those resources could instead solve a whole class of similar problems, its value would rise considerably, and this is what motivated transfer learning. With transfer learning, a trained model can be applied to a similar problem after only minor fine-tuning and still achieve good results; it is also an effective approach when the original problem has little data. Choosing an appropriate transfer learning method can therefore be a great help in solving the problem at hand.
Below we use the "Dogs vs. Cats" competition on Kaggle as an example to study transfer learning. The dataset is freely available. Its training set contains 25,000 images of cats and dogs: 12,500 cats and 12,500 dogs. The test set contains 12,500 images, but the cat and dog images in it are shuffled together and carry no labels. These datasets will be used to train the model, optimize its parameters, and finally validate its generalization ability. Download link: https://download.csdn.net/download/Galen_xia/12888565
1. Data preparation
After downloading the dataset, we take 2,000 cat images and 2,000 dog images, 4,000 in total, as a validation set. The directory structure is as follows:
|-DvC
|-train
|- cat
|- dog
|-valid
|- cat
|- dog
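The split above can be automated with a short script. The sketch below assumes all downloaded training images start out under DvC/train; the function name make_validation_split is ours, not part of any library.

```python
import os
import random
import shutil

def make_validation_split(train_dir, valid_dir,
                          classes=('cat', 'dog'), n_per_class=2000, seed=0):
    """Move n_per_class randomly chosen images per class from train_dir to valid_dir."""
    rng = random.Random(seed)
    for cls in classes:
        src = os.path.join(train_dir, cls)
        dst = os.path.join(valid_dir, cls)
        os.makedirs(dst, exist_ok=True)
        for name in rng.sample(os.listdir(src), n_per_class):
            shutil.move(os.path.join(src, name), os.path.join(dst, name))

# make_validation_split('DvC/train', 'DvC/valid')
```

A fixed seed keeps the split reproducible across runs.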
2. Data preprocessing, loading, and preview
import torch
from torchvision import datasets, transforms
import os
import time
from torch.autograd import Variable

data_dir = 'DvC'
data_transform = {
    x: transforms.Compose([transforms.Resize([64, 64]), transforms.ToTensor()])
    for x in ['train', 'valid']
}
image_datasets = {
    x: datasets.ImageFolder(root=os.path.join(data_dir, x), transform=data_transform[x])
    for x in ['train', 'valid']
}
dataloader = {
    x: torch.utils.data.DataLoader(dataset=image_datasets[x], batch_size=16, shuffle=True)
    for x in ['train', 'valid']
}
When loading the data, we use the Resize class from torchvision.transforms to scale every original image uniformly to 64×64. The code above defines both the transforms and the dataset loading as dictionaries: the training set and validation set each need their own loading definition, so dictionaries keep the code compact and make the subsequent calls and operations convenient.
os.path.join comes from the os package mentioned earlier; it joins the names passed in into one complete file path. Other commonly used os.path methods are described at the link below:
https://blog.csdn.net/Galen_xia/article/details/108800213
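For example, a few of the helpers used throughout this section behave as follows:

```python
import os

# os.path.join assembles path components with the platform's separator.
print(os.path.join('DvC', 'train'))       # 'DvC/train' on Linux/macOS

# Related helpers that often appear alongside it:
print(os.path.basename('DvC/train/cat'))  # 'cat'
print(os.path.splitext('cat.0.jpg'))      # ('cat.0', '.jpg')
```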
Next we fetch one batch of data for preview and analysis. The code is as follows:
x_example, y_example = next(iter(dataloader['train']))
print('number of x_example: {}'.format(len(x_example)))
print('number of y_example: {}'.format(len(y_example)))
number of x_example: 16
number of y_example: 16
The code above uses iter and next to fetch one batch of loaded data.
x_example is a Tensor. Because the images were resized, they are all 64×64 now, so x_example has shape (16, 3, 64, 64): 16 is the number of images in this batch; 3 is the number of color channels, since the original images are in color and use the R, G, and B channels; 64 is the image width and height.
y_example is also a Tensor, and its elements are all 0s and 1s. Why 0 and 1? When the data was loaded, the contents of the cat and dog folders were mapped to integer class labels, so 0 and 1 are the labels of the images, corresponding to cat pictures and dog pictures respectively. We can print the mapping to verify this correspondence. The code is as follows:
index_classes = image_datasets['train'].class_to_idx
print(index_classes)
{'cat': 0, 'dog': 1}
However, to make the labels on the images we plot later easier to recognize, we also store the original class names, obtained via image_datasets['train'].classes, in a variable named example_classes. The code is as follows:
example_classes = image_datasets['train'].classes
print(example_classes)
['cat', 'dog']
We use Matplotlib to plot one batch of images. The code is as follows:
import matplotlib.pyplot as plt
import torchvision
%matplotlib inline

img = torchvision.utils.make_grid(x_example)
img = img.numpy().transpose([1, 2, 0])
print([example_classes[i.item()] for i in y_example])
plt.imshow(img)
plt.show()
['dog', 'dog', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'dog']
3. Model building and parameter optimization
A custom VGGNet: we build a simplified VGGNet based on the VGG16 architecture. The simplified model requires input images scaled to 64×64, whereas the standard VGG16 expects 224×224 input. It also drops VGG16's last three convolutional layers and final pooling layer and changes the sizes of the fully connected layers; all of these changes reduce the number of trainable parameters. The code for the simplified model is as follows:
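The 4 * 4 * 512 input size of the first fully connected layer follows directly from the pooling arithmetic: each of the four 2×2 max-pool layers halves the spatial size of the 64×64 input, while the 3×3, stride-1, padding-1 convolutions leave it unchanged.

```python
# Four 2x2 max-pools each halve the spatial size: 64 -> 32 -> 16 -> 8 -> 4.
size = 64
for _ in range(4):
    size //= 2
print(size)               # 4
print(size * size * 512)  # 8192: in_features of the first Linear layer
```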
class Models(torch.nn.Module):
    def __init__(self):
        super(Models, self).__init__()
        self.Conv = torch.nn.Sequential(
            torch.nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            torch.nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            torch.nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.Classes = torch.nn.Sequential(
            torch.nn.Linear(4 * 4 * 512, 1024),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.5),
            torch.nn.Linear(1024, 1024),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.5),
            torch.nn.Linear(1024, 2)
        )

    def forward(self, input):
        x = self.Conv(input)
        x = x.view(-1, 4 * 4 * 512)
        x = self.Classes(x)
        return x
After the model is built, we print it to display its details. The code is as follows:
model = Models()
print(model)
Models(
  (Conv): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU()
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU()
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU()
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU()
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU()
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU()
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU()
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU()
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (Classes): Sequential(
    (0): Linear(in_features=8192, out_features=1024, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=1024, out_features=1024, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=1024, out_features=2, bias=True)
  )
)
Then we define the model's loss function and the optimizer for its parameters. The code is as follows:
loss_f = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

epoch_n = 10
time_open = time.time()

for epoch in range(epoch_n):
    print('Epoch {}/{}'.format(epoch + 1, epoch_n))
    print('----' * 10)
    for phase in ['train', 'valid']:
        if phase == 'train':
            print('Training...')
            model.train(True)
        else:
            print('Validing...')
            model.train(False)
        running_loss = 0.0
        running_corrects = 0
        for batch, data in enumerate(dataloader[phase], 1):
            x, y = data
            x, y = Variable(x), Variable(y)
            y_pred = model(x)
            _, pred = torch.max(y_pred.data, 1)
            optimizer.zero_grad()
            loss = loss_f(y_pred, y)
            if phase == 'train':
                loss.backward()
                optimizer.step()
            running_loss += loss.data
            running_corrects += torch.sum(pred == y.data)
            if batch % 500 == 0 and phase == 'train':
                print('Batch {},Train Loss:{},Train ACC:{}'.format(
                    batch, running_loss / batch, 100 * running_corrects / (16 * batch)))
        epoch_loss = running_loss * 16 / len(image_datasets[phase])
        epoch_acc = 100 * running_corrects / len(image_datasets[phase])
        print('{} Loss:{} ACC:{}'.format(phase, epoch_loss, epoch_acc))

time_end = time.time() - time_open
print(time_end)
In this code the optimizer is Adam and the loss function is cross-entropy; training runs for 10 epochs in total. The final output is as follows:
......
Epoch 9/10
----------------------------------------
Training...
Batch 500,Train Loss:0.5421267151832581,Train ACC:72
Batch 1000,Train Loss:0.5455202460289001,Train ACC:72
train Loss:0.5441449284553528 ACC:72
Validing...
valid Loss:0.5180226564407349 ACC:74
Epoch 10/10
----------------------------------------
Training...
Batch 500,Train Loss:0.5260576009750366,Train ACC:73
Batch 1000,Train Loss:0.5251520872116089,Train ACC:73
train Loss:0.5234497785568237 ACC:73
Validing...
valid Loss:0.5083054900169373 ACC:75
15662.592585325241
The accuracy is barely satisfactory, and because the entire run used only the CPU, the whole process was very time-consuming. Below we adjust the original code so that all parameters computed during training are moved to the GPU. The process is simple and convenient: we only need to convert the types of those parameters. Before doing so, of course, we must confirm that a GPU is available. The code is as follows:
use_gpu = torch.cuda.is_available()
print(use_gpu)
True
The returned value is True, which means the GPU meets all the conditions for use. The new training code is as follows:
use_gpu = torch.cuda.is_available()
if use_gpu:
    model = Models().cuda()
else:
    model = Models()

loss_f = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

epoch_n = 10
time_open = time.time()

for epoch in range(epoch_n):
    print('Epoch {}/{}'.format(epoch + 1, epoch_n))
    print('----' * 10)
    for phase in ['train', 'valid']:
        if phase == 'train':
            print('Training...')
            model.train(True)
        else:
            print('Validing...')
            model.train(False)
        running_loss = 0.0
        running_corrects = 0
        for batch, data in enumerate(dataloader[phase], 1):
            x, y = data
            if use_gpu:
                x, y = Variable(x.cuda()), Variable(y.cuda())
            else:
                x, y = Variable(x), Variable(y)
            y_pred = model(x)
            _, pred = torch.max(y_pred.data, 1)
            optimizer.zero_grad()
            loss = loss_f(y_pred, y)
            if phase == 'train':
                loss.backward()
                optimizer.step()
            running_loss += loss.data
            running_corrects += torch.sum(pred == y.data)
            if batch % 500 == 0 and phase == 'train':
                print('Batch {},Train Loss:{},Train ACC:{}'.format(
                    batch, running_loss / batch, 100 * running_corrects / (16 * batch)))
        epoch_loss = running_loss * 16 / len(image_datasets[phase])
        epoch_acc = 100 * running_corrects / len(image_datasets[phase])
        print('{} Loss:{} ACC:{}'.format(phase, epoch_loss, epoch_acc))

time_end = time.time() - time_open
print(time_end)
In the code above, model = Models().cuda() and x, y = Variable(x.cuda()), Variable(y.cuda()) are the lines that move the computation to the GPU. After training for 10 epochs, the output is as follows:
......
Epoch 9/10
----------------------------------------
Training...
Batch 500,Train Loss:0.5389402508735657,Train ACC:72
Batch 1000,Train Loss:0.5377429127693176,Train ACC:72
train Loss:0.5390774607658386 ACC:72
Validing...
valid Loss:0.5216068029403687 ACC:74
Epoch 10/10
----------------------------------------
Training...
Batch 500,Train Loss:0.5351917743682861,Train ACC:72
Batch 1000,Train Loss:0.5298460721969604,Train ACC:73
train Loss:0.5265329480171204 ACC:73
Validing...
valid Loss:0.498246431350708 ACC:75
1019.3638801574707
Compared with the previous run, the elapsed time dropped sharply; the GPU is clearly far more efficient than the CPU for these computations.
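As an aside, the same CPU/GPU switch is usually written today with a single torch.device object and .to(), which removes the use_gpu branching from the loop. A minimal sketch, where the Linear layer is merely a stand-in for the Models network:

```python
import torch

# One device object drives every placement decision.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Linear(8, 2).to(device)  # stand-in for Models()
x = torch.randn(4, 8).to(device)          # batches are moved the same way
y_pred = model(x)
print(y_pred.shape)  # torch.Size([4, 2])
```

The same code then runs unchanged on either hardware; only the device string differs.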
The complete code is given below:
import torch
from torchvision import datasets, transforms
import os
import time
from torch.autograd import Variable

data_dir = 'DvC'
data_transform = {
    x: transforms.Compose([transforms.Resize([64, 64]), transforms.ToTensor()])
    for x in ['train', 'valid']
}
image_datasets = {
    x: datasets.ImageFolder(root=os.path.join(data_dir, x), transform=data_transform[x])
    for x in ['train', 'valid']
}
dataloader = {
    x: torch.utils.data.DataLoader(dataset=image_datasets[x], batch_size=16, shuffle=True)
    for x in ['train', 'valid']
}

class Models(torch.nn.Module):
    def __init__(self):
        super(Models, self).__init__()
        self.Conv = torch.nn.Sequential(
            torch.nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            torch.nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            torch.nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.Classes = torch.nn.Sequential(
            torch.nn.Linear(4 * 4 * 512, 1024),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.5),
            torch.nn.Linear(1024, 1024),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.5),
            torch.nn.Linear(1024, 2)
        )

    def forward(self, input):
        x = self.Conv(input)
        x = x.view(-1, 4 * 4 * 512)
        x = self.Classes(x)
        return x

use_gpu = torch.cuda.is_available()
if use_gpu:
    model = Models().cuda()
else:
    model = Models()

loss_f = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

epoch_n = 10
time_open = time.time()

for epoch in range(epoch_n):
    print('Epoch {}/{}'.format(epoch + 1, epoch_n))
    print('----' * 10)
    for phase in ['train', 'valid']:
        if phase == 'train':
            print('Training...')
            model.train(True)
        else:
            print('Validing...')
            model.train(False)
        running_loss = 0.0
        running_corrects = 0
        for batch, data in enumerate(dataloader[phase], 1):
            x, y = data
            if use_gpu:
                x, y = Variable(x.cuda()), Variable(y.cuda())
            else:
                x, y = Variable(x), Variable(y)
            y_pred = model(x)
            _, pred = torch.max(y_pred.data, 1)
            optimizer.zero_grad()
            loss = loss_f(y_pred, y)
            if phase == 'train':
                loss.backward()
                optimizer.step()
            running_loss += loss.data
            running_corrects += torch.sum(pred == y.data)
            if batch % 500 == 0 and phase == 'train':
                print('Batch {},Train Loss:{},Train ACC:{}'.format(
                    batch, running_loss / batch, 100 * running_corrects / (16 * batch)))
        epoch_loss = running_loss * 16 / len(image_datasets[phase])
        epoch_acc = 100 * running_corrects / len(image_datasets[phase])
        print('{} Loss:{} ACC:{}'.format(phase, epoch_loss, epoch_acc))

time_end = time.time() - time_open
print(time_end)