CNN Model Trimming and Transfer Learning
Two Approaches to Transfer Learning
In practice, very few people train an entire convolutional network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images in 1000 categories) and then use the ConvNet either as an initialization or as a fixed feature extractor for the task of interest.
These two major transfer learning scenarios look as follows:
Finetuning the ConvNet: Instead of random initialization, we initialize the network with a pretrained network, such as one trained on the ImageNet 1000-class dataset. The rest of the training looks as usual. (Translator's note: standing on the shoulders of giants, the parameters of every layer are fine-tuned and updated as training continues on the new task.)
ConvNet as fixed feature extractor: Here, we freeze the weights of the whole network except the final fully connected layer. This last fully connected layer is replaced with a new one with random weights, and only this layer is trained. (Translator's note: only the fully connected layer is cut off and retrained; all preceding layers serve as the feature extractor.)
Core Code Explained
- Finetuning the pretrained ConvNet
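import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import models
# `device` and `train_model` are assumed to be defined earlier in the script.
model_ft = models.resnet18(pretrained=True)  # load the pretrained network (newer torchvision versions use the `weights=` argument)
num_ftrs = model_ft.fc.in_features  # number of input features of the fully connected layer
model_ft.fc = nn.Linear(num_ftrs, 2)  # replace the fully connected layer with a 2-class head
model_ft = model_ft.to(device)  # move the model to the chosen device (CPU or GPU)
criterion = nn.CrossEntropyLoss()  # loss function
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)  # optimizer over all parameters
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)  # learning-rate schedule
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=25)  # train and update the weights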
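The code above replaces the fully connected layer with a new one whose weights are randomly initialized, then finetunes the entire network.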
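The `StepLR(optimizer_ft, step_size=7, gamma=0.1)` call above multiplies the learning rate by `gamma` once every `step_size` epochs. As a rough illustration, the resulting schedule over 25 epochs can be sketched in plain Python. The helper name `step_lr` is hypothetical and only mirrors the decay rule; it is not part of PyTorch.

```python
def step_lr(base_lr: float, epoch: int, step_size: int = 7, gamma: float = 0.1) -> float:
    """Hypothetical helper: learning rate at `epoch` under a step decay rule."""
    # The rate drops by a factor of `gamma` after every `step_size` full epochs.
    return base_lr * gamma ** (epoch // step_size)


# Same settings as the snippet above: lr=0.001, step_size=7, gamma=0.1, 25 epochs.
schedule = [step_lr(0.001, epoch) for epoch in range(25)]

for epoch in (0, 6, 7, 14, 21):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.0e}")
```

With these settings the rate stays at 0.001 for epochs 0 to 6 and then drops by a factor of ten at epochs 7, 14, and 21.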
- ConvNet as fixed feature extractor
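import torch.nn as nn
import torch.optim as optim
import torchvision
from torch.optim import lr_scheduler
# `device` and `train_model` are assumed to be defined earlier in the script.
model_conv = torchvision.models.resnet18(pretrained=True)  # load the pretrained model
for param in model_conv.parameters():
    param.requires_grad = False  # freeze every pretrained parameter
num_ftrs = model_conv.fc.in_features  # number of input features of the fully connected layer
model_conv.fc = nn.Linear(num_ftrs, 2)  # new layer; its parameters have requires_grad=True by default
model_conv = model_conv.to(device)
criterion = nn.CrossEntropyLoss()
# Only the parameters of the new final layer are passed to the optimizer.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
model_conv = train_model(model_conv, criterion, optimizer_conv,
                         exp_lr_scheduler, num_epochs=25)  # train the model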
With the original parameters frozen, the model above trains only the weights of the newly initialized fully connected layer.
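To make the difference between the two setups concrete, the sketch below counts trainable parameters and checks that one optimizer step leaves the frozen layers untouched. To avoid downloading weights it uses `TinyNet`, a hypothetical stand-in that, like ResNet-18, ends in a layer named `fc`. The model, input sizes, and helper names are assumptions made for illustration and are not part of the original code.

```python
import torch
from torch import nn, optim


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained network with a final `fc` layer."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(self.features(x))


def count_trainable(model: nn.Module) -> int:
    """Number of parameters that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Finetuning: replace the head, keep every parameter trainable.
finetune_model = TinyNet()
finetune_model.fc = nn.Linear(finetune_model.fc.in_features, 2)

# Fixed feature extractor: freeze everything, then attach a fresh head.
frozen_model = TinyNet()
for param in frozen_model.parameters():
    param.requires_grad = False
frozen_model.fc = nn.Linear(frozen_model.fc.in_features, 2)

finetune_trainable = count_trainable(finetune_model)
frozen_trainable = count_trainable(frozen_model)
print(finetune_trainable, frozen_trainable)

# One SGD step on random data: only the new head should change.
conv_before = frozen_model.features[0].weight.detach().clone()
fc_before = frozen_model.fc.weight.detach().clone()
optimizer = optim.SGD(frozen_model.fc.parameters(), lr=0.1, momentum=0.9)
inputs = torch.randn(4, 3, 16, 16)
labels = torch.tensor([0, 1, 0, 1])
loss = nn.CrossEntropyLoss()(frozen_model(inputs), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

conv_unchanged = torch.equal(conv_before, frozen_model.features[0].weight)
fc_changed = not torch.equal(fc_before, frozen_model.fc.weight)
print(conv_unchanged, fc_changed)
```

Because the frozen layers need no gradients, the backward pass in the feature-extractor setup only has to compute gradients for the final layer.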
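In a test of both approaches, the finetuned model after 1 epoch and the feature-extractor model after 3 epochs reached comparable accuracy in comparable time; the two approaches showed no obvious difference.
Other pretrained models may behave differently. To be continued.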