1. Overview: In practice, very few people train an entire large convolutional network from scratch, because it is hard for an individual to obtain a sufficiently large dataset, and even training from scratch would not necessarily produce a satisfactory network. It is therefore common to pretrain a ConvNet on a very large dataset; the pretrained ConvNet can then be used either as an initialization or as a feature extractor. Several transfer learning strategies are introduced below.
1.1 ConvNet as a fixed feature extractor: download a ConvNet that has already been pretrained on ImageNet or another large dataset, remove its last fully connected layer, and attach the fully connected layer(s) you need. This is done because the output of the original network may not match your task, and only the output dimension of the final fully connected layer has to be changed. The concrete implementation is given later.
1.2 Fine-tuning the ConvNet: you can selectively freeze the first few layers of the pretrained ConvNet, or keep all layers trainable, and then update and optimize the network's weights on your own dataset, as shown in the sketch below.
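A minimal sketch of the "freeze the first few layers" variant, assuming a torchvision resnet18 and a two-class task; which layers are frozen here (conv1/bn1/layer1/layer2) is only an illustrative choice, not a recommendation from the original text:

import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)

# Freeze only the early layers; layer3, layer4 and the new fc stay trainable.
for name, param in model.named_parameters():
    if name.startswith(('conv1', 'bn1', 'layer1', 'layer2')):
        param.requires_grad = False

# Replace the final fully connected layer for a 2-class problem.
model.fc = nn.Linear(model.fc.in_features, 2)

# Pass only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=0.001, momentum=0.9)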
2. How to decide whether to fine-tune or to use the ConvNet only as a fixed feature extractor?
2.1 If the new dataset is small and similar to the original dataset, using the ConvNet as a fixed feature extractor is the better choice.
2.2 If the new dataset is large and similar to the original dataset, it is worth trying fine-tuning.
2.3 If the new dataset is small but very different from the original dataset, the best choice is usually to keep the CNN layers as a feature extractor and train a linear classifier on top of them (see the sketch after this list).
2.4 If the new dataset is large and very different from the original dataset, go ahead and try fine-tuning to make the original network adapt.
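A minimal sketch of item 2.3, assuming a torchvision resnet18 used as a frozen feature extractor and a scikit-learn LogisticRegression as the linear classifier; `dataloaders` is the dict of DataLoaders assumed by the training code later in this note:

import torch
import torch.nn as nn
import torchvision
from sklearn.linear_model import LogisticRegression

# Frozen backbone: replace the classification head with Identity to expose 512-d features.
backbone = torchvision.models.resnet18(pretrained=True)
backbone.fc = nn.Identity()
backbone.eval()

def extract_features(loader):
    feats, targets = [], []
    with torch.no_grad():
        for inputs, labels in loader:
            feats.append(backbone(inputs))
            targets.append(labels)
    return torch.cat(feats).numpy(), torch.cat(targets).numpy()

X_train, y_train = extract_features(dataloaders['train'])  # dataloaders assumed defined elsewhere
X_val, y_val = extract_features(dataloaders['val'])

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print('val acc:', clf.score(X_val, y_val))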
3. Retraining and optimizing the model:
import copy
import time
import torch

# dataloaders, dataset_sizes and device are assumed to be defined by the usual data-loading
# setup (a dict of DataLoaders, a dict of dataset sizes, and the torch.device in use).

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    # Arguments: model -- the model to train, criterion -- the loss function,
    # optimizer -- the optimizer, scheduler -- the learning-rate scheduler
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())  # copy the initial model weights
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and a validation phase, handled differently
        for phase in ['train', 'val']:
            if phase == 'train':
                scheduler.step()  # step the learning-rate decay
                model.train()     # set model to training mode
            else:
                model.eval()      # set model to evaluation mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over the data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward; track history only in the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only in the training phase;
                    # the validation set needs no parameter updates
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)         # sum the loss over all batches
                running_corrects += torch.sum(preds == labels.data)  # count correct predictions over all batches

            epoch_loss = running_loss / dataset_sizes[phase]              # divide by the sample count to get the mean loss
            epoch_acc = running_corrects.double() / dataset_sizes[phase]  # accuracy for this phase

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model whenever validation accuracy improves
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc  # record the best validation accuracy
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))  # report the best accuracy

    # load the weights from the most accurate epoch
    model.load_state_dict(best_model_wts)
    return model
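One caveat worth noting (my addition, not part of the original loop): since PyTorch 1.1, the recommended order is to call scheduler.step() after optimizer.step(), i.e. once per epoch after the training phase; calling it first, as in the tutorial-style loop above, produces a UserWarning on newer versions. A minimal sketch of the adjusted ordering, reusing the names assumed above:

for epoch in range(num_epochs):
    model.train()
    for inputs, labels in dataloaders['train']:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # step the scheduler after this epoch's optimizer updates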
4. Fine-tuning: taking resnet18 as the example.
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import models

model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 2)  # replace the fully connected layer; all earlier layers are kept and trained
model_ft = model_ft.to(device)        # device assumed defined as above

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs:
# lr = initial_lr * gamma ** (epoch // step_size)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=25)
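To see what the schedule above actually produces, here is a small standalone check of StepLR(step_size=7, gamma=0.1) over 25 epochs, using a throwaway parameter and reading the learning rate stored in the optimizer (my illustration, not part of the original code):

import torch
import torch.optim as optim
from torch.optim import lr_scheduler

param = torch.nn.Parameter(torch.zeros(1))  # throwaway parameter, just to build an optimizer
opt = optim.SGD([param], lr=0.001, momentum=0.9)
sched = lr_scheduler.StepLR(opt, step_size=7, gamma=0.1)

for epoch in range(25):
    # epochs 0-6 -> 0.001, 7-13 -> 0.0001, 14-20 -> 0.00001, 21-24 -> 0.000001
    print(epoch, opt.param_groups[0]['lr'])
    sched.step()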
5. ConvNet as a fixed feature extractor: only the last layer is trained; the earlier layers of resnet18 are frozen by setting requires_grad = False.
import torchvision

model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False  # freeze all pretrained layers

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)

model_conv = model_conv.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that only parameters of the final layer are being optimized,
# as opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
model_conv = train_model(model_conv, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=25)
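As a quick sanity check (my addition, not part of the original code), you can list which parameters will actually receive gradient updates; for the frozen model_conv only the two tensors of the new fc layer should appear:

trainable = [name for name, p in model_conv.named_parameters() if p.requires_grad]
print(trainable)  # expected: ['fc.weight', 'fc.bias']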