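torchvision.models contains many classic networks that have already been trained. For transfer learning, we can use the convolutional part of these pretrained networks as a feature extractor. Training then only needs to update the final fully connected layers, i.e. the classifier.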
1. First, borrow a pretrained AlexNet from torchvision.models:
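from torchvision import models
# Build the model and automatically download the pretrained ImageNet weights.
# torchvision >= 0.13 uses weights=...; older versions use pretrained=True instead.
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
print(alexnet)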
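Like most neural networks, AlexNet is organized into two main modules, features and classifier, joined by an avgpool layer. The features module extracts features and consists mainly of convolutional layers; the classifier module performs the classification and consists mainly of fully connected layers: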
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
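A quick way to see where in_features=9216 comes from is to apply the usual output-size formula, floor((n + 2*padding - kernel) / stride) + 1, to each convolution and max-pooling layer in features. The pure-Python sketch below (the helper names are illustrative, not part of torchvision) traces a 224×224 input down to 6×6, which gives 256 × 6 × 6 = 9216 values per image. The same arithmetic shows that a 32×32 CIFAR-10 image shrinks to an empty map at the last max-pool, and that inputs smaller than 63×63 fail the same way.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d/MaxPool2d layer (dilation=1, ceil_mode=False)."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) of every size-changing layer in alexnet.features;
# ReLU layers keep the shape unchanged, so they are left out.
FEATURE_LAYERS = [
    ("conv1", 11, 4, 2),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]


def trace(n):
    """Return (layer, spatial size) after each layer, stopping at the first empty map."""
    sizes = []
    for name, kernel, stride, padding in FEATURE_LAYERS:
        n = out_size(n, kernel, stride, padding)
        sizes.append((name, n))
        if n < 1:
            break  # PyTorch rejects an output size below 1 with an error
    return sizes


print(trace(224))  # ends with ('pool3', 6): 256 channels * 6 * 6 = 9216 features
print(trace(32))   # ends with ('pool3', 0): a 32x32 image is too small
```

For a 224×224 input the features output is already 6×6, so avgpool leaves it unchanged; for other input sizes AdaptiveAvgPool2d brings the map to 6×6, which keeps the classifier's input size fixed.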
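If the pretrained weights are already saved locally, we can load them without writing the network structure by hand:
import torch
alexnet = models.alexnet(weights=None)  # only build the model, do not download weights (older versions: pretrained=False)
alexnet.load_state_dict(torch.load(r'models/alexnet.pth'))  # load the local pretrained parameters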
2. Freeze the parameters of the convolutional layers
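The AlexNet classifier above outputs 1000 classes, which we do not need. To build a binary classifier, we redefine AlexNet's classifier module: the first two fully connected layers keep their original shapes, and the output of the last layer is changed to 2. Note that because all three Linear layers are created anew, their weights are freshly initialized rather than copied from the pretrained model: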
import torch.nn as nn

for param in alexnet.parameters():
    # Freeze all parameters so they are no longer updated
    param.requires_grad = False
alexnet.classifier = nn.Sequential(
    # Redefine the classifier; new parameters default to requires_grad=True
    nn.Dropout(),
    nn.Linear(256 * 6 * 6, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 2),  # 2 output classes instead of 1000
)
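The loop goes over every parameter in AlexNet and sets its requires_grad to False, so autograd no longer computes gradients for these parameters and they cannot be updated. The parameters of the newly defined classifier module keep the default requires_grad=True. This guarantees that during transfer learning only the fully connected layers are updated, while the feature-extraction layers stay fixed.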
3. Transfer-learning training
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(alexnet.classifier.parameters(), lr=0.001, momentum=0.9)  # only update the classifier's parameters
multisteplr = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2, 3], gamma=0.65)  # lr decay: multiply by 0.65 at epochs 2 and 3
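MultiStepLR multiplies the learning rate by gamma each time the number of scheduler.step() calls reaches a milestone; since the training loop steps the scheduler once per epoch, the milestones are epoch counts. The pure-Python function below is a hypothetical re-implementation of that rule for illustration only (it is not the PyTorch source). With the settings above, the three default training epochs run at 0.001, 0.001 and 0.00065.

```python
def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate MultiStepLR assigns after `epoch` scheduler steps:
    base_lr multiplied by gamma once for every milestone <= epoch."""
    reached = sum(1 for m in milestones if m <= epoch)
    return base_lr * gamma ** reached


# Same settings as above: lr=0.001, milestones=[2, 3], gamma=0.65.
schedule = [multistep_lr(0.001, [2, 3], 0.65, epoch) for epoch in range(4)]
print(schedule)  # roughly [0.001, 0.001, 0.00065, 0.0004225], up to float rounding
```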
Example training code:
CUDA = torch.cuda.is_available()
if CUDA:
    alexnet = alexnet.cuda()
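- data_loader (CIFAR-10): CIFAR-10 has 10 classes, so to train on the full dataset the last layer of the classifier above must be nn.Linear(4096, 10); with only 2 outputs, CrossEntropyLoss raises an error on labels 2 to 9. CIFAR-10 images are also only 32×32, too small to pass through AlexNet's feature layers (which were pretrained on 224×224 inputs), so they are resized first:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

trans = transforms.Compose([
    transforms.Resize(224),  # upscale the 32x32 images to the input size AlexNet expects
    transforms.ToTensor(),  # convert the PIL Image to a Tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics used in pretraining
                         std=[0.229, 0.224, 0.225])
])
train_set = datasets.CIFAR10(root="./datasets/cifar-10",
                             train=True,
                             transform=trans,  # raw samples are PIL Images
                             download=True)
test_set = datasets.CIFAR10(root="./datasets/cifar-10",
                            train=False,
                            transform=trans,
                            download=True)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=64, shuffle=False)  # no need to shuffle for evaluation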
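If you want to keep the 2-output classifier instead, one option (an assumption, not part of the original tutorial) is to keep only two CIFAR-10 classes and relabel them 0 and 1. The helper class below is hypothetical; it relies on torchvision's CIFAR10 storing all labels in its targets list, and a tiny fake dataset lets the sketch run offline.

```python
from torch.utils.data import Dataset


class TwoClassSubset(Dataset):
    """Hypothetical helper: keep two classes of a labelled dataset, relabelled 0 and 1."""

    def __init__(self, base, class_a, class_b):
        self.base = base
        self.mapping = {class_a: 0, class_b: 1}
        # torchvision's CIFAR10 exposes every label in the list attribute `targets`,
        # so the matching indices are found without decoding any images.
        self.indices = [i for i, t in enumerate(base.targets) if t in self.mapping]

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        image, label = self.base[self.indices[idx]]
        return image, self.mapping[label]


class FakeLabelled:
    """Tiny stand-in for CIFAR10 so the sketch runs offline: item i is (i, label)."""
    targets = [3, 1, 5, 3, 9, 5]

    def __getitem__(self, i):
        return i, self.targets[i]

    def __len__(self):
        return len(self.targets)


subset = TwoClassSubset(FakeLabelled(), 3, 5)  # in CIFAR-10, label 3 = cat, 5 = dog
print(len(subset), [subset[i] for i in range(len(subset))])
# 4 [(0, 0), (2, 1), (3, 0), (5, 1)]
```

With the real data, the loaders would become DataLoader(TwoClassSubset(train_set, 3, 5), batch_size=64, shuffle=True) and likewise for test_set.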
- train:
def train(model, criterion, optimizer, lr_step=None, epochs=3):
    model.train()  # enable Dropout (test() switches the model to eval mode)
    for epoch in range(epochs):
        running_loss = 0.0
        for i, (inputs, labels) in enumerate(train_loader):
            if CUDA:
                inputs, labels = inputs.cuda(), labels.cuda()
            optimizer.zero_grad()
            output = model(inputs)
            loss = criterion(output, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            if i % 10 == 9:  # report the average loss over the last 10 batches
                print("[Epoch:%d, Batch:%d] Loss: %.3f" % (epoch + 1, i + 1, running_loss / 10))
                running_loss = 0.0
        if lr_step is not None:
            lr_step.step()  # one scheduler step per epoch
            print(f"Lr: {optimizer.state_dict()['param_groups'][0]['lr']}")
    print("finish training.")
- test:
def test(test_loader, model):
    model.eval()  # disable Dropout for evaluation (set once, before the loop)
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for images, labels in test_loader:
            if CUDA:
                images, labels = images.cuda(), labels.cuda()
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)  # index of the highest score = predicted class
            total += labels.size(0)
            correct += (predicted == labels).sum().item()  # .item() keeps `correct` a Python int
    print(f'Accuracy on the test set: {100 * correct / total}%')
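test() turns the network's raw scores into predictions with torch.max(outputs, 1), which returns the maximum of each row together with its index; the index is the predicted class. A small worked example with made-up logits for four images and two classes:

```python
import torch

# Hypothetical logits: rows are images, columns are the two classes.
outputs = torch.tensor([[2.0, 1.0],
                        [0.1, 0.3],
                        [5.0, -1.0],
                        [0.2, 0.9]])
labels = torch.tensor([0, 1, 1, 1])

# torch.max along dim 1 returns (max values, their indices); the indices are predictions.
_, predicted = torch.max(outputs, 1)
correct = (predicted == labels).sum().item()  # .item() turns the 0-d tensor into an int
accuracy = 100 * correct / labels.size(0)
print(predicted.tolist(), correct, accuracy)  # [0, 1, 0, 1] 3 75.0
```

The third image scores class 0 higher although its label is 1, so 3 of the 4 predictions are correct.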