Introduction to ResNet
Overview
The VGG network explored how deep a network could go while still delivering gains in classification accuracy. The common intuition is that a deeper network (more complex, with more parameters) has stronger representational power. Following this principle, CNN classifiers grew from AlexNet's 7 layers to VGG's 16 and 19 layers, and later to GoogLeNet's 22 layers. In practice, however, once a deep CNN reaches a certain depth, simply stacking more layers brings no further gain in classification performance; instead, the network converges more slowly and accuracy on the test dataset drops. Even after ruling out problems such as overfitting caused by a too-small dataset, an overly deep network still shows lower classification accuracy than a shallower one.
The figure above shows that for a plain CNN, the error on both the training set and the test set increases as the number of layers grows.
Residual Learning
The key idea of ResNet is to add shortcut connections to the network, in the spirit of Highway Networks. Earlier architectures applied a nonlinear transformation to the input, whereas a Highway Network allows a certain fraction of a previous layer's output to be carried forward. ResNet works in a very similar way, allowing the original input to be passed directly to later layers, as shown in the figure below.
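In code, the residual idea is tiny. The following is an illustrative sketch (the function and shapes are ours, not the paper's implementation):

```python
import torch

# Instead of learning a target mapping H(x) directly, a residual block learns
# the residual F(x) = H(x) - x and outputs F(x) + x via the identity shortcut.
def residual_forward(x, F):
    return F(x) + x  # identity shortcut: add the input back unchanged

x = torch.randn(4, 16, 8, 8)
# If F degenerates to zero, the block reduces to the identity mapping, so
# stacking such blocks can never make the network worse than a shallower one.
out = residual_forward(x, lambda t: torch.zeros_like(t))
print(torch.equal(out, x))  # True
```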
The ResNet Architecture
Two Kinds of Residual Units
Let us now look at the residual units themselves. ResNet uses two kinds, shown in the figure below: the left one is used in shallower networks, the right (bottleneck) one in deeper networks. For the shortcut connection, when the input and output have the same dimensions, the input can simply be added to the output. When the dimensions differ (typically the number of channels doubles), direct addition is impossible, and there are two strategies: (1) zero-padding the extra dimensions, usually preceded by downsampling (e.g. stride-2 pooling), which adds no parameters; (2) a projection shortcut, usually a 1x1 convolution, which adds both parameters and computation. Of course, projection shortcuts could also be used everywhere in place of the identity mapping.
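Strategy (2) can be sketched in a few lines. This is a standalone example with illustrative shapes (not taken from the paper's code):

```python
import torch
import torch.nn as nn

# When a stage doubles the channels and halves the spatial resolution,
# a 1x1 convolution with stride 2 on the shortcut path projects the input
# to the shape of the block's output, at the cost of extra parameters.
x = torch.randn(1, 64, 56, 56)  # e.g. 64 channels at 56x56
projection = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)
shortcut = projection(x)
print(shortcut.shape)  # torch.Size([1, 128, 28, 28]) -- now addable to the main path
```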
The authors compare 18-layer and 34-layer networks, as shown in Figure 7 of the paper: the plain network exhibits the degradation problem, while ResNet resolves it nicely.
Finally, Table 2 compares ResNet against other networks on ImageNet. ResNet-152 brings the error down to 4.49%, and an ensemble of models reduces it further to 3.57%.
A PyTorch Implementation of ResNet
import torch
import torchvision
import torch.nn as nn
import torchvision.transforms as transforms
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Hyperparameters
num_epochs = 80        # number of training epochs
learning_rate = 0.001  # learning rate
# Image preprocessing
transform = transforms.Compose([
    transforms.Pad(4),                  # pad the image by 4 pixels on each side
    transforms.RandomHorizontalFlip(),  # flip horizontally with probability p (default 0.5)
    transforms.RandomCrop(32),          # randomly crop back to 32x32
    transforms.ToTensor()               # convert the PIL image to a tensor
])
# Prepare the CIFAR-10 datasets
train_dataset = torchvision.datasets.CIFAR10(root='./dataset',
                                             train=True,
                                             download=True,
                                             transform=transform)
test_dataset = torchvision.datasets.CIFAR10(root='./dataset',
                                            train=False,
                                            transform=transforms.ToTensor())
# Data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=100,
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=100,
                                          shuffle=False)
# 3x3 convolution
def conv3x3(in_channels, out_channels, stride=1):
    return nn.Conv2d(in_channels, out_channels, kernel_size=3,
                     stride=stride, padding=1, bias=False)
# Residual block
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(ResidualBlock, self).__init__()
        self.conv1 = conv3x3(in_channels, out_channels, stride)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(out_channels, out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample:
            # project the input so its shape matches the block's output
            residual = self.downsample(x)
        out += residual  # the shortcut connection
        out = self.relu(out)
        return out
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=10):
        super(ResNet, self).__init__()
        self.in_channels = 16
        self.conv = conv3x3(3, 16)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU(inplace=True)
        self.layer1 = self.make_layer(block, 16, layers[0])
        self.layer2 = self.make_layer(block, 32, layers[1], 2)
        self.layer3 = self.make_layer(block, 64, layers[2], 2)
        self.avg_pool = nn.AvgPool2d(8)
        self.fc = nn.Linear(64, num_classes)

    def make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if (stride != 1) or (self.in_channels != out_channels):
            # projection shortcut for the first block when the shape changes
            downsample = nn.Sequential(
                conv3x3(self.in_channels, out_channels, stride=stride),
                nn.BatchNorm2d(out_channels))
        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels
        for i in range(1, blocks):
            layers.append(block(out_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv(x)
        out = self.bn(out)
        out = self.relu(out)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.avg_pool(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

model = ResNet(ResidualBlock, [2, 2, 2]).to(device)
# Loss function
criterion = nn.CrossEntropyLoss()
# Optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Helper to update the learning rate
def update_lr(optimizer, lr):
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
# Train the model
total_step = len(train_loader)
curr_lr = learning_rate
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print("Epoch [{}/{}], Step [{}/{}] Loss: {:.4f}"
                  .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

    # Decay the learning rate every 20 epochs
    if (epoch+1) % 20 == 0:
        curr_lr /= 3
        update_lr(optimizer, curr_lr)
# Evaluate on the test set
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print('Accuracy of the model on the test images: {} %'.format(100 * correct / total))
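The spatial bookkeeping behind `avg_pool = nn.AvgPool2d(8)` and `fc = nn.Linear(64, num_classes)` can be checked in isolation. Below is a standalone sketch in which plain stride-2 convolutions stand in for the residual stages:

```python
import torch
import torch.nn as nn

# 32x32 CIFAR input -> stage strides (1, 2, 2) -> 8x8 feature map,
# which AvgPool2d(8) collapses to 1x1, leaving 64 features for the Linear layer.
x = torch.randn(2, 3, 32, 32)
stem = nn.Conv2d(3, 16, 3, padding=1)               # mirrors conv3x3(3, 16)
stage2 = nn.Conv2d(16, 32, 3, stride=2, padding=1)  # stands in for layer2
stage3 = nn.Conv2d(32, 64, 3, stride=2, padding=1)  # stands in for layer3
feat = nn.AvgPool2d(8)(stage3(stage2(stem(x))))
print(feat.view(2, -1).shape)  # torch.Size([2, 64])
```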