Study Notes (2): VGG16
(For more detail, see the original paper.)
Introduction to VGG16
- VGGNet is a deep convolutional neural network developed by the Visual Geometry Group at the University of Oxford together with researchers from Google DeepMind. It explores the relationship between the depth of a convolutional network and its performance: by repeatedly stacking 3×3 convolution kernels and 2×2 max-pooling layers, it successfully builds networks 16 to 19 layers deep. VGGNet took second place in the ILSVRC 2014 classification task and first place in the localization task, with a top-5 error rate of 7.5%. To this day, VGGNet is still widely used to extract image features.
- VGG was proposed in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition" to tackle the 1000-class ImageNet classification and localization challenge. As network depth was progressively increased, the paper's experiments showed that the 16-layer and 19-layer configurations performed best on this task.
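The stacking idea above can be checked numerically: two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution, but with fewer parameters and an extra non-linearity in between. A minimal sketch (the channel count `C = 64` is an arbitrary choice for illustration):

```python
import torch
import torch.nn as nn

C = 64  # arbitrary channel count for illustration

# Two stacked 3x3 convolutions: effective receptive field 5x5
stacked = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1),
)
# A single 5x5 convolution: same receptive field
single = nn.Conv2d(C, C, kernel_size=5, padding=2)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(stacked))  # 2 * (3*3*64*64 + 64) = 73856
print(count(single))   # 5*5*64*64 + 64      = 102464

# Both map a feature map to the same spatial size
x = torch.randn(1, C, 32, 32)
print(stacked(x).shape, single(x).shape)  # both (1, 64, 32, 32)
```

The stacked version uses roughly 28% fewer weights while being more expressive, which is exactly the trade-off the paper exploits.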
Network Structure
Network characteristics
- Smaller convolution kernels: compared with AlexNet, all kernels are replaced by 3 × 3, with 1 × 1 used only rarely;
- Smaller pooling windows: compared with AlexNet, the 3 × 3 pooling windows are all replaced by 2 × 2;
- Deeper network: taking VGG16 as an example, the channel count grows as 3 → 64 → 128 → 256 → 512; the convolution layers focus on widening the channels, expanding the 3 input channels into 512 feature channels;
- Narrower feature maps: taking VGG16 as an example, the spatial size shrinks as 224 → 112 → 56 → 28 → 14 → 7; the pooling layers focus on reducing the width and height;
- Fully connected layers converted to 1 × 1 convolutions: at test time the network can then accept inputs of arbitrary width and height.
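The fully-connected-to-convolution conversion in the last bullet can be sketched directly: an fc layer acting on a 7×7×512 feature map is equivalent to a 7×7 convolution, and every following fc layer becomes a 1×1 convolution, so the converted network produces a spatial map of scores for larger inputs instead of failing on them. A minimal sketch (layer sizes follow VGG16's fc6/fc7):

```python
import torch
import torch.nn as nn

# fc6 (512*7*7 -> 4096) re-expressed as a 7x7 convolution,
# fc7 (4096 -> 4096) re-expressed as a 1x1 convolution.
conv_fc = nn.Sequential(
    nn.Conv2d(512, 4096, kernel_size=7),
    nn.ReLU(inplace=True),
    nn.Conv2d(4096, 4096, kernel_size=1),
    nn.ReLU(inplace=True),
)

# On a 7x7 feature map the output is 1x1, matching the fc version...
print(conv_fc(torch.randn(1, 512, 7, 7)).shape)  # (1, 4096, 1, 1)
# ...but a larger feature map now yields a spatial grid of outputs
print(conv_fc(torch.randn(1, 512, 9, 9)).shape)  # (1, 4096, 3, 3)
```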
VGG16 Structure Diagram
VGG16 structure diagram:
ConvNet configurations:
Table 1: ConvNet configurations (shown in columns). The depth of the configurations increases from left (A) to right (E) as more layers are added (the added layers are shown in bold). The convolutional layer parameters are denoted as "conv⟨receptive field size⟩-⟨number of channels⟩". The ReLU activation function is not shown for brevity.
Table 2: Number of parameters (in millions).
In Table 2 we report the number of parameters for each configuration. In spite of their large depth, the number of weights in these networks is not greater than in a shallower network with larger convolutional layer widths and receptive fields.
CIFAR-10 Classification with VGG16
Based on the PyTorch framework
# VGG16 for CIFAR-10 classification, PyTorch 0.4.0.
# @Time: 2018/6/23
# @Author: xfLi
import torch
import torch.nn as nn
import math
import torchvision.transforms as transforms
import torchvision as tv
from torch.utils.data import DataLoader

model_path = './model_pth/vgg16_bn-6c64b313.pth'
BATCH_SIZE = 1
LR = 0.01
EPOCH = 1
class VGG(nn.Module):
    def __init__(self, features, num_classes=10):  # number of classes
        super(VGG, self).__init__()
        # feature extractor (convolution and pooling layers only, no classifier)
        self.features = features
        self.classifier = nn.Sequential(  # classifier
            # fc6
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(),
            nn.Dropout(),
            # fc7
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(),
            # fc8
            nn.Linear(4096, num_classes))
        # initialize weights
        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)  # flatten before the fully connected layers
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # He initialization for convolution layers
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()
# 'M' marks a max-pooling layer; numbers are output channel counts (VGG16, configuration D)
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']

# build the layer sequence from the configuration
def make_layers(cfg, batch_norm=False):
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            # v is the number of output channels of this convolution layer
            conv2d = nn.Conv2d(in_channels, v, 3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)  # a sequential container holding the network layers

def vgg16(**kwargs):
    model = VGG(make_layers(cfg, batch_norm=True), **kwargs)
    # model.load_state_dict(torch.load(model_path))  # optionally load pretrained weights
    return model
def getData():  # data loading and preprocessing
    transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])])
    trainset = tv.datasets.CIFAR10(root='data/', train=True, transform=transform, download=True)
    testset = tv.datasets.CIFAR10(root='data/', train=False, transform=transform, download=True)
    train_loader = DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True)
    test_loader = DataLoader(testset, batch_size=BATCH_SIZE, shuffle=False)
    classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    return train_loader, test_loader, classes
def train():
    trainset_loader, testset_loader, _ = getData()
    net = vgg16()
    print(net)
    # loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=LR)
    # train the model
    for epoch in range(EPOCH):
        net.train()
        for step, (inputs, labels) in enumerate(trainset_loader):
            optimizer.zero_grad()  # clear accumulated gradients
            output = net(inputs)
            loss = criterion(output, labels)
            loss.backward()
            optimizer.step()
            if step % 10 == 9:
                acc = test(net, testset_loader)
                net.train()  # switch back to training mode after evaluation
                print('Epoch', epoch, '|step ', step, 'loss: %.4f' % loss.item(), 'test accuracy:%.4f' % acc)
    print('Finished Training')
    return net
def test(net, testdata):
    correct, total = .0, .0
    net.eval()  # evaluation mode: disables dropout, freezes batch-norm statistics
    with torch.no_grad():  # no gradients needed during evaluation
        for inputs, labels in testdata:
            outputs = net(inputs)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum()
    return float(correct) / total
if __name__ == '__main__':
    net = train()
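Once trained, the network can be used for single-image prediction. A minimal sketch that reuses the `vgg16`/`getData` helpers above (the `predict` helper is my own addition, not part of the original script):

```python
import torch

def predict(net, image, classes):
    """Return the predicted class name for one preprocessed image tensor."""
    net.eval()
    with torch.no_grad():
        logits = net(image.unsqueeze(0))  # add a batch dimension
        idx = logits.argmax(dim=1).item()
    return classes[idx]

# usage:
# net = train()
# _, test_loader, classes = getData()
# image, _ = next(iter(test_loader.dataset))
# print(predict(net, image, classes))
```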
References
Simonyan, K., Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.