Study Notes (2): VGG16

(For more details, please see the original paper.)

VGG16 Overview

  • VGGNet is a deep convolutional neural network developed jointly by the Visual Geometry Group at the University of Oxford and researchers at Google DeepMind. It explores the relationship between the depth of a convolutional network and its performance: by repeatedly stacking 3 × 3 convolution kernels and 2 × 2 max-pooling layers, it successfully builds convolutional networks 16 to 19 layers deep. VGGNet took second place in the ILSVRC 2014 classification task and first place in the localization task, with a top-5 error rate of about 7.3%. VGGNet is still widely used today as a feature extractor for images.
  • VGG was introduced in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition" to tackle the 1000-class image classification and localization problems of the ImageNet challenge. As network depth was progressively increased, the paper's experiments showed that the 16-layer and 19-layer configurations performed best on this task.

Network Structure

Network Features

  1. Small convolution kernels: compared with AlexNet, all convolution kernels are replaced with 3 × 3, with 1 × 1 used only rarely;
  2. Small pooling kernels: compared with AlexNet, the 3 × 3 pooling kernels are all replaced with 2 × 2 pooling kernels;
  3. Greater depth: taking VGG16 as an example, the channel count grows as 3 → 64 → 128 → 256 → 512; the convolution layers focus on widening the channels, expanding the 3 input channels into 512 feature channels;
  4. Narrower feature maps: taking VGG16 as an example, the spatial size shrinks as 224 → 112 → 56 → 28 → 14 → 7; the pooling layers focus on reducing width and height;
  5. Fully connected layers converted to 1 × 1 convolutions: at test time the network can then accept inputs of arbitrary width and height.
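A quick back-of-the-envelope calculation (not from the original post) shows why point 1 pays off: two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field as a single 5 × 5 convolution, but with fewer weights (and an extra non-linearity in between):

```python
# Weight count for covering a 5x5 receptive field over C-channel feature maps
# (bias terms omitted). Two stacked 3x3 convs need 2 * 9 * C^2 weights,
# while a single 5x5 conv needs 25 * C^2.
C = 512  # example channel count, as in VGG16's deepest blocks

stacked_3x3 = 2 * (3 * 3 * C * C)
single_5x5 = 5 * 5 * C * C

print(stacked_3x3, single_5x5)  # 4718592 6553600
```

The saving (18C² vs. 25C²) grows even larger when three stacked 3 × 3 layers replace a 7 × 7 kernel.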

VGG16 Structure Diagram

[Figure: VGG16 structure diagram]
ConvNet configurations:
Table 1: ConvNet configurations (shown in columns). The depth of the configurations increases from left (A) to right (E) as more layers are added (the added layers are shown in bold). The convolutional layer parameters are denoted as "conv⟨receptive field size⟩-⟨number of channels⟩". The ReLU activation function is not shown for brevity.
Table 2: Number of parameters (in millions). Table 2 reports the number of parameters for each configuration. In spite of the large depth, the number of weights in these nets is not greater than in a shallower net with larger convolutional layer widths and receptive fields.
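As a sketch of where Table 2's figure for VGG16 comes from, the parameter count can be reproduced from the layer configuration (the same cfg list used in the code below; the 1000-way fc8 is for ImageNet, not CIFAR-10):

```python
# Rough VGG16 parameter count (weights + biases), matching Table 2's ~138M.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

params, in_ch = 0, 3
for v in cfg:
    if v == 'M':
        continue                      # max pooling has no parameters
    params += 3 * 3 * in_ch * v + v   # 3x3 conv weights + biases
    in_ch = v

# Fully connected layers fc6, fc7, fc8 (1000 classes for ImageNet)
for n_in, n_out in [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]:
    params += n_in * n_out + n_out

print(round(params / 1e6, 2))  # 138.36 (million parameters)
```

Note that the three fully connected layers account for roughly 124M of the 138M parameters, which is why later architectures replaced them with pooling.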

CIFAR-10 Classification with VGG16

Based on the PyTorch Framework

# PyTorch 0.4.0: VGG16 for CIFAR-10 classification.
# @Time: 2018/6/23
# @Author: xfLi

import torch
import torch.nn as nn
import math
import torchvision.transforms as transforms
import torchvision as tv
from torch.utils.data import DataLoader

model_path = './model_pth/vgg16_bn-6c64b313.pth'
BATCH_SIZE = 1
LR = 0.01
EPOCH = 1


class VGG(nn.Module):
    def __init__(self, features, num_classes=10):  # number of target classes
        super(VGG, self).__init__()
        # feature extractor (convolution and pooling layers only, no classifier)
        self.features = features
        self.classifier = nn.Sequential(  # classifier
            # fc6
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(),
            nn.Dropout(),

            # fc7
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(),

            # fc8
            nn.Linear(4096, num_classes))
        # initialize weights
        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()


cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']


# Build the list of layers from the configuration
def make_layers(cfg, batch_norm=False):
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            # set the convolution layer's output channel count
            conv2d = nn.Conv2d(in_channels, v, 3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)  # return a sequential container holding the layers


def vgg16(**kwargs):
    model = VGG(make_layers(cfg, batch_norm=True), **kwargs)
    # model.load_state_dict(torch.load(model_path))
    return model


def getData():  # data loading and preprocessing
    transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])])
    trainset = tv.datasets.CIFAR10(root='data/', train=True, transform=transform, download=True)
    testset = tv.datasets.CIFAR10(root='data/', train=False, transform=transform, download=True)

    train_loader = DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True)
    test_loader = DataLoader(testset, batch_size=BATCH_SIZE, shuffle=False)
    classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    return train_loader, test_loader, classes


def train():
    trainset_loader, testset_loader, _ = getData()
    net = vgg16()
    net.train()
    print(net)

    # Loss and Optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=LR)

    # Train the model
    for epoch in range(EPOCH):
        for step, (inputs, labels) in enumerate(trainset_loader):
            optimizer.zero_grad()  # zero the gradients
            output = net(inputs)
            loss = criterion(output, labels)
            loss.backward()
            optimizer.step()
            if step % 10 == 9:
                acc = test(net, testset_loader)
                print('Epoch', epoch, '|step ', step, 'loss: %.4f' % loss.item(), 'test accuracy:%.4f' % acc)
    print('Finished Training')
    return net


def test(net, testdata):
    net.eval()  # switch to evaluation mode (affects dropout and batch norm)
    correct, total = 0, 0
    with torch.no_grad():  # no gradients needed during evaluation
        for inputs, labels in testdata:
            outputs = net(inputs)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    net.train()  # restore training mode
    return correct / total


if __name__ == '__main__':
    net = train()
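As a standalone sanity check on the shapes above (no PyTorch required), tracing a 224 × 224 input through the cfg list shows why the classifier's first linear layer expects 512 * 7 * 7 inputs:

```python
# Trace the spatial size of a 224x224 input through the configuration:
# 3x3 convolutions with padding=1 preserve H and W, and every 'M'
# (2x2 max pooling with stride 2) halves them.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

size, channels = 224, 3
for v in cfg:
    if v == 'M':
        size //= 2
    else:
        channels = v

print(channels, size, size)  # 512 7 7 -> fc6 expects 512 * 7 * 7 inputs
```

This also explains why getData() uses RandomResizedCrop(224): with any other input size, the flattened feature vector would no longer match the 512 * 7 * 7 expected by fc6.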


References

Very Deep Convolutional Networks for Large-Scale Image Recognition

VGG: Paper Walkthrough

[Deep Neural Networks] III: The VGG Network Architecture Explained
