An Easy-to-Follow Guide to the VGG Network Model

花花少年

Last modified: 2024-02-05 09:36:50

Category: Deep Learning | Tags: VGG

First published: 2024-01-23 08:48:00

Article link: https://blog.csdn.net/m0_37605642/article/details/135747233

Review what you have learned and gain new insight from it, and you are fit to be a teacher!

I. References

[Paper reading] VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION

Understanding the VGG Network in One Article

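II. The VGG Network Model

Original paper: [1]

1. Introduction to VGG

VGG was proposed by the Visual Geometry Group at the University of Oxford, which is where the name comes from.

VGG was among the first works to study systematically how network depth affects prediction accuracy. It found that smaller convolution kernels combined with a deeper network give better accuracy. The 16 and 19 in the familiar names VGG-16 and VGG-19 refer to the network depth, that is, 16 and 19 weight layers (convolutional plus fully connected). The VGGNet structure is very clean: VGG-16 and VGG-19 use 3x3 kernels in every convolutional layer. Compared with a single large kernel covering the same receptive field, this substantially reduces the number of parameters and also applies more non-linear mappings, which gives the model greater expressive power.

VGG networks of different depths have the same number of fully connected layers and differ only in the number of convolutional layers. For example, VGG-16 has 13 convolutional layers and 3 fully connected layers, while VGG-19 has 16 convolutional layers and 3 fully connected layers. VGG-19 has three more convolutional layers than VGG-16; the two share the same design and differ only in depth.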

VGG variant	Conv layers	FC layers
VGG11	8	3
VGG13	10	3
VGG16	13	3
VGG19	16	3
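
The counts in the table can be reproduced from the layer configurations used in torchvision's vgg.py (integers are convolution output channels, "M" marks a max-pooling layer). The following is a minimal sketch; the names `CFGS`, `NUM_FC_LAYERS`, and `count_layers` are made up for this illustration.

```python
# Layer configurations as listed in torchvision's vgg.py.
# Integers are conv output channels; "M" marks a 2x2 max-pooling layer.
CFGS = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

NUM_FC_LAYERS = 3  # every VGG variant ends with three fully connected layers


def count_layers(cfg):
    """Return (conv layers, FC layers, total weight layers) for one configuration."""
    conv = sum(1 for v in cfg if v != "M")  # pooling layers carry no weights
    return conv, NUM_FC_LAYERS, conv + NUM_FC_LAYERS


for name, cfg in CFGS.items():
    conv, fc, total = count_layers(cfg)
    print(f"{name}: {conv} conv + {fc} FC = {total} weight layers")
```

The total per variant is the number in the model name, which is why pooling layers are not counted.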

2. VGG vs AlexNet

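Compared with AlexNet (ILSVRC2012), ZFNet (ILSVRC2013), and OverFeat (ILSVRC2013), which had already achieved strong results in ILSVRC 2012-2013, VGG does not use a large kernel in the first convolutional layer (AlexNet uses an 11×11 kernel with stride 4 there; ZFNet and OverFeat use a 7×7 kernel with stride 2). VGG uses 3×3 kernels with stride 1 throughout, and several convolutional layers may be stacked back to back after the input layer with no pooling in between. The effect is as follows: with two stacked layers, a 3×3 window in the second layer covers a 5×5 region of the input; with three stacked layers, a 3×3 window in the third layer covers a 7×7 region. Put simply, three stacked 3x3 convolutions with stride 1 have a receptive field of size 7, so in terms of receptive field they match a single 7x7 convolution.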
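
The receptive-field arithmetic above can be sketched with the standard layer-by-layer recurrence: each layer widens the receptive field by (kernel size - 1) times the product of the strides of the layers before it. The function name `receptive_field` and the (kernel_size, stride) tuple format are choices made for this illustration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance in input pixels between neighbouring units of the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Stacks of 3x3 convolutions with stride 1, as used in VGG.
for n in (1, 2, 3):
    print(f"{n} stacked 3x3 conv(s): receptive field {receptive_field([(3, 1)] * n)}")

# A single 7x7 convolution for comparison.
print("one 7x7 conv: receptive field", receptive_field([(7, 1)]))
```

With stride 1 everywhere, n stacked 3x3 layers give a receptive field of 2n + 1, which matches the 5×5 and 7×7 figures quoted above.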

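Replacing one large kernel with several stacked layers of small kernels has the following benefits:

Stronger feature learning. Several stacked convolutional layers apply the ReLU activation several times, whereas a single layer applies it only once, so the stack can learn more discriminative features.
Fewer parameters. For example, three stacked convolutional layers with 3×3 kernels have $3 \times (3 \times 3 \times C \times C) = 27C^2$ weights, while a single convolutional layer with a 7×7 kernel has $7 \times 7 \times C \times C = 49C^2$ weights, where $C$ is both the number of input channels and the number of kernels (output channels) of each layer, and biases are ignored.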
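
The parameter comparison can be checked with a few lines of arithmetic. This is a minimal sketch that ignores biases, as the formulas above do; the helper name `conv_weights` and the choice of C = 512 are assumptions made for the example.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in one conv layer with a square kernel (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


C = 512  # example channel count; any positive value gives the same ratio

three_3x3 = 3 * conv_weights(3, C, C)  # three stacked 3x3 layers: 27 * C^2
one_7x7 = conv_weights(7, C, C)        # one 7x7 layer: 49 * C^2

print("three stacked 3x3 layers:", three_3x3)
print("one 7x7 layer:           ", one_7x7)
print("ratio:", round(three_3x3 / one_7x7, 3))
```

The ratio 27/49 does not depend on C, so the stacked design needs roughly 45% fewer weights for the same receptive field.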

Taking a 5×5 kernel as an example, two stacked 3×3 kernels cover the same 5×5 receptive field, as shown in the figure below:

[Figure: two stacked 3×3 convolutions covering a 5×5 receptive field]
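
The 5×5 example can also be checked numerically. The sketch below uses plain Python lists and assumes stride 1, no padding, a single channel, and no activation between the two layers. Under those assumptions, applying two 3×3 kernels in sequence gives exactly the same output as one merged 5×5 kernel. The helper names are made up for this illustration. In a real VGG block a ReLU sits between the layers, so the stack is no longer a single linear 5×5 filter, but its receptive field is still 5×5.

```python
import random


def correlate2d_valid(x, k):
    """Cross-correlation with stride 1 and no padding (a conv layer without bias)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [sum(x[i + u][j + v] * k[u][v] for u in range(kh) for v in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def compose_kernels(a, b):
    """Single kernel equivalent to applying kernel a, then kernel b, with no activation."""
    ah, aw, bh, bw = len(a), len(a[0]), len(b), len(b[0])
    merged = [[0] * (aw + bw - 1) for _ in range(ah + bh - 1)]
    for u in range(ah):
        for v in range(aw):
            for p in range(bh):
                for q in range(bw):
                    merged[u + p][v + q] += a[u][v] * b[p][q]
    return merged


rng = random.Random(0)


def rand_matrix(rows, cols):
    """Small random integer matrix, so that comparisons are exact."""
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]


x = rand_matrix(7, 7)                          # single-channel 7x7 input
k1, k2 = rand_matrix(3, 3), rand_matrix(3, 3)  # two 3x3 kernels

stacked = correlate2d_valid(correlate2d_valid(x, k1), k2)  # 7x7 -> 5x5 -> 3x3
merged = compose_kernels(k1, k2)                           # one 5x5 kernel
single = correlate2d_valid(x, merged)                      # 7x7 -> 3x3

print("merged kernel size:", len(merged), "x", len(merged[0]))
print("outputs identical:", stacked == single)
```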

3. VGG Network Architecture

[Figure: vgg-16]

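VGG consists of 5 convolutional blocks (Conv blocks). The number of channels doubles from one block to the next (64, 128, 256, 512) until it reaches 512, where it stays for the last block. Each block is followed by a max pooling operation, and the network ends with three fully connected layers. Later layers tend to hold more parameters, and the fully connected layers account for the largest share of the network's parameters. The detailed architecture is shown in the figure below: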

[Figure: VGG network architecture]
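
To make the statement about parameter share concrete, the following sketch counts the weights and biases of VGG-16 by hand. It assumes the torchvision layout: 3×3 convolutions with bias, a 512×7×7 input to the classifier, two hidden layers of 4096 units, and 1000 output classes. The function names are made up for this illustration.

```python
# VGG-16 configuration: integers are conv output channels, "M" is max pooling.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def conv_params(cfg, in_channels=3, kernel_size=3):
    """Total weights + biases of all conv layers in a configuration."""
    total = 0
    for v in cfg:
        if v == "M":
            continue  # pooling has no parameters
        total += kernel_size * kernel_size * in_channels * v + v
        in_channels = v
    return total


def fc_params(sizes):
    """Total weights + biases of a chain of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


conv_total = conv_params(VGG16_CFG)
fc_total = fc_params([512 * 7 * 7, 4096, 4096, 1000])
total = conv_total + fc_total

print(f"conv layers: {conv_total:,}")
print(f"FC layers:   {fc_total:,}")
print(f"total:       {total:,}")
print(f"FC share:    {fc_total / total:.1%}")
```

Under these assumptions, most of the parameters sit in the three fully connected layers, and the first of them (512×7×7 inputs to 4096 outputs) is by far the largest single layer.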

3.1 VGG-16 Network Architecture

[Figure: VGG-16 network architecture]

3.2 VGG-16 Input and Output Sizes

Taking VGG-16 as an example, with an input image of size (3, 224, 224), the output feature map sizes are as follows:

[Figure: VGG-16 output feature map sizes]
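
The feature map sizes can be traced by hand. The sketch below assumes the torchvision layer settings: 3×3 convolutions with stride 1 and padding 1 (height and width unchanged) and 2×2 max pooling with stride 2 (height and width halved). The function name `feature_shapes` is made up for this illustration.

```python
# VGG-16 configuration: integers are conv output channels, "M" is max pooling.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def feature_shapes(cfg, in_shape=(3, 224, 224)):
    """Return a list of (layer type, (C, H, W)) after every layer of the conv part."""
    c, h, w = in_shape
    shapes = []
    for v in cfg:
        if v == "M":
            h, w = h // 2, w // 2  # 2x2 max pooling, stride 2
            shapes.append(("maxpool", (c, h, w)))
        else:
            c = v                  # 3x3 conv, stride 1, padding 1: H and W unchanged
            shapes.append(("conv3x3", (c, h, w)))
    return shapes


shapes = feature_shapes(VGG16_CFG)
for layer, shape in shapes:
    print(f"{layer:8s} -> {shape}")

c, h, w = shapes[-1][1]
print("flattened classifier input:", c * h * w)
```

Only the pooling layers change the spatial size, so a 224×224 input is halved five times before it reaches the classifier.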

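4. Advantages of VGG

The VGGNet structure is very simple: the whole network uses 3x3 convolutions and 2x2 max pooling.
Stacking several convolutional layers with small kernels works better than using a single convolutional layer with a large kernel.
It showed that making the network progressively deeper can improve performance.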

III. Practical Notes

1. (PyTorch) Code Implementation

GitHub source: vgg.py

Taking the VGG-16 provided by torchvision as an example, the core of the implementation is as follows:

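from typing import Any, List, Optional, Union, cast

import torch
import torch.nn as nn

# Excerpt from torchvision/models/vgg.py. The helpers _log_api_usage_once, WeightsEnum,
# register_model, handle_legacy_interface and _ovewrite_named_param, as well as the
# VGG16_Weights enum, are defined elsewhere in torchvision and are not reproduced here.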


class VGG(nn.Module):
    def __init__(
        self, features: nn.Module, num_classes: int = 1000, init_weights: bool = True, dropout: float = 0.5
    ) -> None:
        super().__init__()
        _log_api_usage_once(self)
        self.features = features
        # If the input image is not (3, 224, 224), the feature map coming out of
        # self.features is not (512, 7, 7); adaptive pooling resizes it to (512, 7, 7).
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(p=dropout),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(p=dropout),
            nn.Linear(4096, num_classes),
        )
        if init_weights:
            for m in self.modules():
                if isinstance(m, nn.Conv2d):
                    nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
                    if m.bias is not None:
                        nn.init.constant_(m.bias, 0)
                elif isinstance(m, nn.BatchNorm2d):
                    nn.init.constant_(m.weight, 1)
                    nn.init.constant_(m.bias, 0)
                elif isinstance(m, nn.Linear):
                    nn.init.normal_(m.weight, 0, 0.01)
                    nn.init.constant_(m.bias, 0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
    
   
# Named cfgs because _vgg() below looks configurations up as cfgs[cfg].
cfgs = {
    'A': [64,     'M', 128,      'M', 256, 256,           'M', 512, 512,           'M', 512, 512,           'M'],   # 11 layers
    'B': [64, 64, 'M', 128, 128, 'M', 256, 256,           'M', 512, 512,           'M', 512, 512,           'M'],   # 13 layers
    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256,      'M', 512, 512, 512,      'M', 512, 512, 512,      'M'],   # 16 layers
    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],   # 19 layers
}


def make_layers(cfg: List[Union[str, int]], batch_norm: bool = False) -> nn.Sequential:
    layers: List[nn.Module] = []
    in_channels = 3
    for v in cfg:
        if v == "M":
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            v = cast(int, v)
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)

    
def _vgg(cfg: str, batch_norm: bool, weights: Optional[WeightsEnum], progress: bool, **kwargs: Any) -> VGG:
    if weights is not None:
        kwargs["init_weights"] = False
        if weights.meta["categories"] is not None:
            _ovewrite_named_param(kwargs, "num_classes", len(weights.meta["categories"]))
    model = VGG(make_layers(cfgs[cfg], batch_norm=batch_norm), **kwargs)
    if weights is not None:
        model.load_state_dict(weights.get_state_dict(progress=progress, check_hash=True))
    return model


@register_model()
@handle_legacy_interface(weights=("pretrained", VGG16_Weights.IMAGENET1K_V1))
def vgg16(*, weights: Optional[VGG16_Weights] = None, progress: bool = True, **kwargs: Any) -> VGG:
    """VGG-16 from `Very Deep Convolutional Networks for Large-Scale Image Recognition <https://arxiv.org/abs/1409.1556>`__.

    Args:
        weights (:class:`~torchvision.models.VGG16_Weights`, optional): The
            pretrained weights to use. See
            :class:`~torchvision.models.VGG16_Weights` below for
            more details, and possible values. By default, no pre-trained
            weights are used.
        progress (bool, optional): If True, displays a progress bar of the
            download to stderr. Default is True.
        **kwargs: parameters passed to the ``torchvision.models.vgg.VGG``
            base class. Please refer to the `source code
            <https://github.com/pytorch/vision/blob/main/torchvision/models/vgg.py>`_
            for more details about this class.

    .. autoclass:: torchvision.models.VGG16_Weights
        :members:
    """
    weights = VGG16_Weights.verify(weights)

    return _vgg("D", False, weights, progress, **kwargs)
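
The role of the adaptive pooling layer in the `VGG` class can be illustrated in isolation. This is a minimal sketch that needs only PyTorch: it feeds random tensors standing in for the output of `self.features` at three spatial sizes (1, 7, and 16, which is what 32×32, 224×224, and 512×512 inputs would give after five 2×2 poolings) through `nn.AdaptiveAvgPool2d((7, 7))`. The tensors and the loop are illustrative and are not part of torchvision.

```python
import torch
import torch.nn as nn

# Same pooling layer as in the VGG class: always outputs a 7x7 spatial grid.
pool = nn.AdaptiveAvgPool2d((7, 7))

results = {}
for side in (1, 7, 16):
    # Random stand-in for the feature map produced by the conv part of the network.
    features = torch.randn(1, 512, side, side)
    pooled = pool(features)
    flat = torch.flatten(pooled, 1)  # same flattening as in VGG.forward
    results[side] = (tuple(pooled.shape), tuple(flat.shape))
    print(tuple(features.shape), "->", tuple(pooled.shape), "->", tuple(flat.shape))
```

Because the pooled tensor always flattens to 512 * 7 * 7 values, the first `nn.Linear(512 * 7 * 7, 4096)` layer accepts it regardless of the input image size.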