[ Image Classification ] Classic Network Models 3: VGG Explained and Reproduced

Horizon John

Last modified 2024-05-17 15:46:38


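Column: Classic Network Models. Tags: artificial intelligence, deep learning, image classification, VGG, neural networks

First published 2022-04-15 15:16:38

Article link: https://blog.csdn.net/weixin_45084253/article/details/124174109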


🤵 Author: Horizon John


🚀 Visual Geometry Group

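VGG is short for the Visual Geometry Group at the University of Oxford, the research group that proposed the network.

In the 2014 ImageNet challenge (ILSVRC), VGG took first place in the localisation task and second place in the classification task.

VGG showed that increasing network depth can, up to a point, improve a network's final performance.

🔗 Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition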


🚀 VGG16 in Detail

🎨 VGG16 Network Characteristics

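(1) Small convolution kernels: most are 3x3, and some configurations also use 1x1.
(2) Small pooling kernels: all pooling layers use a 2x2 window with stride 2.
(3) Deeper network: three stacked 3x3 convolutions replace one 7x7 convolution, and two stacked 3x3 convolutions replace one 5x5 convolution, covering the same receptive field with fewer parameters.
(4) Fully connected layers converted to convolutions: at test time, the three fully connected layers used in training are replaced by three convolutional layers.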
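The parameter saving in point (3) can be checked with a little arithmetic. The sketch below is illustrative only: the helper names and the channel count of 512 are assumptions made for the example, and biases are ignored.

```python
def conv_weights(kernel_size: int, channels: int) -> int:
    """Weights of one k x k convolution with `channels` inputs and outputs (bias ignored)."""
    return kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


C = 512  # illustrative channel count, chosen only for this example

for num_layers, big_kernel in [(2, 5), (3, 7)]:
    stacked = num_layers * conv_weights(3, C)
    single = conv_weights(big_kernel, C)
    # The stack sees the same input window as the single large kernel.
    assert stacked_receptive_field(3, num_layers) == big_kernel
    print(f"{num_layers} x (3x3): {stacked:,} weights vs one "
          f"{big_kernel}x{big_kernel}: {single:,} weights")
```

With C channels in and out, two 3x3 layers need 18C² weights against 25C² for one 5x5 layer, and three 3x3 layers need 27C² against 49C² for one 7x7 layer. The stack also inserts a ReLU after every layer, so it is more nonlinear than the single large kernel.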

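The 1x1 convolutions in configuration C increase the nonlinearity of the decision function without affecting the receptive fields of the convolutional layers.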
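To see why a 1x1 convolution leaves the receptive field unchanged, the following minimal sketch perturbs a single input pixel and checks that only the same output pixel changes. The channel count (4) and spatial size (5 x 5) are small illustrative values, not taken from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A 1x1 convolution followed by ReLU: a per-pixel linear map over channels
# plus a nonlinearity. Sizes here are illustrative only.
layer = nn.Sequential(nn.Conv2d(4, 4, kernel_size=1), nn.ReLU())

x = torch.randn(1, 4, 5, 5)
x_perturbed = x.clone()
x_perturbed[0, :, 2, 2] += 1.0  # change the input at one pixel only

with torch.no_grad():
    y = layer(x)
    y_perturbed = layer(x_perturbed)

# The spatial size is preserved.
same_shape = y.shape == x.shape

# Per-pixel absolute difference, summed over channels.
diff = (y - y_perturbed).abs().sum(dim=1)[0]
diff_elsewhere = diff.clone()
diff_elsewhere[2, 2] = 0.0  # ignore the pixel that was perturbed

print("same shape:", same_shape)
print("max change outside the perturbed pixel:", diff_elsewhere.max().item())
```

Each output pixel depends only on the input pixel at the same location, so the layer adds a nonlinearity without widening the window of input that any unit sees.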

🎨 VGG16 Network Structure

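Input size: 224 x 224 x 3
conv1: two 3x3 convolutions, channels 3 → 64, 2x2 max pooling, output 112 x 112 x 64
conv2: two 3x3 convolutions, channels 64 → 128, 2x2 max pooling, output 56 x 56 x 128
conv3: three 3x3 convolutions, channels 128 → 256, 2x2 max pooling, output 28 x 28 x 256
conv4: three 3x3 convolutions, channels 256 → 512, 2x2 max pooling, output 14 x 14 x 512
conv5: three 3x3 convolutions, channels 512 → 512, 2x2 max pooling, output 7 x 7 x 512
FC: three fully connected layers follow, giving an output of 1 x num_classes
At test time the three fully connected layers are replaced by three convolutions, giving an output of 1 x 1 x num_classes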
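The sizes listed above follow from the usual output-size formula, floor((n + 2p - k) / s) + 1. The sketch below traces a square input through the five blocks and then through a convolutional form of the classifier (one 7x7 convolution followed by two 1x1 convolutions, as the paper describes). The function names and stage labels are assumptions made for this example.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_vgg16(side: int, num_classes: int = 1000):
    """Return (stage, side, channels) after each stage for a square input."""
    blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]  # (convs, channels)
    shapes = []
    for i, (num_convs, channels) in enumerate(blocks, start=1):
        for _ in range(num_convs):
            side = conv_out(side, kernel=3, padding=1)  # 3x3 conv keeps the size
        side = conv_out(side, kernel=2, stride=2)       # 2x2 max pool halves it
        shapes.append((f"conv{i}", side, channels))
    # Test-time head: the first FC layer becomes a 7x7 convolution,
    # the other two become 1x1 convolutions.
    side = conv_out(side, kernel=7)
    shapes.append(("FC1 as 7x7 conv", side, 4096))
    shapes.append(("FC2 as 1x1 conv", side, 4096))
    shapes.append(("FC3 as 1x1 conv", side, num_classes))
    return shapes


for stage, side, channels in trace_vgg16(224):
    print(f"{stage}: {side} x {side} x {channels}")
```

Calling `trace_vgg16(384)` instead ends with a 6 x 6 x num_classes map: once the head is convolutional, a larger test image yields a grid of class scores rather than a single vector, and that grid can be averaged spatially.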

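VGG16 corresponds to configurations C and D in the paper, which differ in kernel size: C uses 1x1 kernels for the last convolution of conv3, conv4 and conv5, while D uses 3x3 kernels throughout.
Implementations also commonly differ in whether BatchNorm2d() follows each convolution. Batch normalization is not part of the original paper; the reproduction below uses it.
Without BN: Conv2d → ReLU
With BN: Conv2d → BatchNorm2d → ReLU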
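The two layer orderings above can be sketched as a small builder. `conv_block` and `count_params` are hypothetical helpers written for this example; dropping the convolution bias when BatchNorm2d is present is a common choice, not something the paper prescribes.

```python
import torch
import torch.nn as nn


def conv_block(in_channels: int, out_channels: int, use_bn: bool) -> nn.Sequential:
    """3x3 convolution unit, optionally with BatchNorm2d before the ReLU."""
    # With BatchNorm2d the conv bias is redundant (BN has its own shift), so drop it.
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1,
                        bias=not use_bn)]
    if use_bn:
        layers.append(nn.BatchNorm2d(out_channels))
    layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)


def count_params(module: nn.Module) -> int:
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


plain = conv_block(3, 64, use_bn=False)    # Conv2d -> ReLU
with_bn = conv_block(3, 64, use_bn=True)   # Conv2d -> BatchNorm2d -> ReLU

x = torch.randn(2, 3, 32, 32)  # small random batch, illustrative only
print(plain(x).shape, count_params(plain))
print(with_bn(x).shape, count_params(with_bn))
```

Both variants produce the same output shape. The BatchNorm2d variant trades the 64 bias terms for 128 scale and shift parameters, so the difference in size is negligible next to the convolution weights.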

[Figure: VGG network structure diagram]

🚀 VGG16 Reproduction

# Here is the code:

import torch
import torch.nn as nn
from torchinfo import summary

vgg_layer = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}


class VGG(nn.Module):
    def __init__(self, vgg_name, num_classes=1000):
        super(VGG, self).__init__()
        self.features = self._make_layers(vgg_layer[vgg_name])
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        out = self.classifier(x)
        return out

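    def _make_layers(self, vgg_layer):
        # Build the feature extractor from a config list:
        # 'M' adds a 2x2 max pool, an integer adds Conv2d -> BatchNorm2d -> ReLU.
        layers = []
        in_channels = 3  # RGB input
        for x in vgg_layer:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                # bias=False: BatchNorm2d supplies its own learnable shift
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1, bias=False),
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x
        return nn.Sequential(*layers)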

def VGG11():
    return VGG('VGG11')


def VGG13():
    return VGG('VGG13')


def VGG16():
    return VGG('VGG16')


def VGG19():
    return VGG('VGG19')


def test():
    net = VGG16()
    y = net(torch.randn(1, 3, 224, 224))
    print(y.size())
    summary(net, (1, 3, 224, 224))


if __name__ == '__main__':
    test()

Output:

torch.Size([1, 1000])
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
VGG                                      --                        --
├─Sequential: 1-1                        [1, 512, 7, 7]            --
│    └─Conv2d: 2-1                       [1, 64, 224, 224]         1,728
│    └─BatchNorm2d: 2-2                  [1, 64, 224, 224]         128
│    └─ReLU: 2-3                         [1, 64, 224, 224]         --
│    └─Conv2d: 2-4                       [1, 64, 224, 224]         36,864
│    └─BatchNorm2d: 2-5                  [1, 64, 224, 224]         128
│    └─ReLU: 2-6                         [1, 64, 224, 224]         --
│    └─MaxPool2d: 2-7                    [1, 64, 112, 112]         --
│    └─Conv2d: 2-8                       [1, 128, 112, 112]        73,728
│    └─BatchNorm2d: 2-9                  [1, 128, 112, 112]        256
│    └─ReLU: 2-10                        [1, 128, 112, 112]        --
│    └─Conv2d: 2-11                      [1, 128, 112, 112]        147,456
│    └─BatchNorm2d: 2-12                 [1, 128, 112, 112]        256
│    └─ReLU: 2-13                        [1, 128, 112, 112]        --
│    └─MaxPool2d: 2-14                   [1, 128, 56, 56]          --
│    └─Conv2d: 2-15                      [1, 256, 56, 56]          294,912
│    └─BatchNorm2d: 2-16                 [1, 256, 56, 56]          512
│    └─ReLU: 2-17                        [1, 256, 56, 56]          --
│    └─Conv2d: 2-18                      [1, 256, 56, 56]          589,824
│    └─BatchNorm2d: 2-19                 [1, 256, 56, 56]          512
│    └─ReLU: 2-20                        [1, 256, 56, 56]          --
│    └─Conv2d: 2-21                      [1, 256, 56, 56]          589,824
│    └─BatchNorm2d: 2-22                 [1, 256, 56, 56]          512
│    └─ReLU: 2-23                        [1, 256, 56, 56]          --
│    └─MaxPool2d: 2-24                   [1, 256, 28, 28]          --
│    └─Conv2d: 2-25                      [1, 512, 28, 28]          1,179,648
│    └─BatchNorm2d: 2-26                 [1, 512, 28, 28]          1,024
│    └─ReLU: 2-27                        [1, 512, 28, 28]          --
│    └─Conv2d: 2-28                      [1, 512, 28, 28]          2,359,296
│    └─BatchNorm2d: 2-29                 [1, 512, 28, 28]          1,024
│    └─ReLU: 2-30                        [1, 512, 28, 28]          --
│    └─Conv2d: 2-31                      [1, 512, 28, 28]          2,359,296
│    └─BatchNorm2d: 2-32                 [1, 512, 28, 28]          1,024
│    └─ReLU: 2-33                        [1, 512, 28, 28]          --
│    └─MaxPool2d: 2-34                   [1, 512, 14, 14]          --
│    └─Conv2d: 2-35                      [1, 512, 14, 14]          2,359,296
│    └─BatchNorm2d: 2-36                 [1, 512, 14, 14]          1,024
│    └─ReLU: 2-37                        [1, 512, 14, 14]          --
│    └─Conv2d: 2-38                      [1, 512, 14, 14]          2,359,296
│    └─BatchNorm2d: 2-39                 [1, 512, 14, 14]          1,024
│    └─ReLU: 2-40                        [1, 512, 14, 14]          --
│    └─Conv2d: 2-41                      [1, 512, 14, 14]          2,359,296
│    └─BatchNorm2d: 2-42                 [1, 512, 14, 14]          1,024
│    └─ReLU: 2-43                        [1, 512, 14, 14]          --
│    └─MaxPool2d: 2-44                   [1, 512, 7, 7]            --
├─AdaptiveAvgPool2d: 1-2                 [1, 512, 7, 7]            --
├─Sequential: 1-3                        [1, 1000]                 --
│    └─Linear: 2-45                      [1, 4096]                 102,764,544
│    └─ReLU: 2-46                        [1, 4096]                 --
│    └─Dropout: 2-47                     [1, 4096]                 --
│    └─Linear: 2-48                      [1, 4096]                 16,781,312
│    └─ReLU: 2-49                        [1, 4096]                 --
│    └─Dropout: 2-50                     [1, 4096]                 --
│    └─Linear: 2-51                      [1, 1000]                 4,097,000
==========================================================================================
Total params: 138,361,768
Trainable params: 138,361,768
Non-trainable params: 0
Total mult-adds (G): 15.47
==========================================================================================
Input size (MB): 0.60
Forward/backward pass size (MB): 216.83
Params size (MB): 553.45
Estimated Total Size (MB): 770.88
==========================================================================================
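The parameter total in the summary can be reproduced by hand, which is a useful sanity check on the layer configuration. This is a minimal sketch that assumes the same setup as the code above: 3x3 convolutions with bias=False, BatchNorm2d after every convolution, and three Linear layers with biases.

```python
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']  # the 'VGG16' entry of vgg_layer

conv_params = 0
bn_params = 0
in_channels = 3
for v in cfg:
    if v == 'M':
        continue  # max pooling has no parameters
    conv_params += in_channels * v * 3 * 3  # 3x3 conv weights, no bias
    bn_params += 2 * v                      # BatchNorm2d weight and bias per channel
    in_channels = v

# Three Linear layers, each with a weight matrix and a bias vector.
num_classes = 1000
fc_params = ((512 * 7 * 7 * 4096 + 4096)
             + (4096 * 4096 + 4096)
             + (4096 * num_classes + num_classes))

total = conv_params + bn_params + fc_params
print(f"conv={conv_params:,} bn={bn_params:,} fc={fc_params:,} total={total:,}")
# conv=14,710,464 bn=8,448 fc=123,642,856 total=138,361,768
```

Almost 90% of the parameters sit in the three fully connected layers, and the first one alone accounts for about 103 million. At 4 bytes per float32 parameter, the total also matches the reported parameter size of roughly 553 MB.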
