[Reproducing Classic Models by Hand] VGG

Theory

AlexNet is effective, but it does not provide a general template to guide later researchers in designing new networks.

The basic building block of a classic convolutional neural network is the following sequence:

  • A convolutional layer with padding to preserve the resolution
  • A nonlinear activation function, such as ReLU
  • A pooling layer

A VGG block consists of a sequence of such convolutional layers, followed by a max-pooling layer for spatial downsampling. In the original VGG paper, the authors used 3x3 convolutions with padding 1 (which preserves the height and width) and 2x2 max pooling with stride 2 (which halves the resolution after each block), as the shape check below illustrates.
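Before building the full network, here is a quick shape check (a minimal sketch, not part of the original post's code): the 3x3 convolution with padding 1 leaves the spatial size unchanged, while the 2x2 max pooling with stride 2 halves it.

import torch
import torch.nn as nn

x = torch.randn(1, 64, 224, 224)  # dummy feature map: batch 1, 64 channels, 224x224
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

y = conv(x)
print(y.shape)        # torch.Size([1, 128, 224, 224]) -- padding 1 keeps 224x224
print(pool(y).shape)  # torch.Size([1, 128, 112, 112]) -- pooling halves the resolution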

Like AlexNet and LeNet, the VGG network can be divided into two parts: the first consists mainly of convolutional and pooling layers, and the second consists mainly of fully connected layers.
The original VGG network has five convolutional blocks: the first two blocks each contain one convolutional layer, and the last three each contain two. The first block has 64 output channels, and each subsequent block doubles that number until it reaches 512. With 8 convolutional layers and 3 fully connected layers in total, the network is known as VGG-11; the small sketch below tallies these counts.
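The per-block configuration can be written as (number of convolutions, output channels) pairs. The following minimal sketch (the name conv_arch is just illustrative) counts the weight layers that give VGG-11 its name.

# (number of 3x3 conv layers, output channels) for each of the five VGG-11 blocks
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

num_convs = sum(n for n, _ in conv_arch)  # 1 + 1 + 2 + 2 + 2 = 8 convolutional layers
num_fc = 3                                # two 4096-wide hidden layers plus the output layer
print(num_convs + num_fc)                 # 11 weight layers -> "VGG-11"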

Implementation

The dataset used here is FashionMNIST, with one small tweak: the original 28x28 images are resized to 224x224. This is because VGG (like AlexNet) was designed for ImageNet-sized inputs; since this is only a simple reproduction, there is no need to train on ImageNet itself.

import torch
import torchvision
import torch.nn as nn
from torch.utils.data import DataLoader
from tqdm import tqdm
from torchinfo import summary
import matplotlib.pyplot as plt

epochs = 10
batch_size = 128
lr = 0.001
device = 'cuda:0' if torch.cuda.is_available() else "cpu"


# resize the 28x28 Fashion-MNIST images to 224x224 to match the VGG input size
data_trans = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),
                                             torchvision.transforms.Resize((224, 224))])
train_dataset = torchvision.datasets.FashionMNIST("../00data", True, data_trans, download=True)
test_dataset = torchvision.datasets.FashionMNIST("../00data", False, data_trans, download=True)
train_dataloader = DataLoader(train_dataset, batch_size, True)
test_dataloader = DataLoader(test_dataset, batch_size, True)


# A VGG block: num_convs 3x3 convolutions (padding 1 preserves the resolution),
# each followed by ReLU, then a 2x2 max pooling that halves the height and width
def vgg_block(num_convs, in_channels, out_channels):
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels, 3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(2, 2))
    return nn.Sequential(*layers)

# (num_convs, output channels) per block -- the full VGG-11 configuration, matching the summary below
blocks = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

def vgg():
    in_channels = 1  # Fashion-MNIST images are single-channel
    layers = []
    for num_convs, out_channels in blocks:
        layers.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels

    # when nn.Sequential() is given a list of modules, unpack it with *
    # after five halvings, the 224x224 input is reduced to 7x7 feature maps
    return nn.Sequential(*layers, nn.Flatten(1),
                         nn.Linear(in_channels * 7 * 7, 4096), nn.ReLU(), nn.Dropout(),
                         nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
                         nn.Linear(4096, 10))
vgg11 = vgg().to(device)
celoss = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(vgg11.parameters(), lr=lr)

train_loss_all = []
test_loss_all = []
train_acc = []
test_acc = []
for epoch in range(epochs):

    train_loss = 0.0
    test_loss = 0.0
    right_num = 0.0

    vgg11.train()  # enable Dropout during training
    for inputs, labels in tqdm(train_dataloader):
        inputs = inputs.to(device)
        labels = labels.to(device)
        outputs = vgg11(inputs)
        loss = celoss(outputs, labels)
        train_loss += loss.detach().cpu().numpy()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        right = outputs.argmax(dim=1) == labels
        right_num += right.sum().detach().cpu().numpy()
    train_loss_all.append(train_loss / float(len(train_dataloader)))
    train_acc.append(right_num / len(train_dataset))

    vgg11.eval()  # disable Dropout during evaluation
    with torch.no_grad():
        right_num = 0.0
        for inputs, labels in tqdm(test_dataloader):
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = vgg11(inputs)
            loss = celoss(outputs, labels)
            test_loss += loss.detach().cpu().numpy()

            right = outputs.argmax(dim=1) == labels
            right_num += right.sum().detach().cpu().numpy()
        test_loss_all.append(test_loss / float(len(test_dataloader)))
        test_acc.append(right_num / len(test_dataset))
        print(f'epoch: {epoch + 1}, train_loss: {train_loss / len(train_dataloader)}, test_loss: {test_loss / len(test_dataloader)}, acc: {right_num / len(test_dataset) * 100}%')
x = range(1, epochs + 1)
plt.plot(x, train_loss_all, label = 'train_loss', linestyle='--')
plt.plot(x, test_loss_all, label = 'test_loss', linestyle='--')
plt.plot(x, train_acc, label = 'train_acc', linestyle='--')
plt.plot(x, test_acc, label = 'test_acc', linestyle='--')
plt.legend()
plt.show()

Following an earlier post on printing the output of each layer of a network, we can inspect the model with torchinfo:

vgg11 = vgg()
print(summary(vgg11, (1, 1, 224, 224)))
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
Sequential                               [1, 10]                   --
├─Sequential: 1-1                        [1, 64, 112, 112]         --
│    └─Conv2d: 2-1                       [1, 64, 224, 224]         640
│    └─ReLU: 2-2                         [1, 64, 224, 224]         --
│    └─MaxPool2d: 2-3                    [1, 64, 112, 112]         --
├─Sequential: 1-2                        [1, 128, 56, 56]          --
│    └─Conv2d: 2-4                       [1, 128, 112, 112]        73,856
│    └─ReLU: 2-5                         [1, 128, 112, 112]        --
│    └─MaxPool2d: 2-6                    [1, 128, 56, 56]          --
├─Sequential: 1-3                        [1, 256, 28, 28]          --
│    └─Conv2d: 2-7                       [1, 256, 56, 56]          295,168
│    └─ReLU: 2-8                         [1, 256, 56, 56]          --
│    └─Conv2d: 2-9                       [1, 256, 56, 56]          590,080
│    └─ReLU: 2-10                        [1, 256, 56, 56]          --
│    └─MaxPool2d: 2-11                   [1, 256, 28, 28]          --
├─Sequential: 1-4                        [1, 512, 14, 14]          --
│    └─Conv2d: 2-12                      [1, 512, 28, 28]          1,180,160
│    └─ReLU: 2-13                        [1, 512, 28, 28]          --
│    └─Conv2d: 2-14                      [1, 512, 28, 28]          2,359,808
│    └─ReLU: 2-15                        [1, 512, 28, 28]          --
│    └─MaxPool2d: 2-16                   [1, 512, 14, 14]          --
├─Sequential: 1-5                        [1, 512, 7, 7]            --
│    └─Conv2d: 2-17                      [1, 512, 14, 14]          2,359,808
│    └─ReLU: 2-18                        [1, 512, 14, 14]          --
│    └─Conv2d: 2-19                      [1, 512, 14, 14]          2,359,808
│    └─ReLU: 2-20                        [1, 512, 14, 14]          --
│    └─MaxPool2d: 2-21                   [1, 512, 7, 7]            --
├─Flatten: 1-6                           [1, 25088]                --
├─Linear: 1-7                            [1, 4096]                 102,764,544
├─ReLU: 1-8                              [1, 4096]                 --
├─Dropout: 1-9                           [1, 4096]                 --
├─Linear: 1-10                           [1, 4096]                 16,781,312
├─ReLU: 1-11                             [1, 4096]                 --
├─Dropout: 1-12                          [1, 4096]                 --
├─Linear: 1-13                           [1, 10]                   40,970
==========================================================================================
Total params: 128,806,154
Trainable params: 128,806,154
Non-trainable params: 0
Total mult-adds (G): 7.55
==========================================================================================
Input size (MB): 0.20
Forward/backward pass size (MB): 59.47
Params size (MB): 515.22
Estimated Total Size (MB): 574.90
==========================================================================================

Training Results

Training for 10 epochs on a T4 GPU on Colab gives the log below. As you can see: brute force works wonders!

epoch: 1, train_loss: 0.8505474812567615, test_loss: 0.38755199426337134, acc: 85.19%
epoch: 2, train_loss: 0.3051844020959919, test_loss: 0.3352961511928824, acc: 88.36%
epoch: 3, train_loss: 0.2583148484545222, test_loss: 0.26205789797668216, acc: 90.5%
epoch: 4, train_loss: 0.2272117935232262, test_loss: 0.25101515688473663, acc: 90.85%
epoch: 5, train_loss: 0.2091959472944233, test_loss: 0.24549187106799475, acc: 91.53%
epoch: 6, train_loss: 0.1918035917905475, test_loss: 0.24119293491674376, acc: 91.59%
epoch: 7, train_loss: 0.17655339621023328, test_loss: 0.24114930950388125, acc: 91.39%
epoch: 8, train_loss: 0.1605123550589405, test_loss: 0.25649984765656386, acc: 91.21%
epoch: 9, train_loss: 0.14848462215809424, test_loss: 0.23790175958147533, acc: 91.98%
epoch: 10, train_loss: 0.1359619579351406, test_loss: 0.2570774865489972, acc: 91.96%