Building the VGG Family of Networks with PyTorch

Preface

The VGG network replaces large convolution kernels with repeated stacks of small ones: two stacked 3×3 convolutions cover the same receptive field as a single 5×5 convolution, and three cover the same field as a 7×7, with fewer parameters and more nonlinearities. Keeping the receptive field fixed in this way lets the network grow deeper, which improves its feature-extraction capacity.
A VGG network can be viewed as a stack of several vgg_blocks, where each vgg_block consists of a few convolution + ReLU layers followed by one pooling layer. The number in a VGG network's name counts the parameterized layers in the whole network (convolutional or fully connected layers, excluding pooling layers).
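As a rough check of the parameter savings (a minimal sketch; the channel count C = 256 is an arbitrary example, and biases are ignored), compare one 7×7 convolution against three stacked 3×3 convolutions covering the same receptive field:

C = 256                           # example channel count (input = output)
single_7x7 = 7 * 7 * C * C        # weights in one 7x7 conv layer
triple_3x3 = 3 * (3 * 3 * C * C)  # weights in three stacked 3x3 conv layers
print(single_7x7, triple_3x3)     # 3211264 vs. 1769472, roughly a 45% saving
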
A classic VGG16 architecture diagram is shown below; diagrams for the other networks in the series can be found here.
[Figure: VGG16 network architecture]

The VGG Block

As the architecture diagram shows, a VGG block follows a fixed pattern: several consecutive 3×3 convolutional layers with padding 1, followed by one 2×2 max-pooling layer with stride 2. The convolutional layers preserve the input height and width, while the pooling layer halves them. We therefore start with a vgg_block function that implements this basic block; it lets you specify the number of convolutional layers and the input and output channel counts:

from torch import nn

def vgg_block(num_convs, in_channels, out_channels):
    blk = []
    for i in range(num_convs):
        if i == 0:
            blk.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        else:
            blk.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        blk.append(nn.ReLU(inplace=True))
    blk.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halve height and width
    return nn.Sequential(*blk)
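
As a quick sanity check (a minimal sketch; the channel counts and input size here are arbitrary), a dummy tensor passed through one block confirms that the channels change while the spatial dimensions are halved:

import torch

blk = vgg_block(2, 128, 256)
x = torch.randn(1, 128, 56, 56)
print(blk(x).shape)  # torch.Size([1, 256, 28, 28])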

The VGG Network

The architecture diagram further shows that a VGG network consists of 5 VGG blocks connected in sequence, followed by flattening and three fully connected layers. The number of convolutional layers in each VGG block varies, and this is exactly where the VGG variants differ from one another. Taking the pictured VGG16 as an example, the 5 VGG blocks contain (2, 2, 3, 3, 3) convolutional layers respectively; adding the 3 fully connected layers gives 16 parameterized layers in total, hence the name VGG16.
The input/output channel counts of the 5 VGG blocks are (3, 64), (64, 128), (128, 256), (256, 512), and (512, 512).
Below we implement VGGNet, where block is the VGG block function and the parameter layers is a list specifying how many convolutional layers each VGG block contains:

class VGGNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        super(VGGNet, self).__init__()
        self.vgg_block_1 = block(layers[0], 3, 64)
        self.vgg_block_2 = block(layers[1], 64, 128)
        self.vgg_block_3 = block(layers[2], 128, 256)
        self.vgg_block_4 = block(layers[3], 256, 512)
        self.vgg_block_5 = block(layers[4], 512, 512)
        self.fc = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes)
        )
        # Weight initialization (see the note after this class)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.vgg_block_1(x)
        x = self.vgg_block_2(x)
        x = self.vgg_block_3(x)
        x = self.vgg_block_4(x)
        x = self.vgg_block_5(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 512 * 7 * 7)
        return self.fc(x)

Note that the weights are given a proper initialization in __init__; without it the network is very hard to train.
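
As a quick forward-pass check (a minimal sketch, assuming the vgg_block and VGGNet definitions above are in scope; the layers list matches VGG16 and num_classes=10 is arbitrary), a dummy 224×224 batch should yield one logit vector per image:

import torch

net = VGGNet(vgg_block, [2, 2, 3, 3, 3], num_classes=10)
x = torch.randn(2, 3, 224, 224)
print(net(x).shape)  # torch.Size([2, 10])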

The VGG Family

To capture the differences among the VGG variants (the number of convolutional layers used in each VGG block), we now define the VGG11, VGG13, VGG16, and VGG19 networks:

def vgg11(**kwargs):
    model = VGGNet(vgg_block, [1, 1, 2, 2, 2], **kwargs)
    return model

def vgg13(**kwargs):
    model = VGGNet(vgg_block, [2, 2, 2, 2, 2], **kwargs)
    return model

def vgg16(**kwargs):
    model = VGGNet(vgg_block, [2, 2, 3, 3, 3], **kwargs)
    return model

def vgg19(**kwargs):
    model = VGGNet(vgg_block, [2, 2, 4, 4, 4], **kwargs)
    return model
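
As a cross-check of the naming convention (a small sketch that just repeats the lists above), each variant's number equals its total count of convolutional layers plus the three fully connected layers:

for name, layers in [("vgg11", [1, 1, 2, 2, 2]), ("vgg13", [2, 2, 2, 2, 2]),
                     ("vgg16", [2, 2, 3, 3, 3]), ("vgg19", [2, 2, 4, 4, 4])]:
    print(name, sum(layers) + 3)  # 11, 13, 16, 19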

Taking VGG16 as an example, let's print the network structure with torchsummary:

if __name__ == "__main__":
    from torchsummary import summary
    net = vgg16(num_classes=2)
    net.cuda()  # summary() defaults to device='cuda', so a GPU is required here
    summary(net, (3, 224, 224))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
              ReLU-2         [-1, 64, 224, 224]               0
            Conv2d-3         [-1, 64, 224, 224]          36,928
              ReLU-4         [-1, 64, 224, 224]               0
         MaxPool2d-5         [-1, 64, 112, 112]               0
            Conv2d-6        [-1, 128, 112, 112]          73,856
              ReLU-7        [-1, 128, 112, 112]               0
            Conv2d-8        [-1, 128, 112, 112]         147,584
              ReLU-9        [-1, 128, 112, 112]               0
        MaxPool2d-10          [-1, 128, 56, 56]               0
           Conv2d-11          [-1, 256, 56, 56]         295,168
             ReLU-12          [-1, 256, 56, 56]               0
           Conv2d-13          [-1, 256, 56, 56]         590,080
             ReLU-14          [-1, 256, 56, 56]               0
           Conv2d-15          [-1, 256, 56, 56]         590,080
             ReLU-16          [-1, 256, 56, 56]               0
        MaxPool2d-17          [-1, 256, 28, 28]               0
           Conv2d-18          [-1, 512, 28, 28]       1,180,160
             ReLU-19          [-1, 512, 28, 28]               0
           Conv2d-20          [-1, 512, 28, 28]       2,359,808
             ReLU-21          [-1, 512, 28, 28]               0
           Conv2d-22          [-1, 512, 28, 28]       2,359,808
             ReLU-23          [-1, 512, 28, 28]               0
        MaxPool2d-24          [-1, 512, 14, 14]               0
           Conv2d-25          [-1, 512, 14, 14]       2,359,808
             ReLU-26          [-1, 512, 14, 14]               0
           Conv2d-27          [-1, 512, 14, 14]       2,359,808
             ReLU-28          [-1, 512, 14, 14]               0
           Conv2d-29          [-1, 512, 14, 14]       2,359,808
             ReLU-30          [-1, 512, 14, 14]               0
        MaxPool2d-31            [-1, 512, 7, 7]               0
           Linear-32                 [-1, 4096]     102,764,544
             ReLU-33                 [-1, 4096]               0
          Dropout-34                 [-1, 4096]               0
           Linear-35                 [-1, 4096]      16,781,312
             ReLU-36                 [-1, 4096]               0
          Dropout-37                 [-1, 4096]               0
           Linear-38                    [-1, 2]           8,194
================================================================
Total params: 134,268,738
Trainable params: 134,268,738
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 218.58
Params size (MB): 512.19
Estimated Total Size (MB): 731.35
----------------------------------------------------------------

This matches the architecture diagram.
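
The total is dominated by the first fully connected layer; a one-line check of the arithmetic reproduces the 102,764,544 parameters reported for Linear-32 above:

# 512*7*7 inputs fully connected to 4096 outputs, plus 4096 biases
print(512 * 7 * 7 * 4096 + 4096)  # 102764544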

Loading the Data and Training the Model

Let's train a VGG16 model from scratch to classify the hotdog dataset:

import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import transforms, datasets, models
import time

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_dir = "../data/hotdog/train"
test_dir = "../data/hotdog/test"

# Resize/crop images to 224×224 and normalize with ImageNet channel statistics
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
train_augs = transforms.Compose([
    transforms.RandomResizedCrop(size=224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])
test_augs = transforms.Compose([
    transforms.Resize(size=256),
    transforms.CenterCrop(size=224),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])
train_set = datasets.ImageFolder(train_dir, transform=train_augs)
test_set = datasets.ImageFolder(test_dir, transform=test_augs)

batch_size = 32
train_iter = DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_iter = DataLoader(test_set, batch_size=batch_size)
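
# Sanity check: ImageFolder assigns label indices from sub-directory names in
# alphabetical order; printing the mapping helps avoid label mix-ups
# (the actual class names depend on how the dataset is laid out on disk).
print(train_set.class_to_idx)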

def train(net, train_iter, test_iter, criterion, optimizer, num_epochs):
    net = net.to(device)
    print("training on", device)
    for epoch in range(num_epochs):
        start = time.time()
        net.train()  # training mode
        train_loss_sum, train_acc_sum, n, batch_count = 0.0, 0.0, 0, 0
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()  # zero the gradients
            y_hat = net(X)
            loss = criterion(y_hat, y)
            loss.backward()
            optimizer.step()

            train_loss_sum += loss.cpu().item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1

        with torch.no_grad():
            net.eval()  # evaluation mode
            test_acc_sum, n2 = 0.0, 0
            for X, y in test_iter:
                test_acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().cpu().item()
                n2 += y.shape[0]

        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
              % (epoch + 1, train_loss_sum / batch_count, train_acc_sum / n, test_acc_sum / n2, time.time() - start))


from vgg import vgg16  # assumes the VGGNet code above is saved as vgg.py
net = vgg16(num_classes=2)

optimizer = optim.SGD(net.parameters(), lr=0.01)

loss = nn.CrossEntropyLoss()
train(net, train_iter, test_iter, loss, optimizer, num_epochs=5)

Training log:

training on cuda
epoch 1, loss 0.6647, train acc 0.583, test acc 0.636, time 43.1 sec
epoch 2, loss 0.5443, train acc 0.750, test acc 0.824, time 43.1 sec
epoch 3, loss 0.4223, train acc 0.809, test acc 0.836, time 43.5 sec
epoch 4, loss 0.4130, train acc 0.822, test acc 0.815, time 43.3 sec
epoch 5, loss 0.3920, train acc 0.830, test acc 0.843, time 43.3 sec