Building the VGG Family of Networks with PyTorch

Preface

The VGG network replaces large convolution kernels with repeated stacks of small ones: two stacked 3×3 convolutions cover the same receptive field as a single 5×5 convolution, and three cover the same field as a 7×7, with fewer parameters and more nonlinearities. Keeping the receptive field fixed in this way lets the network grow deeper, which improves its feature-extraction capacity.
A VGG network can be viewed as a stack of several vgg_blocks, where each vgg_block consists of a few convolution + ReLU layers followed by one pooling layer. The number in a VGG network's name counts the parameterized layers in the whole network (convolutional or fully connected layers, excluding pooling layers).
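As a rough check of the parameter savings (a minimal sketch; the channel count C = 256 is an arbitrary example, and biases are ignored), compare one 7×7 convolution against three stacked 3×3 convolutions covering the same receptive field:

C = 256                           # example channel count (input = output)
single_7x7 = 7 * 7 * C * C        # weights in one 7x7 conv layer
triple_3x3 = 3 * (3 * 3 * C * C)  # weights in three stacked 3x3 conv layers
print(single_7x7, triple_3x3)     # 3211264 vs. 1769472, roughly a 45% saving
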
A classic VGG16 architecture diagram is shown below; diagrams for the other networks in the series can be found here.
[Figure: VGG16 network architecture]

The VGG Block

As the architecture diagram shows, a VGG block follows a fixed pattern: several consecutive 3×3 convolutional layers with padding 1, followed by one 2×2 max-pooling layer with stride 2. The convolutional layers preserve the input height and width, while the pooling layer halves them. We therefore start with a vgg_block function that implements this basic block; it lets you specify the number of convolutional layers and the input and output channel counts:

from torch import nn

def vgg_block(num_convs, in_channels, out_channels):
    blk = []
    for i in range(num_convs):
        if i == 0:
            blk.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        else:
            blk.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        blk.append(nn.ReLU(inplace=True))
    blk.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halve height and width
    return nn.Sequential(*blk)
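
As a quick sanity check (a minimal sketch; the channel counts and input size here are arbitrary), a dummy tensor passed through one block confirms that the channels change while the spatial dimensions are halved:

import torch

blk = vgg_block(2, 128, 256)
x = torch.randn(1, 128, 56, 56)
print(blk(x).shape)  # torch.Size([1, 256, 28, 28])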

The VGG Network

The architecture diagram further shows that a VGG network consists of 5 VGG blocks connected in sequence, followed by flattening and three fully connected layers. The number of convolutional layers in each VGG block varies, and this is exactly where the VGG variants differ from one another. Taking the pictured VGG16 as an example, the 5 VGG blocks contain (2, 2, 3, 3, 3) convolutional layers respectively; adding the 3 fully connected layers gives 16 parameterized layers in total, hence the name VGG16.
The input/output channel counts of the 5 VGG blocks are (3, 64), (64, 128), (128, 256), (256, 512), and (512, 512).
Below we implement VGGNet, where block is the VGG block function and the parameter layers is a list specifying how many convolutional layers each VGG block contains:

class VGGNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        super(VGGNet, self).__init__()
        self.vgg_block_1 = block(layers[0], 3, 64)
        self.vgg_block_2 = block(layers[1], 64, 128)
        self.vgg_block_3 = block(layers[2], 128, 256)
        self.vgg_block_4 = block(layers[3], 256, 512)
        self.vgg_block_5 = block(layers[4], 512, 512)
        self.fc = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes)
        )
        # Weight initialization (see the note after this class)
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.vgg_block_1(x)
        x = self.vgg_block_2(x)
        x = self.vgg_block_3(x)
        x = self.vgg_block_4(x)
        x = self.vgg_block_5(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 512 * 7 * 7)
        return self.fc(x)

Note that the weights are given a proper initialization in __init__; without it the network is very hard to train.
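
As a quick forward-pass check (a minimal sketch, assuming the vgg_block and VGGNet definitions above are in scope; the layers list matches VGG16 and num_classes=10 is arbitrary), a dummy 224×224 batch should yield one logit vector per image:

import torch

net = VGGNet(vgg_block, [2, 2, 3, 3, 3], num_classes=10)
x = torch.randn(2, 3, 224, 224)
print(net(x).shape)  # torch.Size([2, 10])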

The VGG Family

To capture the differences among the VGG variants (the number of convolutional layers used in each VGG block), we now define the VGG11, VGG13, VGG16, and VGG19 networks:

def vgg11(**kwargs):
    model = VGGNet(vgg_block, [1, 1, 2, 2, 2], **kwargs)
    return model

def vgg13(**kwargs):
    model = VGGNet(vgg_block, [2, 2, 2, 2, 2], **kwargs)
    return model

def vgg16(**kwargs):
    model = VGGNet(vgg_block, [2, 2, 3, 3, 3], **kwargs)
    return model

def vgg19(**kwargs):
    model = VGGNet(vgg_block, [2, 2, 4, 4, 4], **kwargs)
    return model
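
As a cross-check of the naming convention (a small sketch that just repeats the lists above), each variant's number equals its total count of convolutional layers plus the three fully connected layers:

for name, layers in [("vgg11", [1, 1, 2, 2, 2]), ("vgg13", [2, 2, 2, 2, 2]),
                     ("vgg16", [2, 2, 3, 3, 3]), ("vgg19", [2, 2, 4, 4, 4])]:
    print(name, sum(layers) + 3)  # 11, 13, 16, 19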

Taking VGG16 as an example, let's print the network structure with torchsummary:

if __name__ == "__main__":
    from torchsummary import summary
    net = vgg16(num_classes=2)
    net.cuda()  # summary() defaults to device='cuda', so a GPU is required here
    summary(net, (3, 224, 224))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
              ReLU-2         [-1, 64, 224, 224]               0
            Conv2d-3         [-1, 64, 224, 224]          36,928
              ReLU-4         [-1, 64, 224, 224]               0
         MaxPool2d-5         [-1, 64, 112, 112]               0
            Conv2d-6        [-1, 128, 112, 112]          73,856
              ReLU-7        [-1, 128, 112, 112]               0
            Conv2d-8        [-1, 128, 112, 112]         147,584
              ReLU-9        [-1, 128, 112, 112]               0
        MaxPool2d-10          [-1, 128, 56, 56]               0
           Conv2d-11          [-1, 256, 56, 56]         295,168
             ReLU-12          [-1, 256, 56, 56]               0
           Conv2d-13          [-1, 256, 56, 56]         590,080
             ReLU-14          [-1, 256, 56, 56]               0
           Conv2d-15          [-1, 256, 56, 56]         590,080
             ReLU-16          [-1, 256, 56, 56]               0
        MaxPool2d-17          [-1, 256, 28, 28]               0
           Conv2d-18          [-1, 512, 28, 28]       1,180,160
             ReLU-19          [-1, 512, 28, 28]               0
           Conv2d-20          [-1, 512, 28, 28]       2,359,808
             ReLU-21          [-1, 512, 28, 28]               0
           Conv2d-22          [-1, 512, 28, 28]       2,359,808
             ReLU-23          [-1, 512, 28, 28]               0
        MaxPool2d-24          [-1, 512, 14, 14]               0
           Conv2d-25          [-1, 512, 14, 14]       2,359,808
             ReLU-26          [-1, 512, 14, 14]               0
           Conv2d-27          [-1, 512, 14, 14]       2,359,808
             ReLU-28          [-1, 512, 14, 14]               0
           Conv2d-29          [-1, 512, 14, 14]       2,359,808
             ReLU-30          [-1, 512, 14, 14]               0
        MaxPool2d-31            [-1, 512, 7, 7]               0
           Linear-32                 [-1, 4096]     102,764,544
             ReLU-33                 [-1, 4096]               0
          Dropout-34                 [-1, 4096]               0
           Linear-35                 [-1, 4096]      16,781,312
             ReLU-36                 [-1, 4096]               0
          Dropout-37                 [-1, 4096]               0
           Linear-38                    [-1, 2]           8,194
================================================================
Total params: 134,268,738
Trainable params: 134,268,738
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 218.58
Params size (MB): 512.19
Estimated Total Size (MB): 731.35
----------------------------------------------------------------

This matches the architecture diagram.
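
The total is dominated by the first fully connected layer; a one-line check of the arithmetic reproduces the 102,764,544 parameters reported for Linear-32 above:

# 512*7*7 inputs fully connected to 4096 outputs, plus 4096 biases
print(512 * 7 * 7 * 4096 + 4096)  # 102764544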

Loading the Data and Training the Model

Let's train a VGG16 model from scratch to classify the hotdog dataset:

import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import transforms, datasets, models
import time

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

train_dir = "../data/hotdog/train"
test_dir = "../data/hotdog/test"

# Resize/crop images to 224×224 and normalize with ImageNet channel statistics
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
train_augs = transforms.Compose([
    transforms.RandomResizedCrop(size=224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])
test_augs = transforms.Compose([
    transforms.Resize(size=256),
    transforms.CenterCrop(size=224),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])
train_set = datasets.ImageFolder(train_dir, transform=train_augs)
test_set = datasets.ImageFolder(test_dir, transform=test_augs)

batch_size = 32
train_iter = DataLoader(train_set, batch_size=batch_size, shuffle=True)
test_iter = DataLoader(test_set, batch_size=batch_size)
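
# Sanity check: ImageFolder assigns label indices from sub-directory names in
# alphabetical order; printing the mapping helps avoid label mix-ups
# (the actual class names depend on how the dataset is laid out on disk).
print(train_set.class_to_idx)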

def train(net, train_iter, test_iter, criterion, optimizer, num_epochs):
    net = net.to(device)
    print("training on", device)
    for epoch in range(num_epochs):
        start = time.time()
        net.train()  # training mode
        train_loss_sum, train_acc_sum, n, batch_count = 0.0, 0.0, 0, 0
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()  # zero the gradients
            y_hat = net(X)
            loss = criterion(y_hat, y)
            loss.backward()
            optimizer.step()

            train_loss_sum += loss.cpu().item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1

        with torch.no_grad():
            net.eval()  # evaluation mode
            test_acc_sum, n2 = 0.0, 0
            for X, y in test_iter:
                test_acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().cpu().item()
                n2 += y.shape[0]

        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
              % (epoch + 1, train_loss_sum / batch_count, train_acc_sum / n, test_acc_sum / n2, time.time() - start))


from vgg import vgg16  # assumes the VGGNet code above is saved as vgg.py
net = vgg16(num_classes=2)

optimizer = optim.SGD(net.parameters(), lr=0.01)

loss = nn.CrossEntropyLoss()
train(net, train_iter, test_iter, loss, optimizer, num_epochs=5)

Training log:

training on cuda
epoch 1, loss 0.6647, train acc 0.583, test acc 0.636, time 43.1 sec
epoch 2, loss 0.5443, train acc 0.750, test acc 0.824, time 43.1 sec
epoch 3, loss 0.4223, train acc 0.809, test acc 0.836, time 43.5 sec
epoch 4, loss 0.4130, train acc 0.822, test acc 0.815, time 43.3 sec
epoch 5, loss 0.3920, train acc 0.830, test acc 0.843, time 43.3 sec