VGG网络介绍及Pytorch实现

最新推荐文章于 2024-01-28 19:38:16 发布

积雨辋川

最新推荐文章于 2024-01-28 19:38:16 发布

阅读量335

点赞数

文章标签： pytorch 深度学习

本文链接：https://blog.csdn.net/weixin_53065229/article/details/132612459

版权

VGG网络介绍及Pytorch实现

VGG简介

VGGNet 是牛津大学计算机视觉组（Visual Geometry Group）和谷歌 DeepMind 一起研究出来的深度卷积神经网络，因而冠名为 VGG。VGG是一种被广泛使用的卷积神经网络结构，其在在2014年的 ImageNet 大规模视觉识别挑战(ILSVRC -2014)中获得了亚军，不是VGG不够强，而是对手太强，因为当年获得冠军的是GoogLeNet。

论文下载地址：https://arxiv.org/pdf/1409.1556.pdf

VGG块

经典卷积神经网络的基本组成部分：

带填充以保持分辨率的卷积层；
非线性激活函数，如ReLU；
汇聚层，如最大汇聚层。

在最初的VGG论文中，作者使用了带有(3 \times 3)卷积核、填充为1（保持高度和宽度）的卷积层，和带有(2 \times 2)汇聚窗口、步幅为2（每个块后的分辨率减半）的最大汇聚层。

从AlexNet到VGGNet:
在这里插入图片描述
原始VGG网络有5个卷积块，其中前两个块各有一个卷积层，后三个块各包含两个卷积层。第一个模块有64个输出通道，每个后续模块将输出通道数量翻倍，直到该数字达到512。由于该网络使用8个卷积层和3个全连接层，因此它通常被称为VGG-11。

Pytorch实现

Pytorch自带有一些常见的模型，因此我们可以通过pytorch调用对应网络模型。
官方文档：https://pytorch.org/vision/stable/models.html

vgg11 = torchvision.models.vgg11()
print(vgg11)

VGG-11网络结构如下：

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (11): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (12): ReLU(inplace=True)
    (13): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (14): ReLU(inplace=True)
    (15): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (16): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (17): ReLU(inplace=True)
    (18): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (19): ReLU(inplace=True)
    (20): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

模型修改

在实际使用中，我们可能需要对现有模型进行修改。

新增

vgg11.classifier.add_module('add_linear', nn.Linear(1000, 10))
print(vgg11.classifier)

输出结果为：

Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=1000, bias=True)
  (add_linear): Linear(in_features=1000, out_features=10, bias=True)
)

修改

vgg11.classifier[6] = nn.Linear(4096, 10)
print(vgg11.classifier)

输出结果为：

Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=10, bias=True)
)

VGG16实现CIFA10图像分类

import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt

transform_train = transforms.Compose(
    [transforms.Pad(4),
     transforms.ToTensor(),
     transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
     transforms.RandomHorizontalFlip(),
     transforms.RandomGrayscale(),
     transforms.RandomCrop(32, padding=4),
     ])

transform_test = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))]
)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

trainset = torchvision.datasets.CIFAR10(root='./data/', train=True, download=True, transform=transform_train)
trainLoader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data/', train=False, download=True, transform=transform_test)
testLoader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)

vgg = [96, 96, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']


class VGG(nn.Module):
    def __init__(self, vgg):
        super(VGG, self).__init__()
        self.features = self._make_layers(vgg)
        self.dense = nn.Sequential(
            nn.Linear(512, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.4),
        )
        self.classifier = nn.Linear(4096, 10)

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)
        out = self.dense(out)
        out = self.classifier(out)
        return out

    def _make_layers(self, vgg):
        layers = []
        in_channels = 3
        for x in vgg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x

        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)


model = VGG(vgg)
# model.load_state_dict(torch.load('CIFAR-model/VGG16.pth'))
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-3)
loss_func = nn.CrossEntropyLoss()
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.4, last_epoch=-1)

total_times = 40
total = 0
accuracy_rate = []


def test():
    model.eval()
    correct = 0  # 预测正确的图片数
    total = 0  # 总共的图片数
    with torch.no_grad():
        for data in testLoader:
            images, labels = data
            images = images.to(device)
            outputs = model(images).to(device)
            outputs = outputs.cpu()
            outputarr = outputs.numpy()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum()
    accuracy = 100 * correct / total
    accuracy_rate.append(accuracy)
    print(f'准确率为:{accuracy}%'.format(accuracy))


for epoch in range(total_times):
    model.train()
    model.to(device)
    running_loss = 0.0
    total_correct = 0
    total_trainset = 0

    for i, (data, labels) in enumerate(trainLoader, 0):
        data = data.to(device)
        outputs = model(data).to(device)
        labels = labels.to(device)
        loss = loss_func(outputs, labels).to(device)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, pred = outputs.max(1)
        correct = (pred == labels).sum().item()
        total_correct += correct
        total_trainset += data.shape[0]
        if i % 1000 == 0 and i > 0:
            print(f"正在进行第{i}次训练, running_loss={running_loss}".format(i, running_loss))
            running_loss = 0.0
    test()
    scheduler.step()

# torch.save(model.state_dict(), 'CIFAR-model/VGG16.pth')
accuracy_rate = np.array(accuracy_rate)
times = np.linspace(1, total_times, total_times)
plt.xlabel('times')
plt.ylabel('accuracy rate')
plt.plot(times, accuracy_rate)
plt.show()

print(accuracy_rate)

运行结果如下：
在这里插入图片描述
准确率为：

[50.66 51.24 59.11 62.97 65.52 79.23 81.28 78.06 82.6  69.56 86.08 84.69
 84.23 82.41 86.03 87.25 87.71 87.7  87.24 87.91 88.43 88.28 88.3  88.4
 88.2  88.43 88.41 88.5  88.35 88.31 88.32 88.41 88.54 88.54 88.47 88.37
 88.36 88.51 88.46 88.51]

积雨辋川

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
VGG网络介绍及Pytorch实现

VGGNet 是牛津大学计算机视觉组（Visual Geometry Group）和谷歌 DeepMind 一起研究出来的深度卷积神经网络，因而冠名为 VGG。VGG是一种被广泛使用的卷积神经网络结构，其在在2014年的 ImageNet 大规模视觉识别挑战(ILSVRC -2014)中获得了亚军，不是VGG不够强，而是对手太强，因为当年获得冠军的是GoogLeNet。论文下载地址：https://arxiv.org/pdf/1409.1556.pdf。
复制链接

扫一扫