Loading a VGG Model for Image Classification

qq_27481087

Tags: classification, deep learning, pytorch

Article link: https://blog.csdn.net/qq_27481087/article/details/126466627

1. The ImageNet Dataset and the VGG-16 Model

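ImageNet is an image dataset assembled by Stanford University, which collected a large number of images from the internet and sorted them into categories. It is the dataset used in the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) competition.
PyTorch makes it easy to work with the ILSVRC2012 subset of ImageNet (number of classes: 1,000; training data: about 1.2 million images; validation data: 50,000 images; test data: 100,000 images), as well as with network parameters and models that have already been trained on it.
VGG-16 is the convolutional neural network that placed second in the classification task of ILSVRC 2014. It was designed by the VGG team at the University of Oxford and has 16 layers, hence the name VGG-16. Versions of VGG with 11, 13, and 19 layers also exist.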
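The numbers in these model names can be checked with a few lines of plain Python. The sketch below assumes the per-variant layer layouts used by torchvision's VGG implementation (a number is the output channel count of a 3x3 convolution, "M" is a max-pooling layer). The names `VGG_CONFIGS` and `weight_layers` are illustrative and not part of any library.

```python
# Convolutional layouts of the four VGG variants (numbers = conv output
# channels, "M" = 2x2 max pooling). Assumed to mirror torchvision's configs.
VGG_CONFIGS = {
    "vgg11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "vgg13": [64, 64, "M", 128, 128, "M", 256, 256, "M",
              512, 512, "M", 512, 512, "M"],
    "vgg16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "vgg19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

NUM_FC_LAYERS = 3  # every variant ends with three fully connected layers


def weight_layers(config):
    """Count layers with learnable weights: conv layers plus FC layers."""
    num_conv = sum(1 for item in config if item != "M")
    return num_conv + NUM_FC_LAYERS


for name, config in VGG_CONFIGS.items():
    print(name, weight_layers(config))
```

Each variant's count equals the number in its name, which is where 11, 13, 16, and 19 come from.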

2. Hands-on Code

2.1 Importing Packages and Checking the PyTorch Version

import numpy as np
import json
from PIL import Image
import matplotlib.pyplot as plt


import torch
import torchvision
from torchvision import models, transforms

print('PyTorch Version: ', torch.__version__)
print('TorchVision Version: ', torchvision.__version__)

PyTorch Version: 1.9.0
TorchVision Version: 0.10.0
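The version check matters because the way pretrained weights are requested changed between torchvision releases. The `pretrained=True` flag used in this article matches the 0.10.0 shown above, while torchvision 0.13 and later prefer a `weights` argument. The following is a minimal sketch of a version-tolerant loader, not the author's method: `pretrained_kwargs` and `load_vgg16` are hypothetical helper names, and `"IMAGENET1K_V1"` is assumed to be the weight name you want for VGG-16.

```python
def pretrained_kwargs(torchvision_version):
    """Pick the keyword argument that requests ImageNet weights.

    Accepts version strings such as "0.10.0" or "0.15.2+cu118".
    """
    release = torchvision_version.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    if (major, minor) >= (0, 13):
        return {"weights": "IMAGENET1K_V1"}  # newer multi-weight API
    return {"pretrained": True}              # older boolean flag


def load_vgg16():
    """Load a pretrained VGG-16 in evaluation mode (downloads weights on first use)."""
    import torchvision  # imported lazily so the helper above needs no dependencies
    from torchvision import models

    net = models.vgg16(**pretrained_kwargs(torchvision.__version__))
    net.eval()
    return net


print(pretrained_kwargs("0.10.0"))
print(pretrained_kwargs("0.15.2+cu118"))
```

Calling `load_vgg16()` in place of the direct `models.vgg16(...)` call should avoid the deprecation warning on newer installations while still working on 0.10.0.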

2.2 Loading the VGG-16 Model

# Load the pretrained VGG-16 model
use_pretrained = True
net = models.vgg16(pretrained=use_pretrained)
# net = models.resnet50(pretrained=use_pretrained)
net.eval()  # switch to inference mode (disables Dropout)

# Print the network structure
print(net)

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
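The output shows that the VGG-16 network consists of two main modules, features and classifier, joined by an avgpool layer. The features module contains the convolutional layers, and the classifier module contains the fully connected layers.
Although the name says 16, the features and classifier modules together contain 38 layers (31 in features and 7 in classifier), not 16. The 16 counts only the convolutional layers (13) and the fully connected layers (3); the ReLU activations, MaxPool2d pooling layers, and Dropout layers are not included.
The network takes RGB images with 3 color channels and a width and height of 224 pixels, so the input tensor has shape (batch, 3, 224, 224).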
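The counts above, and the `in_features=25088` of the first Linear layer, can be reproduced by hand. The sketch below is plain Python that only mirrors the printed structure; `VGG16_CONFIG` is an illustrative name, and the list is transcribed from the features block shown above (numbers are conv output channels, "M" is a max-pooling layer).

```python
# VGG-16 convolutional layout as it appears in the printed `features` block.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]

num_conv = sum(1 for item in VGG16_CONFIG if item != "M")
num_pool = VGG16_CONFIG.count("M")

# Each conv is followed by a ReLU, so `features` holds conv + ReLU + pool modules.
features_modules = 2 * num_conv + num_pool
# classifier: Linear, ReLU, Dropout, Linear, ReLU, Dropout, Linear
classifier_modules = 7
num_fc = 3

print("modules in features + classifier:", features_modules + classifier_modules)
print("layers with weights:", num_conv + num_fc)

# 3x3 convs with padding 1 keep height and width; each pooling layer halves them.
side = 224
for _ in range(num_pool):
    side //= 2
flattened = 512 * side * side
print("feature map side:", side, "flattened size:", flattened)

# Parameter count: weights plus biases of every conv and linear layer.
params, in_channels = 0, 3
for item in VGG16_CONFIG:
    if item != "M":
        params += 3 * 3 * in_channels * item + item
        in_channels = item
for fan_in, fan_out in [(flattened, 4096), (4096, 4096), (4096, 1000)]:
    params += fan_in * fan_out + fan_out
print("total parameters:", params)
```

This gives 38 modules, 16 weight layers, a 7x7 feature map that flattens to 25088 values, and 138,357,544 parameters in total. For a 224x224 input the AdaptiveAvgPool2d((7, 7)) layer leaves the 7x7 map unchanged.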

2.3 Preprocessing the Input Image

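# Preprocessing class for input images
class BaseTransform():
    def __init__(self, resize, mean, std):
        self.base_transform = transforms.Compose([
            transforms.Resize(resize),        # scale the shorter side to `resize`
            transforms.CenterCrop(resize),    # crop the central resize x resize region
            transforms.ToTensor(),            # PIL image -> float tensor in [0, 1], shape (C, H, W)
            transforms.Normalize(mean, std)   # per-channel (x - mean) / std
        ])

    # __call__ lets an instance be called like a function:
    # x(img) is equivalent to x.__call__(img)
    def __call__(self, img):
        return self.base_transform(img)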
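To make the effect of the Resize, CenterCrop, and Normalize steps concrete, the sketch below redoes their arithmetic in plain Python. The helper names are hypothetical, the rounding is assumed to approximate what torchvision does, and the 1280x853 image size is an invented example rather than the size of the photo used in this article.

```python
def resize_shorter_side(width, height, size):
    """Output (width, height) when the shorter side is scaled to `size`."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width, height, size):
    """(left, top, right, bottom) of a centered size x size crop."""
    left = int(round((width - size) / 2.0))
    top = int(round((height - size) / 2.0))
    return left, top, left + size, top + size


def normalize_pixel(rgb, mean, std):
    """Apply (x - mean) / std per channel to an RGB pixel already scaled to [0, 1]."""
    return tuple((x - m) / s for x, m, s in zip(rgb, mean, std))


mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

# Hypothetical 1280x853 landscape photo.
new_w, new_h = resize_shorter_side(1280, 853, 224)
print(new_w, new_h)
print(center_crop_box(new_w, new_h, 224))
print(normalize_pixel(mean, mean, std))
```

For this example the image is first scaled to 336x224, then the central 224x224 square is kept, so content at the left and right edges is discarded. A pixel exactly equal to the channel means maps to zero after normalization.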

2.4 Displaying the Preprocessing Result

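# 1. Load the image
image_file_path = './data/goldenretriever-3724972_1280.jpg'

img = Image.open(image_file_path)

# 2. Show the original image
plt.imshow(img)
plt.show()

# 3. Preprocess the image and show the result
resize = 224
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

transform = BaseTransform(resize, mean, std)

img_transformed = transform(img)  # torch.Size([3, 224, 224])

# Convert (channels, height, width) to (height, width, channels)
# and clip the values to [0, 1] so matplotlib can display them
img_transformed2 = img_transformed.numpy().transpose((1, 2, 0))
img_transformed2 = np.clip(img_transformed2, 0, 1)

plt.imshow(img_transformed2)
plt.show()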
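The snippet above clips the normalized tensor directly to [0, 1], so the displayed colors differ from the original photo. If the goal is to view the cropped image in its natural colors, one option is to invert the normalization first. The following is a sketch of that idea on synthetic data; `denormalize` is a hypothetical helper, not part of torchvision.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])


def denormalize(chw, mean, std):
    """Undo (x - mean) / std on a (C, H, W) array and return (H, W, C) in [0, 1]."""
    hwc = chw.transpose((1, 2, 0))   # channels last, as matplotlib expects
    restored = hwc * std + mean      # broadcasts over the last (channel) axis
    return np.clip(restored, 0.0, 1.0)


# Stand-in for a real photo: random RGB values in [0, 1], shape (H, W, C).
rng = np.random.default_rng(0)
original = rng.random((4, 4, 3))
normalized = ((original - mean) / std).transpose((2, 0, 1))  # (C, H, W)

restored = denormalize(normalized, mean, std)
print(np.allclose(restored, original))
```

With the real tensor from this article, the equivalent call would be `denormalize(img_transformed.numpy(), mean, std)`, and the result can be passed to `plt.imshow`.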

2.5 Post-processing: Predicting the Label from the Output

# Post-processing class that maps the model output to a predicted label
with open('./data/imagenet_class_index.json', 'r') as f:
    ILSVRC_class_index = json.load(f)
# print(ILSVRC_class_index)


class ILSVRCPredictor():
    def __init__(self, class_index):
        self.class_index = class_index

    def predict_max(self, out):
        # Index of the highest score in the model output
        maxid = np.argmax(out.detach().numpy())

        # Look up the entry for that index and take its label name
        predicted_label_name = self.class_index[str(maxid)][1]

        return predicted_label_name
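`predict_max` returns only the single best label. When it is useful to see how confident the model is, the raw scores can be turned into probabilities with a softmax and the top few classes listed. The sketch below uses NumPy only; `predict_topk` is a hypothetical helper, and `toy_index` is a three-class stand-in that assumes the same `{"id": [wordnet_id, label]}` layout as imagenet_class_index.json.

```python
import numpy as np

# Toy excerpt in the assumed layout of imagenet_class_index.json.
toy_index = {
    "0": ["n01440764", "tench"],
    "1": ["n01443537", "goldfish"],
    "2": ["n01484850", "great_white_shark"],
}


def predict_topk(logits, class_index, k=5):
    """Return the k most likely (label, probability) pairs for one image."""
    logits = np.asarray(logits, dtype=np.float64).ravel()
    shifted = logits - logits.max()          # stabilizes the exponentials
    probs = np.exp(shifted) / np.exp(shifted).sum()
    top_ids = np.argsort(probs)[::-1][:k]    # indices by descending probability
    return [(class_index[str(i)][1], float(probs[i])) for i in top_ids]


logits = np.array([[0.5, 3.0, 1.0]])  # shape (1, num_classes), like net(inputs)
for label, prob in predict_topk(logits, toy_index, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model from this article, the call would look like `predict_topk(out.detach().numpy(), ILSVRC_class_index)`.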

2.6 Predicting the Image with the Pretrained VGG Model

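predictor = ILSVRCPredictor(ILSVRC_class_index)

# Add a batch dimension: torch.Size([1, 3, 224, 224])
inputs = img_transformed.unsqueeze(0)

# Inference only, so gradient tracking is not needed
with torch.no_grad():
    out = net(inputs)  # torch.Size([1, 1000])

result = predictor.predict_max(out)

print('Predicted label for the input image:', result)

Predicted label for the input image: golden_retriever
The program correctly classifies the image as a golden retriever.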
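Note that `predict_max` applies `np.argmax` to the whole output array, which is fine for a single image with output shape (1, 1000) but would return one flattened index if several images were passed at once. A possible batch-aware variant takes the argmax along the class axis instead. This is a sketch on toy scores; `predict_batch` and `toy_index` are illustrative names, and the index layout is assumed to match imagenet_class_index.json.

```python
import numpy as np

# Toy excerpt in the assumed layout of imagenet_class_index.json.
toy_index = {
    "0": ["n01440764", "tench"],
    "1": ["n01443537", "goldfish"],
    "2": ["n01484850", "great_white_shark"],
}


def predict_batch(logits, class_index):
    """Return one label per row of a (batch, num_classes) score array."""
    logits = np.asarray(logits)
    if logits.ndim == 1:              # single score vector: treat as a batch of one
        logits = logits[np.newaxis, :]
    max_ids = logits.argmax(axis=1)   # best class index for each image
    return [class_index[str(i)][1] for i in max_ids]


# Scores for a batch of two images over three classes.
batch_scores = np.array([[0.1, 2.0, 0.3],
                         [1.5, 0.2, 0.4]])
print(predict_batch(batch_scores, toy_index))
```

Stacking several preprocessed tensors with `torch.stack` would produce an input of shape (batch, 3, 224, 224), and the resulting output could be passed to this function after `.detach().numpy()`.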