Classic CNN Models (3): VGG16

Latest recommended article published on 2024-09-29 20:15:10

心动雨崽

Tags: cnn, artificial intelligence, neural networks

Article link: https://blog.csdn.net/qq_74722169/article/details/134654675

I. The VGG family

The members of the VGG family are shown below.

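conv3-64: a convolutional layer with a 3x3 kernel and 64 output channels; likewise, conv3-128 is a 3x3 convolutional layer with 128 output channels.
FC-4096: a fully connected layer with 4096 nodes; likewise, FC-1000 is a fully connected layer with 1000 nodes.
maxpool: max pooling; VGG16 uses 2x2 max pooling.
soft-max: the final output layer, which turns the 1000 class scores from FC-1000 into class probabilities.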

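Their performance is as follows:

What are top-1 and top-5?

Top-1 takes the class with the highest probability in the final probability vector as the prediction; the prediction counts as correct only if that class is the correct one.
Top-5 counts a prediction as correct if the correct class appears among the five highest-probability classes.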

Top-1 and top-5 error are two metrics used in deep learning to evaluate a model's prediction error rate. The VGG paper explains them as follows:
The former is a multi-class classification error, i.e. the proportion of incorrectly classified images; the latter is the main evaluation criterion used in ILSVRC, and is computed as the proportion of images such that the ground-truth category is outside the top-5 predicted categories.

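Top-1 error: suppose the model classifies an image of an animal (a cat) and outputs a single prediction. The fraction of images for which that single prediction is correct is the top-1 accuracy; the fraction for which it is wrong is the top-1 error. Put simply, it is how often the model's best guess is wrong.

Top-5 error: suppose the model classifies the same image (still the cat) but outputs five predictions. The fraction of images for which the correct class appears among those five is the top-5 accuracy; the fraction for which none of the five is the correct class is the top-5 error.

In general, the lower the top-1 and top-5 error, the better the model. The top-5 error is never larger than the top-1 error, since being right within five guesses is at least as likely as being right with one.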
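To make the two metrics concrete, here is a minimal sketch in plain Python. The function name `top_k_error` and the toy scores are made up for illustration; they do not come from the VGG paper.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example: 3 samples, 6 classes (made-up scores, for illustration only)
scores = [
    [0.50, 0.20, 0.10, 0.10, 0.05, 0.05],  # true class 0: top-1 hit
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true class 1: top-1 miss, top-5 hit
    [0.40, 0.25, 0.15, 0.10, 0.07, 0.03],  # true class 5: missed by both
]
labels = [0, 1, 5]

top1 = top_k_error(scores, labels, k=1)
top5 = top_k_error(scores, labels, k=5)
print(f"top-1 error = {top1:.3f}, top-5 error = {top5:.3f}")
```

Because the top-1 prediction is always one of the top-5 predictions, every top-5 miss is also a top-1 miss, which is why the top-5 error cannot exceed the top-1 error.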

Weighing speed against accuracy, VGG16 and VGG19 turn out to be the best configurations.

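II. What is VGG16

1. Definition

VGG16 is a deep convolutional neural network for image classification and recognition. It was developed by the Visual Geometry Group (VGG) at the University of Oxford, from which it takes its name, and achieved strong results in the 2014 ImageNet image recognition challenge.

VGG16 has 13 convolutional layers and 3 fully connected layers, with about 138 million trainable parameters in total. Its core idea is to increase network depth by stacking many small convolution kernels with pooling layers, which improves the representational power of the image features. It uses small 3x3 convolution kernels and 2x2 max-pooling windows, and every convolutional layer is followed by a ReLU activation.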
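The figure of about 138 million parameters can be checked by hand. The sketch below assumes the standard ImageNet configuration (224x224x3 input, 1000 output classes) and counts weights plus biases for each layer; the helper names are illustrative only.

```python
# (in_channels, out_channels) of the 13 convolutional layers, all with 3x3 kernels
CONV_LAYERS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# (in_features, out_features) of the 3 fully connected layers
FC_LAYERS = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]


def conv_params(c_in, c_out, kernel=3):
    """Weights (kernel * kernel * c_in * c_out) plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out


def fc_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


conv_total = sum(conv_params(c_in, c_out) for c_in, c_out in CONV_LAYERS)
fc_total = sum(fc_params(n_in, n_out) for n_in, n_out in FC_LAYERS)
total = conv_total + fc_total

print(f"conv layers: {conv_total:,}")
print(f"fc layers:   {fc_total:,}")
print(f"total:       {total:,}")
```

Most of the parameters sit in the fully connected layers, and the first of them (25088 to 4096) alone accounts for over 100 million.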

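VGG16's structure is simple and classic, and it is one of the standard baseline models in deep learning. It performs well on image classification and can effectively recognize and distinguish different object categories. Because of its simple, extensible structure, VGG16 is also often used as a base model for transfer learning in computer vision tasks such as object detection and image segmentation.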

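2. Network architecture

The following summary shows a VGG16-style network adapted to 32x32x3 inputs with a 10-class output (the CIFAR-10 setting), with batch normalization and dropout layers added: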

Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 32, 32, 3)         0
_________________________________________________________________
conv1_1 (Conv2D)             (None, 32, 32, 64)        1792
_________________________________________________________________
conv1_2 (Conv2D)             (None, 32, 32, 64)        36928
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 64)        256
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 64)        0
_________________________________________________________________
dropout_1 (Dropout)          (None, 16, 16, 64)        0
_________________________________________________________________
conv2_1 (Conv2D)             (None, 16, 16, 128)       73856
_________________________________________________________________
conv2_2 (Conv2D)             (None, 16, 16, 128)       147584
_________________________________________________________________
batch_normalization_2 (Batch (None, 16, 16, 128)       512
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 128)         0
_________________________________________________________________
dropout_2 (Dropout)          (None, 8, 8, 128)         0
_________________________________________________________________
conv3_1 (Conv2D)             (None, 8, 8, 256)         295168
_________________________________________________________________
conv3_2 (Conv2D)             (None, 8, 8, 256)         590080
_________________________________________________________________
conv3_3 (Conv2D)             (None, 8, 8, 256)         590080
_________________________________________________________________
batch_normalization_3 (Batch (None, 8, 8, 256)         1024
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 4, 4, 256)         0
_________________________________________________________________
dropout_3 (Dropout)          (None, 4, 4, 256)         0
_________________________________________________________________
conv4_1 (Conv2D)             (None, 4, 4, 512)         1180160
_________________________________________________________________
conv4_2 (Conv2D)             (None, 4, 4, 512)         2359808
_________________________________________________________________
conv4_3 (Conv2D)             (None, 4, 4, 512)         2359808
_________________________________________________________________
batch_normalization_4 (Batch (None, 4, 4, 512)         2048
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 2, 2, 512)         0
_________________________________________________________________
dropout_4 (Dropout)          (None, 2, 2, 512)         0
_________________________________________________________________
conv5_1 (Conv2D)             (None, 2, 2, 512)         2359808
_________________________________________________________________
conv5_2 (Conv2D)             (None, 2, 2, 512)         2359808
_________________________________________________________________
conv5_3 (Conv2D)             (None, 2, 2, 512)         2359808
_________________________________________________________________
batch_normalization_5 (Batch (None, 2, 2, 512)         2048
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 1, 1, 512)         0
_________________________________________________________________
dropout_5 (Dropout)          (None, 1, 1, 512)         0
_________________________________________________________________
flatten_1 (Flatten)          (None, 512)               0
_________________________________________________________________
dense_1 (Dense)              (None, 4096)              2101248
_________________________________________________________________
activation_1 (Activation)    (None, 4096)              0
_________________________________________________________________
dropout_6 (Dropout)          (None, 4096)              0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                40970
_________________________________________________________________
activation_2 (Activation)    (None, 10)                0
=================================================================

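VGG16 consists of five convolutional blocks (13 convolutional layers in total), three fully connected layers, and a softmax output layer. The blocks are separated by max pooling, and every hidden layer uses the ReLU activation, as shown in the figure.

The input image is 224x224x3. It passes through two convolutions with 64 3x3 kernels (stride 1, padding=same), each followed by ReLU; the output size is 224x224x64.

Max pooling with a 2x2 window and stride 2 halves the spatial size, giving 112x112x64.

Two convolutions with 128 3x3 kernels, with ReLU: the size becomes 112x112x128.
Max pooling: the size becomes 56x56x128.

Three convolutions with 256 3x3 kernels, with ReLU: the size becomes 56x56x256.
Max pooling: the size becomes 28x28x256.

Three convolutions with 512 3x3 kernels, with ReLU: the size becomes 28x28x512.
Max pooling: the size becomes 14x14x512.

Three convolutions with 512 3x3 kernels, with ReLU: the size becomes 14x14x512.
Max pooling: the size becomes 7x7x512.

Flatten() then unrolls the data into a one-dimensional vector of 512x7x7 = 25088 values.

This is followed by three fully connected layers: two of size 1x1x4096, each with ReLU, and one of size 1x1x1000.

Finally, softmax outputs the 1000 class probabilities.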
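The size bookkeeping above can be reproduced with the usual output-size formulas for convolution and pooling. This is a small sketch in plain Python, assuming 3x3 convolutions with stride 1 and padding 1 ("same") and 2x2 pooling with stride 2; the function names are illustrative.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - kernel) // stride + 1


# (number of 3x3 convolutions, output channels) for the five blocks
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_shapes(size=224):
    """Return the (height, width, channels) after each block's pooling layer."""
    shapes = []
    for n_convs, channels in BLOCKS:
        for _ in range(n_convs):
            size = conv_out(size)  # padding=1 keeps the size unchanged
        size = pool_out(size)      # pooling halves it
        shapes.append((size, size, channels))
    return shapes


shapes = trace_shapes(224)
for i, shape in enumerate(shapes, start=1):
    print(f"after block {i}: {shape}")

h, w, c = shapes[-1]
flattened = h * w * c
print("flattened length:", flattened)
```

Calling `trace_shapes(32)` instead ends at (1, 1, 512), which matches the 512-element flatten layer in the 32x32 summary shown earlier.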

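In AlexNet, each convolutional stage contains a single convolution, and the first layer uses a large 11x11 kernel. In VGGNet, each convolutional block contains 2 to 4 convolutions with 3x3 kernels and stride 1, and pooling uses a 2x2 window with stride 2. VGGNet's most visible change is to shrink the kernels and increase the number of convolutional layers.

The following is a layer-by-layer analysis of the whole architecture.

Stacking several convolutional layers with small kernels in place of a single layer with a large kernel reduces the number of parameters, and the authors argue that it also amounts to more nonlinear mappings, which increases the network's fitting and expressive power.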
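A quick calculation illustrates the parameter saving. The sketch below compares three stacked 3x3 layers with one 7x7 layer, which cover the same 7x7 receptive field, assuming every layer has C input and C output channels and ignoring biases. The value C = 256 is an arbitrary choice for illustration.

```python
def stacked_conv_weights(kernel, n_layers, channels):
    """Weights in n_layers stacked kernel x kernel convs, each channels -> channels (no biases)."""
    return n_layers * kernel * kernel * channels * channels


def receptive_field(kernel, n_layers):
    """Receptive field of n_layers stacked stride-1 convs: each layer adds (kernel - 1)."""
    return 1 + n_layers * (kernel - 1)


C = 256  # hypothetical channel count, for illustration only

three_3x3 = stacked_conv_weights(kernel=3, n_layers=3, channels=C)  # 27 * C^2
one_7x7 = stacked_conv_weights(kernel=7, n_layers=1, channels=C)    # 49 * C^2

print("receptive field, three 3x3 layers:", receptive_field(3, 3))
print("receptive field, one 7x7 layer:   ", receptive_field(7, 1))
print("weights, three 3x3 layers:", three_3x3)
print("weights, one 7x7 layer:   ", one_7x7)
print(f"ratio: {one_7x7 / three_3x3:.2f}")
```

The stack of small kernels also applies three ReLU nonlinearities instead of one, which is the "more nonlinear mappings" argument made above.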

III. Training on the CIFAR-10 dataset

Writing code that follows the architecture above, layer by layer, gives the following network model:

import torch.nn as nn
import torch.nn.functional as F


class _VGG16_(nn.Module):
    """VGG16 for 3 x 224 x 224 inputs with a 10-class output layer."""

    def __init__(self):
        super(_VGG16_, self).__init__()
        # Every 3x3 conv uses padding=1 so the spatial size is preserved;
        # every 2x2 max pooling (stride 2, no padding) halves it.
        self.conv1_1 = nn.Conv2d(3, 64, 3, stride=1, padding=1)             # input assumed to be 3 x 224 x 224
        self.conv1_2 = nn.Conv2d(64, 64, 3, stride=1, padding=1)
        self.max_pooling_1 = nn.MaxPool2d(2, stride=2)                      # output: 64 x 112 x 112

        self.conv2_1 = nn.Conv2d(64, 128, 3, stride=1, padding=1)
        self.conv2_2 = nn.Conv2d(128, 128, 3, stride=1, padding=1)
        self.max_pooling_2 = nn.MaxPool2d(2, stride=2)                      # output: 128 x 56 x 56

        self.conv3_1 = nn.Conv2d(128, 256, 3, stride=1, padding=1)
        self.conv3_2 = nn.Conv2d(256, 256, 3, stride=1, padding=1)
        self.conv3_3 = nn.Conv2d(256, 256, 3, stride=1, padding=1)
        self.max_pooling_3 = nn.MaxPool2d(2, stride=2)                      # output: 256 x 28 x 28

        self.conv4_1 = nn.Conv2d(256, 512, 3, stride=1, padding=1)
        self.conv4_2 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
        self.conv4_3 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
        self.max_pooling_4 = nn.MaxPool2d(2, stride=2)                      # output: 512 x 14 x 14

        self.conv5_1 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
        self.conv5_2 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
        self.conv5_3 = nn.Conv2d(512, 512, 3, stride=1, padding=1)
        self.max_pooling_5 = nn.MaxPool2d(2, stride=2)                      # output: 512 x 7 x 7

        self.fc1 = nn.Linear(7 * 7 * 512, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, 10)                                      # 10 classes

    def forward(self, x):
        x = self.conv1_1(x)
        x = F.relu(x)
        x = self.conv1_2(x)
        x = F.relu(x)
        x = self.max_pooling_1(x)

        x = self.conv2_1(x)
        x = F.relu(x)
        x = self.conv2_2(x)
        x = F.relu(x)
        x = self.max_pooling_2(x)

        x = self.conv3_1(x)
        x = F.relu(x)
        x = self.conv3_2(x)
        x = F.relu(x)
        x = self.conv3_3(x)
        x = F.relu(x)
        x = self.max_pooling_3(x)

        x = self.conv4_1(x)
        x = F.relu(x)
        x = self.conv4_2(x)
        x = F.relu(x)
        x = self.conv4_3(x)
        x = F.relu(x)
        x = self.max_pooling_4(x)

        x = self.conv5_1(x)
        x = F.relu(x)
        x = self.conv5_2(x)
        x = F.relu(x)
        x = self.conv5_3(x)
        x = F.relu(x)
        x = self.max_pooling_5(x)

        x = x.view(-1, 7 * 7 * 512)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)

        x = F.softmax(x, dim=1)  # class probabilities; return the raw scores instead if training with nn.CrossEntropyLoss

        return x

Complete code using TensorFlow, which trains on top of the existing pretrained VGG16 model:

import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard

# Download and load the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Preprocess the data: scale pixel values to the range 0 to 1
x_train = x_train / 255.0
x_test = x_test / 255.0

# Build the VGG16 base model (ImageNet weights, without the fully connected top)
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))

# Freeze the VGG16 weights
for layer in base_model.layers:
    layer.trainable = False

# Add custom fully connected layers on top of VGG16
x = Flatten()(base_model.output)
x = Dense(512, activation='relu')(x)
x = Dense(10, activation='softmax')(x)

# Create the new model
model = Model(base_model.input, x)

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Set up the callbacks
checkpoint = ModelCheckpoint('vgg16_model.h5', save_best_only=True, save_weights_only=False, monitor='val_accuracy', mode='max')
tensorboard = TensorBoard(log_dir='./logs', histogram_freq=1)

# Train the model
history = model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test), callbacks=[checkpoint, tensorboard])

# Print the training results
print("Training accuracy:", history.history['accuracy'][-1])
print("Validation accuracy:", history.history['val_accuracy'][-1])

# Save the final model (this overwrites the best checkpoint saved above under the same file name)
model.save('vgg16_model.h5')

The accuracy is somewhat low; you can tune the hyperparameters yourself.

The following uses the trained model to make a prediction:

import numpy as np
from tensorflow.keras.models import load_model
from PIL import Image

# CIFAR-10 class labels, in the dataset's label order (index 0 to 9)
class_labels = [
    'airplane', 'automobile', 'bird', 'cat', 'deer',
    'dog', 'frog', 'horse', 'ship', 'truck',
]

# Load the trained VGG16 model
model = load_model('vgg16_model.h5')

# Load the image to classify
image_path = 'path'  # replace with the path to your own image
image = Image.open(image_path).convert('RGB')  # force 3 color channels
image = image.resize((32, 32))  # resize to the same size as the training data
# Scale pixel values to the range 0 to 1, the same preprocessing used in the training script
image = np.array(image) / 255.0

# Run the classification
predictions = model.predict(np.expand_dims(image, axis=0))
predicted_class_index = np.argmax(predictions)
predicted_class_label = class_labels[predicted_class_index]

# Print the prediction
print("Predicted label:", predicted_class_label)