Neural Networks: VGG19 (PyTorch)

辽逸

Last modified 2024-08-29 09:30:09

Tags: deep learning, python, machine learning

First published 2024-08-29 09:28:16

Article link: https://blog.csdn.net/2403_86447519/article/details/141665253

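The VGG19 network consists of 16 convolutional layers and 3 fully connected layers, for 19 weight layers in total (hence the name). Its design philosophy is to improve performance by increasing network depth, using stacks of small convolution kernels (3x3 throughout) in place of large kernels. Two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, and three cover 7x7, but they need fewer parameters and apply more nonlinearities along the way. This keeps or improves the network's representational power while reducing the parameter count per receptive field, which can be seen as a form of regularization that helps limit overfitting.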
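
The saving from stacking small kernels can be checked with a little arithmetic. The sketch below is plain Python and only illustrative: the helper names `conv_weights` and `stacked_receptive_field` are made up for this example, biases are ignored, and the channel count of 256 is an arbitrary assumption.

```python
def conv_weights(kernel_size, channels, num_layers=1):
    """Weights (biases ignored) in `num_layers` stacked k x k convolutions,
    each with `channels` input and `channels` output channels."""
    return num_layers * kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


C = 256  # assumed channel count, chosen only for illustration
for num_layers, big_kernel in [(2, 5), (3, 7)]:
    # The stack of 3x3 convolutions sees the same window as one big kernel.
    assert stacked_receptive_field(3, num_layers) == big_kernel
    stacked = conv_weights(3, C, num_layers)
    single = conv_weights(big_kernel, C)
    print(f"{num_layers} x (3x3): {stacked:,} weights vs "
          f"one {big_kernel}x{big_kernel}: {single:,} weights")
```

For the same receptive field, the stacked 3x3 layers use 18C² instead of 25C² weights (5x5 case) and 27C² instead of 49C² weights (7x7 case).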

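The VGG19 architecture can be roughly divided into the following parts:

Input layer: takes an RGB image of size (224, 224, 3) as input; PyTorch expects the channels-first layout (3, 224, 224).
Convolutional layers: all 16 convolutional layers use 3x3 kernels and are organized into five blocks with names such as conv1_1, conv1_2, and conv2_1. The blocks contain 2, 2, 4, 4, and 4 convolutions with 64, 128, 256, 512, and 512 output channels respectively, and each block ends with a 2x2 max pooling layer.
Fully connected layers: after the convolutional layers, VGG19 has 3 fully connected layers (4096, 4096, and 1000 units) that combine the extracted features to make the final classification decision.
Activation function: ReLU follows every convolutional layer (and the first two fully connected layers) to add nonlinearity to the network.
Max pooling layers: the 2x2, stride-2 max pooling layer at the end of each block halves the spatial size of the feature maps while keeping the strongest responses, so a 224x224 input is reduced to 7x7 after five blocks.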
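
The layout described in the list above can be traced with a few lines of plain Python. This is a sketch under assumptions, not part of the original post: the `VGG19_BLOCKS` table and the `trace_shapes` helper are hypothetical names, and the block configuration (2, 2, 4, 4, 4 convolutions) is the standard VGG19 layout.

```python
# (number of 3x3 convolutions, output channels) for each of the five blocks.
# Assumed from the standard VGG19 configuration.
VGG19_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]


def trace_shapes(height=224, width=224):
    """Return the (channels, height, width) shape after each block."""
    shapes = []
    for _num_convs, out_channels in VGG19_BLOCKS:
        # 3x3 convolutions with padding 1 keep height and width unchanged;
        # the 2x2, stride-2 max pool at the end of the block halves them.
        height //= 2
        width //= 2
        shapes.append((out_channels, height, width))
    return shapes


num_conv_layers = sum(num_convs for num_convs, _ in VGG19_BLOCKS)
print("convolutional layers:", num_conv_layers)
for block_index, shape in enumerate(trace_shapes(), start=1):
    print(f"after block {block_index}: {shape}")
```

The final shape (512, 7, 7) is where the 512 * 7 * 7 input size of the first fully connected layer comes from.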

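With this design, VGG19 achieves strong performance on image recognition tasks, object recognition in particular: the VGG models placed first in localization and second in classification at ILSVRC 2014. The success of VGG19 also helped drive the adoption and development of deep learning in computer vision.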

VGG19 implementation:

import torch
import torch.nn as nn

class VGG19(nn.Module):
    def __init__(self):
        super(VGG19, self).__init__()
        
        self.features = nn.Sequential(
            # Block 1: 2 convolutions, 64 channels
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),

            # Block 2: 2 convolutions, 128 channels
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),

            # Block 3: 4 convolutions, 256 channels (3 would give VGG16)
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),

            # Block 4: 4 convolutions, 512 channels
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),

            # Block 5: 4 convolutions, 512 channels
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 1000)  # The final layer has 1000 outputs, one per ImageNet class
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)  # Flatten to (batch, 512 * 7 * 7)
        x = self.classifier(x)
        return x

# Create a model instance
vgg19_custom = VGG19()

# Print the model structure
print(vgg19_custom)
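
As a sanity check on an implementation like this, the number of trainable parameters can be worked out by hand. The following is a minimal sketch in plain Python, assuming the standard VGG19 layout (2, 2, 4, 4, 4 convolutions per block, then three fully connected layers); `VGG19_BLOCKS`, `FC_SIZES`, and the two helper functions are names invented for this example.

```python
# (number of 3x3 convolutions, output channels) per block; assumed layout.
VGG19_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]
# Output sizes of the three fully connected layers.
FC_SIZES = [4096, 4096, 1000]


def count_conv_params(in_channels=3, kernel_size=3):
    """Weights plus biases of all convolutional layers."""
    total = 0
    for num_convs, out_channels in VGG19_BLOCKS:
        for _ in range(num_convs):
            weights = kernel_size * kernel_size * in_channels * out_channels
            total += weights + out_channels  # one bias per output channel
            in_channels = out_channels
    return total


def count_fc_params(in_features=512 * 7 * 7):
    """Weights plus biases of all fully connected layers."""
    total = 0
    for out_features in FC_SIZES:
        total += in_features * out_features + out_features
        in_features = out_features
    return total


conv_params = count_conv_params()
fc_params = count_fc_params()
print(f"convolutional layers: {conv_params:,}")
print(f"fully connected layers: {fc_params:,}")
print(f"total: {conv_params + fc_params:,}")
```

For a PyTorch model with this layout, `sum(p.numel() for p in model.parameters())` should report the same total. Most of the parameters sit in the first fully connected layer, not in the convolutions.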

PyTorch's torchvision library provides a pretrained VGG19 model that can be loaded and used directly:

import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

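# Load the pretrained VGG19 model (in torchvision 0.13 and later, the
# `weights` argument replaces the deprecated `pretrained=True`)
vgg19 = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)

# Switch to evaluation mode
vgg19.eval()

# Print the model structure
print(vgg19)

# Prepare an image and run prediction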
def preprocess_image(image_path):
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    image = Image.open(image_path).convert('RGB')
    image = transform(image).unsqueeze(0)  # Add a batch dimension
    return image

# Run inference and return the raw class scores (logits)
def predict(image_path):
    image = preprocess_image(image_path)
    with torch.no_grad():
        outputs = vgg19(image)
    return outputs

# Example
image_path = 'path_to_your_image.jpg'  # Replace with the path to your own image
outputs = predict(image_path)
print(outputs)
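
The printed `outputs` are raw scores (logits) of shape (1, 1000), not probabilities. A softmax turns them into probabilities, and sorting picks out the most likely classes. The sketch below shows the arithmetic in plain Python on a made-up 4-class score list; `softmax`, `top_k`, and `toy_logits` are hypothetical names used only for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def top_k(probs, k=5):
    """Return the k (class_index, probability) pairs with the highest probability."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


toy_logits = [2.0, 0.5, -1.0, 3.0]  # made-up scores for 4 classes
for class_index, probability in top_k(softmax(toy_logits), k=2):
    print(f"class {class_index}: {probability:.3f}")
```

On the real model output, `torch.softmax(outputs, dim=1)` followed by `torch.topk` does the same job. Mapping the resulting indices to readable names requires an ImageNet class list, which is not included here.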

*The above is for reference only.