Classifying CIFAR-10 with a Hand-Written VGG Network

码流怪侠

Last modified: 2024-12-15 23:59:41


First published: 2024-12-15 23:56:46

Article link: https://blog.csdn.net/yanceyxin/article/details/144494577


The VGG Network

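VGG is a classic convolutional neural network (CNN) architecture proposed in 2014 by the Visual Geometry Group at the University of Oxford. It is known for a simple yet effective design and achieved strong results in image recognition. Its main characteristics are: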

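Depth: VGG was very deep for its time. The original paper evaluates configurations with 11, 13, 16 and 19 weight layers (convolutional plus fully connected); the 16- and 19-layer versions are the most widely used.
Small kernels: VGG relies almost entirely on 3x3 convolution kernels, unlike earlier networks such as AlexNet, which uses larger kernels (11x11 and 5x5) in its first layers.
Stacked convolutions: several convolutional layers are stacked between max-pooling layers, which increases the depth of the network.
Fully connected layers: after the convolutional layers, VGG uses three fully connected layers to process the features further.
Activation function: VGG uses ReLU as its activation function.
Normalization: unlike AlexNet, VGG does not rely on local response normalization (LRN). The paper tested LRN in one 11-layer configuration, found no accuracy gain, and left it out of the deeper networks.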
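
To make the small-kernel point concrete, here is a minimal sketch of the arithmetic behind stacking 3x3 convolutions. It assumes stride-1 convolutions with the same number of input and output channels; the helper names and the channel width of 64 are illustrative choices, not something taken from the original post.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field (one side, in pixels) of stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weight_count(kernel_size, channels, num_layers=1):
    """Weights (biases ignored) of `num_layers` convs with `channels` in and out."""
    return num_layers * kernel_size * kernel_size * channels * channels


CHANNELS = 64  # illustrative width, an assumption of this sketch

for depth in (2, 3):
    field = stacked_receptive_field(depth)
    stacked = conv_weight_count(3, CHANNELS, depth)
    single = conv_weight_count(field, CHANNELS)
    print(f"{depth} stacked 3x3 convs cover {field}x{field}: "
          f"{stacked} weights vs {single} for one {field}x{field} conv")
```

Under these assumptions, two stacked 3x3 layers see the same 5x5 window as a single 5x5 layer, and three see a 7x7 window, while using fewer weights and adding an extra ReLU between layers.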

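The common VGG variants are named after their number of weight layers (convolutional plus fully connected):

VGG11: 8 convolutional layers and 3 fully connected layers.
VGG13: 10 convolutional layers and 3 fully connected layers.
VGG16: 13 convolutional layers and 3 fully connected layers.
VGG19: 16 convolutional layers and 3 fully connected layers.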
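
The number in each variant's name counts weight layers, that is, convolutional layers plus the three fully connected layers. The sketch below checks this by counting entries in configuration lists written in the style commonly used for VGG (a number is a 3x3 convolution with that many output channels, "M" is a max-pooling layer). The dictionary and function names are assumptions of this example.

```python
# Layer configurations: an int is a 3x3 conv with that many output channels,
# "M" is a 2x2 max-pooling layer.
CONFIGS = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M",
              512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}
NUM_FC_LAYERS = 3  # every variant ends with three fully connected layers


def count_layers(cfg):
    """Return (conv_layers, weight_layers) for one configuration list."""
    convs = sum(1 for item in cfg if item != "M")
    return convs, convs + NUM_FC_LAYERS


for name, cfg in CONFIGS.items():
    convs, weight_layers = count_layers(cfg)
    print(f"{name}: {convs} conv + {NUM_FC_LAYERS} FC = {weight_layers} weight layers")
```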

A key contribution of VGG is showing that depth matters for performance, and that deeper networks can be built simply by stacking 3x3 convolutional layers.

For a more detailed description of VGG and how to build the model, see: PyTorch复现网络模型VGG (Reproducing the VGG Network Model in PyTorch)

The CIFAR-10 Dataset

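CIFAR-10 is an image dataset widely used in machine learning and computer vision research. It was collected at the University of Toronto by Alex Krizhevsky, Vinod Nair and Geoffrey Hinton, and takes its name from the Canadian Institute For Advanced Research (CIFAR). It contains 60000 32x32 RGB color images in 10 classes, with 6000 images per class. The classes are: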

Airplane
Automobile
Bird
Cat
Deer
Dog
Frog
Horse
Ship
Truck
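
The order of this list matches CIFAR-10's integer labels 0 to 9, which is what a classifier's predicted index refers to. A small sketch of mapping an index back to a name follows; the tuple and the helper `class_name` are illustrative names, not part of the original code.

```python
# CIFAR-10 class names in label order (label 0 is "airplane", label 9 is "truck").
CIFAR10_CLASSES = (
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
)


def class_name(index):
    """Translate a predicted class index into its human-readable name."""
    if not 0 <= index < len(CIFAR10_CLASSES):
        raise ValueError(f"class index out of range: {index}")
    return CIFAR10_CLASSES[index]


print(class_name(3))
```

A helper like this can be applied to the index printed by an inference script to report a class name instead of a bare number.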

Within each class, objects appear in varying orientations, sizes and backgrounds. The dataset is split into a training set of 50000 images and a test set of 10000 images.
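
The numbers above can be cross-checked with a few lines of arithmetic. This is only a sketch: the raw size assumes one byte per channel value, and the batch size of 128 is the value used by the training script later in this post.

```python
import math

NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
TRAIN_SIZE, TEST_SIZE = 50000, 10000
HEIGHT, WIDTH, CHANNELS = 32, 32, 3
BATCH_SIZE = 128  # value used by the training DataLoader in this post

total_images = NUM_CLASSES * IMAGES_PER_CLASS
assert total_images == TRAIN_SIZE + TEST_SIZE

# Uncompressed size, assuming one byte per channel value.
bytes_per_image = HEIGHT * WIDTH * CHANNELS
raw_megabytes = total_images * bytes_per_image / 1e6

# Mini-batches per epoch; the last batch is smaller than BATCH_SIZE.
batches_per_epoch = math.ceil(TRAIN_SIZE / BATCH_SIZE)

print(total_images, bytes_per_image, raw_megabytes, batches_per_epoch)
```

With 391 batches per epoch, a training loop that reports every 100 batches prints three times per epoch.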

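Characteristics of CIFAR-10 include:

Diversity: the images span several classes, and images within each class vary in appearance.
Complexity: objects may overlap with other objects or be partially occluded.
Difficulty: the small image size (32x32 pixels) makes classification challenging.
Balance: every class has the same number of images, so the dataset is class-balanced.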

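CIFAR-10 is commonly used to evaluate image recognition algorithms, especially in convolutional neural network (CNN) research. Because the images are small, it is also well suited to testing an algorithm's computational efficiency and memory use. CIFAR-10 is also a built-in standard dataset in many deep learning frameworks and libraries, so researchers and developers can start experiments and model training quickly.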

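When working with CIFAR-10, researchers usually apply preprocessing steps such as data augmentation and normalization to improve model performance and generalization. CIFAR-10 is also often used to compare deep learning architectures such as VGG, ResNet and AlexNet.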
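
As a rough illustration of the normalization step, the sketch below applies the usual two-stage transform to a single pixel: scale 8-bit values to [0, 1], then subtract a per-channel mean and divide by a per-channel standard deviation. The statistics are the ImageNet values that the training code in this post passes to transforms.Normalize; the function name is an assumption of this example.

```python
# Per-channel statistics (ImageNet values, as used by the code in this post).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb_uint8, MEAN, STD)
    )


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print([round(v, 4) for v in black])
print([round(v, 4) for v in white])
```

The outputs fall roughly between -2.1 and 2.6, so after this step the values are standardized rather than confined to [0, 1].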

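For a more detailed description of the CIFAR-10 dataset, see: CIFAR10 数据集介绍并转化为图片 (Introduction to the CIFAR10 Dataset and Converting It to Images)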

Classifying CIFAR-10 with the VGG Model

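Keras already ships the CIFAR-10 dataset; the code below loads it through torchvision.datasets.CIFAR10 instead.
PyTorch's torchvision.models module already implements VGG and can be called directly.
Complete Python code for training a hand-written VGG network on CIFAR-10: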

import time

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from PIL import Image  # used by transform_image() in the inference section
from torch.utils.data import DataLoader

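# =========================================== Training ===================================================
# Preprocessing: ToTensor scales pixels to [0, 1], then Normalize standardizes each channel
# (ImageNet mean/std are used here). Random flip and crop augment the training set only.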
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Load the training and test sets
trainset = torchvision.datasets.CIFAR10(root='E:/datasets/cifar-10-batches-py', train=True, download=True, transform=transform_train)
testset = torchvision.datasets.CIFAR10(root='E:/datasets/cifar-10-batches-py', train=False, download=True, transform=transform_test)

trainloader = DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
testloader = DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)

# Define the VGG16 network (13 conv layers + 3 fully connected layers, sized for 32x32 inputs)
class VGG(nn.Module):
    def __init__(self, num_classes=10):
        super(VGG, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
        )
        self.classifier = nn.Sequential(
            nn.Linear(512 * 1 * 1, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

# Create the model on the available device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = VGG(num_classes=10).to(device)

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model and save its weights to a file
def train_model(model, trainloader, criterion, optimizer, epochs=1, model_save_path='vgg_cifar10.pth'):
    model.train()
    for epoch in range(epochs):
        running_loss = 0.0
        start_time = time.time()
        for i, (inputs, labels) in enumerate(trainloader, 0):
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if i % 100 == 99:  # print the average loss every 100 mini-batches
                print(f"[{epoch + 1}, {i + 1}] loss: {running_loss / 100:.3f}")
                running_loss = 0.0
        print(f"Epoch {epoch + 1} took {time.time() - start_time:.2f} seconds")

    # Save the trained weights
    torch.save(model.state_dict(), model_save_path)
    print(f"Model saved to {model_save_path}")
    
# ========================================= Testing ====================================================
# Load the saved weights (uncomment to evaluate a previously trained model)
#model = VGG(num_classes=10).to(device)
#model.load_state_dict(torch.load('vgg_cifar10.pth', map_location=device))
#model.eval()  # switch to evaluation mode

def test_model(model, testloader):
    model.eval()  # disable dropout; train_model() leaves the model in training mode
    correct = 0
    total = 0
    with torch.no_grad():  # gradients are not needed during evaluation
        for inputs, labels in testloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)  # index of the highest score per image
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f"Accuracy on the 10000 test images: {100 * correct / total:.2f}%")



# =========================================== Inference ===============================================
# 1. Load the model
def load_model(model_path):
    # Rebuild the hand-written VGG defined above. The checkpoint was saved from that class,
    # so its classifier shapes would not match torchvision's vgg16.
    model = VGG(num_classes=10)
    model.load_state_dict(torch.load(model_path, map_location='cpu'))
    model.eval()  # switch to evaluation mode
    return model

# 2. Define the preprocessing steps
def transform_image(image_path):
    preprocess = transforms.Compose([
        transforms.Resize((32, 32)),  # match the 32x32 inputs the network was trained on
        transforms.ToTensor(),  # convert the image to a tensor
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # same normalization as training
    ])
    image = Image.open(image_path).convert('RGB')  # open the image and convert it to RGB
    image_t = preprocess(image)  # preprocess the image
    batch_t = torch.unsqueeze(image_t, 0)  # add a batch dimension for the model input
    return batch_t

# 3. Run inference
def infer(model, input_tensor):
    with torch.no_grad():  # no gradients are computed inside this context
        output = model(input_tensor)
    _, predicted_class = torch.max(output, 1)  # index of the highest-scoring class
    return predicted_class.item()

# Inference entry point
def main():
    model_path = 'vgg_cifar10.pth'  # path to the saved weights
    image_path = 'path_to_your_image.jpg'  # replace with the path to your image

    model = load_model(model_path)  # load the model
    input_tensor = transform_image(image_path)  # preprocess the image
    predicted_class = infer(model, input_tensor)  # run inference

    print(f'Predicted class index: {predicted_class}')


# =============================================== Entry point =====================================================

if __name__ == '__main__':
    # Train
    train_model(model, trainloader, criterion, optimizer, epochs=1, model_save_path='vgg_cifar10.pth')
    # Test
    # test_model(model, testloader)
    # Inference
    # main()
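
The first classifier layer in the script is nn.Linear(512 * 1 * 1, 4096), and that size follows from the input resolution. The sketch below traces the feature-map shape through the five pooling stages and counts parameters. It uses plain Python only, and assumes 3x3 convolutions with padding=1 and 2x2 max pooling with stride 2, as in the model above; the function names are illustrative.

```python
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def feature_shape(input_size, cfg=VGG16_CFG, in_channels=3):
    """Return (channels, height, width) after the convolutional stack."""
    channels, size = in_channels, input_size
    for item in cfg:
        if item == "M":
            size //= 2       # 2x2 max pooling with stride 2 halves each side
        else:
            channels = item  # 3x3 conv with padding=1 keeps the spatial size
    return channels, size, size


def parameter_count(input_size, num_classes=10, cfg=VGG16_CFG, in_channels=3):
    """Count weights and biases of the conv stack plus the 3-layer classifier."""
    total, channels = 0, in_channels
    for item in cfg:
        if item != "M":
            total += 3 * 3 * channels * item + item
            channels = item
    c, h, w = feature_shape(input_size, cfg, in_channels)
    for fan_in, fan_out in ((c * h * w, 4096), (4096, 4096), (4096, num_classes)):
        total += fan_in * fan_out + fan_out
    return total


print(feature_shape(32), parameter_count(32))
print(feature_shape(224), parameter_count(224, num_classes=1000))
```

A 32x32 input shrinks to 1x1 after five poolings, giving 512 flattened features, while a 224x224 input ends at 7x7, giving 25088. This is why weights saved from the 32x32 model do not fit a VGG16 classifier built for 224x224 images.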