week7：马铃薯病害识别（VGG-16复现）

irving#11

已于 2024-06-28 21:54:33 修改

阅读量776

点赞数 15

文章标签： pytorch 人工智能 python

于 2024-06-28 21:53:16 首次发布

本文链接：https://blog.csdn.net/weixin_43690449/article/details/140053695

版权

马铃薯病害识别（VGG-16复现）

🍨 本文为🔗365天深度学习训练营中的学习记录博客
🍖 原作者：K同学啊

$Ⅰ$ Introduction：

本次实验是对土豆图片训练能够准确识别病害的VGG-16模型，该数据集包含表现出各种疾病的马铃薯植物的高分辨率图像，包括早期疫病、晚期疫病和健康叶子。它旨在帮助开发和测试图像识别模型，以实现准确的疾病检测和分类，从而促进农业诊断的进步。
实验目标：
- 学会手动搭建VGG-16
- 熟练掌握相关的pytorch接口

$Ⅱ$ Experiment：

数据准备与任务分析：

任务是多分类问题，需要将输入的土豆图片分类为不同类型的病毒感染或健康状态。
数据准备：
网络获取数据集

配置环境：

实验环境：
- 语言环境：python 3.8
- 编译器: pycharm
- 深度学习环境：
  - torch==2.11
  - cuda12.1
  - torchvision==0.15.2a0

构建网络：
VGG-16是一种深层卷积神经网络，由16个卷积层和全连接层组成。它通过使用小卷积核（3x3）和深度卷积层，取得了高效的特征提取效果。VGG-16在多个图像识别任务中表现出色，是许多计算机视觉应用的基础。

先导入需要的包

import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision
from torchvision import transforms, datasets
import torchsummary as summary
import copy
import os,PIL,pathlib,warnings,random
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")             #忽略警告信息
from PIL import Image 
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

import torch.nn.functional as F

构建VGG网络：

class vgg16(nn.Module):
    def __init__(self):
        super(vgg16, self).__init__()
        # 卷积块1
        self.block1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        )
        # 卷积块2
        self.block2 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        )
        # 卷积块3
        self.block3 = nn.Sequential(
            nn.Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        )
        # 卷积块4
        self.block4 = nn.Sequential(
            nn.Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        )
        # 卷积块5
        self.block5 = nn.Sequential(
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        )
      

        # 全连接网络层，用于分类
        self.classifier = nn.Sequential(
            nn.Linear(in_features=512*7*7, out_features=4096),
            nn.ReLU(),
            nn.Linear(in_features=4096, out_features=4096),
            nn.ReLU(),
            nn.Linear(in_features=4096, out_features=3)
        )

    def forward(self, x):

        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.block5(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)

        return x

加载并打印模型:

    model = vgg16().to(device)
    print(model)
    summary.summary(model, (3, 224, 224))

    >>>
    vgg16(
  (block1): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) 
    (1): ReLU()
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  )
  (block2): Sequential(
    (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  )
  (block3): Sequential(
    (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): ReLU()
    (6): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  )
  (block4): Sequential(
    (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): ReLU()
    (6): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  )
  (block5): Sequential(
    (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU()
    (4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): ReLU()
    (6): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU()
    (2): Linear(in_features=4096, out_features=4096, bias=True)
    (3): ReLU()
    (4): Linear(in_features=4096, out_features=3, bias=True)
  )
)
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
              ReLU-2         [-1, 64, 224, 224]               0
            Conv2d-3         [-1, 64, 224, 224]          36,928
              ReLU-4         [-1, 64, 224, 224]               0
         MaxPool2d-5         [-1, 64, 112, 112]               0
            Conv2d-6        [-1, 128, 112, 112]          73,856
              ReLU-7        [-1, 128, 112, 112]               0
            Conv2d-8        [-1, 128, 112, 112]         147,584
              ReLU-9        [-1, 128, 112, 112]               0
        MaxPool2d-10          [-1, 128, 56, 56]               0
           Conv2d-11          [-1, 256, 56, 56]         295,168
             ReLU-12          [-1, 256, 56, 56]               0
           Conv2d-13          [-1, 256, 56, 56]         590,080
             ReLU-14          [-1, 256, 56, 56]               0
           Conv2d-15          [-1, 256, 56, 56]         590,080
             ReLU-16          [-1, 256, 56, 56]               0
        MaxPool2d-17          [-1, 256, 28, 28]               0
           Conv2d-18          [-1, 512, 28, 28]       1,180,160
             ReLU-19          [-1, 512, 28, 28]               0
           Conv2d-20          [-1, 512, 28, 28]       2,359,808
             ReLU-21          [-1, 512, 28, 28]               0
           Conv2d-22          [-1, 512, 28, 28]       2,359,808
             ReLU-23          [-1, 512, 28, 28]               0
        MaxPool2d-24          [-1, 512, 14, 14]               0
           Conv2d-25          [-1, 512, 14, 14]       2,359,808
             ReLU-26          [-1, 512, 14, 14]               0
           Conv2d-27          [-1, 512, 14, 14]       2,359,808
             ReLU-28          [-1, 512, 14, 14]               0
           Conv2d-29          [-1, 512, 14, 14]       2,359,808
             ReLU-30          [-1, 512, 14, 14]               0
        MaxPool2d-31            [-1, 512, 7, 7]               0
           Linear-32                 [-1, 4096]     102,764,544
             ReLU-33                 [-1, 4096]               0
           Linear-34                 [-1, 4096]      16,781,312
             ReLU-35                 [-1, 4096]               0
           Linear-36                    [-1, 3]          12,291
================================================================
Total params: 134,272,835
Trainable params: 134,272,835
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 218.52
Params size (MB): 512.21
Estimated Total Size (MB): 731.30
----------------------------------------------------------------

训练模型：

首先需要设置损失函数，超参数和优化器：

    optimizer  = torch.optim.Adam(model.parameters(), lr= 1e-4)
    loss_fn    = nn.CrossEntropyLoss() # 创建损失函数

    epochs     = 40

划分数据集用于训练和测试：

    data_dir = './PotatoPlants/'
    data_dir = pathlib.Path(data_dir)

    data_paths  = list(data_dir.glob('*'))
    classeNames = [str(path).split("\\")[1] for path in data_paths]

    train_transforms = transforms.Compose([
    transforms.Resize([224, 224]),  # 将输入图片resize成统一尺寸
    # transforms.RandomHorizontalFlip(), # 随机水平翻转
    transforms.ToTensor(),          # 将PIL Image或numpy.ndarray转换为tensor，并归一化到[0,1]之间
    transforms.Normalize(           # 标准化处理-->转换为标准正太分布（高斯分布），使模型更容易收敛
        mean=[0.485, 0.456, 0.406], 
        std=[0.229, 0.224, 0.225])  # 其中 mean=[0.485,0.456,0.406]与std=[0.229,0.224,0.225] 从数据集中随机抽样计算得到的。
    ])

    test_transform = transforms.Compose([
        transforms.Resize([224, 224]),  # 将输入图片resize成统一尺寸
        transforms.ToTensor(),          # 将PIL Image或numpy.ndarray转换为tensor，并归一化到[0,1]之间
        transforms.Normalize(           # 标准化处理-->转换为标准正太分布（高斯分布），使模型更容易收敛
            mean=[0.485, 0.456, 0.406], 
            std=[0.229, 0.224, 0.225])  # 其中 mean=[0.485,0.456,0.406]与std=[0.229,0.224,0.225] 从数据集中随机抽样计算得到的。
    ])

    total_data = datasets.ImageFolder("./PotatoPlants/",transform=train_transforms)
    train_size = int(0.8 * len(total_data))
    test_size  = len(total_data) - train_size
    train_dataset, test_dataset = torch.utils.data.random_split(total_data, [train_size, test_size])

    batch_size = 32

    train_dl = torch.utils.data.DataLoader(train_dataset,
                                            batch_size=batch_size,
                                            shuffle=True,
                                            num_workers=1)
    test_dl = torch.utils.data.DataLoader(test_dataset,
                                            batch_size=batch_size,
                                            shuffle=True,
                                            num_workers=1)

然后编写训练函数，使用反向传播计算梯度，再通过优化器更新

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # 训练集的大小，一共60000张图片
    num_batches = len(dataloader)  # 批次数目，1875（60000/32）

    train_loss, train_acc = 0, 0  # 初始化训练损失和正确率

    for X, y in dataloader:  # 获取图片及其标签
        X, y = X.to(device), y.to(device)

        print(X.shape)
        # 计算预测误差
        pred = model(X)  # 网络输出
        loss = loss_fn(pred, y)  # 计算网络输出和真实值之间的差距，targets为真实值，计算二者差值即为损失

        # 反向传播
        optimizer.zero_grad()  # grad属性归零
        loss.backward()  # 反向传播
        optimizer.step()  # 每一步自动更新

        # 记录acc与loss
        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()

    train_acc /= size
    train_loss /= num_batches

    return train_acc, train_loss

测试模型：

测试函数与训练函数大致相同，去掉优化器与梯度传播即可：

def test (dataloader, model, loss_fn):
    size        = len(dataloader.dataset)  # 测试集的大小
    num_batches = len(dataloader)          # 批次数目, (size/batch_size，向上取整)
    test_loss, test_acc = 0, 0
  

    # 当不进行训练时，停止梯度更新，节省计算内存消耗
    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)
          

            # 计算loss
            target_pred = model(imgs)
            loss        = loss_fn(target_pred, target)
          

            test_loss += loss.item()
            test_acc  += (target_pred.argmax(1) == target).type(torch.float).sum().item()

    test_acc  /= size
    test_loss /= num_batches

    return test_acc, test_loss

实验结果及可视化：
目标是尽可能提高在测试集上分类的准确率，即test_acc:

    for epoch in range(epochs):
    

        model.train()
        epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, optimizer)
    

        model.eval()
        epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)
    

        # 保存最佳模型到 best_model
        if epoch_test_acc > best_acc:
            best_acc   = epoch_test_acc
            best_model = copy.deepcopy(model)
    

        train_acc.append(epoch_train_acc)
        train_loss.append(epoch_train_loss)
        test_acc.append(epoch_test_acc)
        test_loss.append(epoch_test_loss)
    

        # 获取当前的学习率
        lr = optimizer.state_dict()['param_groups'][0]['lr']
    

        template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}, Lr:{:.2E}')
        print(template.format(epoch+1, epoch_train_acc*100, epoch_train_loss, 
                            epoch_test_acc*100, epoch_test_loss, lr))
    

    # 保存最佳模型到文件中
    PATH = './best_model.pth'  # 保存的参数文件名
    torch.save(model.state_dict(), PATH)

    print('Done')
    plt.rcParams['font.sans-serif']    = ['SimHei'] # 用来正常显示中文标签
    plt.rcParams['axes.unicode_minus'] = False      # 用来正常显示负号
    plt.rcParams['figure.dpi']         = 100        #分辨率

    epochs_range = range(epochs)

    plt.figure(figsize=(12, 3))
    plt.subplot(1, 2, 1)

    plt.plot(epochs_range, train_acc, label='Training Accuracy')
    plt.plot(epochs_range, test_acc, label='Test Accuracy')
    plt.legend(loc='lower right')
    plt.title('Training and Validation Accuracy')

    plt.subplot(1, 2, 2)
    plt.plot(epochs_range, train_loss, label='Training Loss')
    plt.plot(epochs_range, test_loss, label='Test Loss')
    plt.legend(loc='upper right')
    plt.title('Training and Validation Loss')
    plt.show()

经过40轮训练，得到结果如下：
在这里插入图片描述

结果如图：
在这里插入图片描述

保存与加载模型并测试本地图片：

# 模型保存
PATH = './model.pth'  # 保存的参数文件名
torch.save(model.state_dict(), PATH)

# 将参数加载到model当中
model.load_state_dict(torch.load(PATH, map_location=device))

预测单张图片：

def predict_one_image(image_path, model, transform, classes):
  

    test_img = Image.open(image_path).convert('RGB')
    plt.imshow(test_img)  # 展示预测的图片

    test_img = transform(test_img)
    img = test_img.to(device).unsqueeze(0)
  

    model.eval()
    output = model(img)

    _,pred = torch.max(output,1)
    pred_class = classes[pred]
    print(f'预测结果是：{pred_class}')

predict_one_image(image_path='./4-data/Monkeypox/M01_01_00.jpg', 
                  model=model, 
                  transform=train_transforms, 
                  classes=classes)

调用函数即模型测试：

classes = list(total_data.class_to_idx)
 predict_one_image(image_path='./PotatoPlants/Early_blight/1.JPG', 
                  model=model, 
                  transform=train_transforms, 
                  classes=classes)

>>>得到结果：
预测结果是：Early_blight

$Ⅲ$ Conclusion：

总结：

在本实验中，成功复现了VGG-16模型，并应用于土豆病毒图片的识别任务。通过搭建前馈神经网络，深入理解了卷积层和池化层的作用及其在图像特征提取中的重要性。使用卷积层提取图片的低级和高级特征，并通过池化层进行特征降维，有效减少了计算量并提升了模型的泛化能力。实验结果表明，经过训练的VGG-16模型在土豆病毒图片的分类任务中表现优异，验证了卷积神经网络在图像识别领域的强大能力。

idx)
predict_one_image(image_path=‘./PotatoPlants/Early_blight/1.JPG’,
model=model,
transform=train_transforms,
classes=classes)

得到结果：
预测结果是：Early_blight

$ conclusion：

总结：

通过本实验，我进一步理解关于VGG-16与池化卷积等这些关键知识点，并掌握了如何利用卷积神经网络进行图像分类任务。VGG-16模型的复现和应用不仅加深了对CNN的理解，也为实际应用中的图像识别提供了宝贵的经验。

irving#11

关注

15
点赞
踩
22

收藏

觉得还不错? 一键收藏
0
评论
week7：马铃薯病害识别（VGG-16复现）

在本实验中，成功复现了VGG-16模型，并应用于土豆病毒图片的识别任务。通过搭建前馈神经网络，深入理解了卷积层和池化层的作用及其在图像特征提取中的重要性。使用卷积层提取图片的低级和高级特征，并通过池化层进行特征降维，有效减少了计算量并提升了模型的泛化能力。实验结果表明，经过训练的VGG-16模型在土豆病毒图片的分类任务中表现优异，验证了卷积神经网络在图像识别领域的强大能力。预测结果是：Early_blight。
复制链接

扫一扫