PyTorch handwritten digit recognition: implementing LeNet and testing it on a single image

DaGod123

Article link: https://blog.csdn.net/fengzhongye51460/article/details/141184361

LeNet-5

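LeNet-5 is a classic convolutional neural network (CNN) proposed by Yann LeCun et al. in 1998, designed mainly for handwritten digit recognition. It performs well on the MNIST dataset and is an important milestone in the history of deep learning.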

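LeNet-5 structure

LeNet-5 consists of the following layers:

Input layer: a 32x32 grayscale image.
Convolutional layer C1: 6 filters of size 5x5, output size 28x28x6.
Pooling layer S2: average pooling, output size 14x14x6.
Convolutional layer C3: 16 filters of size 5x5, output size 10x10x16.
Pooling layer S4: average pooling, output size 5x5x16.
Convolutional layer C5: 120 filters of size 5x5, output size 1x1x120.
Fully connected layer F6: 84 neurons.
Output layer: 10 neurons, one per class.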
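
The output sizes in the list above follow from the usual formula for an unpadded convolution or pooling window, (size - kernel) / stride + 1. The following is a minimal sketch of that arithmetic; the helper names `conv_out` and `lenet_feature_sizes` are made up for illustration. It also traces a 28x28 input, which is the size of the MNIST images used later in this article and explains the `16*4*4` input width of `fc1` in the PyTorch model.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_feature_sizes(input_size):
    """Trace the spatial size through conv(5) -> pool(2) -> conv(5) -> pool(2)."""
    c1 = conv_out(input_size, kernel=5)
    s2 = conv_out(c1, kernel=2, stride=2)
    c3 = conv_out(s2, kernel=5)
    s4 = conv_out(c3, kernel=2, stride=2)
    return c1, s2, c3, s4


print(lenet_feature_sizes(32))  # (28, 14, 10, 5)
print(lenet_feature_sizes(28))  # (24, 12, 8, 4)
```

With a 32x32 input, the 5x5 map after S4 is reduced to 1x1 by the 5x5 kernels of C5, which is why C5 behaves like a fully connected layer.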

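MNIST handwritten digit recognition

MNIST is probably the most commonly used dataset for learning deep learning.

The dataset contains 70,000 images of handwritten digits: 60,000 training images and 10,000 test images. The training set was written by 250 different people, half of them high school students and half Census Bureau employees. The test set has the same proportions, and its writers do not overlap with those of the training set.

Each image is 28*28 pixels, and the dataset stores each image as a flattened one-dimensional vector of 28*28 = 784 values.

Each image shows a handwritten digit from 0 to 9 in white on a black background. When stored, black is represented by 0 and the white strokes by floating-point values between 0 and 1.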
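
As a small illustration of this storage layout, the sketch below flattens a 28x28 image into a 784-element vector and scales the pixels to the 0 to 1 range. It assumes raw pixel values of 0 to 255, and both the `flatten_image` helper and the sample image are made up for the example.

```python
def flatten_image(rows):
    """Flatten a 2-D list of 0-255 pixel values into one list scaled to [0, 1]."""
    return [pixel / 255.0 for row in rows for pixel in row]


# Hypothetical 28x28 image: black background with one white vertical stroke.
image = [[255 if col == 14 else 0 for col in range(28)] for _ in range(28)]
vector = flatten_image(image)

print(len(vector))               # 784
print(min(vector), max(vector))  # 0.0 1.0
```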

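PyTorch implementation

LeNet model

The model below keeps the LeNet-5 layout but uses ReLU activations, max pooling instead of average pooling, and 28x28 MNIST inputs instead of 32x32. With 28x28 inputs the feature map after the second pooling layer is 4x4, so the first fully connected layer takes 16*4*4 inputs.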

import torch.nn as nn
import torch.nn.functional as func


class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)   # 1x28x28 -> 6x24x24
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)  # 6x12x12 -> 16x8x8
        self.fc1 = nn.Linear(16*4*4, 120)             # flattened 16x4x4 feature map
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)                  # one output per digit class

    def forward(self, x):
        x = func.relu(self.conv1(x))
        x = func.max_pool2d(x, 2)   # 6x24x24 -> 6x12x12

        x = func.relu(self.conv2(x))
        x = func.max_pool2d(x, 2)   # 16x8x8 -> 16x4x4

        x = x.view(x.size(0), -1)   # flatten to (batch, 256)

        x = func.relu(self.fc1(x))
        x = func.relu(self.fc2(x))
        x = self.fc3(x)             # raw logits; softmax is applied by the loss
        return x
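
For comparison, here is a rough sketch of a network that follows the layer list in the "LeNet-5 structure" section more literally: 32x32 input, average pooling, and C5 as a convolution. This is not the model used in this article. The class name `ClassicLeNet5` is hypothetical, tanh is an assumed activation, and the 1998 design has further details (for example sparse C3 connections and a different output layer) that are left out here.

```python
import torch
import torch.nn as nn


class ClassicLeNet5(nn.Module):
    """Hypothetical variant closer to the classic layer list (simplified)."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),     # C1: 1x32x32 -> 6x28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                    # S2: -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),    # C3: -> 16x10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                    # S4: -> 16x5x5
            nn.Conv2d(16, 120, kernel_size=5),  # C5: -> 120x1x1
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(
            nn.Linear(120, 84),                 # F6
            nn.Tanh(),
            nn.Linear(84, num_classes),         # output layer (raw logits)
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)                 # (batch, 120)
        return self.classifier(x)


model = ClassicLeNet5()
dummy = torch.zeros(2, 1, 32, 32)               # batch of two blank 32x32 images
print(model(dummy).shape)                       # torch.Size([2, 10])
```

To feed 28x28 MNIST images into such a network, one option is to pad them to 32x32 first, for example with `transforms.Pad(2)`.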

Training the model

Load the data and train the model.

import torch
from torch import nn
from torch import optim
from models import LeNet5  # the LeNet5 class defined above, kept in models.py
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader

if __name__ == '__main__':
    # Define the image transformations: convert to grayscale and then to tensor
    transform = transforms.Compose([
        transforms.Grayscale(num_output_channels=1),
        transforms.ToTensor()
    ])

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # Load the training dataset from the specified directory and apply transformations
    train_dataset = datasets.ImageFolder(root='./mnist_train', transform=transform)
    # Load the test dataset from the specified directory and apply transformations
    test_dataset = datasets.ImageFolder(root='./mnist_test', transform=transform)
    # Print the length of the training dataset
    print("train_dataset length: ", len(train_dataset))
    # Print the length of the test dataset
    print("test_dataset length: ", len(test_dataset))

    # Create a DataLoader for the training dataset with batch size of 64 and shuffling enabled
    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
    # Print the number of batches in the training DataLoader
    print("train_loader length: ", len(train_loader))

    # Iterate over the first few batches of the training DataLoader
    # for batch_idx, (data, label) in enumerate(train_loader):
    #     # Uncomment the following lines to break after 3 batches
    #     # if batch_idx == 3:
    #     #     break
    #     # Print the batch index
    #     print("batch_idx: ", batch_idx)
    #     # Print the shape of the data tensor
    #     print("data.shape: ", data.shape)
    #     # Print the shape of the label tensor
    #     print("label.shape: ", label.shape)
    #     # Print the labels
    #     print(label)

    # Initialize the neural network model. Other architectures (PreNetwork, AlexNet,
    # VGG, GoogLeNet, ResNet, DenseNet, WideResNet) can be swapped in the same way.
    model = LeNet5().to(device)
    # Initialize the Adam optimizer with the model's parameters
    optimizer = optim.Adam(model.parameters())
    # Define the loss function as cross-entropy loss
    criterion = nn.CrossEntropyLoss().to(device)

    # Train the model for 10 epochs
    model.train()
    for epoch in range(10):
        # Iterate over the batches in the training DataLoader
        for batch_idx, (data, label) in enumerate(train_loader):
            data, label = data.to(device), label.to(device)
            # Clear gradients left over from the previous iteration
            optimizer.zero_grad()
            # Forward pass: compute the model output
            output = model(data)
            # Compute the loss
            loss = criterion(output, label)
            # Backward pass: compute the gradients
            loss.backward()
            # Update the model parameters
            optimizer.step()
            # Print the loss every 100 batches
            if batch_idx % 100 == 0:
                print(f"Epoch {epoch + 1}/10 "
                      f"| Batch {batch_idx}/{len(train_loader)} "
                      f"| Loss: {loss.item():.4f}")

    # Save the whole model object (pickled); loading it later requires
    # the LeNet5 class to be importable
    torch.save(model, 'mnist.pth')
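
The training script loads `test_dataset` but never evaluates on it. A minimal sketch of an accuracy check is shown below. The `evaluate` helper is hypothetical, and the toy model and batch exist only so that the snippet runs on its own; in the training script one would pass the trained `LeNet5` and a `DataLoader` built from `test_dataset`.

```python
import torch


def evaluate(model, loader, device):
    """Return the fraction of samples in `loader` that the model classifies correctly."""
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for data, label in loader:
            data, label = data.to(device), label.to(device)
            predicted = model(data).argmax(dim=1)
            correct += (predicted == label).sum().item()
            total += label.size(0)
    return correct / total


# Stand-in model and data so the sketch is self-contained.
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
toy_batches = [(torch.zeros(4, 1, 28, 28), torch.zeros(4, dtype=torch.long))]
accuracy = evaluate(toy_model, toy_batches, torch.device("cpu"))
print(f"accuracy: {accuracy:.2%}")
```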

Testing a single image

Load a single image

Load the model and run the test

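import torch
import cv2
import torch.nn.functional as F
# from model import Net  # Important: even though this script never references it directly,
#                        # the model class must be importable or loading the saved model fails
from torchvision import transforms

if __name__ == '__main__':
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Load the whole pickled model. map_location lets a model saved on GPU load on a
    # CPU-only machine; weights_only=False (PyTorch 1.13+) is required on recent
    # versions, where torch.load defaults to loading weights only.
    model = torch.load('./mnist.pth', map_location=device, weights_only=False)
    model = model.to(device)
    model.eval()  # switch the model to evaluation mode

    img = cv2.imread("1.png", cv2.IMREAD_GRAYSCALE)  # read the image to predict as grayscale
    if img is None:
        raise FileNotFoundError("could not read 1.png")
    img = cv2.resize(img, (28, 28))  # the model expects 28x28 input; no change if already that size

    trans = transforms.Compose(
    [
        transforms.ToTensor()
    ])

    img = trans(img)
    img = img.to(device)
    img = img.unsqueeze(0)  # add a batch dimension: the model takes 4-D input [batch_size, channels, height, width], while a single image is 3-D [channels, height, width]
    # after unsqueeze the shape is [1, 1, 28, 28]
    output = model(img)
    prob = F.softmax(output, dim=1)  # prob holds the probabilities of the 10 classes
    print(prob)
    value, predicted = torch.max(output, 1)
    print(predicted.item())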

Test result

As the output shows, the prediction is correct.

tensor([[1.0468e-24, 2.3783e-21, 1.0000e+00, 2.5184e-12, 1.1491e-20, 1.0807e-24,
         6.6296e-21, 7.4025e-21, 3.6534e-18, 2.3143e-35]], device='cuda:0',
       grad_fn=<SoftmaxBackward0>)
2
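
The predicted digit is simply the index of the largest probability. Because softmax preserves the ordering of the raw scores, taking the maximum over the logits (as the test script does) gives the same answer. Below is a small pure-Python sketch of that step; the logit values are hypothetical and are not taken from the run above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical raw scores for the ten digit classes.
logits = [-20.0, -12.5, 35.0, 8.3, -10.9, -20.1, -11.4, -11.3, -5.1, -44.0]
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)

print(predicted)             # 2
print(round(sum(probs), 6))  # 1.0
```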

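Conclusion:

Having implemented MNIST recognition first with a feed-forward neural network and now with LeNet, it appears that essentially any classification algorithm can be applied to MNIST.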