[Convolutional Neural Network Series] Part 2: LeNet

travellerss

Last modified: 2023-09-17 09:01:30


First published: 2022-09-12 15:56:50

Article link: https://blog.csdn.net/qq_30196905/article/details/126817346


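I. Introduction

LeNet5 comes from the paper "Gradient-Based Learning Applied to Document Recognition" by Yann LeCun et al. It is a compact and efficient convolutional neural network for handwritten character recognition.
LeNet5 is small, but it contains the basic building blocks of deep learning: convolutional layers, pooling layers, and fully connected layers. Not counting the input, it has seven layers, each with trainable parameters. A layer consists of several feature maps (a feature map is the result of a convolution), each feature map extracts one kind of feature from its input through one convolution filter, and each feature map contains many neurons.

Input: a 32 * 32 * 1 grayscale image of a handwritten character. The characters are the digits 0-9, so there are 10 classes.
Output: the classification result, a digit between 0 and 9 (obtained with softmax in the implementations in this article).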

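II. Network Architecture

1. INPUT (input layer)

A 32 * 32 * 1 image, i.e. 1024 neurons.

2. C1 (convolutional layer)

C1 uses six 5 * 5 convolution kernels (the bias is not included in the kernel size) and produces 6 feature maps. Each feature map has side length 32 − 5 + 1 = 28, so the number of neurons goes from 1024 to 28 * 28 * 6 = 4704.

Parameters between the input layer and C1: 6 * (5 * 5 + 1) = 156. Each unit in C1 is connected to a 5 * 5 patch of the previous layer and 1 bias, so there are 6 * (5 * 5 + 1) * (28 * 28) = 122304 connections. (See p. 129 of Qiu Xipeng's NNDL.)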
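
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper name `conv_layer_stats` is made up for illustration, and it assumes a stride-1 convolution without padding in which every output map sees every input map.

```python
def conv_layer_stats(in_size, kernel, in_maps, out_maps):
    """Return (output size, trainable parameters, connections) for a
    stride-1, no-padding convolution where each output map sees all input maps."""
    out_size = in_size - kernel + 1
    weights_per_map = kernel * kernel * in_maps + 1   # +1 for the bias
    params = out_maps * weights_per_map
    connections = params * out_size * out_size        # weights are reused at every position
    return out_size, params, connections


c1 = conv_layer_stats(in_size=32, kernel=5, in_maps=1, out_maps=6)
print(c1)  # (28, 156, 122304)
```

The same helper reproduces the C5 numbers further down: `conv_layer_stats(5, 5, 16, 120)` gives `(1, 48120, 48120)`.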

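3. S2 (pooling layer)

The pooling layer is a subsampling layer, and it differs from the max pooling commonly used today. It has six 14 * 14 feature maps, and each unit is connected to a 2 * 2 neighborhood of the corresponding C1 feature map. Each S2 unit sums its 4 inputs from C1, multiplies the sum by a trainable coefficient, and adds a trainable bias.

Parameters between C1 and S2: each feature map has one coefficient and one bias, for 2 * 6 = 12 parameters in total. Each S2 unit is connected to 2 * 2 C1 units and 1 bias, so there are 6 * (2 * 2 + 1) * 14 * 14 = 5880 connections.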
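
A minimal pure-Python sketch of this subsampling step, to make the difference from max pooling concrete. The function name and the toy 4 * 4 input are illustrative assumptions, and the sketch leaves out the squashing function that the paper applies to the result.

```python
def subsample_2x2(fmap, coeff, bias):
    """LeNet-style subsampling: sum each non-overlapping 2x2 block,
    multiply by a trainable coefficient and add a trainable bias."""
    rows, cols = len(fmap), len(fmap[0])
    out = []
    for r in range(0, rows, 2):
        out_row = []
        for c in range(0, cols, 2):
            block_sum = (fmap[r][c] + fmap[r][c + 1]
                         + fmap[r + 1][c] + fmap[r + 1][c + 1])
            out_row.append(coeff * block_sum + bias)
        out.append(out_row)
    return out


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
pooled = subsample_2x2(fmap, coeff=0.25, bias=0.0)
print(pooled)  # [[3.5, 5.5], [11.5, 13.5]]
```

With `coeff=0.25` and `bias=0.0` the layer reduces to plain average pooling; in LeNet5 both values are learned, one pair per feature map.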

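4. C3 (convolutional layer)

Connections between S2 and C3: a connection table defines which input feature maps each output feature map depends on.
[Figure: connection table between the S2 and C3 feature maps]

As the figure shows, the first 6 C3 feature maps each connect to 3 adjacent S2 feature maps, the next 6 each connect to 4 adjacent S2 feature maps, the next 3 each connect to 4 S2 feature maps that are not all adjacent, and the last one connects to all S2 feature maps.

The kernel size is 5 * 5. C3 uses 60 kernels of size 5 * 5 in total (6 * 3 + 6 * 4 + 3 * 4 + 1 * 6) and produces 16 feature maps of size 10 * 10. The number of neurons is 16 * 100 = 1600, the number of trainable parameters is 6 * (3 * 5 * 5 + 1) + 6 * (4 * 5 * 5 + 1) + 3 * (4 * 5 * 5 + 1) + 1 * (6 * 5 * 5 + 1) = 1516, and the number of connections is 1516 * 10 * 10 = 151600.

Without the connection table, 16 * 6 = 96 kernels of size 5 * 5 would be needed.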
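
The C3 counts can be reproduced from the number of S2 maps feeding each C3 map. The sketch below encodes only those fan-in counts as described in the text, not the exact table from the figure.

```python
KERNEL = 5      # kernel side length
OUT_SIZE = 10   # side length of each C3 feature map

# Number of S2 maps feeding each of the 16 C3 maps:
# 6 maps with 3 inputs, 6 + 3 maps with 4 inputs, 1 map with all 6 inputs.
fan_in = [3] * 6 + [4] * 6 + [4] * 3 + [6]

kernels = sum(fan_in)                                   # one 5x5 kernel per (input, output) pair
params = sum(n * KERNEL * KERNEL + 1 for n in fan_in)   # +1 bias per output map
connections = params * OUT_SIZE * OUT_SIZE
full_params = 16 * (6 * KERNEL * KERNEL + 1)            # if every C3 map saw all 6 S2 maps

print(kernels, params, connections, full_params)  # 60 1516 151600 2416
```

The sparse table therefore saves 2416 - 1516 = 900 parameters compared with full connectivity, which is what a plain `nn.Conv2d(6, 16, 5)` uses.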

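5. C5 (convolutional layer)

Between C3 and C5 sits the pooling layer S4, which subsamples the 16 feature maps of C3 from 10 * 10 to 5 * 5 in the same way as S2. C5 has 120 feature maps, each connected to all feature maps of S4. The kernel size is 5 * 5, so 120 * 16 = 1920 kernels are needed. Because the S4 feature maps are also 5 * 5, each C5 feature map shrinks to a single point, giving 120 feature maps of size 1 x 1. There are 120 * (25 * 16 + 1) = 48120 parameters, and the number of connections is also 48120.

6. F6 (fully connected layer)

F6 corresponds to a hidden layer in an MLP (Multi-Layer Perceptron). It has 84 nodes, so it has 84 * (120 + 1) = 10164 parameters. The number of connections equals the number of trainable parameters, 10164.

F6 uses a sigmoid-shaped activation function, specifically tanh.

7. Output (output layer)

A fully connected layer with 10 nodes that uses **radial basis function (RBF)** connections.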
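
In the paper, each RBF output unit computes the squared Euclidean distance between the F6 vector and a per-class center, and the class with the smallest distance wins. A toy sketch, assuming 3-dimensional inputs and 2 classes instead of 84 and 10; the centers and input below are invented values.

```python
def rbf_outputs(x, centers):
    """Squared Euclidean distance between input x and each class center."""
    return [sum((xj - wj) ** 2 for xj, wj in zip(x, w)) for w in centers]


# Toy example: hypothetical centers and input, not values from the paper.
centers = [[1.0, -1.0, 1.0],
           [-1.0, 1.0, 1.0]]
x = [0.5, -1.0, 1.0]

scores = rbf_outputs(x, centers)
predicted = scores.index(min(scores))   # the smallest distance wins
print(scores, predicted)  # [0.25, 6.25] 0
```

The PyTorch versions later in this article replace this layer with an ordinary linear layer followed by softmax cross-entropy, where the largest score wins instead.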

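III. Summary

LeNet5 is a compact and efficient convolutional neural network for handwritten character recognition.
Convolutional neural networks make good use of the spatial structure of images.
Convolutional layers have relatively few parameters, a consequence of their two main properties: local connectivity and weight sharing.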
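
To put a number on the last point, the sketch below compares C1 with a hypothetical fully connected layer that maps the same 1024 inputs to the same 4704 outputs. The dense layer is purely a thought experiment and is not part of LeNet5.

```python
in_units = 32 * 32            # input pixels
out_units = 6 * 28 * 28       # C1 neurons

conv_params = 6 * (5 * 5 + 1)               # six shared 5x5 kernels plus biases
dense_params = out_units * (in_units + 1)   # one weight per input plus a bias, per neuron

print(conv_params, dense_params)  # 156 4821600
```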

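IV. Reproducing the Paper

1. Datasets

MNIST dataset download link

The paper uses the MNIST dataset: digits 0-9 handwritten by many different people, used to train a machine to recognize them. The data come as a training set plus a test set. There are 60000 training samples (some setups hold out 5000 of them for validation and train on the remaining 55000) and 10000 test samples. Each image is 28 * 28 pixels, with integer grayscale values in the range [0, 255].

CIFAR-10 is a small dataset for recognizing everyday objects, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It contains RGB color images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The images are 32×32, and the dataset has 50000 training images and 10000 test images.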

2. Code

2.1 Version 1:

(1) Data input

import torch
from torch import optim
from torchvision import transforms
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.nn as nn
import matplotlib.pyplot as plt

batch_size = 64     # batch size

# transforms is a toolbox of image operations such as cropping, rotation and padding.
# Compose chains the preprocessing steps so that they run in order.
transform = transforms.Compose([
    # Resize the image to 32x32
    transforms.Resize((32, 32)),
    # Convert the image to a tensor:
    # pixel values go from integers in 0-255 to floats in [0, 1],
    # and the layout goes from H x W x C to C x H x W (here 1 x 32 x 32)
    transforms.ToTensor(),
    # Standardize with the MNIST mean and standard deviation to speed up convergence
    transforms.Normalize((0.1307,), (0.3081,))
])

# Load the MNIST dataset
# train=True selects the training split; download=True downloads the data if it is missing
train_dataset = datasets.MNIST(root='../dataset/mnist/', train=True,
                               download=True, transform=transform)
test_dataset = datasets.MNIST(root='../dataset/mnist/', train=False,
                              download=True, transform=transform)

# shuffle=True shuffles the sample order in every epoch
train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)

print("Training set size:", len(train_dataset))
print("Test set size:", len(test_dataset))
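
What `ToTensor` followed by `Normalize((0.1307,), (0.3081,))` does to a single pixel can be written out by hand. The sketch below is plain Python with a made-up helper name; it mirrors the arithmetic only and does not call torchvision.

```python
MEAN, STD = 0.1307, 0.3081    # the values passed to transforms.Normalize


def normalize_pixel(value):
    """Map a raw 0-255 grey level the way ToTensor followed by Normalize does."""
    scaled = value / 255.0          # ToTensor: [0, 255] -> [0.0, 1.0]
    return (scaled - MEAN) / STD    # Normalize: subtract the mean, divide by the std


for v in (0, 128, 255):
    print(v, round(normalize_pixel(v), 4))
```

A black background pixel (0) maps to roughly -0.42 and a fully white stroke pixel (255) to roughly 2.82, so the network sees values centered near zero instead of raw intensities.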

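(2) Network architecture

# Model class
# Note: this first version has no activation functions between the layers.
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.mode1 = nn.Sequential(

            # Convolutional layer C1: 1 x 32 x 32 -> 6 x 28 x 28
            nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1),

            # Max pooling layer S2: 6 x 28 x 28 -> 6 x 14 x 14
            nn.MaxPool2d(kernel_size=2, stride=2),

            # Convolutional layer C3: 6 x 14 x 14 -> 16 x 10 x 10
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1),

            # Max pooling layer S4: 16 x 10 x 10 -> 16 x 5 x 5
            nn.MaxPool2d(kernel_size=2, stride=2),

            # C5 can be written either as a convolutional layer or as a linear layer
            # Option 1: convolutional layer C5: 16 x 5 x 5 -> 120 x 1 x 1
            nn.Conv2d(in_channels=16, out_channels=120, kernel_size=5, stride=1),
            nn.Flatten(),  # flatten to a vector of 120 values per sample

            # Option 2: linear layer C5
            # nn.Flatten(),  # flatten first
            # nn.Linear(in_features=5*5*16, out_features=120),  # then change the dimension with a linear layer

            # Linear layer F6
            nn.Linear(in_features=120, out_features=84),

            # Output layer
            nn.Linear(in_features=84, out_features=10),
        )

    # Forward pass
    def forward(self, input):
        x = self.mode1(input)
        return x

# Use the GPU if one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Instantiate the model
model = LeNet5()

# Move the model to the device
model.to(device)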
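
The layer sizes in this network follow from the usual output-size formula, and tracing them shows why the 28 * 28 MNIST images are resized to 32 * 32 first. A small sketch in plain Python; the helper names are invented for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output width of a pooling layer along one spatial dimension."""
    return (size - kernel) // stride + 1


def trace_shapes(input_size):
    """Follow one spatial dimension through C1, S2, C3 and S4."""
    layers = [("C1", "conv", 5, 6), ("S2", "pool", 2, 6),
              ("C3", "conv", 5, 16), ("S4", "pool", 2, 16)]
    size, trace = input_size, []
    for name, kind, k, channels in layers:
        size = conv_out(size, k) if kind == "conv" else pool_out(size, k, k)
        trace.append((name, channels, size))
    return trace


print(trace_shapes(32))  # [('C1', 6, 28), ('S2', 6, 14), ('C3', 16, 10), ('S4', 16, 5)]
print(trace_shapes(28))  # [('C1', 6, 24), ('S2', 6, 12), ('C3', 16, 8), ('S4', 16, 4)]
```

With a 32 * 32 input, S4 produces 16 maps of 5 * 5, which is what the 5 * 5 kernel of C5 (or `nn.Linear(5*5*16, 120)`) expects. With the raw 28 * 28 images S4 would be only 4 * 4 and C5 would not fit.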

(3) Loss function and optimizer

# Loss function
criterion = torch.nn.CrossEntropyLoss()

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.01)

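(4) Training function

# Each epoch iterates over the whole training set (60000 samples).
# runing_loss accumulates the mean loss of every batch, so dividing it by the
# number of batches gives the average loss per batch.
def train(epoch):
    runing_loss = 0.0
    # Number of iterations = number of samples / batch size, rounded up
    for i, data in enumerate(train_loader):
        x, y = data                         # x: input images, y: labels
        x, y = x.to(device), y.to(device)   # move the batch to the device
        optimizer.zero_grad()               # reset the gradients
        y_pre = model(x)                    # forward pass: predicted scores
        loss = criterion(y_pre, y)          # compute the loss
        loss.backward()                     # backpropagation
        optimizer.step()                    # update the parameters
        runing_loss += loss.item()          # accumulate the loss

    # Number of batches in one epoch (the last batch may be smaller)
    cycle = len(train_loader)

    # Print the average loss of this epoch
    print("Epoch %d, training loss: %.5f" % (epoch + 1, runing_loss/cycle))
    return runing_loss/cycle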
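
One detail when averaging the loss: the number of iterations per epoch is the number of batches, and 60000 samples do not divide evenly by a batch size of 64. The short check below uses only the numbers from this article.

```python
import math

num_samples, batch_size = 60000, 64

print(num_samples / batch_size)     # 937.5, not an integer

num_batches = math.ceil(num_samples / batch_size)
last_batch = num_samples - (num_batches - 1) * batch_size
print(num_batches, last_batch)      # 938 32
```

A `DataLoader` with `drop_last=False` therefore yields 938 batches, the last holding 32 samples, and `len(train_loader)` returns that count directly.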

(5) Test function

def test(epoch):
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            x, y = data
            x, y = x.to(device), y.to(device)
            pre_y = model(x)

            # Each row of the output holds the 10 class scores of one sample.
            # torch.max along dim=1 returns the largest score of each row and its index;
            # the index is the predicted class.
            _, pre_y = torch.max(pre_y, dim=1)

            total += y.size(0)                    # y.size(0) is the number of samples in the batch
            correct += (pre_y == y).sum().item()  # count the predictions that match the labels

    print("Epoch %d, test accuracy: %.2f %%" % (epoch + 1, correct / total * 100))
    return correct / total * 100
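
What the accuracy computation does can be reproduced without tensors: take the index of the largest score in each row and compare it with the label. The scores and labels below are invented for illustration.

```python
def argmax(row):
    """Index of the largest value in a list (first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical scores for 4 samples and 3 classes
logits = [[0.1, 2.0, -1.0],
          [1.5, 0.2, 0.3],
          [0.0, 0.1, 3.0],
          [0.9, 0.8, 0.7]]
labels = [1, 0, 2, 1]

preds = [argmax(row) for row in logits]
correct = sum(p == y for p, y in zip(preds, labels))
print(preds, 100 * correct / len(labels))  # [1, 0, 2, 0] 75.0
```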

(6) Plot the results

if __name__ == '__main__':
    plt_epoch = []
    loss_ll = []
    corr = []
    for epoch in range(20):
        plt_epoch.append(epoch+1)      # x-axis values for plotting
        loss_ll.append(train(epoch))   # record the training loss of each epoch
        corr.append(test(epoch))       # record the test accuracy of each epoch

    plt.figure(figsize=(12,6))
    plt.subplot(1,2,1)
    plt.title("Training")
    plt.plot(plt_epoch,loss_ll)
    plt.xlabel("Epoch")
    plt.ylabel("Loss")


    plt.subplot(1,2,2)
    plt.title("Testing")
    plt.plot(plt_epoch,corr)
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy (%)")
    plt.show()

2.2 Improved version:

(1) Download and preprocess the data

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import time
from matplotlib import pyplot as plt
import math

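"""
Normalization (standardization) is a routine preprocessing step for neural network inputs.
It subtracts the mean from each sample and divides by the standard deviation, so that the data end up with mean 0 and variance 1.
Neural networks train better on standardized data: with inputs near 0, sigmoid and tanh work in the region where their derivatives are largest.
Raw data, in contrast, are unevenly distributed and usually large in value (0~255 in this example), which pushes these activations
into saturation, where the derivative is close to 0. This is known as the vanishing gradient problem.
Standardizing the data therefore helps speed up training.
"""
pipline_train = transforms.Compose([
    # Randomly flip the image horizontally
    # (a mirrored digit is usually not a valid digit, so this augmentation may hurt on MNIST)
    transforms.RandomHorizontalFlip(),
    # Resize the image to 32x32
    transforms.Resize((32,32)),
    # Convert the image to a tensor
    transforms.ToTensor(),
    # Standardize the data with the MNIST mean and standard deviation
    transforms.Normalize((0.1307,),(0.3081,))
])

pipline_test = transforms.Compose([
    # Resize the image to 32x32
    transforms.Resize((32,32)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,),(0.3081,))
])

train_batchsize = 64    # training batch size
test_batchsize = 32     # test batch size

# Download the datasets
train_set = datasets.MNIST(root="../dataset/mnist/", train=True,
                           download=True, transform=pipline_train)
test_set = datasets.MNIST(root="../dataset/mnist/", train=False,
                          download=True, transform=pipline_test)

# The dataset size may not be a multiple of batch_size;
# drop_last=True would discard the final incomplete batch
# Create the data loaders
trainloader = torch.utils.data.DataLoader(train_set, batch_size=train_batchsize,
                                          shuffle=True, drop_last=False)
testloader = torch.utils.data.DataLoader(test_set, batch_size=test_batchsize,
                                         shuffle=False, drop_last=False)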

(2) Build the network

# Define the model
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)         # convolutional layer C1
        self.maxpool1 = nn.MaxPool2d(2, 2)      # max pooling layer S2
        self.conv2 = nn.Conv2d(6, 16, 5)        # convolutional layer C3
        self.maxpool2 = nn.MaxPool2d(2, 2)      # max pooling layer S4
        self.fc1 = nn.Linear(16*5*5, 120)       # linear layer C5
        self.fc2 = nn.Linear(120, 84)           # linear layer F6
        self.fc3 = nn.Linear(84, 10)            # output layer
        self.relu = nn.ReLU()                   # activation

    # Forward pass
    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.maxpool2(x)
        x = x.view(-1, 16*5*5)                  # flatten each sample's 16 x 5 x 5 feature maps into a 400-element vector
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        output = self.fc3(x)
        # No softmax here: CrossEntropyLoss expects raw scores and applies log-softmax itself
        # output = F.softmax(output, dim=1)
        return output

# Use the GPU if one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Instantiate the model
model = LeNet5()

# Move the model to the device
model.to(device)

(3) Define the optimizer and loss function

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Loss function
criterion = torch.nn.CrossEntropyLoss()
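
The network returns raw scores without a softmax because `CrossEntropyLoss` combines log-softmax and negative log-likelihood. The plain-Python sketch below shows that computation for one sample; the function names and the example scores are made up, and it mirrors the math only, not PyTorch's implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy(logits, target):
    """Negative log-probability that softmax assigns to the target class."""
    return -log_softmax(logits)[target]


logits = [2.0, 1.0, 0.1]                              # hypothetical raw scores for 3 classes
probs = [math.exp(v) for v in log_softmax(logits)]    # softmax probabilities
loss = cross_entropy(logits, target=0)
print([round(p, 3) for p in probs], round(loss, 3))
```

Applying softmax inside the model and then passing the result to `CrossEntropyLoss` would apply softmax twice, which is why the softmax line in `forward` stays commented out.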

(4) Lists used for plotting

# Number of training epochs
epoch = 20

# Used for plotting
plt_epoch = []      # x-axis: epoch index

Train_Loss = []     # training loss
Train_Accuracy = [] # training accuracy

Test_Loss = []      # test loss
Test_Accuracy = []  # test accuracy

(5) Training function

def train_runner(model, epoch):
    total = 0       # number of samples seen
    correct = 0.0   # number of correctly classified samples in this epoch
    avg_loss = 0.0  # sum of batch losses, averaged at the end of the epoch

    # Unpack each batch into inputs and labels
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)   # move the batch to the device
        optimizer.zero_grad()                                   # reset the gradients
        outputs = model(inputs)                                 # forward pass
        loss = criterion(outputs, labels)                       # mean loss of this batch
        avg_loss += loss.item()                                 # accumulate the batch loss
        loss.backward()                                         # backpropagation
        optimizer.step()                                        # update the parameters
        # dim=1 returns, for each row (sample), the column index of the largest score
        predict = outputs.argmax(dim=1)                         # predicted class
        total += labels.size(0)                                 # count the samples
        correct += (predict == labels).sum().item()             # count the correct predictions

    # Number of batches: total / train_batchsize (64), rounded up because the last batch may be smaller
    batch_num = math.ceil(total/train_batchsize)

    # After each training epoch, print the average loss and accuracy
    avg_loss /= batch_num
    print("Train Epoch{} \t Avg_Loss: {:.6f}, accuracy: {:.6f}%".format(epoch, avg_loss, 100*(correct/total)))

    # Store the values for plotting
    Train_Loss.append(avg_loss)
    Train_Accuracy.append(correct/total)

(6) Test function

def test_runner(model):
    # Counters for the accuracy statistics
    correct = 0.0
    test_loss = 0.0
    total = 0

    # Inside torch.no_grad no gradients are computed and no backpropagation takes place
    with torch.no_grad():
        for data, label in testloader:
            data, label = data.to(device), label.to(device)
            output = model(data)
            test_loss += criterion(output, label).item()
            predict = output.argmax(dim=1)
            # Count the correct predictions
            total += label.size(0)
            correct += (predict == label).sum().item()

        # Number of batches: total / test_batchsize (32), rounded up because the last batch may be smaller
        batch_num = math.ceil(total/test_batchsize)

        # Average the loss over the batches
        test_loss /= batch_num

        # Print the loss and accuracy
        print("test_avarage_loss: {:.6f}, accuracy: {:.6f}%".format(test_loss, 100*(correct/total)))

        # Store the values for plotting
        Test_Loss.append(test_loss)
        Test_Accuracy.append(correct/total)

(7) Run and plot

if __name__ == '__main__':

    print("start_time",time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())))
    for ep in range(1, epoch+1):    # ep avoids overwriting the epoch count defined earlier
        plt_epoch.append(ep)
        train_runner(model, ep)
        test_runner(model)
    print("end_time: ",time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())),'\n')
 
    print('Finished Training')
    plt.subplot(2,2,1), plt.plot(plt_epoch, Train_Loss), plt.title('Train_Loss'), plt.grid()
    plt.subplot(2,2,2), plt.plot(plt_epoch, Train_Accuracy), plt.title('Train_Accuracy'), plt.grid()
    plt.subplot(2,2,3), plt.plot(plt_epoch, Test_Loss), plt.title('Test_Loss'), plt.grid()
    plt.subplot(2,2,4), plt.plot(plt_epoch, Test_Accuracy), plt.title('Test_Accuracy'), plt.grid()
    plt.tight_layout()
    plt.show()

(8) Save the model

print(model)
pathfile = 'C:\\Users\\LiZhangXun\\Desktop\\经典论文\\Code\\1.LeNet\\models'
save_filename = 'model-mnist.pth'
model_path = os.path.join(pathfile, save_filename)
torch.save(model, model_path)  # save the whole model (architecture and weights)

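(9) Load the model and run inference

import cv2

if __name__ == '__main__':
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # Load the whole pickled model. Note: from PyTorch 2.6 on, torch.load defaults to
    # weights_only=True, so loading a whole model needs weights_only=False.
    model = torch.load(model_path, map_location=device)
    model = model.to(device)
    model.eval()    # switch the model to evaluation mode

    # Read the image to classify (OpenCV loads it as a 3-channel BGR array)
    img = cv2.imread("./pic/test_mnist.jpg")

    # Resize the image to 32x32
    img = cv2.resize(img, dsize=(32,32), interpolation=cv2.INTER_NEAREST)
    # Convert to grayscale, because the MNIST images are grayscale
    # (MNIST digits are white on a black background)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    plt.imshow(img, cmap="gray")  # show the image
    plt.axis('off')               # hide the axes
    plt.show()

    # Same preprocessing as the test pipeline
    trans = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])
    img = trans(img)        # shape [1, 32, 32]
    img = img.to(device)    # move to the device

    # Add a batch dimension: the model expects 4-D input [batch_size, channels, height, width],
    # while a single image is 3-D [channels, height, width]. The result is [1, 1, 32, 32].
    img = img.unsqueeze(0)

    # Predict
    with torch.no_grad():
        output = model(img)
    prob = F.softmax(output, dim=1)  # probabilities of the 10 classes
    print("Probabilities:", prob)

    predict = output.argmax(dim=1)
    print("Predicted class:", predict.item())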

2.3 Implementation on CIFAR-10:

(1) Download and preprocess the data

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import time
from matplotlib import pyplot as plt
import math

pipline_train = transforms.Compose([
    # Randomly flip the image horizontally
    transforms.RandomHorizontalFlip(),
    # Resize the image to 32x32
    transforms.Resize((32,32)),
    # Convert the image to a tensor
    transforms.ToTensor(),
    # Normalize each RGB channel from [0, 1] to [-1, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

pipline_test = transforms.Compose([
    # Resize the image to 32x32
    transforms.Resize((32,32)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

train_batchsize = 64    # training batch size
test_batchsize = 32     # test batch size

# Download the datasets
train_set = datasets.CIFAR10(root="../dataset/CIFAR10/", train=True,
                             download=True, transform=pipline_train)
test_set = datasets.CIFAR10(root="../dataset/CIFAR10/", train=False,
                            download=True, transform=pipline_test)

# Create the data loaders
trainloader = torch.utils.data.DataLoader(train_set, batch_size=train_batchsize,
                                          shuffle=True, drop_last=False)
testloader = torch.utils.data.DataLoader(test_set, batch_size=test_batchsize,
                                         shuffle=False, drop_last=False)

# The class names are listed by hand, in label order
classes = ('plane', 'car', 'bird', 'cat','deer', 'dog', 'frog', 'horse', 'ship', 'truck')
print("Training set size:", len(train_set))
print("Test set size:", len(test_set))

(2) Build the network
CIFAR10 images are three-channel RGB, so the filters of the C1 convolution in LeNet-5 need 3 input channels. The rest of the network is the same as above.


class LeNetRGB(nn.Module):
    def __init__(self):
        super(LeNetRGB, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)   # 3 input channels (RGB)
        self.relu = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.maxpool2 = nn.MaxPool2d(2, 2)
 
 
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
 
 
    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.maxpool2(x)
        x = x.view(-1, 16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        # output = F.log_softmax(x, dim=1)
        return x

# Use the GPU if one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Instantiate the model
model = LeNetRGB()

# Move the model to the device
model.to(device)

(3) Define the optimizer and loss function

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Loss function
criterion = torch.nn.CrossEntropyLoss()

(4) Lists used for plotting

# Number of training epochs
epoch = 20

# Used for plotting
plt_epoch = []      # x-axis: epoch index

Train_Loss = []     # training loss
Train_Accuracy = [] # training accuracy

Test_Loss = []      # test loss
Test_Accuracy = []  # test accuracy

(5) Training function

def train_runner(model, epoch):
    total = 0       # number of samples seen
    correct = 0.0   # number of correctly classified samples in this epoch
    avg_loss = 0.0  # sum of batch losses, averaged at the end of the epoch

    # Unpack each batch into inputs and labels
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)   # move the batch to the device
        optimizer.zero_grad()                                   # reset the gradients
        outputs = model(inputs)                                 # forward pass
        loss = criterion(outputs, labels)                       # mean loss of this batch
        avg_loss += loss.item()                                 # accumulate the batch loss
        loss.backward()                                         # backpropagation
        optimizer.step()                                        # update the parameters
        # dim=1 returns, for each row (sample), the column index of the largest score
        predict = outputs.argmax(dim=1)                         # predicted class
        total += labels.size(0)                                 # count the samples
        correct += (predict == labels).sum().item()             # count the correct predictions

    # Number of batches: total / train_batchsize (64), rounded up because the last batch may be smaller
    batch_num = math.ceil(total/train_batchsize)

    # After each training epoch, print the average loss and accuracy
    avg_loss /= batch_num
    print("Train Epoch{} \t Avg_Loss: {:.6f}, accuracy: {:.6f}%".format(epoch, avg_loss, 100*(correct/total)))

    # Store the values for plotting
    Train_Loss.append(avg_loss)
    Train_Accuracy.append(correct/total)

(6) Test function

def test_runner(model):
    # Counters for the accuracy statistics
    correct = 0.0
    test_loss = 0.0
    total = 0

    # Inside torch.no_grad no gradients are computed and no backpropagation takes place
    with torch.no_grad():
        for data, label in testloader:
            data, label = data.to(device), label.to(device)
            output = model(data)
            test_loss += criterion(output, label).item()
            predict = output.argmax(dim=1)
            # Count the correct predictions
            total += label.size(0)
            correct += (predict == label).sum().item()

        # Number of batches: total / test_batchsize (32), rounded up because the last batch may be smaller
        batch_num = math.ceil(total/test_batchsize)

        # Average the loss over the batches
        test_loss /= batch_num

        # Print the loss and accuracy
        print("test_avarage_loss: {:.6f}, accuracy: {:.6f}%".format(test_loss, 100*(correct/total)))

        # Store the values for plotting
        Test_Loss.append(test_loss)
        Test_Accuracy.append(correct/total)

(7) Run and plot

if __name__ == '__main__':

    print("start_time",time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())))
    for ep in range(1, epoch+1):    # ep avoids overwriting the epoch count defined earlier
        plt_epoch.append(ep)
        train_runner(model, ep)
        test_runner(model)
    print("end_time: ",time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())),'\n')
 
    print('Finished Training')
    plt.subplot(2,2,1), plt.plot(plt_epoch, Train_Loss), plt.title('Train_Loss'), plt.grid()
    plt.subplot(2,2,2), plt.plot(plt_epoch, Train_Accuracy), plt.title('Train_Accuracy'), plt.grid()
    plt.subplot(2,2,3), plt.plot(plt_epoch, Test_Loss), plt.title('Test_Loss'), plt.grid()
    plt.subplot(2,2,4), plt.plot(plt_epoch, Test_Accuracy), plt.title('Test_Accuracy'), plt.grid()
    plt.tight_layout()
    plt.show()

(8) Save the model

print(model)
pathfile = 'C:\\Users\\LiZhangXun\\Desktop\\经典论文\\Code\\1.LeNet\\models'
save_filename = 'model-cifar10.pth'
model_path = os.path.join(pathfile, save_filename)
torch.save(model, model_path)  # save the whole model (architecture and weights)

(9) Load the model and run inference

from PIL import Image

if __name__ == '__main__':
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # Load the whole pickled model. Note: from PyTorch 2.6 on, torch.load defaults to
    # weights_only=True, so loading a whole model needs weights_only=False.
    model = torch.load(model_path, map_location=device)
    model = model.to(device)
    model.eval()    # switch the model to evaluation mode

    # Read the image to classify
    img = Image.open("./pic/test_cifar10.jpg").convert('RGB')
    #img.show()
    plt.imshow(img)  # show the image
    plt.axis('off')  # hide the axes
    plt.show()

    # Same preprocessing as the test pipeline
    trans = transforms.Compose(
        [
            # Resize the image to 32x32
            transforms.Resize((32,32)),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ])
    img = trans(img)
    img = img.to(device)
    # Add a batch dimension: the model expects 4-D input [batch_size, channels, height, width],
    # while a single image is 3-D [channels, height, width]. The result is [1, 3, 32, 32].
    img = img.unsqueeze(0)

    # Predict
    classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    with torch.no_grad():
        output = model(img)
    prob = F.softmax(output, dim=1)  # probabilities of the 10 classes
    print("Probabilities:", prob)

    predict = output.argmax(dim=1)
    pred_class = classes[predict.item()]
    print("Predicted class:", pred_class)

2.4 Implementation on a custom dataset:

(1) Build the image index files

Build one index file for the training set and one for the test set, train.txt and test.txt. Each line of a txt file holds the path of one image and its class, separated by a space.
Running the script below generates the two index files train.txt and test.txt under ./data/LEDNUM/.

import os

# abspath("__file__") resolves the literal string "__file__" against the current working
# directory, so dir1 is the working directory (this also works in notebooks)
dir1 = os.path.dirname(os.path.abspath("__file__"))
dir2 = os.path.dirname(os.path.abspath(dir1))  # parent of the working directory

# Path of the training index file
train_txt_path = os.path.join(dir1, "data", "LEDNUM", "train.txt")
# Path of the test index file
valid_txt_path = os.path.join(dir1, "data", "LEDNUM", "test.txt")

# Directory of the training images
train_dir = os.path.join(dir2, "dataset", "LEDNUM", "train_data")
# Directory of the test images
valid_dir = os.path.join(dir2, "dataset", "LEDNUM", "test_data")


def gen_txt(txt_path, img_dir):
    with open(txt_path, 'w') as f:
        for root, s_dirs, _ in os.walk(img_dir, topdown=True):  # walk the class sub-folders
            for sub_dir in s_dirs:
                i_dir = os.path.join(root, sub_dir)             # path of one class folder
                img_list = os.listdir(i_dir)                    # file names in the class folder
                for img_name in img_list:
                    if not img_name.endswith('jpg'):            # skip files that are not jpg
                        continue
                    label = img_name.split('_')[0]              # the label is the file-name prefix before '_'
                    img_path = os.path.join(i_dir, img_name)
                    f.write(img_path + ' ' + label + '\n')


if __name__ == '__main__':
    gen_txt(train_txt_path, train_dir)
    gen_txt(valid_txt_path, valid_dir)
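
The index format is simple enough to check with a round trip: write a few lines in the "path label" form and parse them back. The sketch below uses a temporary directory and invented file names such as `3_0001.jpg`; only the naming rule (label before the first underscore) comes from the script above.

```python
import os
import tempfile


def parse_index(txt_path):
    """Read an index file whose lines look like 'image_path label'."""
    samples = []
    with open(txt_path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            path, label = line.rsplit(" ", 1)   # split on the last space only
            samples.append((path, int(label)))
    return samples


with tempfile.TemporaryDirectory() as tmp:
    txt = os.path.join(tmp, "train.txt")
    names = ["3_0001.jpg", "7_0002.jpg"]        # hypothetical file names
    with open(txt, "w") as f:
        for name in names:
            label = name.split("_")[0]          # same rule as gen_txt
            f.write(os.path.join("train_data", label, name) + " " + label + "\n")
    samples = parse_index(txt)

print(samples)
```

Splitting on the last space keeps image paths that contain spaces intact, which a plain `split()` would break.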

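(2) Build a Dataset subclass

To load your own dataset in PyTorch, write a class that inherits from Dataset in torch.utils.data and override its __init__, __getitem__, and __len__ methods.
The samples here are images. __init__ builds a list of samples in which each element gives the location of an image and its label.
__getitem__ then loads the pixel data and the label of one element and returns img and label.

__getitem__ is the core method:

self.imgs is a list, and self.imgs[index] is a (path, label) tuple read from the txt file generated above;
Image.open reads the image; pay attention to whether img is single-channel or three-channel;
self.transform(img) processes the image; the transform can subtract the mean, divide by the standard deviation, and apply random cropping, rotation, flipping, affine transformations, and so on.

from PIL import Image
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self, txt_path, transform=None, target_transform=None):
        imgs = []
        with open(txt_path, 'r') as fh:
            for line in fh:
                words = line.rstrip().split()
                if not words:                   # skip empty lines
                    continue
                imgs.append((words[0], int(words[1])))
        # Set the attributes once, after the whole file has been read
        self.imgs = imgs
        self.transform = transform
        self.target_transform = target_transform

    def __getitem__(self, index):
        fn, label = self.imgs[index]
        # img = Image.open(fn).convert('RGB')  # for three-channel input
        img = Image.open(fn)                   # keeps the file's own mode; the LeNet used here expects one channel
        if self.transform is not None:
            img = self.transform(img)
        if self.target_transform is not None:
            label = self.target_transform(label)
        return img, label

    def __len__(self):
        return len(self.imgs)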

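(3) Load and preprocess the data
Once MyDataset is built, the rest is handled by DataLoader.
DataLoader calls the __getitem__ method of MyDataset to read the data and label of one image at a time, then collates them into a batch, which is the actual input to the model.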

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import time
from matplotlib import pyplot as plt
import math

pipline_train = transforms.Compose([
    # Randomly flip the image horizontally
    # (a mirrored digit is usually not a valid digit, so this augmentation may hurt on digit data)
    transforms.RandomHorizontalFlip(),
    # Resize the image to 32x32
    transforms.Resize((32,32)),
    # Convert the image to a tensor
    transforms.ToTensor(),
    # Standardize the data (the mean and std are the MNIST values, reused here)
    transforms.Normalize((0.1307,),(0.3081,))
])

pipline_test = transforms.Compose([
    # Resize the image to 32x32
    transforms.Resize((32,32)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,),(0.3081,))
])

train_batchsize = 16    # training batch size
test_batchsize = 16     # test batch size

train_data = MyDataset('./data/LEDNUM/train.txt', transform=pipline_train)
test_data = MyDataset('./data/LEDNUM/test.txt', transform=pipline_test)

# Create the data loaders
trainloader = torch.utils.data.DataLoader(train_data, batch_size=train_batchsize,
                                          shuffle=True, drop_last=False)
testloader = torch.utils.data.DataLoader(test_data, batch_size=test_batchsize,
                                         shuffle=False, drop_last=False)

(4) Build the network

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)   
        self.relu = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.maxpool2 = nn.MaxPool2d(2, 2)
 
 
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
 
 
    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.maxpool2(x)
        x = x.view(-1, 16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        # output = F.log_softmax(x, dim=1)
        return x

# Use the GPU if one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Instantiate the model
model = LeNet()

# Move the model to the device
model.to(device)

(5) Define the optimizer and loss function

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Loss function
criterion = torch.nn.CrossEntropyLoss()

(6) Lists used for plotting

# Number of training epochs
epoch = 20

# Used for plotting
plt_epoch = []      # x-axis: epoch index

Train_Loss = []     # training loss
Train_Accuracy = [] # training accuracy

Test_Loss = []      # test loss
Test_Accuracy = []  # test accuracy

(7) Training function

def train_runner(model, epoch):
    total = 0       # number of samples seen
    correct = 0.0   # number of correctly classified samples in this epoch
    avg_loss = 0.0  # sum of batch losses, averaged at the end of the epoch

    # Unpack each batch into inputs and labels
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)   # move the batch to the device
        optimizer.zero_grad()                                   # reset the gradients
        outputs = model(inputs)                                 # forward pass
        loss = criterion(outputs, labels)                       # mean loss of this batch
        avg_loss += loss.item()                                 # accumulate the batch loss
        loss.backward()                                         # backpropagation
        optimizer.step()                                        # update the parameters
        # dim=1 returns, for each row (sample), the column index of the largest score
        predict = outputs.argmax(dim=1)                         # predicted class
        total += labels.size(0)                                 # count the samples
        correct += (predict == labels).sum().item()             # count the correct predictions

    # Number of batches: total / train_batchsize (16), rounded up because the last batch may be smaller
    batch_num = math.ceil(total/train_batchsize)

    # After each training epoch, print the average loss and accuracy
    avg_loss /= batch_num
    print("Train Epoch{} \t Avg_Loss: {:.6f}, accuracy: {:.6f}%".format(epoch, avg_loss, 100*(correct/total)))

    # Store the values for plotting
    Train_Loss.append(avg_loss)
    Train_Accuracy.append(correct/total)

(8) Test function

def test_runner(model):
    # Counters for the accuracy statistics
    correct = 0.0
    test_loss = 0.0
    total = 0

    # Inside torch.no_grad no gradients are computed and no backpropagation takes place
    with torch.no_grad():
        for data, label in testloader:
            data, label = data.to(device), label.to(device)
            output = model(data)
            test_loss += criterion(output, label).item()
            predict = output.argmax(dim=1)
            # Count the correct predictions
            total += label.size(0)
            correct += (predict == label).sum().item()

        # Number of batches: total / test_batchsize (16), rounded up because the last batch may be smaller
        batch_num = math.ceil(total/test_batchsize)

        # Average the loss over the batches
        test_loss /= batch_num

        # Print the loss and accuracy
        print("test_avarage_loss: {:.6f}, accuracy: {:.6f}%".format(test_loss, 100*(correct/total)))

        # Store the values for plotting
        Test_Loss.append(test_loss)
        Test_Accuracy.append(correct/total)

(9) Run and plot

if __name__ == '__main__':

    print("start_time",time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())))
    for ep in range(1, epoch+1):    # ep avoids overwriting the epoch count defined earlier
        plt_epoch.append(ep)
        train_runner(model, ep)
        test_runner(model)
    print("end_time: ",time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())),'\n')
 
    print('Finished Training')
    plt.subplot(2,2,1), plt.plot(plt_epoch, Train_Loss), plt.title('Train_Loss'), plt.grid()
    plt.subplot(2,2,2), plt.plot(plt_epoch, Train_Accuracy), plt.title('Train_Accuracy'), plt.grid()
    plt.subplot(2,2,3), plt.plot(plt_epoch, Test_Loss), plt.title('Test_Loss'), plt.grid()
    plt.subplot(2,2,4), plt.plot(plt_epoch, Test_Accuracy), plt.title('Test_Accuracy'), plt.grid()
    plt.tight_layout()
    plt.show()

(10) Save the model

import cv2  # used by the inference step that follows

print(model)
pathfile = 'C:\\Users\\LiZhangXun\\Desktop\\经典论文\\Code\\1.LeNet\\models'
save_filename = 'model-owndata.pth'
model_path = os.path.join(pathfile, save_filename)
torch.save(model, model_path)  # save the whole model (architecture and weights)

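(11) Load the model and run inference

if __name__ == '__main__':
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # Load the whole pickled model. Note: from PyTorch 2.6 on, torch.load defaults to
    # weights_only=True, so loading a whole model needs weights_only=False.
    model = torch.load(model_path, map_location=device)
    model = model.to(device)
    model.eval()    # switch the model to evaluation mode

    # Read the image to classify (OpenCV loads it as a 3-channel BGR array)
    img = cv2.imread("./pic/test_led.png")

    # Resize the image to 32x32
    img = cv2.resize(img, dsize=(32,32), interpolation=cv2.INTER_NEAREST)
    # Convert to grayscale, because the dataset images are grayscale
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    plt.imshow(img, cmap="gray")  # show the image
    plt.axis('off')               # hide the axes
    plt.show()

    # Same preprocessing as the test pipeline
    trans = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])
    img = trans(img)
    img = img.to(device)
    # Add a batch dimension: the model expects 4-D input [batch_size, channels, height, width],
    # while a single image is 3-D [channels, height, width]. The result is [1, 1, 32, 32].
    img = img.unsqueeze(0)

    # Predict
    with torch.no_grad():
        output = model(img)
    prob = F.softmax(output, dim=1)  # probabilities of the 10 classes
    print("Probabilities:", prob)
    predict = output.argmax(dim=1)
    print("Predicted class:", predict.item())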