Deep Learning Model Implementations and Notes: LeNet (PyTorch implementation converted to PaddlePaddle)

I. Introduction

 LeNet is one of the earliest convolutional neural networks, dating back to 1994. "LeNet" today usually refers to LeNet-5, the result of several iterations of the original design, which Yann LeCun proposed in the 1998 paper "Gradient-Based Learning Applied to Document Recognition" as an efficient convolutional network for handwritten character recognition. As a pioneering convolutional neural network, it greatly advanced the field of deep learning.


II. Network Architecture

Figure: the LeNet architecture diagram from the original paper.
 LeNet-5 was proposed early on, when compute was limited. To keep the parameter count and compute cost down, several details differ from modern practice. The network has 7 layers, consisting mainly of convolutional layers, subsampling layers, and fully connected layers.

1. Formulas for output sizes and parameter counts

  • Output size:

$$(1)\ \ \text{Conv layer output size} = \frac{\text{input size} - \text{kernel size} + 2 \times \text{padding}}{\text{stride}} + 1$$
$$(2)\ \ \text{Pool layer output size} = \frac{\text{input size} - \text{pool size}}{\text{stride}} + 1$$

  • Parameter count:

$$(1)\ \ \text{Conv layer parameters} = \text{kernel size}^2 \times \text{kernel num} \times \text{input channels} + \text{bias num}\ (= \text{kernel num})$$
$$(2)\ \ \text{FC layer parameters} = \text{input size}^2 \times \text{input channels} \times F + \text{bias num}\ (= F), \quad F:\ \text{number of neurons in the FC layer}$$
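As a quick worked example (not in the original post), applying these formulas to the first convolution and subsampling layers of LeNet:

# C1 output size: 32*32 input, 5*5 kernel, padding 0, stride 1
conv_out = (32 - 5 + 2 * 0) // 1 + 1    # 28
# S2 output size: 28*28 input, 2*2 pooling window, stride 2
pool_out = (28 - 2) // 2 + 1            # 14
# C1 parameters: 6 kernels of size 5*5 over 1 input channel, plus 6 biases
conv_params = 5 ** 2 * 6 * 1 + 6        # 156
print(conv_out, pool_out, conv_params)  # 28 14 156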

2. Layer-by-layer breakdown

  • Input
     32*32 handwritten character image

  • C1 (convolutional layer)
     6 kernels of size 5*5 → feature maps: 6*28*28 (28 = 32 - 5 + 1)
     Trainable parameters: 5*5*6*1+6 = 156

  • S2 (subsampling layer)
     Each 2*2 neighborhood is summed, multiplied by a coefficient, a bias is added, then Tanh is applied → feature maps: 6*14*14 (14 = (28 - 2)/2 + 1)
     Most LeNet implementations, and modern usage in general, replace the original subsampling layer with average/max pooling. The original layer is still instructive, so it is implemented faithfully here.
     A true pooling layer has no trainable parameters; this layer does (a custom per-channel weight and bias): (1+1)*6 = 12

  • C3 (a special convolutional layer)
     16 kernels of size 5*5 → feature maps: 16*10*10 (10 = 14 - 5 + 1)
     This layer differs from the convolution we use today. To cut the parameter count and compute cost, each output map is connected to a complementary subset of the input feature maps (16 combinations) rather than to all of them as in a modern full multi-channel convolution; the modern convolution subsumes the original connection scheme.
     Trainable parameters in the original paper: 6*(3*5*5+1)+6*(4*5*5+1)+3*(4*5*5+1)+1*(6*5*5+1) = 456+606+303+151 = 1516
     Trainable parameters with a modern full convolution: 5*5*16*6+16 = 2416

  • S4 (subsampling layer)
     feature maps: 16*5*5
     Trainable parameters: 2*16 = 32

  • C5 (convolutional layer fully connected to the previous layer)
     120 kernels of size 5*5 → feature maps: 120*1*1
     Trainable parameters: 5*5*120*16+120 = 48120

  • F6 (fully connected layer)
     Neurons in the previous layer: 120 → neurons in this layer: 84
     Trainable parameters: 120*84+84 = 10164

  • OUTPUT (fully connected layer with RBF units)
     Gaussian connections: 84 → 10 (the 10 output character classes)
     The original paper uses Radial Basis Function (RBF) connections whose parameters are a hand-crafted encoding specific to the paper's dataset; they are not covered further here.
     Trainable parameters: 84*10+10 = 850

Total trainable parameters = 61750 = 0.06175M (counting C3 as a modern full convolution). Parameters are usually stored as float32, i.e. 32 bit = 4 Byte.
Model size = 0.06175M * 32 bit = 1.976 Mb = 0.247 MB
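For reference, the arithmetic above can be reproduced with a short script (illustrative only; the layer names follow the breakdown above):

# Per-layer trainable parameter counts, with C3 as a full convolution and
# S2/S4 as the weighted-average subsampling layer implemented below.
layer_params = {
    'C1': 5 * 5 * 1 * 6 + 6,        # 156
    'S2': (1 + 1) * 6,              # 12
    'C3': 5 * 5 * 6 * 16 + 16,      # 2416
    'S4': (1 + 1) * 16,             # 32
    'C5': 5 * 5 * 16 * 120 + 120,   # 48120
    'F6': 120 * 84 + 84,            # 10164
    'OUTPUT': 84 * 10 + 10,         # 850
}
total = sum(layer_params.values())      # 61750
print(total, total * 4 / 1e6, 'MB')     # float32 = 4 bytes -> ~0.247 MB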


III. Code Implementation (PyTorch)

1. Imports and basic hyperparameters

import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchsummary import summary
from torch import nn
import matplotlib.pyplot as plt

# Hyperparameter configuration
epochs = 10
batch_size = 32
learn_rate = 1e-3

2. Subsampling layer

class DownSample(nn.Module):
    def __init__(self, in_channels, kernel_size=2, stride=2):
        super(DownSample, self).__init__()
        self.in_channels = in_channels
        self.sum_4 = nn.AvgPool2d(kernel_size=kernel_size, stride=stride)   # average pooling stands in for the 2*2 sum; since the per-channel weight below is learnable, this is equivalent
        self.weight = nn.Parameter(torch.randn(in_channels), requires_grad=True)    # one learnable weight per channel
        self.bias = nn.Parameter(torch.randn(in_channels), requires_grad=True)  # one learnable bias per channel

    def forward(self, feature):  # Eg.feature.shape(-1,6,28,28)
        sample_outputs = []
        feature = self.sum_4(feature)  # Eg.feature.shape(-1,6,14,14)

        for i in range(self.in_channels):
            sample_output = feature[:, i] * self.weight[i] + self.bias[i]  # Eg.sample_output.shape(-1,14,14)
            sample_output = sample_output.unsqueeze(1)  # Eg.sample_output.shape(-1,1,14,14)
            sample_outputs.append(sample_output)
        return torch.cat(sample_outputs, 1)  # Eg.sample_output.shape(-1,6,14,14)
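As a quick sanity check (not part of the original post), the layer should halve the spatial resolution while keeping the channel count:

# Sanity check: DownSample halves height/width and keeps the channel dimension.
ds = DownSample(in_channels=6)
x = torch.randn(4, 6, 28, 28)
print(ds(x).shape)  # expected: torch.Size([4, 6, 14, 14])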

3. LeNet network

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.con_sam = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),  # 1*32*32->6*28*28
            nn.Tanh(),
            DownSample(in_channels=6, kernel_size=2, stride=2),  # 6*28*28->6*14*14
            nn.Tanh(),
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5),  # 6*14*14->16*10*10
            nn.Tanh(),
            DownSample(in_channels=16, kernel_size=2, stride=2),  # 16*10*10->16*5*5
            nn.Tanh(),
            nn.Conv2d(in_channels=16, out_channels=120, kernel_size=5),  # 16*5*5->120*1*1
            nn.Tanh(),
            nn.Flatten(),  # flatten 120*1*1->120
        )
        self.fc = nn.Sequential(
            nn.Linear(in_features=120, out_features=84),
            nn.Tanh(),
            nn.Linear(in_features=84, out_features=10)
        )

    def forward(self, input):   # input.shape(-1, 1, 32, 32)
        output = self.con_sam(input)    # output.shape(-1, 120)
        # Flattening could also be done with any of the following (keeping the batch dimension)
        # output = torch.squeeze(output)
        # output = output.reshape(-1, 120)
        # output = output.view(-1, 120)
        # output = output.flatten(1)
        # output = torch.flatten(output, 1)
        output = self.fc(output)
        return output

4. Inspecting the network structure and parameters

net = LeNet().cuda()
summary(net, (1, 32, 32))   # print the layer-by-layer structure and parameter counts
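As an additional check (illustrative, not in the original post), the total parameter count can also be read directly from the model; with C3 implemented as a full convolution and the custom DownSample layers, it matches the 61750 computed in Section II:

total_params = sum(p.numel() for p in net.parameters() if p.requires_grad)  # sum all trainable parameters
print(total_params)  # 61750 = 156 + 12 + 2416 + 32 + 48120 + 10164 + 850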


5. Training and testing


transform = transforms.Compose([torchvision.transforms.Resize(32), transforms.ToTensor()])
train_data = torchvision.datasets.MNIST('./mnist', train=True, transform=transform, download=True)
test_data = torchvision.datasets.MNIST('./mnist', train=False, transform=transform, download=True)   # original images are 28*28; Resize(32) above brings them to 32*32

print('train_data:{}, test_data:{}'.format(len(train_data), len(test_data)))

# Inspect one sample
# data1 = train_data[0][0].numpy().squeeze()  # drop the extra channel dimension
# plt.imshow(data1)
# plt.show()

train_loader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_data, batch_size=batch_size)

optimizer = torch.optim.Adam(net.parameters(), lr=learn_rate)
loss_func = torch.nn.CrossEntropyLoss()

for epoch in range(epochs):
    print('epoch {}'.format(epoch+1))

    # train
    net.train()  # switch to training mode
    train_loss, train_correct = 0, 0
    for _, data in enumerate(train_loader, 0):
        batch_data, batch_label = data
        batch_data, batch_label = batch_data.cuda(), batch_label.cuda()  # move data to GPU, batch_data.shape: (batch_size, 1, 32, 32)
        batch_pred = net(batch_data)    # batch_pred.shape: (batch_size, 10); the 10 values per sample are class scores, the largest is taken as the prediction
        predict_correct = torch.max(batch_pred, 1)[1]  # torch.max(..., 1) returns (values, indices) over dim 1; [1] keeps the indices, i.e. the predicted classes
        predict_correct = (predict_correct == batch_label).sum()    # compare with the ground truth; the sum is the number of correct predictions in this batch
        train_correct += predict_correct.item()   # accumulate correct predictions over the epoch to compute accuracy; item() extracts the Python number

        loss = loss_func(batch_pred, batch_label)
        optimizer.zero_grad()   # reset gradients
        loss.backward()  # backpropagation
        optimizer.step()    # update the network parameters from the gradients
        train_loss += loss.item()   # accumulate the per-batch loss
    print('Train Loss: {:.6f}, Acc: {:.6f}'.format(train_loss / (len(train_data)), train_correct / (len(train_data))))

    # test
    net.eval()  # switch to evaluation mode
    with torch.no_grad():   # disable gradient tracking for speed and lower memory use
        test_loss, test_correct = 0, 0
        for _, data in enumerate(test_loader, 0):
            batch_data, batch_label = data
            batch_data, batch_label = batch_data.cuda(), batch_label.cuda()
            batch_pred = net(batch_data)
            predict_correct = torch.max(batch_pred, 1)[1]
            predict_correct = (predict_correct == batch_label).sum()
            test_correct += predict_correct.item()

            loss = loss_func(batch_pred, batch_label)
            test_loss += loss.item()
    print('Test Loss: {:.6f}, Acc: {:.6f}'.format(test_loss / (len(test_data)), test_correct / (len(test_data))))

print('End of the training')
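After training, a single prediction can be checked as follows (a minimal sketch, not part of the original post; it reuses the net and test_data defined above):

# Predict one test image with the trained network.
net.eval()
with torch.no_grad():
    img, label = test_data[0]                    # img.shape: (1, 32, 32)
    pred = net(img.unsqueeze(0).cuda())          # add a batch dimension -> (1, 1, 32, 32)
    print('predicted: {}, ground truth: {}'.format(pred.argmax(1).item(), label))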

IV. PaddlePaddle Implementation

The PyTorch code above, converted to a PaddlePaddle implementation:

import paddle
import numpy
import paddle.nn as nn
from paddle.vision.datasets import MNIST
from paddle.vision.transforms import Compose, Resize, ToTensor
from paddle.io import DataLoader

epochs = 10
batch_size = 64
learning_rate = 1e-3

class DownSample(nn.Layer):
    def __init__(self, in_channels, kernel_size=2, stride=2):
        super(DownSample, self).__init__()
        self.in_channels = in_channels
        self.sum_4 = nn.AvgPool2D(kernel_size=kernel_size, stride=stride)   # average pooling stands in for the 2*2 sum; since the per-channel weight below is learnable, this is equivalent
        self.weight = paddle.static.create_parameter(shape=[in_channels], dtype='float32')    # one learnable weight per channel
        self.bias = paddle.static.create_parameter(shape=[in_channels], dtype='float32', is_bias=True)  # one learnable bias per channel

    def forward(self, feature):  # Eg.feature.shape(-1,6,28,28)
        sample_outputs = []
        feature = self.sum_4(feature)  # Eg.feature.shape(-1,6,14,14)

        for i in range(self.in_channels):
            sample_output = feature[:, i] * self.weight[i] + self.bias[i]  # Eg.sample_output.shape(-1,14,14)
            sample_output = sample_output.unsqueeze(1)  # Eg.sample_output.shape(-1,1,14,14)
            sample_outputs.append(sample_output)
        return paddle.concat(sample_outputs, 1)  # Eg.sample_output.shape(-1,6,14,14)
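As with the PyTorch version, a quick shape check (illustrative, not in the original post):

# Sanity check: DownSample halves height/width and keeps the channel dimension.
ds = DownSample(in_channels=6)
x = paddle.randn([4, 6, 28, 28])
print(ds(x).shape)  # expected: [4, 6, 14, 14]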

class LeNet(nn.Layer):
    def __init__(self):
        super(LeNet, self).__init__()
        self.con_sam = nn.Sequential(
            nn.Conv2D(in_channels=1, out_channels=6, kernel_size=5),  # 1*32*32->6*28*28
            nn.Tanh(),
            DownSample(in_channels=6, kernel_size=2, stride=2),  # 6*28*28->6*14*14
            nn.Tanh(),
            nn.Conv2D(in_channels=6, out_channels=16, kernel_size=5),  # 6*14*14->16*10*10
            nn.Tanh(),
            DownSample(in_channels=16, kernel_size=2, stride=2),  # 16*10*10->16*5*5
            nn.Tanh(),
            nn.Conv2D(in_channels=16, out_channels=120, kernel_size=5),  # 16*5*5->120*1*1
            nn.Tanh(),
            nn.Flatten(),  # flatten 120*1*1->120
        )
        self.fc = nn.Sequential(
            nn.Linear(in_features=120, out_features=84),
            nn.Tanh(),
            nn.Linear(in_features=84, out_features=10)
        )

    def forward(self, input):   # input.shape(-1, 1, 32, 32)
        output = self.con_sam(input)    # output.shape(-1, 120)
        # Flattening could also be done with any of the following (keeping the batch dimension)
        # output = paddle.squeeze(output)
        # output = output.reshape([-1, 120])
        # output = output.flatten(start_axis=1)
        # output = paddle.flatten(output, start_axis=1)
        output = self.fc(output)
        return output
net = LeNet()
paddle.summary(net, (-1, 1, 32, 32))   # print the layer-by-layer structure and parameter counts

transform = Compose([Resize(32), ToTensor()])
train_data = MNIST(mode='train', transform=transform, download=True)
test_data = MNIST(mode='test', transform=transform, download=True)   # original images are 28*28; Resize(32) above brings them to 32*32

print('train_data:{}, test_data:{}'.format(len(train_data), len(test_data)))

# Inspect one sample
# data1 = train_data[0][0].numpy().squeeze()  # drop the extra channel dimension
# plt.imshow(data1)
# plt.show()

train_loader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_data, batch_size=batch_size)

optimizer = paddle.optimizer.Adam(parameters=net.parameters(), learning_rate=learning_rate)
loss_func = paddle.nn.CrossEntropyLoss()

for epoch in range(epochs):
    print('epoch {}'.format(epoch+1))

    # train
    net.train()  # switch to training mode
    train_loss, train_correct = 0, 0
    for _, data in enumerate(train_loader, 0):
        batch_data, batch_label = data
        batch_pred = net(batch_data)    # batch_pred.shape: (batch_size, 10); the 10 values per sample are class scores, the largest is taken as the prediction
        predict_correct = paddle.argmax(batch_pred, 1)
        batch_label = batch_label.squeeze()
        predict_correct = (predict_correct == batch_label).numpy().sum()    # compare with the ground truth; the sum is the number of correct predictions in this batch
        train_correct += predict_correct.item()   # accumulate correct predictions over the epoch to compute accuracy; item() extracts the Python number
        loss = loss_func(batch_pred, batch_label)
        optimizer.clear_grad()   # reset gradients
        loss.backward()  # backpropagation
        optimizer.step()    # update the network parameters from the gradients
        train_loss += loss.numpy()   # accumulate the per-batch loss
    print('Train Loss: {:.6f}, Acc: {:.6f}'.format(train_loss[0] / (len(train_data)), train_correct / (len(train_data))))

    # test
    net.eval()  # switch to evaluation mode
    with paddle.no_grad():   # disable gradient tracking for speed and lower memory use
        test_loss, test_correct = 0, 0
        for _, data in enumerate(test_loader, 0):
            batch_data, batch_label = data
            batch_pred = net(batch_data)
            predict_correct = paddle.argmax(batch_pred, 1)
            batch_label = batch_label.squeeze()
            predict_correct = (predict_correct == batch_label).numpy().sum()
            test_correct += predict_correct.item()
            loss = loss_func(batch_pred, batch_label)
            test_loss += loss.numpy()
    print('Test Loss: {:.6f}, Acc: {:.6f}'.format(test_loss[0] / (len(test_data)), test_correct / (len(test_data))))

print('End of the training')
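The manual loop above can also be replaced by Paddle's high-level API. The following is a minimal sketch assuming the Paddle 2.x paddle.Model interface; it trains the same network but is not a line-by-line equivalent of the code above:

# Train and evaluate the same network with the high-level paddle.Model API.
model = paddle.Model(LeNet())
model.prepare(
    optimizer=paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=learning_rate),
    loss=paddle.nn.CrossEntropyLoss(),
    metrics=paddle.metric.Accuracy()
)
model.fit(train_data, test_data, epochs=epochs, batch_size=batch_size, verbose=1)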


V. References

  1. Original paper: Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition," in Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998, doi: 10.1109/5.726791.
  2. Deep learning reproduction with PyTorch, models: LeNet
  3. Some details of LeNet-5
  4. LeNet-5 demos
  5. Multi-channel image convolution and parameter calculation
  6. Model size and parameter count calculation
  7. PaddlePaddle official documentation