[Dive into Deep Learning] Study Notes on the LeNet Convolutional Neural Network

Dive into Deep Learning (Li Mu): 6.6. Convolutional Neural Networks (LeNet) — Dive into Deep Learning 2.0.0 documentation (d2l.ai)

Dive into Deep Learning (Li Mu), PDF: zh-v2.d2l.ai/d2l-zh-pytorch.pdf

Lecture 23: The Classic Convolutional Neural Network LeNet (Dive into Deep Learning, Bilibili video): 23 经典卷积神经网络 LeNet【动手学深度学习v2】_哔哩哔哩_bilibili

LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998) 'Gradient-Based Learning Applied to Document Recognition', Proceedings of the IEEE, 86(11), pp. 2278-2324. doi: 10.1109/5.726791

Contents

1. LeNet

1.1. Overall Implementation Steps

1.2. The Ideas behind LeNet

1.3. LeNet Code Implementation (Dive into Deep Learning, Li Mu, PyTorch)

1.4. Drawbacks and Limitations of LeNet

2. Reading the Original LeNet Paper

2.1. Abstract

2.2. Abbreviations

2.3. Introduction

2.4. Convolutional Neural Networks for Isolated Character Recognition

2.5. Results and Comparison with Other Methods

1. LeNet

1.1. Overall Implementation Steps

⭐ These are the steps from Li Mu's code. Since it pulls in a lot of utilities from earlier chapters, it doesn't read very well on its own; a self-contained implementation found online may be easier to follow.

(1) Import the required libraries

(2) Define a LeNet class or build the model with nn.Sequential()

(3) Train the model

        ① Set the batch size

        ② Load the dataset

(4) Define an accuracy-evaluation function

(5) Define the training function (not sure why Li Mu's training functions always feel so complicated... they probably call helpers from different chapters, so the names are baffling at first glance)

(6) Set the learning rate and the number of epochs

(7) Run the training (a minimal end-to-end sketch follows this list)
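
As a reference for steps (1)-(7), here is a minimal, self-contained sketch I put together using plain PyTorch and torchvision instead of the d2l helpers. The batch size (256), learning rate (0.9), epoch count (10), and Xavier initialization follow the book; using CrossEntropyLoss and plain SGD directly is my own simplification, not Li Mu's exact training function.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# (1)-(2) LeNet as an nn.Sequential (same layout as in section 1.3 below)
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10))

# Xavier initialization, as the book applies before training
def init_weights(m):
    if type(m) in (nn.Linear, nn.Conv2d):
        nn.init.xavier_uniform_(m.weight)
net.apply(init_weights)

# (3) batch size and dataset (Fashion-MNIST, as in the book)
batch_size = 256
to_tensor = transforms.ToTensor()
train_iter = DataLoader(
    datasets.FashionMNIST('./data', train=True, download=True, transform=to_tensor),
    batch_size=batch_size, shuffle=True)
test_iter = DataLoader(
    datasets.FashionMNIST('./data', train=False, download=True, transform=to_tensor),
    batch_size=batch_size)

# (4) accuracy-evaluation function
def evaluate_accuracy(net, data_iter):
    net.eval()
    correct = total = 0
    with torch.no_grad():
        for X, y in data_iter:
            correct += (net(X).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

# (5)-(7) training loop with the book's lr and epoch count
lr, num_epochs = 0.9, 10
loss = nn.CrossEntropyLoss()
trainer = torch.optim.SGD(net.parameters(), lr=lr)
for epoch in range(num_epochs):
    net.train()
    for X, y in train_iter:
        trainer.zero_grad()
        loss(net(X), y).backward()
        trainer.step()
    print(f'epoch {epoch + 1}, test acc {evaluate_accuracy(net, test_iter):.3f}')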

1.2. The Ideas behind LeNet

(1) LeNet is one of the earliest published convolutional neural networks

(2) Walkthrough of the convolutional-layer diagram

        ① After reshaping, we get a 32*32 image of a digit or letter (for digit/letter recognition the image has presumably been binarized: black pixels are 1, white pixels are 0)

        ② Six 5*5 convolution kernels with stride 1 produce 6 channels of size 28*28

        ③ ⭐ 2*2 average pooling (with shared weights: in the original LeNet, each subsampling map shares a single trainable coefficient and bias)

        ④ Sixteen 5*5 convolution kernels with stride 1 produce 16 channels of size 10*10 (as Teacher Zhang put it, each convolution sums over all 6 input channels, so the number of kernels determines the number of new output channels)

        ⑤ 2*2 average pooling

        ⑥ Flatten, which yields 16*5*5 = 400 values

        ⑦ First fully connected layer, with the output size set to 120

        ⑧ Sigmoid() activation

        ⑨ Second fully connected layer, with the output size set to 84

        ⑩ Sigmoid() activation

        ⑪ In principle the third stage is a layer of Gaussian connections with output size 10, but Li Mu drops the Gaussian activation, so in his code it is simply a third fully connected layer (probably because softmax has since replaced Gaussian connections)

(3) Li Mu's simplified LeNet removes the final Gaussian activation layer

(4) Convolution kernels produce local receptive fields (note: larger kernels, pooling with small strides, and dilated convolutions can all enlarge the receptive field)

(5) It uses far fewer parameters than a multilayer perceptron (see the quick parameter count after this list)
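
As a quick check on point (5), here is a comparison I added (not from the book): counting trainable parameters for the LeNet above versus a hypothetical MLP on the same 28*28 input. The MLP's hidden sizes (512 and 256) are an arbitrary illustrative choice.

import torch
from torch import nn

def count_params(model):
    # total number of trainable parameters
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10))

# A hypothetical MLP on the same 28*28 input; the hidden sizes 512 and 256
# are made-up choices for illustration, not from the book
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.Sigmoid(),
    nn.Linear(512, 256), nn.Sigmoid(),
    nn.Linear(256, 10))

print('LeNet params:', count_params(lenet))  # 61706
print('MLP params:', count_params(mlp))      # 535818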

1.3. LeNet Code Implementation (Dive into Deep Learning, Li Mu, PyTorch)

(1) The LeNet model itself

import torch
from torch import nn
from d2l import torch as d2l

net = nn.Sequential(
    # padding=2 pads the original 28*28 input with two rings of zeros, making it 32*32
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10))

(2) Feed a dummy input through LeNet and print every layer's output shape (you could arguably skip this step; the expected output is shown after the code)

"""李沐在这里采用了28*28的方式,而LeNet是32*32。所以李沐在卷积层第一层补了padding"""
X = torch.rand(size=(1, 1, 28, 28), dtype=torch.float32)
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__,'output shape: \t',X.shape)
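
For reference, the loop above should print the following layer-by-layer shapes (matching the book's output):

Conv2d output shape:     torch.Size([1, 6, 28, 28])
Sigmoid output shape:    torch.Size([1, 6, 28, 28])
AvgPool2d output shape:  torch.Size([1, 6, 14, 14])
Conv2d output shape:     torch.Size([1, 16, 10, 10])
Sigmoid output shape:    torch.Size([1, 16, 10, 10])
AvgPool2d output shape:  torch.Size([1, 16, 5, 5])
Flatten output shape:    torch.Size([1, 400])
Linear output shape:     torch.Size([1, 120])
Sigmoid output shape:    torch.Size([1, 120])
Linear output shape:     torch.Size([1, 84])
Sigmoid output shape:    torch.Size([1, 84])
Linear output shape:     torch.Size([1, 10])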

1.4. Drawbacks and Limitations of LeNet

(1) It performs poorly on large-scale images

2. Reading the Original LeNet Paper

2.1. Abstract

(1) Convolutional neural networks are useful for classification.

(2) Graph Transformer Networks (GTNs) trained with gradient-based learning can recognize documents.

(3) Introduces two systems for online handwriting recognition.

(4) A network for reading checks.

2.2. Abbreviations

GT: Graph transformer.
GTN: Graph transformer network.
HMM: Hidden Markov model.
HOS: Heuristic oversegmentation.
K-NN: K-nearest neighbor.
NN: Neural network.
OCR: Optical character recognition.
PCA: Principal component analysis.
RBF: Radial basis function.
RS-SVM: Reduced-set support vector method.
SDNN: Space displacement neural network.
SVM: Support vector method.
TDNN: Time delay neural network.
V-SVM: Virtual support vector method.

2.3. Introduction

(1) Mentions the success of machine learning.

(2) Argues that automatically learned features are better than hand-designed ones.

(3) Combining automatic learning with hand-crafted components is how the variety of natural raw data gets handled. A general conceptual diagram is also given.

(4) Gives three pieces of evidence to show how important feature extractors are.

(5) A brief introduction to each section.

(6) Learning from Data: argues for the effectiveness of gradient-based learning with a loss defined over the weights, and proposes the generalization-gap formula $E_{\text{test}} - E_{\text{train}} = k\,(h/P)^{\alpha}$, where P is the number of training samples, h is a measure of the machine's "effective capacity" or complexity, α is a number between 0.5 and 1.0, and k is a constant. The gap always shrinks as the number of training samples grows; meanwhile, as the capacity h increases, E_train decreases. (I can't quite build intuition for the parameters on the right-hand side; see the toy calculation after this list.)

(7) Gradient-Based Learning: introduces gradient-descent algorithms.

(8) Gradient Back-Propagation: explains what back-propagation means.

(9) Learning in Real Handwriting Recognition Systems: explains the importance of segmenting individual characters out of sentences. However, segmenting single letters is time-consuming and sometimes error-prone, so training on whole character strings can give better results.

(10) Globally Trainable Systems: quite a complex section. Most of it, as far as I can tell, is about the relationship between graph-based connections, back-propagation, and Graph Transformer Networks composed of multiple modules.
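
Here is the toy calculation promised in point (6): plugging made-up numbers into the gap formula shows how the expected gap between test and training error shrinks as the number of training samples P grows. The values k = 1, h = 1000, and α = 0.75 are purely illustrative assumptions, not from the paper.

# Toy illustration of E_test - E_train = k * (h / P) ** alpha
k, h, alpha = 1.0, 1000, 0.75  # made-up values, purely for illustration
for P in [1_000, 10_000, 100_000]:
    gap = k * (h / P) ** alpha
    print(f'P = {P:>7}: expected gap = {gap:.3f}')
# prints 1.000, 0.178, 0.032 -- the gap shrinks as P grows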

2.4. Convolutional Neural Networks for Isolated Character Recognition

(1) What is this part even saying?

(2) Convolutional Networks: local receptive fields can detect edges, end-points, and corners.

(3) LeNet-5: introduces each layer and unit in detail, using rather dense and specialized vocabulary. (In plain terms: the characters in their database are 28*28 images with the actual letter occupying at most 20*20 pixels, so they chose a 32*32 input to put the character at the center of the receptive field, which can speed up learning. This probably needs some hands-on experimenting to fully appreciate.)

(4) Loss Function: uses a maximum-likelihood criterion, which in their case reduces to minimizing the mean squared error (a rough sketch of the criterion follows below).
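
If I am reading the paper right, the simple criterion mentioned in (4) is (roughly) the average, over the P training patterns, of the penalty output by the RBF unit of the correct class; take this reconstruction as a sketch rather than the paper's exact notation:

$$E(W) = \frac{1}{P}\sum_{p=1}^{P} y_{D^p}(Z^p, W)$$

where $Z^p$ is the p-th input pattern, $D^p$ its correct class label, $y_{D^p}$ the output of the corresponding RBF unit, and $W$ the trainable parameters; the paper notes that maximizing likelihood is, in this setting, equivalent to minimizing this mean squared error.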

2.5. Results and Comparison with Other Methods

(1) Database (the Modified NIST set): introduces the database.

(2) Results: describes how the learning rate affects the error rate, and gives specific figures for the training data.

I can't go on any further; the later sections are too specialized for me to read right now. I'll come back in six months or a year to fill in the rest.
