Literature Reading: Deep Learning (Part 1)

This review introduces the basic concepts of deep learning, including the objective function and stochastic gradient descent used in supervised learning, as well as the local connections, shared weights, and pooling operations of convolutional neural networks. Deep learning automatically discovers data features through multiple levels of representation learning, with the backpropagation algorithm computing gradients during the backward pass. The article also discusses the roles of RNNs and LSTMs in processing sequential data.

Paper Information

  • Title: Deep learning
  • Authors: Yann LeCun (1,2), Yoshua Bengio (3), Geoffrey Hinton (4,5)
    (1) Facebook AI Research, (2) New York University, (3) Department of Computer Science and Operations Research, Université de Montréal, (4) Google, (5) Department of Computer Science, University of Toronto, 6 King's College Road
  • http://pages.cs.wisc.edu/~dyer/cs540/handouts/deep-learning-nature2015.pdf
  • A review of deep learning, introducing supervised learning, backpropagation, convolutional neural networks, and recurrent neural networks.

Overview

  • Machine learning is applied extremely widely: Machine-learning technology powers many aspects of modern society.
  • Conventional machine learning has limitations: Conventional machine-learning techniques were limited in their ability to process natural data in their raw form. Constructing a pattern-recognition or machine-learning system required careful engineering and considerable domain expertise.
  • Representation learning discovers data features automatically: Representation learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification.
  • Deep learning is representation learning with multiple levels: Deep-learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level (starting with the raw input) into a representation at a higher, slightly more abstract level.
  • The features in each layer are not specified by hand but learned from data: The key aspect of deep learning is that these layers of features are not designed by human engineers: they are learned from data using a general-purpose learning procedure.

Supervised Learning

Objective function

An objective function is introduced to measure the error (or distance) between the output values and the desired/actual values. The machine then modifies its internal adjustable parameters to reduce this error. These adjustable parameters, usually called weights, are real numbers that can be seen as "knobs" defining the input-output function of the machine.
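
A minimal sketch of such an objective, using mean squared error (the function name and toy data below are illustrative choices, not from the paper):

```python
import numpy as np

def mse(predictions, targets):
    """Mean squared error: one common choice of objective function."""
    return np.mean((predictions - targets) ** 2)

# The further the outputs drift from the desired values,
# the larger the error that learning tries to reduce.
targets = np.array([1.0, 0.0, 1.0])
print(mse(np.array([0.9, 0.1, 0.8]), targets))  # small error
print(mse(np.array([0.1, 0.9, 0.2]), targets))  # large error
```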

Stochastic gradient descent (SGD)

Gradient descent is used to find a minimum of the objective function; in practice, stochastic gradient descent (SGD) is typically used. At each step, a small set of examples is drawn at random from the training data, the average gradient of the objective with respect to the weights is computed for those examples, and the weights are adjusted in the direction opposite to the gradient. This is repeated many times, stopping only when the value of the objective function no longer decreases or changes very little. It is called stochastic because each small set of examples gives a noisy estimate of the average gradient over all examples.
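
A minimal sketch of SGD on a one-parameter linear model (the data, learning rate, and batch size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3*x + noise. The single weight w is the "knob"
# that SGD turns to reduce the mean squared error.
x = rng.normal(size=1000)
y = 3.0 * x + 0.1 * rng.normal(size=1000)

w = 0.0
lr = 0.05                                   # learning rate
for step in range(200):
    idx = rng.integers(0, len(x), size=32)  # a small random set of examples
    xb, yb = x[idx], y[idx]
    grad = np.mean(2 * (w * xb - yb) * xb)  # average gradient of the error w.r.t. w
    w -= lr * grad                          # move the weight against the gradient
print(w)  # close to 3.0
```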

Backpropagation

The backpropagation procedure for computing the gradient of the objective function is nothing more than a practical application of the chain rule for derivatives. The derivative (or gradient) with respect to the input of a module can be computed by working backwards from the derivative with respect to the output of that module (or the input of the subsequent module), and from it the gradient with respect to each module's weights can then be computed.
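
For a tiny two-layer network, this backward pass can be written out by hand; the following is a sketch with arbitrary shapes and initialization, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward pass: each module transforms the previous representation.
x  = rng.normal(size=(4, 1))
W1 = rng.normal(size=(3, 4)) * 0.1
W2 = rng.normal(size=(2, 3)) * 0.1
target = np.ones((2, 1))

h_pre = W1 @ x
h = np.maximum(h_pre, 0.0)        # ReLU non-linearity
y = W2 @ h
loss = np.sum((y - target) ** 2)

# Backward pass: the chain rule applied module by module,
# working from the output back toward the input.
dy = 2 * (y - target)             # d loss / d y
dW2 = dy @ h.T                    # gradient w.r.t. W2's weights
dh = W2.T @ dy                    # gradient w.r.t. this module's input ...
dh_pre = dh * (h_pre > 0)         # ... pushed back through the ReLU
dW1 = dh_pre @ x.T                # gradient w.r.t. W1's weights
```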

In the late 1990s, neural networks and backpropagation were largely abandoned by the machine-learning community. It was widely believed that learning useful multi-stage feature extractors with little prior knowledge was infeasible. In particular, it was commonly thought that simple gradient descent would get trapped in poor local minima: weight configurations for which no small change would reduce the average error.

This changed around 2006, when unsupervised learning procedures were introduced that can create layers of feature detectors without requiring labelled data. The objective in learning each layer of feature detectors is to be able to reconstruct or model the activities of the feature detectors in the layer below (or the raw input). By "pre-training" several layers of progressively more complex feature detectors this way, the weights of a deep network can be initialized to sensible values and then fine-tuned using standard backpropagation. Moreover, for small data sets, unsupervised pre-training helps to prevent overfitting.

Convolutional Neural Networks

Convolutional layers

Local connections

Units in a convolutional layer are organized in feature maps, within which each unit is connected to local patches in the feature maps of the previous layer through a set of weights called a filter bank.

Shared weights

The result of this local weighted sum is then passed through a non-linearity such as a ReLU. All units in a feature map share the same filter bank. Different feature maps in a layer use different filter banks.
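
A direct (unoptimized) sketch of how one feature map is computed, assuming a single 2x2 filter; a real filter bank holds many filters, each producing its own feature map:

```python
import numpy as np

def feature_map(image, filt):
    """Slide one shared filter over every local patch of the image."""
    fh, fw = filt.shape
    out = np.empty((image.shape[0] - fh + 1, image.shape[1] - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + fh, j:j + fw]   # local connections
            out[i, j] = np.sum(patch * filt)    # same weights at every position
    return np.maximum(out, 0.0)                 # ReLU non-linearity

image = np.random.default_rng(0).normal(size=(8, 8))
vertical_edge = np.array([[1.0, -1.0],
                          [1.0, -1.0]])
print(feature_map(image, vertical_edge).shape)  # (7, 7)
```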

Pooling layers

Pooling

The role of the pooling layer is to merge semantically similar features into one. A typical pooling unit computes the maximum of a local patch of units in one feature map (or in a few feature maps). Neighbouring pooling units take input from patches that are shifted by more than one row or column, thereby reducing the dimension of the representation and creating an invariance to small shifts and distortions.
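
A minimal max-pooling sketch (the 2x2 patch size and stride of 2 are typical illustrative choices):

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Take the maximum over each local patch; a stride > 1 is what
    shrinks the representation and buys invariance to small shifts."""
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = fmap[r:r + size, c:c + size].max()
    return out

fmap = np.arange(16.0).reshape(4, 4)
print(max_pool(fmap))  # 4x4 feature map -> 2x2, each entry a local maximum
```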

The use of many layers

Two or three stages of convolution, non-linearity and pooling are stacked, followed by more convolutional and fully-connected layers.
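
Expressed with PyTorch modules for brevity (the channel counts, kernel sizes, and 10-class output are illustrative assumptions, not the paper's architecture), such a stack looks like:

```python
import torch
import torch.nn as nn

# Two stages of convolution + ReLU + pooling, followed by a
# fully-connected classifier.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 10),    # sized for 28x28 single-channel inputs
)

x = torch.randn(1, 1, 28, 28)     # one dummy grey-scale image
print(model(x).shape)             # torch.Size([1, 10])
```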

Recurrent Neural Networks

RNNs process an input sequence one element at a time, maintaining in their hidden units a ‘state vector’ that implicitly contains information about the history of all the past elements of the sequence. When we consider the outputs of the hidden units at different discrete time steps as if they were the outputs of different neurons in a deep multilayer network, it becomes clear how backpropagation can be applied to train RNNs.
[Figure: an RNN unrolled in time]
Although their main purpose is to learn long-term dependencies, theoretical and empirical evidence shows that it is difficult to learn to store information for very long.
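
A sketch of the basic recurrence that maintains this state vector (dimensions and initialization are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

W_xh = rng.normal(size=(16, 8)) * 0.1    # input -> hidden
W_hh = rng.normal(size=(16, 16)) * 0.1   # hidden -> hidden (the recurrence)

h = np.zeros(16)                         # the 'state vector'
sequence = rng.normal(size=(5, 8))       # 5 time steps, 8 features each
for x_t in sequence:                     # one element at a time
    h = np.tanh(W_xh @ x_t + W_hh @ h)   # new state from input + old state
print(h.shape)  # (16,): the same state vector, updated at every step
```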

Long Short-Term Memory (LSTM) Networks

A special unit called the memory cell acts like an accumulator or a gated leaky neuron: it has a connection to itself at the next time step that has a weight of one, so it copies its own real-valued state and accumulates the external signal, but this self-connection is multiplicatively gated by another unit that learns to decide when to clear the content of the memory.
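
A sketch of one LSTM step in a standard modern formulation (with forget, input, and output gates); the paper describes the gating idea rather than these exact equations, and all dimensions here are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    """The cell state c is the accumulator: it is carried through time
    with an effective self-weight of one, gated multiplicatively so the
    network can learn when to keep, add to, or clear its memory."""
    Wf, Wi, Wo, Wg = params
    z = np.concatenate([x, h])
    f = sigmoid(Wf @ z)       # forget gate: when to clear the memory
    i = sigmoid(Wi @ z)       # input gate: when to accumulate the signal
    o = sigmoid(Wo @ z)       # output gate: when to expose the state
    g = np.tanh(Wg @ z)       # candidate content
    c = f * c + i * g         # gated self-connection + external signal
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
params = [rng.normal(size=(n_hid, n_in + n_hid)) * 0.1 for _ in range(4)]
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)  # (16,) (16,)
```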
