5 Neural Networks (PRML)

大浪淘沙1

Category: Machine learning

Article link: https://blog.csdn.net/StudyFromEveryOne/article/details/14104941

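The regression and classification models discussed so far were linear combinations of fixed basis functions, which limits their range of application. An alternative is to fix the number of basis functions in advance but make them adaptive, that is, to let their parameter values change during training. The most successful model of this kind is the feed-forward neural network, also known as the multilayer perceptron.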

1. Feed-forward network functions

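The linear models for regression and classification discussed in Chapters 3 and 4 have the general form:

$$y(\mathbf{x}, \mathbf{w}) = f\left(\sum_{j=1}^{M} w_j \phi_j(\mathbf{x})\right)$$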

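where f(·) is a nonlinear activation function in the case of classification (and the identity in the case of regression). Our goal is to make the basis functions $\phi_j(\mathbf{x})$ depend on parameters and to allow those parameters to be adjusted during training. We first construct M linear combinations of the input variables $x_1, \dots, x_D$:

$$a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}$$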

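Each activation $a_j$ is then transformed by a nonlinear function h(·) to give $z_j = h(a_j)$. The units that compute these quantities are called hidden units, and h(·) is usually the logistic sigmoid or the tanh function. These values are again linearly combined to give the output unit activations:

$$a_k = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}$$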

The output activations are then passed through a logistic sigmoid:

$$y_k = \sigma(a_k), \qquad \sigma(a) = \frac{1}{1 + \exp(-a)}$$

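Combining these stages, the overall network function is:

$$y_k(\mathbf{x}, \mathbf{w}) = \sigma\left(\sum_{j=1}^{M} w_{kj}^{(2)}\, h\left(\sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}\right) + w_{k0}^{(2)}\right)$$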
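
The forward computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming tanh hidden units and logistic sigmoid outputs; the function names, the layer sizes and the random weights are placeholders chosen for the example, not values from the text.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid, sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2):
    """Two-layer network: D inputs, M tanh hidden units, K sigmoid outputs."""
    a_hidden = W1 @ x + b1   # first-layer activations a_j
    z = np.tanh(a_hidden)    # hidden unit outputs z_j = h(a_j)
    a_out = W2 @ z + b2      # output unit activations a_k
    return sigmoid(a_out)    # network outputs y_k = sigma(a_k)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
D, M, K = 3, 4, 2
W1, b1 = rng.normal(size=(M, D)), rng.normal(size=M)
W2, b2 = rng.normal(size=(K, M)), rng.normal(size=K)
y = forward(rng.normal(size=D), W1, b1, W2, b2)
print(y)
```

Storing each layer as a weight matrix plus a bias vector turns the two sums over i and j into two matrix-vector products.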

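Because there is a direct correspondence between a network diagram and its mathematical function, more general mappings can be built from more complex diagrams. These are, however, restricted to feed-forward architectures. Each unit in such a network computes:

$$z_k = h\left(\sum_{j} w_{kj} z_j\right)$$

where the sum runs over all units j that send a connection to unit k.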

2. Network training

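For a feed-forward network, the aim of training is to minimize the error function:

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\| \mathbf{y}(\mathbf{x}_n, \mathbf{w}) - \mathbf{t}_n \right\|^2$$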

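We start with regression and a single target variable t, assumed to be Gaussian with a mean given by the network output:

$$p(t \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}\left(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1}\right)$$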

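For a data set of N independent observations the likelihood is therefore:

$$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} p(t_n \mid \mathbf{x}_n, \mathbf{w}, \beta)$$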

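Taking the negative logarithm gives:

$$\frac{\beta}{2} \sum_{n=1}^{N} \left\{ y(\mathbf{x}_n, \mathbf{w}) - t_n \right\}^2 - \frac{N}{2} \ln \beta + \frac{N}{2} \ln(2\pi)$$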

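Dropping the terms and factors that do not depend on w, the quantity to minimize is:

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(\mathbf{x}_n, \mathbf{w}) - t_n \right\}^2$$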

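Once the minimizer $\mathbf{w}_{\mathrm{ML}}$ has been found, the noise precision follows from:

$$\frac{1}{\beta_{\mathrm{ML}}} = \frac{1}{N} \sum_{n=1}^{N} \left\{ y(\mathbf{x}_n, \mathbf{w}_{\mathrm{ML}}) - t_n \right\}^2$$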
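
As a rough numerical illustration of the sum-of-squares error and the maximum likelihood precision, the sketch below fits a model and then estimates β from the residuals. To keep it short it assumes a plain linear model in place of the network, fitted with `np.linalg.lstsq`; the data, `w_true` and `noise_std` are invented for the example.

```python
import numpy as np

def sum_of_squares_error(y, t):
    """E = 1/2 * sum_n (y_n - t_n)^2."""
    return 0.5 * np.sum((y - t) ** 2)

def beta_ml(y, t):
    """Maximum likelihood precision: 1/beta = mean squared residual."""
    return 1.0 / np.mean((y - t) ** 2)

# Synthetic data (assumption): linear function plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
noise_std = 0.1                       # true precision is 1 / 0.1**2 = 100
t = X @ w_true + rng.normal(scale=noise_std, size=200)

# Stand-in for network training: least squares minimizes the same error.
w_ml, *_ = np.linalg.lstsq(X, t, rcond=None)
y = X @ w_ml
print(sum_of_squares_error(y, t), beta_ml(y, t))
```

The estimated precision should land near the value implied by `noise_std`, although not exactly on it for a finite sample.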

The minimized value of the overall error is therefore:

enter image description here

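We next consider two-class classification, where t = 1 denotes class C1 and t = 0 denotes class C2. The network has a single output whose activation function is the logistic sigmoid, $y = \sigma(a) = 1 / (1 + \exp(-a))$.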

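We can interpret y(x, w) as the conditional probability p(C1 | x), so that p(C2 | x) = 1 - y(x, w). The conditional distribution of the target is then a Bernoulli distribution:

$$p(t \mid \mathbf{x}, \mathbf{w}) = y(\mathbf{x}, \mathbf{w})^{t} \left\{ 1 - y(\mathbf{x}, \mathbf{w}) \right\}^{1 - t}$$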

The error function is then given by the negative log likelihood:

enter image description here

Similarly, for multiple training samples we obtain:

enter image description here

The error function is therefore:

enter image description here

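For multi-class classification with K classes, the targets satisfy $t_k \in \{0, 1\}$ and the network outputs are interpreted as $y_k(\mathbf{x}, \mathbf{w}) = p(t_k = 1 \mid \mathbf{x})$. The error function becomes:

$$E(\mathbf{w}) = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{nk} \ln y_k(\mathbf{x}_n, \mathbf{w})$$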

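As discussed in Chapter 4, the output activation function in this case is the softmax:

$$y_k(\mathbf{x}, \mathbf{w}) = \frac{\exp\left(a_k(\mathbf{x}, \mathbf{w})\right)}{\sum_{j} \exp\left(a_j(\mathbf{x}, \mathbf{w})\right)}$$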
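
The multi-class case can be made concrete with a short sketch of the softmax and the cross-entropy error, assuming 1-of-K target vectors. Subtracting the row maximum before exponentiating is a common numerical safeguard that is not part of the text; the activations `a` and targets `t` below are made-up values.

```python
import numpy as np

def softmax(a):
    """Row-wise softmax; the max is subtracted for numerical stability."""
    a = a - np.max(a, axis=-1, keepdims=True)
    e = np.exp(a)
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy(a, t):
    """E = -sum_n sum_k t_nk * ln y_nk, with y = softmax(a)."""
    return -np.sum(t * np.log(softmax(a)))

# Two samples, three classes (illustrative values).
a = np.array([[2.0, 1.0, 0.1],
              [0.0, 0.0, 0.0]])
t = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
print(softmax(a).sum(axis=1))   # each row sums to 1
print(cross_entropy(a, t))
```

With all activations equal, every class gets probability 1/K, so that sample contributes ln K to the error.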

The Hessian matrix:

Error backpropagation can also be used to evaluate the second derivatives of the error function, given by $\partial^2 E / \partial w_{ji}\, \partial w_{lk}$.

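The Hessian plays an important role in many aspects of neural computing:

1. Several nonlinear optimization algorithms used for training neural networks are based on the second-order properties of the error surface, which are controlled by the Hessian.

2. The Hessian forms the basis of a fast procedure for re-training a feed-forward network.

3. The inverse of the Hessian can be used to identify the least significant weights in a network.

4. The Hessian is central to the Laplace approximation for a Bayesian neural network: its inverse determines the predictive distribution of a trained network, its eigenvalues determine the values of the hyperparameters, and its determinant is used to evaluate the model evidence.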

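Diagonal approximation

The diagonal elements of the Hessian, for a single pattern n, are:

$$\frac{\partial^2 E_n}{\partial w_{ji}^2} = \frac{\partial^2 E_n}{\partial a_j^2}\, z_i^2$$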

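The second derivatives with respect to the activations are found recursively. Neglecting the off-diagonal elements, we obtain:

$$\frac{\partial^2 E_n}{\partial a_j^2} = h'(a_j)^2 \sum_{k} w_{kj}^2\, \frac{\partial^2 E_n}{\partial a_k^2} + h''(a_j) \sum_{k} w_{kj}\, \frac{\partial E_n}{\partial a_k}$$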

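Outer product approximation

For a sum-of-squares error and a single output, the Hessian can be written in the form:

$$\mathbf{H} = \nabla\nabla E = \sum_{n=1}^{N} \nabla y_n \left(\nabla y_n\right)^{\mathrm{T}} + \sum_{n=1}^{N} (y_n - t_n)\, \nabla\nabla y_n$$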

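Neglecting the second term gives what is known as the Levenberg-Marquardt approximation, or outer product approximation: $\mathbf{H} \simeq \sum_{n=1}^{N} \mathbf{b}_n \mathbf{b}_n^{\mathrm{T}}$, where $\mathbf{b}_n = \nabla y_n$.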
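
A minimal sketch of the outer product approximation follows. It assumes a toy single-unit model y_n = tanh(w^T x_n) in place of a full network, so that the gradient of the output with respect to the weights has the closed form (1 - y_n^2) x_n; the data and the helper name are hypothetical.

```python
import numpy as np

def outer_product_hessian(B):
    """Sum over n of b_n b_n^T, where row n of B is b_n = grad_w y_n."""
    return B.T @ B

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))      # hypothetical inputs, one row per pattern
w = rng.normal(size=3)            # hypothetical weight vector
y = np.tanh(X @ w)                # toy model output y_n = tanh(w^T x_n)
B = (1.0 - y ** 2)[:, None] * X   # b_n = dy_n/dw = (1 - y_n^2) x_n
H_approx = outer_product_hessian(B)
print(H_approx)
```

Because it is a sum of outer products, the approximation is symmetric and positive semi-definite by construction and needs only first derivatives.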

Inverse Hessian:

We first write the outer product approximation as:

$$\mathbf{H}_N = \sum_{n=1}^{N} \mathbf{b}_n \mathbf{b}_n^{\mathrm{T}}$$

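Suppose we have already obtained the inverse Hessian from the first L data points. Separating off the contribution of data point L+1 gives $\mathbf{H}_{L+1} = \mathbf{H}_L + \mathbf{b}_{L+1} \mathbf{b}_{L+1}^{\mathrm{T}}$.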

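To invert this we use the matrix identity $(\mathbf{M} + \mathbf{v}\mathbf{v}^{\mathrm{T}})^{-1} = \mathbf{M}^{-1} - \dfrac{(\mathbf{M}^{-1}\mathbf{v})(\mathbf{v}^{\mathrm{T}}\mathbf{M}^{-1})}{1 + \mathbf{v}^{\mathrm{T}}\mathbf{M}^{-1}\mathbf{v}}$, which gives:

$$\mathbf{H}_{L+1}^{-1} = \mathbf{H}_L^{-1} - \frac{\mathbf{H}_L^{-1} \mathbf{b}_{L+1} \mathbf{b}_{L+1}^{\mathrm{T}} \mathbf{H}_L^{-1}}{1 + \mathbf{b}_{L+1}^{\mathrm{T}} \mathbf{H}_L^{-1} \mathbf{b}_{L+1}}$$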
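
The sequential procedure can be sketched as a loop that absorbs one data point at a time with a rank-one update. The starting point H_0 = alpha * I with a small `alpha` is an assumption made here so that the first inverse exists; the result is then the inverse of alpha * I plus the outer product sum rather than of the sum alone.

```python
import numpy as np

def inverse_hessian_sequential(B, alpha=1e-2):
    """Inverse of alpha*I + sum_n b_n b_n^T via rank-one updates.

    Row n of B is the vector b_n. alpha is an assumed small constant
    that makes the initial matrix invertible.
    """
    n_weights = B.shape[1]
    H_inv = np.eye(n_weights) / alpha        # inverse of H_0 = alpha * I
    for b in B:
        Hb = H_inv @ b                       # H_L^{-1} b_{L+1}
        # H_inv is symmetric, so b^T H_inv equals Hb^T.
        H_inv = H_inv - np.outer(Hb, Hb) / (1.0 + b @ Hb)
    return H_inv

rng = np.random.default_rng(0)
B = rng.normal(size=(20, 4))                 # hypothetical gradient vectors
H_inv = inverse_hessian_sequential(B)
print(H_inv)
```

Each update costs a matrix-vector product and an outer product, so the whole data set is processed in one pass without ever calling a general matrix inversion routine.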

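Finite differences

Perturbing each possible pair of weights in turn gives:

$$\frac{\partial^2 E}{\partial w_{ji}\, \partial w_{lk}} = \frac{1}{4\epsilon^2} \left\{ E(w_{ji} + \epsilon, w_{lk} + \epsilon) - E(w_{ji} + \epsilon, w_{lk} - \epsilon) - E(w_{ji} - \epsilon, w_{lk} + \epsilon) + E(w_{ji} - \epsilon, w_{lk} - \epsilon) \right\} + O(\epsilon^2)$$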
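
A finite-difference Hessian is mostly useful as a check on other implementations. The sketch below perturbs every pair of weights symmetrically, assuming the error is available as a Python function `E` of a flat weight vector; the quadratic test function and the step `eps` are illustrative choices.

```python
import numpy as np

def hessian_fd(E, w, eps=1e-4):
    """Symmetric finite-difference estimate of the Hessian of E at w."""
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = eps
            ej[j] = eps
            H[i, j] = (E(w + ei + ej) - E(w + ei - ej)
                       - E(w - ei + ej) + E(w - ei - ej)) / (4 * eps ** 2)
    return H

# Illustrative error function with a known Hessian A.
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 4.0]])
E = lambda w: 0.5 * w @ A @ w
H_fd = hessian_fd(E, np.array([0.1, -0.2, 0.3]))
print(H_fd)
```

Every element needs four evaluations of the error, so the cost grows with the square of the number of weights times the cost of one evaluation, which is why this is normally used only for verification.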

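Exact evaluation of the Hessian

So far we have discussed several approximations to the Hessian. It can also be evaluated exactly. We first define the notation:

$$\delta_k = \frac{\partial E_n}{\partial a_k}, \qquad M_{kk'} = \frac{\partial^2 E_n}{\partial a_k\, \partial a_{k'}}$$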

Both weights in the second layer:

$$\frac{\partial^2 E_n}{\partial w_{kj}^{(2)}\, \partial w_{k'j'}^{(2)}} = z_j z_{j'} M_{kk'}$$

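Both weights in the first layer:

$$\frac{\partial^2 E_n}{\partial w_{ji}^{(1)}\, \partial w_{j'i'}^{(1)}} = x_i x_{i'} h''(a_{j'}) I_{jj'} \sum_{k} w_{kj'}^{(2)} \delta_k + x_i x_{i'} h'(a_{j'}) h'(a_j) \sum_{k} \sum_{k'} w_{k'j'}^{(2)} w_{kj}^{(2)} M_{kk'}$$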

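One weight in the first layer and one in the second layer:

$$\frac{\partial^2 E_n}{\partial w_{ji}^{(1)}\, \partial w_{kj'}^{(2)}} = x_i h'(a_j) \left\{ \delta_k I_{jj'} + z_{j'} \sum_{k'} w_{k'j}^{(2)} M_{kk'} \right\}$$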

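Fast multiplication by the Hessian

In many applications of the Hessian, the quantity of interest is not the matrix H itself but its product with some vector v. What we want is $\mathbf{v}^{\mathrm{T}} \mathbf{H}$, and to obtain it we first write:

$$\mathbf{v}^{\mathrm{T}} \mathbf{H} = \mathbf{v}^{\mathrm{T}} \nabla (\nabla E)$$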

We use the notation R{·} to denote the operator $\mathbf{v}^{\mathrm{T}} \nabla$, so that in particular R{w} = v.

We can also obtain several relations:

enter image description here

We also obtain the following expressions:

enter image description here
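
To show how the R{·} operator turns into code, here is a sketch for one specific case: a two-layer network without biases, tanh hidden units, linear outputs and a sum-of-squares error on a single pattern. These modelling choices, and every name in the block, are assumptions made for the example. The operator is applied line by line to the ordinary forward and backward passes.

```python
import numpy as np

def gradient(W1, W2, x, t):
    """Backprop gradient of E = 0.5*||y - t||^2 with y = W2 tanh(W1 x)."""
    z = np.tanh(W1 @ x)
    delta_k = W2 @ z - t
    delta_j = (1 - z ** 2) * (W2.T @ delta_k)
    return np.outer(delta_j, x), np.outer(delta_k, z)

def hessian_vector_product(W1, W2, V1, V2, x, t):
    """Returns (R{dE/dW1}, R{dE/dW2}), i.e. H v for v = (V1, V2)."""
    # Forward pass.
    a = W1 @ x
    z = np.tanh(a)
    y = W2 @ z
    h1 = 1 - z ** 2            # h'(a) for tanh
    h2 = -2 * z * h1           # h''(a) for tanh
    # R-forward pass, using R{w} = v.
    Ra = V1 @ x
    Rz = h1 * Ra
    Ry = W2 @ Rz + V2 @ z
    # Backward pass.
    delta_k = y - t
    back = W2.T @ delta_k
    # R-backward pass.
    Rdelta_k = Ry
    Rdelta_j = (h2 * Ra * back
                + h1 * (V2.T @ delta_k)
                + h1 * (W2.T @ Rdelta_k))
    RdW1 = np.outer(Rdelta_j, x)
    RdW2 = np.outer(Rdelta_k, z) + np.outer(delta_k, Rz)
    return RdW1, RdW2

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
V1, V2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x, t = rng.normal(size=3), rng.normal(size=2)
Hv1, Hv2 = hessian_vector_product(W1, W2, V1, V2, x, t)
print(Hv1.shape, Hv2.shape)
```

The cost is roughly that of two forward and two backward passes, and the full Hessian is never formed. A central difference of `gradient` along the direction v gives an independent check of the result.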

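Regularization in neural networks

As we saw in Chapter 1, to avoid the problems caused by over-fitting we can add a regularization term to the error function:

$$\widetilde{E}(\mathbf{w}) = E(\mathbf{w}) + \frac{\lambda}{2} \mathbf{w}^{\mathrm{T}} \mathbf{w}$$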

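This regularizer has a limitation, however: it is inconsistent with certain scaling properties of network mappings. To illustrate the problem, consider a two-layer network whose first-layer hidden units have activations of the form:

$$z_j = h\left(\sum_{i} w_{ji} x_i + w_{j0}\right)$$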

Suppose we apply a linear transformation to the input data:

$$x_i \to \widetilde{x}_i = a x_i + b$$

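The mapping performed by the network is left unchanged if the first-layer weights and biases are transformed accordingly:

$$w_{ji} \to \widetilde{w}_{ji} = \frac{1}{a} w_{ji}, \qquad w_{j0} \to \widetilde{w}_{j0} = w_{j0} - \frac{b}{a} \sum_{i} w_{ji}$$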

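Similarly, the network outputs $y_k = \sum_j w_{kj} z_j + w_{k0}$ can be transformed as:

$$y_k \to \widetilde{y}_k = c\, y_k + d$$

which corresponds to transforming the second-layer weights and biases as $w_{kj} \to c\, w_{kj}$ and $w_{k0} \to c\, w_{k0} + d$.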

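A regularizer that is invariant under these transformations can therefore be written as:

$$\frac{\lambda_1}{2} \sum_{w \in \mathcal{W}_1} w^2 + \frac{\lambda_2}{2} \sum_{w \in \mathcal{W}_2} w^2$$

where $\mathcal{W}_1$ denotes the first-layer weights and $\mathcal{W}_2$ the second-layer weights, with the biases excluded from both sums.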

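Under the weight transformations above, this regularizer remains unchanged provided the regularization parameters are rescaled as follows: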

enter image description here
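
The scaling argument can be checked numerically. The sketch below assumes a two-layer network with tanh hidden units and linear outputs; the constants a, b, c, d and all weights are arbitrary illustrative values. It rescales the inputs, compensates in the first-layer weights and biases, rescales the outputs through the second layer, and compares the results.

```python
import numpy as np

def network(x, W1, b1, W2, b2):
    """Two-layer network with tanh hidden units and linear outputs."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

rng = np.random.default_rng(0)
D, M, K = 3, 4, 2
W1, b1 = rng.normal(size=(M, D)), rng.normal(size=M)
W2, b2 = rng.normal(size=(K, M)), rng.normal(size=K)
x = rng.normal(size=D)
a, b = 2.5, -0.7                        # input transformation x -> a*x + b
c, d = 0.4, 1.3                         # output transformation y -> c*y + d

x_t = a * x + b
W1_t = W1 / a                           # first-layer weights scaled by 1/a
b1_t = b1 - (b / a) * W1.sum(axis=1)    # biases absorb the input shift
W2_t = c * W2                           # second-layer weights scaled by c
b2_t = c * b2 + d                       # output biases absorb the output shift

y = network(x, W1, b1, W2, b2)
y_t = network(x_t, W1_t, b1_t, W2_t, b2_t)
print(np.allclose(y_t, c * y + d))
print(np.sum(W1 ** 2), np.sum(W1_t ** 2))   # plain weight decay penalty changes
```

The two networks are equivalent up to the linear output transformation, yet the sum of squared first-layer weights differs between them, which is the inconsistency a single weight-decay coefficient runs into.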

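In many pattern recognition applications the predictions should be unchanged, or invariant, under one or more transformations of the input. Consider the classification of a two-dimensional image such as a handwritten character: a given image should be assigned to the same class regardless of where it appears within the frame. If a sufficiently large number of training patterns is available, an adaptive model such as a neural network can learn the invariance, at least approximately.

This approach can be impractical, however, when the number of training examples is limited or when several invariances are involved. We then look for other ways of encouraging an adaptive model to exhibit the required invariances. These fall broadly into four categories: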

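1. The training set is augmented with copies of the training patterns, transformed according to the desired invariances.

2. A regularization term is added to the error function that penalizes changes in the model output when the input is transformed. This leads to the technique of tangent propagation.

3. The invariance is built into the pre-processing by extracting features that are invariant under the required transformations. Any subsequent regression or classification system that uses these features as inputs will then respect the invariances.

4. The invariance properties are built into the structure of the neural network. One way to do this is through local receptive fields and shared weights, as discussed for convolutional neural networks.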
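
The first approach in the list, augmenting the training set, can be sketched for translation invariance as below. The images are assumed to be stored as an array of shape (N, height, width); `np.roll` wraps pixels around the border, which is a simplification that is acceptable only when the borders are empty. The function name and the one-pixel shifts are illustrative choices.

```python
import numpy as np

def augment_with_shifts(images, labels,
                        shifts=((0, 1), (0, -1), (1, 0), (-1, 0))):
    """Append shifted copies of each image; labels are left unchanged."""
    aug_images = [images]
    aug_labels = [labels]
    for dy, dx in shifts:
        # Shift along the height and width axes; pixels wrap around.
        aug_images.append(np.roll(images, shift=(dy, dx), axis=(1, 2)))
        aug_labels.append(labels)
    return np.concatenate(aug_images), np.concatenate(aug_labels)

rng = np.random.default_rng(0)
images = rng.random(size=(5, 8, 8))      # hypothetical 8x8 images
labels = np.arange(5)                    # hypothetical class labels
X_aug, t_aug = augment_with_shifts(images, labels)
print(X_aug.shape, t_aug.shape)
```

The augmented set is five times larger here, one original plus four shifted copies per image, and the model is then trained on it as usual.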

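Tangent propagation

We can use regularization to encourage a model to be invariant to transformations of the input, through the technique of tangent propagation. Consider the effect of a transformation on a particular input vector $\mathbf{x}_n$. Provided the transformation is continuous, we can introduce a parameter ξ that governs it and write the transformed vector as $\mathbf{s}(\mathbf{x}_n, \xi)$, with $\mathbf{s}(\mathbf{x}, 0) = \mathbf{x}$. The tangent vector is then $\boldsymbol{\tau}_n = \left. \dfrac{\partial \mathbf{s}(\mathbf{x}_n, \xi)}{\partial \xi} \right|_{\xi = 0}$.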
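
As a small worked example of a tangent vector, the sketch below takes the transformation to be a rotation of a two-dimensional point by an angle ξ and differentiates it numerically at ξ = 0. The rotation, the point `x_n` and the helper names are assumptions chosen for the illustration.

```python
import numpy as np

def rotate(x, xi):
    """Transformation s(x, xi): rotate a 2-D point by angle xi (radians)."""
    c, s = np.cos(xi), np.sin(xi)
    return np.array([c * x[0] - s * x[1], s * x[0] + c * x[1]])

def tangent_vector(s, x, eps=1e-6):
    """Central-difference estimate of ds(x, xi)/dxi at xi = 0."""
    return (s(x, eps) - s(x, -eps)) / (2 * eps)

x_n = np.array([1.0, 2.0])
tau_n = tangent_vector(rotate, x_n)
print(tau_n)
```

For a rotation the derivative at ξ = 0 is known in closed form, (-x_2, x_1), so the estimate for this point should be close to (-2, 1) and perpendicular to `x_n`.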