Neural Network Basics

Latest recommended article published on 2024-10-11 21:17:25

August_May

Article tags: neural networks, artificial intelligence, deep learning

Article link: https://blog.csdn.net/2301_79923000/article/details/137201823

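This article explains how Dropout regularization reduces overfitting by randomly suppressing neurons, and how Batch Normalization speeds up training and lowers the risk of overfitting by normalizing layer inputs and improving backpropagation. It introduces both techniques, which are widely used in deep learning, along with examples of their use in PyTorch.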

Chapter 6: Neural Network Basics

1.4 Dropout Regularization

Dropout is a regularization technique that reduces overfitting in neural networks by preventing the co-adaptation of features.

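During training, Dropout randomly sets the outputs of a fraction of the neurons to 0. This weakens the dependence between neurons and improves the model's ability to generalize. Each neuron is dropped with a fixed probability, usually called the dropout rate.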

Dropout is simple to implement, as the following code shows:

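import torch
from torch.distributions import Bernoulli

# Random activations standing in for a layer's output
activation = torch.rand([5, 5])
# Each element is kept with probability 0.5 (so the drop rate is also 0.5)
m = Bernoulli(0.5)
mask = m.sample(activation.shape)
# Zero out the dropped activations
activation *= mask
print(activation)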

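Because Dropout suppresses each neuron at random with probability p, a network that uses Dropout is training what is almost a new network at every training step. Another interpretation is that Dropout trains an ensemble of models that share part of their parameters. To approximate the ensemble's prediction, the network uses all of its neurons at test time. Standard Dropout therefore multiplies the activations by a scaling factor of 1-p at test time, which compensates for the change in scale caused by dropping neurons with probability p during training; here p is the probability with which a neuron is suppressed during training. In practice (and in PyTorch's implementation), Inverted Dropout is generally used instead: the activations are multiplied by 1/(1-p) during training, and nothing needs to be done at test time.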
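The Inverted Dropout scheme described above can be sketched in a few lines. This is only an illustrative sketch, not PyTorch's actual implementation: the helper name inverted_dropout and its training flag are hypothetical and exist only for this example.

```python
import torch

def inverted_dropout(x, p, training=True):
    """Hypothetical helper: drop each element with probability p during training."""
    if not training or p == 0.0:
        # At test time nothing is done: the input passes through unchanged
        return x
    # Keep mask: 1 with probability 1 - p, 0 with probability p
    mask = (torch.rand_like(x) >= p).to(x.dtype)
    # Rescale the kept values by 1 / (1 - p) so the expected value is unchanged
    return x * mask / (1.0 - p)

torch.manual_seed(0)
x = torch.ones(100000)
train_out = inverted_dropout(x, p=0.5, training=True)
eval_out = inverted_dropout(x, p=0.5, training=False)

print("mean in training mode:", train_out.mean().item())            # close to 1.0
print("mean in eval mode:", eval_out.mean().item())
print("fraction dropped:", (train_out == 0).float().mean().item())  # close to 0.5
```

Because the rescaling happens during training, the mean activation stays roughly the same in both modes, which is why no correction is needed at test time.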

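Dropout behaves differently during training and testing. PyTorch's torch.nn.Module provides the train and eval methods, which switch a network into training mode or evaluation (test) mode. These two methods only affect layers such as Dropout whose behavior differs between training and testing, and leave other layers unchanged. Batch Normalization, introduced later, is another layer that behaves differently in the two modes.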
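As a small illustration of how train and eval propagate through a network, the sketch below toggles the mode on a toy model and inspects the training flag of every submodule. The model itself (two Linear layers around a Dropout layer) is an arbitrary assumption made only for this example.

```python
import torch
from torch import nn

# Toy model, chosen only for illustration
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5), nn.Linear(4, 2))

# eval() sets training=False on the model and all of its submodules
model.eval()
flags_eval = [m.training for m in model.modules()]

x = torch.ones(3, 4)
with torch.no_grad():
    out1 = model(x)
    out2 = model(x)
# In eval mode Dropout is the identity, so repeated forward passes agree
same_in_eval = torch.equal(out1, out2)

# train() sets training=True again on every submodule
model.train()
flags_train = [m.training for m in model.modules()]

print("eval flags:", flags_eval)
print("train flags:", flags_train)
print("outputs identical in eval mode:", same_in_eval)
```

The Linear layers also carry the training flag, but their forward computation does not depend on it, which matches the statement that only layers like Dropout are affected.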

The following two experiments illustrate how Dropout differs between training mode and evaluation mode.

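import torch
from torch import nn

p, count, iters, shape = 0.5, 0., 50, (5, 5)
dropout = nn.Dropout(p)

# Training mode: each activation is either zeroed or scaled by 1/(1-p)
dropout.train()
for _ in range(iters):
    activations = torch.rand(shape) + 1e-5  # small offset keeps every activation non-zero
    output = dropout(activations)
    # Count the elements that were kept (and scaled by 1/(1-p))
    count += torch.sum(output == activations * (1/(1-p)))

# Fraction of activations that were zeroed out
print("In train mode, Dropout affected {} of the neurons".format(1 - float(count) / (activations.nelement() * iters)))

# Evaluation mode: Dropout passes every activation through unchanged
count = 0
dropout.eval()
for _ in range(iters):
    activations = torch.rand(shape) + 1e-5
    output = dropout(activations)
    count += torch.sum(output == activations)

print("In eval mode, Dropout affected {} of the neurons".format(1 - float(count) / (activations.nelement() * iters)))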

1.5 Batch Normalization

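When training a neural network, the input data usually needs to be normalized so that training is faster and more effective. However, learning algorithms such as SGD (Stochastic Gradient Descent) keep changing the network's parameters during training, so the distribution of the hidden-layer activations changes as well. This change is called Internal Covariate Shift (ICS).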

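To address ICS, Batch Normalization fixes the mean and variance of the inputs to the activation function, which makes training faster. Beyond faster training, Batch Normalization has other benefits. First, gradients propagate well through a network that uses Batch Normalization, so the network depends less on the initial values and scale of its weights, can use higher learning rates, and is less likely to fail to converge. Batch Normalization also has a regularizing effect, which in some cases reduces or removes the need for Dropout. Finally, Batch Normalization makes it feasible to use saturating nonlinearities in deep neural networks.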

1.5.1 Implementation of Batch Normalization

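During training, Batch Normalization uses the data in the current mini-batch to estimate the mean and variance of each activation $x^{(k)}$ independently. For convenience, we focus on a single activation $x$ and omit the index $k$. Define the current mini-batch, which contains $m$ values of this activation, as $\mathcal{B}$:

$\mathcal{B} = \{x_i\}, \quad i = 1, \ldots, m$

First, compute the mean and variance of the activations in the current mini-batch:

$\mu_{\mathcal{B}} = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad \sigma_{\mathcal{B}}^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_{\mathcal{B}})^2$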
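The text here stops at the first step, computing the batch mean and variance. The sketch below carries the computation through under the standard Batch Normalization formulation, which is an assumption as far as this article is concerned: normalize with the batch statistics, then apply a scale gamma and a shift beta. The names mu, var, x_hat, gamma, beta and eps are chosen for this example, and the result is compared against nn.BatchNorm1d in training mode as a sanity check.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(20, 5) * 3.0 + 2.0   # mini-batch of m=20 samples with 5 features
eps = 1e-5                           # small constant for numerical stability (nn.BatchNorm1d default)

# Step 1: per-feature mean and (biased) variance over the mini-batch
mu = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)

# Step 2: normalize each feature to roughly zero mean and unit variance
x_hat = (x - mu) / torch.sqrt(var + eps)

# Step 3: scale and shift; gamma=1 and beta=0 match a freshly created layer
gamma = torch.ones(5)
beta = torch.zeros(5)
y = gamma * x_hat + beta

# Sanity check against PyTorch's layer in training mode
bn = nn.BatchNorm1d(num_features=5, affine=True)
bn.train()
reference = bn(x)
matches = torch.allclose(y, reference, atol=1e-4)
print("manual result matches nn.BatchNorm1d:", matches)
```

In a real network gamma and beta are learned, so the layer can undo the normalization where that helps the model.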

1.5.2 Using Batch Normalization

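In PyTorch, nn.BatchNorm1d provides an implementation of Batch Normalization and, like Dropout, is used as a layer in the network. It has two key parameters: num_features sets the number of features, and affine determines whether Batch Normalization applies a learnable affine transform.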

import torch
from torch import nn

# Without the affine transform, the layer has no learnable gamma/beta
m = nn.BatchNorm1d(num_features=5, affine=False)
print("BEFORE:")
print("running_mean:", m.running_mean)
print("running_var:", m.running_var)

# Training mode (the default): every forward pass updates the running statistics
for _ in range(100):
    x = torch.randn(20, 5)
    output = m(x)

print("AFTER:")
print("running_mean:", m.running_mean)
print("running_var:", m.running_var)

# Evaluation mode: the running statistics are used but no longer updated
m.eval()
for _ in range(100):
    x = torch.randn(20, 5)
    output = m(x)

print("EVAL:")
print("running_mean:", m.running_mean)
print("running_var:", m.running_var)


print("#···················································#")
# With affine=False, weight (gamma) and bias (beta) are None
print("no affine, gamma:", m.weight)
print("no affine, beta:", m.bias)

# With affine=True, gamma and beta are learnable parameters
m_affine = nn.BatchNorm1d(num_features=5, affine=True)
print('')
print("with affine, gamma:", m_affine.weight, type(m_affine.weight))
print("with affine, beta:", m_affine.bias, type(m_affine.bias))
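To make the behavior of running_mean and running_var more concrete, the sketch below checks a single training step against the exponential moving average update that PyTorch documents for this layer (default momentum 0.1, with the unbiased batch variance used for running_var). The variable names expected_mean, expected_var and the shifted input are assumptions made for this example.

```python
import torch
from torch import nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(num_features=5, affine=False)
momentum = bn.momentum  # 0.1 by default

x = torch.randn(20, 5) + 3.0  # shifted so the batch mean is clearly non-zero
bn.train()
bn(x)  # one forward pass in training mode updates the running statistics

# running = (1 - momentum) * running + momentum * batch_statistic,
# starting from running_mean = 0 and running_var = 1
expected_mean = (1 - momentum) * torch.zeros(5) + momentum * x.mean(dim=0)
expected_var = (1 - momentum) * torch.ones(5) + momentum * x.var(dim=0, unbiased=True)
mean_ok = torch.allclose(bn.running_mean, expected_mean, atol=1e-5)
var_ok = torch.allclose(bn.running_var, expected_var, atol=1e-5)

# In eval mode the running statistics are frozen and used for normalization
bn.eval()
before = bn.running_mean.clone()
out = bn(x)
frozen = torch.equal(bn.running_mean, before)
manual = (x - bn.running_mean) / torch.sqrt(bn.running_var + bn.eps)
eval_ok = torch.allclose(out, manual, atol=1e-5)

print(mean_ok, var_ok, frozen, eval_ok)
```

This is why the EVAL printout in the experiment above shows the same statistics as the AFTER printout: forward passes in evaluation mode read the running statistics without changing them.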

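The perceptron can fairly be called the cornerstone of deep learning. The original single-layer perceptron was proposed to model neurons in the human brain, yet it cannot even represent the XOR operation. After years of research, the multilayer perceptron was proposed, which can fit arbitrary functions. Combined with the efficient backpropagation algorithm, this gave rise to the neural network. Although BP neural networks are no longer adequate for many tasks today, they remain an essential part of learning deep learning. This chapter closed by introducing common training techniques that can effectively improve model performance and help avoid overfitting.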