Motivations
One of the major challenges when training a model in (deep) machine learning is co-adaptation: neurons become heavily dependent on each other, influencing one another considerably while not being independent enough with respect to their inputs. It is also common for some neurons to carry far more predictive capacity than others, so that the output depends excessively on a single neuron [1].
These effects must be avoided, and the weights must be distributed to prevent overfitting. Co-adaptation and the high predictive capacity of some neurons can be controlled with different regularization methods. One of the most widely used is Dropout, yet the full capabilities of dropout methods are rarely exploited [1].
Standard Dropout
The best known and most used dropout method is the Standard Dropout introduced in 2012 by Hinton et al. Usually simply called “Dropout”, for obvious reasons, in this article we will call it Standard Dropout.
Dropout works as follows: during the training phase, at each iteration a fixed proportion of neurons is randomly dropped from the base network, and the forward pass and error backpropagation are then performed on the modified network, as shown in the figure below. Note that at test time the model restores all of its neurons. Dropout is a widely used regularization method that mitigates overfitting [2].

On the one hand, Dropout can be seen as an approximation of Bagging over a large ensemble of neural networks. Bagging trains several different models on the same data and obtains the final prediction by voting or averaging over those models. During training, Dropout changes the network structure by randomly dropping neurons at each iteration, effectively training networks with different structures; at test time, all neurons are used, which is equivalent to letting all the previously trained structures vote on the final result, thereby improving performance. In this way Dropout provides a powerful, fast, and easy-to-implement approximation of Bagging. Note, however, that in original Bagging all models are independent of each other, whereas in Dropout the different networks actually share their parameters [2].

On the other hand, Dropout reduces complex co-adaptation between neurons. Because the dropped neurons are chosen at random, each retained sub-network contains a different set of neurons. During training, weight updates therefore cannot rely on fixed relationships between hidden units (such fixed relationships could produce joint effects that interfere with learning). In other words, no neuron becomes overly sensitive to the activation of another specific neuron, which allows the network to learn more generalizable features [2].
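The train/test behavior described above can be sketched with NumPy. This is a minimal illustration of the "inverted dropout" formulation commonly used in practice (scaling at train time so the test-time pass is unchanged); the function name and signature are for illustration only, not from the original paper.

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, training=True):
    """Inverted dropout: scale kept activations at train time so that
    no rescaling is needed at test time."""
    if not training or p_drop == 0.0:
        # Test phase: all neurons are restored, input passes through unchanged.
        return x
    # Zero each activation independently with probability p_drop...
    mask = (np.random.rand(*x.shape) >= p_drop).astype(x.dtype)
    # ...and rescale the survivors so the expected activation is preserved.
    return x * mask / (1.0 - p_drop)
```

Each call draws a fresh mask, so every training iteration effectively runs a different sub-network, while the expected value of each activation stays the same across phases.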
DropConnect
Perhaps you are already familiar with the Standard Dropout method, but there are many variations. To regularize the forward pass of a Dense network, you can apply a dropout on the neurons. DropConnect, introduced by L. Wan et al., does not apply dropout directly to the neurons but to the weights and biases linking those neurons.
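The difference from Standard Dropout can be sketched as follows: the random mask is applied to the weight matrix and bias rather than to the activations. This is a simplified sketch; the function name is hypothetical, and the test-time scaling shown here is a simple mean-field stand-in for the sampling-based inference procedure of the original paper.

```python
import numpy as np

def dropconnect_forward(x, W, b, p_drop=0.5, training=True):
    """DropConnect sketch: drop individual weights and biases, not whole neurons."""
    if training:
        # Each weight and each bias entry is zeroed independently
        # with probability p_drop.
        W = W * (np.random.rand(*W.shape) >= p_drop)
        b = b * (np.random.rand(*b.shape) >= p_drop)
    else:
        # Mean-field approximation at test time: scale by the keep probability.
        # (The original paper instead samples and averages activations.)
        W = W * (1.0 - p_drop)
        b = b * (1.0 - p_drop)
    return x @ W + b
```

Because single connections rather than whole units are dropped, a neuron's output is rarely removed entirely; instead each of its incoming contributions is randomly thinned.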
DropPath / Stochastic Depth
DropPath randomly removes entire branches from the multi-branch structures of a deep network during training.
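Applied to a residual block, this idea becomes stochastic depth: with some probability the whole residual branch is skipped and only the identity path survives. The sketch below assumes a generic `branch_fn` standing in for the block's layers, and uses mean-field scaling of the branch at test time; names and signature are illustrative, not from a specific library.

```python
import numpy as np

def residual_block_with_droppath(x, branch_fn, p_drop=0.2, training=True):
    """Stochastic depth sketch: randomly skip the residual branch of a block."""
    if training and np.random.rand() < p_drop:
        # Branch dropped for this iteration: only the identity path survives.
        return x
    if training:
        # Branch kept at full strength during training.
        return x + branch_fn(x)
    # Test time: every branch participates, scaled by its survival probability.
    return x + (1.0 - p_drop) * branch_fn(x)
```

Dropping whole branches shortens the effective depth of the network during training, which also speeds up each iteration.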
References:
[1] 12 Main Dropout Methods: Mathematical and Visual Explanation for DNNs, CNNs, and RNNs
[2] 诸葛越, 百面深度学习: 算法工程师带你去面试