Motivations
One of the major challenges when training a model in (deep) machine learning is co-adaptation: neurons become heavily dependent on each other, influencing one another considerably while not being independent enough with respect to their inputs. It is also common for some neurons to carry far more predictive capacity than others, so that the output depends excessively on a single neuron [1].
These effects must be avoided, and the weights must be distributed to prevent overfitting. Co-adaptation and the high predictive capacity of some neurons can be controlled with different regularization methods. One of the most widely used is Dropout, yet the full capabilities of dropout methods are rarely exploited [1].
Standard Dropout
The best known and most used dropout method is the Standard Dropout introduced in 2012 by Hinton et al. Usually simply called “Dropout”, for obvious reasons, in this article we will call it Standard Dropout.
Dropout works as follows: during the training phase, at each iteration a fixed proportion of neurons is randomly dropped from the base network, and the forward pass and error backpropagation are then performed on the modified network, as shown in the figure below. Note that at test time the model restores all of its neurons. Dropout is a widely used regularization method that mitigates overfitting [2].

On the one hand, Dropout can be seen as an approximation of Bagging over a large ensemble of neural networks. Bagging trains several different models on the same data and obtains the final prediction by voting or averaging over those models. During training, Dropout changes the network structure by randomly dropping neurons at each iteration, effectively training networks with different structures; at test time, all neurons are used, which is equivalent to letting all the previously trained structures vote on the final result, thereby improving performance. In this way Dropout provides a powerful, fast, and easy-to-implement approximation of Bagging. Note, however, that in original Bagging all models are independent of each other, whereas in Dropout the different networks actually share their parameters [2].

On the other hand, Dropout reduces complex co-adaptation between neurons. Because the dropped neurons are chosen at random, each retained sub-network contains a different set of neurons. During training, weight updates therefore cannot rely on fixed relationships between hidden units (such fixed relationships could produce joint effects that interfere with learning). In other words, no neuron becomes overly sensitive to the activation of another specific neuron, which allows the network to learn more generalizable features [2].
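The train/test behavior described above can be sketched with NumPy. This is a minimal illustration of the "inverted dropout" formulation commonly used in practice (scaling at train time so the test-time pass is unchanged); the function name and signature are for illustration only, not from the original paper.

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, training=True):
    """Inverted dropout: scale kept activations at train time so that
    no rescaling is needed at test time."""
    if not training or p_drop == 0.0:
        # Test phase: all neurons are restored, input passes through unchanged.
        return x
    # Zero each activation independently with probability p_drop...
    mask = (np.random.rand(*x.shape) >= p_drop).astype(x.dtype)
    # ...and rescale the survivors so the expected activation is preserved.
    return x * mask / (1.0 - p_drop)
```

Each call draws a fresh mask, so every training iteration effectively runs a different sub-network, while the expected value of each activation stays the same across phases.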
DropConnect
Perhaps you are already familiar with the Standard Dropout method, but there are many variations. To regularize the forward pass of a Dense network, you can apply a dropout on the neurons. DropConnect, introduced by L. Wan et al., does not apply dropout directly to the neurons but to the weights and biases linking those neurons.
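The difference from Standard Dropout can be sketched as follows: the random mask is applied to the weight matrix and bias rather than to the activations. This is a simplified sketch; the function name is hypothetical, and the test-time scaling shown here is a simple mean-field stand-in for the sampling-based inference procedure of the original paper.

```python
import numpy as np

def dropconnect_forward(x, W, b, p_drop=0.5, training=True):
    """DropConnect sketch: drop individual weights and biases, not whole neurons."""
    if training:
        # Each weight and each bias entry is zeroed independently
        # with probability p_drop.
        W = W * (np.random.rand(*W.shape) >= p_drop)
        b = b * (np.random.rand(*b.shape) >= p_drop)
    else:
        # Mean-field approximation at test time: scale by the keep probability.
        # (The original paper instead samples and averages activations.)
        W = W * (1.0 - p_drop)
        b = b * (1.0 - p_drop)
    return x @ W + b
```

Because single connections rather than whole units are dropped, a neuron's output is rarely removed entirely; instead each of its incoming contributions is randomly thinned.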
DropPath / Stochastic Depth
DropPath randomly removes entire branches from the multi-branch structures of a deep network during training.
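Applied to a residual block, this idea becomes stochastic depth: with some probability the whole residual branch is skipped and only the identity path survives. The sketch below assumes a generic `branch_fn` standing in for the block's layers, and uses mean-field scaling of the branch at test time; names and signature are illustrative, not from a specific library.

```python
import numpy as np

def residual_block_with_droppath(x, branch_fn, p_drop=0.2, training=True):
    """Stochastic depth sketch: randomly skip the residual branch of a block."""
    if training and np.random.rand() < p_drop:
        # Branch dropped for this iteration: only the identity path survives.
        return x
    if training:
        # Branch kept at full strength during training.
        return x + branch_fn(x)
    # Test time: every branch participates, scaled by its survival probability.
    return x + (1.0 - p_drop) * branch_fn(x)
```

Dropping whole branches shortens the effective depth of the network during training, which also speeds up each iteration.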
References:
[1] 12 Main Dropout Methods: Mathematical and Visual Explanation for DNNs, CNNs, and RNNs
[2] 诸葛越, 百面深度学习: 算法工程师带你去面试