Dropout is a trick widely used during model training to prevent overfitting. The idea is to mask (zero out) each neuron's output in a given layer with a certain probability and train with only the remaining activations, so the model cannot over-rely on any single unit. (Strictly speaking, dropout zeroes activations, not the layer's weights themselves.)
Taking the PyTorch implementation as an example, dropout is usually applied via torch.nn.Dropout(p=0.5, inplace=False), which delegates to the underlying function torch.nn.functional.dropout(); the official source code is given at the end of this article.
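As a quick illustration that the module form simply wraps the functional form (the tensor shape and seed below are arbitrary choices for this sketch), seeding the generator identically before each call replays the same random mask:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 5)

torch.manual_seed(42)
out_module = nn.Dropout(p=0.5)(x)                     # module form

torch.manual_seed(42)                                 # reset RNG to replay the same mask
out_functional = F.dropout(x, p=0.5, training=True)   # functional form

assert torch.equal(out_module, out_functional)
```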
Depending on the scenario, there are two main ways to use it:
1. In the network definition, to prevent overfitting
When building a network, the dropout layer is generally placed after a fully connected layer (nn.Linear), so that the layer's outputs are dropped with a certain probability during training, preventing overfitting. A few points to note:
- Dropout is a training-time technique: in PyTorch, an nn.Dropout() layer only takes effect in model.train() mode and is automatically disabled in model.eval() mode.
- The parameter p is the probability that each neuron is deactivated (zeroed); it defaults to 0.5.
- During training, nn.Dropout() not only zeroes each neuron's output with probability p, it also rescales the surviving (non-zero) values by a factor of 1/(1-p), so that the expected value of the output is unchanged.
- The input to nn.Dropout() can have any shape, and the output has the same shape as the input.
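The points above can be sketched with a small model (the layer sizes here are made up purely for illustration):

```python
import torch
import torch.nn as nn

# Dropout placed after a fully connected layer, as described above.
model = nn.Sequential(
    nn.Linear(16, 8),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # only active in training mode
    nn.Linear(8, 2),
)

x = torch.randn(4, 16)

model.train()            # dropout randomly zeroes activations here
out_train = model(x)

model.eval()             # dropout becomes an identity function
out_eval_a = model(x)
out_eval_b = model(x)

# In eval mode the forward pass is deterministic,
# and the output keeps the input's batch dimension.
assert torch.equal(out_eval_a, out_eval_b)
assert out_train.shape == (4, 2)
```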
2. On an output tensor, for data augmentation
nn.Dropout() can also be applied to the tensor output by some layer of the network. Each element of the tensor is then zeroed with a certain probability, simulating missing data in real-world scenarios and thereby serving as a form of data augmentation. The elements that are not zeroed are scaled to 1/(1-p) times their original value.
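A minimal sketch of this usage (the tensor of ones and p=0.25 are arbitrary, chosen so the rescaling is easy to see):

```python
import torch
import torch.nn as nn

p = 0.25
drop = nn.Dropout(p=p)   # a fresh module starts in training mode

x = torch.ones(1000)
y = drop(x)

# Surviving elements of a tensor of ones are rescaled to 1 / (1 - p):
survivors = y[y != 0]
assert torch.allclose(survivors, torch.full_like(survivors, 1 / (1 - p)))

# Roughly a fraction p of the elements are zeroed (Bernoulli, so only approximately):
zero_fraction = (y == 0).float().mean().item()
```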
The official source code is as follows:
```python
class Dropout(_DropoutNd):
    r"""During training, randomly zeroes some of the elements of the input
    tensor with probability :attr:`p` using samples from a Bernoulli
    distribution. Each channel will be zeroed out independently on every forward
    call.

    This has proven to be an effective technique for regularization and
    preventing the co-adaptation of neurons as described in the paper
    `Improving neural networks by preventing co-adaptation of feature
    detectors`_ .

    Furthermore, the outputs are scaled by a factor of :math:`\frac{1}{1-p}` during
    training. This means that during evaluation the module simply computes an
    identity function.

    Args:
        p: probability of an element to be zeroed. Default: 0.5
        inplace: If set to ``True``, will do this operation in-place. Default: ``False``

    Shape:
        - Input: :math:`(*)`. Input can be of any shape
        - Output: :math:`(*)`. Output is of the same shape as input

    Examples::

        >>> m = nn.Dropout(p=0.2)
        >>> input = torch.randn(20, 16)
        >>> output = m(input)

    .. _Improving neural networks by preventing co-adaptation of feature
        detectors: https://arxiv.org/abs/1207.0580
    """

    def forward(self, input: Tensor) -> Tensor:
        return F.dropout(input, self.p, self.training, self.inplace)
```