A few points need to be made clear here:
- During training, a dropout layer zeroes each neuron independently with probability p; it does not pick a fixed fraction p of the layer's neurons to hide (see the sketch after this list).
- During training, the output $o$ of a dropout layer is not simply the input $i$: the units that survive output $\dfrac{i}{1-p}$ instead. The reason is explained after the sketch below.
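Both points are easy to verify in PyTorch (a minimal sketch; the seed, p=0.5, and the all-ones input are arbitrary choices): each of the 10 units is zeroed independently, so the count of zeros fluctuates around 5 rather than being exactly 5, and every surviving unit is scaled from 1.0 to 1/(1-0.5) = 2.0.

import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)  # a freshly built module is in training mode
x = torch.ones(10)
for _ in range(3):
    y = drop(x)
    # the zero count varies per run; all survivors equal 1/(1-p) = 2.0
    print(int((y == 0).sum()), y)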
As for why the output is scaled: according to the paper, the two phases of the original dropout are (note that $p$ in the paper denotes the keep probability, while $p$ in PyTorch and Caffe denotes the drop probability, so the two $p$'s are not the same): at training time $o = m \cdot i$ with $m \sim \mathrm{Bernoulli}(1-p)$ (writing it with the framework's drop probability $p$), and at test time the weights are multiplied by the keep probability, i.e. $o = (1-p) \cdot i$. The frameworks instead use inverted dropout: the training phase outputs $\dfrac{i}{1-p}$ for surviving units, so the test phase can output $i$ directly, which is equivalent to the paper's scheme.
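This equivalence can be checked empirically (a sketch; the sample count of 100000 is arbitrary): averaging many training-mode forward passes should approximately recover the input, because each unit survives with probability 1-p and is scaled by 1/(1-p), so the expected output equals the input and the test phase is free to return $i$ unchanged.

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(5)
drop = nn.Dropout(p=0.5)  # training mode: keep with prob 1-p, scale by 1/(1-p)
avg = torch.stack([drop(x) for _ in range(100000)]).mean(dim=0)
print(x)    # original input
print(avg)  # close to x, since E[output] = (1-p) * x / (1-p) = x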
Test code:
import torch
import torch.nn as nn

class Model1(nn.Module):
    # Model 1: functional dropout
    def __init__(self, p=0.0):
        super().__init__()
        self.p = p

    def forward(self, inputs):
        # training=True is hardcoded on purpose; see the note after the output
        return nn.functional.dropout(inputs, p=self.p, training=True)

class Model2(nn.Module):
    # Model 2: dropout module
    def __init__(self, p=0.0):
        super().__init__()
        self.drop_layer = nn.Dropout(p=p)

    def forward(self, inputs):
        return self.drop_layer(inputs)

model1 = Model1(p=0.5)  # functional dropout
model2 = Model2(p=0.5)  # dropout module

# create inputs
inputs = torch.rand(10)
print("inputs:", inputs)
print()

# forward the inputs in train mode
print('Normal (train) model:')
print('Model 1', model1(inputs))
print('Model 2', model2(inputs))
print()

# switch to eval mode
model1.eval()
model2.eval()

# forward the inputs in evaluation mode
print('Evaluation mode:')
print('Model 1', model1(inputs))
print('Model 2', model2(inputs))
print()

# show model summaries
print('Print summary:')
print(model1)
print(model2)
Output:
inputs: tensor([0.1426, 0.6055, 0.0692, 0.3617, 0.7946, 0.4689, 0.1311, 0.9336, 0.8236, 0.7306])
Normal (train) model:
Model 1 tensor([0.2851, 0.0000, 0.1384, 0.7234, 0.0000, 0.9378, 0.2622, 1.8672, 0.0000, 0.0000])
Model 2 tensor([0.2851, 0.0000, 0.0000, 0.0000, 0.0000, 0.9378, 0.2622, 1.8672, 1.6471, 0.0000])
Evaluation mode:
Model 1 tensor([0.2851, 1.2109, 0.0000, 0.7234, 1.5891, 0.9378, 0.0000, 1.8672, 0.0000, 0.0000])
Model 2 tensor([0.1426, 0.6055, 0.0692, 0.3617, 0.7946, 0.4689, 0.1311, 0.9336, 0.8236, 0.7306])
Print summary:
Model1()
Model2(
(drop_layer): Dropout(p=0.5)
)
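Two things are worth noting in the output above. Model 1 still zeroes units after model1.eval(): nn.functional.dropout knows nothing about the module's train/eval state because training=True is hardcoded in its forward. It also leaves no trace in the printed summary, since a functional call is not a registered submodule. nn.Dropout handles both automatically, which is why Model 2 becomes an identity map in eval mode and shows up in the summary. If the functional form is preferred, a minimal fix (a sketch; the class name Model1Fixed is mine) is to pass the module's own flag:

import torch.nn as nn

class Model1Fixed(nn.Module):
    # functional dropout that respects train()/eval() mode
    def __init__(self, p=0.0):
        super().__init__()
        self.p = p

    def forward(self, inputs):
        # forward the module's own training flag instead of hardcoding True
        return nn.functional.dropout(inputs, p=self.p, training=self.training)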
References:
- https://stackoverflow.com/questions/53419474/nn-dropout-vs-f-dropout-pytorch
- https://stackoverflow.com/questions/34597316/why-input-is-scaled-in-tf-nn-dropout-in-tensorflow