While studying VIT-pytorch I came across drop_path and did not really understand it. After reading the following blog posts I gained a basic understanding, so here is a short summary:
1. "DropPath or drop_path regularization (easy to understand)", from the CSDN blog of 惊鸿落-Capricorn
2. "[Regularization] DropPath/drop_path usage", from the CSDN blog of 风巽·剑染春水
3. "Dropout and Drop Path", from the CSDN blog of LemonShy在搬砖
How it is written in ViT:
import torch
import torch.nn as nn


def drop_path(x, drop_prob: float = 0., training: bool = False):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

    This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
    changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
    'survival rate' as the argument.

    Notes: DropPath/drop_path is a regularization technique similar in spirit to Dropout. It randomly
    "deletes" sub-paths of the multi-branch structure in a deep learning model, which can prevent
    overfitting, improve model performance, and overcome the network degradation problem.
    ResNet lacks a regularization method; the paper that proposes drop-path randomly drops sub-paths,
    and its experiments indicate that the residual structure is not essential for deep networks:
    path length, rather than the residual block alone, is the basic component needed to train them.
    """
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    # One mask value per sample, broadcast over all remaining dimensions.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    # keep_prob + U[0, 1) floors to 1 with probability keep_prob, otherwise to 0.
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize
    # Scale the kept samples by 1 / keep_prob so the expected output equals x.
    output = x.div(keep_prob) * random_tensor
    return output


class DropPath(nn.Module):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    """
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        # self.training is True after model.train() and False after model.eval().
        return drop_path(x, self.drop_prob, self.training)
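The `floor(keep_prob + rand)` step in `drop_path` may be easier to see with plain numbers. The sketch below uses only the Python standard library rather than PyTorch, and `sample_mask` and `demo` are hypothetical helpers introduced for illustration. It is meant to show two things: the mask is 1 with probability `keep_prob`, and dividing by `keep_prob` keeps the average output close to the input.

```python
import math
import random


def sample_mask(keep_prob: float, rng: random.Random) -> float:
    """Hypothetical helper: one per-sample mask value, as in drop_path.

    rng.random() is uniform in [0, 1), so keep_prob + rng.random() lies in
    [keep_prob, 1 + keep_prob); its floor is 1 with probability keep_prob
    and 0 otherwise.
    """
    return float(math.floor(keep_prob + rng.random()))


def demo(drop_prob: float = 0.2, n: int = 100_000, seed: int = 0):
    """Return (fraction of kept samples, mean of the rescaled output)."""
    rng = random.Random(seed)
    keep_prob = 1.0 - drop_prob
    x = 3.0  # stand-in for one activation value of a sample
    masks = [sample_mask(keep_prob, rng) for _ in range(n)]
    kept_fraction = sum(masks) / n
    # Kept samples are scaled by 1 / keep_prob, dropped ones become 0.
    mean_output = sum(x / keep_prob * m for m in masks) / n
    return kept_fraction, mean_output


if __name__ == "__main__":
    kept_fraction, mean_output = demo()
    print(f"kept fraction: {kept_fraction:.3f} (keep_prob = 0.8)")
    print(f"mean output:   {mean_output:.3f} (input = 3.0)")
```

The rescaling is the reason nothing has to change at inference time: with `training=False` the function returns `x` unchanged, which matches the training-time output in expectation.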
It is used as follows:
# In the block's __init__:
# NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

def forward(self, x):
    x = x + self.drop_path(self.attn(self.norm1(x)))
    x = x + self.drop_path(self.mlp(self.norm2(x)))
    return x
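Drop path cannot be applied to a network that has only a single path; there must be at least a residual connection. When a sample's branch is dropped, the branch output for that sample is all zeros, so without the identity path `x` nothing would be passed on to the next layer.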
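To illustrate why the residual connection matters, here is a minimal sketch assuming PyTorch is installed. `TinyResidualBlock` is a hypothetical name introduced for this example, a single `nn.Linear` stands in for the attention/MLP branch, and `DropPath` is re-implemented compactly so the block runs on its own. A sample whose branch is dropped comes out exactly equal to its input `x`.

```python
import torch
import torch.nn as nn


class DropPath(nn.Module):
    """Per-sample stochastic depth, following the drop_path logic shown earlier."""

    def __init__(self, drop_prob: float = 0.0):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        if self.drop_prob == 0.0 or not self.training:
            return x
        keep_prob = 1.0 - self.drop_prob
        shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # one mask value per sample
        mask = torch.floor(keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device))
        return x.div(keep_prob) * mask


class TinyResidualBlock(nn.Module):
    """Hypothetical block computing x + drop_path(branch(x))."""

    def __init__(self, dim: int, drop_prob: float):
        super().__init__()
        self.branch = nn.Linear(dim, dim)  # stand-in for attention or MLP
        self.drop_path = DropPath(drop_prob) if drop_prob > 0.0 else nn.Identity()

    def forward(self, x):
        return x + self.drop_path(self.branch(x))


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(8, 4)
    block = TinyResidualBlock(dim=4, drop_prob=0.5)

    block.train()
    out = block(x)
    # Samples whose branch was dropped are returned exactly as the input x.
    fell_back = (out == x).all(dim=1)
    print("samples that fell back to identity:", fell_back.tolist())

    block.eval()
    # In eval mode DropPath is a no-op, so the full branch is always used.
    print(torch.allclose(block(x), x + block.branch(x)))
```

Applying `DropPath` directly to a plain stack of layers with no skip connection would instead zero the whole activation of a dropped sample, which is the situation the note above warns against.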