「Analysis」 Regularization: DropPath

DropPath is similar to Dropout. The difference is that DropPath randomly "disables" entire branches of a multi-branch structure in a deep network,
whereas Dropout randomly "disables" individual neurons.

1. Using DropPath in a network

Suppose the forward pass contains the following code:

x = x + self.drop_path(self.conv(x))

Then, inside the drop_path branch, each sample in the batch independently has probability drop_prob that its self.conv(x) output is not "executed": for that sample, the branch contributes 0 and is passed through directly.

If x is the input tensor with shape [B, C, H, W], then drop_path means that, within a batch, a random drop_prob fraction of the samples do not pass through the main branch and are instead carried forward by the identity shortcut alone.

⚠️ Note: DropPath cannot be used on its own like this:
x = self.drop_path(x)
Without a residual shortcut, a dropped sample would be zeroed out entirely and carry no information forward.
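
For concreteness, here is a minimal residual-block sketch of the correct pattern (the Block name and single conv layer are placeholders; DropPath is the module implemented in section 2 below):

import torch.nn as nn

class Block(nn.Module):
    """Minimal residual block: DropPath wraps only the residual branch,
    so the identity shortcut x always survives."""
    def __init__(self, dim, drop_prob=0.1):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        self.drop_path = DropPath(drop_prob)  # see section 2

    def forward(self, x):
        return x + self.drop_path(self.conv(x))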


2. DropPath implementation

import torch
import torch.nn as nn


def drop_path(x, drop_prob: float = 0., training: bool = False, scale_by_keep: bool = True):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

    This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
    changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
    'survival rate' as the argument.

    """
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = x.new_empty(shape).bernoulli_(keep_prob)
    if keep_prob > 0.0 and scale_by_keep:
        random_tensor.div_(keep_prob)
    return x * random_tensor


class DropPath(nn.Module):
    """Drop paths (Stochastic Depth) per sample  (when applied in main path of residual blocks).
    """
    def __init__(self, drop_prob: float = 0., scale_by_keep: bool = True):
        super().__init__()
        self.drop_prob = drop_prob
        self.scale_by_keep = scale_by_keep

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training, self.scale_by_keep)
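
A quick behavioral check (a sketch; which samples are dropped depends on the RNG state):

torch.manual_seed(0)
m = DropPath(drop_prob=0.5)
m.train()                        # DropPath only acts in training mode
x = torch.ones(4, 3, 2, 2)       # a batch of 4 samples
y = m(x)
# Each sample of y is either all zeros (dropped) or all 2.0
# (kept and rescaled by 1 / keep_prob = 2); the choice is random per sample.
m.eval()
assert torch.equal(m(x), x)      # identity at inference time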

2.1 torch.bernoulli()

torch.bernoulli(input, *, generator=None, out=None)

>>> a = torch.empty(3, 3).uniform_(0, 1)  # generate a uniform random matrix with range [0, 1]
>>> a
tensor([[ 0.1737,  0.0950,  0.3609],
        [ 0.7148,  0.0289,  0.2676],
        [ 0.9456,  0.8937,  0.7202]])
>>> torch.bernoulli(a)
tensor([[ 1.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 1.,  1.,  1.]])

Draws binary random numbers (0 or 1) from a Bernoulli distribution.
The input tensor holds the probabilities used to draw the binary random numbers, so every value in input must fall in the range $0 \le \mathrm{input}_i \le 1$.

$\mathrm{out}_i \sim \mathrm{Bernoulli}(p = \mathrm{input}_i)$

The returned out tensor only has values 0 or 1 and is of the same shape as input.

out can have integral dtype, but input must have floating point dtype
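
Note that drop_path above calls the in-place variant Tensor.bernoulli_(p) on an empty tensor, with a single scalar probability p = keep_prob; the printed values below are illustrative:

>>> mask = torch.empty(4, 1, 1, 1).bernoulli_(0.9)  # keep each sample with probability 0.9
>>> mask.flatten()
tensor([1., 1., 0., 1.])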


2.2 torch.Tensor.uniform_()

Fills the input tensor in place with values drawn from the uniform distribution $U(a, b)$.

torch.nn.init.uniform_(tensor, a=0.0, b=1.0)

>>> a = torch.empty(3, 3)
>>> a
tensor([[0.0000e+00, 1.5846e+29, 0.0000e+00],
        [1.5846e+29, 9.8091e-45, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])

>>> a.uniform_(0,1)
tensor([[0.0876, 0.5072, 0.4613],
        [0.7696, 0.4485, 0.1128],
        [0.2512, 0.8060, 0.6595]])

>>> a.bernoulli_()      # in-place: each entry ~ Bernoulli(p), default p = 0.5; a's previous values are overwritten
tensor([[1., 1., 1.],
        [1., 0., 0.],
        [1., 0., 0.]])

2.3 torch.div() / Tensor.div_()

Divides each element of input by the corresponding element of other (Tensor.div_ is the in-place variant used in drop_path).

torch.div(input, other, *, rounding_mode=None, out=None)


>>> x = torch.tensor([ 0.3810,  1.2774, -0.2972, -0.3719,  0.4637])
>>> torch.div(x, 0.5)		# equivalent to dividing every element of x by 0.5
tensor([ 0.7620,  2.5548, -0.5944, -0.7438,  0.9274])


>>> a = torch.tensor([[-0.3711, -1.9353, -0.4605, -0.2917],
...                   [ 0.1815, -1.0111,  0.9805, -1.5923],
...                   [ 0.1062,  1.4581,  0.7759, -1.2344],
...                   [-0.1830, -0.0313,  1.1908, -1.4757]])
>>> b = torch.tensor([ 0.8032,  0.2930, -0.8113, -0.2308])
>>> torch.div(a, b)		# each row of a is divided element-wise by b (broadcast)
tensor([[-0.4620, -6.6051,  0.5676,  1.2639],
        [ 0.2260, -3.4509, -1.2086,  6.8990],
        [ 0.1322,  4.9764, -0.9564,  5.3484],
        [-0.2278, -0.1068, -1.4678,  6.3938]])

$\mathrm{out}_i = \frac{\mathrm{input}_i}{\mathrm{other}_i}$
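
In drop_path above, this is exactly what random_tensor.div_(keep_prob) does: rescaling the 0/1 mask by keep_prob (when scale_by_keep=True) keeps the output expectation equal to the input, in the same spirit as inverted dropout:

$\mathbb{E}[\mathrm{out}_i] = \mathrm{keep\_prob} \cdot \frac{x_i}{\mathrm{keep\_prob}} + \mathrm{drop\_prob} \cdot 0 = x_i$

so no extra rescaling is needed at inference time.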

