[Note] DropPath, a classic design in Transformers: nn.Dropout() prevents overfitting by dropping the outputs of individual neurons, while DropPath prevents overfitting by dropping entire residual branches (paths).

Note:

For dropped paths, the corresponding elements of the output tensor are set to 0, while the values of the kept paths are scaled up proportionally so that the overall expectation stays unchanged.

nn.Dropout() uses the same trick: to keep the expectation unchanged, the surviving values are scaled up by the inverse of the keep probability.
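As a quick sanity check of that scaling rule, here is a small sketch (the variable names p and mask are just for illustration) showing that dividing the kept values by the keep probability preserves the mean:

import torch

torch.manual_seed(0)
x = torch.ones(1_000_000)
p = 0.3                                    # drop probability
mask = (torch.rand_like(x) >= p).float()   # 1 with probability 1-p, 0 with probability p
y = x * mask / (1 - p)                     # inverted-dropout scaling
print(x.mean().item())   # 1.0
print(y.mean().item())   # close to 1.0, so the expectation is preserved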

Code:


import torch
import torch.nn as nn


def drop_path(x, drop_prob: float = 0., training: bool = False):
    """
    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
    changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
    'survival rate' as the argument.
    """
    if drop_prob == 0. or not training:  # nothing to drop, or not in training mode
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)  # torch.rand() samples from [0, 1)
    random_tensor.floor_()  # binarize: floor_() rounds each value down in place, giving 0 or 1 per sample
    output = x.div(keep_prob) * random_tensor  # scale the kept samples by 1/keep_prob to preserve the expectation
    return output


class DropPath(nn.Module):
    """
    Drop paths (Stochastic Depth) per sample  (when applied in main path of residual blocks).
    """
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)

# Worked example with drop_prob = 0.5:
#
# x = torch.tensor([[1.0, 2.0],
#                   [3.0, 4.0],
#                   [5.0, 6.0],
#                   [7.0, 8.0]])
# keep_prob = 1 - drop_prob
#           = 1 - 0.5
#           = 0.5
# shape = (x.shape[0],) + (1,) * (x.ndim - 1)
#       = (4,) + (1,) * (2 - 1)
#       = (4, 1)
#
# random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
#               = 0.5 + torch.rand((4, 1))   # each value lies in [0.5, 1.5)
#
# # the generated random_tensor might look like
# random_tensor = torch.tensor([[1.3],
#                               [0.7],
#                               [1.1],
#                               [0.6]])
# random_tensor.floor_()
# random_tensor = torch.tensor([[1.0],
#                               [0.0],
#                               [1.0],
#                               [0.0]])
# output = x.div(keep_prob) * random_tensor
#        = x.div(0.5) * random_tensor
#
# # scale every element by 1 / keep_prob ...
# output = torch.tensor([[1.0 / 0.5, 2.0 / 0.5],
#                        [3.0 / 0.5, 4.0 / 0.5],
#                        [5.0 / 0.5, 6.0 / 0.5],
#                        [7.0 / 0.5, 8.0 / 0.5]]) * random_tensor
#
#        = torch.tensor([[2.0, 4.0],
#                        [6.0, 8.0],
#                        [10.0, 12.0],
#                        [14.0, 16.0]]) * random_tensor
#
# # ... then zero out the dropped samples (rows 2 and 4)
# output = torch.tensor([[2.0, 4.0],
#                        [0.0, 0.0],
#                        [10.0, 12.0],
#                        [0.0, 0.0]])
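Since the title mentions Transformers, here is a minimal sketch of where DropPath usually sits in a pre-norm ViT-style block. The block structure and names below (TransformerBlock, mlp_ratio, drop_path_prob) are illustrative assumptions, not any particular library's implementation:

class TransformerBlock(nn.Module):
    """Illustrative pre-norm block: DropPath wraps each residual branch."""
    def __init__(self, dim, num_heads, mlp_ratio=4.0, drop_path_prob=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )
        # identity when drop_path_prob == 0, otherwise stochastic depth
        self.drop_path = DropPath(drop_path_prob) if drop_path_prob > 0. else nn.Identity()

    def forward(self, x):  # x: (batch, tokens, dim)
        # only the residual branches (attention / MLP) are dropped;
        # the identity connection always survives
        y = self.norm1(x)
        x = x + self.drop_path(self.attn(y, y, y, need_weights=False)[0])
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x

block = TransformerBlock(dim=64, num_heads=4)
block.train()
tokens = torch.randn(2, 16, 64)   # (batch, tokens, dim)
print(block(tokens).shape)        # torch.Size([2, 16, 64])

In many implementations the drop probability is increased with block depth, so later blocks are dropped more often than earlier ones.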

Suppose we have a simple residual block:



import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, drop_prob=0.5):
        super(ResidualBlock, self).__init__()
        # the skip connection `out += identity` requires in_channels == out_channels
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.drop_prob = drop_prob

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = torch.relu(out)
        out = self.conv2(out)
        # drop_path already checks the training flag, so it can be called unconditionally
        out = drop_path(out, self.drop_prob, training=self.training)
        out += identity
        return torch.relu(out)

def drop_path(x, drop_prob: float = 0., training: bool = False):
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()
    output = x.div(keep_prob) * random_tensor
    return output

# Simulated input
x = torch.randn(4, 3, 32, 32)  # 4 samples, 3 channels, 32x32 images

# Build a residual-block instance
res_block = ResidualBlock(3, 3, drop_prob=0.5)
res_block.train()  # switch to training mode

# Forward pass
output = res_block(x)
print(output.shape)  # torch.Size([4, 3, 32, 32])

ResidualBlock: a simple residual block with two convolutional layers; drop_path is applied to the output of the second convolution.
drop_path: in training mode, drop_path randomly drops the residual branch with probability drop_prob. In this example it only zeroes the branch output for the affected samples; every sample still reaches the output through the identity (skip) connection.

In other words, drop_path randomly drops some residual paths on each forward pass rather than discarding whole samples, which helps improve the model's robustness and generalization.

The main goal of drop_path is to inject randomness into specific paths of the network and thereby improve generalization. In practice it does not discard entire samples; it randomly drops certain paths within the network.
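To make that per-sample behaviour concrete, here is a small sketch (the batch size, seed, and variable names are arbitrary choices for illustration) that feeds a fake residual-branch output through drop_path and checks which samples had their branch zeroed out:

torch.manual_seed(0)
branch = torch.randn(8, 3, 32, 32)                     # residual-branch output for 8 samples
dropped = drop_path(branch, drop_prob=0.5, training=True)
# a sample's path counts as "dropped" when every element of its branch output is zero
per_sample_dropped = dropped.flatten(1).abs().sum(dim=1) == 0
print(per_sample_dropped)  # e.g. tensor([False, True, ...]); roughly half True, varies with the seed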


Initialize the input tensor x:

x = torch.randn(4, 3, 32, 32)  # 4 samples, 3 channels, 32x32 images


Create the residual block and put it in training mode:

res_block = ResidualBlock(3, 3, drop_prob=0.5)
res_block.train()  # switch to training mode


Run the forward pass and check the output shape:

output = res_block(x)
print(output.shape)  # torch.Size([4, 3, 32, 32])
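For completeness: in evaluation mode drop_path returns its input unchanged, so the block behaves like an ordinary residual block. A quick check, reusing the objects created above:

res_block.eval()  # evaluation mode: drop_path becomes a no-op
with torch.no_grad():
    eval_out = res_block(x)
print(eval_out.shape)  # torch.Size([4, 3, 32, 32])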
