[Note] DropPath, a classic design in Transformers: nn.Dropout() prevents overfitting by dropping the outputs of individual neurons, while DropPath prevents overfitting by dropping entire residual branches (paths).

Note:

For dropped paths, the corresponding elements of the output tensor are set to 0, while the values of the kept paths are scaled up proportionally so that the overall expectation stays unchanged.

nn.Dropout() uses the same trick: to keep the expectation unchanged, the surviving values are scaled up by the inverse of the keep probability.
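As a quick sanity check of that scaling rule, here is a small sketch (the variable names p and mask are just for illustration) showing that dividing the kept values by the keep probability preserves the mean:

import torch

torch.manual_seed(0)
x = torch.ones(1_000_000)
p = 0.3                                    # drop probability
mask = (torch.rand_like(x) >= p).float()   # 1 with probability 1-p, 0 with probability p
y = x * mask / (1 - p)                     # inverted-dropout scaling
print(x.mean().item())   # 1.0
print(y.mean().item())   # close to 1.0, so the expectation is preserved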

Code:


import torch
import torch.nn as nn


def drop_path(x, drop_prob: float = 0., training: bool = False):
    """
    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
    changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
    'survival rate' as the argument.
    """
    if drop_prob == 0. or not training:  # nothing to drop, or not in training mode
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)  # torch.rand() samples from [0, 1)
    random_tensor.floor_()  # binarize: floor_() rounds each value down in place, giving 0 or 1 per sample
    output = x.div(keep_prob) * random_tensor  # scale the kept samples by 1/keep_prob to preserve the expectation
    return output


class DropPath(nn.Module):
    """
    Drop paths (Stochastic Depth) per sample  (when applied in main path of residual blocks).
    """
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)

# Worked example with drop_prob = 0.5:
#
# x = torch.tensor([[1.0, 2.0],
#                   [3.0, 4.0],
#                   [5.0, 6.0],
#                   [7.0, 8.0]])
# keep_prob = 1 - drop_prob
#           = 1 - 0.5
#           = 0.5
# shape = (x.shape[0],) + (1,) * (x.ndim - 1)
#       = (4,) + (1,) * (2 - 1)
#       = (4, 1)
#
# random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
#               = 0.5 + torch.rand((4, 1))   # each value lies in [0.5, 1.5)
#
# # the generated random_tensor might look like
# random_tensor = torch.tensor([[1.3],
#                               [0.7],
#                               [1.1],
#                               [0.6]])
# random_tensor.floor_()
# random_tensor = torch.tensor([[1.0],
#                               [0.0],
#                               [1.0],
#                               [0.0]])
# output = x.div(keep_prob) * random_tensor
#        = x.div(0.5) * random_tensor
#
# # scale every element by 1 / keep_prob ...
# output = torch.tensor([[1.0 / 0.5, 2.0 / 0.5],
#                        [3.0 / 0.5, 4.0 / 0.5],
#                        [5.0 / 0.5, 6.0 / 0.5],
#                        [7.0 / 0.5, 8.0 / 0.5]]) * random_tensor
#
#        = torch.tensor([[2.0, 4.0],
#                        [6.0, 8.0],
#                        [10.0, 12.0],
#                        [14.0, 16.0]]) * random_tensor
#
# # ... then zero out the dropped samples (rows 2 and 4)
# output = torch.tensor([[2.0, 4.0],
#                        [0.0, 0.0],
#                        [10.0, 12.0],
#                        [0.0, 0.0]])
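Since the title mentions Transformers, here is a minimal sketch of where DropPath usually sits in a pre-norm ViT-style block. The block structure and names below (TransformerBlock, mlp_ratio, drop_path_prob) are illustrative assumptions, not any particular library's implementation:

class TransformerBlock(nn.Module):
    """Illustrative pre-norm block: DropPath wraps each residual branch."""
    def __init__(self, dim, num_heads, mlp_ratio=4.0, drop_path_prob=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )
        # identity when drop_path_prob == 0, otherwise stochastic depth
        self.drop_path = DropPath(drop_path_prob) if drop_path_prob > 0. else nn.Identity()

    def forward(self, x):  # x: (batch, tokens, dim)
        # only the residual branches (attention / MLP) are dropped;
        # the identity connection always survives
        y = self.norm1(x)
        x = x + self.drop_path(self.attn(y, y, y, need_weights=False)[0])
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x

block = TransformerBlock(dim=64, num_heads=4)
block.train()
tokens = torch.randn(2, 16, 64)   # (batch, tokens, dim)
print(block(tokens).shape)        # torch.Size([2, 16, 64])

In many implementations the drop probability is increased with block depth, so later blocks are dropped more often than earlier ones.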

Suppose we have a simple residual block:



import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, drop_prob=0.5):
        super(ResidualBlock, self).__init__()
        # the skip connection `out += identity` requires in_channels == out_channels
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.drop_prob = drop_prob

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = torch.relu(out)
        out = self.conv2(out)
        # drop_path already checks the training flag, so it can be called unconditionally
        out = drop_path(out, self.drop_prob, training=self.training)
        out += identity
        return torch.relu(out)

def drop_path(x, drop_prob: float = 0., training: bool = False):
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()
    output = x.div(keep_prob) * random_tensor
    return output

# Simulated input
x = torch.randn(4, 3, 32, 32)  # 4 samples, 3 channels, 32x32 images

# Build a residual-block instance
res_block = ResidualBlock(3, 3, drop_prob=0.5)
res_block.train()  # switch to training mode

# Forward pass
output = res_block(x)
print(output.shape)  # torch.Size([4, 3, 32, 32])

ResidualBlock: a simple residual block with two convolutional layers; drop_path is applied to the output of the second convolution.
drop_path: in training mode, drop_path randomly drops the residual branch with probability drop_prob. In this example it only zeroes the branch output for the affected samples; every sample still reaches the output through the identity (skip) connection.

In other words, drop_path randomly drops some residual paths on each forward pass rather than discarding whole samples, which helps improve the model's robustness and generalization.

The main goal of drop_path is to inject randomness into specific paths of the network and thereby improve generalization. In practice it does not discard entire samples; it randomly drops certain paths within the network.
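To make that per-sample behaviour concrete, here is a small sketch (the batch size, seed, and variable names are arbitrary choices for illustration) that feeds a fake residual-branch output through drop_path and checks which samples had their branch zeroed out:

torch.manual_seed(0)
branch = torch.randn(8, 3, 32, 32)                     # residual-branch output for 8 samples
dropped = drop_path(branch, drop_prob=0.5, training=True)
# a sample's path counts as "dropped" when every element of its branch output is zero
per_sample_dropped = dropped.flatten(1).abs().sum(dim=1) == 0
print(per_sample_dropped)  # e.g. tensor([False, True, ...]); roughly half True, varies with the seed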


Initialize the input tensor x:

x = torch.randn(4, 3, 32, 32)  # 4 samples, 3 channels, 32x32 images


Create the residual block and put it in training mode:

res_block = ResidualBlock(3, 3, drop_prob=0.5)
res_block.train()  # switch to training mode


Run the forward pass and check the output shape:

output = res_block(x)
print(output.shape)  # torch.Size([4, 3, 32, 32])
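For completeness: in evaluation mode drop_path returns its input unchanged, so the block behaves like an ordinary residual block. A quick check, reusing the objects created above:

res_block.eval()  # evaluation mode: drop_path becomes a no-op
with torch.no_grad():
    eval_out = res_block(x)
print(eval_out.shape)  # torch.Size([4, 3, 32, 32])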
