Note:
For dropped paths, the corresponding elements of the output tensor are set to 0, while the values of the kept paths are scaled up proportionally so that the overall expected value stays unchanged.
nn.Dropout() works the same way: to keep the expectation unchanged, the surviving values are scaled up by the inverse of the keep probability.
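A quick numerical check of this scaling (a minimal sketch; the input values and p = 0.5 are chosen only for illustration): survivors are divided by keep_prob, so the zeros and the rescaling cancel out in expectation.

import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.ones(10000)        # large tensor so the empirical mean is stable
drop = nn.Dropout(p=0.5)     # inverted dropout: survivors are divided by keep_prob = 0.5
drop.train()

out = drop(x)
print(out.unique())          # tensor([0., 2.]): elements are either dropped or scaled by 1/0.5
print(out.mean())            # ~1.0: the empirical mean matches the input mean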
Code:
import torch
import torch.nn as nn

def drop_path(x, drop_prob: float = 0., training: bool = False):
    """
    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
    changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
    'survival rate' as the argument.
    """
    if drop_prob == 0. or not training:  # no-op when nothing is dropped or when not training
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)  # torch.rand() samples from [0, 1)
    random_tensor.floor_()  # binarize: floor_() rounds each entry down to 0. or 1. in place
    output = x.div(keep_prob) * random_tensor
    return output
class DropPath(nn.Module):
    """
    Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    """
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)
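As a quick sanity check (a minimal usage sketch; the batch size and drop probability are arbitrary), in training mode DropPath zeroes whole samples and rescales the survivors, while in eval mode it is the identity:

torch.manual_seed(0)

dp = DropPath(drop_prob=0.5)
x = torch.ones(4, 2)  # 4 samples, 2 features each

dp.train()
print(dp(x))  # each row is either all zeros or all 2.0 (= 1 / keep_prob)

dp.eval()
print(dp(x))  # unchanged: drop_path is a no-op outside training

The commented walkthrough below traces drop_path by hand for a 4x2 input with drop_prob = 0.5: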
# x = torch.tensor([[1.0, 2.0],
#                   [3.0, 4.0],
#                   [5.0, 6.0],
#                   [7.0, 8.0]])
# keep_prob = 1 - drop_prob
# keep_prob = 1 - 0.5
# keep_prob = 0.5
# shape = (x.shape[0],) + (1,) * (x.ndim - 1)
# shape = (4,) + (1,) * (2 - 1)
# shape = (4, 1)
#
# random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
# random_tensor = 0.5 + torch.rand((4, 1))
#
# # since keep_prob = 0.5 is added to values drawn from [0, 1),
# # every entry lies in [0.5, 1.5); the result might look like
# random_tensor = torch.tensor([[1.3],
#                               [0.8],
#                               [1.2],
#                               [0.6]])
# random_tensor.floor_()
# random_tensor = torch.tensor([[1.0],
#                               [0.0],
#                               [1.0],
#                               [0.0]])
# output = x.div(keep_prob) * random_tensor
# output = x.div(0.5) * random_tensor
#
# # compute each element
# output = torch.tensor([[1.0 / 0.5, 2.0 / 0.5],
#                        [3.0 / 0.5, 4.0 / 0.5],
#                        [5.0 / 0.5, 6.0 / 0.5],
#                        [7.0 / 0.5, 8.0 / 0.5]]) * random_tensor
#
# output = torch.tensor([[2.0, 4.0],
#                        [6.0, 8.0],
#                        [10.0, 12.0],
#                        [14.0, 16.0]]) * random_tensor
#
# # the final result
# output = torch.tensor([[2.0, 4.0],
#                        [0.0, 0.0],
#                        [10.0, 12.0],
#                        [0.0, 0.0]])
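The same walkthrough can be run directly (a small check script; the seed is arbitrary, so which rows survive will vary from run to run):

torch.manual_seed(42)

x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0],
                  [7.0, 8.0]])

out = drop_path(x, drop_prob=0.5, training=True)
print(out)  # each row is either [0., 0.] or the original row divided by keep_prob = 0.5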
Suppose we have a simple residual block:
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, drop_prob=0.5):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.drop_prob = drop_prob

    def forward(self, x):
        identity = x  # the shortcut assumes in_channels == out_channels
        out = self.conv1(x)
        out = torch.relu(out)
        out = self.conv2(out)
        if self.training:
            out = drop_path(out, self.drop_prob, training=self.training)
        out += identity
        return torch.relu(out)

def drop_path(x, drop_prob: float = 0., training: bool = False):
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()
    output = x.div(keep_prob) * random_tensor
    return output
# simulated input: 4 samples, 3 channels, 32x32 images
x = torch.randn(4, 3, 32, 32)
# create a residual block instance
res_block = ResidualBlock(3, 3, drop_prob=0.5)
res_block.train()  # switch to training mode
# forward pass
output = res_block(x)
print(output.shape)
ResidualBlock: a simple residual block with two convolutional layers; drop_path is applied to the output of the second convolution.
drop_path: in training mode, the residual branch is dropped with probability drop_prob, independently for each sample in the batch. It zeroes only the residual branch, never the whole sample: the identity shortcut always survives.
So on each forward pass, drop_path randomly disables the residual branch for some samples; for those samples the block degenerates to the identity mapping (followed by the ReLU). Injecting randomness into specific paths of the network in this way, rather than dropping entire samples, improves the model's robustness and generalization.
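To see the per-sample behavior concretely, here is a small check (the seed is arbitrary; which samples get dropped varies run to run). For a dropped sample, the branch output is exactly zero, so the block reduces to relu(identity):

torch.manual_seed(0)

x = torch.randn(4, 3, 32, 32)
block = ResidualBlock(3, 3, drop_prob=0.5)
block.train()

out = block(x)
for i in range(x.shape[0]):
    if torch.equal(out[i], torch.relu(x[i])):
        print(f"sample {i}: residual branch dropped (output == relu(input))")
    else:
        print(f"sample {i}: residual branch kept")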