There is not much material about this module online; I added it to my network in the hope of improving the model's accuracy.
The code for the module is as follows:
import math

import torch
import torch.nn as nn
from einops import rearrange


class LeFF(nn.Module):
    def __init__(self, dim=1, hidden_dim=16, act_layer=nn.GELU, drop=0., use_eca=False):
        super(LeFF, self).__init__()
        self.linear1 = nn.Sequential(nn.Linear(dim, hidden_dim), act_layer())
        self.dwconv = nn.Sequential(
            nn.Conv2d(hidden_dim, hidden_dim, groups=hidden_dim, kernel_size=3, stride=1, padding=1),
            act_layer())
        self.linear2 = nn.Sequential(nn.Linear(hidden_dim, dim))
        self.dim = dim
        self.hidden_dim = hidden_dim
        self.eca = eca_layer_1d(dim) if use_eca else nn.Identity()

    def forward(self, x):
        # x: (bs, hw, c); device placement is left to the caller -- the
        # in-forward x.to(device) call in my first version was a no-op,
        # since Tensor.to is not in-place and its result was never reassigned
        bs, hw, c = x.size()
        hh = int(math.sqrt(hw))
        x = self.linear1(x)
        # spatial restore: (b, h*w, c) -> (b, c, h, w)
        x = rearrange(x, 'b (h w) c -> b c h w', h=hh, w=hh)
        # depth-wise 3x3 conv on the restored map: (bs, hidden_dim, hh, hh)
        x = self.dwconv(x)
        # flatten back: (b, c, h, w) -> (b, h*w, c)
        x = rearrange(x, 'b c h w -> b (h w) c', h=hh, w=hh)
        x = self.linear2(x)
        x = self.eca(x)
        return x
    def flops(self, H, W):
        flops = 0
        # fc1
        flops += H * W * self.dim * self.hidden_dim
        # dwconv (3x3 depth-wise)
        flops += H * W * self.hidden_dim * 3 * 3
        # fc2
        flops += H * W * self.hidden_dim * self.dim
        # eca -- added before printing so the printed total includes it
        if hasattr(self.eca, 'flops'):
            flops += self.eca.flops()
        print("LeFF:{%.2f}" % (flops / 1e9))
        return flops
class eca_layer_1d(nn.Module):
    """Constructs an ECA module.
    Args:
        channel: Number of channels of the input feature map
        k_size: Adaptive selection of kernel size
    """
    def __init__(self, channel, k_size=3):
        super(eca_layer_1d, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool1d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=(k_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()
        self.channel = channel
        self.k_size = k_size

    def forward(self, x):
        # x: (b, hw, c)
        # feature descriptor built from the global spatial information
        y = self.avg_pool(x.transpose(-1, -2))
        # 1D convolution across neighbouring channels
        y = self.conv(y.transpose(-1, -2))
        # channel-wise gating weights
        y = self.sigmoid(y)
        return x * y.expand_as(x)

    def flops(self):
        flops = 0
        flops += self.channel * self.channel * self.k_size
        return flops
Of the code shown here, I only used the upper part; the eca_layer_1d class below it is lifted straight from the original source and I have not examined it closely. From this code you can see that the LeFF module is essentially a flatten-and-restore process: the token sequence is reshaped into a 2D feature map, run through a depth-wise convolution, and flattened back.
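To make that flatten-and-restore behaviour concrete, here is a minimal shape check; the dim, hidden_dim, batch size, and spatial size are illustrative assumptions, not values taken from my network:

# Minimal shape check for LeFF; all sizes here are illustrative assumptions
leff = LeFF(dim=32, hidden_dim=128, use_eca=True)
x = torch.randn(2, 16 * 16, 32)  # (b, h*w, c) with h = w = 16
out = leff(x)
print(out.shape)                 # torch.Size([2, 256, 32]): token shape is preserved

Note that hw must be a perfect square (here 16 * 16), because the forward pass recovers the side length with int(math.sqrt(hw)).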
To run this module successfully, the tensor has to be converted to the (b, h*w, c) token format before it is passed in, and converted back afterwards. Below is the shape conversion I used:
# Convert the tensor shape from (b, c, h, w) to (b, c, h * w)
b, c, h, w = self.dist.size()
x = self.dist.reshape(b, c, h * w)
# Convert from (b, c, h * w) to (b, h * w, c)
x = x.permute(0, 2, 1)
self.LEFF = self.Leff(x)
# Restore the original shape: permute back to (b, c, h * w) first, then
# reshape to (b, c, h, w). Flattening the output to 1D and calling
# view(b, c, h, w) directly, as I first tried, scrambles the channel and
# spatial dimensions, because the memory layout is still (b, h * w, c).
x_new = self.LEFF.permute(0, 2, 1).reshape(b, c, h, w)
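As a standalone sanity check of this round trip (the dist and Leff members of my network are replaced by local stand-ins, and all sizes are illustrative assumptions):

# Round-trip sanity check; sizes are illustrative assumptions
dist = torch.randn(2, 32, 16, 16)                    # (b, c, h, w)
b, c, h, w = dist.size()
leff = LeFF(dim=c, hidden_dim=4 * c)
tokens = dist.reshape(b, c, h * w).permute(0, 2, 1)  # (b, h*w, c)
out = leff(tokens)                                   # (b, h*w, c)
restored = out.permute(0, 2, 1).reshape(b, c, h, w)  # back to (b, c, h, w)
print(restored.shape)                                # torch.Size([2, 32, 16, 16])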
In my experiments, this module pays too much attention to fine detail, which hurt the overall detection results; if you are detecting relatively small objects, though, it may be worth trying.
The above is only my personal take; corrections and criticism are welcome.