There is not much material about this module online; I added it to my network in the hope of improving the model's accuracy.
The code for the module is as follows:
import math

import torch
import torch.nn as nn
from einops import rearrange


class LeFF(nn.Module):
    def __init__(self, dim=1, hidden_dim=16, act_layer=nn.GELU, drop=0., use_eca=False):
        super(LeFF, self).__init__()
        self.linear1 = nn.Sequential(nn.Linear(dim, hidden_dim), act_layer())
        self.dwconv = nn.Sequential(
            nn.Conv2d(hidden_dim, hidden_dim, groups=hidden_dim, kernel_size=3, stride=1, padding=1),
            act_layer())
        self.linear2 = nn.Sequential(nn.Linear(hidden_dim, dim))
        self.dim = dim
        self.hidden_dim = hidden_dim
        self.eca = eca_layer_1d(dim) if use_eca else nn.Identity()

    def forward(self, x):
        # x: (bs, hw, c); device placement is left to the caller -- the
        # in-forward x.to(device) call in my first version was a no-op,
        # since Tensor.to is not in-place and its result was never reassigned
        bs, hw, c = x.size()
        hh = int(math.sqrt(hw))
        x = self.linear1(x)
        # spatial restore: (b, h*w, c) -> (b, c, h, w)
        x = rearrange(x, 'b (h w) c -> b c h w', h=hh, w=hh)
        # depth-wise 3x3 conv on the restored map: (bs, hidden_dim, hh, hh)
        x = self.dwconv(x)
        # flatten back: (b, c, h, w) -> (b, h*w, c)
        x = rearrange(x, 'b c h w -> b (h w) c', h=hh, w=hh)
        x = self.linear2(x)
        x = self.eca(x)
        return x
    def flops(self, H, W):
        flops = 0
        # fc1
        flops += H * W * self.dim * self.hidden_dim
        # dwconv (3x3 depth-wise)
        flops += H * W * self.hidden_dim * 3 * 3
        # fc2
        flops += H * W * self.hidden_dim * self.dim
        # eca -- added before printing so the printed total includes it
        if hasattr(self.eca, 'flops'):
            flops += self.eca.flops()
        print("LeFF:{%.2f}" % (flops / 1e9))
        return flops
class eca_layer_1d(nn.Module):
    """Constructs an ECA module.
    Args:
        channel: Number of channels of the input feature map
        k_size: Adaptive selection of kernel size
    """
    def __init__(self, channel, k_size=3):
        super(eca_layer_1d, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool1d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=(k_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()
        self.channel = channel
        self.k_size = k_size

    def forward(self, x):
        # x: (b, hw, c)
        # feature descriptor built from the global spatial information
        y = self.avg_pool(x.transpose(-1, -2))
        # 1D convolution across neighbouring channels
        y = self.conv(y.transpose(-1, -2))
        # channel-wise gating weights
        y = self.sigmoid(y)
        return x * y.expand_as(x)

    def flops(self):
        flops = 0
        flops += self.channel * self.channel * self.k_size
        return flops
Of the code shown here, I only used the upper part; the eca_layer_1d class below it is lifted straight from the original source and I have not examined it closely. From this code you can see that the LeFF module is essentially a flatten-and-restore process: the token sequence is reshaped into a 2D feature map, run through a depth-wise convolution, and flattened back.
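To make that flatten-and-restore behaviour concrete, here is a minimal shape check; the dim, hidden_dim, batch size, and spatial size are illustrative assumptions, not values taken from my network:

# Minimal shape check for LeFF; all sizes here are illustrative assumptions
leff = LeFF(dim=32, hidden_dim=128, use_eca=True)
x = torch.randn(2, 16 * 16, 32)  # (b, h*w, c) with h = w = 16
out = leff(x)
print(out.shape)                 # torch.Size([2, 256, 32]): token shape is preserved

Note that hw must be a perfect square (here 16 * 16), because the forward pass recovers the side length with int(math.sqrt(hw)).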
To run this module successfully, the tensor has to be converted to the (b, h*w, c) token format before it is passed in, and converted back afterwards. Below is the shape conversion I used:
# Convert the tensor shape from (b, c, h, w) to (b, c, h * w)
b, c, h, w = self.dist.size()
x = self.dist.reshape(b, c, h * w)
# Convert from (b, c, h * w) to (b, h * w, c)
x = x.permute(0, 2, 1)
self.LEFF = self.Leff(x)
# Restore the original shape: permute back to (b, c, h * w) first, then
# reshape to (b, c, h, w). Flattening the output to 1D and calling
# view(b, c, h, w) directly, as I first tried, scrambles the channel and
# spatial dimensions, because the memory layout is still (b, h * w, c).
x_new = self.LEFF.permute(0, 2, 1).reshape(b, c, h, w)
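As a standalone sanity check of this round trip (the dist and Leff members of my network are replaced by local stand-ins, and all sizes are illustrative assumptions):

# Round-trip sanity check; sizes are illustrative assumptions
dist = torch.randn(2, 32, 16, 16)                    # (b, c, h, w)
b, c, h, w = dist.size()
leff = LeFF(dim=c, hidden_dim=4 * c)
tokens = dist.reshape(b, c, h * w).permute(0, 2, 1)  # (b, h*w, c)
out = leff(tokens)                                   # (b, h*w, c)
restored = out.permute(0, 2, 1).reshape(b, c, h, w)  # back to (b, c, h, w)
print(restored.shape)                                # torch.Size([2, 32, 16, 16])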
In my experiments, this module pays too much attention to fine detail, which hurt the overall detection results; if you are detecting relatively small objects, though, it may be worth trying.
The above is only my personal take; corrections and criticism are welcome.