YOLOv8 Object Detection: Backbone Network Improvement Examples and Innovative Improvements Column
Paper: arXiv:2303.08810
https://arxiv.org/pdf/2303.08810
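1. Getting the complete code
This column provides the complete modified YOLOv11 project files. You can download the project, open it, edit the dataset configuration file and choose a suitable network yaml file, and it will run; the procedure is simple. Subscribers can message the author privately, or contact the author directly, to join the yolov11 improvement discussion group;
QQ: 1921873112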
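2. Introduction to the BiFormer backbone
- BiFormer (Vision Transformer with Bi-Level Routing Attention) is a backbone architecture for computer-vision tasks such as object detection and image classification. It keeps the self-attention mechanism of the Transformer but makes it dynamically sparse, so that each query attends only to a small, content-dependent set of tokens, which improves efficiency on visual data.
BiFormer proposes a new dynamic sparse attention based on bi-level routing, which allocates computation more flexibly and in a content-aware way. For a query, irrelevant key-value pairs are first filtered out at a coarse region level, and fine-grained token-to-token attention is then applied to the union of the remaining candidate regions (the routed regions). The authors give a simple yet effective implementation that exploits this sparsity to save computation and memory while using only GPU-friendly dense matrix multiplications. A new general-purpose vision Transformer, BiFormer, is then built on the proposed bi-level routing attention. Because BiFormer attends to only a small subset of relevant tokens in a query-adaptive manner, without being distracted by irrelevant ones, it achieves both good performance and high computational efficiency.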
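The two-level idea can be illustrated with a small pure-Python sketch. It is a toy model under stated assumptions, not the implementation used later in this post: the helper names (mean_pool, route_regions, gather_tokens) and the toy data are hypothetical, and the scaling and softmax weighting applied by the real router are left out because they do not change which regions are selected.

```python
from typing import List

Vector = List[float]
Region = List[Vector]  # one region = list of token feature vectors


def mean_pool(region: Region) -> Vector:
    """Average the token vectors of one region (region-level descriptor)."""
    dim = len(region[0])
    return [sum(tok[d] for tok in region) / len(region) for d in range(dim)]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def route_regions(q_regions: List[Region], k_regions: List[Region], topk: int) -> List[List[int]]:
    """Level 1: for every query region, return the indices of its top-k key regions."""
    q_mean = [mean_pool(r) for r in q_regions]
    k_mean = [mean_pool(r) for r in k_regions]
    routes = []
    for q in q_mean:
        scores = [dot(q, k) for k in k_mean]  # region-to-region affinity
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        routes.append(ranked[:topk])
    return routes


def gather_tokens(kv_regions: List[Region], route: List[int]) -> Region:
    """Level 2 input: tokens of the routed regions; fine-grained attention sees only these."""
    return [tok for j in route for tok in kv_regions[j]]


# Toy input (hypothetical): 4 regions, 2 tokens each, 2-dimensional features.
regions = [
    [[1.0, 0.0], [3.0, 0.0]],
    [[0.0, 1.0], [0.0, 3.0]],
    [[1.0, 1.0], [2.0, 1.0]],
    [[-1.0, 0.0], [-1.0, 0.0]],
]
routes = route_regions(regions, regions, topk=2)
print(routes)                                    # routed region indices per query region
print(len(gather_tokens(regions, routes[0])))    # tokens attended by region 0
```

With topk=2, every token in a region attends to the tokens of only two regions instead of all four, which is where the savings in computation and memory come from.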
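3. Advantages of BiFormer
- Architecture
  - Bi-level routing attention (BRA)
    - Standard Transformer self-attention compares every token with every other token, so its cost grows quadratically with the number of tokens and becomes very expensive on high-resolution images. BiFormer replaces it with bi-level routing attention, which computes feature correlations only where they are likely to matter.
    - Concretely, the attention computation is split into two levels. At the region level, the feature map is divided into windows, the queries and keys of each window are averaged, and each query window keeps only its top-k most similar windows. At the token level, ordinary token-to-token attention is then computed against the keys and values gathered from those k windows instead of against the whole feature map.
  - Fusion of local and global context
    - BiFormer combines local and long-range information. In vision tasks, local information captures details such as edges and textures, while long-range information helps determine an object's position and category within the whole image.
    - In the implementation in section 6, the routed attention supplies the long-range part, because the selected windows can lie anywhere in the feature map, and a depthwise convolution on the values (lepe, controlled by side_dwconv) adds local context to the attention output. The model can therefore attend to detail and overall semantics at the same time.
  - Efficient feature-extraction block
    - Each block combines convolution and Transformer operations: a depthwise convolution first adds position-aware local features (pos_embed), multi-head routed attention then models long-range dependencies between them, and an MLP completes the block.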
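- Performance
  - Object detection
    - As a detection backbone, BiFormer supplies the features used for both localization and classification. Compared with purely convolutional (CNN) backbones, its content-aware attention can help in difficult cases such as small or occluded objects in cluttered scenes.
    - The reason is that routed attention can relate a target to the relevant background regions and connect distant parts of the same object, which a fixed local receptive field does not capture directly.
  - Image classification
    - For image classification, BiFormer extracts representative features at several levels of abstraction, from low-level texture to high-level category semantics.
    - This multi-level feature extraction helps the model adapt to the diversity of different image datasets and reach high classification accuracy.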
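- Application scenarios
  - Intelligent security
    - In surveillance systems, a BiFormer backbone can be used to detect and classify targets in video in real time. In places such as airports and banks, for example, it can help identify suspicious persons or luggage quickly and accurately.
  - Autonomous driving
    - In autonomous driving, it can be used to detect and recognize vehicles, pedestrians, traffic signs and other targets in road scenes, helping the vehicle understand its surroundings and giving the driving policy more accurate input.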
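To make the efficiency argument concrete, the following sketch counts how many key/value tokens a single query token attends to. It is a back-of-the-envelope calculation under stated assumptions: the function names are hypothetical, the feature map is split into n_win x n_win windows, and kv_per_win models the optional adaptive pooling of keys and values inside each window (as in the 'ada_avgpool' mode of the code in section 6).

```python
def keys_per_query_full(h: int, w: int) -> int:
    """Full self-attention: every token attends to every token."""
    return h * w


def keys_per_query_routed(h: int, w: int, n_win: int, topk: int, kv_per_win=None) -> int:
    """Routed attention: each token attends to the tokens of topk windows only.

    kv_per_win=None means keys/values are not pooled inside a window;
    otherwise each window is pooled to kv_per_win x kv_per_win tokens.
    """
    if h % n_win or w % n_win:
        raise ValueError("feature map size must be divisible by n_win")
    tokens_per_window = (h // n_win) * (w // n_win)
    kv_per_region = tokens_per_window if kv_per_win is None else kv_per_win ** 2
    return topk * kv_per_region


# Example: an 80x80 feature map (stride 8 at 640x640 input), 4x4 windows, topk=4.
print(keys_per_query_full(80, 80))
print(keys_per_query_routed(80, 80, n_win=4, topk=4))
print(keys_per_query_routed(80, 80, n_win=4, topk=4, kv_per_win=4))
```

The counts ignore the small cost of the region-level routing itself, so they are an illustration of the trend rather than a FLOP measurement.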
4. BiFormer network structure diagram


5. yolov8-C2FBiformer yaml file
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 3, C2f_Biformer, [128, 8]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 6, C2f_Biformer, [256, 4]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 6, C2f_Biformer, [512, 2]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, C2f, [512]] # 12

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, C2f, [256]] # 15 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]] # cat head P4
  - [-1, 3, C2f, [512]] # 18 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]] # cat head P5
  - [-1, 3, C2f, [1024]] # 21 (P5/32-large)

  - [[15, 18, 21], 1, Detect, [nc]] # Detect(P3, P4, P5)
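Before training, it can be worth checking that the window counts in this yaml fit the feature-map sizes, because the attention code in section 6 splits each feature map into n_win x n_win windows and needs a channel count divisible by the number of heads. The sketch below does that arithmetic. It rests on assumptions that are not stated in this post: that C2f_Biformer is registered in the model parser in the same way as C2f, so that the second yaml argument (8, 4, 2) arrives as win; that output channels are scaled as min(channels, max_channels) * width rounded up to a multiple of 8; and that the input size is a multiple of the stride. The helper names are hypothetical.

```python
import math

# (stride, yaml_channels, n_win) for the three C2f_Biformer layers in the yaml above
C2F_BIFORMER_LAYERS = [(4, 128, 8), (8, 256, 4), (16, 512, 2)]


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def plan_c2f_biformer(size: int, width: float, max_channels: int,
                      num_heads: int = 8, e: float = 0.5):
    """Per-layer feature-map size, attention dim and window size for one input side length."""
    plan = []
    for stride, channels, n_win in C2F_BIFORMER_LAYERS:
        fmap = size // stride  # approximation; exact when size is a multiple of stride
        c2 = make_divisible(min(channels, max_channels) * width)
        dim = int(c2 * e)  # hidden channels seen by each BiFormerBlock
        plan.append({
            "stride": stride,
            "fmap": fmap,
            "dim": dim,
            # None means the feature map cannot be split evenly into n_win windows
            "window": fmap // n_win if fmap % n_win == 0 else None,
            "heads_ok": dim % num_heads == 0,
        })
    return plan


# Scale 'n' from the yaml: width=0.25, max_channels=1024, 640x640 input.
for row in plan_c2f_biformer(640, width=0.25, max_channels=1024):
    print(row)
```

Under these assumptions the three window counts are chosen so that each window covers size / 32 pixels per side, so any input side length that is a multiple of 32 splits evenly.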
6. BiFormer code implementation
from collections import OrderedDict
from functools import partial
from typing import Optional, Union
import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange
from einops.layers.torch import Rearrange
from timm.models import register_model
from timm.models.layers import DropPath, to_2tuple, trunc_normal_
from timm.models.vision_transformer import _cfg
from typing import Tuple
from torch import Tensor
import numpy as np
class DWConv(nn.Module):
    """Depthwise 3x3 convolution applied to an NHWC tensor."""

    def __init__(self, dim):
        super(DWConv, self).__init__()
        self.dwconv = nn.Conv2d(dim, dim, 3, 1, 1, bias=True, groups=dim)

    def forward(self, x):
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        return x

# https://github.com/iscyy/ultralyticsPro
class KVGather(nn.Module):
    """Gather the key/value tokens of the routed windows for every query window."""

    def __init__(self, mul_weight='none'):
        super().__init__()
        assert mul_weight in ['none', 'soft', 'hard']
        self.mul_weight = mul_weight

    def forward(self, r_idx: Tensor, r_weight: Tensor, kv: Tensor):
        """
        r_idx: (n, p^2, topk) tensor
        r_weight: (n, p^2, topk) tensor
        kv: (n, p^2, w^2, c_qk+c_v)
        Return:
            (n, p^2, topk, w^2, c_qk+c_v) tensor
        """
        # select kv according to routing index
        n, p2, w2, c_kv = kv.size()
        topk = r_idx.size(-1)
        # FIXME: gather consumes much memory (topk times redundancy), write cuda kernel?
        topk_kv = torch.gather(
            kv.view(n, 1, p2, w2, c_kv).expand(-1, p2, -1, -1, -1),  # (n, p^2, p^2, w^2, c_kv) without mem cpy
            dim=2,
            index=r_idx.view(n, p2, topk, 1, 1).expand(-1, -1, -1, w2, c_kv),  # (n, p^2, k, w^2, c_kv)
        )
        if self.mul_weight == 'soft':
            topk_kv = r_weight.view(n, p2, topk, 1, 1) * topk_kv  # (n, p^2, k, w^2, c_kv)
        elif self.mul_weight == 'hard':
            raise NotImplementedError('differentiable hard routing TBA')
        return topk_kv

class QKVLinear(nn.Module):
    def __init__(self, dim, qk_dim, bias=True):
        super().__init__()
        self.dim = dim
        self.qk_dim = qk_dim
        self.qkv = nn.Linear(dim, qk_dim + qk_dim + dim, bias=bias)

    def forward(self, x):
        q, kv = self.qkv(x).split([self.qk_dim, self.qk_dim + self.dim], dim=-1)
        return q, kv

class TopkRouting(nn.Module):
    """
    Differentiable top-k routing with scaling.
    Args:
        qk_dim: int, feature dimension of query and key
        topk: int, number of windows each query window is routed to
        qk_scale: float or None, temperature (multiplier) applied before the softmax
        param_routing: bool, whether to incorporate learnable params in the routing unit
        diff_routing: bool, whether to make routing differentiable
    """

    def __init__(self,
                 qk_dim,
                 topk=4,
                 qk_scale=None,
                 param_routing=False,
                 diff_routing=False):
        super().__init__()
        self.topk = topk
        self.qk_dim = qk_dim
        self.scale = qk_scale or qk_dim ** -0.5
        self.diff_routing = diff_routing
        # TODO: norm layer before/after linear?
        self.emb = nn.Linear(qk_dim, qk_dim) if param_routing else nn.Identity()
        # routing activation
        self.routing_act = nn.Softmax(dim=-1)

    def forward(self, query, key):
        """query, key: window-level (mean-pooled) features of shape (n, p^2, c)."""
        if not self.diff_routing:
            query, key = query.detach(), key.detach()
        query_hat, key_hat = self.emb(query), self.emb(key)  # (n, p^2, c)
        attn_logit = (query_hat * self.scale) @ key_hat.transpose(-2, -1)  # (n, p^2, p^2)
        topk_attn_logit, topk_index = torch.topk(attn_logit, k=self.topk, dim=-1)  # (n, p^2, k), (n, p^2, k)
        r_weight = self.routing_act(topk_attn_logit)  # (n, p^2, k)
        return r_weight, topk_index

class BiLevelRoutingAttention(nn.Module):
    """Bi-level routing attention: window-level top-k routing followed by token-level attention."""

    def __init__(self, dim,
                 num_heads=8,
                 n_win=7,
                 qk_dim=None,
                 qk_scale=None,
                 kv_per_win=4,
                 kv_downsample_ratio=4,
                 kv_downsample_kernel=None,
                 kv_downsample_mode='identity',
                 topk=4,
                 param_attention="qkvo",
                 param_routing=False,
                 diff_routing=False,
                 soft_routing=False,
                 side_dwconv=3,
                 auto_pad=True):
        super().__init__()
        # local attention setting
        self.dim = dim
        self.n_win = n_win  # Wh, Ww
        self.num_heads = num_heads
        self.qk_dim = qk_dim or dim
        assert self.qk_dim % num_heads == 0 and self.dim % num_heads == 0, 'qk_dim and dim must be divisible by num_heads!'
        self.scale = qk_scale or self.qk_dim ** -0.5
        # side depthwise conv (lepe) that adds local context to the attention output
        self.lepe = (nn.Conv2d(dim, dim, kernel_size=side_dwconv, stride=1, padding=side_dwconv // 2, groups=dim)
                     if side_dwconv > 0 else (lambda x: torch.zeros_like(x)))

        # global routing setting
        self.topk = topk
        self.param_routing = param_routing
        self.diff_routing = diff_routing
        self.soft_routing = soft_routing
        # router
        assert not (self.param_routing and not self.diff_routing)  # cannot be param_routing=True and diff_routing=False
        self.router = TopkRouting(qk_dim=self.qk_dim,
                                  qk_scale=self.scale,
                                  topk=self.topk,
                                  diff_routing=self.diff_routing,
                                  param_routing=self.param_routing)
        if self.soft_routing:  # soft routing, always differentiable (if no detach)
            mul_weight = 'soft'
        elif self.diff_routing:  # hard differentiable routing
            mul_weight = 'hard'
        else:  # hard non-differentiable routing
            mul_weight = 'none'
        self.kv_gather = KVGather(mul_weight=mul_weight)

        # qkv mapping (shared by both global routing and local attention)
        self.param_attention = param_attention
        if self.param_attention == 'qkvo':
            self.qkv = QKVLinear(self.dim, self.qk_dim)
            self.wo = nn.Linear(dim, dim)
        elif self.param_attention == 'qkv':
            self.qkv = QKVLinear(self.dim, self.qk_dim)
            self.wo = nn.Identity()
        else:
            raise ValueError(f'param_attention mode {self.param_attention} is not supported!')

        self.kv_downsample_mode = kv_downsample_mode
        self.kv_per_win = kv_per_win
        self.kv_downsample_ratio = kv_downsample_ratio
        self.kv_downsample_kernel = kv_downsample_kernel
        if self.kv_downsample_mode == 'ada_avgpool':
            assert self.kv_per_win is not None
            self.kv_down = nn.AdaptiveAvgPool2d(self.kv_per_win)
        elif self.kv_downsample_mode == 'ada_maxpool':
            assert self.kv_per_win is not None
            self.kv_down = nn.AdaptiveMaxPool2d(self.kv_per_win)
        elif self.kv_downsample_mode == 'maxpool':
            assert self.kv_downsample_ratio is not None
            self.kv_down = nn.MaxPool2d(self.kv_downsample_ratio) if self.kv_downsample_ratio > 1 else nn.Identity()
        elif self.kv_downsample_mode == 'avgpool':
            assert self.kv_downsample_ratio is not None
            self.kv_down = nn.AvgPool2d(self.kv_downsample_ratio) if self.kv_downsample_ratio > 1 else nn.Identity()
        elif self.kv_downsample_mode == 'identity':  # no kv downsampling
            self.kv_down = nn.Identity()
        elif self.kv_downsample_mode == 'fracpool':
            raise NotImplementedError('fracpool policy is not implemented yet!')
        elif self.kv_downsample_mode == 'conv':
            # TODO: need to consider the case where k != v so that need two downsample modules
            raise NotImplementedError('conv policy is not implemented yet!')
        else:
            raise ValueError(f'kv_downsample_mode {self.kv_downsample_mode} is not supported!')

        # softmax for local attention
        self.attn_act = nn.Softmax(dim=-1)
        self.auto_pad = auto_pad

    def forward(self, x):
        """x: (N, H, W, C) tensor; returns a tensor of the same shape."""
        N, H, W, C = x.size()
        # auto_pad is stored but no padding is applied in this version, so fail early with a clear message
        assert H % self.n_win == 0 and W % self.n_win == 0, 'H and W must be divisible by n_win!'
        # patchify, (n, p^2, w, w, c), keep 2d window as we need 2d pooling to reduce kv size
        x = rearrange(x, "n (j h) (i w) c -> n (j i) h w c", j=self.n_win, i=self.n_win)
        q, kv = self.qkv(x)
        # pixel-wise qkv
        # q_pix: (n, p^2, w^2, c_qk)
        # kv_pix: (n, p^2, h_kv*w_kv, c_qk+c_v)
        q_pix = rearrange(q, 'n p2 h w c -> n p2 (h w) c')
        kv_pix = self.kv_down(rearrange(kv, 'n p2 h w c -> (n p2) c h w'))
        kv_pix = rearrange(kv_pix, '(n j i) c h w -> n (j i) (h w) c', j=self.n_win, i=self.n_win)
        q_win, k_win = q.mean([2, 3]), kv[..., 0:self.qk_dim].mean([2, 3])  # window-wise qk, (n, p^2, c_qk), (n, p^2, c_qk)

        # side_dwconv (lepe)
        # NOTE: call contiguous to avoid gradient warning when using ddp
        lepe = self.lepe(rearrange(kv[..., self.qk_dim:], 'n (j i) h w c -> n c (j h) (i w)', j=self.n_win, i=self.n_win).contiguous())
        lepe = rearrange(lepe, 'n c (j h) (i w) -> n (j h) (i w) c', j=self.n_win, i=self.n_win)

        # gather q dependent k/v
        r_weight, r_idx = self.router(q_win, k_win)  # both are (n, p^2, topk) tensors
        kv_pix_sel = self.kv_gather(r_idx=r_idx, r_weight=r_weight, kv=kv_pix)  # (n, p^2, topk, h_kv*w_kv, c_qk+c_v)
        k_pix_sel, v_pix_sel = kv_pix_sel.split([self.qk_dim, self.dim], dim=-1)

        # do attention as normal
        # keys are laid out transposed for the matmul: (n*p^2, m, c_qk//m, topk*h_kv*w_kv)
        k_pix_sel = rearrange(k_pix_sel, 'n p2 k w2 (m c) -> (n p2) m c (k w2)', m=self.num_heads)
        # values as BMLC: (n*p^2, m, topk*h_kv*w_kv, c_v//m)
        v_pix_sel = rearrange(v_pix_sel, 'n p2 k w2 (m c) -> (n p2) m (k w2) c', m=self.num_heads)
        # queries as BMLC: (n*p^2, m, w^2, c_qk//m)
        q_pix = rearrange(q_pix, 'n p2 w2 (m c) -> (n p2) m w2 c', m=self.num_heads)

        # param-free multihead attention
        attn_weight = (q_pix * self.scale) @ k_pix_sel  # (n*p^2, m, w^2, c) @ (n*p^2, m, c, topk*h_kv*w_kv) -> (n*p^2, m, w^2, topk*h_kv*w_kv)
        attn_weight = self.attn_act(attn_weight)
        out = attn_weight @ v_pix_sel  # (n*p^2, m, w^2, topk*h_kv*w_kv) @ (n*p^2, m, topk*h_kv*w_kv, c) -> (n*p^2, m, w^2, c)
        out = rearrange(out, '(n j i) m (h w) c -> n (j h) (i w) (m c)', j=self.n_win, i=self.n_win,
                        h=H // self.n_win, w=W // self.n_win)
        out = out + lepe
        # output linear
        out = self.wo(out)
        return out

class BiFormerBlock(nn.Module):
    """Conv positional embedding + bi-level routing attention + MLP, on NCHW input."""

    def __init__(self, dim,
                 outdim,
                 n_win,
                 drop_path=0.,
                 layer_scale_init_value=-1,
                 num_heads=8,
                 qk_dim=None,
                 qk_scale=None,
                 kv_per_win=4,
                 kv_downsample_ratio=4,
                 kv_downsample_kernel=None,
                 kv_downsample_mode='ada_avgpool',
                 topk=4,
                 param_attention="qkvo",
                 param_routing=False,
                 diff_routing=False,
                 soft_routing=False,
                 mlp_ratio=4,
                 mlp_dwconv=False,
                 side_dwconv=5,
                 before_attn_dwconv=3,
                 pre_norm=True,
                 auto_pad=False):
        super().__init__()
        qk_dim = qk_dim or dim
        # modules
        if before_attn_dwconv > 0:
            # padding follows the kernel size so the spatial size is preserved (1 for the default kernel 3)
            self.pos_embed = nn.Conv2d(dim, dim, kernel_size=before_attn_dwconv,
                                       padding=before_attn_dwconv // 2, groups=dim)
        else:
            self.pos_embed = lambda x: 0
        self.norm1 = nn.LayerNorm(dim, eps=1e-6)  # important to avoid attention collapsing
        if topk > 0:
            self.attn = BiLevelRoutingAttention(dim=dim, num_heads=num_heads, n_win=n_win, qk_dim=qk_dim,
                                                qk_scale=qk_scale, kv_per_win=kv_per_win,
                                                kv_downsample_ratio=kv_downsample_ratio,
                                                kv_downsample_kernel=kv_downsample_kernel,
                                                kv_downsample_mode=kv_downsample_mode,
                                                topk=topk, param_attention=param_attention,
                                                param_routing=param_routing,
                                                diff_routing=diff_routing, soft_routing=soft_routing,
                                                side_dwconv=side_dwconv,
                                                auto_pad=auto_pad)
        elif topk == 0:
            self.attn = nn.Sequential(Rearrange('n h w c -> n c h w'),  # compatibility
                                      nn.Conv2d(dim, dim, 1),  # pseudo qkv linear
                                      nn.Conv2d(dim, dim, 5, padding=2, groups=dim),  # pseudo attention
                                      nn.Conv2d(dim, dim, 1),  # pseudo out linear
                                      Rearrange('n c h w -> n h w c')
                                      )
        else:
            raise ValueError(f'topk={topk} is not supported by this block!')
        self.norm2 = nn.LayerNorm(dim, eps=1e-6)
        self.mlp = nn.Sequential(nn.Linear(dim, int(mlp_ratio * dim)),
                                 DWConv(int(mlp_ratio * dim)) if mlp_dwconv else nn.Identity(),
                                 nn.GELU(),
                                 nn.Linear(int(mlp_ratio * dim), dim)
                                 )
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        # tricks: layer scale & pre_norm/post_norm
        if layer_scale_init_value > 0:
            self.use_layer_scale = True
            self.gamma1 = nn.Parameter(layer_scale_init_value * torch.ones((dim)), requires_grad=True)
            self.gamma2 = nn.Parameter(layer_scale_init_value * torch.ones((dim)), requires_grad=True)
        else:
            self.use_layer_scale = False
        self.pre_norm = pre_norm
        self.outdim = outdim

    def forward(self, x):
        # conv pos embedding
        x = x + self.pos_embed(x)
        # permute to NHWC tensor for attention & mlp
        x = x.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
        # attention & mlp
        if self.pre_norm:
            if self.use_layer_scale:
                x = x + self.drop_path(self.gamma1 * self.attn(self.norm1(x)))  # (N, H, W, C)
                x = x + self.drop_path(self.gamma2 * self.mlp(self.norm2(x)))  # (N, H, W, C)
            else:
                x = x + self.drop_path(self.attn(self.norm1(x)))  # (N, H, W, C)
                x = x + self.drop_path(self.mlp(self.norm2(x)))  # (N, H, W, C)
        else:
            if self.use_layer_scale:
                x = self.norm1(x + self.drop_path(self.gamma1 * self.attn(x)))  # (N, H, W, C)
                x = self.norm2(x + self.drop_path(self.gamma2 * self.mlp(x)))  # (N, H, W, C)
            else:
                x = self.norm1(x + self.drop_path(self.attn(x)))  # (N, H, W, C)
                x = self.norm2(x + self.drop_path(self.mlp(x)))  # (N, H, W, C)
        # permute back
        x = x.permute(0, 3, 1, 2)  # (N, H, W, C) -> (N, C, H, W)
        return x

# from ultralytics.nn.modules.block import Conv
def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation only (used after batch normalization has been fused into the conv)."""
        return self.act(self.conv(x))

class RepConvN(nn.Module):
    """RepConv is a basic rep-style block, including training and deploy status
    This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py
    """
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=3, s=1, p=1, g=1, d=1, act=True, bn=False, deploy=False):
        super().__init__()
        assert k == 3 and p == 1
        self.g = g
        self.c1 = c1
        self.c2 = c2
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()
        self.bn = None
        self.conv1 = Conv(c1, c2, k, s, p=p, g=g, act=False)
        self.conv2 = Conv(c1, c2, 1, s, p=(p - k // 2), g=g, act=False)

    def forward_fuse(self, x):
        """Forward process"""
        return self.act(self.conv(x))

    def forward(self, x):
        """Forward process"""
        id_out = 0 if self.bn is None else self.bn(x)
        return self.act(self.conv1(x) + self.conv2(x) + id_out)

    def get_equivalent_kernel_bias(self):
        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.conv1)
        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.conv2)
        kernelid, biasid = self._fuse_bn_tensor(self.bn)
        return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid

    def _avg_to_3x3_tensor(self, avgp):
        channels = self.c1
        groups = self.g
        kernel_size = avgp.kernel_size
        input_dim = channels // groups
        k = torch.zeros((channels, input_dim, kernel_size, kernel_size))
        k[np.arange(channels), np.tile(np.arange(input_dim), groups), :, :] = 1.0 / kernel_size ** 2
        return k

    def _pad_1x1_to_3x3_tensor(self, kernel1x1):
        if kernel1x1 is None:
            return 0
        else:
            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])

    def _fuse_bn_tensor(self, branch):
        if branch is None:
            return 0, 0
        if isinstance(branch, Conv):
            kernel = branch.conv.weight
            running_mean = branch.bn.running_mean
            running_var = branch.bn.running_var
            gamma = branch.bn.weight
            beta = branch.bn.bias
            eps = branch.bn.eps
        elif isinstance(branch, nn.BatchNorm2d):
            if not hasattr(self, 'id_tensor'):
                input_dim = self.c1 // self.g
                kernel_value = np.zeros((self.c1, input_dim, 3, 3), dtype=np.float32)
                for i in range(self.c1):
                    kernel_value[i, i % input_dim, 1, 1] = 1
                self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)
            kernel = self.id_tensor
            running_mean = branch.running_mean
            running_var = branch.running_var
            gamma = branch.weight
            beta = branch.bias
            eps = branch.eps
        std = (running_var + eps).sqrt()
        t = (gamma / std).reshape(-1, 1, 1, 1)
        return kernel * t, beta - running_mean * gamma / std

    def fuse_convs(self):
        if hasattr(self, 'conv'):
            return
        kernel, bias = self.get_equivalent_kernel_bias()
        self.conv = nn.Conv2d(in_channels=self.conv1.conv.in_channels,
                              out_channels=self.conv1.conv.out_channels,
                              kernel_size=self.conv1.conv.kernel_size,
                              stride=self.conv1.conv.stride,
                              padding=self.conv1.conv.padding,
                              dilation=self.conv1.conv.dilation,
                              groups=self.conv1.conv.groups,
                              bias=True).requires_grad_(False)
        self.conv.weight.data = kernel
        self.conv.bias.data = bias
        for para in self.parameters():
            para.detach_()
        self.__delattr__('conv1')
        self.__delattr__('conv2')
        if hasattr(self, 'nm'):
            self.__delattr__('nm')
        if hasattr(self, 'bn'):
            self.__delattr__('bn')
        if hasattr(self, 'id_tensor'):
            self.__delattr__('id_tensor')

class RepNBottleneck(nn.Module):
    # Bottleneck whose first convolution is a re-parameterizable RepConvN
    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):  # ch_in, ch_out, shortcut, groups, kernels, expand
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = RepConvN(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class Bottleneck(nn.Module):
    # Standard bottleneck
    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):  # ch_in, ch_out, shortcut, groups, kernels, expand
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))

class CPNBiF(nn.Module):
    # Split the features in two, run BiFormer blocks on one half, then merge
    def __init__(self, c1, c2, n=1, win=64, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, n_win, shortcut, groups, expansion
        super().__init__()
        self.c = int(c2 * e)
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv(2 * self.c, c2, 1)
        self.m = nn.Sequential(*(BiFormerBlock(self.c, self.c, win) for _ in range(n)))

    def forward(self, x):
        a, b = self.cv1(x).chunk(2, 1)
        return self.cv2(torch.cat((self.m(a), b), 1))


class C3_Biformer(nn.Module):
    # C3 block with 3 convolutions whose bottlenecks are replaced by BiFormer blocks
    def __init__(self, c1, c2, n=1, win=64, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, n_win, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # act=FReLU(c2)
        self.m = nn.Sequential(*(BiFormerBlock(c_, c_, win) for _ in range(n)))

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))


class C2f_Biformer(nn.Module):
    """C2f (faster CSP bottleneck with 2 convolutions) whose bottlenecks are replaced by BiFormer blocks."""

    def __init__(self, c1, c2, n=1, win=64, shortcut=False, g=1, e=0.5):
        """Initialize with arguments ch_in, ch_out, number, n_win (windows per side), shortcut, groups,
        expansion.
        """
        super().__init__()
        self.c = int(c2 * e)
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)
        self.m = nn.Sequential(*(BiFormerBlock(self.c, self.c, win) for _ in range(n)))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)  # each block consumes the previous block's output
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

class CSCBiF(nn.Module):
    def __init__(self, c1, c2, n=1, win=64, shortcut=True, k=(1, 1), g=1, e=0.5):  # ch_in, ch_out, number, n_win, shortcut, kernels, groups, expansion
        super(CSCBiF, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c1, c_, k[0], 1)
        self.cv3 = Conv(c_, c_, k[0], 1)
        self.cv4 = Conv(2 * c_, c2, 1, 1)
        self.m = nn.Sequential(*(BiFormerBlock(c_, c_, win) for _ in range(n)))

    def forward(self, x):
        y1 = self.cv3(self.m(self.cv1(x)))
        y2 = self.cv2(x)
        return self.cv4(torch.cat((y1, y2), dim=1))


class ReNBC(nn.Module):
    def __init__(self, c1, c2, n=1, win=8, isUse=False, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, n_win, use RepNBottleneck, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        if isUse:
            self.m = nn.Sequential(*(RepNBottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))
        else:
            self.m = nn.Sequential(*(BiFormerBlock(c_, c_, win) for _ in range(n)))

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))


class ReNLANBiF(nn.Module):
    # ReNLANBiF Block
    def __init__(self, c1, c2, c3, c4, extra, c=True, n=1):  # ch_in, ch_out, split channels, branch channels, n_win, unused flag, number
        super().__init__()
        self.c = c3 // 2
        self.cv1 = Conv(c1, c3, 1, 1)
        self.cv2 = nn.Sequential(ReNBC(c3 // 2, c4, n, win=extra, isUse=False))
        self.cv3 = nn.Sequential(ReNBC(c4, c4, n, win=extra, isUse=False))
        self.cv4 = Conv(c3 + (2 * c4), c2, 1, 1)

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, 1))
        y.extend((m(y[-1])) for m in [self.cv2, self.cv3])
        return self.cv4(torch.cat(y, 1))

    def forward_split(self, x):
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in [self.cv2, self.cv3])
        return self.cv4(torch.cat(y, 1))
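The rearrange calls in BiLevelRoutingAttention.forward are easier to follow with concrete numbers. The helper below is a hypothetical, dependency-free sketch that reproduces the tensor shapes described in that method's comments; it assumes the BiFormerBlock defaults used by C2f_Biformer (kv_downsample_mode='ada_avgpool', kv_per_win=4, topk=4, num_heads=8) and performs no real computation.

```python
def bra_shapes(n, h, w, c, n_win, topk=4, num_heads=8, kv_per_win=4, qk_dim=None):
    """Return the main intermediate shapes of bi-level routing attention for an (n, h, w, c) input."""
    qk_dim = qk_dim or c
    p2 = n_win * n_win  # number of windows
    if h % n_win or w % n_win:
        raise ValueError("h and w must be divisible by n_win")
    if c % num_heads or qk_dim % num_heads:
        raise ValueError("c and qk_dim must be divisible by num_heads")
    if topk > p2:
        raise ValueError("topk cannot exceed the number of windows")
    win_tokens = (h // n_win) * (w // n_win)  # query tokens per window
    kv_tokens = kv_per_win * kv_per_win       # pooled key/value tokens per window
    return {
        "x_windows": (n, p2, h // n_win, w // n_win, c),
        "q_pix": (n, p2, win_tokens, qk_dim),
        "kv_pix": (n, p2, kv_tokens, qk_dim + c),
        "r_idx": (n, p2, topk),
        "kv_pix_sel": (n, p2, topk, kv_tokens, qk_dim + c),
        "attn_weight": (n * p2, num_heads, win_tokens, topk * kv_tokens),
        "out": (n, h, w, c),
    }


# Example: an 80x80 feature map with 32 channels split into 4x4 windows.
for name, shape in bra_shapes(1, 80, 80, 32, n_win=4).items():
    print(f"{name:12s} {shape}")
```

In this example each of the 400 query tokens in a window attends to 4 x 16 = 64 pooled key/value tokens, which is the last dimension of attn_weight.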
7. Successful training run

That concludes this post. I will keep publishing more improvements; if you are interested, follow my column and join the QQ discussion group.



