YOLOv5 Forward Propagation: Experiment Notes (Demo/Inference)

匿名的魔术师

Last modified 2022-10-18 23:15:33

Tags: python, deep learning, artificial intelligence

First published 2022-10-18 23:14:52

Article link: https://blog.csdn.net/allrubots/article/details/127385048

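Note: the input is processed differently in demo (inference) and in training. This post covers the forward inference pass in demo mode.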

I. Input Image Preprocessing

# detect.py - 108
dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)

Preprocessing is handled mainly by this class. The key lines inside it are:

# dataloaders.py - 311       
im = letterbox(im0, self.img_size, stride=self.stride, auto=self.auto)[0]  # padded resize (the function called here)
im = im.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
im = np.ascontiguousarray(im)  # contiguous
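To see what the last two lines do, here is a small NumPy sketch on a made-up 2×3 image. The pixel values are arbitrary, chosen only so that each channel is easy to track:

```python
import numpy as np

# Hypothetical 2x3 BGR image (HWC); each channel is filled with a constant so it is easy to follow
im0 = np.zeros((2, 3, 3), dtype=np.uint8)
im0[..., 0] = 10  # B
im0[..., 1] = 20  # G
im0[..., 2] = 30  # R

im = im0.transpose((2, 0, 1))[::-1]  # HWC -> CHW, then reverse the channel axis: BGR -> RGB
was_contiguous = im.flags["C_CONTIGUOUS"]  # False: transpose and [::-1] only create a strided view
im = np.ascontiguousarray(im)  # copy into one C-contiguous buffer (e.g. before torch.from_numpy)

print(im.shape, im[:, 0, 0], was_contiguous, im.flags["C_CONTIGUOUS"])
# (3, 2, 3) [30 20 10] False True
```

The `[::-1]` works on the first axis, which after the transpose is the channel axis. That is why a single slice is enough to swap B and R.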

The letterbox function processes the image as follows:

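1. new_shape = (640, 640): this makes the longest side of the img fed to the network 640.
2. Compute the ratios 640 / h and 640 / w for the original img and take the smaller one. Note that the ratio is 640 divided by the original h or w, and the minimum is taken. The scale is therefore set by the longest side of the original img, so none of its content is cropped.
3. Resize (shrink or enlarge) by this ratio, i.e., multiply the original h and w by it. This gives new_unpad = (new w, new h).
4. Pad the image so that the h and w fed to the model are multiples of stride:
dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
if auto:  # minimum rectangle
    dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding; np.mod is the modulo operation

This gives the total padding. Dividing it by 2 gives the padding on each side:

dw /= 2  # divide padding into 2 sides
dh /= 2
5. Apply the padding. The result is the img fed to the model.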

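For example, the upper image gets padding on its top and bottom edges after processing. In the lower image, the resized size is already a multiple of stride, so no padding is needed.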

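In summary, the input img size is not fixed at 640×640. The longest side is 640, and the other side only needs to be a multiple of stride. This holds when auto=True, which detect.py passes as auto=pt for the PyTorch backend.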
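The shape arithmetic of steps 1 to 5 can be sketched without touching any pixels. The helper letterbox_geometry is hypothetical and only reproduces the size and padding computation. The ±0.1 rounding, which puts the extra pixel of an odd padding at the bottom/right, is my recollection of how YOLOv5's letterbox splits padding, so treat it as an assumption:

```python
def letterbox_geometry(h0, w0, new_shape=(640, 640), stride=32, auto=True):
    """Size/padding arithmetic of letterbox only (hypothetical helper, no resizing is done).

    Returns (r, (h, w), (top, bottom, left, right)).
    """
    r = min(new_shape[0] / h0, new_shape[1] / w0)  # the longest side decides the scale
    new_unpad = (int(round(w0 * r)), int(round(h0 * r)))  # (w, h) after resizing
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # padding up to 640 x 640
    if auto:  # minimum rectangle: only pad up to the next multiple of stride
        dw, dh = dw % stride, dh % stride
    dw /= 2  # split the padding between the two sides
    dh /= 2
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))  # odd padding: extra pixel goes to bottom
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    h, w = new_unpad[1] + top + bottom, new_unpad[0] + left + right
    return r, (h, w), (top, bottom, left, right)


print(letterbox_geometry(427, 640)[1:])    # ((448, 640), (10, 11, 0, 0)): padded top and bottom
print(letterbox_geometry(480, 640)[1:])    # ((480, 640), (0, 0, 0, 0)): already a multiple of 32
print(letterbox_geometry(1080, 1920)[1])   # (384, 640): scaled by 1/3, then padded to 384
```

For the 427×640 case the remaining height gap is 213 pixels. Only 213 mod 32 = 21 of them are added, giving 448 = 14 × 32 instead of a full 640.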

II. Forward Propagation (Demo/Inference)

Figure 1

1. Reading the figure above alongside the code and Section III (Network Structure) to understand the whole network

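Figure 1 is a diagram borrowed from the web, to which I added annotations. Its circled numbers correspond to the modules in model (see Section III, Network Structure, below). (Note: comparing the two, I think modules 8 and 9 in the figure are swapped.)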

The forward pass is carried out mainly by the following function:

yolo.py ---114

    def _forward_once(self, x, profile=False, visualize=False):
        y, dt = [], []  # outputs
        # print('================')
        for m in self.model:  # m iterates over the modules in model
            # print('====================')
            # print("i is {}".format(m.i))
            # print('====================')
            if m.f != -1:  # if not from previous layer
                # print("when i is {},f is {}".format(m.i, m.f))
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            x = m(x)  # run
            y.append(x if m.i in self.save else None)  # save output; note the m.i attribute here
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
        return x

The key variable is m, which iterates over the modules in model. During the experiment I printed whenever m.f is not -1 (m.i is the module index). The output was:

when i is 12,f is [-1, 6]
when i is 16,f is [-1, 4]
when i is 19,f is [-1, 14]
when i is 22,f is [-1, 10]
when i is 24,f is [17, 20, 23]

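The if branch is taken at i = 12, 16, 19, 22 and 24. Comparing with Figure 1 and Section III below, these are exactly the indices of the Concat modules, except the last one, which is the Detect module (again matching Figure 1). In other words, a Concat layer needs earlier outputs so that it can concatenate tensors (torch.cat). Note the values of f: -1 means the output of the immediately preceding module, and the other index refers to the output of an earlier layer. This is where y in the code comes into play. Inspecting self.save in a debugger gives the following: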

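It corresponds exactly to the layer indices referenced in f (4, 6, 10, 14, 17, 20, 23), so every output that the Concat layers (and Detect) need is preserved. That is the role of y: it keeps the outputs at the positions the Concat modules need, and every other entry of y is None (as the code shows: else None).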
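The routing logic of _forward_once can be reproduced with plain Python to see how y and self.save interact. This is a toy sketch: the Layer class, build_save and forward_once are hypothetical stand-ins, not YOLOv5 code. The "concatenation" in the toy network is just a sum. The second half feeds in the f values printed above, so the resulting save list can be compared with the indices discussed in the text:

```python
class Layer:
    """Hypothetical stand-in for a YOLOv5 module: i = own index, f = input index or list of indices."""

    def __init__(self, i, f, fn=lambda x: x):
        self.i, self.f, self.fn = i, f, fn

    def __call__(self, x):
        return self.fn(x)


def build_save(layers):
    """Sorted earlier-layer indices mentioned in any f (ignoring -1), i.e. what self.save holds."""
    save = set()
    for m in layers:
        for j in ([m.f] if isinstance(m.f, int) else m.f):
            if j != -1:
                save.add(j)
    return sorted(save)


def forward_once(layers, save, x):
    """Same routing as _forward_once: y keeps only the outputs whose index is in save."""
    y = []
    for m in layers:
        if m.f != -1:  # input does not (only) come from the previous layer
            x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]
        x = m(x)
        y.append(x if m.i in save else None)
    return x, y


# Toy network: layer 2 combines (here: sums) the previous output with layer 0's output
toy = [Layer(0, -1, lambda x: x + 1), Layer(1, -1, lambda x: x * 2), Layer(2, [-1, 0], sum)]
out, y = forward_once(toy, build_save(toy), 1)
print(out, y)  # 6 [2, None, None]

# The f values printed above for this model (every other layer has f = -1)
f_values = {12: [-1, 6], 16: [-1, 4], 19: [-1, 14], 22: [-1, 10], 24: [17, 20, 23]}
yolo = [Layer(i, f_values.get(i, -1)) for i in range(25)]
print(build_save(yolo))  # [4, 6, 10, 14, 17, 20, 23]
```

Only layer 0's output survives in the toy y, because layer 2 is the only consumer of an earlier output. This saves memory compared with keeping every intermediate tensor.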

Combining Figure 1 with the 'when i is ..., f is ...' output, we can summarize:

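1. Concat module ⑫ concatenates ⑪ and ⑥

2. Concat module ⑯ concatenates ⑮ and ④

3. Concat module ⑲ concatenates ⑱ and ⑭

4. Concat module ㉒ concatenates ㉑ and ⑩

5. Detect head ㉔ receives the outputs of ⑰, ⑳ and ㉓, so the final output covers three scales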
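The four concatenations above can be checked against the printed structure in Section III. The channel counts below are read off that printout. The strides (downsampling factor relative to the input) are my own derivation from the stride-2 convolutions and the ×2 upsamples listed there:

```python
# Output channels and strides of the layers that feed each Concat (yolov5s, read off Section III)
out_ch = {4: 128, 6: 256, 10: 256, 11: 256, 14: 128, 15: 128, 18: 128, 21: 256}
stride = {4: 8, 6: 16, 10: 32, 11: 16, 14: 16, 15: 8, 18: 16, 21: 32}
concat_src = {12: (11, 6), 16: (15, 4), 19: (18, 14), 22: (21, 10)}

concat_ch = {}
for i, (a, b) in concat_src.items():
    # torch.cat along dim 1 only works if both inputs have the same H and W
    assert stride[a] == stride[b], f"layer {i}: spatial sizes differ"
    concat_ch[i] = out_ch[a] + out_ch[b]

print(concat_ch)  # {12: 512, 16: 256, 19: 256, 22: 512}
```

These totals match the input channels of the cv1/cv2 convolutions in the C3 layers that follow each Concat (13, 17, 20 and 23).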

2. Definitions and functions of the modules in the code

Most modules are defined in common.py. Detect is defined in yolo.py, and Upsample is PyTorch's own nn.Upsample.

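Note: all of these modules subclass nn.Module, so calling a module runs its forward method. The modules are instantiated when the model is loaded, as shown below: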

# detect.py ---85
model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)

This jumps to:

# common.py ---339
if pt:  # PyTorch
    model = attempt_load(weights if isinstance(weights, list) else w, device=device, inplace=True, fuse=fuse)  # builds the network from the weights file

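Loading the model creates the model instance, so every module is already initialized at this point. The attributes set in each __init__ have been built from the parameters stored in the weights file (e.g., input/output channels c1, c2, shortcut). With fuse enabled, loading also folds each Conv module's BatchNorm into its convolution and switches that module to forward_fuse.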
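Why can BN simply disappear? In inference mode BatchNorm is a per-channel affine map, so it can be folded into the preceding convolution's weights and bias. The NumPy sketch below checks this algebra on a 1×1 convolution written as a matrix product. The sizes, random parameters and eps are arbitrary assumptions, and this is not YOLOv5's own fusion routine:

```python
import numpy as np

rng = np.random.default_rng(0)
c1, c2, eps = 4, 3, 1e-5  # channel counts and eps are arbitrary choices for this sketch

# 1x1 conv weights without bias (Conv uses bias=False) plus inference-mode BN statistics
W = rng.normal(size=(c2, c1))
gamma, beta = rng.normal(size=c2), rng.normal(size=c2)
mean, var = rng.normal(size=c2), rng.uniform(0.5, 2.0, size=c2)


def conv_bn(x):
    """x: (c1, N) pixels -> 1x1 conv followed by BatchNorm using running statistics."""
    z = W @ x
    return gamma[:, None] * (z - mean[:, None]) / np.sqrt(var[:, None] + eps) + beta[:, None]


# Fold BN into the conv: s = gamma / sqrt(var + eps), W' = s * W, b' = beta - s * mean
s = gamma / np.sqrt(var + eps)
W_fused = W * s[:, None]
b_fused = beta - mean * s


def conv_fused(x):
    """The fused layer: a single conv that now has a bias."""
    return W_fused @ x + b_fused[:, None]


x = rng.normal(size=(c1, 10))
print(np.allclose(conv_bn(x), conv_fused(x)))  # True
```

The fused convolution gains a bias term b'. That is consistent with the Conv2d layers in Section III being printed without bias=False.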

(1) Conv module

class Conv(nn.Module):
    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):  # used while the bn layer still exists (e.g. during training)
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):  # used after bn has been fused into conv (no bn layer)
        return self.act(self.conv(x))

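This is the basic building block. As Section III shows, the loaded model contains no bn layers, because they were fused into the convolutions at load time. forward_fuse therefore runs: convolution followed directly by the activation. The SiLU activation is

$$\mathrm{SiLU}(x) = x\,\sigma(x) = \frac{x}{1 + e^{-x}}$$

and its derivative is (for details see: 23 Activation Functions)

$$\mathrm{SiLU}'(x) = \sigma(x)\bigl(1 + x\,(1 - \sigma(x))\bigr)$$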
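The derivative follows from the product rule, using σ'(x) = σ(x)(1 − σ(x)). A quick numerical check with only the standard library (the sample points and step size are arbitrary):

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def silu(x):
    return x * sigmoid(x)  # SiLU(x) = x * sigma(x)


def silu_grad(x):
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))  # sigma(x) * (1 + x * (1 - sigma(x)))


# Compare the closed-form derivative with a central finite difference
for x in (-3.0, -0.5, 0.0, 1.0, 4.0):
    h = 1e-6
    numeric = (silu(x + h) - silu(x - h)) / (2 * h)
    assert abs(numeric - silu_grad(x)) < 1e-6

print(silu(0.0), silu_grad(0.0))  # 0.0 0.5
```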

(2) C3 module

class C3(nn.Module):
    # CSP Bottleneck with 3 convolutions
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))

class Bottleneck(nn.Module):
    # Standard bottleneck
    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_, c2, 3, 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))

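Bottleneck and C3 are shown together here. C3 first applies two 1×1 Conv branches to its input, self.cv1(x) and self.cv2(x). The cv1 branch then passes through the n Bottleneck modules in self.m. Each Bottleneck again consists of Conv modules: it applies two Convs in a row (1×1, then 3×3) and adds the result to its input (compare with this module in Figure 1). The addition is skipped when shortcut is False or when the input and output channel counts differ. Inside C3 the Bottlenecks always map c_ to c_ channels, so in practice only shortcut=False disables it; this is how the C3 blocks in the head of the default YOLOv5 configs are built. Finally, the two branches are concatenated along the channel dimension (2 × c_ channels), and cv3 fuses them into c2 channels.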
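The channel bookkeeping of C3 can be traced without any weights. The helper c3_channels below is hypothetical; it simply follows the __init__ shown above. Its results can be checked against layers 2, 4 and 13 in Section III:

```python
def c3_channels(c1, c2, n=1, e=0.5):
    """Trace (in, out) channel counts through C3 (hypothetical helper, shapes only)."""
    c_ = int(c2 * e)  # hidden channels
    return {
        "cv1": (c1, c_),      # branch 1: 1x1 Conv, then the Bottlenecks
        "cv2": (c1, c_),      # branch 2: 1x1 Conv only
        "m": [(c_, c_)] * n,  # n Bottlenecks keep the channel count, so the residual add is valid
        "concat": 2 * c_,     # torch.cat of both branches along dim 1
        "cv3": (2 * c_, c2),  # 1x1 Conv back to c2 channels
    }


print(c3_channels(64, 64))          # layer 2: cv1/cv2 64 -> 32, cv3 64 -> 64
print(c3_channels(128, 128, n=2))   # layer 4: two Bottlenecks of 64 -> 64
print(c3_channels(512, 256))        # layer 13: cv1/cv2 512 -> 128, cv3 256 -> 256
```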

(3) SPPF module

class SPPF(nn.Module):
    # Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher
    def __init__(self, c1, c2, k=5):  # equivalent to SPP(k=(5, 9, 13))
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * 4, c2, 1, 1)
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            y1 = self.m(x)
            y2 = self.m(y1)
            return self.cv2(torch.cat((x, y1, y2, self.m(y2)), 1))  # concatenate the four tensors, then apply cv2

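Again, compare with this module in Figure 1. SPPF first applies a Conv (cv1), then three successive max-pooling operations (the third one is inside the return statement). It concatenates the three pooled results with the cv1 output, giving 4 × c_ channels, and applies another Conv (cv2).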
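The comment in the code says SPPF is equivalent to SPP(k=(5, 9, 13)). Chaining stride-1 5×5 max pools widens the window: two in a row cover 9×9, and three cover 13×13. A NumPy check on a single random 2D map follows. The helper maxpool2d_same is hypothetical; it pads with -inf, which is how max pooling treats its padding:

```python
import numpy as np


def maxpool2d_same(x, k):
    """Stride-1 k x k max pooling with k // 2 padding (pad value -inf), like MaxPool2d(k, 1, k // 2)."""
    p = k // 2
    xp = np.pad(x, p, mode="constant", constant_values=-np.inf)
    h, w = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].max()
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(12, 12))

y1 = maxpool2d_same(x, 5)
y2 = maxpool2d_same(y1, 5)  # second 5x5 pool
y3 = maxpool2d_same(y2, 5)  # third 5x5 pool

print(np.array_equal(y2, maxpool2d_same(x, 9)))   # True
print(np.array_equal(y3, maxpool2d_same(x, 13)))  # True
```

Reusing the previous pooling result is what makes SPPF "fast". It computes three cheap 5×5 pools instead of separate 5×5, 9×9 and 13×13 pools on the same input.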

(4) Upsample

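This layer is simply PyTorch's own upsampling class nn.Upsample (the model config refers to nn.Upsample directly). Its forward method is: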

def forward(self, input: Tensor) -> Tensor:
    return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners,
                             recompute_scale_factor=self.recompute_scale_factor)

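It enlarges the feature map by a factor of 2 using nearest-neighbor interpolation, not bilinear: Section III shows Upsample(scale_factor=2.0, mode=nearest).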
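With an integer scale factor of 2, nearest-neighbor upsampling just copies every value into a 2×2 block. A NumPy sketch (upsample_nearest_2x is a hypothetical helper, and the input values are arbitrary):

```python
import numpy as np


def upsample_nearest_2x(x):
    """x: (C, H, W) feature map -> (C, 2H, 2W); each value is copied into a 2x2 block."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


x = np.arange(4).reshape(1, 2, 2)
print(upsample_nearest_2x(x)[0])
# [[0 0 1 1]
#  [0 0 1 1]
#  [2 2 3 3]
#  [2 2 3 3]]
```

No new values are created, which keeps this layer parameter-free and cheap. The following Concat then combines the enlarged map with a backbone feature of the same spatial size.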

(5) Detect module

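The detection head. For each of the three scales, a 1×1 convolution maps the feature map to na × (nc + 5) = 3 × 85 = 255 channels.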

class Detect(nn.Module):
    # YOLOv5 Detect head for detection models
    stride = None  # strides computed during build
    dynamic = False  # force grid reconstruction
    export = False  # export mode

    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer
        super().__init__()
        # self.anchors = anchors
        self.nc = nc  # number of classes  80
        self.no = nc + 5  # number of outputs per anchor  85
        self.nl = len(anchors)  # number of detection layers  3; anchors are the configured anchor sizes, stored below as a buffer of shape (3, 3, 2) = (layers, anchors per grid cell, (w, h))
        self.na = len(anchors[0]) // 2  # number of anchors  3
        self.grid = [torch.empty(0) for _ in range(self.nl)]  # init grid
        self.anchor_grid = [torch.empty(0) for _ in range(self.nl)]  # init anchor grid
        self.register_buffer('anchors', torch.tensor(anchors).float().view(self.nl, -1, 2))  # shape(nl,na,2)
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv
        self.inplace = inplace  # use inplace ops (e.g. slice assignment)

    def forward(self, x):  # e.g. x is a list of 3 tensors: (1,128,80,80), (1,256,40,40), (1,512,20,20)
        z = []  # inference output
        for i in range(self.nl):  # e.g. i = 0; each detection layer (scale) is processed in turn
            x[i] = self.m[i](x[i])  # conv; e.g. after i = 0, x holds (1,255,80,80), (1,256,40,40), (1,512,20,20)
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85); e.g. bs = 1, _ = 255, ny = 80, nx = 80
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()  # e.g. x: (1,3,80,80,85), (1,256,40,40), (1,512,20,20)

            if not self.training:  # inference
                if self.dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:  # rebuild the grids when the feature-map size changes
                    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)  # e.g. grid {list:3}  Tensor:(1,3,80,80,2),Tensor:(1,3,42,28,2),Tensor:(1,3,21,14,2)
                                                                                # anchor_grid {list:3}  Tensor:(1,3,80,80,2),Tensor:(1,3,42,28,2),Tensor:(1,3,21,14,2)
                                                                                # only level i is rebuilt here; the later levels still hold the grids of the previous input
                                                                                # grid: cell coordinates on the feature map (minus 0.5); anchor_grid: anchor w, h in input-image pixels

                if isinstance(self, Segment):  # (boxes + masks)
                    xy, wh, conf, mask = x[i].split((2, 2, self.nc + 1, self.no - self.nc - 5), 4)
                    xy = (xy.sigmoid() * 2 + self.grid[i]) * self.stride[i]  # xy: invert the training-time encoding to get the predicted center
                    wh = (wh.sigmoid() * 2) ** 2 * self.anchor_grid[i]  # wh: likewise, decode the predicted width and height
                    y = torch.cat((xy, wh, conf.sigmoid(), mask), 4)  # final predictions, for this scale only
                else:  # Detect (boxes only)
                    xy, wh, conf = x[i].sigmoid().split((2, 2, self.nc + 1), 4)
                    xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy
                    wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh
                    y = torch.cat((xy, wh, conf), 4)
                z.append(y.view(bs, self.na * nx * ny, self.no))  # flatten this scale to (bs, na*ny*nx, no); all scales are concatenated in the return

        return x if self.training else (torch.cat(z, 1),) if self.export else (torch.cat(z, 1), x)  

    def _make_grid(self, nx=20, ny=20, i=0, torch_1_10=check_version(torch.__version__, '1.10.0')):
        d = self.anchors[i].device
        t = self.anchors[i].dtype
        shape = 1, self.na, ny, nx, 2  # grid shape
        y, x = torch.arange(ny, device=d, dtype=t), torch.arange(nx, device=d, dtype=t)
        yv, xv = torch.meshgrid(y, x, indexing='ij') if torch_1_10 else torch.meshgrid(y, x)  # torch>=0.7 compatibility
        grid = torch.stack((xv, yv), 2).expand(shape) - 0.5  # add grid offset, i.e. y = 2.0 * x - 0.5
        anchor_grid = (self.anchors[i] * self.stride[i]).view((1, self.na, 1, 1, 2)).expand(shape)  # multiply by stride to map anchor sizes back to input-image pixels
        return grid, anchor_grid

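Pay attention to how the grids are set up. grid and anchor_grid are cached per detection layer, and _make_grid rebuilds them only when the feature-map size changes (or when dynamic is set), for example when a new input has a different letterboxed shape.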
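The box decoding for one scale can be reproduced in NumPy. This is a sketch, not the library code: make_grid and decode are hypothetical helpers that mirror _make_grid and the boxes-only branch of forward. The example anchors are the P3 anchors (10,13, 16,30, 33,23) at stride 8, which I am assuming from the default yolov5s config. As in Detect, they are passed in grid units (pixels / stride):

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def make_grid(nx, ny, anchors, stride):
    """anchors: (na, 2) in grid units. Returns grid, anchor_grid, both of shape (1, na, ny, nx, 2)."""
    na = anchors.shape[0]
    yv, xv = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    grid = np.broadcast_to(np.stack((xv, yv), 2), (1, na, ny, nx, 2)) - 0.5  # cell offsets
    anchor_grid = np.broadcast_to((anchors * stride).reshape(1, na, 1, 1, 2), (1, na, ny, nx, 2))
    return grid, anchor_grid


def decode(p, anchors, stride, nc):
    """p: raw conv output (bs, na*(nc+5), ny, nx) for one scale -> (bs, na*ny*nx, nc+5), boxes in pixels."""
    bs, _, ny, nx = p.shape
    na, no = anchors.shape[0], nc + 5
    p = p.reshape(bs, na, no, ny, nx).transpose(0, 1, 3, 4, 2)  # (bs, na, ny, nx, no)
    grid, anchor_grid = make_grid(nx, ny, anchors, stride)
    s = sigmoid(p)
    xy = (s[..., 0:2] * 2 + grid) * stride  # center can move from -0.5 to 1.5 cells around the cell
    wh = (s[..., 2:4] * 2) ** 2 * anchor_grid  # size is 0 to 4 times the anchor
    y = np.concatenate((xy, wh, s[..., 4:]), axis=4)
    return y.reshape(bs, na * ny * nx, no)


anchors = np.array([[10, 13], [16, 30], [33, 23]]) / 8  # assumed P3 anchors, converted to grid units
out = decode(np.zeros((1, 3 * 85, 2, 2)), anchors, stride=8, nc=80)
print(out.shape, out[0, 0, :4], out[0, 1, :2])  # (1, 12, 85) [ 4.  4. 10. 13.] [12.  4.]
```

With all-zero logits every sigmoid is 0.5. Each box then sits at the center of its cell ((0.5, 0.5) cells = (4, 4) pixels for cell (0, 0)) and has exactly the anchor's size.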

III. Network Structure

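This is the structure of model (see yolo.py --- 116). The m yielded by

for m in self.model:

iterates over these layers. Note that no bn layers appear: the model was loaded with fuse enabled, so each BatchNorm was folded into the preceding Conv2d. This is also why these Conv2d layers are printed without bias=False; they now carry a bias.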

Sequential(
  (0): Conv(
    (conv): Conv2d(3, 32, kernel_size=(6, 6), stride=(2, 2), padding=(2, 2))
    (act): SiLU(inplace=True)
  )
  (1): Conv(
    (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (act): SiLU(inplace=True)
  )
  (2): C3(
    (cv1): Conv(
      (conv): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv2): Conv(
      (conv): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv3): Conv(
      (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )
        (cv2): Conv(
          (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU(inplace=True)
        )
      )
    )
  )
  (3): Conv(
    (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (act): SiLU(inplace=True)
  )
  (4): C3(
    (cv1): Conv(
      (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv2): Conv(
      (conv): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv3): Conv(
      (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )
        (cv2): Conv(
          (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU(inplace=True)
        )
      )
      (1): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )
        (cv2): Conv(
          (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU(inplace=True)
        )
      )
    )
  )
  (5): Conv(
    (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (act): SiLU(inplace=True)
  )
  (6): C3(
    (cv1): Conv(
      (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv2): Conv(
      (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv3): Conv(
      (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )
        (cv2): Conv(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU(inplace=True)
        )
      )
      (1): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )
        (cv2): Conv(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU(inplace=True)
        )
      )
      (2): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )
        (cv2): Conv(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU(inplace=True)
        )
      )
    )
  )
  (7): Conv(
    (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (act): SiLU(inplace=True)
  )
  (8): C3(
    (cv1): Conv(
      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv2): Conv(
      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv3): Conv(
      (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )
        (cv2): Conv(
          (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU(inplace=True)
        )
      )
    )
  )
  (9): SPPF(
    (cv1): Conv(
      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv2): Conv(
      (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (m): MaxPool2d(kernel_size=5, stride=1, padding=2, dilation=1, ceil_mode=False)
  )
  (10): Conv(
    (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (act): SiLU(inplace=True)
  )
  (11): Upsample(scale_factor=2.0, mode=nearest)
  (12): Concat()
  (13): C3(
    (cv1): Conv(
      (conv): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv2): Conv(
      (conv): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv3): Conv(
      (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )
        (cv2): Conv(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU(inplace=True)
        )
      )
    )
  )
  (14): Conv(
    (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
    (act): SiLU(inplace=True)
  )
  (15): Upsample(scale_factor=2.0, mode=nearest)
  (16): Concat()
  (17): C3(
    (cv1): Conv(
      (conv): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv2): Conv(
      (conv): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv3): Conv(
      (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )
        (cv2): Conv(
          (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU(inplace=True)
        )
      )
    )
  )
  (18): Conv(
    (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (act): SiLU(inplace=True)
  )
  (19): Concat()
  (20): C3(
    (cv1): Conv(
      (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv2): Conv(
      (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv3): Conv(
      (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )
        (cv2): Conv(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU(inplace=True)
        )
      )
    )
  )
  (21): Conv(
    (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (act): SiLU(inplace=True)
  )
  (22): Concat()
  (23): C3(
    (cv1): Conv(
      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv2): Conv(
      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (cv3): Conv(
      (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
      (act): SiLU(inplace=True)
    )
    (m): Sequential(
      (0): Bottleneck(
        (cv1): Conv(
          (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
          (act): SiLU(inplace=True)
        )
        (cv2): Conv(
          (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (act): SiLU(inplace=True)
        )
      )
    )
  )
  (24): Detect(
    (m): ModuleList(
      (0): Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
      (1): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
      (2): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
    )
  )
)
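As a sanity check on the Detect part of this structure, the grid sizes of the three heads follow from their strides. I assume the strides are 8, 16 and 32, consistent with the 80/40/20 grids in the Detect example above. The hypothetical helper below computes the per-head grids, the 255 output channels, and the total number of predictions that torch.cat(z, 1) produces:

```python
def head_shapes(h, w, strides=(8, 16, 32), na=3, nc=80):
    """Grid sizes per detection head, output channels per head, and total predictions (hypothetical helper)."""
    grids = [(h // s, w // s) for s in strides]  # letterboxed h, w are multiples of 32
    channels = na * (nc + 5)  # 3 * 85 = 255
    n_pred = sum(na * gy * gx for gy, gx in grids)
    return grids, channels, n_pred


print(head_shapes(640, 640))  # ([(80, 80), (40, 40), (20, 20)], 255, 25200)
print(head_shapes(448, 640))  # ([(56, 80), (28, 40), (14, 20)], 255, 17640)
```

Because the letterboxed input is not always 640×640, the number of raw predictions varies from image to image (25200 for a square input, fewer for a padded rectangle).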