Understanding the yolo.py File

Latest recommended article published 2024-07-25 11:07:00

a3188045002

Tags: YOLO

Article link: https://blog.csdn.net/a3188045002/article/details/138801806

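Today we'll walk through yolo.py, the file that builds the YOLO network model. It constructs the network from the YAML config file you provide, so if you want to modify the YOLOv5 model, you need a working understanding of the modules in this file.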

1. The parse_model function

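This function reads the network settings from a dictionary (the parsed YAML file) and builds the network from them. I'll go through the code in detail below.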

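def parse_model(d, ch):
    # Print the header row of the per-layer summary table
    LOGGER.info(f"\n{'':>3}{'from':>18}{'n':>3}{'params':>10}  {'module':<40}{'arguments':<30}")
    # Read the anchors and scaling parameters from the model dict:
    #   anchors:          predefined anchor boxes used to detect objects
    #   nc:               number of classes the model has to detect
    #   depth_multiple:   depth scaling factor (scales the repeat count n of each layer)
    #   width_multiple:   width scaling factor (scales the channel counts)
    #   activation:       optional activation function for the network layers
    #   channel_multiple: optional; channel counts are rounded to a multiple of this value
    anchors, nc, gd, gw, act, ch_mul = (
        d["anchors"],
        d["nc"],
        d["depth_multiple"],
        d["width_multiple"],
        d.get("activation"),
        d.get("channel_multiple"),
    )
    if act:
        # Use act as the default activation of every Conv, e.g. "nn.SiLU()"
        Conv.default_act = eval(act)
        LOGGER.info(f"{colorstr('activation:')} {act}")
    if not ch_mul:
        ch_mul = 8  # default: round channel counts to multiples of 8
    # Anchors per detection layer: each anchor is a (w, h) pair, so a row
    # such as [10, 13, 16, 30, 33, 23] holds 6 // 2 = 3 anchors
    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors
    # Output channels of the detection head: each anchor predicts nc class scores
    # plus 5 values (4 box coordinates + 1 objectness), so no = na * (nc + 5)
    no = na * (nc + 5)

    # layers: the modules built so far; save: indices of layers whose outputs must be kept;
    # c2: the current output channel count, initialized from the last element of ch
    layers, save, c2 = [], [], ch[-1]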

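The code above is only the setup before the network is built: it sets the default activation function, computes the number of anchors and the head's output channels, and creates the two empty lists layers and save, plus the running output channel count c2.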
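As a quick check of the na and no arithmetic, here is a minimal sketch that reproduces it outside YOLOv5, using the anchors and nc = 80 from the YAML file shown in this post. The function name anchor_counts is hypothetical, chosen only for illustration.

```python
def anchor_counts(anchors, nc):
    """Hypothetical helper: return (na, no) the way parse_model computes them."""
    # Each anchor is a (w, h) pair, so a row of 6 numbers holds 3 anchors.
    # An int is also accepted and taken directly as the anchor count.
    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors
    # Every anchor predicts 4 box values + 1 objectness score + nc class scores.
    no = na * (nc + 5)
    return na, no


anchors = [
    [10, 13, 16, 30, 33, 23],  # P3/8
    [30, 61, 62, 45, 59, 119],  # P4/16
    [116, 90, 156, 198, 373, 326],  # P5/32
]
print(anchor_counts(anchors, nc=80))  # (3, 255)
```

With 80 classes, each detection layer therefore outputs 3 × 85 = 255 channels.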

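    for i, (f, n, m, args) in enumerate(d["backbone"] + d["head"]):
        # Iterate over every row of the backbone and then the head;
        # each row has the form [from, number, module, args]
        m = eval(m) if isinstance(m, str) else m  # module name string -> Python class
        for j, a in enumerate(args):
            # Convert string arguments into Python objects (e.g. "nc" -> 80); strings that are
            # not defined names, such as "nearest", raise NameError and are left unchanged
            with contextlib.suppress(NameError):
                args[j] = eval(a) if isinstance(a, str) else a

        n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain: scale n by gd (0.33 here), keep at least 1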
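The two preprocessing steps in this loop (evaluating string arguments and applying the depth gain) can be tried in isolation. This is a sketch under the assumption that a global nc = 80 stands in for the value parse_model reads from the YAML; resolve_args and depth_gain are hypothetical names.

```python
import contextlib

nc = 80  # stands in for the nc that parse_model reads from the YAML


def resolve_args(args):
    """Hypothetical helper: evaluate string args; undefined names stay strings."""
    args = list(args)
    for j, a in enumerate(args):
        # eval("nc") finds the global nc; eval("nearest") raises NameError,
        # which is suppressed, so that element keeps its original string value
        with contextlib.suppress(NameError):
            args[j] = eval(a) if isinstance(a, str) else a
    return args


def depth_gain(n, gd):
    """Hypothetical helper: scale a repeat count by depth_multiple, keeping at least 1."""
    return max(round(n * gd), 1) if n > 1 else n


print(resolve_args(["None", 2, "nearest"]))  # [None, 2, 'nearest']
print(resolve_args(["nc", 1]))  # [80, 1]
print([depth_gain(n, 0.33) for n in (1, 3, 6, 9)])  # [1, 1, 2, 3]
```

With depth_multiple 0.33, the C3 rows that repeat 3, 6 and 9 times in the YAML end up repeating 1, 2 and 3 times. Note that eval runs arbitrary expressions, so this approach only suits trusted config files.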

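To make it easier to see how parse_model turns the YAML file into a network, here is the YAML file (its depth_multiple 0.33 and width_multiple 0.25 are the YOLOv5n settings):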

# Parameters
nc: 80 # number of classes
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.25 # layer channel multiple
anchors:
  - [10, 13, 16, 30, 33, 23] # P3/8
  - [30, 61, 62, 45, 59, 119] # P4/16
  - [116, 90, 156, 198, 373, 326] # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [
    [-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2
    [-1, 1, Conv, [128, 3, 2]], # 1-P2/4
    [-1, 3, C3, [128]],
    [-1, 1, Conv, [256, 3, 2]], # 3-P3/8
    [-1, 6, C3, [256]],
    [-1, 1, Conv, [512, 3, 2]], # 5-P4/16
    [-1, 9, C3, [512]],
    [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32
    [-1, 3, C3, [1024]],
    [-1, 1, SPPF, [1024, 5]], # 9
  ]

# YOLOv5 v6.0 head
head: [
    [-1, 1, Conv, [512, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],
    [[-1, 6], 1, Concat, [1]], # cat backbone P4
    [-1, 3, C3, [512, False]], # 13

    [-1, 1, Conv, [256, 1, 1]],
    [-1, 1, nn.Upsample, [None, 2, "nearest"]],
    [[-1, 4], 1, Concat, [1]], # cat backbone P3
    [-1, 3, C3, [256, False]], # 17 (P3/8-small)

    [-1, 1, Conv, [256, 3, 2]],
    [[-1, 14], 1, Concat, [1]], # cat head P4
    [-1, 3, C3, [512, False]], # 20 (P4/16-medium)

    [-1, 1, Conv, [512, 3, 2]],
    [[-1, 10], 1, Concat, [1]], # cat head P5
    [-1, 3, C3, [1024, False]], # 23 (P5/32-large)

    [[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)
  ]
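To see how the row numbers in the comments (# 13, # 17, ...) line up with the from fields, here is a small sketch that numbers the rows of the YAML above and resolves each from into absolute layer indices. FROM and absolute_inputs are hypothetical names; FROM simply copies the from column (10 backbone rows, then 15 head rows).

```python
# 'from' column of the YAML: 10 backbone rows (layers 0-9), then 15 head rows (10-24)
FROM = [-1] * 10 + [
    -1, -1, [-1, 6], -1,  # 10-13
    -1, -1, [-1, 4], -1,  # 14-17
    -1, [-1, 14], -1,  # 18-20
    -1, [-1, 10], -1,  # 21-23
    [17, 20, 23],  # 24: Detect
]


def absolute_inputs(i, f):
    """Hypothetical helper: map row i's 'from' to absolute layer indices."""
    fs = [f] if isinstance(f, int) else f
    # Negative values are relative to the current row; for layer 0, -1 means the input image
    return [i + x if x < 0 else x for x in fs]


for i, f in enumerate(FROM):
    if not isinstance(f, int):  # print only the multi-input rows
        print(i, absolute_inputs(i, f))
# 12 [11, 6]
# 16 [15, 4]
# 19 [18, 14]
# 22 [21, 10]
# 24 [17, 20, 23]
```

parse_model itself never builds this mapping explicitly: it indexes ch[f] directly, and because ch gets exactly one entry per layer, -1 naturally refers to the previous layer's channels.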

Next, the network is built from m (the module) and args (its arguments).

        if m in {
            Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus,
            CrossConv, BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x,
        }:

If the module is one of the above, its input/output channels and arguments are computed as follows:

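            c1, c2 = ch[f], args[0]
            # c1: input channels of this layer (the output channels of layer f)
            # c2: output channels of this layer, taken from args[0]
            if c2 != no:  # skip scaling only for the final output layer
                c2 = make_divisible(c2 * gw, ch_mul)
            # width gain: multiply c2 by gw, then round up to the nearest multiple of ch_mul

            args = [c1, c2, *args[1:]]
            # new args: c1, c2, then every element of the original args except the first
            if m in {BottleneckCSP, C3, C3TR, C3Ghost, C3x}:
                args.insert(2, n)
                # insert the repeat count n at index 2; these modules repeat internally,
                # so n is reset to 1 and the outer code builds the module only once
                n = 1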

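Let's take one C3 row from the YAML as an example, [-1, 3, C3, [128]]: f = -1, n = 3, m = C3, args = [128]. So c1 (the input channels) is ch[-1], the output channel count of the previous layer, and c2 (the output channels) is args[0] = 128. The code then checks whether c2 is the final output width no; since it is not, the width multiple is applied. Next, args is rebuilt as [c1, c2]; *args[1:] is empty here because args = [128] has only one element. Finally the repeat count n is inserted at index 2, giving [c1, c2, n]. Ignoring the multipliers this would be [c1, 128, 3]; with the YAML above (depth_multiple 0.33, width_multiple 0.25), n = max(round(3 × 0.33), 1) = 1, c2 = make_divisible(128 × 0.25, 8) = 32, and c1 is also 32 (layer 1's 128 channels scaled the same way), so the final args are [32, 32, 1]. The remaining branches work in much the same way, so I won't go through each of them.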
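The worked C3 example can be reproduced with a few lines of Python. This sketch assumes make_divisible rounds up to the next multiple of the divisor (which is what YOLOv5's helper does); c3_args is a hypothetical name that just repeats the steps of the branch above for one row.

```python
import math


def make_divisible(x, divisor):
    """Round x up to the nearest multiple of divisor (assumed to match YOLOv5's helper)."""
    return math.ceil(x / divisor) * divisor


def c3_args(c1, row_args, n, gw, gd, ch_mul=8):
    """Hypothetical helper: rebuild a C3 row's args the way parse_model does."""
    n = max(round(n * gd), 1) if n > 1 else n  # depth gain
    c2 = make_divisible(row_args[0] * gw, ch_mul)  # width gain (C3 is never the output layer)
    args = [c1, c2, *row_args[1:]]
    args.insert(2, n)  # the repeat count goes in at index 2
    return args


c1 = make_divisible(128 * 0.25, 8)  # layer 1's output channels after scaling: 32
print(c3_args(c1, [128], 3, gw=0.25, gd=0.33))  # [32, 32, 1]
print(c3_args(256, [512, False], 3, gw=0.25, gd=0.33))  # [256, 128, 1, False]
```

The second call shows a head row such as [-1, 3, C3, [512, False]]: the extra False (shortcut) is kept after the inserted repeat count.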

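        elif m is nn.BatchNorm2d:
            # BatchNorm only needs the output channel count of its input layer
            args = [ch[f]]
        elif m is Concat:
            # Concat stacks its inputs, so its output channels are the sum over all layers in f
            c2 = sum(ch[x] for x in f)
        elif m in {Detect, Segment}:
            args.append([ch[x] for x in f])
            # append the channel count of every input layer in f to args
            if isinstance(args[1], int):  # number of anchors
                # anchors given as a count: build that many placeholder anchors per level
                args[1] = [list(range(args[1] * 2))] * len(f)
            if m is Segment:
                # scale the mask prototype channels (args[3]) by the width multiple
                args[3] = make_divisible(args[3] * gw, ch_mul)
        elif m is Contract:
            # space-to-depth: channels grow by gain ** 2
            c2 = ch[f] * args[0] ** 2
        elif m is Expand:
            # depth-to-space: channels shrink by gain ** 2
            c2 = ch[f] // args[0] ** 2
        else:
            # any other module (e.g. nn.Upsample) keeps the input channel count
            c2 = ch[f]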
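The channel rules in these elif branches can be summarized in a small pure-Python sketch. Module kinds are passed as plain strings here instead of classes, and out_channels and detect_anchor_arg are hypothetical helper names; ch[k] is layer k's output channel count, as in parse_model.

```python
def out_channels(kind, f, args, ch):
    """Hypothetical helper: c2 for the non-convolution branches of parse_model."""
    if kind == "Concat":
        return sum(ch[x] for x in f)  # inputs are stacked along the channel axis
    if kind == "Contract":
        return ch[f] * args[0] ** 2  # space-to-depth with the given gain
    if kind == "Expand":
        return ch[f] // args[0] ** 2  # depth-to-space, the inverse of Contract
    return ch[f]  # final else branch: channel count unchanged


def detect_anchor_arg(na, num_levels):
    """Hypothetical helper: placeholder anchors when the YAML gives only a count."""
    return [list(range(na * 2))] * num_levels


ch = [16, 32, 64]  # toy channel list for three earlier layers
print(out_channels("Concat", [-1, 0], [1], ch))  # 80
print(out_channels("Contract", -1, [2], ch))  # 256
print(out_channels("Upsample", -1, [None, 2, "nearest"], ch))  # 64
print(detect_anchor_arg(3, 3))  # [[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]]
```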

Next, the module for the current layer is built from the processed args, followed by some bookkeeping.

        # m_: the module for this layer; if n > 1, stack n copies of m in an nn.Sequential, otherwise build one m
        m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)

        # Record some basic information about this layer
        t = str(m)[8:-2].replace("__main__.", "")  # module type, e.g. 'models.common.Focus'
        np = sum(x.numel() for x in m_.parameters())  # number of parameters in this layer
        m_.i, m_.f, m_.type, m_.np = i, f, t, np  # attach index, 'from' index, type, number of params
        # print one summary row; n_ is the depth-scaled repeat count (n may have been reset to 1)
        LOGGER.info("%3s%18s%3s%10.0f  %-40s%-30s" % (i, f, n_, np, t, args))

        # Append to the save list every 'from' index that is not -1 (x % i turns negative
        # indices into absolute ones): here [6, 4, 14, 10, 17, 20, 23]
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)

        # Add this layer's module to layers
        layers.append(m_)

        if i == 0:
            ch = []  # drop the input channels [3]

        # Record this layer's output channel count in ch
        ch.append(c2)

    return nn.Sequential(*layers), sorted(save)
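Two of the bookkeeping steps above are easy to check without PyTorch: slicing the type name out of str(m), and building the save list. In this sketch, module_type and build_save are hypothetical helper names, and FROM copies the from column of the YAML in this post.

```python
import collections


def module_type(m):
    """Hypothetical helper: the same slicing parse_model applies to str(m)."""
    # str(int) is "<class 'int'>"; [8:-2] drops the leading "<class '" and the trailing "'>".
    # For a class defined in a script, the "__main__." prefix is stripped as well.
    return str(m)[8:-2].replace("__main__.", "")


def build_save(froms):
    """Hypothetical helper: collect every non -1 'from' index, as parse_model does."""
    save = []
    for i, f in enumerate(froms):
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)
    return sorted(save)


# 'from' column of the YAML: 10 backbone rows, then 15 head rows
FROM = [-1] * 10 + [-1, -1, [-1, 6], -1, -1, -1, [-1, 4], -1,
                    -1, [-1, 14], -1, -1, [-1, 10], -1, [17, 20, 23]]

print(module_type(collections.OrderedDict))  # collections.OrderedDict
print(build_save(FROM))  # [4, 6, 10, 14, 17, 20, 23]
```

During the forward pass, only the outputs of the layers listed in save have to be kept for later Concat and Detect layers.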

That concludes the walkthrough of parse_model.

2. The Detect class

Next comes the Detect class. It is used in the last layer of the YAML head and is one of the most important parts of the model.

class Detect(nn.Module):
    # YOLOv5 detection head: converts the input feature maps into predictions of the required shape
    stride = None  # strides computed during build
    dynamic = False  # force grid reconstruction
    export = False  # export mode

    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):
        """Initializes YOLOv5 detection layer with specified classes, anchors, channels, and inplace operations."""
        super().__init__()
        self.nc = nc  # number of classes
        self.no = nc + 5  # outputs per anchor: nc class scores + 5 (4 box coordinates + 1 objectness)
        self.nl = len(anchors)  # number of detection layers (feature levels)
        self.na = len(anchors[0]) // 2  # number of anchors per detection layer
        self.grid = [torch.empty(0) for _ in range(self.nl)]  # init grid
        self.anchor_grid = [torch.empty(0) for _ in range(self.nl)]  # init anchor grid
        # Convert the anchor list to a float tensor of shape (nl, na, 2) and register it as a buffer:
        # it is saved with the model and moved by .to(device), but is not a trainable parameter
        self.register_buffer("anchors", torch.tensor(anchors).float().view(self.nl, -1, 2))
        # One 1x1 convolution per input level: x is that level's input channel count (from ch),
        # and self.no * self.na is its output channel count
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)
        self.inplace = inplace  # use inplace ops (e.g. slice assignment)

That covers how the Detect layer is constructed.
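To make the shapes set up in Detect.__init__ concrete, here is a pure-Python sketch that needs no PyTorch. It assumes the YOLOv5n widths from the YAML above, so the inputs from layers 17, 20 and 23 have 256, 512 and 1024 × 0.25 = 64, 128 and 256 channels. detect_shapes is a hypothetical helper, and anchors stay ints here, whereas the real buffer is a float tensor.

```python
ANCHORS = [
    [10, 13, 16, 30, 33, 23],  # P3/8
    [30, 61, 62, 45, 59, 119],  # P4/16
    [116, 90, 156, 198, 373, 326],  # P5/32
]


def detect_shapes(nc, anchors, ch):
    """Hypothetical helper: the sizes Detect.__init__ derives, as plain Python values."""
    no = nc + 5  # outputs per anchor: 4 box values + 1 objectness + nc classes
    nl = len(anchors)  # number of detection layers (feature levels)
    na = len(anchors[0]) // 2  # anchors per level, each a (w, h) pair
    # Nested-list equivalent of torch.tensor(anchors).float().view(nl, -1, 2)
    anchor_pairs = [[row[k:k + 2] for k in range(0, len(row), 2)] for row in anchors]
    # One 1x1 conv per level, described as (in_channels, out_channels)
    convs = [(x, no * na) for x in ch]
    return no, nl, na, anchor_pairs, convs


no, nl, na, pairs, convs = detect_shapes(80, ANCHORS, (64, 128, 256))
print(no, nl, na)  # 85 3 3
print(pairs[0])  # [[10, 13], [16, 30], [33, 23]]
print(convs)  # [(64, 255), (128, 255), (256, 255)]
```

Every level gets the same 255 output channels regardless of its input width, which is why the three predictions can later be reshaped the same way.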