[YOLOv5 Study Notes] Xiahou Nanxi's Study Notes

songyuc

Last modified 2023-01-13 15:58:50


First published 2021-01-21 18:09:59

Article link: https://blog.csdn.net/songyuc/article/details/112968864


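1 Study Materials

Blog post "YOLOv5网络详解_太阳花的小绿豆的博客-CSDN博客" (YOLOv5 network explained):

Sample matching strategy
4.3 Eliminating grid sensitivity: a modification of the YOLOv4 encoding scheme
4.1 Loss computation: an analysis of the loss function

The detailed model structure diagram from the 太阳花 blog post:
[Figure: detailed YOLOv5 model structure diagram]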

2 Technique Overview

Technique	Hyperparameter
Epoch	300
Input size	640
Down sample	NearestSample (+`C3`)
Norm	BN
Act	SiLU
Loss
LR schedule	Warmup + CosineLR
Conf thresh	0.25
Weight smooth	EMA

3 Model Architecture Analysis

Feature map sizes

Layer	OutSize	Channels	N ($\times0.25$)	S ($\times0.5$)
Image	$640\times640$	3	3	3
P1	$320\times320$	64	16	32
P2	$160\times160$	128	32	64
P3 (D1)	$80\times80$	256	64	128
P4 (D2)	$40\times40$	512	128	256
P5 (D3)	$20\times20$	1024	256	512

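3.1 Downsampling

Focus: a downsampling operation that works like the inverse of a super-resolution upsampling step

In YOLOv5 r6.1, Glenn Jocher uses a $6\times6$ convolution instead of Focus for downsampling; the following reasons were given on GitHub:

The new $6\times6$ convolution can, in theory, be made equivalent to Focus;
On some compute cards (such as V100/A100), the $6\times6$ convolution is faster;
The main reason for the replacement: the $6\times6$ convolution is easier to export than Focus;

Here we borrow the Focus diagram from the blog post "YOLOv5网络详解_太阳花的小绿豆的博客" to get an intuitive picture of what Focus does:
[Figure: Focus slicing diagram]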
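
The slicing that Focus performs can be sketched on a single channel with plain Python lists. This is only an illustration: `focus_slice` is a hypothetical helper, and the order of the four slices follows what we recall of the repository's `Focus.forward`, so treat the order as an assumption. In the real module the four slices are concatenated along the channel axis and then passed through a `Conv`.

```python
def focus_slice(channel):
    """Space-to-depth on one channel given as a list of rows (H and W must be even)."""
    return [
        [row[0::2] for row in channel[0::2]],  # even rows, even columns
        [row[0::2] for row in channel[1::2]],  # odd rows, even columns
        [row[1::2] for row in channel[0::2]],  # even rows, odd columns
        [row[1::2] for row in channel[1::2]],  # odd rows, odd columns
    ]


channel = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
slices = focus_slice(channel)  # four 2x2 maps from one 4x4 map
for s in slices:
    print(s)
```

Every pixel is kept: the spatial size halves in each direction while the channel count grows by a factor of 4, which is why a $640\times640\times3$ image becomes $320\times320\times12$ before the first convolution.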

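3.2 Loss Function: DIoU-Loss

YOLOv5 uses the same box-regression loss as YOLOv4, a loss from the DIoU family (the repository code calls `bbox_iou(..., CIoU=True)`, i.e. CIoU, which adds an aspect-ratio term to DIoU);
Before studying DIoU-Loss, let us first review how the losses for the regression task have evolved:
Smooth L1 Loss (used by Faster RCNN, 2015) $\rightarrow$ IoU Loss (2016)
$\rightarrow$ GIoU Loss (2019) $\rightarrow$ DIoU Loss (2020) $\rightarrow$ CIoU Loss (2020)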
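
To make the DIoU step in this history concrete, here is a minimal pure-Python sketch of $\mathrm{DIoU} = \mathrm{IoU} - \rho^2 / c^2$, where $\rho$ is the distance between the two box centres and $c$ is the diagonal of the smallest box enclosing both. The function name `diou` and the `(x1, y1, x2, y2)` box layout are our own choices for illustration, not the repository's API, and boxes are assumed to have positive area.

```python
def diou(box_a, box_b):
    """DIoU of two boxes given as (x1, y1, x2, y2) with positive area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Squared distance between the box centres (rho^2)
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest enclosing box (c^2)
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2

    return iou - rho2 / c2


def diou_loss(box_a, box_b):
    return 1.0 - diou(box_a, box_b)


overlapping = diou((0, 0, 2, 2), (1, 1, 3, 3))
disjoint = diou((0, 0, 1, 1), (2, 2, 3, 3))
print(overlapping, disjoint)
```

Unlike plain IoU, the value for the disjoint pair is negative rather than zero, so the loss still tells the optimizer to pull the centres together when the boxes do not overlap.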

4 Code Notes

4.1 Data Format

Annotations use the YOLO format, i.e. the box coordinates are normalized relative to the image size;
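
As a small illustration of this normalization, the sketch below converts a pixel-space box into one YOLO label line of the form `class x_center y_center width height`. The helper name `to_yolo_label` and the six-decimal formatting are assumptions made for this example.

```python
def to_yolo_label(class_id, box_xyxy, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) into a normalized YOLO label line."""
    x1, y1, x2, y2 = box_xyxy
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


line = to_yolo_label(0, (160, 120, 480, 360), img_w=640, img_h=480)
print(line)
```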

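Dataset split

The dataset split can be specified with file lists:
that is, create .txt files under the yolo_par_dir directory that list the files of the train set and the val set;
[Figure: the train/val list files]
Paths in these files may be relative, in the form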

../dataset/image.jpg
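
One possible way to produce such list files is sketched below. The helper `write_split_lists`, the file names `train.txt` / `val.txt`, the 80/20 ratio and the image names are all hypothetical choices for this example; the notes only require one path per line.

```python
import random
import tempfile
from pathlib import Path


def write_split_lists(image_paths, out_dir, val_fraction=0.2, seed=0):
    """Shuffle the paths and write train.txt / val.txt with one path per line."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)  # fixed seed keeps the split reproducible
    n_val = max(1, int(len(paths) * val_fraction))
    splits = {"val": paths[:n_val], "train": paths[n_val:]}
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, items in splits.items():
        (out_dir / f"{name}.txt").write_text("\n".join(items) + "\n")
    return splits


# A temporary directory stands in for yolo_par_dir here
with tempfile.TemporaryDirectory() as tmp:
    images = [f"../dataset/image_{i:03d}.jpg" for i in range(10)]
    write_split_lists(images, tmp)
    train_lines = (Path(tmp) / "train.txt").read_text().splitlines()
    val_lines = (Path(tmp) / "val.txt").read_text().splitlines()

print(len(train_lines), len(val_lines))
```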

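4.2 File Structure

Dataset directory layout

The dataset folder sits at the same level as the yolov5 project folder, i.e. one level above the code files;

4.3 Configuration Files

For the configuration files, see "YOLOv5从入门到部署之：配置与初始化超参"
Here we use yolov5x.yaml as the example:

Model configuration: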

# Parameters
nc: 80  # number of classes
depth_multiple: 1.33  # model depth multiple
width_multiple: 1.25  # layer channel multiple
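# The dm and wm parameters determine the model size; they are the main difference between yolov5s and larger models such as yolov5x
# width_multiple must be taken into account later when computing the number of convolution kernels;
anchors: # feature level / downsampling stride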
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

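Layer configuration format: [in, n, Module, args], where in is the index of the input layer (`from` in the yaml) and n is the number of times the layer is repeated;
depth_multiple: the model depth factor, i.e. the multiplier that controls how many times a layer is repeated, computed as dm*n;
width_multiple: the channel-width factor, which controls the number of output channels, i.e. wm*out.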
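
The two multipliers can be applied to a single layer entry as in the sketch below. It assumes the rounding rules we recall from the repository's `parse_model` (round the repeat count, keep at least 1, and round the channel count up to a multiple of 8); `scale_layer` itself is a hypothetical helper written for this example.

```python
import math


def make_divisible(x, divisor):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(n, out_channels, depth_multiple, width_multiple):
    """Return (repeats, channels) of one layer after applying dm and wm."""
    n_scaled = max(round(n * depth_multiple), 1) if n > 1 else n
    c_scaled = make_divisible(out_channels * width_multiple, 8)
    return n_scaled, c_scaled


# yolov5x.yaml values from above: dm = 1.33, wm = 1.25
print(scale_layer(9, 256, 1.33, 1.25))   # the "[-1, 9, C3, [256]]" entry
print(scale_layer(1, 1024, 1.33, 1.25))  # the "[-1, 1, Conv, [1024, 3, 2]]" entry
```

With a width multiple of 0.25 or 0.5 the same rule gives the N and S channel columns of the feature-map table in section 3.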

Module configuration:

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, C3, [1024, False]],  # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3; [-1, 4] means the layers with index -1 and 4 are the inputs
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

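4.4 YOLOv5 Operators

`Conv`: convolution module

Configuration: Conv, [channel, kernel, stride, padding] [SOURCE]

import torch.nn as nn


def autopad(k, p=None):  # kernel, padding
    # Same-padding helper from the repo's models/common.py: pad by k // 2 when p is not given
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p


class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        # autopad(k, p): the padding; it defaults to same-padding, which matters when re-implementing the model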
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        return self.act(self.conv(x))

Parameters:

c1: number of input channels.
c2: number of output channels.
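
The effect of the default same-padding on the feature-map size can be checked with the standard convolution output-size formula, $\lfloor (n + 2p - k) / s \rfloor + 1$. The helper `conv_out_size` below is a hypothetical name used only for this sketch; it assumes a square input and an integer kernel size.

```python
def conv_out_size(size, k, s=1, p=None):
    """Spatial output size of a convolution; p defaults to k // 2 (same-padding)."""
    if p is None:
        p = k // 2
    return (size + 2 * p - k) // s + 1


print(conv_out_size(640, k=3, s=2))       # a stride-2 Conv halves the size
print(conv_out_size(640, k=6, s=2, p=2))  # the 6x6 stem convolution that replaces Focus
print(conv_out_size(80, k=1))             # a 1x1 Conv keeps the size
```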

`C3`: CSP Bottleneck with 3 convolutions

Configuration: [input_idx, n_bottlenecks, C3, [out_channels, if_shortcut]] [SOURCE]

Note
The second entry, n, is here the number of times the Bottleneck is repeated [code]

Parameters

c1 (int) – number of input channels
c2 (int) – number of output channels

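`SPPF`: an SPP built from several `MaxPool2d` layers in series

SPPF (SPP Fast) is an improved version of SPP with faster inference;
its structure is shown in the figure (diagram from "YOLOv5网络详解_太阳花的小绿豆的博客")
[Figure: SPPF structure diagram]
The figure clearly shows the serial structure of SPPF;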
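
Why the serial structure can stand in for the parallel pools of SPP can be seen in one dimension: with stride 1 and same-padding, two chained 5-wide max pools cover the same window as one 9-wide pool, and three cover a 13-wide window. The sketch below is a 1D illustration with a hypothetical helper `maxpool1d_same`; it truncates the window at the borders, which gives the same result as padding with negative infinity.

```python
def maxpool1d_same(values, k):
    """Stride-1 max pooling with same-padding over a list of numbers."""
    r = k // 2
    n = len(values)
    return [max(values[max(0, i - r): min(n, i + r + 1)]) for i in range(n)]


x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]

y1 = maxpool1d_same(x, 5)   # one 5-wide pool
y2 = maxpool1d_same(y1, 5)  # two in series, same window as k = 9
y3 = maxpool1d_same(y2, 5)  # three in series, same window as k = 13

print(y2 == maxpool1d_same(x, 9))
print(y3 == maxpool1d_same(x, 13))
```

Since max pooling over a square window is separable into a row pass and a column pass, the same argument carries over to the 2D `MaxPool2d` case, and SPPF can reuse each intermediate result instead of running three large pools.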

`Detect`: produces the final predictions (including the decoding step)

Configuration: [[input1_idx, input2_idx, input3_idx], n, Detect, [nc, anchors]] [SOURCE]
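
The decoding step can be sketched for a single anchor in a single grid cell. This assumes the YOLOv5 box encoding as we understand it (the grid-sensitivity fix mentioned in section 1): $xy = (2\sigma(t_{xy}) - 0.5 + \text{grid}) \cdot \text{stride}$ and $wh = (2\sigma(t_{wh}))^2 \cdot \text{anchor}$. The function names are hypothetical and the code works on scalars rather than tensors.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def outputs_per_cell(nc, na=3):
    """Channels predicted per grid cell: (x, y, w, h, objectness + nc classes) per anchor."""
    return na * (nc + 5)


def decode_box(tx, ty, tw, th, grid_x, grid_y, anchor_w, anchor_h, stride):
    """Map raw outputs of one anchor in one cell to a pixel-space (xc, yc, w, h) box."""
    xc = (2.0 * sigmoid(tx) - 0.5 + grid_x) * stride
    yc = (2.0 * sigmoid(ty) - 0.5 + grid_y) * stride
    w = (2.0 * sigmoid(tw)) ** 2 * anchor_w
    h = (2.0 * sigmoid(th)) ** 2 * anchor_h
    return xc, yc, w, h


# Zero logits on the P3/8 level with the first anchor (10, 13) from the yaml above
box = decode_box(0, 0, 0, 0, grid_x=3, grid_y=4, anchor_w=10, anchor_h=13, stride=8)
print(outputs_per_cell(nc=80), box)
```

Zero logits therefore decode to the cell centre with exactly the anchor size; the centre offset can range over $(-0.5, 1.5)$ cells and the size over $(0, 4)$ times the anchor.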

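5 Data Loading

The image/video loading class LoadImages:

import glob
import os
from pathlib import Path

import cv2
import numpy as np

# img_formats, vid_formats and letterbox are defined alongside this class in the repo's utils/datasets.py


class LoadImages:  # for inference
    def __init__(self, path, img_size=640):
        p = str(Path(path))  # os-agnostic
        p = os.path.abspath(p)  # absolute path
        if '*' in p:  # a glob pattern such as "path/*.jpg" was given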
            files = sorted(glob.glob(p, recursive=True))  # glob
        elif os.path.isdir(p):
            files = sorted(glob.glob(os.path.join(p, '*.*')))  # dir
        elif os.path.isfile(p):
            files = [p]  # files
        else:
            raise Exception(f'ERROR: {p} does not exist')
        # img_formats lists the supported image file extensions
        images = [x for x in files if x.split('.')[-1].lower() in img_formats]
        videos = [x for x in files if x.split('.')[-1].lower() in vid_formats]
        ni, nv = len(images), len(videos)

        self.img_size = img_size
        self.files = images + videos
        self.nf = ni + nv  # number of files
        self.video_flag = [False] * ni + [True] * nv
        self.mode = 'image'
        if any(videos):
            self.new_video(videos[0])  # new video
        else:
            self.cap = None
        assert self.nf > 0, f'No images or videos found in {p}. ' \
                            f'Supported formats are:\nimages: {img_formats}\nvideos: {vid_formats}'

    def __iter__(self):
        self.count = 0
        return self

    def __next__(self):
        if self.count == self.nf:
            raise StopIteration
        path = self.files[self.count]

        if self.video_flag[self.count]:
            # Read video
            self.mode = 'video'
            ret_val, img0 = self.cap.read()
            if not ret_val:
                self.count += 1
                self.cap.release()
                if self.count == self.nf:  # last video
                    raise StopIteration
                else:
                    path = self.files[self.count]
                    self.new_video(path)
                    ret_val, img0 = self.cap.read()

            self.frame += 1
            print(f'video {self.count + 1}/{self.nf} ({self.frame}/{self.nframes}) {path}: ', end='')

        else:
            # Read image
            self.count += 1
            img0 = cv2.imread(path)  # BGR
            assert img0 is not None, 'Image Not Found ' + path
            print(f'image {self.count}/{self.nf} {path}: ', end='')

        # Padded resize
        img = letterbox(img0, new_shape=self.img_size)[0]

        # Convert
        img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, to 3x416x416
        img = np.ascontiguousarray(img)

        return path, img, img0, self.cap

    def new_video(self, path):
        self.frame = 0
        self.cap = cv2.VideoCapture(path)
        self.nframes = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT))

    def __len__(self):
        return self.nf  # number of files

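5.1 Data Preprocessing

Check that img_size is a multiple of the stride s:

import math


def make_divisible(x, divisor):
    # Round x up to the nearest multiple of divisor
    return math.ceil(x / divisor) * divisor


def check_img_size(img_size, s=32):
    # Verify img_size is a multiple of stride s
    new_size = make_divisible(img_size, int(s))  # ceil gs-multiple
    # new_size is the smallest multiple of s that is >= img_size
    if new_size != img_size:
        print('WARNING: --img-size %g must be multiple of max stride %g, updating to %g' % (img_size, s, new_size))
        # %g prints the number in compact general format (no trailing zeros)
    return new_size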

letterbox preprocessing:

import cv2
import numpy as np


def letterbox(img, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True):
    # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
    # Get the size of the current image
    shape = img.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
        # new_shape is the argument passed in by the caller, e.g.
        # img = letterbox(img0, new_shape=self.img_size)[0]

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better test mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, 32), np.mod(dh, 32)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return img, ratio, (dw, dh)
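
The arithmetic inside letterbox can be followed without OpenCV. The sketch below reproduces only the size and padding computation for a square target size; `letterbox_geometry` is a hypothetical helper, and it leaves out the `scaleFill` and `scaleup` options.

```python
def letterbox_geometry(height, width, new_shape=640, auto=True, stride=32):
    """Return ((resized_h, resized_w), (top, bottom, left, right)) as letterbox would compute them."""
    r = min(new_shape / height, new_shape / width)  # scale ratio (new / old)
    new_w, new_h = int(round(width * r)), int(round(height * r))
    dw, dh = new_shape - new_w, new_shape - new_h   # total padding per axis
    if auto:                                        # minimum rectangle: pad only up to a stride multiple
        dw, dh = dw % stride, dh % stride
    dw, dh = dw / 2, dh / 2                         # split the padding between the two sides
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    return (new_h, new_w), (top, bottom, left, right)


resized, pads = letterbox_geometry(720, 1280)  # a 1280x720 frame
final = (resized[0] + pads[0] + pads[1], resized[1] + pads[2] + pads[3])
print(resized, pads, final)
```

With `auto=True` the frame is padded only to $384\times640$ (a multiple of 32), whereas `auto=False` pads it to the full $640\times640$ square. The `± 0.1` terms make an odd total padding split into two integers that differ by one.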

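6 Model Loading

The model-loading function attempt_load()

import torch
import torch.nn as nn

# Ensemble, Conv and attempt_download come from the YOLOv5 repository


def attempt_load(weights, map_location=None):
    # Loads an ensemble of models weights=[a,b,c] or a single model weights=[a] or weights=a
    # i.e. several models can be combined into one ensemble
    model = Ensemble()
    for w in weights if isinstance(weights, list) else [weights]:
        # Try to download the weights if they are not available locally
        attempt_download(w)
        # Load each model and append it to the ensemble
        model.append(torch.load(w, map_location=map_location)['model'].float().fuse().eval())  # load FP32 model
        # torch.load() unpickles the whole checkpoint, including the model object; this differs from
        # nn.Module.load_state_dict(), which only copies weights into an existing module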

    # Compatibility updates
    for m in model.modules():
        if type(m) in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU]:
            m.inplace = True  # pytorch 1.7.0 compatibility
        elif type(m) is Conv:
            m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility

    if len(model) == 1:
        return model[-1]  # return model
    else:
        print('Ensemble created with %s\n' % weights)
        for k in ['names', 'stride']:
            setattr(model, k, getattr(model[-1], k))
        return model  # return ensemble

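7 Inference: Forward Pass

Usage:
python detect.py
Command-line arguments:

--weights: path(s) to the weight file(s);
--source: the image source;
--exist-ok: (unused here) allow reusing an existing run directory with the same name;

Main entry point:

import argparse

import torch

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default='yolov5s.pt', help='model.pt path(s)')
    # Selects the weight file(s);
    # nargs='+': at least one value must be given
    parser.add_argument('--source', type=str, default='data/images', help='source')
    # file/folder, 0 for webcam
    # The source to run detection on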
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='display results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default='runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    opt = parser.parse_args()
    print(opt)
    check_requirements()

    with torch.no_grad():
        if opt.update:  # update all models (to fix SourceChangeWarning)
            for opt.weights in ['yolov5s.pt', 'yolov5m.pt', 'yolov5l.pt', 'yolov5x.pt']:
                detect()
                strip_optimizer(opt.weights)
        else:
            detect()
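
How `nargs='+'` and the dashed option names behave can be checked with a reduced parser. The sketch below copies three of the arguments above; the file names `a.pt` and `b.pt` are placeholders.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--weights', nargs='+', type=str, default='yolov5s.pt', help='model.pt path(s)')
parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')

# No arguments: the default is used as-is, so weights stays a plain string
defaults = parser.parse_args([])
# Explicit arguments: nargs='+' collects the values into a list
custom = parser.parse_args(['--weights', 'a.pt', 'b.pt', '--conf-thres', '0.4', '--save-txt'])

print(defaults.weights, defaults.conf_thres, defaults.save_txt)
print(custom.weights, custom.conf_thres, custom.save_txt)
```

Because `opt.weights` is a string by default and a list when given on the command line, attempt_load() accepts both forms (`weights=a` and `weights=[a]`). Dashes in option names become underscores in the attribute names, e.g. `--conf-thres` is read as `opt.conf_thres`.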