[YOLOv5 Study Notes] Study notes by 夏侯南溪

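1 Study materials

Blog post 《YOLOv5网络详解》 ("YOLOv5 Network Explained") from the CSDN blog of 太阳花的小绿豆

  • Sample matching strategy
  • 4.3 Eliminating grid sensitivity: changes to the YOLOv4 box encoding
  • 4.1 Loss computation: analysis of the loss function

Detailed model structure diagram from that blog post:
[Figure: YOLOv5 model structure diagram]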

2 Techniques at a glance

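| Technique     | Hyperparameter      |
|---------------|---------------------|
| Epoch         | 300                 |
| Input size    | 640                 |
| Down sample   | NearestSample (+C3) |
| Norm          | BN                  |
| Act           | SiLU                |
| Loss          |                     |
| LR schedule   | Warmup + CosineLR   |
| Conf thresh   | 0.25                |
| Weight smooth | EMA                 |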

3 Model structure analysis

Feature map sizes at a glance

| Layer   | Out size | Channels | N (×0.25) | S (×0.5) |
|---------|----------|----------|-----------|----------|
| Image   | 640×640  | 3        | 3         | 3        |
| P1      | 320×320  | 64       | 16        | 32       |
| P2      | 160×160  | 128      | 32        | 64       |
| P3 (D1) | 80×80    | 256      | 64        | 128      |
| P4 (D2) | 40×40    | 512      | 128       | 256      |
| P5 (D3) | 20×20    | 1024     | 256       | 512      |
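
The rows of this table follow from simple arithmetic, which the sketch below reproduces. It assumes a 640×640 input, that stage P_i has stride 2^i, and that the N and S columns are the base channel counts multiplied by 0.25 and 0.5. `feature_map_table` and `BASE_CHANNELS` are hypothetical names used only for illustration, not part of YOLOv5.

```python
# Hypothetical helper: regenerate the P1..P5 rows of the feature map table.
BASE_CHANNELS = {1: 64, 2: 128, 3: 256, 4: 512, 5: 1024}  # "Channels" column

def feature_map_table(input_size=640, width_multiples=(1.0, 0.25, 0.5)):
    rows = []
    for level, base in BASE_CHANNELS.items():
        stride = 2 ** level                  # P1/2, P2/4, ..., P5/32
        size = input_size // stride          # spatial side length
        channels = [int(base * wm) for wm in width_multiples]
        rows.append((f"P{level}", size, *channels))
    return rows

for name, size, c_base, c_n, c_s in feature_map_table():
    print(f"{name}: {size}x{size}  channels={c_base}  N={c_n}  S={c_s}")
```

Passing a different `input_size` (for example 1280) shows how every stage grows with the input while the channel counts stay fixed.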

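3.1 Downsampling

Focus: a downsampling step that resembles super-resolution run in reverse (spatial pixels are rearranged into channels)

In YOLOv5 r6.1, Glenn Jocher uses a 6×6 convolution in place of Focus for downsampling; the rationale was discussed on GitHub.

Here we borrow the Focus diagram from the blog post 《YOLOv5网络详解_太阳花的小绿豆的博客》 to build intuition for what Focus does:
[Figure: Focus slicing diagram]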
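
To make the Focus rearrangement concrete, here is a minimal NumPy sketch of the slicing step only. It assumes a single image in (C, H, W) layout with even H and W; `focus_slice` is a hypothetical name, and the slice order is an assumption modeled on the Focus module. In the model the result would then be fed to a convolution.

```python
import numpy as np

def focus_slice(x):
    """Space-to-depth: (C, H, W) -> (4C, H/2, W/2); H and W must be even."""
    return np.concatenate(
        [x[:, ::2, ::2],     # even rows, even cols
         x[:, 1::2, ::2],    # odd rows, even cols
         x[:, ::2, 1::2],    # even rows, odd cols
         x[:, 1::2, 1::2]],  # odd rows, odd cols
        axis=0,
    )

x = np.arange(3 * 4 * 4).reshape(3, 4, 4)
y = focus_slice(x)
print(x.shape, "->", y.shape)  # (3, 4, 4) -> (12, 2, 2)
```

No pixel is discarded: the spatial resolution halves while the channel count quadruples, which is why the operation reads as the inverse of a sub-pixel upsampling step.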

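3.2 Loss function: DIoU-Loss

YOLOv5 keeps the same box-regression loss family as YOLOv4, described in these notes as DIoU-Loss (the YOLOv5 reference code enables the CIoU variant, which adds an aspect-ratio term to DIoU).
Before studying DIoU-Loss, let us review how the loss for the regression task has evolved:
Smooth L1 Loss (used by Faster R-CNN, 2015) → IoU Loss (2016)
→ GIoU Loss (2019) → DIoU Loss (2020) → CIoU Loss (2020)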
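
As a reference point for the DIoU step in this chain: DIoU subtracts from IoU the squared distance between the two box centers divided by the squared diagonal of the smallest box enclosing both, and the loss is 1 - DIoU. The sketch below is a plain-Python illustration for a single pair of boxes in (x1, y1, x2, y2) form. `diou_loss` is a hypothetical name and this is not the vectorized code used in YOLOv5.

```python
def diou_loss(box_a, box_b, eps=1e-9):
    """1 - DIoU for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union areas
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / (union + eps)
    # Squared distance between the two box centers
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest enclosing box
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps
    return 1.0 - (iou - rho2 / c2)

print(diou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes: close to 0
print(diou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
print(diou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes: greater than 1
```

Unlike plain IoU loss, the disjoint case still yields a useful gradient signal because the center-distance term keeps changing as the boxes move closer.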

4 Code walkthrough

4.1 Data format

Annotations use the YOLO format: box coordinates are normalized by the image dimensions.
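
A minimal sketch of that normalization, assuming the usual YOLO text layout of one `class x_center y_center width height` line per object. `to_yolo_label` is a hypothetical helper, and the box and image size are made-up values.

```python
def to_yolo_label(class_id, box_xyxy, img_w, img_h):
    """Pixel box (x1, y1, x2, y2) -> 'class xc yc w h' with values in [0, 1]."""
    x1, y1, x2, y2 = box_xyxy
    xc = (x1 + x2) / 2 / img_w   # box center, normalized by image width
    yc = (y1 + y2) / 2 / img_h   # box center, normalized by image height
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

line = to_yolo_label(0, (160, 120, 480, 360), img_w=640, img_h=480)
print(line)  # 0 0.500000 0.500000 0.500000 0.500000
```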

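Dataset split

The dataset can be specified with file lists:
create .txt files under the yolo_par_dir directory that list the files in the train set and the val set;
[Figure: example file list]
The file paths may be relative, in the form

../dataset/image.jpg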
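
One way such list files could be produced is sketched below. The file names `train.txt` and `val.txt`, the 80/20 split, and the helper `write_split_lists` are assumptions for illustration, and a temporary directory stands in for yolo_par_dir.

```python
import random
import tempfile
from pathlib import Path

def write_split_lists(image_paths, out_dir, val_ratio=0.2, seed=0):
    """Shuffle image paths and write train.txt / val.txt, one path per line."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)          # reproducible shuffle
    n_val = max(1, int(len(paths) * val_ratio))
    splits = {"val": paths[:n_val], "train": paths[n_val:]}
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, items in splits.items():
        (out_dir / f"{name}.txt").write_text("\n".join(items) + "\n")
    return splits

# Hypothetical relative paths in the form described in the notes
images = [f"../dataset/image_{i:03d}.jpg" for i in range(10)]
out_dir = tempfile.mkdtemp()                    # stand-in for yolo_par_dir
splits = write_split_lists(images, out_dir)
print(len(splits["train"]), "train /", len(splits["val"]), "val")
```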

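4.2 File structure

Dataset file structure

The YOLOv5 dataset folder sits at the same directory level as the YOLOv5 project folder, that is, one level above the code files.

4.3 Configuration files

For background on the configuration files, see 《YOLOv5从入门到部署之:配置与初始化超参》 ("YOLOv5 from getting started to deployment: configuration and hyperparameter initialization").
We use yolov5x.yaml as the example here:

Model configuration: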

# Parameters
nc: 80  # number of classes
depth_multiple: 1.33  # model depth multiple
width_multiple: 1.25  # layer channel multiple
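# The dm and wm parameters set the model size; they are the main difference between yolov5s and larger models such as yolov5x
# When computing the number of convolution kernels later, width_multiple must be taken into account
anchors: # feature map / downsampling stride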
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32
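  • Layer config format: [in, n, Module, args] (written [from, number, module, args] in the YAML comments), where n is the number of times the layer is repeated;
  • depth_multiple (dm): model depth factor, a multiplier on each layer's repeat count, computed as dm*n
  • width_multiple (wm): channel factor that scales the number of output channels, computed as wm*out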
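
The two multipliers can be illustrated on a single layer entry. The sketch below assumes the rounding behavior commonly seen in the YOLOv5 model parser (repeat counts are rounded with a floor of 1, channels are rounded up to a multiple of 8). Treat those rules, the helper name `scale_layer`, and the 0.33 / 0.50 multiples for a small model as assumptions.

```python
import math

def make_divisible(x, divisor=8):
    # Round x up to the nearest multiple of divisor
    return math.ceil(x / divisor) * divisor

def scale_layer(n, out_channels, depth_multiple, width_multiple):
    """Apply depth/width multiples to one [from, n, Module, args] entry."""
    n_scaled = max(round(n * depth_multiple), 1) if n > 1 else n
    c_scaled = make_divisible(out_channels * width_multiple, 8)
    return n_scaled, c_scaled

# [-1, 9, C3, [256]] with the yolov5x multiples (1.33 / 1.25)
print(scale_layer(9, 256, depth_multiple=1.33, width_multiple=1.25))  # (12, 320)
# The same entry with smaller multiples (0.33 / 0.50, assumed for a small model)
print(scale_layer(9, 256, depth_multiple=0.33, width_multiple=0.50))  # (3, 128)
```

Layers with n = 1 (such as the Conv entries) are left unrepeated, so only their channel count changes with the model size.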

Module configuration:

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, C3, [1024, False]],  # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3; [-1, 4] means the layers with index -1 and 4 are the inputs
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
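
The "from" field in each entry mixes relative indices (-1 is the previous layer) and absolute ones. A small sketch of how they can be resolved, assuming layers are numbered from 0 in the order they appear in the backbone and head lists. `resolve_inputs` is a hypothetical helper, not a YOLOv5 function.

```python
def resolve_inputs(layer_index, from_field):
    """Turn a 'from' field into absolute layer indices (negative = relative)."""
    sources = [from_field] if isinstance(from_field, int) else list(from_field)
    return [f if f >= 0 else layer_index + f for f in sources]

print(resolve_inputs(16, [-1, 4]))       # second Concat in the head: [15, 4]
print(resolve_inputs(24, [17, 20, 23]))  # Detect: [17, 20, 23]
print(resolve_inputs(1, -1))             # first Conv after Focus: [0]
```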

4.4 YOLOv5 operators

Conv: convolution module

Config format: Conv, [channel, kernel, stride, padding] [SOURCE]

class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        # autopad(k, p): padding; defaults to same-padding, which matters when reproducing the model
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        return self.act(self.conv(x))

Parameters:

  • c1: number of input channels.
  • c2: number of output channels.
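
The Conv module relies on an `autopad` helper that is not shown in these notes. The sketch below is an assumption about its behavior (padding of k // 2 when none is given), paired with the standard convolution output-size formula. `conv_out_size` is a hypothetical helper.

```python
def autopad(k, p=None):
    """Return p unchanged if given, else k // 2 ('same' padding for odd k at stride 1)."""
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p

def conv_out_size(size, k, s, p=None):
    """Output side length of a convolution: floor((size + 2p - k) / s) + 1."""
    p = autopad(k, p)
    return (size + 2 * p - k) // s + 1

print(conv_out_size(640, k=3, s=1))  # 640: spatial size preserved
print(conv_out_size(640, k=3, s=2))  # 320: a stride-2 Conv halves the map
print(conv_out_size(640, k=1, s=1))  # 640
```

With this default, every `Conv, [c, 3, 2]` entry in the configuration halves the feature map exactly, which matches the P1/2 through P5/32 annotations.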

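C3: CSP Bottleneck with 3 convolutions

Config format: [input_idx, n_bottlenecks, C3, [out_channels, if_shortcut]] [SOURCE]

Note
The second hyperparameter, n, is here the number of times the Bottleneck is repeated [code]

Parameters
  • c1 (int) – number of input channels
  • c2 (int) – number of output channels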

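SPPF: an SPP built from several MaxPool2d layers in series

SPPF (SPP Fast) is an improved version of SPP with faster inference.
Its structure is shown in the figure (diagram from 《YOLOv5网络详解_太阳花的小绿豆的博客》):
[Figure: SPPF structure]
The serial structure of SPPF is clearly visible in the figure.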
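
Why a serial chain can stand in for SPP's parallel pools can be checked numerically. The NumPy sketch below assumes stride-1 max pooling with -inf padding of k // 2, and that SPP uses kernel sizes 5, 9 and 13 as in the configuration above. `max_pool_same` is a hypothetical helper written with plain loops for clarity, not speed.

```python
import numpy as np

def max_pool_same(x, k):
    """Stride-1 max pool over a 2-D array with -inf padding (shape preserved)."""
    p = k // 2
    padded = np.pad(x, p, mode="constant", constant_values=-np.inf)
    h, w = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

rng = np.random.default_rng(0)
x = rng.random((16, 16))

y1 = max_pool_same(x, 5)    # first 5x5 pool
y2 = max_pool_same(y1, 5)   # two in series
y3 = max_pool_same(y2, 5)   # three in series

print(np.array_equal(y2, max_pool_same(x, 9)))   # True
print(np.array_equal(y3, max_pool_same(x, 13)))  # True
```

Two chained 5×5 pools cover the same window as one 9×9 pool, and three cover a 13×13 window, so the serial form reuses intermediate results instead of recomputing each large pool from scratch.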

Detect: produces the final predictions (including the decoding step)

Config format: [[input1_idx, input2_idx, input3_idx], n, Detect, [nc, anchors]] [SOURCE]
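
The decoding step can be sketched for a single prediction. The formulas below are an assumption based on the grid-sensitivity change referenced in the study materials (sigmoid outputs are stretched so that the center offset spans -0.5 to 1.5 cells and the size spans 0 to 4 times the anchor). `decode_box` is a hypothetical scalar helper, not the batched tensor code in Detect.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(raw, grid_xy, stride, anchor_wh):
    """Decode one raw prediction (tx, ty, tw, th) into (cx, cy, w, h) in pixels."""
    tx, ty, tw, th = (sigmoid(v) for v in raw)
    gx, gy = grid_xy
    aw, ah = anchor_wh
    cx = (tx * 2 - 0.5 + gx) * stride   # offset in (-0.5, 1.5) around the cell
    cy = (ty * 2 - 0.5 + gy) * stride
    w = (tw * 2) ** 2 * aw              # scale in (0, 4) times the anchor
    h = (th * 2) ** 2 * ah
    return cx, cy, w, h

# Zero logits at cell (3, 4) on the stride-8 map with the first P3 anchor
print(decode_box((0.0, 0.0, 0.0, 0.0), grid_xy=(3, 4), stride=8, anchor_wh=(10, 13)))
# (28.0, 36.0, 10.0, 13.0)
```

With zero logits the box sits at the cell center and has exactly the anchor's size, which makes the anchor a natural starting point for training.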

5 Data loading

LoadImages: the class that loads image files

class LoadImages:  # for inference
    def __init__(self, path, img_size=640):
        p = str(Path(path))  # os-agnostic
        p = os.path.abspath(p)  # absolute path
        if '*' in p:  # a glob pattern such as "path/*.jpg" was given
            files = sorted(glob.glob(p, recursive=True))  # glob
        elif os.path.isdir(p):
            files = sorted(glob.glob(os.path.join(p, '*.*')))  # dir
        elif os.path.isfile(p):
            files = [p]  # files
        else:
            raise Exception(f'ERROR: {p} does not exist')
        # img_formats lists the supported image file extensions
        images = [x for x in files if x.split('.')[-1].lower() in img_formats]
        videos = [x for x in files if x.split('.')[-1].lower() in vid_formats]
        ni, nv = len(images), len(videos)

        self.img_size = img_size
        self.files = images + videos
        self.nf = ni + nv  # number of files
        self.video_flag = [False] * ni + [True] * nv
        self.mode = 'image'
        if any(videos):
            self.new_video(videos[0])  # new video
        else:
            self.cap = None
        assert self.nf > 0, f'No images or videos found in {p}. ' \
                            f'Supported formats are:\nimages: {img_formats}\nvideos: {vid_formats}'

    def __iter__(self):
        self.count = 0
        return self

    def __next__(self):
        if self.count == self.nf:
            raise StopIteration
        path = self.files[self.count]

        if self.video_flag[self.count]:
            # Read video
            self.mode = 'video'
            ret_val, img0 = self.cap.read()
            if not ret_val:
                self.count += 1
                self.cap.release()
                if self.count == self.nf:  # last video
                    raise StopIteration
                else:
                    path = self.files[self.count]
                    self.new_video(path)
                    ret_val, img0 = self.cap.read()

            self.frame += 1
            print(f'video {self.count + 1}/{self.nf} ({self.frame}/{self.nframes}) {path}: ', end='')

        else:
            # Read image
            self.count += 1
            img0 = cv2.imread(path)  # BGR
            assert img0 is not None, 'Image Not Found ' + path
            print(f'image {self.count}/{self.nf} {path}: ', end='')

        # Padded resize
        img = letterbox(img0, new_shape=self.img_size)[0]

        # Convert
        img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, to 3x416x416
        img = np.ascontiguousarray(img)

        return path, img, img0, self.cap

    def new_video(self, path):
        self.frame = 0
        self.cap = cv2.VideoCapture(path)
        self.nframes = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT))

    def __len__(self):
        return self.nf  # number of files

5.1 Data preprocessing

Check that img_size is a multiple of the stride s:

def check_img_size(img_size, s=32):
    # Verify img_size is a multiple of stride s
    new_size = make_divisible(img_size, int(s))  # ceil gs-multiple
    # smallest multiple of s that is >= img_size
    if new_size != img_size:
        print('WARNING: --img-size %g must be multiple of max stride %g, updating to %g' % (img_size, s, new_size))
        # %g formats a number in general notation (no trailing zeros)
    return new_size

letterbox preprocessing:

def letterbox(img, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True):
    # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
    # get the size of the current image
    shape = img.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
        # new_shape is the argument passed in by the caller, i.e.:
        # img = letterbox(img0, new_shape=self.img_size)[0]

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better test mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, 32), np.mod(dh, 32)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return img, ratio, (dw, dh)
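
To see what letterbox does to a concrete image without needing OpenCV, the sketch below reproduces only its geometry: the resize target, the border widths and the final shape. `letterbox_geometry` is a hypothetical helper that mirrors the arithmetic above for the auto and non-auto cases, assuming a square target size.

```python
def letterbox_geometry(shape_hw, new_shape=640, auto=True, stride=32):
    """Return (resized (w, h), (top, bottom, left, right), final (h, w))."""
    h, w = shape_hw
    r = min(new_shape / h, new_shape / w)          # scale ratio (new / old)
    new_w, new_h = int(round(w * r)), int(round(h * r))
    dw, dh = new_shape - new_w, new_shape - new_h  # total padding
    if auto:                                       # pad only up to a stride multiple
        dw, dh = dw % stride, dh % stride
    dw, dh = dw / 2, dh / 2                        # split between the two sides
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    final = (new_h + top + bottom, new_w + left + right)
    return (new_w, new_h), (top, bottom, left, right), final

print(letterbox_geometry((720, 1280)))              # minimum rectangle
# ((640, 360), (12, 12, 0, 0), (384, 640))
print(letterbox_geometry((720, 1280), auto=False))  # full square
# ((640, 360), (140, 140, 0, 0), (640, 640))
```

The ±0.1 offsets before rounding matter when the total padding is odd: one side then receives one pixel more than the other, so the two borders still sum to the intended total.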

6 Model loading

attempt_load(): the function that loads the model

def attempt_load(weights, map_location=None):
    # Loads an ensemble of models weights=[a,b,c] or a single model weights=[a] or weights=a
    # i.e. several models can be combined into an ensemble
    model = Ensemble()
    for w in weights if isinstance(weights, list) else [weights]:
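        # try to download the model weights
        attempt_download(w)
        # load each model and append it to the ensemble
        model.append(torch.load(w, map_location=map_location)['model'].float().fuse().eval())  # load FP32 model
        # torch.load() restores the saved checkpoint object, which is not the same as nn.Module.load_state_dict()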

    # Compatibility updates
    for m in model.modules():
        if type(m) in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU]:
            m.inplace = True  # pytorch 1.7.0 compatibility
        elif type(m) is Conv:
            m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility

    if len(model) == 1:
        return model[-1]  # return model
    else:
        print('Ensemble created with %s\n' % weights)
        for k in ['names', 'stride']:
            setattr(model, k, getattr(model[-1], k))
        return model  # return ensemble

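7 Inference: forward pass

Usage:
python detect.py
Command-line arguments:

  • --weights: path to the weights file;
  • --source: image source;
  • --exist-ok: (unused) allow an existing run folder with the same name;

Main entry point: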

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default='yolov5s.pt', help='model.pt path(s)')
    # weights file(s) to load;
    # nargs='+': at least one value must be given
    parser.add_argument('--source', type=str, default='data/images', help='source')
    # file/folder, 0 for webcam
    # the input source to run detection on
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='display results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default='runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    opt = parser.parse_args()
    print(opt)
    check_requirements()

    with torch.no_grad():
        if opt.update:  # update all models (to fix SourceChangeWarning)
            for opt.weights in ['yolov5s.pt', 'yolov5m.pt', 'yolov5l.pt', 'yolov5x.pt']:
                detect()
                strip_optimizer(opt.weights)
        else:
            detect()

The detect() function is covered in the companion note "[YOLOv5 Study Notes] detect.py.detect()".
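
One detail of the argument parsing is worth a quick experiment: with nargs='+', values given on the command line arrive as a list, while the default stays a plain string, which is why attempt_load() accepts both forms. The sketch below uses only a subset of the options from the main section, and the file names a.pt and b.pt are hypothetical.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--weights', nargs='+', type=str, default='yolov5s.pt')
parser.add_argument('--img-size', type=int, default=640)
parser.add_argument('--save-txt', action='store_true')

default_opt = parser.parse_args([])
multi_opt = parser.parse_args(['--weights', 'a.pt', 'b.pt', '--save-txt'])

print(default_opt.weights)   # yolov5s.pt  (the default stays a plain string)
print(multi_opt.weights)     # ['a.pt', 'b.pt']  (nargs='+' collects a list)
print(multi_opt.img_size, multi_opt.save_txt)  # 640 True
```

Dashes in option names become underscores in the parsed namespace, so `--img-size` is read as `opt.img_size`.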

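8 Train: model training

Code study notes: [Notes of train.py]
Command-line arguments:

  • --weights: path to the weights file used for initialization;
  • --cfg: the model configuration file;
  • --data: the dataset configuration;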