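1 Learning resources
Blog post "YOLOv5网络详解" (太阳花的小绿豆's blog, CSDN):
- Sample matching strategy
- 4.3 Eliminating grid sensitivity: changes to the YOLOv4 box encoding
- 4.1 Loss computation: analysis of the loss function
Detailed model structure diagram from that blog post: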
2 Technique overview
Technique | Setting |
---|---|
Epoch | 300 |
Input size | 640 |
Down sample | NearestSample (+C3 ) |
Norm | BN |
Act | SiLU |
Loss | |
LR schedule | Warmup + CosineLR |
Conf thresh | 0.25 |
Weight smooth | EMA |
3 Model structure analysis
Feature map sizes
Layer | OutSize | Channels | N ($\times0.25$) | S ($\times0.5$) |
---|---|---|---|---|
Image | $640\times640$ | 3 | 3 | 3 |
P1 | $320\times320$ | 64 | 16 | 32 |
P2 | $160\times160$ | 128 | 32 | 64 |
P3 (D1) | $80\times80$ | 256 | 64 | 128 |
P4 (D2) | $40\times40$ | 512 | 128 | 256 |
P5 (D3) | $20\times20$ | 1024 | 256 | 512 |
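3.1 Downsampling
Focus: a downsampling operation that resembles super-resolution run in reverse
In YOLOv5 r6.1, Glenn Jocher replaced Focus with a $6\times6$ convolution for downsampling. The reasons given on GitHub were:
- The new $6\times6$ convolution is, in theory, equivalent to Focus;
- On some accelerators (such as V100/A100), the $6\times6$ convolution is faster;
- The main reason for the replacement is that the $6\times6$ convolution is easier to export than Focus.
The Focus diagram from the blog post "YOLOv5网络详解" (太阳花的小绿豆's blog) gives an intuitive picture of what Focus does: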
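As a rough illustration of the slicing behind Focus, the sketch below moves every 2×2 pixel block into the channel dimension with NumPy. It covers only the space-to-depth step (the real module follows it with a convolution), and the function name `focus_slice` is made up for this example.

```python
import numpy as np


def focus_slice(x: np.ndarray) -> np.ndarray:
    """Space-to-depth step of Focus: (B, C, H, W) -> (B, 4C, H/2, W/2)."""
    # Take the four pixels of every 2x2 block and stack them along channels.
    return np.concatenate(
        [
            x[..., ::2, ::2],    # even rows, even cols
            x[..., 1::2, ::2],   # odd rows, even cols
            x[..., ::2, 1::2],   # even rows, odd cols
            x[..., 1::2, 1::2],  # odd rows, odd cols
        ],
        axis=1,
    )


image = np.arange(1 * 3 * 4 * 4, dtype=np.float32).reshape(1, 3, 4, 4)
sliced = focus_slice(image)
print(sliced.shape)  # spatial size halves, channel count quadruples
```

No pixel is discarded: the output holds exactly the input values, only rearranged, which is why a strided convolution with a suitable kernel can reproduce the same computation.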
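3.2 Loss function: DIoU-Loss
YOLOv5 uses the same bounding-box regression loss as YOLOv4. The upstream code enables the CIoU variant by default, which is DIoU plus an aspect-ratio penalty, so DIoU remains the term to understand first;
Before studying DIoU-Loss, let us review how losses for the regression task have evolved: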
Smooth L1 Loss (used by Faster R-CNN, 2015) $\rightarrow$ IoU Loss (2016) $\rightarrow$ GIoU Loss (2019) $\rightarrow$ DIoU Loss (2020) $\rightarrow$ CIoU Loss (2020)
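To make the DIoU step of this progression concrete, here is a minimal pure-Python sketch for a single pair of boxes. It assumes the usual definition, loss = 1 - IoU + d²/c², where d is the distance between box centres and c is the diagonal of the smallest enclosing box. The function name and the `(x1, y1, x2, y2)` box layout are choices made for this example, not taken from the YOLOv5 code.

```python
def diou_loss(box_a, box_b, eps=1e-9):
    """DIoU loss for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centres.
    center_dist2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both.
    diag2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
        + (max(ay2, by2) - min(ay1, by1)) ** 2

    return 1.0 - iou + center_dist2 / (diag2 + eps)


print(diou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partially overlapping boxes
print(diou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes still get a distance signal
```

Unlike plain IoU loss, the second call does not saturate at 1: the centre-distance term keeps a useful gradient even when the boxes do not overlap.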
4 Code walkthrough
4.1 Data format
Annotations use the YOLO format, in which each box is stored as coordinates normalized by the image size;
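As a small sketch of that normalization, the helper below converts a pixel-space corner box into one label line of the form `class x_center y_center width height`. The helper name and the example box are invented for illustration.

```python
def to_yolo_label(class_id, box_xyxy, img_w, img_h):
    """Format a pixel box (x1, y1, x2, y2) as a normalized YOLO label line."""
    x1, y1, x2, y2 = box_xyxy
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


line = to_yolo_label(0, (100, 200, 300, 400), img_w=640, img_h=480)
print(line)  # 0 0.312500 0.625000 0.312500 0.416667
```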
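Dataset split
The dataset can be specified as file lists:
that is, create .txt files under the yolo_par_dir directory that list the files of the train and val sets;
File paths in these lists may be relative, in the form
../dataset/image.jpg
4.2 File structure
Dataset file structure
The yolov5 dataset folder sits at the same level as the yolov5 project folder, i.e. one level above the code files;
4.3 Configuration files
For details on configuration files, see "YOLOv5从入门到部署之:配置与初始化超参"
Here yolov5x.yaml serves as the example:
Model configuration: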
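# Parameters
nc: 80  # number of classes
depth_multiple: 1.33  # model depth multiple
width_multiple: 1.25  # layer channel multiple
# depth_multiple (dm) and width_multiple (wm) set the model size; they are the main difference between yolov5s and larger models such as yolov5x
# width_multiple must be taken into account later when computing the number of convolution kernels
anchors:  # feature map / downsampling stride
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32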
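- Layer configuration format: [in, n, Module, args], where n is the number of times the layer is repeated;
- depth_multiple: model depth factor, the multiplier that controls how many times a layer is stacked, computed as dm*n;
- width_multiple: channel factor that controls the number of output channels, i.e. wm*out.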
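The two multipliers can be sketched as follows. This mirrors, as far as I understand it, how the upstream model parser scales a layer: repeat counts are rounded and never drop below 1, and channel counts are rounded up to a multiple of 8. The function names and the yolov5s values (0.33, 0.50) are assumptions for this example, not taken from the text above.

```python
import math


def scaled_depth(n, depth_multiple):
    """Repeat count after scaling; layers with n == 1 are left alone."""
    return max(round(n * depth_multiple), 1) if n > 1 else n


def scaled_width(channels, width_multiple, divisor=8):
    """Output channels after scaling, rounded up to a multiple of `divisor`."""
    return math.ceil(channels * width_multiple / divisor) * divisor


# (name, depth_multiple, width_multiple); the yolov5s pair is an assumed value
for name, dm, wm in [("yolov5s", 0.33, 0.50), ("yolov5x", 1.33, 1.25)]:
    depths = [scaled_depth(n, dm) for n in (3, 9)]
    widths = [scaled_width(c, wm) for c in (64, 1024)]
    print(name, depths, widths)
```

With a width factor of 0.5 the 64-channel stem becomes 32 channels, and with 0.25 it becomes 16, which matches the S and N columns of the feature map table.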
Module configuration:
# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, C3, [1024, False]],  # 9
  ]
# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3; [-1, 4] means layers -1 and 4 are the inputs
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)
   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
4.4 YOLOv5 operators
Conv: convolution module
Configuration: Conv, [channel, kernel, stride, padding]
import torch.nn as nn


class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Conv, self).__init__()
        # autopad(k, p) is defined alongside Conv in the YOLOv5 code; it defaults to "same" padding,
        # which matters when re-implementing the model
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        # Used once BatchNorm has been fused into the convolution
        return self.act(self.conv(x))
Parameters:
- c1: number of input channels.
- c2: number of output channels.
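The "same" padding default mentioned for `Conv` can be sketched in a few lines. `autopad` below is a simplified stand-in written for this example (it assumes the padding is half the kernel size when none is given), and `conv_out_size` is a hypothetical helper that applies the standard convolution output-size formula.

```python
def autopad(k, p=None):
    """Default padding: half the kernel size, which keeps the size at stride 1 for odd kernels."""
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p


def conv_out_size(n, k, s, p):
    """Output length of a convolution along one axis."""
    return (n + 2 * p - k) // s + 1


print(conv_out_size(640, k=3, s=1, p=autopad(3)))  # 640: size preserved at stride 1
print(conv_out_size(640, k=3, s=2, p=autopad(3)))  # 320: a stride-2 Conv halves the size
print(conv_out_size(640, k=6, s=2, p=2))           # 320: a 6x6 stem with padding 2 halves it too
```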
C3: CSP Bottleneck with 3 convolutions
Configuration: [input_idx, n_bottlenecks, C3, [out_channels, if_shortcut]]
Note
The second field, n, is the number of times the Bottleneck is repeated here
Parameters
- c1 (int) – number of input channels
- c2 (int) – number of output channels
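SPPF: SPP built from several MaxPool2d layers in series
SPPF (SPP Fast) is an improved version of SPP with faster inference;
Its structure is shown in the figure (diagram from the blog post "YOLOv5网络详解", 太阳花的小绿豆's blog)
The figure clearly shows the serial structure of SPPF;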
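One way to see why the serial layout can replace the parallel one: applying a 5×5 max pool (stride 1, same padding) two or three times covers the same window as a single 9×9 or 13×13 pool. The NumPy sketch below checks this on a random 2-D map. `max_pool_same` is a helper written for this example, not the PyTorch layer, and the kernel sizes 5, 9, 13 are taken from the SPP entry in the configuration above.

```python
import numpy as np


def max_pool_same(x, k):
    """k x k max pooling, stride 1, padded with -inf so the output keeps the input size."""
    p = k // 2
    padded = np.pad(x, p, mode="constant", constant_values=-np.inf)
    h, w = x.shape
    out = np.full((h, w), -np.inf)
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out


rng = np.random.default_rng(0)
feature = rng.normal(size=(20, 20))

# SPP: three parallel pools with kernels 5, 9, 13.
spp = [max_pool_same(feature, k) for k in (5, 9, 13)]

# SPPF: one 5x5 pool applied three times in series.
y1 = max_pool_same(feature, 5)
y2 = max_pool_same(y1, 5)
y3 = max_pool_same(y2, 5)
sppf = [y1, y2, y3]

print(all(np.array_equal(a, b) for a, b in zip(spp, sppf)))
```

The serial version reuses each intermediate result, so it does three small pools instead of one small and two large ones, which is where the speed-up comes from.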
Detect: produces the final predictions (including the decoding step)
Configuration: [[input1_idx, input2_idx, input3_idx], n, Detect, [nc, anchors]]
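The decoding step can be sketched for a single anchor at a single grid cell. The formulas below follow my understanding of the YOLOv5 box encoding (the "grid sensitivity" change listed in section 1): the centre offset is `2*sigmoid(t) - 0.5`, and the size factor is `(2*sigmoid(t))**2`. `decode_box` and its argument names are hypothetical; the real layer works on whole tensors.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(tx, ty, tw, th, grid_x, grid_y, anchor_w, anchor_h, stride):
    """Map raw outputs of one anchor at one grid cell to (cx, cy, w, h) in pixels."""
    cx = (2.0 * sigmoid(tx) - 0.5 + grid_x) * stride
    cy = (2.0 * sigmoid(ty) - 0.5 + grid_y) * stride
    w = (2.0 * sigmoid(tw)) ** 2 * anchor_w
    h = (2.0 * sigmoid(th)) ** 2 * anchor_h
    return cx, cy, w, h


# Zero logits decode to the cell centre and the unscaled anchor.
print(decode_box(0, 0, 0, 0, grid_x=3, grid_y=4, anchor_w=10, anchor_h=13, stride=8))
# (28.0, 36.0, 10.0, 13.0)
```

With this encoding the centre offset ranges over (-0.5, 1.5) cells and the size factor over (0, 4), so a prediction can reach the cell border without needing an extreme logit.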
5 Data loading
The LoadImages class, which loads image (and video) files:
import glob
import os
from pathlib import Path

import cv2
import numpy as np

# img_formats, vid_formats and letterbox() are defined elsewhere in the YOLOv5 utilities


class LoadImages:  # for inference
    def __init__(self, path, img_size=640):
        p = str(Path(path))  # os-agnostic
        p = os.path.abspath(p)  # absolute path
        if '*' in p:  # a glob pattern such as "path/*.jpg"
            files = sorted(glob.glob(p, recursive=True))  # glob
        elif os.path.isdir(p):
            files = sorted(glob.glob(os.path.join(p, '*.*')))  # dir
        elif os.path.isfile(p):
            files = [p]  # files
        else:
            raise Exception(f'ERROR: {p} does not exist')
        # img_formats / vid_formats list the supported file extensions
        images = [x for x in files if x.split('.')[-1].lower() in img_formats]
        videos = [x for x in files if x.split('.')[-1].lower() in vid_formats]
        ni, nv = len(images), len(videos)
        self.img_size = img_size
        self.files = images + videos
        self.nf = ni + nv  # number of files
        self.video_flag = [False] * ni + [True] * nv
        self.mode = 'image'
        if any(videos):
            self.new_video(videos[0])  # new video
        else:
            self.cap = None
        assert self.nf > 0, f'No images or videos found in {p}. ' \
                            f'Supported formats are:\nimages: {img_formats}\nvideos: {vid_formats}'

    def __iter__(self):
        self.count = 0
        return self

    def __next__(self):
        if self.count == self.nf:
            raise StopIteration
        path = self.files[self.count]
        if self.video_flag[self.count]:
            # Read video
            self.mode = 'video'
            ret_val, img0 = self.cap.read()
            if not ret_val:
                self.count += 1
                self.cap.release()
                if self.count == self.nf:  # last video
                    raise StopIteration
                else:
                    path = self.files[self.count]
                    self.new_video(path)
                    ret_val, img0 = self.cap.read()
            self.frame += 1
            print(f'video {self.count + 1}/{self.nf} ({self.frame}/{self.nframes}) {path}: ', end='')
        else:
            # Read image
            self.count += 1
            img0 = cv2.imread(path)  # BGR
            assert img0 is not None, 'Image Not Found ' + path
            print(f'image {self.count}/{self.nf} {path}: ', end='')
        # Padded resize
        img = letterbox(img0, new_shape=self.img_size)[0]
        # Convert
        img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, HWC to CHW
        img = np.ascontiguousarray(img)
        return path, img, img0, self.cap

    def new_video(self, path):
        self.frame = 0
        self.cap = cv2.VideoCapture(path)
        self.nframes = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT))

    def __len__(self):
        return self.nf  # number of files
5.1 Data preprocessing
Check whether img_size is a multiple of the stride s:
def check_img_size(img_size, s=32):
    # Verify img_size is a multiple of stride s
    new_size = make_divisible(img_size, int(s))  # ceil gs-multiple
    # new_size is the smallest multiple of s that is >= img_size
    if new_size != img_size:
        # %g prints a number in compact general format
        print('WARNING: --img-size %g must be multiple of max stride %g, updating to %g' % (img_size, s, new_size))
    return new_size
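`check_img_size` relies on a `make_divisible` helper that is not shown above. A minimal sketch of it, assuming it simply rounds up to the next multiple of the divisor, looks like this:

```python
import math


def make_divisible(x, divisor):
    """Smallest multiple of `divisor` that is >= x."""
    return math.ceil(x / divisor) * divisor


for size in (640, 641, 600):
    print(size, "->", make_divisible(size, 32))
# 640 -> 640
# 641 -> 672
# 600 -> 608
```

Rounding up rather than to the nearest multiple means the requested size is never reduced, so the network never sees less resolution than was asked for.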
letterbox preprocessing:
import cv2
import numpy as np


def letterbox(img, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True):
    # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
    # Size of the incoming image
    shape = img.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
    # new_shape is the argument passed by the caller, e.g.
    # img = letterbox(img0, new_shape=self.img_size)[0]

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better test mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, 32), np.mod(dh, 32)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return img, ratio, (dw, dh)
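To follow the arithmetic in `letterbox` without touching pixels, the sketch below reproduces only its geometry: the scale ratio, the resized size, and the padding on each side. `letterbox_geometry` is a hypothetical helper for this walkthrough, and it leaves out the `scaleFill` and `scaleup` options.

```python
def letterbox_geometry(shape, new_shape=(640, 640), auto=True, stride=32):
    """Return (resized (w, h), padding (left, right, top, bottom)) for an image of shape (h, w)."""
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))  # (w, h)
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]
    if auto:  # keep only the padding needed to reach a stride multiple
        dw, dh = dw % stride, dh % stride
    dw, dh = dw / 2, dh / 2  # split the padding between both sides
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    return new_unpad, (left, right, top, bottom)


# A 1280x720 frame: resized to 640x360, then padded to 640x384 (auto) or 640x640 (no auto).
print(letterbox_geometry((720, 1280), auto=True))
print(letterbox_geometry((720, 1280), auto=False))
```

The `- 0.1` / `+ 0.1` offsets matter when the total padding is odd: a half-pixel value such as 12.5 becomes 12 on one side and 13 on the other, so the two sides still add up to the full padding.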
6 Model loading
The model-loading function attempt_load():
import torch
import torch.nn as nn

# Ensemble, Conv and attempt_download() are defined elsewhere in the YOLOv5 code


def attempt_load(weights, map_location=None):
    # Loads an ensemble of models weights=[a,b,c] or a single model weights=[a] or weights=a
    # (i.e. several models can be combined into one ensemble)
    model = Ensemble()
    for w in weights if isinstance(weights, list) else [weights]:
        attempt_download(w)  # try to download the weights if they are missing locally
        # torch.load() unpickles the whole checkpoint, including the model object,
        # unlike nn.Module.load_state_dict(), which only copies weights into an existing module
        model.append(torch.load(w, map_location=map_location)['model'].float().fuse().eval())  # load FP32 model

    # Compatibility updates
    for m in model.modules():
        if type(m) in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU]:
            m.inplace = True  # pytorch 1.7.0 compatibility
        elif type(m) is Conv:
            m._non_persistent_buffers_set = set()  # pytorch 1.6.0 compatibility

    if len(model) == 1:
        return model[-1]  # return model
    else:
        print('Ensemble created with %s\n' % weights)
        for k in ['names', 'stride']:
            setattr(model, k, getattr(model[-1], k))
        return model  # return ensemble
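7 Inference: forward pass
Usage:
python detect.py
Command-line arguments:
- --weights: path to the weights file;
- --source: image source;
- --exist-ok: (unused) allow reusing an existing run folder name;
Main function: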
import argparse

import torch

# check_requirements(), detect() and strip_optimizer() are defined elsewhere in the YOLOv5 code

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Weights file(s); nargs='+' means one or more values are read
    parser.add_argument('--weights', nargs='+', type=str, default='yolov5s.pt', help='model.pt path(s)')
    # Detection source: file/folder, 0 for webcam
    parser.add_argument('--source', type=str, default='data/images', help='source')
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='display results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default='runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    opt = parser.parse_args()
    print(opt)
    check_requirements()

    with torch.no_grad():
        if opt.update:  # update all models (to fix SourceChangeWarning)
            for opt.weights in ['yolov5s.pt', 'yolov5m.pt', 'yolov5l.pt', 'yolov5x.pt']:
                detect()
                strip_optimizer(opt.weights)
        else:
            detect()
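One detail of the parser above is easy to miss: with `nargs='+'`, a value given on the command line arrives as a list, but the string default is used as-is when the flag is omitted. The short sketch below, using only two of the arguments, shows both cases. This is consistent with `attempt_load()` accepting either a list or a single path.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--weights', nargs='+', type=str, default='yolov5s.pt')
parser.add_argument('--classes', nargs='+', type=int)

defaults = parser.parse_args([])
given = parser.parse_args(['--weights', 'a.pt', 'b.pt', '--classes', '0', '2'])

print(repr(defaults.weights), defaults.classes)  # 'yolov5s.pt' None
print(given.weights, given.classes)              # ['a.pt', 'b.pt'] [0, 2]
```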
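The detect() function: see "【YOLOv5学习笔记】——detect.py.detect()"
8 Train: model training
Code study notes: Notes of train.py
Command-line arguments:
- --weights: path to the initial weights file;
- --cfg: model configuration file;
- --data: dataset configuration;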