

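  1. Source code: https://github.com/ultralytics/yoloV5 (Ultralytics version of YOLOv5, 2020-06-25)
  2. How should YOLOv5 be evaluated?
  3. Understanding YOLO V5 and YOLO V4 in one article
  4. YOLOv4 (AB) original paper: YOLOv4: Optimal Speed and Accuracy of Object Detection, 2020-04-23
  5. YOLOv5 explained in depth
  6. The YOLO series explained in depth: a complete walkthrough of the YOLOv5 core fundamentals
  7. YOLOv3, YOLOv4 and YOLOv5 model weights and network structure diagrams (resource downloads)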



2.2 Install the dependencies

pip install -r requirements.txt


pip install X  # for any package X that is still reported as missing

2.3 Download the model



2.4 Test results

Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.25, device='', img_size=640, iou_thres=0.45, output='inference/output', save_conf=False, save_txt=False, source='inference/images', update=False, view_img=False, weights='yolov5s.pt')
Using CUDA device0 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11011MB)

Fusing layers... 
Model Summary: 140 layers, 7.45958e+06 parameters, 0 gradients
image 1/2 /home/hjz/PycharmProjects/pythonProject/yolov5-master/inference/images/bus.jpg: 640x480 4 persons, 1 buss, 1 skateboards, Done. (0.069s)
image 2/2 /home/hjz/PycharmProjects/pythonProject/yolov5-master/inference/images/zidane.jpg: 384x640 2 persons, 2 ties, Done. (0.054s)
Results saved to inference/output
Done. (0.168s)

Process finished with exit code 0


python detect.py --source=inference/int/0.mp4 --output=inference/out/0.mp4 
coco_classes_names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
        'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
        'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
        'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
        'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
        'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
        'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
        'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
        'hair drier', 'toothbrush']



  1. Official guide: https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data
  2. Training YOLOv5 on your own dataset
  3. Training the PyTorch version of YOLOv5 on your own dataset

3.2 Train on coco128

python train.py --epochs=20


 1. Place the coco128 dataset next to the project directory, at the same level as the yolov5 folder.
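A quick way to confirm this layout before training is a small check like the one below. It is only a sketch: the helper name `check_coco128_layout` is hypothetical, and the `images/train2017` and `labels/train2017` subfolders are an assumption based on the `../coco128/labels/train2017.cache` path that shows up in the training log.

```python
from pathlib import Path


def check_coco128_layout(project_dir):
    """Return the expected coco128 folders that are missing next to project_dir."""
    # The dataset is assumed to be a sibling of the project folder.
    dataset_root = Path(project_dir).resolve().parent / "coco128"
    expected = [
        dataset_root / "images" / "train2017",  # assumed image folder
        dataset_root / "labels" / "train2017",  # assumed label folder
    ]
    return [str(path) for path in expected if not path.is_dir()]
```

Calling `check_coco128_layout(".")` from inside the yolov5 folder returns an empty list when both folders are present.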


Using CUDA device0 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11011MB)

Namespace(adam=False, batch_size=16, bucket='', cache_images=False, cfg='', data='data/coco128.yaml', device='', epochs=20, evolve=False, global_rank=-1, hyp='data/hyp.scratch.yaml', image_weights=False, img_size=[640, 640], local_rank=-1, logdir='runs/', multi_scale=False, name='', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='yolov5s.pt', workers=8, world_size=1)
Start Tensorboard with "tensorboard --logdir runs/", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'lrf': 0.2, 'momentum': 0.937, 'weight_decay': 0.0005, 'warmup_epochs': 3.0, 'warmup_momentum': 0.8, 'warmup_bias_lr': 0.1, 'box': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.1, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mosaic': 1.0, 'mixup': 0.0}

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     19904  models.common.BottleneckCSP             [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    161152  models.common.BottleneckCSP             [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    641792  models.common.BottleneckCSP             [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    378624  models.common.BottleneckCSP             [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     95104  models.common.BottleneckCSP             [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    313088  models.common.BottleneckCSP             [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1248768  models.common.BottleneckCSP             [512, 512, 1, False]          
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 191 layers, 7.46816e+06 parameters, 7.46816e+06 gradients

Transferred 370/370 items from yolov5s.pt
Optimizer groups: 62 .bias, 70 conv.weight, 59 other
Scanning images: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [00:00<00:00, 3224.53it/s]
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 128it [00:00, 4429.52it/s]
Scanning labels ../coco128/labels/train2017.cache (126 found, 0 missing, 2 empty, 0 duplicate, for 128 images): 128it [00:00, 14134.51it/s]

Analyzing anchors... anchors/target = 4.26, Best Possible Recall (BPR) = 0.9946
Image sizes 640 train, 640 test
Using 8 dataloader workers
Logging results to runs/exp1
Starting training for 20 epochs...

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
      0/19     5.24G   0.04188   0.06183   0.01566    0.1194       171       640: 100%|██████████████████████████████████████████████| 8/8 [00:05<00:00,  1.46it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:04<00:00,  1.77it/s]
                 all         128         929       0.405       0.761       0.698       0.442

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
      1/19     5.12G   0.04172   0.05666   0.01659     0.115       146       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.57it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  9.97it/s]
                 all         128         929       0.399       0.765       0.699       0.447

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
      2/19     5.12G    0.0426   0.06244   0.01579    0.1208       196       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.65it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  8.07it/s]
                 all         128         929       0.404       0.773       0.702       0.453

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
      3/19     5.12G   0.04476   0.06601   0.01603    0.1268       204       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.51it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  9.67it/s]
                 all         128         929       0.396       0.778       0.705       0.455

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
      4/19     5.12G   0.04329   0.06541   0.01635    0.1251       252       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.58it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  9.62it/s]
                 all         128         929        0.39       0.781       0.706       0.458

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
      5/19     5.12G     0.043   0.05926   0.01625    0.1185       146       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.51it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  9.25it/s]
                 all         128         929        0.39       0.785       0.713       0.463

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
      6/19     5.12G   0.04202   0.06307   0.01541    0.1205       204       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.71it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  9.38it/s]
                 all         128         929       0.388       0.791       0.719       0.467

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
      7/19     5.12G   0.04285   0.06677    0.0151    0.1247       204       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.55it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  8.91it/s]
                 all         128         929       0.388       0.794       0.723       0.474

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
      8/19     5.12G   0.04252   0.05974   0.01529    0.1176       211       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.64it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  9.37it/s]
                 all         128         929       0.386       0.794       0.726        0.48

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
      9/19     5.12G   0.04098   0.06076   0.01374    0.1155       227       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.52it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  9.55it/s]
                 all         128         929       0.395       0.799        0.73       0.477

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     10/19     5.12G   0.04312   0.06949    0.0154     0.128       185       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.53it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  8.93it/s]
                 all         128         929       0.393       0.798        0.74       0.483

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     11/19     5.12G   0.04207   0.05844    0.0155     0.116       190       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.64it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  9.47it/s]
                 all         128         929         0.4       0.802       0.744       0.489

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     12/19     5.12G   0.04147   0.06319   0.01335     0.118       234       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.57it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  9.47it/s]
                 all         128         929       0.404       0.801       0.747       0.493

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     13/19     5.12G   0.04178    0.0565   0.01371     0.112       225       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.52it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  9.83it/s]
                 all         128         929       0.419       0.808       0.751       0.498

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     14/19     5.12G   0.04076   0.05859   0.01472    0.1141       179       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.57it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  9.78it/s]
                 all         128         929       0.408       0.815       0.751       0.496

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     15/19     5.12G   0.04175   0.05848   0.01484    0.1151       181       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.41it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  9.17it/s]
                 all         128         929       0.413       0.813       0.754       0.502

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     16/19     5.12G   0.04283   0.05989   0.01417    0.1169       198       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.52it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  9.83it/s]
                 all         128         929       0.415        0.82       0.754       0.503

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     17/19     5.12G   0.04006   0.05161   0.01465    0.1063       156       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.54it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  8.18it/s]
                 all         128         929       0.421       0.827        0.76       0.505

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     18/19     5.12G   0.04003   0.06271   0.01228     0.115       196       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.48it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:00<00:00,  8.28it/s]
                 all         128         929        0.42       0.826       0.767       0.509

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     19/19     5.12G   0.04196   0.06346   0.01286    0.1183       221       640: 100%|██████████████████████████████████████████████| 8/8 [00:01<00:00,  6.54it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|██████████████████████████████████| 8/8 [00:01<00:00,  4.44it/s]
                 all         128         929       0.411       0.822        0.77       0.515
Optimizer stripped from runs/exp1/weights/last.pt, 15.2MB
Optimizer stripped from runs/exp1/weights/best.pt, 15.2MB
20 epochs completed in 0.016 hours.


(pytorch) tensorboard --logdir runs
TensorFlow installation not found - running with reduced feature set.
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.3.0 at http://localhost:6006/ (Press CTRL+C to quit)


3.3 Safety helmet detection (Ubuntu 16.04)

3.3.1 Data and preprocessing


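The dataset includes 7581 images with 9044 human safety helmet wearing objects (positive) and 111514 normal head objects (not wearing, or negative).
 1. Labels: two classes, hat and person.
 2. The hard part is splitting the dataset into train and test sets and converting the labels.
 3. A few images use the uppercase .JPG extension and must be renamed to lowercase .jpg (Ubuntu 16.04).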
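The extension fix from item 3 can be scripted. The following is a minimal sketch, not part of the original workflow: the function name `lowercase_jpg_extensions` is made up for illustration, and it assumes all images sit directly in one folder such as `VOC2028/JPEGImages`.

```python
from pathlib import Path


def lowercase_jpg_extensions(image_dir):
    """Rename every *.JPG file in image_dir to *.jpg and return the new names."""
    renamed = []
    for path in sorted(Path(image_dir).iterdir()):
        # Only touch files whose extension is exactly the uppercase ".JPG".
        if path.is_file() and path.suffix == ".JPG":
            target = path.with_suffix(".jpg")
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

Run it once on the image folder before generating labels, so that the later `img_name + '.jpg'` lookups find every file.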


import os
from pathlib import Path
from shutil import copyfile

from PIL import Image, ImageDraw
from xml.dom.minidom import parse
import numpy as np

FILE_ROOT = "/home/hjz/PycharmProjects/pythonProject/"

IMAGE_SET_ROOT = FILE_ROOT + "VOC2028/ImageSets/Main"  # folder with the train/val/test split lists
IMAGE_PATH = FILE_ROOT + "VOC2028/JPEGImages"  # folder with the images
ANNOTATIONS_PATH = FILE_ROOT + "VOC2028/Annotations"  # folder with the dataset's XML annotation files
LABELS_ROOT = FILE_ROOT + "VOC2028/Labels"  # folder for the normalized label files

DEST_IMAGES_PATH = "./custom_data/images"  # destination for the images, split into train/val/test
DEST_LABELS_PATH = "./custom_data/labels"  # destination for the label files, split into train/val/test

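def cord_converter(size, box):
    """
    Convert a box from the XML annotation into darknet-style coordinates.
    :param size: image size: [w, h]
    :param box: box corners [top-left x, top-left y, bottom-right x, bottom-right y]
    :return: the converted, normalized [x, y, w, h] (box center and box size)
    """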
    x1 = int(box[0])
    y1 = int(box[1])
    x2 = int(box[2])
    y2 = int(box[3])

    dw = np.float32(1. / int(size[0]))
    dh = np.float32(1. / int(size[1]))

    w = x2 - x1
    h = y2 - y1
    x = x1 + (w / 2)
    y = y1 + (h / 2)

    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return [x, y, w, h]

def save_file(img_jpg_file_name, size, img_box):
    save_file_name = LABELS_ROOT + '/' + img_jpg_file_name + '.txt'
    # "a+" appends, so remove old label files before running the script again
    with open(save_file_name, "a+") as label_file:
        for box in img_box:
            # two classes: person -> 0, everything else (hat) -> 1
            if box[0] == 'person':
                cls_num = 0
            else:
                cls_num = 1

            new_box = cord_converter(size, box[1:])

            label_file.write(f"{cls_num} {new_box[0]} {new_box[1]} {new_box[2]} {new_box[3]}\n")


def test_dataset_box_feature(file_name, point_array):
    """
    Draw the annotated boxes on an image as a visual check.
    :param file_name: image file name
    :param point_array: all boxes, each given as [class, x1, y1, x2, y2]
    :return: None
    """
    im = Image.open(os.path.join(IMAGE_PATH, file_name))
    imDraw = ImageDraw.Draw(im)
    for box in point_array:
        x1 = box[1]
        y1 = box[2]
        x2 = box[3]
        y2 = box[4]
        imDraw.rectangle((x1, y1, x2, y2), outline='red')
    im.show()  # display the image with the boxes drawn


def get_xml_data(file_path, img_xml_file):
    img_path = file_path + '/' + img_xml_file + '.xml'

    dom = parse(img_path)
    root = dom.documentElement
    img_name = root.getElementsByTagName("filename")[0].childNodes[0].data
    img_size = root.getElementsByTagName("size")[0]
    objects = root.getElementsByTagName("object")
    img_w = img_size.getElementsByTagName("width")[0].childNodes[0].data
    img_h = img_size.getElementsByTagName("height")[0].childNodes[0].data
    img_c = img_size.getElementsByTagName("depth")[0].childNodes[0].data
    # print("img_name:", img_name)
    # print("image_info:(w,h,c)", img_w, img_h, img_c)
    img_box = []
    for box in objects:
        cls_name = box.getElementsByTagName("name")[0].childNodes[0].data
        x1 = int(box.getElementsByTagName("xmin")[0].childNodes[0].data)
        y1 = int(box.getElementsByTagName("ymin")[0].childNodes[0].data)
        x2 = int(box.getElementsByTagName("xmax")[0].childNodes[0].data)
        y2 = int(box.getElementsByTagName("ymax")[0].childNodes[0].data)
        # print("box:(c,xmin,ymin,xmax,ymax)", cls_name, x1, y1, x2, y2)
        img_jpg_file_name = img_xml_file + '.jpg'
        img_box.append([cls_name, x1, y1, x2, y2])
    # print(img_box)

    # test_dataset_box_feature(img_jpg_file_name, img_box)
    save_file(img_xml_file, [img_w, img_h], img_box)

def copy_data(img_set_source, img_labels_root, imgs_source, type):
    file_name = img_set_source + '/' + type + ".txt"

    # Check whether the destination folders exist; create them if they do not
    root_file = Path(FILE_ROOT + DEST_IMAGES_PATH + '/' + type)
    if not root_file.exists():
        print(f"Path {root_file} does not exist, creating it")
        os.makedirs(root_file)

    root_file = Path(FILE_ROOT + DEST_LABELS_PATH + '/' + type)
    if not root_file.exists():
        print(f"Path {root_file} does not exist, creating it")
        os.makedirs(root_file)

    # Iterate over the image names listed in the split file
    with open(file_name) as file:
        for line in file.readlines():
            img_name = line.strip('\n')
            img_sor_file = imgs_source + '/' + img_name + '.jpg'
            label_sor_file = img_labels_root + '/' + img_name + '.txt'

            # print(img_sor_file)
            # print(label_sor_file)
            # im = Image.open(rf"{img_sor_file}")
            # im.show()

            # Copy the image
            DICT_DIR = FILE_ROOT + DEST_IMAGES_PATH + '/' + type
            img_dict_file = DICT_DIR + '/' + img_name + '.jpg'
            copyfile(img_sor_file, img_dict_file)

            # Copy the label
            DICT_DIR = FILE_ROOT + DEST_LABELS_PATH + '/' + type
            img_dict_file = DICT_DIR + '/' + img_name + '.txt'
            copyfile(label_sor_file, img_dict_file)

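if __name__ == '__main__':
    # Generate the label files from the XML annotations
    root = ANNOTATIONS_PATH
    os.makedirs(LABELS_ROOT, exist_ok=True)
    files = os.listdir(root)
    for file in files:
        print("file name: ", file)
        file_xml = file.split(".")
        get_xml_data(root, file_xml[0])

    # Split the files into train, val and test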
    img_set_root = IMAGE_SET_ROOT
    imgs_root = IMAGE_PATH
    img_labels_root = LABELS_ROOT
    copy_data(img_set_root, img_labels_root, imgs_root, "train")
    copy_data(img_set_root, img_labels_root, imgs_root, "val")
    copy_data(img_set_root, img_labels_root, imgs_root, "test")
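To make the coordinate conversion in `cord_converter` concrete, here is a small worked example. It is a standalone sketch with a hypothetical function name (`voc_to_yolo`) that follows the same arithmetic: the box center and size are computed from the corner coordinates and then divided by the image width and height.

```python
def voc_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax):
    """Return (x_center, y_center, width, height), each normalized by the image size."""
    box_w = xmax - xmin
    box_h = ymax - ymin
    x_center = xmin + box_w / 2
    y_center = ymin + box_h / 2
    return (x_center / img_w, y_center / img_h, box_w / img_w, box_h / img_h)


# A 640x480 image with a box from (100, 120) to (300, 360):
# the box is 200x240 pixels and centered at (200, 240).
print(voc_to_yolo(640, 480, 100, 120, 300, 360))  # (0.3125, 0.5, 0.3125, 0.5)
```

Each line in a generated label file then reads `class x_center y_center width height`, for example `1 0.3125 0.5 0.3125 0.5` for a hat.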


3.3.2 Modify the configuration files

  1. hat.yaml:
# Custom data for safety helmet

# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: /home/hjz/PycharmProjects/pythonProject/custom_data/images/train
val: /home/hjz/PycharmProjects/pythonProject/custom_data/images/val
test: /home/hjz/PycharmProjects/pythonProject/custom_data/images/test

# number of classes
nc: 2

# class names
names: ['person', 'hat']
  2. hat_yolov5s.yaml:
# parameters
nc: 2  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple

# anchors (can be adjusted later)
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],  # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, BottleneckCSP, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
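The `depth_multiple` and `width_multiple` values explain why the layer table printed at training time shows smaller numbers than this file. The sketch below reproduces that scaling as I understand it from the model parser in `models/yolo.py`; treat the exact rule (repeat counts rounded, channels rounded up to a multiple of 8) as an assumption, and `scale_layer` as a hypothetical helper name.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(number, out_channels, depth_multiple=0.33, width_multiple=0.50):
    """Scale a layer's repeat count and output channels (assumed rule, see text)."""
    n = max(round(number * depth_multiple), 1) if number > 1 else number
    c = make_divisible(out_channels * width_multiple, 8)
    return n, c


print(scale_layer(1, 64))    # Focus         -> (1, 32)
print(scale_layer(3, 128))   # BottleneckCSP -> (1, 64)
print(scale_layer(9, 256))   # BottleneckCSP -> (3, 128)
print(scale_layer(1, 1024))  # Conv          -> (1, 512)
```

These values agree with the coco128 training log in section 3.2, where layer 0 is `Focus [3, 32, 3]`, layer 2 is `BottleneckCSP [64, 64, 1]` and layer 4 is `BottleneckCSP [128, 128, 3]`.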

3.3.3 Pre-training

python train.py --data=data/hat.yaml --cfg=data/hat_yolov5s.yaml --batch-size=16 --epochs=10
Analyzing anchors... anchors/target = 4.25, Best Possible Recall (BPR) = 0.9999
Image sizes 640 train, 640 test
Using 8 dataloader workers
Logging results to runs/exp14
Starting training for 10 epochs...

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
       0/9     4.51G   0.08594   0.07445   0.01321    0.1736        39       640: 100%|██████████████████████████████████████████| 342/342 [00:54<00:00,  6.28it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 38/38 [00:09<00:00,  3.96it/s]
                 all         607    2.98e+04       0.221       0.288        0.21      0.0712

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
       1/9     4.58G    0.0641     0.067  0.004142    0.1352         9       640: 100%|██████████████████████████████████████████| 342/342 [00:47<00:00,  7.17it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 38/38 [00:04<00:00,  9.50it/s]
                 all         607    2.98e+04       0.365         0.3       0.251       0.106

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
       2/9     4.58G   0.05703   0.06752  0.002748    0.1273       273       640: 100%|██████████████████████████████████████████| 342/342 [00:48<00:00,  6.98it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 38/38 [00:06<00:00,  5.97it/s]
                 all         607    2.98e+04       0.406       0.311       0.273       0.144

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
       3/9     4.58G   0.04976   0.06421  0.002333    0.1163         6       640: 100%|██████████████████████████████████████████| 342/342 [00:48<00:00,  6.98it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 38/38 [00:05<00:00,  6.33it/s]
                 all         607    2.98e+04       0.616       0.307       0.304        0.16

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
       4/9     4.58G   0.04688   0.06446  0.001753    0.1131       273       640: 100%|██████████████████████████████████████████| 342/342 [00:49<00:00,  6.98it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 38/38 [00:05<00:00,  6.38it/s]
                 all         607    2.98e+04       0.645       0.309       0.306       0.177

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
       5/9     4.58G   0.04377   0.06128  0.001416    0.1065        30       640: 100%|██████████████████████████████████████████| 342/342 [00:49<00:00,  6.96it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 38/38 [00:05<00:00,  6.42it/s]
                 all         607    2.98e+04       0.627       0.312       0.307       0.178

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
       6/9     4.58G   0.04228    0.0616  0.001187    0.1051       243       640: 100%|██████████████████████████████████████████| 342/342 [00:49<00:00,  6.91it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 38/38 [00:05<00:00,  6.67it/s]
                 all         607    2.98e+04       0.679       0.312       0.309       0.185

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
       7/9     4.58G   0.04071   0.05956  0.001062    0.1013        15       640: 100%|██████████████████████████████████████████| 342/342 [00:48<00:00,  7.01it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 38/38 [00:05<00:00,  6.54it/s]
                 all         607    2.98e+04       0.675       0.312       0.309       0.188

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
       8/9     4.58G   0.04015    0.0596 0.0008846    0.1006        48       640: 100%|██████████████████████████████████████████| 342/342 [00:48<00:00,  7.00it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 38/38 [00:05<00:00,  6.96it/s]
                 all         607    2.98e+04       0.688       0.312        0.31       0.189

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
       9/9     4.58G   0.03959    0.0595 0.0007798   0.09986        39       640: 100%|██████████████████████████████████████████| 342/342 [00:48<00:00,  7.00it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 38/38 [00:06<00:00,  5.79it/s]
                 all         607    2.98e+04       0.695       0.313       0.312       0.192
Optimizer stripped from runs/exp14/weights/last.pt, 14.8MB
Optimizer stripped from runs/exp14/weights/best.pt, 14.8MB
10 epochs completed in 0.156 hours.

3.3.4 Testing

python detect.py 



3.4.1 Anchor box clustering
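The clustering script uses 1 - IoU as the distance between an annotated box and a cluster center, comparing only widths and heights as if both boxes shared the same top-left corner. The four cases in its `IOU` function collapse into one expression with `min`. The following is a compact sketch of that idea, not the script's own code; `wh_iou` is a hypothetical name.

```python
import numpy as np


def wh_iou(box_wh, centroids_wh):
    """IoU between one (w, h) box and each (w, h) centroid, all anchored at the origin."""
    box_wh = np.asarray(box_wh, dtype=np.float64)
    centroids_wh = np.asarray(centroids_wh, dtype=np.float64)
    # The overlap of two origin-anchored boxes is min(widths) * min(heights).
    inter = np.minimum(box_wh[0], centroids_wh[:, 0]) * np.minimum(box_wh[1], centroids_wh[:, 1])
    union = box_wh[0] * box_wh[1] + centroids_wh[:, 0] * centroids_wh[:, 1] - inter
    return inter / union


ious = wh_iou([2, 2], [[2, 2], [4, 4], [1, 4]])
print(ious)      # IoU with each centroid: 1, 0.25 and 1/3
print(1 - ious)  # the corresponding k-means distances
```

A box identical to a centroid has distance 0, so k-means groups boxes by shape rather than by absolute pixel difference.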


# -*- coding: utf-8 -*-
import numpy as np
import random
import argparse
import os

# Command-line arguments
parser = argparse.ArgumentParser(description='Use this script to generate YOLO-V3 anchor boxes\n')
parser.add_argument('--input_annotation_txt_dir', required=True, type=str, help='folder with the annotation txt files of the images (avoid non-ASCII characters in the path)')
parser.add_argument('--output_anchors_txt', required=True, type=str, help='output text file that stores the anchor boxes')
parser.add_argument('--input_num_anchors', required=True, default=6, type=int, help='number of clusters to compute (the number of anchor boxes)')
parser.add_argument('--input_cfg_width', required=True, type=int, help="width from the config file")
parser.add_argument('--input_cfg_height', required=True, type=int, help="height from the config file")
args = parser.parse_args()

def IOU(annotation_array, centroids):
    """
    centroids: cluster centers, shape (num, 2), an ndarray
    annotation_array: the width and height of one annotated box
    """
    similarities = []
    # one annotated box
    w, h = annotation_array
    for centroid in centroids:
        c_w, c_h = centroid
        if c_w >= w and c_h >= h:  # case 1: the centroid contains the box
            similarity = w * h / (c_w * c_h)
        elif c_w >= w and c_h <= h:  # case 2: the centroid is wider but shorter
            similarity = w * c_h / (w * h + (c_w - w) * c_h)
        elif c_w <= w and c_h >= h:  # case 3: the centroid is narrower but taller
            similarity = c_w * h / (w * h + (c_h - h) * c_w)
        else:  # case 4: the box contains the centroid
            similarity = (c_w * c_h) / (w * h)
        similarities.append(similarity)
    # convert the list to an ndarray
    return np.array(similarities, np.float32)  # 1-D array of shape (num,)


def k_means(annotations_array, centroids, eps=0.00005, iterations=200000):
    """
    annotations_array: widths and heights of all N annotated boxes, shape (N, 2), an ndarray
    centroids: cluster centers, shape (num, 2), an ndarray
    """
    N = annotations_array.shape[0]  # number of annotated boxes
    num = centroids.shape[0]
    # loss value of the previous iteration
    distance_sum_pre = -1
    assignments_pre = -1 * np.ones(N, dtype=np.int64)
    iteration = 0
    # main loop
    while True:
        iteration += 1
        distances = []
        # distance (1 - IOU) from every annotated box to every cluster center
        for i in range(N):
            distance = 1 - IOU(annotations_array[i], centroids)
            distances.append(distance)
        # convert the list to an ndarray
        distances_array = np.array(distances, np.float32)  # shape (N, num)
        # find the nearest cluster center for every annotated box
        assignments = np.argmin(distances_array, axis=1)  # index of the minimum in each row
        # total distance, which acts as the k-means loss
        distances_sum = np.sum(distances_array)
        # compute the new cluster centers
        centroid_sums = np.zeros(centroids.shape, np.float32)
        for i in range(N):
            centroid_sums[assignments[i]] += annotations_array[i]  # sum of the boxes assigned to each cluster
        for j in range(num):
            centroids[j] = centroid_sums[j] / (np.sum(assignments == j))
        # change in total distance between two iterations
        diff = abs(distances_sum - distance_sum_pre)
        # print the progress
        print("iteration: {},distance: {}, diff: {}, avg_IOU: {}\n".format(iteration, distances_sum, diff,
                                                                           np.sum(1 - distances_array) / (N * num)))
        # Leave the loop in three cases: 1: the assignments no longer change,
        # 2: the change in total distance is below eps, 3: the iteration limit is exceeded
        if (assignments == assignments_pre).all():
            break
        if diff < eps:
            break
        if iteration > iterations:
            break
        # remember the results of this iteration
        distance_sum_pre = distances_sum
        assignments_pre = assignments.copy()
    return centroids
if __name__ == '__main__':
    # Number of clusters, i.e. the number of anchor boxes
    num_clusters = args.input_num_anchors
    # List the name of every annotation file (.txt) in the folder
    names = os.listdir(args.input_annotation_txt_dir)
    # Widths and heights of the annotated boxes
    annotations_w_h = []
    for name in names:
        txt_path = os.path.join(args.input_annotation_txt_dir, name)
        # Read each line of the txt file
        with open(txt_path, 'r', encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                w, h = line.split(' ')[3:]  # w and h are still strings at this point
                # float() converts the strings to numbers without eval(), which would execute arbitrary text
                annotations_w_h.append((float(w), float(h)))
    # Convert the list annotations_w_h to a numpy array of shape (N, 2), where N is the number of boxes
    annotations_array = np.array(annotations_w_h, dtype=np.float32)
    N = annotations_array.shape[0]
    # Randomly initialize the centroids for k-means
    random_indices = [random.randrange(N) for i in range(num_clusters)]  # random box indices
    centroids = annotations_array[random_indices]
    # Run k-means (centroids is updated in place)
    k_means(annotations_array, centroids, 0.00005, 200000)
    # Sort the centroids by width
    widths = centroids[:, 0]
    sorted_indices = np.argsort(widths)
    anchors = centroids[sorted_indices]
    # Write the anchors to the output file, one "w,h" pair per line, scaled to the config size
    with open(args.output_anchors_txt, 'w') as f_anchors:
        for anchor in anchors:
            f_anchors.write('%d,%d\n' % (int(anchor[0] * args.input_cfg_width), int(anchor[1] * args.input_cfg_height)))
python gen_anchors_kmeans.py --input_annotation_txt_dir=/home/hjz/PycharmProjects/pythonProject/VOC2028/Labels --output_anchors_txt=achors.txt --input_num_anchors=9 --input_cfg_width=640 --input_cfg_height=640
iteration: 189,distance: 2494381.0, diff: 2.75, avg_IOU: 0.23371242911610443
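
The four branches in IOU() treat both boxes as if they shared one corner, so they can be folded into a single expression: the overlap is min(w, c_w) * min(h, c_h) and the union is the two areas minus that overlap. The sketch below is only an illustration of that equivalence; the function names `wh_iou` and `wh_iou_by_cases` and the sample values are hypothetical and not part of the original script.

```python
import math


def wh_iou(box_wh, centroid_wh):
    """IoU of two boxes that share a corner, given only (width, height)."""
    w, h = box_wh
    c_w, c_h = centroid_wh
    inter = min(w, c_w) * min(h, c_h)
    union = w * h + c_w * c_h - inter
    return inter / union


def wh_iou_by_cases(box_wh, centroid_wh):
    """The same quantity written as the four explicit cases."""
    w, h = box_wh
    c_w, c_h = centroid_wh
    if c_w >= w and c_h >= h:      # centroid contains the box
        return w * h / (c_w * c_h)
    elif c_w >= w and c_h <= h:    # centroid is wider but shorter
        return w * c_h / (w * h + (c_w - w) * c_h)
    elif c_w <= w and c_h >= h:    # centroid is narrower but taller
        return c_w * h / (w * h + (c_h - h) * c_w)
    else:                          # box contains the centroid
        return (c_w * c_h) / (w * h)


# One made-up (w, h) pair per case, all compared against the same centroid.
SAMPLE_PAIRS = [
    ((0.2, 0.2), (0.3, 0.3)),
    ((0.2, 0.4), (0.3, 0.3)),
    ((0.4, 0.2), (0.3, 0.3)),
    ((0.4, 0.4), (0.3, 0.3)),
]

if __name__ == "__main__":
    for box, centroid in SAMPLE_PAIRS:
        a, b = wh_iou(box, centroid), wh_iou_by_cases(box, centroid)
        print(box, centroid, round(a, 4), math.isclose(a, b))
```

The min-based form has no branches, which makes it easier to vectorize with numpy if the per-box Python loop in k_means becomes too slow.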



#  - [10,13, 16,30, 33,23]  # P3/8
#  - [30,61, 62,45, 59,119]  # P4/16
#  - [116,90, 156,198, 373,326]  # P5/32

  - [8,18, 12,26, 19,36]  # P3/8
  - [30,52, 45,77, 68,114]  # P4/16
  - [96,175, 153,250, 287,399]  # P5/32
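
The clustering script writes nine w,h pairs, while the model yaml expects three rows of three anchors (P3/8, P4/16, P5/32). A minimal sketch of that regrouping follows, assuming the anchors should be ordered by area from smallest to largest and split evenly across the three levels; `group_anchors` is a hypothetical helper, and the ordering rule is an assumption (the script above sorts by width only).

```python
def group_anchors(anchors, levels=("P3/8", "P4/16", "P5/32")):
    """Sort (w, h) anchors by area and format them as yaml anchor rows."""
    if len(anchors) % len(levels) != 0:
        raise ValueError("number of anchors must be a multiple of the number of levels")
    per_level = len(anchors) // len(levels)
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    rows = []
    for i, level in enumerate(levels):
        chunk = ordered[i * per_level:(i + 1) * per_level]
        flat = ", ".join("%d,%d" % (round(w), round(h)) for w, h in chunk)
        rows.append("  - [%s]  # %s" % (flat, level))
    return rows


if __name__ == "__main__":
    # The nine anchors used above, deliberately out of order.
    anchors = [(96, 175), (8, 18), (45, 77), (287, 399), (12, 26),
               (30, 52), (153, 250), (19, 36), (68, 114)]
    for row in group_anchors(anchors):
        print(row)
```

The printed rows can be pasted under the anchors key of the model yaml in place of the commented-out defaults.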

3.4.2 train

python train.py --data data/hat.yaml --cfg data/hat_yolov5s.yaml --weights yolov5s.pt --batch-size 32 --epochs 100
     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     98/99     5.95G   0.03468    0.0506  0.000218   0.08549       846       640: 100%|██████████████████████████████████████████| 171/171 [00:40<00:00,  4.24it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 19/19 [00:03<00:00,  5.64it/s]
                 all         607    2.98e+04       0.792       0.313       0.314         0.2

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
     99/99     5.95G   0.03429   0.05084 0.0002206   0.08535      1512       640: 100%|██████████████████████████████████████████| 171/171 [00:39<00:00,  4.28it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 19/19 [00:03<00:00,  4.78it/s]
                 all         607    2.98e+04       0.791       0.313       0.313       0.199
Optimizer stripped from runs/exp15/weights/last.pt, 14.8MB
Optimizer stripped from runs/exp15/weights/best.pt, 14.8MB
100 epochs completed in 1.214 hours.

3.4.3 test.py

python test.py --weights=/home/hjz/PycharmProjects/pythonProject/01-yolov5-master/runs/exp15/weights/last.pt --data=data/hat.yaml 
Fusing layers... 
Model Summary: 140 layers, 7.24922e+06 parameters, 0 gradients
Scanning labels /home/hjz/PycharmProjects/pythonProject/custom_data/labels/val.cache (607 found, 0 missing, 0 empty, 607 duplicate, for 607 images): 607it [00:00, 11843.14it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 19/19 [00:04<00:00,  4.13it/s]
                 all         607    2.98e+04       0.768       0.313       0.315       0.199
Speed: 1.2/1.0/2.2 ms inference/NMS/total per 640x640 image at batch-size 32

3.5 Training yolov5s for 600 epochs

   594/599     9.41G   0.02903    0.0438 0.0001633   0.07299      2100       640: 100%|██████████████████████████████████████████| 114/114 [00:37<00:00,  3.05it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 13/13 [00:02<00:00,  4.39it/s]
                 all         607    2.98e+04       0.805       0.312       0.311       0.195

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
   595/599     9.41G   0.02927   0.04337 0.0001394   0.07278      2184       640: 100%|██████████████████████████████████████████| 114/114 [00:37<00:00,  3.06it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 13/13 [00:02<00:00,  4.38it/s]
                 all         607    2.98e+04       0.804       0.312       0.311       0.195

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
   596/599     9.41G   0.02892   0.04282  0.000151   0.07189      1752       640: 100%|██████████████████████████████████████████| 114/114 [00:37<00:00,  3.05it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 13/13 [00:03<00:00,  4.30it/s]
                 all         607    2.98e+04       0.803       0.312       0.311       0.195

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
   597/599     9.41G    0.0288    0.0426 0.0001617   0.07156      2142       640: 100%|██████████████████████████████████████████| 114/114 [00:37<00:00,  3.05it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 13/13 [00:02<00:00,  4.46it/s]
                 all         607    2.98e+04       0.803       0.311       0.311       0.195

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
   598/599     9.41G   0.02896   0.04226 0.0001563   0.07137      1836       640: 100%|██████████████████████████████████████████| 114/114 [00:37<00:00,  3.07it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 13/13 [00:03<00:00,  4.31it/s]
                 all         607    2.98e+04       0.803       0.311        0.31       0.195

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
   599/599     9.41G   0.02936   0.04317 0.0002012   0.07273      1827       640: 100%|██████████████████████████████████████████| 114/114 [00:37<00:00,  3.05it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 13/13 [00:03<00:00,  3.40it/s]
                 all         607    2.98e+04       0.803       0.311       0.311       0.195
Optimizer stripped from runs/exp17/weights/last.pt, 14.8MB
Optimizer stripped from runs/exp17/weights/best.pt, 14.8MB
600 epochs completed in 6.774 hours.



3.6 yolov5x in practice (Ubuntu 16.04)

3.6.1 train.py


python train.py --data data/hat.yaml --cfg data/hat_yolov5x.yaml --weights yolov5x.pt --batch-size 8 --epochs 300
     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
   298/299     8.42G   0.02804   0.04383 0.0002171   0.07209        30       640: 100%|██████████████████████████████████████████| 683/683 [04:02<00:00,  2.82it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 76/76 [00:08<00:00,  8.74it/s]
                 all         607    2.98e+04       0.814       0.315       0.314       0.202

     Epoch   gpu_mem       box       obj       cls     total   targets  img_size
   299/299     8.42G   0.02814   0.04331 0.0001779   0.07163        30       640: 100%|██████████████████████████████████████████| 683/683 [04:02<00:00,  2.82it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 76/76 [00:08<00:00,  8.54it/s]
                 all         607    2.98e+04       0.815       0.315       0.314       0.202
Optimizer stripped from runs/exp5/weights/last.pt, 177.5MB
Optimizer stripped from runs/exp5/weights/best.pt, 177.5MB
300 epochs completed in 21.455 hours.

3.6.2 test.py

python test.py --weights=/home/hjz/PycharmProjects/pythonProject/01-yolov5-master/runs/exp5/weights/last.pt --data=data/hat.yaml 
Fusing layers... 
Model Summary: 284 layers, 8.83973e+07 parameters, 0 gradients
Scanning labels /home/hjz/PycharmProjects/pythonProject/custom_data/labels/val.cache (607 found, 0 missing, 0 empty, 607 duplicate, for 607 images): 607it [00:00, 12405.50it/s]
               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|████████████████████████████████| 19/19 [00:08<00:00,  2.26it/s]
                 all         607    2.98e+04       0.812       0.315       0.315       0.202
Speed: 7.9/0.9/8.9 ms inference/NMS/total per 640x640 image at batch-size 32
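
To compare the yolov5s and yolov5x runs side by side, the summary row and the Speed line can be pulled out of the test.py output with a few lines of Python. This is a rough sketch that assumes the log layout shown above (an "all" row with seven whitespace-separated fields and a "Speed: a/b/c ms" line); `parse_all_row` and `parse_speed` are hypothetical helper names, and the images-per-second figure is just 1000 divided by the total milliseconds per image, so it ignores data loading.

```python
import re


def parse_all_row(line):
    """Parse the 'all' summary row: Class, Images, Targets, P, R, mAP@.5, mAP@.5:.95."""
    parts = line.split()
    if len(parts) != 7 or parts[0] != "all":
        raise ValueError("not an 'all' summary row")
    keys = ("images", "targets", "P", "R", "mAP@.5", "mAP@.5:.95")
    return dict(zip(keys, (float(v) for v in parts[1:])))


def parse_speed(line):
    """Parse 'Speed: inference/NMS/total ms ...' into milliseconds and images per second."""
    match = re.search(r"Speed:\s*([\d.]+)/([\d.]+)/([\d.]+)\s*ms", line)
    if match is None:
        raise ValueError("not a Speed line")
    inference, nms, total = (float(v) for v in match.groups())
    return {
        "inference_ms": inference,
        "nms_ms": nms,
        "total_ms": total,
        "images_per_s": 1000.0 / total,
    }


if __name__ == "__main__":
    row = "                 all         607    2.98e+04       0.812       0.315       0.315       0.202"
    speed = "Speed: 7.9/0.9/8.9 ms inference/NMS/total per 640x640 image at batch-size 32"
    print(parse_all_row(row))
    print(parse_speed(speed))
```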


3.6.3 detect.py



3.7 yolov5 retest (Windows 10)





  1. First, download the VOC2028 dataset; it can be placed in the project folder
  2. Run detect.py on the dataset to generate the person labels (class 0)


python detect.py --save-txt --source=E:\01_hjz\01_work\pythonProject\Smart_Construction-master\VOC2028\JPEGImages
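  3. Run gen_head_helmet.py to generate the score folder with the train/val/test split
  4. Create a new folder named Labels, then run merge_data.py to merge the label=0 (person) annotations into the VOC2028 labels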

3.7.2 Training


# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: ../Smart_Construction-master/score/images/train
val: ../Smart_Construction-master/score/images/val

# number of classes
nc: 3

# class names
names: ['person', 'head', 'helmet']
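
Before training with nc: 3, it can help to confirm that every class id in the merged label files is 0, 1 or 2. The sketch below assumes each label line has five space-separated fields with the class id first and the width and height last, which is what the clustering script above relies on; the middle two fields being the box center is an assumption, and `count_classes` and the sample lines are made up for illustration.

```python
from collections import Counter


def count_classes(label_lines, names):
    """Count boxes per class name and reject ids outside the names list."""
    counts = Counter()
    for line_no, line in enumerate(label_lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError("line %d: expected 5 fields, got %d" % (line_no, len(fields)))
        class_id = int(fields[0])
        if not 0 <= class_id < len(names):
            raise ValueError("line %d: class id %d is outside 0..%d"
                             % (line_no, class_id, len(names) - 1))
        counts[names[class_id]] += 1
    return counts


if __name__ == "__main__":
    names = ['person', 'head', 'helmet']
    # Hypothetical label lines, not taken from the dataset.
    sample = [
        "0 0.50 0.50 0.20 0.60",
        "2 0.48 0.25 0.08 0.07",
        "1 0.80 0.30 0.06 0.08",
        "2 0.20 0.22 0.07 0.06",
    ]
    print(count_classes(sample, names))
```

Running the same function over the lines of every txt file in the train and val label folders gives a quick per-class box count before a long training run starts.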


Anchors:[7.77, 15.87]
Anchors:[9.21, 20.2]
Anchors:[11.5, 23.23]
Anchors:[13.82, 28.93]
Anchors:[18.51, 35.12]
Anchors:[25.6, 44.74]
Anchors:[36.0, 61.16]
Anchors:[52.8, 89.0]
Anchors:[85.33, 147.99]
Ratios:[0.46, 0.48, 0.49, 0.49, 0.53, 0.57, 0.58, 0.59, 0.59]
******************** 1 ********************


python train.py --img 640 --batch 32 --epochs 100 --data ./data/custom_data.yaml --cfg ./models/custom_yolov5.yaml --weights ./weights/yolov5s.pt


               Class      Images     Targets           P           R      mAP@.5  mAP@.5:.95: 100%|█████████████████████████████████████████████████████| 19/19 [00:29<00:00,  1.55s/it]
                 all         607    1.29e+04       0.897       0.893       0.875       0.611



python test.py --weights=./weights/helmet_head_person_s.pt --data=./data/custom_data.yaml
██████████| 19/19 [00:30<00:00,  1.60s/it]
                 all         607    1.29e+04       0.862       0.894       0.874       0.589
Speed: 1.4/1.1/2.5 ms inference/NMS/total per 640x640 image at batch-size 32



python detect.py --weights=runs/exp15/weights/best.pt --source=E:\01_hjz\01_work\pythonProject\Smart_Construction-master\inference\int\video


If the display window is too small, add the line cv2.namedWindow(p, cv2.WINDOW_NORMAL) just before cv2.imshow(p, im0) on line 108.
python detect.py --weights=runs/exp15/weights/last.pt --source=0

