

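1.1 YOLOv9

YOLOv9 marks a significant advance in real-time object detection, introducing techniques such as Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN). The model delivers notable improvements in efficiency, accuracy, and adaptability, and sets new benchmarks on the MS COCO dataset. The YOLOv9 project was developed by a separate open-source team and builds on the codebase provided by Ultralytics YOLOv5, reflecting the collaborative spirit of the AI research community.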

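1.2 Introduction to GDIP

GDIP-YOLO, proposed in 2022, is an end-to-end image-adaptive object detection framework, and the results in its paper show good image enhancement. The paper introduces the GDIP module, the MGDIP module, and the GDIP regularizer, and identifies them as the key to the performance gains.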
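
To make the gating idea more concrete, here is a minimal pure-Python sketch of a gate-weighted mix of image operations. It is a simplified illustration and not the paper's code: the three operations, the fixed gate values, and every function name (gated_dip, gamma_correct, contrast, min_max_normalize) are assumptions chosen for the example. In GDIP itself the gates and operation parameters are predicted by a network from the input image rather than passed in by hand.

```python
def identity(pixels):
    # Pass the image through unchanged.
    return list(pixels)


def gamma_correct(pixels, gamma):
    # Pixel values are assumed to lie in [0, 1].
    return [p ** gamma for p in pixels]


def contrast(pixels, alpha):
    # Stretch each pixel away from the mean, then clamp to [0, 1].
    mean = sum(pixels) / len(pixels)
    return [min(1.0, max(0.0, mean + alpha * (p - mean))) for p in pixels]


def min_max_normalize(pixels):
    # Rescale to [0, 1]; a constant image maps to all zeros.
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def gated_dip(pixels, gates, gamma=0.5, alpha=1.5):
    """Gate-weighted sum of several image operations, rescaled to [0, 1]."""
    outputs = [
        identity(pixels),
        gamma_correct(pixels, gamma),
        contrast(pixels, alpha),
    ]
    mixed = [
        sum(g * out[i] for g, out in zip(gates, outputs))
        for i in range(len(pixels))
    ]
    return min_max_normalize(mixed)


if __name__ == "__main__":
    # A flattened 4-pixel grayscale "image" and hand-picked gate values.
    print(gated_dip([0.1, 0.2, 0.4, 0.8], gates=[0.2, 0.5, 0.3]))
```

Because every step is a smooth arithmetic operation on the pixels and gates, the same structure written with tensors can be trained end to end together with the detector.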


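The implementation here builds on the project code from the earlier article "Integrating the GDIP module and IPAM into the ultralytics project to build gidp-yolov8 and ipam-yolov8 with pretrained-weight support".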

2.1 Create the YAML file

Save the following as yolov9c-gdip.yaml. To use the IPAM module instead, replace the line - [-1, 1, GatedDIP, [256,7,"gdip-RTTS.pt"]] (the GDIP module) with - [-1, 1, IPAM, []] (the ia-seg module).

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv9c
# 618 layers, 25590912 parameters, 104.0 GFLOPs

# parameters
nc: 80  # number of classes

# gelan backbone
backbone:
  - [-1, 1, GatedDIP, [256,7,"gdip-RTTS.pt"]]  # 0 GDIP module
  - [-1, 1, Conv, [64, 3, 2]]  # 1-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 2-P2/4
  - [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]]  # 3
  - [-1, 1, ADown, [256]]  # 4-P3/8
  - [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]]  # 5
  - [-1, 1, ADown, [512]]  # 6-P4/16
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]]  # 7
  - [-1, 1, ADown, [512]]  # 8-P5/32
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]]  # 9
  - [-1, 1, SPPELAN, [512, 256]]  # 10

# gelan head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 7], 1, Concat, [1]]  # cat backbone P4
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]]  # 13

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 5], 1, Concat, [1]]  # cat backbone P3
  - [-1, 1, RepNCSPELAN4, [256, 256, 128, 1]]  # 16 (P3/8-small)

  - [-1, 1, ADown, [256]]
  - [[-1, 13], 1, Concat, [1]]  # cat head P4
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]]  # 19 (P4/16-medium)

  - [-1, 1, ADown, [512]]
  - [[-1, 10], 1, Concat, [1]]  # cat head P5
  - [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]]  # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]]  # Detect(P3, P4, P5)
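
Placing GatedDIP at position 0 pushes every other layer down by one, so each absolute index in a from field has to grow by one while relative indices such as -1 stay as they are. The sketch below shows that bookkeeping in plain Python. The helper name shift_from_indices is hypothetical, and the stock_head values are assumed to be those of the unmodified yolov9c.yaml (each is one less than the value in the listing above).

```python
def shift_from_indices(layers, offset=1):
    """Return a copy of `layers` with absolute (non-negative) `from` indices shifted."""

    def shift(f):
        if isinstance(f, list):
            return [shift(x) for x in f]
        # Negative indices are relative to the current layer and must not change.
        return f + offset if f >= 0 else f

    return [[shift(layer[0])] + list(layer[1:]) for layer in layers]


# Head layers of the stock config that carry absolute indices (assumed values).
stock_head = [
    [[-1, 6], 1, "Concat", [1]],    # cat backbone P4
    [[-1, 4], 1, "Concat", [1]],    # cat backbone P3
    [[-1, 12], 1, "Concat", [1]],   # cat head P4
    [[-1, 9], 1, "Concat", [1]],    # cat head P5
    [[15, 18, 21], 1, "Detect", ["nc"]],
]

if __name__ == "__main__":
    for layer in shift_from_indices(stock_head):
        print(layer)
```

The shifted values agree with the from column of the layer table in the training log in section 2.3 ([-1, 7], [-1, 5], [-1, 13], [-1, 10] and [16, 19, 22]).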

2.2 Generate the gidp-yolov9 model

Open https://docs.ultralytics.com/models/yolov9/#performance-on-ms-coco-dataset and download the yolov9c model.

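Following the code in section 3.3, "Using YOLOv8 pretrained weights", of the article "Integrating the GDIP module and IPAM into the ultralytics project to build gidp-yolov8 and ipam-yolov8 with pretrained-weight support", save the converted model as gidp-yolov9c.pt (the file name that appears in the log in section 2.3).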
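
The referenced code is not reproduced here. As a rough sketch of one step such a conversion plausibly involves, the snippet below renames state-dict keys so that weights stored under model.<i> in the stock checkpoint line up with model.<i+1> in the new network, where layer 0 is now the GDIP module. This is an assumption about the procedure, not the referenced article's code, and shift_layer_keys is a hypothetical helper. The demo uses placeholder strings in place of tensors.

```python
import re


def shift_layer_keys(state_dict, offset=1):
    """Rename 'model.<i>.<rest>' keys to 'model.<i+offset>.<rest>'."""
    pattern = re.compile(r"^model\.(\d+)\.(.+)$")
    shifted = {}
    for key, value in state_dict.items():
        match = pattern.match(key)
        if match:
            key = f"model.{int(match.group(1)) + offset}.{match.group(2)}"
        shifted[key] = value
    return shifted


if __name__ == "__main__":
    # Placeholder values; in practice these would be the checkpoint's tensors.
    demo = {
        "model.0.conv.weight": "w0",
        "model.22.cv2.0.0.conv.weight": "w22",
    }
    print(shift_layer_keys(demo))
```

In real use the dictionary would come from the stock model's state_dict(), and the renamed dictionary could be loaded into the new model with load_state_dict(..., strict=False), so that the GDIP weights loaded from gdip-RTTS.pt are left untouched.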


2.3 Use the yolov9c-gdip model


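from ultralytics import YOLO

if __name__ == '__main__':
    # Build the model from the YAML and load the converted weights.
    # Both file names are taken from the log below; adjust them to your own paths.
    model = YOLO("yolov9c-gdip.yaml").load("gidp-yolov9c.pt")
    model.train(data="coco128.yaml", epochs=3, batch=4)  # train the model
    metrics = model.val(data="coco128.yaml")  # evaluate on the validation set
    results = model("https://ultralytics.com/images/bus.jpg")  # run prediction on an image
    success = model.export(format="onnx")  # export to ONNX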


256 7
load pretrain model from gdip-RTTS.pt
WARNING ⚠️ The file 'gidp-yolov9c.pt' appears to be improperly saved or formatted. For optimal results, use model.save('filename.pt') to correctly save YOLO models.
Transferred 963/963 items from pretrained weights
New https://pypi.org/project/ultralytics/8.2.1 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.2.0 🚀 Python-3.8.16 torch-2.1.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 12288MiB)
engine\trainer: task=detect, mode=train, model=yolov9c-gdip.yaml, data=coco128.yaml, epochs=3, time=None, patience=100, batch=4, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train

                   from  n    params  module                                       arguments
256 7
load pretrain model from gdip-RTTS.pt
  0                  -1  1   6538646  ultralytics.nn.modules.GDIP.GatedDIP         [256, 7, 'gdip-RTTS.pt']      
  1                  -1  1      1856  ultralytics.nn.modules.conv.Conv             [3, 64, 3, 2]
  2                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  3                  -1  1    212864  ultralytics.nn.modules.block.RepNCSPELAN4    [128, 256, 128, 64, 1]
  4                  -1  1    164352  ultralytics.nn.modules.block.ADown           [256, 256]
  5                  -1  1    847616  ultralytics.nn.modules.block.RepNCSPELAN4    [256, 512, 256, 128, 1]
  6                  -1  1    656384  ultralytics.nn.modules.block.ADown           [512, 512]
  7                  -1  1   2857472  ultralytics.nn.modules.block.RepNCSPELAN4    [512, 512, 512, 256, 1]       
  8                  -1  1    656384  ultralytics.nn.modules.block.ADown           [512, 512]
  9                  -1  1   2857472  ultralytics.nn.modules.block.RepNCSPELAN4    [512, 512, 512, 256, 1]       
 10                  -1  1    656896  ultralytics.nn.modules.block.SPPELAN         [512, 512, 256]
 11                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 12             [-1, 7]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 13                  -1  1   3119616  ultralytics.nn.modules.block.RepNCSPELAN4    [1024, 512, 512, 256, 1]      
 14                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 15             [-1, 5]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 16                  -1  1    912640  ultralytics.nn.modules.block.RepNCSPELAN4    [1024, 256, 256, 128, 1]      
 17                  -1  1    164352  ultralytics.nn.modules.block.ADown           [256, 256]
 18            [-1, 13]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 19                  -1  1   2988544  ultralytics.nn.modules.block.RepNCSPELAN4    [768, 512, 512, 256, 1]       
 20                  -1  1    656384  ultralytics.nn.modules.block.ADown           [512, 512]
 21            [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 22                  -1  1   3119616  ultralytics.nn.modules.block.RepNCSPELAN4    [1024, 512, 512, 256, 1]      
 23        [16, 19, 22]  1   5644480  ultralytics.nn.modules.head.Detect           [80, [256, 512, 512]]
YOLOv9c-gdip summary: 660 layers, 32129558 parameters, 32129542 gradients, 159.2 GFLOPs

Transferred 963/963 items from pretrained weights

Logging results to runs\detect\train
Starting training for 3 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/3      5.35G     0.9777      1.222      1.194         54        640: 100%|██████████| 32/32 [00:11<00:00,  2.67it/s]     
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 16/16 [00:04<00:00,  3  
                   all        128        929      0.805      0.711      0.814       0.65

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        2/3      5.83G     0.9605     0.9842      1.164         44        640: 100%|██████████| 32/32 [00:11<00:00,  2.83it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 16/16 [00:03<00:00,  4.
                   all        128        929      0.836      0.706      0.821      0.654

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        3/3      5.22G     0.9349     0.8878      1.175         85        640: 100%|██████████| 32/32 [00:11<00:00,  2.84it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 16/16 [00:03<00:00,  4.
                   all        128        929      0.824      0.723      0.824      0.659

3 epochs completed in 0.022 hours.
Optimizer stripped from runs\detect\train\weights\last.pt, 64.8MB
Optimizer stripped from runs\detect\train\weights\best.pt, 64.8MB

Validating runs\detect\train\weights\best.pt...
Ultralytics YOLOv8.2.0 🚀 Python-3.8.16 torch-2.1.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 12288MiB)
YOLOv9c-gdip summary (fused): 426 layers, 31919574 parameters, 0 gradients, 157.8 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 16/16 [00:03<00:00,  4.
                   all        128        929      0.829      0.722      0.824       0.66
                person        128        254      0.959      0.646      0.858      0.666
               bicycle        128          6      0.847        0.5      0.687      0.529
                   car        128         46          1      0.367      0.653      0.336
            motorcycle        128          5      0.916          1      0.995      0.831
              airplane        128          6       0.95          1      0.995      0.921
                   bus        128          7       0.93      0.714      0.857      0.753
                 train        128          3      0.896          1      0.995       0.93
                 truck        128         12      0.927        0.5      0.715      0.431
                  boat        128          6      0.648      0.333      0.571       0.47
         traffic light        128         14      0.958      0.429       0.47      0.275
             stop sign        128          2      0.872          1      0.995      0.946
                 bench        128          9          1      0.633       0.94      0.724
                  bird        128         16      0.985          1      0.995      0.711
                   cat        128          4      0.908          1      0.995      0.946
                   dog        128          9          1      0.876      0.995      0.884
                 horse        128          2      0.778          1      0.995      0.754
              elephant        128         17      0.883      0.941      0.944      0.815
                  bear        128          1      0.761          1      0.995      0.895
                 zebra        128          4      0.919          1      0.995      0.943
               giraffe        128          9      0.921          1      0.995      0.858
              backpack        128          6      0.914        0.5       0.64      0.468
              umbrella        128         18      0.815      0.833      0.896      0.669
               handbag        128         19      0.693      0.263      0.507      0.401
                   tie        128          7          1      0.694      0.839      0.665
              suitcase        128          4      0.924          1      0.995      0.648
               frisbee        128          5      0.984        0.8      0.962      0.788
                  skis        128          1       0.83          1      0.995      0.895
             snowboard        128          7      0.679      0.714      0.855      0.637
           sports ball        128          6      0.625        0.5      0.533      0.304
                  kite        128         10      0.788      0.376      0.582      0.165
          baseball bat        128          4      0.948          1      0.995      0.663
        baseball glove        128          7          1      0.407       0.44       0.31
            skateboard        128          5      0.588        0.6      0.646       0.53
         tennis racket        128          7          1      0.667      0.721      0.587
                bottle        128         18      0.761      0.556      0.694       0.45
            wine glass        128         16      0.643      0.812      0.788      0.538
                   cup        128         36      0.849      0.782      0.862      0.612
                  fork        128          6      0.585      0.333       0.75      0.589
                 knife        128         16      0.673       0.75       0.79       0.58
                 spoon        128         22      0.864      0.682      0.751       0.62
                  bowl        128         28      0.825      0.786      0.812      0.732
                banana        128          1      0.782          1      0.995      0.995
              sandwich        128          2      0.639          1      0.995      0.995
                orange        128          4      0.934          1      0.995      0.765
              broccoli        128         11      0.766      0.302      0.531      0.375
                carrot        128         24      0.768      0.828      0.844      0.612
               hot dog        128          2      0.641          1      0.995      0.995
                 pizza        128          5      0.826      0.954      0.962      0.874
                 donut        128         14      0.664          1      0.972      0.901
                  cake        128          4        0.9          1      0.995      0.904
                 chair        128         35      0.721      0.514      0.751      0.547
                 couch        128          6      0.805       0.69      0.839      0.697
          potted plant        128         14          1      0.623      0.868      0.672
                   bed        128          3      0.671          1      0.995      0.929
          dining table        128         13      0.812      0.385      0.689      0.585
                toilet        128          2      0.416        0.5      0.497       0.45
                    tv        128          2      0.852          1      0.995      0.895
                laptop        128          3      0.758      0.667      0.723       0.68
                 mouse        128          2          1          0      0.497      0.204
                remote        128          8      0.921        0.5       0.69      0.625
            cell phone        128          8          1      0.462      0.614      0.426
             microwave        128          3      0.827          1      0.995      0.897
                  oven        128          5      0.437        0.4        0.4      0.251
                  sink        128          6          1      0.422      0.623      0.452
          refrigerator        128          5      0.529          1       0.92      0.787
                  book        128         29      0.738      0.293      0.583      0.335
                 clock        128          9      0.937      0.889      0.975       0.82
                  vase        128          2      0.727          1      0.995      0.995
              scissors        128          1          1          0      0.995      0.199
            teddy bear        128         21      0.865      0.857      0.913       0.66
            toothbrush        128          5      0.855          1      0.995      0.856
Speed: 0.2ms preprocess, 26.0ms inference, 0.0ms loss, 0.8ms postprocess per image

Downloading https://ultralytics.com/images/bus.jpg to 'bus.jpg'...
100%|███████████████████████████████████████████████████████████████████████████████████████████| 476k/476k [00:00<00:00, 650kB/s] 
image 1/1 D:\yolo_seq\ultralytics-main\bus.jpg: 640x480 5 persons, 1 bus, 143.0ms
Speed: 1.0ms preprocess, 143.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 480)
Ultralytics YOLOv8.2.0 🚀 Python-3.8.16 torch-2.1.1+cu121 CPU (12th Gen Intel Core(TM) i7-12700H)

PyTorch: starting from 'runs\detect\train\weights\best.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (61.8 MB)

ONNX: starting export with onnx 1.13.1 opset 17...
ONNX: export success ✅ 104.5s, saved as 'runs\detect\train\weights\best.onnx' (124.5 MB)

Export complete (107.6s)
Results saved to D:\yolo_seq\ultralytics-main\runs\detect\train\weights
Predict:         yolo predict task=detect model=runs\detect\train\weights\best.onnx imgsz=640  
Validate:        yolo val task=detect model=runs\detect\train\weights\best.onnx imgsz=640 data=D:\yolo_seq\ultralytics-main\ultralytics\cfg\datasets\coco128.yaml  
Visualize:       https://netron.app
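
The layer table in the training log allows a quick sanity check on the parameter counts. The sketch below copies the params column by hand, sums it, and compares the result with the totals reported elsewhere in this post. The variable names are illustrative only.

```python
# Parameter counts copied from the "params" column of the model summary in the log.
layer_params = {
    0: 6538646,   # GatedDIP
    1: 1856,      # Conv
    2: 73984,     # Conv
    3: 212864,    # RepNCSPELAN4
    4: 164352,    # ADown
    5: 847616,    # RepNCSPELAN4
    6: 656384,    # ADown
    7: 2857472,   # RepNCSPELAN4
    8: 656384,    # ADown
    9: 2857472,   # RepNCSPELAN4
    10: 656896,   # SPPELAN
    11: 0,        # Upsample
    12: 0,        # Concat
    13: 3119616,  # RepNCSPELAN4
    14: 0,        # Upsample
    15: 0,        # Concat
    16: 912640,   # RepNCSPELAN4
    17: 164352,   # ADown
    18: 0,        # Concat
    19: 2988544,  # RepNCSPELAN4
    20: 656384,   # ADown
    21: 0,        # Concat
    22: 3119616,  # RepNCSPELAN4
    23: 5644480,  # Detect
}

total = sum(layer_params.values())
without_gdip = total - layer_params[0]

print(total)         # 32129558, the total in the "YOLOv9c-gdip summary" line
print(without_gdip)  # 25590912, the count in the YAML header comment for stock YOLOv9c
```

In other words, the GDIP front end accounts for all 6538646 additional parameters, and the rest of the network has the same parameter count as the stock YOLOv9c.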
