Yolov8-pose: Training a YOLOv8 Keypoint Detection Model from Scratch

CITY_OF_MO_GY

Published 2024-10-01 16:09:52

Tags: YOLO, deep learning, artificial intelligence

Original link: https://blog.csdn.net/CITY_OF_MO_GY/article/details/142670864

I. Keypoint Detection Model Inference

1. Clone the YOLOv8 source code

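# Clone the source code (a Gitee mirror of the official repository; github.com/ultralytics/ultralytics works as well)
git clone https://gitee.com/monkeycc/ultralytics.git
cd ./ultralytics
# Create a folder for pretrained models and download the keypoint detection (pose) pretrained weights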
mkdir weights
cd ./weights
wget https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n-pose.pt
cd ..

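2. Create a virtual environment

# Create an Anaconda virtual environment
conda create -n yolov8-pose python=3.10
conda activate yolov8-pose
# From the cloned ultralytics directory, install the package and its dependencies (PyTorch, OpenCV, etc.)
pip install -e .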

3. Copy and modify the prediction script predict.py

# Copy predict.py to the project root directory (the cloned ultralytics folder)
cp ./ultralytics/models/yolo/pose/predict.py ./

# Edit predict.py
vim ./predict.py

# Append the following code at the bottom of predict.py. model is the path to the downloaded pretrained weights;
# source is the path of the image to run inference on (create ./test_imgs and put your own test image there)
if __name__ == '__main__':
    from ultralytics.models.yolo.pose import PosePredictor

    args = dict(model="./weights/yolov8n-pose.pt", source="./test_imgs/g3.jpg")
    predictor = PosePredictor(overrides=args)
    predictor.predict_cli()

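4. Modify the default.yaml config file (./ultralytics/cfg/default.yaml). If model and source are already passed in args as shown above, the config file needs no changes.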

# default.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# Default training settings and hyperparameters for medium-augmentation COCO training

task: pose # (str) YOLO task, i.e. detect, segment, classify, pose
mode: predict # (str) YOLO mode, i.e. train, val, predict, export, track, benchmark

# Train settings -------------------------------------------------------------------------------------------------------
model: ./weights/yolov8n-pose.pt # (str, optional) path to model file, i.e. yolov8n.pt, yolov8n.yaml
data: # (str, optional) path to data file, i.e. coco8.yaml
epochs: 100 # (int) number of epochs to train for
time: # (float, optional) number of hours to train for, overrides epochs if supplied
patience: 100 # (int) epochs to wait for no observable improvement for early stopping of training
batch: 16 # (int) number of images per batch (-1 for AutoBatch)
imgsz: 640 # (int | list) input images size as int for train and val modes, or list[h,w] for predict and export modes
save: True # (bool) save train checkpoints and predict results
save_period: -1 # (int) Save checkpoint every x epochs (disabled if < 1)
cache: False # (bool) True/ram, disk or False. Use cache for data loading
device: cpu # (int | str | list, optional) device to run on, i.e. cuda device=0 or device=0,1,2,3 or device=cpu
workers: 8 # (int) number of worker threads for data loading (per RANK if DDP)
project: CSDN # (str, optional) project name
name: pose-1 # (str, optional) experiment name, results saved to 'project/name' directory
exist_ok: False # (bool) whether to overwrite existing experiment
pretrained: True # (bool | str) whether to use a pretrained model (bool) or a model to load weights from (str)
optimizer: auto # (str) optimizer to use, choices=[SGD, Adam, Adamax, AdamW, NAdam, RAdam, RMSProp, auto]
verbose: True # (bool) whether to print verbose output
seed: 0 # (int) random seed for reproducibility
deterministic: True # (bool) whether to enable deterministic mode
single_cls: False # (bool) train multi-class data as single-class
rect: False # (bool) rectangular training if mode='train' or rectangular validation if mode='val'
cos_lr: False # (bool) use cosine learning rate scheduler
close_mosaic: 10 # (int) disable mosaic augmentation for final epochs (0 to disable)
resume: False # (bool) resume training from last checkpoint
amp: True # (bool) Automatic Mixed Precision (AMP) training, choices=[True, False], True runs AMP check
fraction: 1.0 # (float) dataset fraction to train on (default is 1.0, all images in train set)
profile: False # (bool) profile ONNX and TensorRT speeds during training for loggers
freeze: None # (int | list, optional) freeze first n layers, or freeze list of layer indices during training
multi_scale: False # (bool) Whether to use multiscale during training
# Segmentation
overlap_mask: True # (bool) masks should overlap during training (segment train only)
mask_ratio: 4 # (int) mask downsample ratio (segment train only)
# Classification
dropout: 0.0 # (float) use dropout regularization (classify train only)

# Val/Test settings ----------------------------------------------------------------------------------------------------
val: True # (bool) validate/test during training
split: val # (str) dataset split to use for validation, i.e. 'val', 'test' or 'train'
save_json: False # (bool) save results to JSON file
save_hybrid: False # (bool) save hybrid version of labels (labels + additional predictions)
conf: # (float, optional) object confidence threshold for detection (default 0.25 predict, 0.001 val)
iou: 0.7 # (float) intersection over union (IoU) threshold for NMS
max_det: 300 # (int) maximum number of detections per image
half: False # (bool) use half precision (FP16)
dnn: False # (bool) use OpenCV DNN for ONNX inference
plots: True # (bool) save plots and images during train/val

# Predict settings -----------------------------------------------------------------------------------------------------
source: ./test_imgs/g3.jpg # (str, optional) source directory for images or videos
vid_stride: 1 # (int) video frame-rate stride
stream_buffer: False # (bool) buffer all streaming frames (True) or return the most recent frame (False)
visualize: False # (bool) visualize model features
augment: False # (bool) apply image augmentation to prediction sources
agnostic_nms: False # (bool) class-agnostic NMS
classes: # (int | list[int], optional) filter results by class, i.e. classes=0, or classes=[0,2,3]
retina_masks: False # (bool) use high-resolution segmentation masks
embed: # (list[int], optional) return feature vectors/embeddings from given layers

# Visualize settings ---------------------------------------------------------------------------------------------------
show: False # (bool) show predicted images and videos if environment allows
save_frames: False # (bool) save predicted individual video frames
save_txt: False # (bool) save results as .txt file
save_conf: False # (bool) save results with confidence scores
save_crop: False # (bool) save cropped images with results
show_labels: True # (bool) show prediction labels, i.e. 'person'
show_conf: True # (bool) show prediction confidence, i.e. '0.99'
show_boxes: True # (bool) show prediction boxes
line_width: # (int, optional) line width of the bounding boxes. Scaled to image size if None.

# Export settings ------------------------------------------------------------------------------------------------------
format: torchscript # (str) format to export to, choices at https://docs.ultralytics.com/modes/export/#export-formats
keras: False # (bool) use Keras
optimize: False # (bool) TorchScript: optimize for mobile
int8: False # (bool) CoreML/TF INT8 quantization
dynamic: False # (bool) ONNX/TF/TensorRT: dynamic axes
simplify: True # (bool) ONNX: simplify model using `onnxslim`
opset: # (int, optional) ONNX: opset version
workspace: 4 # (int) TensorRT: workspace size (GB)
nms: False # (bool) CoreML: add NMS

# Hyperparameters ------------------------------------------------------------------------------------------------------
lr0: 0.01 # (float) initial learning rate (i.e. SGD=1E-2, Adam=1E-3)
lrf: 0.01 # (float) final learning rate (lr0 * lrf)
momentum: 0.937 # (float) SGD momentum/Adam beta1
weight_decay: 0.0005 # (float) optimizer weight decay 5e-4
warmup_epochs: 3.0 # (float) warmup epochs (fractions ok)
warmup_momentum: 0.8 # (float) warmup initial momentum
warmup_bias_lr: 0.1 # (float) warmup initial bias lr
box: 7.5 # (float) box loss gain
cls: 0.5 # (float) cls loss gain (scale with pixels)
dfl: 1.5 # (float) dfl loss gain
pose: 12.0 # (float) pose loss gain
kobj: 1.0 # (float) keypoint obj loss gain
label_smoothing: 0.0 # (float) label smoothing (fraction)
nbs: 64 # (int) nominal batch size
hsv_h: 0.015 # (float) image HSV-Hue augmentation (fraction)
hsv_s: 0.7 # (float) image HSV-Saturation augmentation (fraction)
hsv_v: 0.4 # (float) image HSV-Value augmentation (fraction)
degrees: 0.0 # (float) image rotation (+/- deg)
translate: 0.1 # (float) image translation (+/- fraction)
scale: 0.5 # (float) image scale (+/- gain)
shear: 0.0 # (float) image shear (+/- deg)
perspective: 0.0 # (float) image perspective (+/- fraction), range 0-0.001
flipud: 0.0 # (float) image flip up-down (probability)
fliplr: 0.5 # (float) image flip left-right (probability)
bgr: 0.0 # (float) image channel BGR (probability)
mosaic: 1.0 # (float) image mosaic (probability)
mixup: 0.0 # (float) image mixup (probability)
copy_paste: 0.0 # (float) segment copy-paste (probability)
auto_augment: randaugment # (str) auto augmentation policy for classification (randaugment, autoaugment, augmix)
erasing: 0.4 # (float) probability of random erasing during classification training (0-0.9), 0 means no erasing, must be less than 1.0.
crop_fraction: 1.0 # (float) image crop fraction for classification (0.1-1), 1.0 means no crop, must be greater than 0.

# Custom config.yaml ---------------------------------------------------------------------------------------------------
cfg: # (str, optional) for overriding defaults.yaml

# Tracker settings ------------------------------------------------------------------------------------------------------
tracker: botsort.yaml # (str) tracker type, choices=[botsort.yaml, bytetrack.yaml]

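Step 4 says default.yaml can stay untouched when model and source are passed in args. This works because the args dict is a set of overrides: any key given there replaces the matching key loaded from default.yaml. The sketch below shows that precedence rule as a plain dict merge. merge_cfg and the trimmed defaults dict are hypothetical illustrations, not the Ultralytics implementation, which performs a similar merge plus additional validation.

```python
# Hypothetical sketch of override precedence: values passed in `args`
# replace the defaults loaded from default.yaml. Not Ultralytics' actual code.

def merge_cfg(defaults: dict, overrides: dict) -> dict:
    """Return a new config where keys in `overrides` replace those in `defaults`."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Reject typos such as "epoch" instead of silently ignoring them
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {**defaults, **overrides}  # right-hand dict wins on duplicate keys


# A small excerpt of default.yaml, written as a Python dict (hypothetical values)
defaults = {"task": "pose", "mode": "predict", "model": None,
            "source": None, "device": "cpu", "imgsz": 640}

# Same values as the args dict in predict.py
overrides = {"model": "./weights/yolov8n-pose.pt", "source": "./test_imgs/g3.jpg"}

cfg = merge_cfg(defaults, overrides)
print(cfg["model"], cfg["imgsz"])  # ./weights/yolov8n-pose.pt 640
```

Keys that are not overridden, such as imgsz above, keep their default.yaml value. That is why editing the YAML is only needed for settings you do not pass in args.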
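5. Run inference

python ./predict.py

The annotated result image is saved under the project/name directory set in default.yaml (CSDN/pose-1 with the settings above).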

II. Dataset Preparation

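The training data here is annotated with the labelme tool; for detailed usage of labelme, see a dedicated labelme tutorial. Note that labelme saves one JSON file per image, whereas YOLOv8-pose expects one .txt label file per image, so the JSON annotations have to be converted. To train your own model you also need a dataset configuration file. Using the official coco8-pose.yaml as an example, the key parameters and their meanings are explained below. This is the full content of coco8-pose.yaml: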

# COCO8-pose.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# COCO8-pose dataset (first 8 images from COCO train2017) by Ultralytics
# Documentation: https://docs.ultralytics.com/datasets/pose/coco8-pose/
# Example usage: yolo train data=coco8-pose.yaml
# parent
# ├── ultralytics
# └── datasets
#     └── coco8-pose  ← downloads here (1 MB)

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco8-pose # dataset root dir
train: images/train # train images (relative to 'path') 4 images
val: images/val # val images (relative to 'path') 4 images
test: # test images (optional)

# Keypoints
kpt_shape: [17, 3] # number of keypoints, number of dims (2 for x,y or 3 for x,y,visible)
flip_idx: [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]

# Classes
names:
  0: person

# Download script/URL (optional)
download: https://github.com/ultralytics/assets/releases/download/v0.0.0/coco8-pose.zip

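path: the dataset root directory; train, val and test below are resolved relative to it. A relative path is itself resolved against the Ultralytics datasets directory (the datasets_dir setting), so an absolute path is the least ambiguous choice.

train: path to the training images

val: path to the validation images

test: path to the test images (optional)

kpt_shape: the shape of the keypoint labels. [17, 3] means each object has 17 keypoints, and each keypoint is described by 3 values: x coordinate, y coordinate and visibility. Use [N, 2] if the labels contain only x and y.

flip_idx: the left/right keypoint mapping used by horizontal-flip augmentation. Entry i is the index that keypoint i turns into after the image is mirrored (for example, left eye and right eye swap). Its length must equal the number of keypoints. If it is omitted, Ultralytics disables horizontal flipping (fliplr) during training, so it should not simply be ignored for symmetric skeletons.

names: the class names of the bounding boxes, as index: name pairs; the index is the class id written at the start of each label line.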
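For a custom skeleton, flip_idx can be derived from the keypoint names instead of being typed by hand. The sketch below assumes a hypothetical naming convention where symmetric keypoints start with left_ or right_. It also writes a minimal dataset YAML with the same fields as coco8-pose.yaml. build_flip_idx, make_dataset_yaml, the keypoint names and the path /data/my-pose are illustrative only. Plain string formatting is used, so PyYAML is not required.

```python
def build_flip_idx(kpt_names):
    """Map each keypoint index to the index of its mirror after a horizontal flip.

    Hypothetical convention: symmetric keypoints are named 'left_*' / 'right_*';
    every other keypoint (e.g. 'nose') maps to itself.
    """
    index = {name: i for i, name in enumerate(kpt_names)}
    flip_idx = []
    for i, name in enumerate(kpt_names):
        if name.startswith("left_"):
            mirror = "right_" + name[len("left_"):]
        elif name.startswith("right_"):
            mirror = "left_" + name[len("right_"):]
        else:
            mirror = name
        flip_idx.append(index.get(mirror, i))  # fall back to itself if the mirror is missing
    return flip_idx


def make_dataset_yaml(root, class_names, kpt_names, kpt_dims=3):
    """Return the text of a minimal pose dataset YAML in the coco8-pose.yaml layout."""
    lines = [
        f"path: {root}  # dataset root dir",
        "train: images/train",
        "val: images/val",
        "test:",
        "",
        f"kpt_shape: [{len(kpt_names)}, {kpt_dims}]",
        f"flip_idx: {build_flip_idx(kpt_names)}",
        "",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# COCO keypoint order (17 points)
coco_kpts = ["nose", "left_eye", "right_eye", "left_ear", "right_ear",
             "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
             "left_wrist", "right_wrist", "left_hip", "right_hip",
             "left_knee", "right_knee", "left_ankle", "right_ankle"]
print(build_flip_idx(coco_kpts))
# [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]

print(make_dataset_yaml("/data/my-pose", ["person"], ["head", "left_hand", "right_hand"]))
```

For the COCO order, the result reproduces the flip_idx line of coco8-pose.yaml.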

Note: a train/val/test split ratio of about 7:2:1 is generally recommended.
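A reproducible 7:2:1 split can be done with a seeded shuffle. The sketch below only splits a list of file names; split_dataset is a hypothetical helper. Copying each image and its label file into the matching images/ and labels/ subfolders is left out.

```python
import random


def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle `items` reproducibly and split them into train/val/test lists."""
    items = sorted(items)        # fixed order first, so the seed alone determines the split
    rng = random.Random(seed)    # local RNG, does not touch the global random state
    rng.shuffle(items)
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to test
    return train, val, test


images = [f"{i}.jpg" for i in range(1, 101)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 70 20 10
```

Keeping the seed fixed means re-running the script produces the same split, which keeps validation results comparable between training runs.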

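The dataset folder is organized as follows. Every image in images/<split> needs a label file with the same base name in labels/<split>, and the split folder names must match the train/val entries of the dataset YAML:

Dataset
├── images
│   ├── train
│   │   └── 1.jpg
│   └── val
│       └── 2.jpg
└── labels
    ├── train
    │   └── 1.txt
    └── val
        └── 2.txt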
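labelme writes one JSON file per image. In a YOLO-pose label file, each line holds the class id, the normalized box center x/y and width/height, and then x, y, visibility for every keypoint in the order fixed by kpt_shape. The converter below is a minimal sketch under an assumed labelling convention: objects are drawn as rectangles and keypoints as points named after kpt_names. labelme_to_yolo_pose, convert_file and the example labels head/left_hand/right_hand are hypothetical. The sketch reads only the imageWidth, imageHeight and shapes fields of labelme's JSON output.

```python
import json


def labelme_to_yolo_pose(data, class_names, kpt_names):
    """Convert one parsed labelme annotation into YOLO-pose label lines.

    Hypothetical labelling convention (adjust to your own):
      * each object is a "rectangle" shape whose label is in class_names;
      * each keypoint is a "point" shape whose label is in kpt_names;
      * a keypoint belongs to the first rectangle that contains it.
    Unlabelled keypoints are written as 0 0 0, labelled ones get visibility 2.
    """
    img_w, img_h = data["imageWidth"], data["imageHeight"]
    boxes, points = [], []
    for shape in data["shapes"]:
        if shape["shape_type"] == "rectangle" and shape["label"] in class_names:
            (x1, y1), (x2, y2) = shape["points"]  # two opposite corners
            boxes.append((class_names.index(shape["label"]),
                          min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)))
        elif shape["shape_type"] == "point" and shape["label"] in kpt_names:
            px, py = shape["points"][0]
            points.append((shape["label"], px, py))

    # Assign every keypoint to the first box that contains it
    kpts_per_box = [{} for _ in boxes]
    for name, px, py in points:
        for i, (_, x1, y1, x2, y2) in enumerate(boxes):
            if x1 <= px <= x2 and y1 <= py <= y2:
                kpts_per_box[i].setdefault(name, (px / img_w, py / img_h, 2))
                break

    lines = []
    for (cls, x1, y1, x2, y2), kpts in zip(boxes, kpts_per_box):
        # class id, then box center x/y and width/height, all normalized to 0-1
        values = [cls, (x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h,
                  (x2 - x1) / img_w, (y2 - y1) / img_h]
        for name in kpt_names:  # keypoint order must match kpt_shape and flip_idx
            values.extend(kpts.get(name, (0.0, 0.0, 0)))
        # integers (class id, visibility) as-is, floats with 6 decimals
        lines.append(" ".join(str(v) if isinstance(v, int) else f"{v:.6f}" for v in values))
    return lines


def convert_file(json_path, txt_path, class_names, kpt_names):
    """Read one labelme JSON file and write the matching YOLO .txt label file."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    out = labelme_to_yolo_pose(data, class_names, kpt_names)
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write("\n".join(out) + ("\n" if out else ""))


example = {
    "imageWidth": 200, "imageHeight": 100,
    "shapes": [
        {"label": "person", "shape_type": "rectangle", "points": [[20, 10], [120, 90]]},
        {"label": "head", "shape_type": "point", "points": [[70, 20]]},
        {"label": "left_hand", "shape_type": "point", "points": [[30, 50]]},
    ],
}
lines = labelme_to_yolo_pose(example, ["person"], ["head", "left_hand", "right_hand"])
print(lines)
# ['0 0.350000 0.500000 0.500000 0.800000 0.350000 0.200000 2 0.150000 0.500000 2 0.000000 0.000000 0']
```

In the example, right_hand was never clicked, so it is written as 0 0 0. With 3 keypoints, the matching dataset YAML would use kpt_shape: [3, 3].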

III. Model Training

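1. Copy and modify the training script train.py

# Copy train.py to the project root directory
cp ./ultralytics/models/yolo/pose/train.py ./

# Edit train.py
vim ./train.py

# Append the following code at the bottom of train.py. model is the path to the downloaded pretrained weights;
# data is the path to the dataset config file (replace coco8-pose.yaml with your own dataset YAML)

if __name__ == "__main__":
    from ultralytics.models.yolo.pose import PoseTrainer

    args = dict(model="./weights/yolov8n-pose.pt", data="coco8-pose.yaml", epochs=3)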
    trainer = PoseTrainer(overrides=args)
    trainer.train()

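Commonly used training parameters:

epochs: number of training epochs

batch: number of images per batch (-1 for AutoBatch)

imgsz: input image size fed to the model

cache: whether to cache the training set in RAM (or on disk) to speed up data loading

device: hardware device to train on, e.g. cpu, 0 for the first GPU, or 0,1 for several GPUs

workers: number of worker threads for data loading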

Note: further parameters can be set in ./ultralytics/cfg/default.yaml, or passed as extra keys in the args dict.
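The batch setting interacts with nbs (nominal batch size, 64 in default.yaml). As far as I can tell from the Ultralytics BaseTrainer source, gradients are accumulated over max(round(nbs / batch), 1) batches before each optimizer step, and weight_decay is rescaled by batch * accumulate / nbs. The arithmetic is sketched below in plain Python. training_schedule is a hypothetical helper, not part of Ultralytics, and the exact trainer logic may differ between versions.

```python
import math


def training_schedule(n_images, batch=16, epochs=100, nbs=64, weight_decay=0.0005):
    """Rough arithmetic behind the batch/nbs settings (a sketch, not Ultralytics code)."""
    iters_per_epoch = math.ceil(n_images / batch)        # batches (forward/backward passes) per epoch
    accumulate = max(round(nbs / batch), 1)              # batches per optimizer step
    scaled_wd = weight_decay * batch * accumulate / nbs  # weight decay after rescaling
    return {
        "iters_per_epoch": iters_per_epoch,
        "total_iters": iters_per_epoch * epochs,
        "accumulate": accumulate,
        "weight_decay": scaled_wd,
    }


print(training_schedule(n_images=1000, batch=16, epochs=3))
# {'iters_per_epoch': 63, 'total_iters': 189, 'accumulate': 4, 'weight_decay': 0.0005}
```

In short, a smaller batch (for example on CPU) mostly changes how many batches are accumulated per optimizer step, not the effective step size. This is one reason lowering batch to fit memory usually does not require retuning lr0.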

2. Start training

python train.py

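IV. Testing the Self-Trained Model

After training finishes, the trained weights are saved in the output folder (<project>/<name>/weights/, which contains best.pt and last.pt). Point the model argument in predict.py to that file and run it on test images to check how well your model performs, for example: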

# predict.py
# Assume the trained weights were saved to ./result/1/best-pose.pt

if __name__ == '__main__':
    from ultralytics.models.yolo.pose import PosePredictor

    args = dict(model="./result/1/best-pose.pt", source="./test_imgs/g3.jpg")
    predictor = PosePredictor(overrides=args)
    predictor.predict_cli()
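When exist_ok is False, Ultralytics adds a numeric suffix to an output folder that already exists (for example train2, train3). To avoid tracking those names by hand, the newest best.pt can be located automatically and passed as model. The helper below is a hypothetical sketch that assumes the <project>/<name>/weights/best.pt layout. If the weights were moved or renamed, as with ./result/1/best-pose.pt above, pass that path directly instead.

```python
from pathlib import Path


def find_latest_best(root="."):
    """Return the most recently modified */weights/best.pt under `root`, or None.

    Assumes the <project>/<name>/weights/best.pt layout written by training.
    """
    candidates = list(Path(root).glob("**/weights/best.pt"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)  # newest file wins


best = find_latest_best("CSDN")  # "CSDN" is the project name set in default.yaml above
print(best)
```

The result can then be used in predict.py as args = dict(model=str(best), source="./test_imgs/g3.jpg").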