Training YOLOv8 on COCO-WholeBody for Face Detection

AQUILIOS

Last modified 2024-03-11 09:37:03

Tags: YOLO

First published 2024-03-10 22:29:30

Permalink: https://blog.csdn.net/AQUILIOS/article/details/136609470

Spoiler: this post is mostly about how I generate suitable label files.

Contents

1 Background
- 1.1 Dataset format required by YOLOv8
- 1.2 The COCO-WholeBody format
2 Format conversion
3 Experiment
4 Results

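1 Background

The YOLOv8 I use is the Ultralytics implementation.
The COCO-WholeBody JSON annotation files can be downloaded from Google Drive (train / val).

1.1 Dataset format required by YOLOv8

Taking COCO as the example: downloading it through the Ultralytics site gives the layout below (I tidied it up a little, but this is roughly what you get after downloading and unzipping the archives).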

coco
├── annotations
├── coco
├── coco2017labels.zip
├── images
├── labels
├── README.txt
├── test-dev2017.txt
├── train2017.txt
└── val2017.txt

Of these, only images, labels, train2017.txt, and val2017.txt actually matter (test-dev2017.txt matters too if you want to use the test set).

coco
├── images
│   ├── test2017
│   ├── train2017
│   └── val2017
├── labels
│   ├── train2017
│   ├── train2017.cache
│   ├── val2017
│   └── val2017.cache
├── test-dev2017.txt
├── train2017.txt
└── val2017.txt

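The images directory contains three folders (train2017, val2017, test2017), each holding only JPG images.
The labels directory is organized the same way: each folder holds one txt label file per image. The .cache files are generated by YOLOv8 during training and can be ignored.
train2017.txt and val2017.txt store the relative paths of the images under images; training uses these two files to locate the images.
Take ./images/val2017/000000128699.jpg as an example: its label file is ./labels/val2017/000000128699.txt, and its relative path is listed in ./val2017.txt.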
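
The pairing between an image and its label file is purely path-based: as far as I understand the Ultralytics loader, it swaps the last /images/ directory in the image path for /labels/ and the extension for .txt. The sketch below mimics that convention; image_to_label_path is a hypothetical helper written for illustration, not a library function.

```python
import os


def image_to_label_path(image_path: str) -> str:
    """Map an image path to its label path (illustrative helper).

    Swaps the last '<sep>images<sep>' directory for '<sep>labels<sep>'
    and replaces the file extension with '.txt'.
    """
    images_dir = f"{os.sep}images{os.sep}"
    labels_dir = f"{os.sep}labels{os.sep}"
    head, sep, tail = image_path.rpartition(images_dir)
    if not sep:
        raise ValueError(f"no 'images' directory in {image_path!r}")
    stem, _ext = os.path.splitext(tail)
    return head + labels_dir + stem + ".txt"


example = os.path.join(".", "images", "val2017", "000000128699.jpg")
print(image_to_label_path(example))
```

This is why images and labels must sit side by side with identical subfolder and file names.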

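Contents of 000000128699.txt:
Each line describes one bounding box.
The first number is the class index; the next four are the box center x, box center y, box width, and box height.
These four values are normalized by the image width and height, so they all lie in [0, 1].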

0 0.553974 0.26226 0.403333 0.32584
0 0.108761 0.83748 0.105897 0.09888
0 0.452322 0.7319 0.121339 0.07872
0 0.695655 0.6635 0.1149 0.06324
0 0.607009 0.68055 0.118519 0.09518
36 0.426595 0.43405 0.212735 0.07442
0 0.0479202 0.86819 0.0958405 0.1103
0 0.379288 0.73733 0.0603989 0.0873
0 0.968675 0.59442 0.0626496 0.13864
0 0.791481 0.6273 0.134473 0.07496
0 0.734245 0.63832 0.0813675 0.0784
0 0.638917 0.65376 0.065698 0.04848
0 0.334501 0.76684 0.0702564 0.06752
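
To make the label format concrete, here is a minimal sketch that turns one label line back into pixel corner coordinates. The function name yolo_line_to_pixels and the 640x480 image size are assumptions for illustration only; the real size of the image must come from the image itself.

```python
def yolo_line_to_pixels(line, img_w, img_h):
    """Convert 'cls cx cy w h' (normalized) to (cls, (x1, y1, x2, y2)) in pixels."""
    cls, cx, cy, w, h = line.split()
    # Undo the normalization: x values scale with width, y values with height
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    # Center/size -> top-left and bottom-right corners
    return int(cls), (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Hypothetical 640x480 image with one box centered in the frame
print(yolo_line_to_pixels("0 0.5 0.5 0.25 0.5", 640, 480))
```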

1.2 The COCO-WholeBody format

Taking coco_wholebody_val_v1.0.json as the example, the JSON file has the following fields:

'images': [
{'license': 4,
 'file_name': '000000397133.jpg',
 'coco_url': 'http://images.cocodataset.org/val2017/000000397133.jpg',
 'height': 427,
 'width': 640,
 'date_captured': '2013-11-14 17:02:52',
 'flickr_url': 'http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg',
 'id': 397133},
 ...
]
'annotations': [
{
"segmentation": [],
"face_box": list([x, y, w, h]),
"lefthand_box": list([x, y, w, h]),
"righthand_box": list([x, y, w, h]),

"foot_kpts": list([x, y, v] * 6),
"face_kpts": list([x, y, v] * 68),
"lefthand_kpts": list([x, y, v] * 21),
"righthand_kpts": list([x, y, v] * 21),

"face_valid": bool,
"lefthand_valid": bool,
"righthand_valid": bool,
"foot_valid": bool,

"[cloned]": ...,
},
...
]

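Since I only care about face detection, the relevant fields are face_box, face_valid, and image_id in each annotation, and id, file_name, height, and width in each image entry. See the COCO-WholeBody documentation for the full format description.
face_box is (x, y, w, h): the top-left corner of the box followed by its width and height, in pixels (not normalized).

So the goal is to use the two JSON files (train and val) to generate the per-image label files under labels, plus the txt files listing the relative image paths.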
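
The conversion implied by the two formats is small enough to sketch on its own. The helper name coco_box_to_yolo and the example numbers are mine, chosen only to illustrate the arithmetic.

```python
def coco_box_to_yolo(box, img_w, img_h):
    """Top-left (x, y, w, h) in pixels -> normalized (cx, cy, w, h)."""
    left, top, w, h = box
    center_x = (left + w / 2) / img_w   # shift to the box center, scale by width
    center_y = (top + h / 2) / img_h    # shift to the box center, scale by height
    return [center_x, center_y, w / img_w, h / img_h]


# Hypothetical 40x60 box whose top-left corner is at (100, 50) in a 640x480 image
print(coco_box_to_yolo([100, 50, 40, 60], 640, 480))
```

Note that x and width are divided by the image width, while y and height are divided by the image height; mixing the two up is an easy mistake with non-square images.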

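2 Format conversion

First create a new directory. Mine looks like this.
Because I had already downloaded the COCO dataset, I simply symlinked the images.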

whole
├── images -> /home/marryyoo/yolov8/datasets/coco/images
├── labels
│   ├── train2017
│   └── val2017

2.1 Build a per-image dictionary with normalized boxes

# get_whole_json.py
import json

# Path to the COCO-WholeBody annotation file
whole_file = './coco_wholebody_train_v1.0.json'
# Use 'train' when processing the train annotations, otherwise 'val'
file_kind = 'train'

with open(whole_file, 'r') as f:
    content = json.load(f)

images = content['images']
print(f'there are {len(images)} images in total')
anns = content['annotations']
print(f'there are {len(anns)} annotations in total')

# image id -> file name, image size, and an (initially empty) list of face boxes
img_dict = {}
for img in images:
    img_dict[img['id']] = {'img_path': img['file_name'], 'height': img['height'], 'width': img['width'], 'face_box': []}

for ann in anns:
    # Keep only annotations whose face box is marked valid
    if ann['face_valid']:
        left, top, w, h = ann['face_box']
        info = img_dict[ann['image_id']]
        p_width, p_height = info['width'], info['height']
        # Top-left (x, y, w, h) in pixels -> normalized (center_x, center_y, w, h)
        center_x = (left + w / 2) / p_width
        center_y = (top + h / 2) / p_height
        norm_w = w / p_width
        norm_h = h / p_height
        norm_box = [center_x, center_y, norm_w, norm_h]
        info['face_box'].append(norm_box)


with open(f'whole_face_{file_kind}.json', 'w') as f:
    json.dump(img_dict, f)
print('dataset dictionary written')

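Running the script once for each split (train and val) produces whole_face_train.json and whole_face_val.json.
Each file is one big dictionary: the key is the image id (serialized as a string by json.dump) and the value is a small dictionary with the image file name, the image height and width, and the normalized face boxes.
The normalized face boxes are stored as a list; an empty list means the image has no valid face box.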

# whole_face_val.json
{
"397133": {"img_path": "000000397133.jpg", "height": 427, "width": 640, "face_box": []}, 
"37777": {"img_path": "000000037777.jpg", "height": 230, "width": 352, "face_box": []}, 
"252219": {"img_path": "000000252219.jpg", "height": 428, "width": 640, "face_box": [[0.1482578125, 0.4451051401869159, 0.023296874999999995, 0.037500000000000026], [0.8494843750000001, 0.454053738317757, 0.024406250000000008, 0.04058411214953272]]},
...
}
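
Before generating labels it can be worth checking how many images actually carry face boxes. A minimal sketch, assuming the dictionary layout shown above; summarize is a hypothetical helper, and the sample data is the three-entry excerpt from this post with the box values shortened.

```python
def summarize(img_dict):
    """Return (number of images, images with >=1 face, total face boxes)."""
    n_images = len(img_dict)
    n_with_faces = sum(1 for item in img_dict.values() if item["face_box"])
    n_boxes = sum(len(item["face_box"]) for item in img_dict.values())
    return n_images, n_with_faces, n_boxes


# Excerpt of whole_face_val.json (box values shortened for readability)
sample = {
    "397133": {"img_path": "000000397133.jpg", "height": 427, "width": 640, "face_box": []},
    "37777": {"img_path": "000000037777.jpg", "height": 230, "width": 352, "face_box": []},
    "252219": {"img_path": "000000252219.jpg", "height": 428, "width": 640,
               "face_box": [[0.148, 0.445, 0.023, 0.038], [0.849, 0.454, 0.024, 0.041]]},
}
print(summarize(sample))  # (3, 1, 2)
```

On the real files you would load the JSON with json.load and pass the result to the same function.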

2.2 Generate the label files from the dictionary

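# get_whole_txt.py
import json
import os

# Change these to your own paths
dict_file = './whole_face_train.json'
save_path = './whole/labels/train2017/'
os.makedirs(save_path, exist_ok=True)

# Load the dictionary
with open(dict_file, 'r') as f:
    img_dict = json.load(f)

# Write one label file per image that has at least one face box
for item in img_dict.values():
    if not item['face_box']:
        continue
    file_name = os.path.splitext(item['img_path'])[0]
    ab_fn = os.path.join(save_path, file_name + '.txt')
    try:
        with open(ab_fn, 'w') as f:
            for box in item['face_box']:
                bs = ' '.join(str(b) for b in box)
                # Class 0 (face) is the only class
                f.write(f'0 {bs}\n')
    except OSError as e:
        # Raising a plain string is a TypeError in Python 3; raise a real exception
        raise RuntimeError(f'writing {ab_fn} failed') from e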
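
A quick sanity check on the generated label lines can catch conversion mistakes before training. The sketch below is an assumption-laden helper of my own (label_line_problems is not part of Ultralytics); it only checks the field count, the class index, and that the box values are normalized.

```python
def label_line_problems(line, num_classes=1):
    """Return a list of problems found in one 'cls cx cy w h' label line."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        cls = int(parts[0])
        values = [float(p) for p in parts[1:]]
    except ValueError:
        return ["non-numeric field"]
    problems = []
    if not 0 <= cls < num_classes:
        problems.append(f"class {cls} outside [0, {num_classes})")
    if any(not 0.0 <= v <= 1.0 for v in values):
        problems.append("coordinate outside [0, 1]")
    if values[2] <= 0 or values[3] <= 0:
        problems.append("non-positive width or height")
    return problems


print(label_line_problems("0 0.1482578125 0.4451051401869159 0.023296874999999995 0.037500000000000026"))
print(label_line_problems("0 1.2 0.5 0.1 0.1"))
```

An empty list means the line passed every check.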

2.3 Generate the image-path txt file from the labels

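# get_whole_set_txt.py
import os

# Change this to your own path
root = './whole/labels/train2017/'
lis = []
for file in sorted(os.listdir(root)):
    name, ext = os.path.splitext(file)
    # Skip anything that is not a label file
    if ext != '.txt':
        continue
    path = './images/train2017/' + name + '.jpg' + '\n'
    lis.append(path)

# After this step, move the txt file into the directory that contains labels
with open('./train2017.txt', 'w') as f:
    f.writelines(lis)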
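
Because the list is built from label file names rather than from the images themselves, it does no harm to confirm that every listed image really exists. A minimal sketch; find_missing_images is a hypothetical helper, and the paths in the usage comment assume the whole directory layout used in this post.

```python
import os


def find_missing_images(list_file, dataset_root):
    """Return the entries of list_file that do not exist under dataset_root."""
    missing = []
    with open(list_file, "r") as f:
        for line in f:
            rel = line.strip()
            if not rel:
                continue  # ignore blank lines
            if not os.path.isfile(os.path.join(dataset_root, rel)):
                missing.append(rel)
    return missing


# Example usage (paths are assumptions based on the layout in this post):
# print(find_missing_images('./whole/train2017.txt', './whole'))
```

os.path.isfile follows symlinks, so this also works when images is a symlink to an existing COCO download.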

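After these three steps (run for both train and val), you get the file tree below.
The .cache files were generated automatically when I trained YOLOv8; ignore them.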

whole
├── images -> /home/marryyoo/yolov8/datasets/coco/images
├── labels
│   ├── train2017
│   ├── train2017.cache
│   ├── val2017
│   └── val2017.cache
├── train2017.txt
└── val2017.txt

3 Experiment

I won't go into detail on training a YOLOv8 detection task: prepare a YAML file and configure the paths.

# whole.yaml
# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: /home/marryyoo/whole  # dataset root dir
train: train2017.txt  # train images (relative to 'path') 
val: val2017.txt  # val images (relative to 'path') 
test: 

# Classes
names:
  0: face

Then write a Python script to start training:

# yolo_whole.py
from ultralytics import YOLO

# Load the model
model = YOLO('./yolov8n.pt')  # official pretrained weights
# model = YOLO('path/to/best.pt')  # custom weights
result = model.train(data='/home/marryyoo/yolov8/whole.yaml', epochs=50)


# Validate the model
# metrics = model.val()  # no arguments needed; the dataset and settings are remembered
# results = model.predict(source='/home/marryyoo/road.jpg')
# metrics.box.map    # map50-95
# metrics.box.map50  # map50
# metrics.box.map75  # map75
# metrics.box.maps   # list of map50-95 values, one per class
# total = sum(param.nelement() for param in model.model.parameters())
# print(total)
# print(total / 1e6)