YOLOv5 Instance Segmentation: YOLOv5 Segmentation Dataset Format

Article link: https://blog.csdn.net/qq_53545309/article/details/137467570

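This blog post explains in detail how to do instance segmentation with YOLOv5, covering dataset preparation, training and testing, detection, and evaluation metrics. It covers how to read the labels and how to convert between formats, in particular from COCO format to TXT label files, and describes the training script (train.py) and the detection script (detect.py). It also discusses the key performance metrics P, R, mAP@.5, and mAP@.5:.95, which help evaluate how well the model performs.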


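I. Preparation

Create your own environment with conda.

Install the dependencies listed in requirements.txt in the project directory.

Dataset preparation: coco128-seg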

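1.1 Explanation of the label data:

The first value on each line is the class index.

After it, every two numbers give one point, as x and y coordinates relative to the width and height of the whole image.

Each line represents one mask in the image.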
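
The three rules above can be sketched as a small parser. This is only an illustration: the function name parse_seg_label_line and the sample line are made up for this example and are not part of YOLOv5.

```python
def parse_seg_label_line(line, img_width, img_height):
    """Parse one YOLO segmentation label line into (class_index, pixel_points)."""
    tokens = line.split()
    class_index = int(tokens[0])  # the first value is the class index
    values = [float(t) for t in tokens[1:]]
    if len(values) < 6 or len(values) % 2 != 0:
        raise ValueError("expected at least 3 (x, y) pairs after the class index")
    # Coordinates are stored relative to the image size, so scale them back to pixels.
    points = [(x * img_width, y * img_height) for x, y in zip(values[0::2], values[1::2])]
    return class_index, points


# Hypothetical label line: class 0, a triangle given by three normalized points.
cls, pts = parse_seg_label_line("0 0.25 0.5 0.75 0.5 0.5 0.25", img_width=640, img_height=480)
print(cls, pts)  # 0 [(160.0, 240.0), (480.0, 240.0), (320.0, 120.0)]
```

Each (x, y) pair is multiplied by the image width and height respectively, which is the inverse of the normalization done when the labels are written.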

Visualizing the label data:

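1.2 Summary of dataset format conversion methods

Method 1:

Annotating the images yourself with labelme produces one small JSON file per image, so you end up with a JSON file for every image. Either put the images in one folder and all the annotation files in another, or keep the images and JSON files together in a single folder. The following code then converts the JSON files into TXT label files.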

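import glob
import json
import os

import cv2
import numpy as np

# Generate a YOLO TXT label file for each labelme JSON file, using the original image for its size.
# The output is saved under TXT_path, with the same base name as the JSON file and a .txt extension.
json_path = r"./labelme/train2014"  # folder with the original JSON label files
TXT_path = r"./labelme/TXT_file"  # folder where the TXT files are saved
image_path = r"./images/"  # folder with the original images
label_dict = {'mat': 0, 'class 2': 1, 'class 3': 2}  # class name -> class index
os.makedirs(TXT_path, exist_ok=True)

for json_file in glob.glob(os.path.join(json_path, "*.json")):
    with open(json_file, encoding="utf-8") as f:
        json_info = json.load(f)
    # imagePath is normally a string; a list (e.g. the result of str.rsplit) is tolerated too
    image_name = json_info["imagePath"]
    if isinstance(image_name, list):
        image_name = image_name[-1]
    # keep only the file name, whether the stored path uses "\" or "/"
    image_name = os.path.basename(image_name.replace("\\", "/"))
    img = cv2.imread(os.path.join(image_path, image_name))
    if img is None:
        print(f"Skipping {json_file}: cannot read image {image_name}")
        continue
    height, width = img.shape[:2]
    txt_file = os.path.join(TXT_path, os.path.basename(json_file).replace(".json", ".txt"))
    # "w" overwrites old output, so running the script again does not duplicate lines
    with open(txt_file, "w") as f:
        for shape in json_info["shapes"]:
            label_index = label_dict.get(shape["label"])
            if label_index is None:
                print(f"Skipping unknown label {shape['label']!r} in {json_file}")
                continue
            # keep the points as floats; casting to int32 would truncate sub-pixel coordinates
            norm_points = np.array(shape["points"], np.float64) / [width, height]
            coords = " ".join(f"{v:.6f}" for v in norm_points.reshape(-1))
            f.write(f"{label_index} {coords}\n")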

When the images and JSON files are in the same folder, the following code generates the corresponding TXT files in that same folder.

import json
import os

import cv2
import numpy as np

classes = ['square', 'triangle']  # change to your own classes

base_path = '../dataset/labelme_dataset'  # folder that holds both the JSON files and the images
# Take each sample once, from its JSON file; listing every file would repeat each name per extension
path_list = sorted(os.path.splitext(i)[0] for i in os.listdir(base_path) if i.endswith('.json'))
for path in path_list:
    image = cv2.imread(f'{base_path}/{path}.jpg')
    h, w, c = image.shape
    with open(f'{base_path}/{path}.json') as f:
        masks = json.load(f)['shapes']
    with open(f'{base_path}/{path}.txt', 'w') as f:
        for idx, mask_data in enumerate(masks):
            mask_label = mask_data['label']
            if '_' in mask_label:
                mask_label = mask_label.split('_')[0]  # e.g. "square_1" -> "square"
            # np.float was removed in NumPy 1.24; the builtin float is equivalent
            mask = np.array(mask_data['points'], dtype=float)
            mask[:, 0] /= w  # normalize x by the image width
            mask[:, 1] /= h  # normalize y by the image height
            mask = mask.reshape((-1))
            if idx != 0:
                f.write('\n')
            f.write(f'{classes.index(mask_label)} {" ".join(list(map(lambda x:f"{x:.6f}", mask)))}')

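Method 2:

If the dataset you downloaded is in COCO format, there is only one large JSON file plus the corresponding image files. In that case, first split the large JSON file into one small JSON file per image, then process them as in Method 1. The conversion code is as follows: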

import json
import os
from collections import defaultdict

def coco_to_labelme(coco_file, output_dir):
    with open(coco_file, 'r') as f:
        data = json.load(f)
    os.makedirs(output_dir, exist_ok=True)

    categories = {category['id']: category['name'] for category in data['categories']}
    # Group the annotations by image once, instead of rescanning the full list for every image
    annotations_by_image = defaultdict(list)
    for annotation in data['annotations']:
        annotations_by_image[annotation['image_id']].append(annotation)

    for image in data['images']:
        image_file = image['file_name']
        # file_name may contain a directory part (with "\" or "/"); keep only the file name
        image_file_1 = os.path.basename(image_file.replace('\\', '/'))

        labelme_data = {
            "version": "5.0.1",
            "flags": {},
            "shapes": [],
            "imagePath": image_file_1,
            "imageData": None,
            "imageHeight": image['height'],
            "imageWidth": image['width']
        }

        for annotation in annotations_by_image[image['id']]:
            category_name = categories[annotation['category_id']]
            segmentation = annotation['segmentation']
            # Crowd regions store an RLE dict instead of polygons; they cannot be written as polygons
            if not isinstance(segmentation, list):
                continue
            # An object may consist of several polygons; each one is written as its own shape
            for flat in segmentation:
                # Convert the flat [x1, y1, x2, y2, ...] list to [[x1, y1], [x2, y2], ...]
                polygon = [[flat[i], flat[i + 1]] for i in range(0, len(flat) - 1, 2)]
                labelme_data['shapes'].append({
                    "label": category_name,
                    "points": polygon,
                    "group_id": None,
                    "shape_type": "polygon",
                    "flags": {}
                })

        image_name = os.path.splitext(image_file_1)[0]
        labelme_output_file = os.path.join(output_dir, image_name + '.json')

        with open(labelme_output_file, 'w') as f:
            json.dump(labelme_data, f, indent=4)

        print(f"Converted {image_file} to {labelme_output_file}")

# Usage example
coco_file = r'annotations/instances_train2014.json'  # the original large COCO JSON file
output_dir = r'labelme/train2014'  # where the per-image JSON files are saved

coco_to_labelme(coco_file, output_dir)

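1.3 Validating the dataset

After the conversion we have a folder of original images and a folder of TXT label files for YOLO training. Before going further, check that the labels were converted correctly; if they were not, the training that follows is bound to go wrong. The validation code is as follows:

Visualizing a single image and its labels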

import cv2
import numpy as np

# Only a single image and its TXT label file are needed
pic_path = r"./images/2023060111212345_11.jpg"
txt_path = r"./labelme/TXT_file/2023060111212345_11.txt"

img = cv2.imread(pic_path)
img0 = img.copy()
height, width, _ = img.shape

with open(txt_path) as file_handle:
    # split() without arguments ignores the trailing newline and repeated spaces; blank lines are skipped
    new_cnt_info = [line_str.split() for line_str in file_handle if line_str.strip()]

color_map = [(0, 255, 255), (255, 0, 255), (255, 255, 0)]
for new_info in new_cnt_info:
    s = []
    for i in range(1, len(new_info), 2):
        b = [float(tmp) for tmp in new_info[i:i + 2]]
        # scale the relative coordinates back to pixel coordinates
        s.append([int(b[0] * width), int(b[1] * height)])
    class_ = new_info[0]
    index = int(class_)
    # cycle through the colours so that more than three classes do not raise an IndexError
    cv2.polylines(img, [np.array(s, np.int32)], True, color_map[index % len(color_map)], thickness=3)

# Resizing to a fixed 800x416 is for display only and may change the aspect ratio
img = cv2.resize(img, (800, 416))
img0 = cv2.resize(img0, (800, 416))

cv2.imshow('ori', img0)
cv2.imshow('result', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Visualizing multiple images and labels in a folder

import glob
import os

import cv2
import numpy as np

# Only the image folder and the TXT label folder are needed
pic_path = r"./images/"
txt_path = r"./labelme/TXT_file/"

color_map = [(0, 255, 255), (255, 0, 255), (255, 255, 0)]
for pic_file in glob.glob(os.path.join(pic_path, "*.jpg")):
    img = cv2.imread(pic_file)
    # File name without directory or extension; works with both "/" and "\" separators
    num = os.path.splitext(os.path.basename(pic_file))[0]
    height, width, _ = img.shape
    txt_file = os.path.join(txt_path, num + ".txt")
    with open(txt_file) as file_handle:
        new_cnt_info = [line_str.split() for line_str in file_handle if line_str.strip()]
    for new_info in new_cnt_info:
        s = []
        for i in range(1, len(new_info), 2):
            b = [float(tmp) for tmp in new_info[i:i + 2]]
            s.append([int(b[0] * width), int(b[1] * height)])
        class_ = new_info[0]
        index = int(class_)
        # cycle through the colours so that more than three classes do not raise an IndexError
        cv2.polylines(img, [np.array(s, np.int32)], True, color_map[index % len(color_map)], thickness=3)

    save_path = 'labelme/all/' + num + '.jpg'
    # cv2.imwrite(save_path, img)  # uncomment to save the annotated image
    img = cv2.resize(img, (800, 416))
    cv2.imshow("{}".format(num), img)
    cv2.waitKey(0)  # press any key to move on to the next image
    cv2.destroyAllWindows()
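
Visual inspection catches wrong shapes, but simple format errors can also be caught automatically. The following is a minimal sketch of such a check, not something from the YOLOv5 code base; the function names check_label_line and check_label_dir are made up for this example, and the rules it applies (integer class index, an even number of coordinates, at least three points, values within [0, 1]) are the ones implied by the label format described in section 1.1.

```python
import glob
import os


def check_label_line(line):
    """Return a list of problems found in one label line (an empty list means OK)."""
    tokens = line.split()
    if not tokens:
        return ["empty line"]
    problems = []
    if not tokens[0].isdigit():
        problems.append("class index is not a non-negative integer")
    try:
        coords = [float(t) for t in tokens[1:]]
    except ValueError:
        return problems + ["non-numeric coordinate"]
    if len(coords) % 2 != 0:
        problems.append("odd number of coordinate values")
    if len(coords) < 6:
        problems.append("fewer than 3 points")
    if any(c < 0.0 or c > 1.0 for c in coords):
        problems.append("coordinate outside [0, 1]")
    return problems


def check_label_dir(txt_dir):
    """Check every .txt file in txt_dir; return {(file name, line number): problems}."""
    report = {}
    for txt_file in sorted(glob.glob(os.path.join(txt_dir, "*.txt"))):
        with open(txt_file) as f:
            for line_no, line in enumerate(f, start=1):
                if line.strip():
                    problems = check_label_line(line)
                    if problems:
                        report[(os.path.basename(txt_file), line_no)] = problems
    return report


print(check_label_line("0 0.1 0.2 0.3 0.4 0.5 0.6"))  # a well-formed line: []
print(check_label_line("None 0.1 0.2 1.3 0.4"))       # three problems are reported
```

A line starting with "None" is what a converter writes when a label name is missing from its class dictionary, so this check is a quick way to spot that case across a whole folder, for example with check_label_dir("labelme/TXT_file").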

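1.4 Splitting the dataset

After the steps above, the converted dataset is correct, but it still cannot be used directly for training: it first has to be split into training, validation, and test sets. The code below does this (you only need to specify the locations of the original images and the TXT labels, plus the target location to save to):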

import os
import random
import shutil

TXT_path = 'labelme/TXT_file'  # original TXT files
Image_path = 'images'  # original image files
dataset_path = 'dataset/custom_dataset'  # target location to save to
val_size, test_size = 0.1, 0.2  # the remaining 0.7 is used for training

# Create images/{train,val,test} and labels/{train,val,test} under dataset_path
for split in ('train', 'val', 'test'):
    os.makedirs(f'{dataset_path}/images/{split}', exist_ok=True)
    os.makedirs(f'{dataset_path}/labels/{split}', exist_ok=True)

# splitext keeps file names that contain dots intact; endswith avoids matching "txt" inside a name
path_list = [os.path.splitext(i)[0] for i in os.listdir(TXT_path) if i.endswith('.txt')]
random.shuffle(path_list)
train_end = int(len(path_list) * (1 - val_size - test_size))
val_end = int(len(path_list) * (1 - test_size))
splits = {
    'train': path_list[:train_end],
    'val': path_list[train_end:val_end],
    'test': path_list[val_end:],
}

for split, ids in splits.items():
    for i in ids:
        shutil.copy(f'{Image_path}/{i}.jpg', f'{dataset_path}/images/{split}/{i}.jpg')
        shutil.copy(f'{TXT_path}/{i}.txt', f'{dataset_path}/labels/{split}/{i}.txt')
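
YOLOv5 reads a split like this through a dataset YAML file, which is what the data argument of the training script points to. Below is a minimal sketch that builds the text of such a file for the directory layout created above. The helper name build_dataset_yaml, the class list, and the output file name are assumptions for illustration; the keys (path, train, val, test, names) follow the layout of the dataset YAML files shipped with YOLOv5, such as the one for coco128-seg.

```python
def build_dataset_yaml(dataset_path, class_names):
    """Return the text of a YOLOv5-style dataset YAML for an images/{train,val,test} layout."""
    lines = [
        f"path: {dataset_path}  # dataset root directory",
        "train: images/train  # training images, relative to path",
        "val: images/val  # validation images, relative to path",
        "test: images/test  # test images, relative to path (optional)",
        "",
        "names:",
    ]
    # The order must match the class indices written into the TXT labels.
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# Hypothetical class list, reusing the class names from the Method 1 example.
yaml_text = build_dataset_yaml("dataset/custom_dataset", ["mat", "class 2", "class 3"])
print(yaml_text)
# To save it (the file name is an assumption):
# with open("custom-seg.yaml", "w") as f:
#     f.write(yaml_text)
```

Only the image folders are listed: YOLOv5 locates each label file by replacing images with labels in the image path, which is why the split script keeps the two trees parallel. Using an absolute path for path avoids any ambiguity about where a relative path is resolved from.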

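II. Training, Testing, and Detection

2.1 train.py

weights is the pretrained model; you can use your own or download an official one.

data is the dataset configuration; set it to the name of the YAML file you created for your dataset.

hyp is the hyperparameter file, which includes the data augmentation settings; edit it as needed.

epochs is the number of training epochs. During training, the best and the last model weights are saved automatically.

batch-size is the number of images in one training batch; increase it if your hardware allows.

imgsz is the size to which images are resized for training.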
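
The parameters above can also be passed on the command line instead of editing the defaults in the script. Below is a minimal sketch that assembles such a command. Everything concrete in it is an assumption rather than a setting from this article: the script path segment/train.py and the hyperparameter file follow the layout of the official YOLOv5 repository (v7.0), and the weights, YAML name, epochs, and batch size are placeholders.

```python
import shlex
import sys


def build_train_command(weights, data, hyp, epochs, batch_size, imgsz, script="segment/train.py"):
    """Assemble the argument list for a YOLOv5 segmentation training run."""
    return [
        sys.executable, script,
        "--weights", weights,             # pretrained model
        "--data", data,                   # dataset YAML
        "--hyp", hyp,                     # hyperparameters, including augmentation
        "--epochs", str(epochs),          # number of training epochs
        "--batch-size", str(batch_size),  # images per batch
        "--imgsz", str(imgsz),            # training image size
    ]


# All values below are placeholders, not settings taken from the article.
cmd = build_train_command(
    weights="yolov5s-seg.pt",
    data="custom-seg.yaml",
    hyp="data/hyps/hyp.scratch-low.yaml",
    epochs=100,
    batch_size=16,
    imgsz=640,
)
print(shlex.join(cmd))
# To launch it from the YOLOv5 repository root:
# import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list keeps each value a separate argument, so paths containing spaces do not need manual quoting.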

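2.2 detect.py

weights: set this to the weights you obtained from training; by default they are saved under runs/train-seg/exp/weights.

source is the folder of images to run prediction on.

data is set the same as for training.

imgsz: the default of 640 is recommended. A larger value may give finer results, but the detection boxes may end up not large enough.

conf-thres is the confidence threshold: a detection box is kept only if its confidence is above the threshold.

iou-thres is the IoU (intersection over union) threshold: during NMS (non-maximum suppression), a box whose IoU with a higher-scoring kept box exceeds the threshold is removed.

max-det is the maximum number of detection boxes in one image; for my own needs I set it to 1.

save-txt saves the segmentation results as TXT files in the YOLOv5 format; the TXT files can then be converted into whatever mask you need.

save-crop saves a cropped image of the region inside each detection box.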
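
To make the interplay of conf-thres, iou-thres, and max-det concrete, here is a minimal pure-Python sketch of greedy NMS. It is an illustration only and not YOLOv5's implementation (which works on tensors and treats classes separately); the function names and the sample boxes are made up, and the default values are simply typical ones.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def simple_nms(boxes, scores, conf_thres=0.25, iou_thres=0.45, max_det=1000):
    """Greedy NMS: return the indices of the boxes that are kept, best score first."""
    # 1) conf-thres: drop low-confidence candidates, then sort by score.
    candidates = [i for i, s in enumerate(scores) if s > conf_thres]
    order = sorted(candidates, key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # 3) max-det: stop once enough detections are kept.
        if len(keep) >= max_det:
            break
        # 2) iou-thres: drop a box that overlaps an already kept box too much.
        if all(box_iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300), (0, 0, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.1]
print(simple_nms(boxes, scores))             # [0, 2]
print(simple_nms(boxes, scores, max_det=1))  # [0]
```

In the example, box 1 is removed because its IoU with the higher-scoring box 0 is about 0.68, box 3 is removed by the confidence threshold, and max_det=1 keeps only the single best box.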

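Without any further processing, passing the original image through predict.py yields an image with the instance segmentation drawn on it. (In the official YOLOv5 repository, the segmentation inference script is segment/predict.py.)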
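
As noted for save-txt, the saved polygons can be turned back into masks. The following is a minimal pure-Python sketch of that conversion using scanline filling with the even-odd rule. The function names and the sample line are made up for illustration, and it assumes each line is a class index followed only by normalized x y pairs (no confidence value at the end). In practice a library routine such as cv2.fillPoly does the same job much faster.

```python
def polygon_to_mask(norm_points, width, height):
    """Rasterize a normalized polygon into a height x width mask (lists of 0/1)."""
    pts = [(x * width, y * height) for x, y in norm_points]
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        yc = row + 0.5  # test against the pixel centre
        # x positions where the polygon edges cross this scanline
        crossings = []
        for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
            if (y1 <= yc) != (y2 <= yc):
                crossings.append(x1 + (yc - y1) * (x2 - x1) / (y2 - y1))
        crossings.sort()
        # even-odd rule: pixels between crossings 1-2, 3-4, ... are inside
        for x_in, x_out in zip(crossings[0::2], crossings[1::2]):
            for col in range(width):
                if x_in <= col + 0.5 < x_out:
                    mask[row][col] = 1
    return mask


def txt_line_to_mask(line, width, height):
    """Convert one 'class x1 y1 x2 y2 ...' line into (class_index, mask)."""
    tokens = line.split()
    values = [float(t) for t in tokens[1:]]
    points = list(zip(values[0::2], values[1::2]))
    return int(tokens[0]), polygon_to_mask(points, width, height)


# Hypothetical line: class 0, a square covering the central quarter of an 8x8 image.
cls, mask = txt_line_to_mask("0 0.25 0.25 0.75 0.25 0.75 0.75 0.25 0.75", width=8, height=8)
for mask_row in mask:
    print("".join(str(v) for v in mask_row))
print(sum(map(sum, mask)))  # 16 pixels are set
```

The mask is produced at whatever width and height you pass in, so use the size of the original image to get a mask that lines up with it.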

Visualization (show the results in a window by default):

parser.add_argument('--view-img', action='store_true', default=True, help='show results')

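III. Evaluation Metrics

Look mainly at the mask P, R, mAP@.5, and mAP@.5:.95.

P (Precision): the fraction of predicted instances that are correct, TP / (TP + FP). For the mask metrics, a prediction counts as correct when its class matches and the IoU between its mask and a ground-truth mask reaches the IoU threshold. In other words, P measures how reliable the predicted masks are.
R (Recall): the fraction of ground-truth instances that are correctly predicted, TP / (TP + FN). In other words, R measures how many of the real objects the model finds.
mAP@.5: the mean average precision at an IoU threshold of 0.5. AP is the area under the P-R curve for one class, and mAP is the mean of AP over all classes, so it gives an overall assessment of the model's performance.
mAP@.5:.95: the mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05. Compared with mAP@.5, it evaluates the model more thoroughly across different IoU thresholds and rewards masks that satisfy stricter segmentation requirements.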
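
To make the role of the IoU threshold concrete, here is a minimal pure-Python sketch that computes the IoU of two small binary masks, lists the thresholds of the 0.5 to 0.95 range at which the prediction would count as a match, and computes P and R from counts. The function names, the tiny masks, and the TP/FP/FN counts are made up for illustration; this is not YOLOv5's validation code, and computing a full mAP additionally requires building the P-R curve over confidence scores.

```python
def mask_iou(mask_a, mask_b):
    """IoU of two binary masks given as equally sized lists of rows of 0/1."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 0.0


def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN); 0.0 when a denominator is zero."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return p, r


# The ten IoU thresholds used by mAP@.5:.95: 0.50, 0.55, ..., 0.95.
iou_thresholds = [(50 + 5 * i) / 100 for i in range(10)]

gt_mask = [[1, 1, 1, 1]]    # hypothetical ground-truth mask (4 pixels)
pred_mask = [[1, 1, 1, 0]]  # hypothetical prediction (3 of them found)
iou = mask_iou(gt_mask, pred_mask)
matched_at = [t for t in iou_thresholds if iou >= t]
print(iou)              # 0.75
print(len(matched_at))  # 6: a match at 0.50 ... 0.75, a miss at 0.80 ... 0.95

# Hypothetical counts: 8 correct predictions, 2 false positives, 4 missed objects.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(p, r)
```

The same prediction is a true positive at the looser thresholds and a false positive at the stricter ones, which is why mAP@.5:.95 is always at most mAP@.5 and is more sensitive to how precisely the mask follows the object boundary.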