Building a Custom Dataset from COCO and Other Public Datasets to Train YOLOv5 and Other Object Detection Models

Neo_YH

Last modified: 2022-07-18 10:42:38

Tags: object detection, deep learning, computer vision, artificial intelligence, python

First published: 2022-06-15 16:43:14

Article link: https://blog.csdn.net/Mirage_nd/article/details/125299276

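Motivation

Every time I train on a custom dataset I end up writing a small one-off script, so I decided to organize them into a proper tool. This script extracts the targets you want from public datasets. For example, for person detection you can pull the person class out of COCO and other datasets that provide txt-format labels, then merge the results into one larger person dataset for single-class detection training. The same approach also works for training on any combination of several labels.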

Downloading the COCO labels in txt format

Link: https://github.com/ultralytics/yolov5/releases/download/v1.0/coco2017labels.zip
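
For reference, the usual YOLO txt convention stores one object per line as `class x_center y_center width height`, with the four box values normalized by the image size. The following is a minimal sketch of reading one such line and converting it to pixel corner coordinates. The function name `yolo_line_to_pixels` and the sample line are hypothetical and not taken from the dataset.

```python
def yolo_line_to_pixels(line, img_w, img_h):
    """Convert one YOLO txt label line to (class_id, x1, y1, x2, y2) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    # The four box values are fractions of the image width and height.
    xc, yc, w, h = (float(v) for v in parts[1:5])
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return class_id, x1, y1, x2, y2


# Hypothetical label line for a 640x480 image.
print(yolo_line_to_pixels("0 0.5 0.5 0.25 0.5", 640, 480))
# prints (0, 240.0, 120.0, 400.0, 360.0)
```

Because the first field is the class index, filtering a dataset by class only requires checking that field, which is what the script in the next section does.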

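Python script

If tqdm is installed, progress is shown as a progress bar; otherwise it is printed as a percentage.
Change the paths as indicated in the comments and organize the folder structure accordingly, and the script is ready to use.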

import sys
import os
try:
    from tqdm import tqdm  # optional: only used to draw the progress bar
    module_found = True
except ImportError:
    module_found = False

def label_txt(dataset):
    if dataset == "train":
        folder_official = "D:/Data/2D/database/yolov5/coco/labels_default/" # change the path according to your original dataset such as coco2017
        folder_custom = "D:/Data/2D/database/yolov5/coco/labels/" # change the path according to the dataset you want to generate
        folder_type = "train2017"
    elif dataset == "val":
        folder_official = "D:/Data/2D/database/yolov5/coco/labels_default/" # change the path according to your original dataset such as coco2017
        folder_custom = "D:/Data/2D/database/yolov5/coco/labels/" # change the path according to the dataset you want to generate
        folder_type = "val2017"
    else:
        raise ValueError(f"unknown dataset split: {dataset}")

    src_dir = os.path.join(folder_official, folder_type)
    dst_dir = os.path.join(folder_custom, folder_type)
    os.makedirs(dst_dir, exist_ok=True)  # make sure the output folder exists

    files = os.listdir(src_dir)
    cnt_obj = 0
    all_files = len(files)
    cnt_files = 0

    if module_found:
        process_bar = tqdm(total=all_files)
        process_bar.set_description('Processing:')
    for file in files:
        custom_label = []
        if module_found:
            process_bar.update(1)
        else:
            cnt_files += 1
            progress = "%.2f" % (cnt_files / all_files * 100)
            sys.stdout.write('\r' + f"Progress:{progress}%")

        with open(os.path.join(src_dir, file), 'r') as f:
            # one list of fields per non-empty label line
            strs = [x.split() for x in f.read().strip().splitlines() if x.strip()]
        for single_line in strs:
            # "0" is the label of "person" in coco2017; change it to the label you want,
            # or extend the condition to keep several labels
            if single_line[0] == '0':
                custom_label.append(single_line)
                cnt_obj += 1
        # write a new label file only if the image contains at least one target
        if custom_label:
            with open(os.path.join(dst_dir, file), 'w') as fp:
                for line in custom_label:
                    fp.write(" ".join(line) + "\n")

    if module_found:
        process_bar.close()
    print(f'\nnumber of target:{cnt_obj}')

label_txt("train")
label_txt("val")

def whole_txt(dataset):
    # Run from inside the generated labels folder (the one containing train2017/ and val2017/):
    # the label folder and the output list below are both relative to the working directory.
    if dataset == "train":
        folder = "train2017"
    elif dataset == "val":
        folder = "val2017"
    else:
        raise ValueError(f"unknown dataset split: {dataset}")

    files = os.listdir(folder)
    # write one image path per line, one for every extracted label file
    with open(f'../{folder}_person.txt', 'w') as flist:
        for file in files:
            # swap only the extension; a plain replace("txt", "jpg") would also alter names containing "txt"
            line = f'./images/{folder}/' + os.path.splitext(file)[0] + '.jpg\n'
            flist.write(line)

whole_txt("train")
whole_txt("val")

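import shutil
def create_img_folder():
    img_folder = "D:/Data/2D/database/yolov5/coco/images/train2017/"  # source images
    new_img_folder = "D:/Data/2D/database/yolov5/coco_person/images/train2017/"  # destination
    folder = "train2017"  # extracted label folder, relative to the working directory
    files = os.listdir(folder)
    all_files = len(files)
    os.makedirs(new_img_folder, exist_ok=True)
    if module_found:
        process_bar = tqdm(total=all_files)
        process_bar.set_description('Processing:')
    for file in files:
        if module_found:  # the progress bar only exists when tqdm is installed
            process_bar.update(1)
        # copy the image that matches each extracted label file
        img = img_folder + os.path.splitext(file)[0] + ".jpg"
        shutil.copy(img, new_img_folder)
    if module_found:
        process_bar.close()

# Optional: uncomment to copy the matching images into a separate dataset folder
# create_img_folder()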
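
The script above keeps a single class. For the multi-label case mentioned in the motivation, one possible variation is sketched below, under the assumption that the kept classes should be renumbered to consecutive indices starting at 0, which is what YOLO-style training normally expects. The function `filter_labels` and its `class_map` parameter are hypothetical and are not part of the original script.

```python
import os


def filter_labels(src_dir, dst_dir, class_map):
    """Copy label files from src_dir to dst_dir, keeping only classes in class_map.

    class_map maps original class ids (as strings) to new ids, e.g. {"0": 0, "2": 1}.
    Files without any matching object are skipped. Returns the number of kept objects.
    """
    os.makedirs(dst_dir, exist_ok=True)
    kept = 0
    for name in os.listdir(src_dir):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(src_dir, name), "r") as f:
            rows = [line.split() for line in f.read().strip().splitlines()]
        # Keep matching rows and rewrite the class id (first field) to its new index.
        selected = [[str(class_map[r[0]])] + r[1:] for r in rows if r and r[0] in class_map]
        if selected:
            with open(os.path.join(dst_dir, name), "w") as f:
                f.write("\n".join(" ".join(r) for r in selected) + "\n")
            kept += len(selected)
    return kept
```

For example, `filter_labels("labels_default/train2017", "labels/train2017", {"0": 0, "2": 1})` would keep classes 0 and 2 and renumber them to 0 and 1. The folder names here are placeholders, and you should check the class indices against the class list of your own dataset.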

Short tutorial on Bilibili

A demonstration of training YOLOv5 with this custom dataset:
https://www.bilibili.com/video/BV15r4y1u7T3?spm_id_from=333.999.0.0&vd_source=addcaa01f75ba41a03ef7559c37453ab
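
To point YOLOv5 at the generated files, training is normally configured through a small dataset YAML. The sketch below writes one from Python. It assumes the `path`, `train`, `val`, `nc`, `names` keys used by YOLOv5 dataset files around the time of this post (newer releases may differ), and the function name `write_dataset_yaml` and the output file name are hypothetical.

```python
def write_dataset_yaml(yaml_path, dataset_root, names):
    """Write a minimal YOLOv5-style dataset YAML for the extracted labels."""
    lines = [
        f"path: {dataset_root}  # dataset root (assumed location)",
        "train: train2017_person.txt  # image list written by whole_txt('train')",
        "val: val2017_person.txt  # image list written by whole_txt('val')",
        f"nc: {len(names)}  # number of classes",
        f"names: {names}  # class names, in label-index order",
    ]
    with open(yaml_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

A call such as `write_dataset_yaml("coco_person.yaml", "D:/Data/2D/database/yolov5/coco", ["person"])` would match the paths used in the script; the resulting file can then be passed to YOLOv5 with `python train.py --data coco_person.yaml`.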

GitHub

You can organize your folders following the structure of this repository.
GitHub link: https://github.com/Neo-YH/EasyToolforYourCustomDataset
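
As a rough illustration, the folder layout implied by the paths in the script can be created with a few lines of Python. This sketch is inferred from those paths only and has not been checked against the repository; the function name `make_skeleton` is hypothetical.

```python
import os


def make_skeleton(root):
    """Create the folder layout that the script's paths assume under `root`."""
    created = []
    # images: source pictures; labels_default: original labels; labels: filtered output
    for top in ("images", "labels_default", "labels"):
        for split in ("train2017", "val2017"):
            path = os.path.join(root, top, split)
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created
```

After calling `make_skeleton` on your dataset root, unzip the downloaded labels into `labels_default` and place the COCO images under `images` before running the script.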
