A while ago I used RFBNet for an object detection task. Although I could modify its data-preparation code and get training to run, testing turned out to be much harder, so I simply converted my entire dataset to the COCO format before training and testing. Below is my understanding of the COCO format and an introduction to converting your own dataset to it, focusing on how to convert bounding-box and keypoint labels.
The MS COCO dataset is a large-scale image dataset built by Microsoft that covers object detection, pixel-level segmentation, image captioning, and other tasks.
Official COCO website: Common Objects in Context
The annotations are stored in JSON format. There are five annotation types: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning, and all five share the basic data structure below.
The JSON file holds a single top-level dictionary, which in turn contains lower-level dictionaries; the shared part is detailed below:
{
"info" : info,
"images" : [image],
"annotations" : [annotation],
"licenses" : [license],
}
Each part is described in detail below:
The info dictionary:
info{
"year" : int, # year
"version" : str, # version
"description" : str, # description of the dataset
"contributor" : str, # contributor / author
"url" : str, # dataset URL
"date_created" : datetime, # creation date
}
The image dictionary:
image{
"id" : int, # image id, may start from 0
"width" : int, # image width
"height" : int, # image height
"file_name" : str, # file name
"license" : int, # id of the license the image is released under
"flickr_url" : str, # flickr URL of the image
"coco_url" : str, # COCO URL of the image
"date_captured" : datetime, # date the image was captured
}
The license dictionary:
license{
"id" : int, # license id
"name" : str, # license name
"url" : str, # license URL
}
The annotation and categories dictionaries, on the other hand, are organized differently depending on the task.
For object detection and stuff segmentation, the annotation and categories dictionaries take the following form:
annotation{
"id" : int, # annotation id
"image_id" : int, # image id
"category_id" : int, # category id
"segmentation" : RLE or [polygon], # segmentation data
"area" : float, # area of the annotated region
"bbox" : [x,y,width,height], # bounding box, with (x, y) the top-left corner
"iscrowd" : 0 or 1, # 1 if the annotation covers a crowd of objects (with RLE segmentation), 0 for a single object; defaults to 0
}
categories[{
"id" : int, # category id
"name" : str, # category name
"supercategory" : str, # the parent category; e.g. pugs and Pomeranians both fall under the "dog" supercategory
}]
The annotation ids and image ids may be identical (as when each image has exactly one annotation).
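As a small illustration of the detection-style annotation above (a minimal sketch; the helper name make_detection_annotation is mine, not part of any library), converting a corner-style (x1, y1, x2, y2) box into a COCO annotation entry looks like:

```python
def make_detection_annotation(ann_id, image_id, category_id, x1, y1, x2, y2):
    """Build a COCO detection annotation from a corner-style box."""
    w, h = x2 - x1, y2 - y1          # COCO stores boxes as [x, y, width, height]
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": [[]],        # no mask available here
        "area": float(w * h),        # box area stands in for the mask area
        "bbox": [x1, y1, w, h],
        "iscrowd": 0,                # single object
    }

ann = make_detection_annotation(0, 0, 1, 10, 20, 110, 70)
print(ann["bbox"])   # [10, 20, 100, 50]
```

Note the corner-to-width/height conversion: forgetting it is a common source of shifted boxes when feeding custom labels to COCO-based tooling.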
For keypoint detection, the annotation and categories dictionaries take the following form:
annotation{
"keypoints" : [x1,y1,v1,...], # keypoint coordinates; each v flag marks a keypoint as 0 = not labeled, 1 = labeled but not visible, or 2 = labeled and visible
"num_keypoints" : int, # number of labeled keypoints
"[cloned]" : ..., # when several tasks are combined, the annotation fields of the other task formats (e.g. object detection) are copied in here, in any order
}
categories[{
"keypoints" : [str], # keypoint names; their number should match num_keypoints
"skeleton" : [edge], # keypoint connectivity, as a list of keypoint-index pairs
"[cloned]" : ..., # when several tasks are combined, the categories fields of the other task formats are copied in here, in any order
}]
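The keypoints fields above can be built mechanically from a list of (x, y) points (a minimal sketch; make_keypoints_fields is my own helper name). num_keypoints counts only points whose visibility flag is nonzero:

```python
def make_keypoints_fields(points, v=2):
    """Flatten (x, y) keypoints into COCO's [x1, y1, v1, x2, y2, v2, ...] form.

    v: visibility flag applied to every point (0 = not labeled,
       1 = labeled but not visible, 2 = labeled and visible).
    """
    keypoints = []
    for (x, y) in points:
        keypoints.extend([x, y, v])
    # num_keypoints counts labeled keypoints, i.e. those with v > 0.
    num_keypoints = sum(1 for i in range(2, len(keypoints), 3) if keypoints[i] > 0)
    return {"keypoints": keypoints, "num_keypoints": num_keypoints}

fields = make_keypoints_fields([(10, 10), (50, 10), (50, 40), (10, 40)])
print(fields["num_keypoints"])   # 4
```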
For panoptic segmentation, the annotation and categories dictionaries take the following form:
annotation{
"image_id" : int, # image id
"file_name" : str, # file name of the panoptic segmentation PNG
"segments_info" : [segment_info], # list of segment_info dictionaries
}
segment_info{
"id" : int, # segment id
"category_id" : int, # category id
"area" : int, # segment area in pixels
"bbox" : [x,y,width,height], # bounding box of the segment
"iscrowd" : 0 or 1, # 1 if the segment covers a crowd of objects, 0 for a single object
}
categories[{
"id" : int, # category id
"name" : str, # category name
"supercategory" : str, # parent category
"isthing" : 0 or 1, # 1 for a countable "thing" (object) category, 0 for "stuff"
"color" : [R,G,B], # pixel color used for this category in the segmentation map
}]
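As background on how the panoptic PNG relates to the segment ids above: each pixel's RGB color encodes a segment id as id = R + G*256 + B*256^2, the same convention used by the official panopticapi tools (the helper names below are mine):

```python
def id_to_rgb(segment_id):
    """Decode a panoptic segment id into the [R, G, B] pixel color
    stored in the panoptic PNG (id = R + G*256 + B*256**2)."""
    r = segment_id % 256
    g = (segment_id // 256) % 256
    b = segment_id // 256 ** 2
    return [r, g, b]

def rgb_to_id(color):
    """Inverse of id_to_rgb: recover the segment id from a pixel color."""
    r, g, b = color
    return r + g * 256 + b * 256 ** 2

print(id_to_rgb(300))          # [44, 1, 0]
print(rgb_to_id([44, 1, 0]))   # 300
```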
For image captioning, each image has at least 5 captions, and the annotation dictionary takes the following form:
annotation{
"id" : int, # caption id
"image_id" : int, # image id
"caption" : str, # the caption text
}
Note that in multi-task settings the dictionaries are combined: object detection and keypoint detection, for example, are sometimes trained together, in which case the corresponding fields must be merged. Also, the image_id in each annotation must equal the id of the corresponding image entry, so that every annotation is tied to the right image.
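That image_id/id consistency is easy to verify after a conversion (a minimal sketch; check_coco_consistency is my own helper, and the small dictionary is a stand-in for a real converted file):

```python
def check_coco_consistency(annotations):
    """Verify every annotation references an existing image and category."""
    image_ids = {img["id"] for img in annotations["images"]}
    category_ids = {cat["id"] for cat in annotations["categories"]}
    for ann in annotations["annotations"]:
        assert ann["image_id"] in image_ids, "dangling image_id: {}".format(ann["image_id"])
        assert ann["category_id"] in category_ids, "unknown category_id: {}".format(ann["category_id"])
    return True

coco = {
    "images": [{"id": 0, "file_name": "a.jpg", "width": 300, "height": 300}],
    "categories": [{"id": 1, "name": "lpr", "supercategory": "object"}],
    "annotations": [{"id": 0, "image_id": 0, "category_id": 1,
                     "bbox": [10, 20, 100, 50], "area": 5000, "iscrowd": 0}],
}
print(check_coco_consistency(coco))   # True
```

Running such a check before training saves a lot of debugging, since most COCO loaders fail with opaque errors on dangling ids.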
Below is the script I used to convert my own labels to COCO format for a license-plate detection task with the RFBNet detector. The original labels take the form:
file name  bounding box  keypoints  class
The script first defines info, licenses, and categories, then walks over the images, converts each one to the COCO format, and finally writes the resulting JSON file. It is provided for reference only; adapt it to your own needs.
# -*- coding: utf-8 -*-
'''
Dataset processing for an object detection project:
converts a custom dataset format into the COCO data format.
'''
import traceback
import argparse
import datetime
import json
import cv2
import os

__CLASS__ = ['__background__', 'lpr']  # class list; background must be at index 0.


def argparser():
    parser = argparse.ArgumentParser("define argument parser for pycococreator!")
    parser.add_argument("-r", "--root_path", default="/home/andy/workspace/ccpd_300x300", help="path of root directory")
    parser.add_argument("-p", "--phase_folder", default=["ccpd_base_coco"], help="datasets path of [train, val, test]")
    parser.add_argument("-po", "--have_points", default=True, help="if we have points, convert them as well")
    return parser.parse_args()


def MainProcessing(args):
    '''Main processing routine.'''
    annotations = {}  # annotations dictionary, which will be dumped to a JSON file.
    root_path = args.root_path
    phase_folder = args.phase_folder

    # coco annotations info.
    annotations["info"] = {
        "description": "customer dataset format convert to COCO format",
        "url": "http://cocodataset.org",
        "version": "1.0",
        "year": 2019,
        "contributor": "andy.wei",
        "date_created": "2019/01/24"
    }
    # coco annotations licenses.
    annotations["licenses"] = [{
        "url": "https://www.apache.org/licenses/LICENSE-2.0.html",
        "id": 1,
        "name": "Apache License 2.0"
    }]
    # coco annotations categories.
    annotations["categories"] = []
    for cls, clsname in enumerate(__CLASS__):
        if clsname == '__background__':
            continue
        annotations["categories"].append({
            "supercategory": "object",
            "id": cls,
            "name": clsname
        })
    for catdict in annotations["categories"]:
        if "lpr" == catdict["name"] and args.have_points:
            catdict["keypoints"] = ["top_left", "top_right", "bottom_right", "bottom_left"]
            catdict["skeleton"] = [[]]

    for phase in phase_folder:
        annotations["images"] = []
        annotations["annotations"] = []
        label_path = os.path.join(root_path, phase + ".txt")
        filename_mapping_path = os.path.join(root_path, phase + "_filename_mapping.txt")
        images_folder = os.path.join(root_path, phase)

        # Parse the CCPD-style file names into an intermediate label file:
        # <file name> <x1,y1,x2,y2> <8 keypoint coordinates> <class>
        fd = open(label_path, "w")
        for f in os.listdir(images_folder):
            infos = f.split("-")
            assert len(infos) == 7, "unexpected file name format: {}".format(f)
            pbs = [info for info in infos if info]
            bboxtemp = pbs[2].split("_")
            bbox = bboxtemp[0].split("&") + bboxtemp[1].split("&")
            pointstemp = pbs[3].split("_")
            points = (pointstemp[0].split("&") + pointstemp[1].split("&")
                      + pointstemp[2].split("&") + pointstemp[3].split("&"))
            bbox = [int(b) for b in bbox]
            points = [int(p) for p in points]
            line = (f + " " + str(bbox[0]) + "," + str(bbox[1]) + "," + str(bbox[2]) + "," + str(bbox[3])
                    + " " + str(points[4]) + "," + str(points[5]) + "," + str(points[6]) + "," + str(points[7])
                    + "," + str(points[0]) + "," + str(points[1]) + "," + str(points[2]) + "," + str(points[3])
                    + " " + "0")
            fd.write(line + "\n")
        fd.close()

        if os.path.isfile(label_path) and os.path.exists(images_folder):
            print("convert datasets {} to coco format!".format(phase))
            fd = open(label_path, "r")
            fd_w = open(filename_mapping_path, "w")
            step = 0
            for id, line in enumerate(fd.readlines()):
                if line:
                    label_info = line.split()
                    image_name = label_info[0]
                    bbox = [int(x) for x in label_info[1].split(",")]
                    cls = int(label_info[-1])
                    filename = os.path.join(images_folder, image_name)
                    img = cv2.imread(filename)
                    height, width, _ = img.shape
                    x1 = bbox[0]
                    y1 = bbox[1]
                    bw = bbox[2] - bbox[0]  # labels store x1,y1,x2,y2; COCO wants x,y,width,height.
                    bh = bbox[3] - bbox[1]
                    # coco annotations images.
                    file_name = 'COCO_' + phase + '_' + str(id).zfill(12) + '.jpg'
                    newfilename = os.path.join(images_folder, file_name)
                    os.rename(filename, newfilename)
                    fd_w.write(file_name + " " + image_name + "\n")
                    annotations["images"].append({
                        "license": 1,
                        "file_name": file_name,
                        "coco_url": "",
                        "height": height,
                        "width": width,
                        "date_captured": datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                        "flickr_url": "",
                        "id": id
                    })
                    # coco annotations annotations.
                    annotations["annotations"].append({
                        "id": id,
                        "image_id": id,
                        "category_id": cls + 1,
                        "segmentation": [[]],
                        "area": bw * bh,
                        "bbox": [x1, y1, bw, bh],
                        "iscrowd": 0,
                    })
                    if args.have_points:
                        v = 2  # every keypoint is labeled and visible.
                        anndict = annotations["annotations"][id]
                        if "lpr" == __CLASS__[anndict["category_id"]]:
                            points = [int(p) for p in label_info[2].split(",")]
                            anndict["keypoints"] = [points[0], points[1], v, points[2], points[3], v,
                                                    points[4], points[5], v, points[6], points[7], v]
                            anndict["num_keypoints"] = 4
                    step += 1
                    if step % 100 == 0:
                        print("processing {} ...".format(step))
            fd.close()
            fd_w.close()
        else:
            print("WARNING: file path incomplete, please check!")

        json_path = os.path.join(root_path, phase + ".json")
        with open(json_path, "w") as f:
            json.dump(annotations, f)


if __name__ == "__main__":
    print("beginning to convert customer format to coco format!")
    args = argparser()
    try:
        MainProcessing(args)
        print("successfully converted customer format to coco format")
    except Exception:
        traceback.print_exc()
On Linux the generated JSON file can be inspected with jq: install it with sudo apt-get install jq, then run cat xxx.json | jq to view it (you can also redirect the output into a text file, which is even more convenient). A simplified view of the resulting COCO format is shown below:
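If jq is not available, Python's standard json module can pretty-print the file just as well (a minimal sketch; the small stand-in dictionary below replaces loading the real phase .json file with json.load):

```python
import json

# A tiny stand-in dictionary; in practice, load the generated phase .json file
# with json.load(open(path)) instead.
coco = {"info": {"year": 2019}, "images": [], "annotations": []}

# indent=2 gives the same readable layout jq produces;
# ensure_ascii=False keeps any non-ASCII file names intact.
pretty = json.dumps(coco, indent=2, ensure_ascii=False)
print(pretty)
```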