A while ago I used RFBNet for an object detection task. Although I could modify its data-preparation code and get training to run, testing turned out to be much harder, so I simply converted my entire dataset to the COCO format before training and testing. Below is my understanding of the COCO format and an introduction to converting your own dataset to it, focusing on how to convert bounding-box and keypoint labels.
The MS COCO dataset is a large-scale image dataset built by Microsoft that covers object detection, pixel-level segmentation, image captioning, and other tasks.
Official COCO website: Common Objects in Context
The annotations are stored in JSON format. There are five annotation types: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning, and all five share the basic data structure below.
The JSON file holds a single top-level dictionary, which in turn contains lower-level dictionaries; the shared part is detailed below:
{
"info" : info,
"images" : [image],
"annotations" : [annotation],
"licenses" : [license],
}
Each part is described in detail below:
The info dictionary:
info{
"year" : int, # year
"version" : str, # version
"description" : str, # description of the dataset
"contributor" : str, # contributor / author
"url" : str, # dataset URL
"date_created" : datetime, # creation date
}
The image dictionary:
image{
"id" : int, # image id, may start from 0
"width" : int, # image width
"height" : int, # image height
"file_name" : str, # file name
"license" : int, # id of the license the image is released under
"flickr_url" : str, # flickr URL of the image
"coco_url" : str, # COCO URL of the image
"date_captured" : datetime, # date the image was captured
}
The license dictionary:
license{
"id" : int, # license id
"name" : str, # license name
"url" : str, # license URL
}
The annotation and categories dictionaries, on the other hand, are organized differently depending on the task.
For object detection and stuff segmentation, the annotation and categories dictionaries take the following form:
annotation{
"id" : int, # annotation id
"image_id" : int, # image id
"category_id" : int, # category id
"segmentation" : RLE or [polygon], # segmentation data
"area" : float, # area of the annotated region
"bbox" : [x,y,width,height], # bounding box, with (x, y) the top-left corner
"iscrowd" : 0 or 1, # 1 if the annotation covers a crowd of objects (with RLE segmentation), 0 for a single object; defaults to 0
}
categories[{
"id" : int, # category id
"name" : str, # category name
"supercategory" : str, # the parent category; e.g. pugs and Pomeranians both fall under the "dog" supercategory
}]
The annotation ids and image ids may be identical (as when each image has exactly one annotation).
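As a small illustration of the detection-style annotation above (a minimal sketch; the helper name make_detection_annotation is mine, not part of any library), converting a corner-style (x1, y1, x2, y2) box into a COCO annotation entry looks like:

```python
def make_detection_annotation(ann_id, image_id, category_id, x1, y1, x2, y2):
    """Build a COCO detection annotation from a corner-style box."""
    w, h = x2 - x1, y2 - y1          # COCO stores boxes as [x, y, width, height]
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": [[]],        # no mask available here
        "area": float(w * h),        # box area stands in for the mask area
        "bbox": [x1, y1, w, h],
        "iscrowd": 0,                # single object
    }

ann = make_detection_annotation(0, 0, 1, 10, 20, 110, 70)
print(ann["bbox"])   # [10, 20, 100, 50]
```

Note the corner-to-width/height conversion: forgetting it is a common source of shifted boxes when feeding custom labels to COCO-based tooling.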
For keypoint detection, the annotation and categories dictionaries take the following form:
annotation{
"keypoints" : [x1,y1,v1,...], # keypoint coordinates; each v flag marks a keypoint as 0 = not labeled, 1 = labeled but not visible, or 2 = labeled and visible
"num_keypoints" : int, # number of labeled keypoints
"[cloned]" : ..., # when several tasks are combined, the annotation fields of the other task formats (e.g. object detection) are copied in here, in any order
}
categories[{
"keypoints" : [str], # keypoint names; their number should match num_keypoints
"skeleton" : [edge], # keypoint connectivity, as a list of keypoint-index pairs
"[cloned]" : ..., # when several tasks are combined, the categories fields of the other task formats are copied in here, in any order
}]
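The keypoints fields above can be built mechanically from a list of (x, y) points (a minimal sketch; make_keypoints_fields is my own helper name). num_keypoints counts only points whose visibility flag is nonzero:

```python
def make_keypoints_fields(points, v=2):
    """Flatten (x, y) keypoints into COCO's [x1, y1, v1, x2, y2, v2, ...] form.

    v: visibility flag applied to every point (0 = not labeled,
       1 = labeled but not visible, 2 = labeled and visible).
    """
    keypoints = []
    for (x, y) in points:
        keypoints.extend([x, y, v])
    # num_keypoints counts labeled keypoints, i.e. those with v > 0.
    num_keypoints = sum(1 for i in range(2, len(keypoints), 3) if keypoints[i] > 0)
    return {"keypoints": keypoints, "num_keypoints": num_keypoints}

fields = make_keypoints_fields([(10, 10), (50, 10), (50, 40), (10, 40)])
print(fields["num_keypoints"])   # 4
```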
For panoptic segmentation, the annotation and categories dictionaries take the following form:
annotation{
"image_id" : int, # image id
"file_name" : str, # file name of the panoptic segmentation PNG
"segments_info" : [segment_info], # list of segment_info dictionaries
}
segment_info{
"id" : int, # segment id
"category_id" : int, # category id
"area" : int, # segment area in pixels
"bbox" : [x,y,width,height], # bounding box of the segment
"iscrowd" : 0 or 1, # 1 if the segment covers a crowd of objects, 0 for a single object
}
categories[{
"id" : int, # category id
"name" : str, # category name
"supercategory" : str, # parent category
"isthing" : 0 or 1, # 1 for a countable "thing" (object) category, 0 for "stuff"
"color" : [R,G,B], # pixel color used for this category in the segmentation map
}]
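As background on how the panoptic PNG relates to the segment ids above: each pixel's RGB color encodes a segment id as id = R + G*256 + B*256^2, the same convention used by the official panopticapi tools (the helper names below are mine):

```python
def id_to_rgb(segment_id):
    """Decode a panoptic segment id into the [R, G, B] pixel color
    stored in the panoptic PNG (id = R + G*256 + B*256**2)."""
    r = segment_id % 256
    g = (segment_id // 256) % 256
    b = segment_id // 256 ** 2
    return [r, g, b]

def rgb_to_id(color):
    """Inverse of id_to_rgb: recover the segment id from a pixel color."""
    r, g, b = color
    return r + g * 256 + b * 256 ** 2

print(id_to_rgb(300))          # [44, 1, 0]
print(rgb_to_id([44, 1, 0]))   # 300
```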
For image captioning, each image has at least 5 captions, and the annotation dictionary takes the following form:
annotation{
"id" : int, # caption id
"image_id" : int, # image id
"caption" : str, # the caption text
}
Note that in multi-task settings the dictionaries are combined: object detection and keypoint detection, for example, are sometimes trained together, in which case the corresponding fields must be merged. Also, the image_id in each annotation must equal the id of the corresponding image entry, so that every annotation is tied to the right image.
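That image_id/id consistency is easy to verify after a conversion (a minimal sketch; check_coco_consistency is my own helper, and the small dictionary is a stand-in for a real converted file):

```python
def check_coco_consistency(annotations):
    """Verify every annotation references an existing image and category."""
    image_ids = {img["id"] for img in annotations["images"]}
    category_ids = {cat["id"] for cat in annotations["categories"]}
    for ann in annotations["annotations"]:
        assert ann["image_id"] in image_ids, "dangling image_id: {}".format(ann["image_id"])
        assert ann["category_id"] in category_ids, "unknown category_id: {}".format(ann["category_id"])
    return True

coco = {
    "images": [{"id": 0, "file_name": "a.jpg", "width": 300, "height": 300}],
    "categories": [{"id": 1, "name": "lpr", "supercategory": "object"}],
    "annotations": [{"id": 0, "image_id": 0, "category_id": 1,
                     "bbox": [10, 20, 100, 50], "area": 5000, "iscrowd": 0}],
}
print(check_coco_consistency(coco))   # True
```

Running such a check before training saves a lot of debugging, since most COCO loaders fail with opaque errors on dangling ids.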
Below is the script I used to convert my own labels to COCO format for a license-plate detection task with the RFBNet detector. The original labels take the form:
file name  bounding box  keypoints  class
The script first defines info, licenses, and categories, then walks over the images, converts each one to the COCO format, and finally writes the resulting JSON file. It is provided for reference only; adapt it to your own needs.
# -*- coding: utf-8 -*-
'''
Dataset processing for an object detection project:
converts a custom dataset format into the COCO data format.
'''
import traceback
import argparse
import datetime
import json
import cv2
import os

__CLASS__ = ['__background__', 'lpr']  # class list; background must be at index 0.


def argparser():
    parser = argparse.ArgumentParser("define argument parser for pycococreator!")
    parser.add_argument("-r", "--root_path", default="/home/andy/workspace/ccpd_300x300", help="path of root directory")
    parser.add_argument("-p", "--phase_folder", default=["ccpd_base_coco"], help="datasets path of [train, val, test]")
    parser.add_argument("-po", "--have_points", default=True, help="if we have points, convert them as well")
    return parser.parse_args()


def MainProcessing(args):
    '''Main processing routine.'''
    annotations = {}  # annotations dictionary, which will be dumped to a JSON file.
    root_path = args.root_path
    phase_folder = args.phase_folder

    # coco annotations info.
    annotations["info"] = {
        "description": "customer dataset format convert to COCO format",
        "url": "http://cocodataset.org",
        "version": "1.0",
        "year": 2019,
        "contributor": "andy.wei",
        "date_created": "2019/01/24"
    }
    # coco annotations licenses.
    annotations["licenses"] = [{
        "url": "https://www.apache.org/licenses/LICENSE-2.0.html",
        "id": 1,
        "name": "Apache License 2.0"
    }]
    # coco annotations categories.
    annotations["categories"] = []
    for cls, clsname in enumerate(__CLASS__):
        if clsname == '__background__':
            continue
        annotations["categories"].append({
            "supercategory": "object",
            "id": cls,
            "name": clsname
        })
    for catdict in annotations["categories"]:
        if "lpr" == catdict["name"] and args.have_points:
            catdict["keypoints"] = ["top_left", "top_right", "bottom_right", "bottom_left"]
            catdict["skeleton"] = [[]]

    for phase in phase_folder:
        annotations["images"] = []
        annotations["annotations"] = []
        label_path = os.path.join(root_path, phase + ".txt")
        filename_mapping_path = os.path.join(root_path, phase + "_filename_mapping.txt")
        images_folder = os.path.join(root_path, phase)

        # Parse the CCPD-style file names into an intermediate label file:
        # <file name> <x1,y1,x2,y2> <8 keypoint coordinates> <class>
        fd = open(label_path, "w")
        for f in os.listdir(images_folder):
            infos = f.split("-")
            assert len(infos) == 7, "unexpected file name format: {}".format(f)
            pbs = [info for info in infos if info]
            bboxtemp = pbs[2].split("_")
            bbox = bboxtemp[0].split("&") + bboxtemp[1].split("&")
            pointstemp = pbs[3].split("_")
            points = (pointstemp[0].split("&") + pointstemp[1].split("&")
                      + pointstemp[2].split("&") + pointstemp[3].split("&"))
            bbox = [int(b) for b in bbox]
            points = [int(p) for p in points]
            line = (f + " " + str(bbox[0]) + "," + str(bbox[1]) + "," + str(bbox[2]) + "," + str(bbox[3])
                    + " " + str(points[4]) + "," + str(points[5]) + "," + str(points[6]) + "," + str(points[7])
                    + "," + str(points[0]) + "," + str(points[1]) + "," + str(points[2]) + "," + str(points[3])
                    + " " + "0")
            fd.write(line + "\n")
        fd.close()

        if os.path.isfile(label_path) and os.path.exists(images_folder):
            print("convert datasets {} to coco format!".format(phase))
            fd = open(label_path, "r")
            fd_w = open(filename_mapping_path, "w")
            step = 0
            for id, line in enumerate(fd.readlines()):
                if line:
                    label_info = line.split()
                    image_name = label_info[0]
                    bbox = [int(x) for x in label_info[1].split(",")]
                    cls = int(label_info[-1])
                    filename = os.path.join(images_folder, image_name)
                    img = cv2.imread(filename)
                    height, width, _ = img.shape
                    x1 = bbox[0]
                    y1 = bbox[1]
                    bw = bbox[2] - bbox[0]  # labels store x1,y1,x2,y2; COCO wants x,y,width,height.
                    bh = bbox[3] - bbox[1]
                    # coco annotations images.
                    file_name = 'COCO_' + phase + '_' + str(id).zfill(12) + '.jpg'
                    newfilename = os.path.join(images_folder, file_name)
                    os.rename(filename, newfilename)
                    fd_w.write(file_name + " " + image_name + "\n")
                    annotations["images"].append({
                        "license": 1,
                        "file_name": file_name,
                        "coco_url": "",
                        "height": height,
                        "width": width,
                        "date_captured": datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                        "flickr_url": "",
                        "id": id
                    })
                    # coco annotations annotations.
                    annotations["annotations"].append({
                        "id": id,
                        "image_id": id,
                        "category_id": cls + 1,
                        "segmentation": [[]],
                        "area": bw * bh,
                        "bbox": [x1, y1, bw, bh],
                        "iscrowd": 0,
                    })
                    if args.have_points:
                        v = 2  # every keypoint is labeled and visible.
                        anndict = annotations["annotations"][id]
                        if "lpr" == __CLASS__[anndict["category_id"]]:
                            points = [int(p) for p in label_info[2].split(",")]
                            anndict["keypoints"] = [points[0], points[1], v, points[2], points[3], v,
                                                    points[4], points[5], v, points[6], points[7], v]
                            anndict["num_keypoints"] = 4
                    step += 1
                    if step % 100 == 0:
                        print("processing {} ...".format(step))
            fd.close()
            fd_w.close()
        else:
            print("WARNING: file path incomplete, please check!")

        json_path = os.path.join(root_path, phase + ".json")
        with open(json_path, "w") as f:
            json.dump(annotations, f)


if __name__ == "__main__":
    print("beginning to convert customer format to coco format!")
    args = argparser()
    try:
        MainProcessing(args)
        print("successfully converted customer format to coco format")
    except Exception:
        traceback.print_exc()
On Linux the generated JSON file can be inspected with jq: install it with sudo apt-get install jq, then run cat xxx.json | jq to view it (you can also redirect the output into a text file, which is even more convenient). A simplified view of the resulting COCO format is shown below:
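If jq is not available, Python's standard json module can pretty-print the file just as well (a minimal sketch; the small stand-in dictionary below replaces loading the real phase .json file with json.load):

```python
import json

# A tiny stand-in dictionary; in practice, load the generated phase .json file
# with json.load(open(path)) instead.
coco = {"info": {"year": 2019}, "images": [], "annotations": []}

# indent=2 gives the same readable layout jq produces;
# ensure_ascii=False keeps any non-ASCII file names intact.
pretty = json.dumps(coco, indent=2, ensure_ascii=False)
print(pretty)
```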