How to Read the COCO Object Detection Dataset, with Python Utility Scripts

Author: 大局观选手周弈帆

Last modified: 2022-08-12 15:48:08

First published: 2022-08-12 15:43:05

Article link: https://blog.csdn.net/a119334/article/details/126304213

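COCO (Common Objects in Context) is a large-scale image dataset that provides labels for several tasks, including object detection, segmentation, and image captioning. Its annotation files are written in JSON, and it takes about ten minutes to get familiar with the format the first time you work with it.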

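This article gives a concise introduction to reading the COCO object detection dataset and provides a Python script you can call directly. The same approach also carries over to reading labels for the other tasks.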

Full code: https://github.com/SingleZombie/DL-Demos/tree/master/dldemos/MyYOLO/load_coco.py

Format overview

The official COCO website documents the annotation format. For object detection, it looks like this:

{
	"info": info, 
	"images": [image], 
	"annotations": [annotation], 
	"licenses": [license],
}

info{
	"year": int, 
	"version": str, 
	"description": str, 
	"contributor": str, 
	"url": str, 
	"date_created": datetime,
}

image{
	"id": int, 
	"width": int, 
	"height": int, 
	"file_name": str, 
	"license": int, 
	"flickr_url": str, 
	"coco_url": str, 
	"date_captured": datetime,
}

license{
	"id": int, 
	"name": str, 
	"url": str,
}

annotation{
	"id": int, 
	"image_id": int, 
	"category_id": int, 
	"segmentation": RLE or [polygon], 
	"area": float, 
	"bbox": [x,y,width,height], 
	"iscrowd": 0 or 1,
}

categories[{
	"id": int, 
	"name": str, 
	"supercategory": str,
}]

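Much of this information is not needed for detection. We mainly care about the image information and the bounding box information. Suppose the JSON file has been loaded into a variable named root; then root['images'] gives the list of image records. The main attributes of each record are: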

image{
	"id": int, 
	"width": int, 
	"height": int, 
	"file_name": str, 
}

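id uniquely identifies an image. We will use it later to bind each image to its labels. The other three attributes are ordinary image properties.

root['annotations'] gives the list of annotation records. Each record describes one object, and a single image may have several records. For object detection, we mainly care about the position, size, and category of each box, so the attributes we end up using are: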

annotation{
	"id": int, 
	"image_id": int, 
	"category_id": int, 
	"bbox": [x,y,width,height], 
}

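image_id corresponds to the id of an image record, so we can use this field to bind images and annotations together. category_id can be matched against the category information described below. bbox gives the position and size of each box: x and y are the coordinates of the top-left corner, followed by the box width and height.

Finally, we need the category name for each category id, which makes it easier to check whether detection results are correct. The list of categories is available as root['categories'], and each record has the following format: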
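Many detection codebases expect boxes in corner form rather than COCO's [x, y, width, height]. The conversion can be sketched as below; the helper name xywh_to_xyxy is made up for this illustration and is not part of the original script.

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO box [x, y, width, height] to corner form [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    # (x, y) is the top-left corner, so the bottom-right corner is (x + w, y + h).
    return [x, y, x + w, y + h]


# A toy box: top-left at (10, 20), 30 pixels wide, 40 pixels high.
print(xywh_to_xyxy([10, 20, 30, 40]))  # [10, 20, 40, 60]
```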

categories[{
	"id": int, 
	"name": str, 
	"supercategory": str,
}]

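Here, id corresponds to the category_id mentioned above. name is the specific category and supercategory is the broader group it belongs to. In most cases we only need name.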

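Loading script

Now that we understand the data format, we can write a script to read it.

First, a quick look at Python's json library. After import json, calling json.load(fp) reads JSON from an open file object fp. Here is an example; change the path to match your setup.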

import json

def print_json():
    with open('data/coco/annotations/instances_val2014.json') as fp:
        root = json.load(fp)

With the file loaded, we can print some of its contents to get familiar with the json API and to see what a COCO annotation file actually looks like.

import json

def print_json():
    with open('data/coco/annotations/instances_val2014.json') as fp:
        root = json.load(fp)
    print('info:')
    print(root['info'])
    print('categories:')
    print(root['categories'])
    print('Length of images:', len(root['images']))
    print(root['images'][0])
    print('Length of annotations:', len(root['annotations']))
    print(root['annotations'][0])

def main():
    print_json()

if __name__ == '__main__':
    main()

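The json library exposes JSON objects as Python dictionaries and JSON arrays as Python lists. In this code, root is the root node of the file, and root['info'] is the object stored in its info attribute. root['categories'], root['images'], and root['annotations'] are the lists of categories, images, and annotations respectively.

Running this script gives me output roughly like this: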
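To see this type mapping without downloading the dataset, here is a minimal sketch that parses a tiny hand-written JSON string. The string only imitates the shape of a COCO file and its values are invented for the illustration.

```python
import json

# A tiny stand-in for a COCO annotation file (illustrative values only).
text = '{"info": {"year": 2014}, "images": [{"id": 1, "file_name": "a.jpg"}]}'

# json.loads parses a string; json.load does the same for an open file object.
root = json.loads(text)

print(type(root))                      # <class 'dict'>
print(type(root["images"]))            # <class 'list'>
print(root["images"][0]["file_name"])  # a.jpg
```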

info:
{'description': 'COCO 2014 Dataset', 'url': 'http://cocodataset.org', 'version': '1.0', 'year': 2014, 'contributor': 'COCO Consortium', 'date_created': '2017/09/01'}
categories:
[{'supercategory': 'person', 'id': 1, 'name': 'person'} ...
Length of images: 40504
{'license': 3, 'file_name': 'COCO_val2014_000000391895.jpg', 'coco_url': 'http://images.cocodataset.org/val2014/COCO_val2014_000000391895.jpg', 'height': 360, 'width': 640, 'date_captured': '2013-11-14 11:18:45', 'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg', 'id': 391895}
Length of annotations: 291875
{'segmentation': [[239.97, 260.24, 222.04, 270.49, 199.84, 253.41, 213.5, 227.79, 259.62, 200.46, 274.13, 202.17, 277.55, 210.71, 249.37, 253.41, 237.41, 264.51, 242.54, 261.95, 228.87, 271.34]], 'area': 2765.1486500000005, 'iscrowd': 0, 'image_id': 558840, 'bbox': [199.84, 200.46, 77.71, 70.88], 'category_id': 58, 'id': 156}

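Next, we can write a function that reads image file names together with their bounding boxes. With this function we can build the dataset for an object detection project.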

def load_img_ann(ann_path='data/coco/annotations/instances_val2014.json'):
    """Return {img_id: {'name': file_name, 'anns': [[x, y, w, h, label], ...]}}."""
    with open(ann_path) as fp:
        root = json.load(fp)
    img_dict = {}
    for img_info in root['images']:
        img_dict[img_info['id']] = {'name': img_info['file_name'], 'anns': []}
    for ann_info in root['annotations']:
        img_dict[ann_info['image_id']]['anns'].append(
            ann_info['bbox'] + [ann_info['category_id']])

    return img_dict

This function builds a dictionary img_dict. Its keys are image ids and its values are attribute dictionaries of the form:

{
	'name': ...,
	'anns': [[x, y, w, h, label], ...]
}

The file name name comes from root['images'] and the annotations come from root['annotations']. We run two loops, pull out the relevant fields, and combine them into the dictionary.
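Because the first loop creates an entry for every image, any image without annotation records keeps an empty anns list. If a training pipeline cannot handle such images, one option is to filter them out afterwards. This is a minimal sketch; drop_unannotated is a hypothetical helper that is not in the original script, and the sample dictionary is invented.

```python
def drop_unannotated(img_dict):
    """Return a new dict that keeps only images with at least one box."""
    return {img_id: info for img_id, info in img_dict.items() if info["anns"]}


# Toy data in the same shape as the return value of load_img_ann.
sample = {
    1: {"name": "a.jpg", "anns": [[10, 20, 30, 40, 1]]},
    2: {"name": "b.jpg", "anns": []},
}
print(list(drop_unannotated(sample)))  # [1]
```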

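Note: the category ids in COCO 2014 are not contiguous. The largest category id is 90, but there are only 80 categories in total. In a real project you should add your own mapping between the labels 0-79 and the category ids.

In your own object detection project, just call load_img_ann(ann_path), then preprocess the boxes further as your detection algorithm requires.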
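One possible way to build that mapping is to sort the category ids and enumerate them. The sketch below assumes a list shaped like root['categories']; build_label_maps is a name chosen for this example, and the sample list is a toy with a gap in its ids.

```python
def build_label_maps(categories):
    """Map sorted category ids to contiguous labels 0..N-1, and back."""
    cat_ids = sorted(c["id"] for c in categories)
    cat_id_to_label = {cat_id: i for i, cat_id in enumerate(cat_ids)}
    label_to_cat_id = {i: cat_id for cat_id, i in cat_id_to_label.items()}
    return cat_id_to_label, label_to_cat_id


# Toy category list whose ids skip from 2 to 13, imitating the gaps in COCO.
sample_categories = [
    {"id": 1, "name": "person"},
    {"id": 2, "name": "bicycle"},
    {"id": 13, "name": "stop sign"},
]
cat_id_to_label, label_to_cat_id = build_label_maps(sample_categories)
print(cat_id_to_label)     # {1: 0, 2: 1, 13: 2}
print(label_to_cat_id[2])  # 13
```

With the real file, passing root['categories'] should give labels 0 to 79. The reverse map is what you need when converting model predictions back to COCO category ids.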
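As one example of such preprocessing, some detectors work with box centers normalized by the image size. The sketch below is only an illustration of that idea, not the preprocessing used in the author's project. to_normalized_cxcywh is a hypothetical helper, and the image width and height must be supplied by the caller (for instance from the width and height fields in root['images']), since load_img_ann does not return them.

```python
def to_normalized_cxcywh(ann, img_w, img_h):
    """Convert [x, y, w, h, label] in pixels to [cx, cy, w, h, label] in [0, 1]."""
    x, y, w, h, label = ann
    cx = (x + w / 2) / img_w  # box center, as a fraction of the image width
    cy = (y + h / 2) / img_h  # box center, as a fraction of the image height
    return [cx, cy, w / img_w, h / img_h, label]


# A toy 200x100 box with its top-left corner at (100, 50) in a 400x200 image.
print(to_normalized_cxcywh([100, 50, 200, 100, 1], 400, 200))
# [0.5, 0.5, 0.5, 0.5, 1]
```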

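Visual verification

My own project has a function draw_bbox for visualizing bounding boxes. To verify that the dataset loading function is correct, I also wrote a function that visualizes COCO labels. The whole script is as follows: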


import json
import os

def load_img_ann(ann_path='data/coco/annotations/instances_val2014.json'):
    """Return {img_id: {'name': file_name, 'anns': [[x, y, w, h, label], ...]}}."""
    with open(ann_path) as fp:
        root = json.load(fp)
    img_dict = {}
    for img_info in root['images']:
        img_dict[img_info['id']] = {'name': img_info['file_name'], 'anns': []}
    for ann_info in root['annotations']:
        img_dict[ann_info['image_id']]['anns'].append(
            ann_info['bbox'] + [ann_info['category_id']])

    return img_dict


def show_img_ann(img_info):
    from PIL import Image
    from dldemos.nms.show_bbox import draw_bbox
    print(img_info)

    with open('data/coco/annotations/instances_val2014.json') as fp:
        root = json.load(fp)
    categories = root['categories']
    category_dict = {int(c['id']): c['name'] for c in categories}

    img_path = os.path.join('data/coco/val2014', img_info['name'])
    img = Image.open(img_path)
    for ann in img_info['anns']:
        x, y, w, h = ann[0:4]
        x1, y1, x2, y2 = x, y, x + w, y + h
        draw_bbox(img, (x1, y1, x2, y2), 1.0, text=category_dict[ann[4]])

    # Make sure the output directory exists before saving.
    os.makedirs('work_dirs', exist_ok=True)
    img.save('work_dirs/tmp.jpg')


def main():
    img_dict = load_img_ann()
    keys = list(img_dict.keys())
    show_img_ann(img_dict[keys[1]])


if __name__ == '__main__':
    main()