Parsing the COCO keypoint JSON file

言希0127

COCO is an image recognition dataset provided by Microsoft. Its annotations cover three tasks: object instances, object keypoints, and image captions. All of them are stored as JSON.

The basic data structure is defined as follows:

{
"info" : info, "images" : [image], "annotations" : [annotation], "licenses" : [license],
}
 
info{
"year" : int, "version" : str, "description" : str, "contributor" : str, "url" : str, "date_created" : datetime,
}
 
image{
"id" : int, "width" : int, "height" : int, "file_name" : str, "license" : int, "flickr_url" : str, "coco_url" : str, "date_captured" : datetime,
}
 
license{
"id" : int, "name" : str, "url" : str,
}

Here images, annotations, and licenses are arrays that each contain multiple instances.
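As a quick way to see this layout in practice, the following minimal sketch counts the entries in each top-level array. The inline `sample_text` is a hand-made stand-in with hypothetical values, and `summarize_coco` is a helper name introduced only for this illustration. For a real annotation file you would read the text from disk and pass it through `json.load` instead.

```python
import json


def summarize_coco(dataset: dict) -> dict:
    """Return the number of entries in each top-level array of a COCO-style dict."""
    return {
        key: len(value)
        for key, value in dataset.items()
        if isinstance(value, list)
    }


# Hypothetical miniature annotation file, used only to keep the sketch self-contained.
sample_text = """
{
  "info": {"year": 2014, "version": "1.0"},
  "images": [{"id": 1, "width": 640, "height": 360, "file_name": "a.jpg"}],
  "annotations": [{"id": 10, "image_id": 1}, {"id": 11, "image_id": 1}],
  "licenses": [{"id": 1, "name": "example", "url": ""}]
}
"""

dataset = json.loads(sample_text)
summary = summarize_coco(dataset)
print(summary)
```

The info entry does not appear in the summary because it is a single object rather than an array.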

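The three tasks use similar formats, so the walkthrough here is based on a sample from the object keypoints task. Only one entry of each array is shown.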

{
"info":{
    "description":"This is stable 1.0 version of the 2014 MS COCO dataset.",
    "url":"http:\/\/mscoco.org",
    "version":"1.0","year":2014,
    "contributor":"Microsoft COCO group",
    "date_created":"2015-01-27 09:11:52.357475"
},
"images":[{
    "license":3,
    "file_name":"COCO_val2014_000000391895.jpg",
    "coco_url":"http:\/\/mscoco.org\/images\/391895",
    "height":360,"width":640,"date_captured":"2013-11-14 11:18:45",
    "flickr_url":"http:\/\/farm9.staticflickr.com\/8186\/8119368305_4e622c8349_z.jpg",
    "id":391895
}],
"licenses":[{
    "url":"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/2.0\/",
    "id":1,
    "name":"Attribution-NonCommercial-ShareAlike License"
}],
"annotations":[{
    "segmentation": [[125.12,539.69,140.94,522.43...]],
    "num_keypoints": 10,
    "area": 47803.27955,
    "iscrowd": 0,
    "keypoints": [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,142,309,1,177,320,2,191,398...],
    "image_id": 425226,"bbox": [73.35,206.02,300.58,372.5],"category_id": 1,
    "id": 183126
}],
"categories":[{
    "supercategory": "person",
    "id": 1,
    "name": "person",
    "keypoints": ["nose","left_eye","right_eye","left_ear","right_ear","left_shoulder","right_shoulder","left_elbow","right_elbow","left_wrist","right_wrist","left_hip","right_hip","left_knee","right_knee","left_ankle","right_ankle"],
    "skeleton": [[16,14],[14,12],[17,15],[15,13],[12,13],[6,12],[7,13],[6,7],[6,8],[7,9],[8,10],[9,11],[2,3],[1,2],[1,3],[2,4],[3,5],[4,6],[5,7]]
}]
}
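Each annotation points back to its image through image_id, so a common first step is to group annotations per image. The sketch below shows one way to do that. The function name `annotations_by_image` and the tiny `demo` dictionary are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def annotations_by_image(dataset: dict) -> dict:
    """Group annotation dicts by the id of the image they belong to."""
    grouped = defaultdict(list)
    for ann in dataset.get("annotations", []):
        grouped[ann["image_id"]].append(ann)
    return dict(grouped)


# Hypothetical data: two annotations on image 1, one on image 2.
demo = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 2, "category_id": 1},
        {"id": 12, "image_id": 1, "category_id": 1},
    ],
}

per_image = annotations_by_image(demo)
for image in demo["images"]:
    ids = [ann["id"] for ann in per_image.get(image["id"], [])]
    print(image["id"], ids)
```

Images without any annotation simply do not appear as keys, which is why the loop uses `get` with an empty default.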

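Start with the annotation field. The format of segmentation depends on whether the instance is a single object (iscrowd=0, stored as polygons) or a group of objects such as a crowd of people (iscrowd=1, stored as RLE). A single object may need several polygons, for example when it is partly occluded in the image.

Every object, whether iscrowd is 0 or 1, also has a rectangular bounding box bbox, given as the array [x, y, width, height]: the first two elements are the coordinates of the box's top-left corner (the first element is the x coordinate), and the last two are the box's width and height.

area is the area of the encoded mask, that is, the size of the annotated region. For a rectangle this would simply be height times width. For a polygon or RLE mask it has to be computed from the mask itself.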
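The three fields discussed above can be handled with a few small helpers, sketched below. All function names are made up for this illustration. The sketch assumes that an RLE segmentation is stored as a JSON object while polygons are stored as a list of flat [x1, y1, x2, y2, ...] lists. The shoelace formula used in `polygon_area` gives the geometric area of a single polygon, which may differ slightly from the mask-based area value stored in the file.

```python
def bbox_to_corners(bbox):
    """Convert [x, y, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def segmentation_kind(annotation):
    """Return 'rle' or 'polygon' based on how the segmentation is stored.

    Assumption: RLE is a dict, polygons are a list of flat coordinate lists.
    """
    if isinstance(annotation["segmentation"], dict):
        return "rle"
    return "polygon"


def polygon_area(flat_xy):
    """Area of one polygon given as [x1, y1, x2, y2, ...] (shoelace formula)."""
    xs = flat_xy[0::2]
    ys = flat_xy[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2


# The bbox from the sample annotation above.
corners = bbox_to_corners([73.35, 206.02, 300.58, 372.5])

# Hypothetical annotations, one per storage format.
single = {"iscrowd": 0, "segmentation": [[0, 0, 4, 0, 4, 3, 0, 3]]}
crowd = {"iscrowd": 1, "segmentation": {"counts": [2, 3, 1], "size": [2, 3]}}

print(corners)
print(segmentation_kind(single), segmentation_kind(crowd))
print(polygon_area(single["segmentation"][0]))
```

When an object is described by several polygons, the areas of the individual polygons would need to be combined, and overlaps handled, which this sketch does not attempt.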

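keypoints is a flat array of length 3*k, where k is the total number of keypoints defined for the category. Each keypoint occupies three consecutive elements: the first and second are its x and y coordinates, and the third is a visibility flag v. v=0 means the keypoint is not labeled (in which case x=y=v=0), v=1 means it is labeled but not visible (occluded), and v=2 means it is labeled and visible. num_keypoints is the number of labeled keypoints (v>0) on the object. Very small objects may have no labeled keypoints at all.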
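The flat layout can be unpacked into (x, y, v) triplets as in the following sketch. The helper names are hypothetical. The input uses only the first seven complete triplets of the sample annotation above, because the rest of that array is elided in the sample, so the count here is not the sample's num_keypoints of 10.

```python
def split_keypoints(flat):
    """Split a flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) tuples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled(flat):
    """Count keypoints with v > 0, which is what num_keypoints stores."""
    return sum(1 for _, _, v in split_keypoints(flat) if v > 0)


# First seven keypoint names from the person category.
names = ["nose", "left_eye", "right_eye", "left_ear", "right_ear",
         "left_shoulder", "right_shoulder"]

# First seven complete triplets of the sample keypoints array.
flat = [0] * 15 + [142, 309, 1, 177, 320, 2]

triplets = split_keypoints(flat)
labeled = count_labeled(flat)

visibility = {0: "not labeled", 1: "labeled, occluded", 2: "labeled, visible"}
for name, (x, y, v) in zip(names, triplets):
    print(f"{name}: ({x}, {y}) {visibility[v]}")
```

In this excerpt only the two shoulders carry labels: the left one is marked as occluded (v=1) and the right one as visible (v=2).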

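Next, the category field. It stores the id and name of the category that the object belongs to (referenced from the annotation through category_id), together with the name of its supercategory. The object keypoints task has only one category, person. For this task the categories are kept in an additional top-level array alongside images and annotations. Here keypoints is an array of length k holding the name of each keypoint, and skeleton defines which keypoints are connected to each other (for example, a person's left wrist and left elbow are connected, but the left wrist and right wrist are not).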
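To make the skeleton easier to read, the index pairs can be mapped to keypoint names, as in the sketch below. It assumes the indices are 1-based, which is inferred from the sample: they run from 1 to 17 for 17 keypoint names and never use 0. The function name `skeleton_to_names` is introduced only for this illustration.

```python
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

SKELETON = [
    [16, 14], [14, 12], [17, 15], [15, 13], [12, 13], [6, 12], [7, 13],
    [6, 7], [6, 8], [7, 9], [8, 10], [9, 11], [2, 3], [1, 2], [1, 3],
    [2, 4], [3, 5], [4, 6], [5, 7],
]


def skeleton_to_names(names, skeleton):
    """Translate 1-based index pairs into pairs of keypoint names."""
    return [(names[a - 1], names[b - 1]) for a, b in skeleton]


limbs = skeleton_to_names(KEYPOINT_NAMES, SKELETON)
for start, end in limbs:
    print(f"{start} -- {end}")
```

The pair [8, 10] resolves to left_elbow and left_wrist, matching the example in the text, and no pair joins the two wrists.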

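Note: for human keypoint detection, the COCO annotations define 17 keypoints, as the keypoints array of the person category shows. The 18-keypoint layout often labeled "COCO" comes from pose estimation models such as OpenPose, which add a neck point, while the MPI layout has 15. The COCO keypoint order is the order of that array: nose, left_eye, right_eye, left_ear, right_ear, left_shoulder, right_shoulder, left_elbow, right_elbow, left_wrist, right_wrist, left_hip, right_hip, left_knee, right_knee, left_ankle, right_ankle.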

References:

http://cocodataset.org/#format-data

https://zhuanlan.zhihu.com/p/29393415

https://blog.csdn.net/u010925447/article/details/77411335
