Organizing and Using the COCO Dataset

loovelj

Article link: https://blog.csdn.net/loovelj/article/details/108733426

1. Introduction

COCO is a dataset collected by a Microsoft team that can be used for image recognition, segmentation, and captioning. Its official website is http://mscoco.org/.

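First, let's look at how the COCO dataset is structured. Its annotations are stored in JSON files that use the following fields: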

2. Data structure

2.1 Common fields

info{
    "year"            : int,         # year of the dataset
    "version"         : str,         # dataset version
    "description"     : str,         # dataset description
    "contributor"     : str,         # contributor(s)
    "url"             : str,         # official dataset URL
    "date_created"    : datetime,    # date and time the dataset was created
}

image{           # one dict per image
    "id"               : int,         # image id
    "width"            : int,         # image width
    "height"           : int,         # image height
    "file_name"        : str,         # image file name
    "license"          : int,         # id of the image's license (see license below)
    "flickr_url"       : str,         # Flickr URL
    "coco_url"         : str,         # COCO URL
    "date_captured"    : datetime,    # date and time the photo was taken
}

license{
    "id"     : int,    # license id, 1-8
    "name"   : str,    # license name
    "url"    : str,    # license URL
}

The fields specific to object detection are:

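annotation{   # one dict per box (object instance)
    "id"            : int,     # annotation id; each object has its own annotation
    "image_id"      : int,     # id of the image the object appears in
    "category_id"   : int,     # category id; each object belongs to one category
    "segmentation"  : RLE or [polygon],
    "area"          : float,   # area of the object's segmentation mask, in pixels
    "bbox"          : [x,y,width,height],     # x,y is the top-left corner
    "iscrowd"       : 0 or 1,  # 0: segmentation is [polygon] (single object); 1: segmentation is RLE (crowd)
}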
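Because `bbox` is stored as top-left corner plus size, many drawing and evaluation tools first convert it to corner form. The sketch below assumes plain Python lists; the helper names `xywh_to_xyxy` and `xyxy_to_xywh` are hypothetical, not part of pycocotools. The example box is the one from the image 289343 annotation shown in section 3.1.

```python
from typing import List


def xywh_to_xyxy(bbox: List[float]) -> List[float]:
    """Convert a COCO box [x, y, width, height] (top-left corner plus size)
    into corner form [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box: List[float]) -> List[float]:
    """Inverse conversion: corner form back to COCO [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


# Box from the image 289343 annotation (category_id 18)
coco_box = [473.07, 395.93, 38.65, 28.67]
corners = xywh_to_xyxy(coco_box)
print(corners)  # roughly [473.07, 395.93, 511.72, 424.60], up to float rounding
```

Note that `area` is not the box area: for this annotation, width × height is about 1108 pixels, while its `area` field (the mask area) is about 702.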

categories[{    # defines the categories: e.g. 80 classes means 80 dicts
    "id"              : int,     # category id
    "name"            : str,     # category name
    "supercategory"   : str,     # parent category, e.g. the parent of bicycle is vehicle
}]
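In the official instances files these records sit in top-level lists named `images`, `annotations`, `categories` and `licenses`, next to a single `info` dict. The sketch below builds a tiny, made-up file in that layout (all ids, file names and coordinates are hypothetical, except that category 18 is `dog` as in COCO), round-trips it through JSON, and builds a category lookup. Indexing categories by `id` rather than list position matters because the 80 COCO detection classes use non-contiguous ids (up to 90).

```python
import json

# Hypothetical toy dataset using the fields described above
toy = {
    "info": {"year": 2017, "version": "1.0", "description": "toy example"},
    "licenses": [{"id": 1, "name": "example license", "url": ""}],
    "images": [
        {"id": 1, "width": 640, "height": 480, "file_name": "000001.jpg", "license": 1},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 18,
         # one polygon: x1, y1, x2, y2, ... (a 50 x 30 rectangle)
         "segmentation": [[10.0, 10.0, 60.0, 10.0, 60.0, 40.0, 10.0, 40.0]],
         "area": 1500.0, "bbox": [10.0, 10.0, 50.0, 30.0], "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
        {"id": 18, "name": "dog", "supercategory": "animal"},
    ],
}

# Serialize and parse again, as if the file had been read from disk
data = json.loads(json.dumps(toy))

# category id -> name
cat_name = {c["id"]: c["name"] for c in data["categories"]}
# category id -> contiguous class index 0..N-1 (handy for training code)
contiguous = {cid: i for i, cid in enumerate(sorted(cat_name))}

for ann in data["annotations"]:
    print(ann["id"], cat_name[ann["category_id"]], contiguous[ann["category_id"]], ann["bbox"])
```

The contiguous mapping is one common convention for feeding class ids to a model; it is an illustration here, not something the COCO format itself defines.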

3. Reading annotations with pycocotools

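3.1 Initialization
from pycocotools.coco import COCO

# Path to an instances annotation file; the official 2017 release ships
# instances_train2017.json and instances_val2017.json
annFile='./annotations/instances_2017.json'
# initialize COCO api for instance annotations
coco = COCO(annFile)
# loading annotations into memory...
# Done (t=4.19s)
# creating index...
# index created!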

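Among other things, the index gathers all boxes (annotations) of each image together in `coco.imgToAnns`, keyed by image id. For example, for image 289343 it contains:
```python
{289343: [
    {'id': 1768, 'area': 702.1057499999998,
     'segmentation': [[510.66, … 510.45, 423.01]],
     'bbox': [473.07, 395.93, 38.65, 28.67],
     'category_id': 18,
     'iscrowd': 0,
     'image_id': 289343
    },
    …
]}
```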
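pycocotools builds this per-image grouping while creating its index. The same idea in plain Python, as a sketch that does not use pycocotools: the function name `group_by_image` is hypothetical, and apart from annotation 1768 and image 289343 the records below are made up, with only the fields needed for grouping filled in.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_image(annotations: List[dict]) -> Dict[int, List[dict]]:
    """Collect every annotation under the id of the image it belongs to."""
    img_to_anns: Dict[int, List[dict]] = defaultdict(list)
    for ann in annotations:
        img_to_anns[ann["image_id"]].append(ann)
    return dict(img_to_anns)


# Hypothetical, trimmed-down annotations
anns = [
    {"id": 1768, "image_id": 289343, "category_id": 18},
    {"id": 1769, "image_id": 289343, "category_id": 1},
    {"id": 2001, "image_id": 1000, "category_id": 1},
]
index = group_by_image(anns)
print({img_id: [a["id"] for a in lst] for img_id, lst in index.items()})
# {289343: [1768, 1769], 1000: [2001]}
```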

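The usual workflow is to first query for a list of ids that match some conditions (getCatIds, getImgIds, getAnnIds), then pass those ids to the matching load function (loadCats, loadImgs, loadAnns) to get the full records.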
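The two-step pattern can be sketched without pycocotools. When pycocotools' `getImgIds` is given several `catIds`, it intersects the per-category image sets, so only images containing all of the categories are returned. The helpers `get_img_ids` and `load_imgs` below are hypothetical stand-ins that mimic that behavior on made-up data; unlike pycocotools, this `get_img_ids` returns an empty list when no categories are given.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def get_img_ids(cat_to_imgs: Dict[int, Set[int]], cat_ids: Iterable[int]) -> List[int]:
    """Step 1: ids of images that contain *every* category in cat_ids."""
    result = None
    for cid in cat_ids:
        imgs = cat_to_imgs.get(cid, set())
        result = set(imgs) if result is None else result & imgs
    return sorted(result or [])


def load_imgs(imgs_by_id: Dict[int, dict], ids: Iterable[int]) -> List[dict]:
    """Step 2: turn ids into the full image records."""
    return [imgs_by_id[i] for i in ids]


# Hypothetical data
images = {
    1: {"id": 1, "file_name": "a.jpg"},
    2: {"id": 2, "file_name": "b.jpg"},
    3: {"id": 3, "file_name": "c.jpg"},
}
annotations = [
    {"image_id": 1, "category_id": 1},   # person
    {"image_id": 1, "category_id": 18},  # dog
    {"image_id": 2, "category_id": 1},
    {"image_id": 3, "category_id": 18},
]

# category id -> set of image ids containing it
cat_to_imgs = defaultdict(set)
for ann in annotations:
    cat_to_imgs[ann["category_id"]].add(ann["image_id"])

ids = get_img_ids(cat_to_imgs, [1, 18])   # conditions -> ids
records = load_imgs(images, ids)          # ids -> records
print([r["file_name"] for r in records])  # ['a.jpg']
```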
For example, to load a specific image:

import numpy as np

# get all images containing given categories, select one at random
catIds = coco.getCatIds(catNms=['person','dog','skateboard'])
imgIds = coco.getImgIds(catIds=catIds)
# this line overrides the result above and selects one specific image by id
imgIds = coco.getImgIds(imgIds=[324158])
# loadImgs() returns a list; here it holds a single element, accessed with [0].
# That element is a dict with the keys: "license", "file_name", "coco_url",
# "height", "width", "date_captured", "flickr_url", "id"
img = coco.loadImgs(imgIds[np.random.randint(0, len(imgIds))])[0]

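import matplotlib.pyplot as plt
import skimage.io as io

# Load and display the image in one of two ways: 1) from a local file, 2) from the remote URL
# 1) local path, built from the "file_name" field
#    (dataDir and dataType must be defined first, e.g. the dataset root and 'val2017')
# I = io.imread('%s/images/%s/%s'%(dataDir,dataType,img['file_name']))

# 2) URL, from the "coco_url" field
I = io.imread(img['coco_url'])
plt.axis('off')
plt.imshow(I)
plt.show()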