A Detailed Introduction to the Three Main Object Detection Data Formats: VOC, YOLO, and COCO

xiaobai_Ry

Last modified: 2023-02-19 12:24:23

First published: 2023-02-19 12:18:15

Article link: https://blog.csdn.net/qq_41895003/article/details/129109057

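This article introduces several common datasets in object detection, including the XML format of the VOC dataset, the JSON format of the COCO dataset, and the TXT format used by YOLO. VOC contains 20 object classes, while COCO has 80; COCO is also the more challenging dataset and closer to real-world scenes. The article also mentions how to convert between the formats and points to related blog resources.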

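Note: this article is for study purposes only; please do not repost it without permission.

About: this post comes from xiaobai_Ry's notes of March 2020.

Download link for the corresponding PDF: to be uploaded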

Summary of Common Object Detection Datasets

Here is my own summary after going through these three common object detection datasets:

VOC dataset (annotations are in XML format)

A. Classes in the dataset:

There are 20 classes in total: person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor.
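
When converting annotations to other formats, class names usually have to be mapped to integer ids. A minimal sketch of such a mapping follows. The index order simply follows the list above and is an assumption for illustration, not an official ordering. Note that the VOC XML files spell the multi-word classes without spaces (`diningtable`, `pottedplant`, `tvmonitor`).

```python
# Class names in the order this post lists them. The index assignment is an
# assumption for illustration, not an official VOC ordering.
VOC_CLASSES = (
    "person", "bird", "cat", "cow", "dog", "horse", "sheep",
    "aeroplane", "bicycle", "boat", "bus", "car", "motorbike", "train",
    "bottle", "chair", "diningtable", "pottedplant", "sofa", "tvmonitor",
)

# Map each class name to an integer id, e.g. for YOLO-style label files.
CLASS_TO_ID = {name: idx for idx, name in enumerate(VOC_CLASSES)}

print(len(VOC_CLASSES))    # 20
print(CLASS_TO_ID["dog"])  # 4
```

Whatever order is chosen, it has to stay fixed across training and inference, because the label files store only the integer id.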

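B. Differences between VOC2007 and VOC2012:

(The image came from a blog that I can no longer recall. If any reader knows which one, please tell me and I will add the link.)

VOC2007 contains 9,963 annotated images, split into train/val/test, with 24,640 annotated objects in total.

For the detection task, the VOC2012 trainval/test sets include all corresponding images from 2008 to 2011. The trainval set has 11,540 images with 27,450 objects in total.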

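C. Dataset layout:

	.
	├── Annotations            # XML files; each XML corresponds to one image in JPEGImages and describes it
	├── ImageSets              # contains the subfolders Layout, Main, and Segmentation, plus Action in VOC2012
	│   ├── Action             # human actions (e.g. running, jumping)
	│   ├── Layout             # data with human body parts
	│   ├── Main               # image object recognition data, 20 classes in total
	│   └── Segmentation       # data usable for segmentation
	├── JPEGImages             # all images provided by PASCAL VOC, including training and test images;
	│                          # these are the images used for training, testing, and validation
	│                          # (note: the original images, without any markings)
	├── SegmentationClass      # images segmented by class; not needed for object detection
	└── SegmentationObject     # images segmented by object; not needed for object detection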
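
Since each XML file in Annotations describes one image in JPEGImages, the two folders can be paired by file stem. A minimal sketch follows, assuming the images use the `.jpg` extension. The function name `pair_annotations_with_images` and the root path in the usage comment are hypothetical.

```python
from pathlib import Path


def pair_annotations_with_images(voc_root):
    """Return (xml_path, jpg_path) pairs for every annotation under voc_root.

    Assumes the layout described above: Annotations/<id>.xml describes
    JPEGImages/<id>.jpg. Annotations without a matching image are skipped.
    """
    root = Path(voc_root)
    pairs = []
    for xml_path in sorted((root / "Annotations").glob("*.xml")):
        jpg_path = root / "JPEGImages" / (xml_path.stem + ".jpg")
        if jpg_path.is_file():
            pairs.append((xml_path, jpg_path))
    return pairs


# Example call (the path is a placeholder for wherever the dataset lives):
# pairs = pair_annotations_with_images("VOCdevkit/VOC2007")
```

Skipping unmatched annotations silently is a design choice for this sketch. A stricter version could raise an error instead, so that an incomplete download is noticed early.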

D. The annotations are organized as XML files, as follows:

	<annotation>
		<folder>VOC2007</folder>
		<filename>000001.jpg</filename>  <!-- file name -->
		<source>
			<database>The VOC2007 Database</database>
			<annotation>PASCAL VOC2007</annotation>
			<image>flickr</image>
			<flickrid>341012865</flickrid>
		</source>
		<owner>
			<flickrid>Fried Camels</flickrid>
			<name>Jinky the Fruit Bat</name>
		</owner>
		<size>  <!-- image size; used to normalize the top-left and bottom-right bbox coordinates -->
			<width>353</width>
			<height>500</height>
			<depth>3</depth>
		</size>
		<segmented>0</segmented>  <!-- whether the image is used for segmentation -->
		<object>
			<name>dog</name>  <!-- object class -->
			<pose>Left</pose>  <!-- viewing angle: front, rear, left, right, unspecified -->
			<truncated>1</truncated>  <!-- whether the object is truncated (e.g. extends outside the image) or occluded (by more than 15%) -->
			<difficult>0</difficult>  <!-- detection difficulty, judged mainly by object size, lighting changes, and image quality -->
			<bndbox>
				<xmin>48</xmin>
				<ymin>240</ymin>
				<xmax>195</xmax>
				<ymax>371</ymax>
			</bndbox>
		</object>
		<object>
			<name>person</name>
			<pose>Left</pose>
			<truncated>1</truncated>
			<difficult>0</difficult>
			<bndbox>
				<xmin>8</xmin>
				<ymin>12</ymin>
				<xmax>352</xmax>
				<ymax>498</ymax>
			</bndbox>
		</object>
	</annotation>
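
To make the field layout concrete, here is a minimal sketch that reads such an annotation with Python's standard `xml.etree.ElementTree` and normalizes each box by the image size. It outputs the center/width/height form that YOLO-style TXT labels commonly use. The function names `parse_voc_xml` and `to_normalized_cxcywh` are hypothetical, and the sample string is a shortened copy of the annotation above.

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text):
    """Parse a VOC annotation string into (width, height, objects)."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.find("name").text,
            "difficult": int(obj.find("difficult").text),
            # Pixel coordinates in (xmin, ymin, xmax, ymax) order.
            "bbox": tuple(int(box.find(tag).text)
                          for tag in ("xmin", "ymin", "xmax", "ymax")),
        })
    return width, height, objects


def to_normalized_cxcywh(bbox, width, height):
    """Convert a pixel (xmin, ymin, xmax, ymax) box to normalized
    (x_center, y_center, box_width, box_height)."""
    xmin, ymin, xmax, ymax = bbox
    return ((xmin + xmax) / 2 / width,
            (ymin + ymax) / 2 / height,
            (xmax - xmin) / width,
            (ymax - ymin) / height)


SAMPLE = """<annotation>
  <size><width>353</width><height>500</height><depth>3</depth></size>
  <object>
    <name>dog</name><difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""

width, height, objects = parse_voc_xml(SAMPLE)
for obj in objects:
    print(obj["name"], to_normalized_cxcywh(obj["bbox"], width, height))
```

Some converters subtract 1 from the pixel coordinates first, because VOC boxes are 1-based. This sketch leaves the values as they are.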

E. A look at some of the files

(1)JPEGImages:

(2)Annotations

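COCO dataset (annotations are in JSON format)

Image source link: click here

A. Total number of classes:

80 classes

B. File description:

There are three annotation types, all stored as JSON files, and each type comes with a training file and a validation file:
object instances (i.e. object detection); object keypoints (keypoints on objects); image captions (describing an image in words)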
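
Because the annotations are plain JSON, they can be inspected with Python's standard `json` module. A minimal sketch follows that counts the entries in each top-level list of a COCO-style file. The helper name `summarize_coco` is hypothetical and the sample values are made up for illustration.

```python
import json


def summarize_coco(json_text):
    """Return the number of entries in each top-level list of a
    COCO-style JSON string (non-list sections such as "info" are skipped)."""
    data = json.loads(json_text)
    return {key: len(value) for key, value in data.items()
            if isinstance(value, list)}


# Toy document with made-up values, only to exercise the helper.
sample = json.dumps({
    "info": {"year": 2020, "description": "toy example"},
    "licenses": [{"id": 1, "name": "example", "url": ""}],
    "images": [{"id": 1, "width": 353, "height": 500,
                "file_name": "000001.jpg"}],
    "annotations": [],
})
print(summarize_coco(sample))  # {'licenses': 1, 'images': 1, 'annotations': 0}
```

For a real annotation file, the text would be read from disk first, for example with `Path(...).read_text()`.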

C. Data format:

	{
	    "info": info,
	    "licenses": [license],
	    "images": [image],
	    "annotations": [annotation],
	}
	    
	info{
	    "year": int,
	    "version": str,
	    "description": str,
	    "contributor": str,
	    "url": str,
	    "date_created": datetime,
	}
	license{
	    "id": int,
	    "name": str,
	    "url": str,
	} 
	image{
	    "id": int,
	    "width": int,
	    "height": int,
	    "file_name": str,
	    "license": int,
	    "flickr_url": str,
	    "coco_url": str,
	    "date_captured": datetime,
	}