A Detailed Guide to the VOC Dataset

Visual Object Classes Challenge 2012 (VOC2012)

Introduction

The main goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). It is fundamentally a supervised learning problem in that a training set of labelled images is provided. The twenty object classes that have been selected are:

  • Person: person
  • Animal: bird, cat, cow, dog, horse, sheep
  • Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
  • Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

20 classes. The train/val data has 11,530 images containing 27,450 ROI annotated objects and 6,929 segmentations.
There are three main object recognition competitions: classification, detection, and segmentation, a competition on action classification, and a competition on large scale recognition run by ImageNet. In addition there is a “taster” competition on person layout.

Classification/Detection Competitions

  1. Classification: For each of the twenty classes, predicting presence/absence of an example of that class in the test image.
  2. Detection: Predicting the bounding box and label of each object from the twenty target classes in the test image.

Segmentation Competition

  • Segmentation: Generating pixel-wise segmentations giving the class of the object visible at each pixel, or “background” otherwise.

Action Classification Competition

  • Action Classification: Predicting the action(s) being performed by a person in a still image.

  • VOC2012

    • Annotations
      • 2008_003420.xml
    • ImageSets
      • Action
      • Layout
      • Main
      • Segmentation
    • JPEGImages
    • SegmentationClass
    • SegmentationObject

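    Annotations: holds the XML files. Each XML file corresponds to one image and stores the position and class of every labelled object in it. The file is normally named after its source image.
    JPEGImages: your own original images go in this folder.
    ImageSets: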

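    • Action: predicting the action a person is performing in a still image (running, jumping, and so on)
    • Layout: person layout. The goal of this task is to predict the bounding box and label of each body part (head, hands, feet, and so on)
    • Main: data for object recognition, 20 classes in total. For each class there are four files: xxx_test.txt, xxx_train.txt, xxx_val.txt, and xxx_trainval.txt. Each line gives the image name followed by a flag: 1 marks a positive sample and -1 a negative one (the official files also use 0 for images where the object is only present as a "difficult" instance). For example:
      tail -n5 person_train.txt
      2011_003253 -1   // 2011_003253.jpg contains no person
      2011_003255  1   // 2011_003255.jpg contains a person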
      2011_003259  1  
      2011_003274 -1
      2011_003276 -1
      

    Segmentation: data for segmentation. train.txt lists the image IDs of the training set, val.txt lists those of the validation set, and trainval.txt is the union of the two.

    VOC2012/ImageSets/Main/train.txt lists the file names of all training samples. For each name, the image file is found under VOC2012/JPEGImages/
    and the matching label file under VOC2012/Annotations/.
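The lookup described above can be sketched in a few lines of Python. This is a minimal sketch rather than part of the official devkit: the function name resolve_samples and its arguments are hypothetical, and it assumes the standard VOC2012 layout with .jpg images and .xml annotations.

```python
import os


def resolve_samples(voc_root, split="train"):
    """Return (image_path, annotation_path) pairs for one split file.

    Assumes voc_root/ImageSets/Main/<split>.txt holds one image ID per line.
    """
    split_file = os.path.join(voc_root, "ImageSets", "Main", split + ".txt")
    pairs = []
    with open(split_file) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            image_id = parts[0]  # ignore a trailing 1/-1 flag if present
            pairs.append((
                os.path.join(voc_root, "JPEGImages", image_id + ".jpg"),
                os.path.join(voc_root, "Annotations", image_id + ".xml"),
            ))
    return pairs
```

Calling resolve_samples('VOC2012', 'train') would then give the list of path pairs to feed into a data loader. The function does not check that the files exist.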

Annotations

The Annotations folder holds the label files in XML format. Each XML file corresponds to one image in the JPEGImages folder.
The XML file format looks like this:

<annotation>
    <filename>2012_000056.jpg</filename>      <!-- file name -->
    <folder>VOC2012</folder>
    <object>                                 <!-- information about one annotated object -->
        <name>person</name>                  <!-- object class -->
        <actions>                            <!-- action being performed -->
            <jumping>0</jumping>
            <other>0</other>
            <phoning>1</phoning>
            <playinginstrument>0</playinginstrument>
            <reading>0</reading>
            <ridingbike>0</ridingbike>
            <ridinghorse>0</ridinghorse>
            <running>0</running>
            <takingphoto>0</takingphoto>
            <usingcomputer>0</usingcomputer>
            <walking>0</walking>
        </actions>
        <bndbox>                 <!-- bbox info, [left, top, right, bottom] -->
            <xmax>63</xmax>
            <xmin>1</xmin>
            <ymax>375</ymax>
            <ymin>84</ymin>
        </bndbox>
        <difficult>0</difficult>  <!-- whether the object is hard to recognize (0 means easy) -->
        <pose>Unspecified</pose>  <!-- object pose -->
        <point>                   <!-- if the object has a reference point annotated -->
            <x>26</x>
            <y>183</y>
        </point>
    </object>
    <segmented>0</segmented>          <!-- whether the image is used for segmentation -->
    <size>                            <!-- image size: width, height, channels -->
        <depth>3</depth>
        <height>375</height>
        <width>500</width>
    </size>
    <source>                         <!-- image source -->
        <annotation>PASCAL VOC2012</annotation>
        <database>The VOC2012 Database</database>
        <image>flickr</image>
    </source>
</annotation>

2012_00005.xml
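
To make the structure concrete, here is a minimal sketch that reads such a file with the standard library's xml.etree.ElementTree. The helper name parse_annotation and the shape of the returned dictionary are assumptions made for illustration. The embedded sample is a trimmed copy of the annotation shown above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example annotation (actions, pose and source omitted).
SAMPLE_XML = """
<annotation>
    <filename>2012_000056.jpg</filename>
    <object>
        <name>person</name>
        <bndbox>
            <xmax>63</xmax>
            <xmin>1</xmin>
            <ymax>375</ymax>
            <ymin>84</ymin>
        </bndbox>
        <difficult>0</difficult>
    </object>
    <size>
        <depth>3</depth>
        <height>375</height>
        <width>500</width>
    </size>
</annotation>
"""


def parse_annotation(xml_text):
    """Extract file name, image size and object boxes from VOC-style XML."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    record = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": [],
    }
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Look tags up by name: their order inside <bndbox> is not fixed.
        bbox = tuple(int(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        record["objects"].append({
            "name": obj.findtext("name"),
            "difficult": int(obj.findtext("difficult", default="0")),
            "bbox": bbox,
        })
    return record


record = parse_annotation(SAMPLE_XML)
print(record["objects"][0]["bbox"])  # (1, 84, 63, 375)
```

Reading the four coordinates by tag name matters here, because the example file lists xmax before xmin.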

ImageSets

ImageSets holds the image lists for each type of challenge. It contains four folders:

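  • Action: data on human actions (running, jumping, and so on; this is also part of the VOC challenge)
  • Layout: data with annotated body parts (a person's head, hands, feet, and so on; this is also part of the VOC challenge)
  • Main: data for image object recognition, 20 classes in total.
  • Segmentation: data that can be used for segmentation.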

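The Main folder contains ***_train.txt, ***_val.txt, and ***_trainval.txt for each of the 20 classes. These files all share the same layout: the image name comes first, followed by 1 for a positive sample or -1 for a negative one. The _train files hold the data used for training, 5,717 entries per class. The _val files hold the data used for validation, 5,823 entries per class. The _trainval files merge the two, giving 11,540 entries per class. train and val must not overlap, that is, no image may appear in both the training data and the validation data, and the training data should be selected at random.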
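
A minimal sketch of reading these per-class files and checking the no-overlap requirement is given below. The helper name parse_class_labels is hypothetical. The IDs are taken from the person_train.txt excerpt shown earlier, and dividing them into a "train" and a "val" list here is purely for illustration.

```python
def parse_class_labels(lines):
    """Map image ID -> flag (1 positive, -1 negative) from '<id> <flag>' lines."""
    labels = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        labels[parts[0]] = int(parts[1])
    return labels


# Toy inputs; in practice these lines would come from e.g. person_train.txt
# and person_val.txt.
train = parse_class_labels(["2011_003253 -1", "2011_003255  1", "2011_003259  1"])
val = parse_class_labels(["2011_003274 -1", "2011_003276 -1"])

positives = sorted(image_id for image_id, flag in train.items() if flag == 1)
overlap = set(train) & set(val)

print(positives)  # ['2011_003255', '2011_003259']
print(overlap)    # set()
```

An empty overlap set confirms that the two lists are disjoint.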

JPEGImages

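The JPEGImages folder contains all the images provided by PASCAL VOC, both training and test images, 17,125 in total. The images are named in the format "year_number.jpg". Pixel dimensions vary, but landscape images are roughly 500×375 and portrait images roughly 375×500, and the deviation rarely exceeds 100 pixels. In later training, the first step is to resize all images to 300×300 or 500×500, so the original images should not be too far from that size. These are the images used for training, testing, and validation.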
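
When an image is resized like this, its bounding boxes have to be rescaled by the same factors. The sketch below shows only that arithmetic. The function name scale_bbox is hypothetical, and the image resize itself (for example with cv2.resize) is left out.

```python
def scale_bbox(bbox, src_size, dst_size):
    """Rescale (xmin, ymin, xmax, ymax) from a src (w, h) image to a dst (w, h) image."""
    xmin, ymin, xmax, ymax = bbox
    sx = dst_size[0] / src_size[0]  # horizontal scale factor
    sy = dst_size[1] / src_size[1]  # vertical scale factor
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)


# Box from the annotation example, 500x375 image resized to 300x300.
box = scale_bbox((1, 84, 63, 375), src_size=(500, 375), dst_size=(300, 300))
print([round(v, 1) for v in box])  # [0.6, 67.2, 37.8, 300.0]
```

Because the two scale factors differ (0.6 and 0.8 here), a plain resize changes the aspect ratio of the objects.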

SegmentationClass

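Contains 2,913 images, each corresponding to the image with the same ID in JPEGImages. The pixels take 20 distinct colors, one for each of the 20 object classes.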

SegmentationObject

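Contains 2,913 images whose IDs match those in SegmentationClass. The difference is that these images are annotated per object instance. In SegmentationClass, if an image contains several aeroplanes, all of them are labelled red. In SegmentationObject, the aeroplanes in the same image are each labelled with a different color.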
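
The class colors are not arbitrary. The sketch below assumes the bit-interleaving scheme used by the devkit's label colormap routine, with index 0 as background and the 20 classes numbered alphabetically from 1. Treat the index-to-class mapping in the comments as that assumption, not as something stated in this article.

```python
def voc_colormap(num_colors=256):
    """Build an (R, G, B) palette by spreading the bits of each index over the channels."""
    palette = []
    for index in range(num_colors):
        r = g = b = 0
        c = index
        for j in range(8):
            # The low three bits of c go to R, G and B, highest output bit first.
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        palette.append((r, g, b))
    return palette


palette = voc_colormap()
print(palette[0])   # (0, 0, 0), assumed background
print(palette[1])   # (128, 0, 0), assumed aeroplane (the red mentioned above)
print(palette[15])  # (192, 128, 128), assumed person
```

To recover class indices from a SegmentationClass PNG, invert this palette, or read the paletted image's indices directly.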

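Building a VOC2012-Style Dataset

Building a VOC dataset involves the following steps:

  • Prepare the data
  • Annotate the images: generate label files containing the class and bounding box information
  • Generate the files the VOC format requires, mainly Annotations/*.xml and ImageSets/Main/*.txt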

Step1 make voc2012 directory

-- VOC2012    
    |-- Annotations   
    |-- ImageSets   
    |   |-- Action   
    |   |-- Layout   
    |   |-- Main   
    |   `-- Segmentation  
    |-- JPEGImages
    |-- SegmentationClass
    `-- SegmentationObject
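
The tree above can be created by hand or with a short script. The following is a minimal sketch in which the function name make_voc_dirs and the default root are assumptions.

```python
import os

# Sub-directories of the VOC2012 layout shown above.
VOC_SUBDIRS = [
    "Annotations",
    os.path.join("ImageSets", "Action"),
    os.path.join("ImageSets", "Layout"),
    os.path.join("ImageSets", "Main"),
    os.path.join("ImageSets", "Segmentation"),
    "JPEGImages",
    "SegmentationClass",
    "SegmentationObject",
]


def make_voc_dirs(root="VOC2012"):
    """Create the VOC directory skeleton under root; existing folders are kept."""
    for sub in VOC_SUBDIRS:
        os.makedirs(os.path.join(root, sub), exist_ok=True)
    return root
```

Because exist_ok=True is set, calling make_voc_dirs() a second time is harmless.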

Step2 Generate the XML files in the Annotations directory

choose Step2.1 or Step2.2 or both to run

Step2.1 Generate the XML files in the Annotations directory from scratch

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os, sys
import cv2
import numpy
from lxml.etree import Element, SubElement, tostring
from xml.dom.minidom import parseString

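def save_Annotations_xml(root_dir, label_list, Annotations_dir, JPEGImages_dir):
    """
    Convert a label list into VOC-style XML files plus renumbered JPEG copies.

    Each line of label_list holds an image path (relative to root_dir),
    groups of four integers describing the boxes, and a trailing class field.
    """
    filename = os.path.join(root_dir, label_list)
    print('-------------label list filename:', filename)
    # note: i is the start index; if JPEGImages already includes
    # train, val or test images, change i
    i = 1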
    with open(filename, 'r') as f:
        lines = f.readlines()
        print('save_Annotations_xml read lines:', len(lines))
        for line in lines:
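            # split() handles repeated spaces and the trailing newline
            line_info = line.split()
            if not line_info:
                continue  # skip blank lines
            imgname = os.path.join(root_dir, line_info[0])
            img = cv2.imread(imgname)
            if img is None:
                # cv2.imread returns None when the file is missing or unreadable
                print('skip unreadable image:', imgname)
                continue
            height, width, channel = img.shape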
            image_name = '%09d' % i + '.jpg'
            # save JPEGImages
            new_image_name = JPEGImages_dir + '/%09d' % i + '.jpg'
            # construct annotation
            node_root = Element('annotation')
            node_folder = SubElement(node_root, 'folder')
            node_folder.text = 'JPEGImages'
            # image name
            node_filename = SubElement(node_root, 'filename')
            node_filename.text = image_name
            # image width height channel
            node_size = SubElement(node_root, 'size')
            node_depth = SubElement(node_size, 'depth')
            node_depth.text = '%s' % channel
            node_height = SubElement(node_size, 'height')
            node_height.text = '%s' % height
            node_width = SubElement(node_size, 'width')
            node_width.text = '%s' % width
            
            write_infile = False
            # bounding box info: drop the trailing class field, then group
            # the remaining integers into rows of four (x, y, w, h)
            line_info = [int(b) for b in line_info[1:]]
            array=numpy.array(line_info[:-1])
            bboxs = array.reshape(-1, 4)
            for bbox in bboxs:
                x, y, w, h = [int(b) for b in bbox]
                # add data filter
                if w < 12 or h < 32:
                    continue
                    
                write_infile=True
                left, top, right, bottom = x, y, x + w, y + h
                node_object = SubElement(node_root, 'object')
                node_name = SubElement(node_object, 'name')
                node_name.text = 'person'
                node_difficult = SubElement(node_object, 'difficult')
                node_difficult.text = '0'
                node_bndbox = SubElement(node_object, 'bndbox')
                node_xmin = SubElement(node_bndbox, 'xmin')
                node_xmin.text = '%s' % left
                node_ymin = SubElement(node_bndbox, 'ymin')
                node_ymin.text = '%s' % top
                node_xmax = SubElement(node_bndbox, 'xmax')
                node_xmax.text = '%s' % right
                node_ymax = SubElement(node_bndbox, 'ymax')
                node_ymax.text = '%s' % bottom
            
            if write_infile:
                # serialize the tree to pretty-printed bytes
                xml = tostring(node_root, pretty_print=True)
                # save_xml
                save_xml = os.path.join(Annotations_dir, image_name.replace('jpg', 'xml'))
                # separate name so the label-list handle f is not shadowed
                with open(save_xml, 'wb') as xml_out:
                    xml_out.write(xml)

                cv2.imwrite(new_image_name, img)               
                i = i + 1
                
    # i is the next free index, so i - 1 samples were written
    print('******************* make Annotations xml Done *******************', i - 1)

if __name__ == '__main__':
    # dataset to convert
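    # each line of test_label has the form:
    #   filename tx0 ty0 bx0 by0 tx1 ty1 bx1 by1 ... classes
    # NOTE: the conversion loop unpacks every four values as (x, y, w, h);
    # check which convention your label file actually uses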
    root_dir = '/opt/notebook_files/datasets/PedestrianDataset/Caltech'
    test_label='caltech_train_label.txt'
    # Voc dataset
    Annotations_dir='VOC2012/Annotations'
    JPEGImages_dir='VOC2012/JPEGImages'
    save_Annotations_xml(root_dir, test_label, Annotations_dir, JPEGImages_dir)
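
The core of the conversion is turning one label line into filtered corner boxes. The sketch below isolates that step in pure Python so it can be checked without OpenCV or lxml. The function name parse_label_line and the sample line (including its path) are made up for illustration. It assumes, as the loop above does, that each group of four values is (x, y, w, h) and that the last field is the class.

```python
def parse_label_line(line, min_w=12, min_h=32):
    """Split one label line into (image_path, [(left, top, right, bottom), ...])."""
    fields = line.split()
    image_path = fields[0]
    values = [int(v) for v in fields[1:-1]]  # drop the trailing class field
    if len(values) % 4 != 0:
        raise ValueError("box values must come in groups of four: %r" % line)
    boxes = []
    for k in range(0, len(values), 4):
        x, y, w, h = values[k:k + 4]
        if w < min_w or h < min_h:
            continue  # same size filter as the script above
        boxes.append((x, y, x + w, y + h))
    return image_path, boxes


path, boxes = parse_label_line("set00/img_0001.jpg 10 20 30 60 5 5 8 10 1")
print(boxes)  # [(10, 20, 40, 80)]
```

The second box (8 wide, 10 high) falls below the size thresholds and is dropped, which mirrors the `w < 12 or h < 32` filter.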

Step2.2 Incrementally add XML files to the Annotations directory

This function adds a new dataset to an existing VOC dataset directory.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os, sys
import cv2
import numpy
from lxml.etree import Element, SubElement, tostring
from xml.dom.minidom import parseString

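def add_Annotations_xml(root_dir, label_list, Annotations_dir, JPEGImages_dir):
    """
    Append new samples to an existing VOC directory, numbering them
    after the XML files that are already present.
    """
    filename = os.path.join(root_dir, label_list)
    print('-------------label list filename:', filename)
    Annotations_xmls = os.listdir(Annotations_dir)
    print('Annotations_xmls:',len(Annotations_xmls))

    # existing files are numbered 1..N, so the next free index is N + 1
    # (starting at N would overwrite the last existing sample)
    start_idx = len(Annotations_xmls) + 1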
    with open(filename, 'r') as f:
        lines = f.readlines()
        print('save_Annotations_xml read lines:', len(lines))
        for line in lines:
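            # split() handles repeated spaces and the trailing newline
            line_info = line.split()
            if not line_info:
                continue  # skip blank lines
            imgname = os.path.join(root_dir, line_info[0])
            img = cv2.imread(imgname)
            if img is None:
                # cv2.imread returns None when the file is missing or unreadable
                print('skip unreadable image:', imgname)
                continue
            height, width, channel = img.shape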
            image_name = '%09d' % start_idx + '.jpg'
            # save JPEGImages
            new_image_name = JPEGImages_dir + '/%09d' % start_idx + '.jpg'
            
            # construct annotation
            node_root = Element('annotation')
            node_folder = SubElement(node_root, 'folder')
            node_folder.text = 'JPEGImages'
            # image name
            node_filename = SubElement(node_root, 'filename')
            node_filename.text = image_name
            # image width height channel
            node_size = SubElement(node_root, 'size')
            node_depth = SubElement(node_size, 'depth')
            node_depth.text = '%s' % channel
            node_height = SubElement(node_size, 'height')
            node_height.text = '%s' % height
            node_width = SubElement(node_size, 'width')
            node_width.text = '%s' % width
            
            write_infile = False
            # bounding box info: drop the trailing class field, then group
            # the remaining integers into rows of four (x, y, w, h)
            line_info = [int(b) for b in line_info[1:]]
            array=numpy.array(line_info[:-1])
            bboxs = array.reshape(-1, 4)
            for bbox in bboxs:
                x, y, w, h = [int(b) for b in bbox]
                # add data filter
                if w < 12 or h < 32:
                    continue
                    
                write_infile=True
                left, top, right, bottom = x, y, x + w, y + h
                node_object = SubElement(node_root, 'object')
                node_name = SubElement(node_object, 'name')
                node_name.text = 'person'
                node_difficult = SubElement(node_object, 'difficult')
                node_difficult.text = '0'
                node_bndbox = SubElement(node_object, 'bndbox')
                node_xmin = SubElement(node_bndbox, 'xmin')
                node_xmin.text = '%s' % left
                node_ymin = SubElement(node_bndbox, 'ymin')
                node_ymin.text = '%s' % top
                node_xmax = SubElement(node_bndbox, 'xmax')
                node_xmax.text = '%s' % right
                node_ymax = SubElement(node_bndbox, 'ymax')
                node_ymax.text = '%s' % bottom
            
            if write_infile:
                # serialize the tree to pretty-printed bytes
                xml = tostring(node_root, pretty_print=True)
                # save_xml
                save_xml = os.path.join(Annotations_dir, image_name.replace('jpg', 'xml'))
                # separate name so the label-list handle f is not shadowed
                with open(save_xml, 'wb') as xml_out:
                    xml_out.write(xml)

                cv2.imwrite(new_image_name, img)               
                start_idx += 1
                
    # count the files on disk instead of relying on the running index
    print('*******************Have Annotations xmls:{} *******************'.format(len(os.listdir(Annotations_dir))))

if __name__ == '__main__':
    # dataset to convert
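    # each line of test_label has the form:
    #   filename tx0 ty0 bx0 by0 tx1 ty1 bx1 by1 ... classes
    # NOTE: the conversion loop unpacks every four values as (x, y, w, h);
    # check which convention your label file actually uses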
    root_dir = '/opt/notebook_files/datasets/PedestrianDataset/Caltech'
    test_label='caltech_train_label.txt'
    # Voc dataset
    Annotations_dir='VOC2012/Annotations'
    JPEGImages_dir='VOC2012/JPEGImages'
    add_Annotations_xml(root_dir, test_label, Annotations_dir, JPEGImages_dir)
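
Deriving the next index from a file count only works when the existing files are numbered without gaps. A slightly more defensive alternative, sketched here as an assumption and not the original author's method, is to read the highest number already present in the file names. The function name next_index is hypothetical.

```python
import os
import re


def next_index(annotations_dir):
    """Return one more than the largest numeric XML stem found in annotations_dir."""
    pattern = re.compile(r"^(\d+)\.xml$")
    highest = 0
    for name in os.listdir(annotations_dir):
        match = pattern.match(name)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest + 1
```

With 000000001.xml and 000000007.xml on disk this returns 8, so new samples can never overwrite an existing file. Files whose names are not purely numeric are ignored.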

Step3 Convert Test

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os, sys
import cv2
import matplotlib.pyplot as plt
import xml.etree.ElementTree as ET
import numpy as np
# %matplotlib inline  # Jupyter magic: uncomment when running inside a notebook

VOC_CLASSES = ["__background__", "person"]
classes = VOC_CLASSES

class_to_ind = dict(zip(VOC_CLASSES, range(len(VOC_CLASSES))))
def AnnotationTransform(xml_path):
    """
    Read a VOC XML file and return an (N, 5) array whose rows are
    [xmin, ymin, xmax, ymax, label_index].
    """
    # ET.parse accepts a path directly, so no file handle is left open
    tree = ET.parse(xml_path)
    root = tree.getroot()
    res = np.empty((0, 5))
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        name = obj.find('name').text.lower().strip()
        if name not in classes or int(difficult) == 1:
            continue
        
        bbox = obj.find('bndbox')
        pts = ['xmin', 'ymin', 'xmax', 'ymax']
        bndbox = []
        for pt in pts:
            # shift VOC's 1-based pixel coordinates to 0-based
            cur_pt = int(bbox.find(pt).text) - 1
            bndbox.append(cur_pt)

        label_idx = class_to_ind[name]
        bndbox.append(label_idx)

        res = np.vstack((res, bndbox))
        
    return res

JPEGImages_dir = 'VOC2012/JPEGImages'
Annotations_dir = 'VOC2012/Annotations'

result = []
JPEGImages = os.listdir(JPEGImages_dir)
for img_file in JPEGImages[:10]:
    imgname = os.path.join(JPEGImages_dir, img_file)
    Annotation = os.path.join(Annotations_dir, img_file.replace('jpg', 'xml'))
    img = cv2.imread(imgname)
    if img is None:
        continue  # skip files OpenCV cannot read
    bboxs = AnnotationTransform(Annotation)
    label = None  # stays None when the image has no usable boxes
    for bbox in bboxs:
        tx, ty, bx, by, label = [int(b) for b in bbox]
        cv2.rectangle(img, (tx, ty), (bx, by), (0, 0, 255), 1)
    result.append([img, label])
    

win_idx = 1
for img, label in result:
    if img is None:
        continue  # nothing to draw
    plt.figure(win_idx)
    plt.title(str(label))
    # OpenCV loads images as BGR; reverse the channel axis to get RGB
    plt.imshow(img[:, :, ::-1])
    win_idx += 1
plt.show()
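
The visual check above needs a display. A complementary, non-visual sanity check is to confirm that every box lies inside the image size recorded in its XML file. The sketch below is one way to do that. The function name find_invalid_boxes and the sample XML are made up for illustration, and it assumes coordinates between 0 and the image width or height are acceptable.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation: the first box is fine, the second overruns the width.
CHECK_XML = """
<annotation>
    <size><depth>3</depth><height>375</height><width>500</width></size>
    <object>
        <name>person</name>
        <bndbox><xmin>1</xmin><ymin>84</ymin><xmax>63</xmax><ymax>375</ymax></bndbox>
    </object>
    <object>
        <name>person</name>
        <bndbox><xmin>480</xmin><ymin>10</ymin><xmax>520</xmax><ymax>100</ymax></bndbox>
    </object>
</annotation>
"""


def find_invalid_boxes(xml_text):
    """Return (name, bbox) for every box that is empty or outside the image."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    problems = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax")
        )
        inside = 0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height
        if not inside:
            problems.append((obj.findtext("name"), (xmin, ymin, xmax, ymax)))
    return problems


print(find_invalid_boxes(CHECK_XML))  # [('person', (480, 10, 520, 100))]
```

Running this over every generated XML file before training would catch mistakes such as corner coordinates being treated as width and height.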

Step4 Generate the txt files in the Main directory

That is, build the test set, the validation set, and so on, and store them as txt files.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os, sys
import random
def make_ImageSets_label(ImageSets_Main_dir, Annotations_dir, label_list):
    """
    Write the name (without extension) of every XML file in Annotations_dir
    to ImageSets_Main_dir/label_list, one name per line.
    """
    # keep only .xml files; sort so the output order is reproducible
    total_xml = sorted(x for x in os.listdir(Annotations_dir) if x.endswith('.xml'))
    print('image nums:', len(total_xml))
    label_list_path = os.path.join(ImageSets_Main_dir, label_list)
    print('label_list_path:', label_list_path)
    with open(label_list_path, 'w') as f:
        for xml_name in total_xml:
            # strip the '.xml' extension to get the image name
            f.write(os.path.splitext(xml_name)[0] + '\n')

    print('******************* make ImageSets Main Done *******************')
    
if __name__ == '__main__':
    ImageSets_Main_dir='VOC2012/ImageSets/Main'
    Annotations_dir='VOC2012/Annotations'
    label_list ='trainval.txt'
    make_ImageSets_label(ImageSets_Main_dir, Annotations_dir, label_list)
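
The script above writes every sample into trainval.txt. Since train and val must be disjoint and randomly drawn, the split itself could be produced along the following lines. This is a sketch under assumptions: the function name split_ids, the 0.9 training fraction, and the fixed seed are illustrative choices, not values prescribed by VOC.

```python
import random


def split_ids(image_ids, train_fraction=0.9, seed=0):
    """Randomly split image IDs into disjoint train and val lists."""
    ids = sorted(image_ids)            # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)   # local RNG, global random state untouched
    n_train = int(len(ids) * train_fraction)
    return {
        "train": sorted(ids[:n_train]),
        "val": sorted(ids[n_train:]),
        "trainval": sorted(ids),
    }


splits = split_ids(["%09d" % k for k in range(1, 11)])
print(len(splits["train"]), len(splits["val"]))  # 9 1
```

Each list can then be written to ImageSets/Main/train.txt, val.txt, and trainval.txt with one ID per line, in the same way make_ImageSets_label writes its output.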