VOC Dataset Explained

血_影

Category: PyTorch Notes

Article link: https://blog.csdn.net/xxboy61/article/details/102707316

Visual Object Classes Challenge 2012 (VOC2012)

Introduction

The main goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). It is fundamentally a supervised learning problem in that a training set of labelled images is provided. The twenty object classes that have been selected are:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

20 classes. The train/val data has 11,530 images containing 27,450 ROI annotated objects and 6,929 segmentations.
There are three main object recognition competitions: classification, detection, and segmentation, a competition on action classification, and a competition on large scale recognition run by ImageNet. In addition there is a “taster” competition on person layout.

Classification/Detection Competitions

Classification: For each of the twenty classes, predicting presence/absence of an example of that class in the test image.
Detection: Predicting the bounding box and label of each object from the twenty target classes in the test image.

Segmentation Competition

Segmentation: Generating pixel-wise segmentations giving the class of the object visible at each pixel, or “background” otherwise.

Action Classification Competition

Action Classification: Predicting the action(s) being performed by a person in a still image.
VOC2012
- Annotations
  - 2008_003420.xml
- ImageSets
  - Action
  - Layout
  - Main
  - Segmentation
- JPEGImages
- SegmentationClass
- SegmentationObject
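Annotations: holds the XML files. Each XML file corresponds to one image and stores the position and class of every labelled object in it; it usually shares its base name with the image.
JPEGImages: holds the original images.
ImageSets:
- Action: predicting the action a person performs in a still image (running, jumping, and so on).
- Layout: person layout. The goal of this task is to predict the bounding box and label of each body part (head, hands, feet).
- Main: holds the object recognition data for the 20 classes, mainly as four files per class: xxx_test.txt, xxx_train.txt, xxx_val.txt and xxx_trainval.txt. The first column is the image name; in the second column, 1 marks a positive sample and -1 a negative sample. For example: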
```
tail -n5 person_train.txt
2011_003253 -1   // 2011_003253.jpg contains no person
2011_003255  1   // 2011_003255.jpg contains a person
2011_003259  1  
2011_003274 -1
2011_003276 -1
```
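- Segmentation: holds the segmentation data. train.txt lists the image IDs of the training set, val.txt lists those of the validation set, and trainval.txt is the union of the two.

VOC2012/ImageSets/Main/train.txt lists the file names of the whole training set. For each name, the image is found under VOC2012/JPEGImages/ and the label file under VOC2012/Annotations/.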
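The lookup described above can be sketched in a few lines. This is a minimal illustration rather than part of the original post: the function name `resolve_voc_paths` is hypothetical, and it assumes every image uses the `.jpg` extension and every annotation the `.xml` extension.

```python
import os


def resolve_voc_paths(voc_root, image_set="train"):
    """Return (image_path, annotation_path) pairs for one ImageSets/Main list."""
    list_path = os.path.join(voc_root, "ImageSets", "Main", image_set + ".txt")
    pairs = []
    with open(list_path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            # Per-class lists carry a second column (1 / -1); only the ID is needed here.
            image_id = fields[0]
            pairs.append((
                os.path.join(voc_root, "JPEGImages", image_id + ".jpg"),
                os.path.join(voc_root, "Annotations", image_id + ".xml"),
            ))
    return pairs
```

Calling `resolve_voc_paths("VOC2012", "train")` would then give one path pair per line of `train.txt`.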

Annotations

The Annotations folder holds the label files in XML format. Each XML file corresponds to one image in the JPEGImages folder.
The XML format looks like this:

<annotation>
    <filename>2012_000056.jpg</filename>      // file name
    <folder>VOC2012</folder>
    <object>                                 // one annotated object
        <name>person</name>                  // object class
        <actions>                            // action being performed
            <jumping>0</jumping>
            <other>0</other>
            <phoning>1</phoning>
            <playinginstrument>0</playinginstrument>
            <reading>0</reading>
            <ridingbike>0</ridingbike>
            <ridinghorse>0</ridinghorse>
            <running>0</running>
            <takingphoto>0</takingphoto>
            <usingcomputer>0</usingcomputer>
            <walking>0</walking>
        </actions>
        <bndbox>                 // bounding box: left = xmin, top = ymin, right = xmax, bottom = ymax
            <xmax>63</xmax>
            <xmin>1</xmin>
            <ymax>375</ymax>
            <ymin>84</ymin>
        </bndbox>
        <difficult>0</difficult>  // whether the object is hard to recognize (0 means easy)
        <pose>Unspecified</pose>  // pose of the object
        <point>                   // if the object has a reference point annotated
            <x>26</x>
            <y>183</y>
        </point>
    </object>
    <segmented>0</segmented>          // whether the image is used for segmentation
    <size>                            // image size (width, height, channels)
        <depth>3</depth>
        <height>375</height>
        <width>500</width>
    </size>
    <source>                         // image source
        <annotation>PASCAL VOC2012</annotation>
        <database>The VOC2012 Database</database>
        <image>flickr</image>
    </source>
</annotation>

2012_00005.xml
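To read the fields shown above programmatically, a sketch along these lines may help. It uses only the standard library; `parse_voc_objects` and `SAMPLE_XML` are illustrative names, and the sample is a cut-down version of the listing with the `//` comments removed, since those are not valid XML.

```python
import xml.etree.ElementTree as ET

# Cut-down annotation for illustration only (assumed, not a real dataset file).
SAMPLE_XML = """
<annotation>
    <filename>2012_000056.jpg</filename>
    <object>
        <name>person</name>
        <bndbox>
            <xmax>63</xmax>
            <xmin>1</xmin>
            <ymax>375</ymax>
            <ymin>84</ymin>
        </bndbox>
        <difficult>0</difficult>
    </object>
    <size>
        <depth>3</depth>
        <height>375</height>
        <width>500</width>
    </size>
</annotation>
"""


def parse_voc_objects(xml_text):
    """Return one dict per <object> with its class, difficult flag and box."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Look tags up by name, because their order inside <bndbox> varies.
        bbox = tuple(int(float(box.findtext(tag)))
                     for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append({
            "name": obj.findtext("name").strip(),
            "difficult": int(obj.findtext("difficult", default="0")),
            "bbox": bbox,
        })
    return objects
```

Looking tags up by name matters here because the sample stores `xmax` before `xmin`.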

ImageSets

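ImageSets holds the image lists for each type of challenge. It contains four folders:

Action holds the person action data (running, jumping, and so on; this is part of the VOC challenge).
Layout holds the data with annotated body parts (head, hands, feet; this is also part of the VOC challenge).
Main holds the object recognition data, 20 classes in total.
Segmentation holds the data that can be used for segmentation.

The Main folder contains ***_train.txt, ***_val.txt and ***_trainval.txt for each of the 20 classes. The files all share one layout: the first column is the image name, and in the second column 1 marks a positive sample and -1 a negative sample. The _train files hold the training data, 5717 entries per class. The _val files hold the validation data, 5823 entries per class. The _trainval files merge the two, 11540 entries per class. train and val must not overlap, i.e. no image may appear in both the training and the validation data, and the training images should be selected at random.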
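As a rough illustration of the file layout and the no-overlap requirement, the sketch below splits the lines of a per-class list into positives and negatives and checks two ID lists for shared entries. The helper names are hypothetical. Lines whose flag is neither 1 nor -1 are simply skipped, which is an assumption of this sketch.

```python
def parse_class_list(lines):
    """Split the lines of a ***_train.txt style file into positive and negative IDs."""
    positives, negatives = [], []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        image_id, flag = fields[0], int(fields[1])
        if flag == 1:
            positives.append(image_id)
        elif flag == -1:
            negatives.append(image_id)
    return positives, negatives


def overlap(train_ids, val_ids):
    """Return the IDs present in both lists; an empty set means the split is clean."""
    return set(train_ids) & set(val_ids)


sample = [
    "2011_003253 -1",
    "2011_003255  1",
    "2011_003259  1",
    "2011_003274 -1",
    "2011_003276 -1",
]
pos, neg = parse_class_list(sample)
print(len(pos), "positive,", len(neg), "negative")
```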

JPEGImages

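The JPEGImages folder contains all the images provided by PASCAL VOC, both training and test images, 17125 in total. The files are named in the form "year_number.jpg". Pixel dimensions vary: landscape images are around 500×375 and portrait images around 375×500, and they rarely deviate by more than 100 pixels. In later training, the first step is to resize every image to 300×300 or 500×500, so the original images should not be too far from that size. These are the images used for training, testing and validation.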
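When an image is resized to a fixed size, its bounding boxes have to be scaled by the same factors. The sketch below shows only that arithmetic; `resize_bbox` is a hypothetical helper, and it assumes a plain stretch to the target size with no padding or cropping.

```python
def resize_bbox(bbox, orig_size, new_size):
    """Scale (xmin, ymin, xmax, ymax) from orig_size to new_size, both given as (width, height)."""
    xmin, ymin, xmax, ymax = bbox
    orig_w, orig_h = orig_size
    new_w, new_h = new_size
    sx = new_w / float(orig_w)  # horizontal scale factor
    sy = new_h / float(orig_h)  # vertical scale factor
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)


# A box from a 500x375 image, mapped to a 300x300 network input.
print(resize_bbox((1, 84, 63, 375), (500, 375), (300, 300)))
```

Because the two scale factors differ for non-square images, the aspect ratio of each box changes along with the image.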

SegmentationClass

Contains 2913 images, each corresponding to the image with the same ID in JPEGImages. The pixels take 20 colors, one for each of the 20 object classes.

SegmentationObject

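Contains 2913 images, with the same IDs as the images in SegmentationClass. The difference is that these masks are per object. In SegmentationClass, if an image contains several aeroplanes they are all marked in red, whereas in SegmentationObject each aeroplane in the same image is marked in a different color.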
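The mask colors can be reproduced with a short bit-shuffling routine. The sketch below follows the colormap convention commonly attributed to the VOC devkit; the function name and the assumption that index 0 is background and index 1 is aeroplane (alphabetical class order) are not stated in the text above.

```python
def voc_colormap(n=256):
    """Build a list of (r, g, b) tuples, one per label index."""
    cmap = []
    for i in range(n):
        r = g = b = 0
        c = i
        # Spread the bits of the index over the three channels, high bit first.
        for j in range(8):
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap.append((r, g, b))
    return cmap


cmap = voc_colormap()
print(cmap[1])  # label index 1, assumed to be aeroplane
```

Under these assumptions index 1 maps to a dark red, which agrees with the aeroplane example above.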

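Building a VOC2012-style dataset

Building a VOC dataset mainly involves the following steps:

Prepare the data
Annotate the images: generate label files containing the class and bounding box information
Generate the files the VOC format requires, mainly Annotations/*.xml and ImageSets/Main/*.txt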

Step1 make voc2012 directory

-- VOC2012    
    |-- Annotations   
    |-- ImageSets   
    |   |-- Action   
    |   |-- Layout   
    |   |-- Main   
    |   `-- Segmentation  
    |-- JPEGImages
    |-- SegmentationClass
    `-- SegmentationObject
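The directory tree above could be created with a few lines of Python. This is a small sketch rather than the author's script; `make_voc_dirs` and `VOC_SUBDIRS` are illustrative names.

```python
import os

# Sub-directories of the VOC2012 layout, relative to the dataset root.
VOC_SUBDIRS = [
    "Annotations",
    "ImageSets/Action",
    "ImageSets/Layout",
    "ImageSets/Main",
    "ImageSets/Segmentation",
    "JPEGImages",
    "SegmentationClass",
    "SegmentationObject",
]


def make_voc_dirs(root="VOC2012"):
    """Create the VOC directory skeleton under root and return the created paths."""
    created = []
    for sub in VOC_SUBDIRS:
        path = os.path.join(root, *sub.split("/"))
        # exist_ok avoids an error when the directory is already there (Python 3.2+).
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

Running `make_voc_dirs("VOC2012")` twice is harmless, since existing directories are left as they are.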

Step2 Generate the XML files under Annotations

choose Step2.1 or Step2.2 or both to run

Step2.1 Generate the XML files under Annotations

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os, sys
import cv2
import numpy
from lxml.etree import Element, SubElement, tostring
from xml.dom.minidom import parseString

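def save_Annotations_xml(root_dir, label_list, Annotations_dir, JPEGImages_dir):
    """
    Convert a label list into VOC-style XML annotations.

    Each line of the label list is read as:
    filename x0 y0 w0 h0 x1 y1 w1 h1 ... class
    Images that keep at least one box are written to JPEGImages_dir under a
    zero-padded sequential name, with a matching XML file in Annotations_dir.
    """
    filename = os.path.join(root_dir, label_list)
    print('-------------label list filename:', filename)
    # Note: i is the start index. If JPEGImages already holds
    # train, val or test images, change i accordingly.
    i = 1
    with open(filename, 'r') as f:
        lines = f.readlines()
        print('save_Annotations_xml read lines:', len(lines))
        for line in lines:
            line_info = line.rstrip('\n').split(' ')
            imgname = os.path.join(root_dir, line_info[0])
            img = cv2.imread(imgname)
            # cv2.imread returns None when the file is missing or unreadable
            if img is None:
                print('skip unreadable image:', imgname)
                continue
            height, width, channel = img.shape
            image_name = '%09d' % i + '.jpg'
            # save JPEGImages
            new_image_name = os.path.join(JPEGImages_dir, image_name)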
            # construct annotation
            node_root = Element('annotation')
            node_folder = SubElement(node_root, 'folder')
            node_folder.text = 'JPEGImages'
            # image name
            node_filename = SubElement(node_root, 'filename')
            node_filename.text = image_name
            # image width height channel
            node_size = SubElement(node_root, 'size')
            node_depth = SubElement(node_size, 'depth')
            node_depth.text = '%s' % channel
            node_height = SubElement(node_size, 'height')
            node_height.text = '%s' % height
            node_width = SubElement(node_size, 'width')
            node_width.text = '%s' % width
            
            write_infile = False
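            # bounding box info: drop the trailing class field, then
            # group the remaining integers as (x, y, w, h)
            line_info = [int(b) for b in line_info[1:]]
            array=numpy.array(line_info[:-1])
            bboxs = array.reshape(-1, 4)
            for bbox in bboxs:
                x, y, w, h = [int(b) for b in bbox]
                # data filter: skip boxes narrower than 12 px or shorter than 32 px
                if w < 12 or h < 32:
                    continue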
                    
                write_infile=True
                left, top, right, bottom = x, y, x + w, y + h
                node_object = SubElement(node_root, 'object')
                node_name = SubElement(node_object, 'name')
                node_name.text = 'person'
                node_difficult = SubElement(node_object, 'difficult')
                node_difficult.text = '0'
                node_bndbox = SubElement(node_object, 'bndbox')
                node_xmin = SubElement(node_bndbox, 'xmin')
                node_xmin.text = '%s' % left
                node_ymin = SubElement(node_bndbox, 'ymin')
                node_ymin.text = '%s' % top
                node_xmax = SubElement(node_bndbox, 'xmax')
                node_xmax.text = '%s' % right
                node_ymax = SubElement(node_bndbox, 'ymax')
                node_ymax.text = '%s' % bottom
            
            if write_infile:
                # serialize the annotation tree to pretty-printed bytes
                xml = tostring(node_root, pretty_print=True)
                # save_xml
                save_xml = os.path.join(Annotations_dir, image_name.replace('jpg', 'xml'))
                # use a distinct name so the label-list handle f is not shadowed
                with open(save_xml, 'wb') as xml_file:
                    xml_file.write(xml)

                cv2.imwrite(new_image_name, img)
                i = i + 1
                
    print('******************* make Annotations xml Done *******************', i)

if __name__ == '__main__':
    # dataset to convert
    # test_label lines are read as: filename x0 y0 w0 h0 x1 y1 w1 h1 ... class
    root_dir = '/opt/notebook_files/datasets/PedestrianDataset/Caltech'
    test_label='caltech_train_label.txt'
    # Voc dataset
    Annotations_dir='VOC2012/Annotations'
    JPEGImages_dir='VOC2012/JPEGImages'
    save_Annotations_xml(root_dir, test_label, Annotations_dir, JPEGImages_dir)
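For readers without lxml or OpenCV installed, the XML-building part of the script can be tried on its own with the standard library. The sketch below is a simplified stand-in, not the author's code: `build_annotation` is a hypothetical helper, boxes are assumed to be `(x, y, w, h)` as in the script, and the output is not pretty-printed.

```python
import xml.etree.ElementTree as ET


def build_annotation(image_name, width, height, depth, boxes, label="person"):
    """Return a VOC-style annotation as an XML string; boxes are (x, y, w, h)."""
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = "JPEGImages"
    ET.SubElement(root, "filename").text = image_name
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "depth").text = str(depth)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "width").text = str(width)
    for x, y, w, h in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        bnd = ET.SubElement(obj, "bndbox")
        # Convert (x, y, w, h) to the corner form VOC expects.
        corners = (("xmin", x), ("ymin", y), ("xmax", x + w), ("ymax", y + h))
        for tag, value in corners:
            ET.SubElement(bnd, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


print(build_annotation("000000001.jpg", 640, 480, 3, [(10, 20, 30, 40)]))
```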

Step2.2 Incrementally add XML files to Annotations

This function adds a new dataset to an existing VOC dataset directory.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os, sys
import cv2
import numpy
from lxml.etree import Element, SubElement, tostring
from xml.dom.minidom import parseString

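def add_Annotations_xml(root_dir, label_list, Annotations_dir, JPEGImages_dir):
    """
    Append VOC-style XML annotations for a new label list to an
    existing VOC directory, continuing the sequential numbering.
    """
    filename = os.path.join(root_dir, label_list)
    print('-------------label list filename:', filename)
    Annotations_xmls = os.listdir(Annotations_dir)
    print('Annotations_xmls:',len(Annotations_xmls))

    # start_idx counts the XML files already present
    # (assumed to be numbered 1..N, as written by Step2.1)
    start_idx = len(Annotations_xmls)
    with open(filename, 'r') as f:
        lines = f.readlines()
        print('save_Annotations_xml read lines:', len(lines))
        for line in lines:
            line_info = line.rstrip('\n').split(' ')
            imgname = os.path.join(root_dir, line_info[0])
            img = cv2.imread(imgname)
            # cv2.imread returns None when the file is missing or unreadable
            if img is None:
                print('skip unreadable image:', imgname)
                continue
            height, width, channel = img.shape
            # the next free index is count + 1; using the count itself
            # would overwrite the last existing file
            image_name = '%09d' % (start_idx + 1) + '.jpg'
            # save JPEGImages
            new_image_name = os.path.join(JPEGImages_dir, image_name)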
            
            # construct annotation
            node_root = Element('annotation')
            node_folder = SubElement(node_root, 'folder')
            node_folder.text = 'JPEGImages'
            # image name
            node_filename = SubElement(node_root, 'filename')
            node_filename.text = image_name
            # image width height channel
            node_size = SubElement(node_root, 'size')
            node_depth = SubElement(node_size, 'depth')
            node_depth.text = '%s' % channel
            node_height = SubElement(node_size, 'height')
            node_height.text = '%s' % height
            node_width = SubElement(node_size, 'width')
            node_width.text = '%s' % width
            
            write_infile = False
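            # bounding box info: drop the trailing class field, then
            # group the remaining integers as (x, y, w, h)
            line_info = [int(b) for b in line_info[1:]]
            array=numpy.array(line_info[:-1])
            bboxs = array.reshape(-1, 4)
            for bbox in bboxs:
                x, y, w, h = [int(b) for b in bbox]
                # data filter: skip boxes narrower than 12 px or shorter than 32 px
                if w < 12 or h < 32:
                    continue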
                    
                write_infile=True
                left, top, right, bottom = x, y, x + w, y + h
                node_object = SubElement(node_root, 'object')
                node_name = SubElement(node_object, 'name')
                node_name.text = 'person'
                node_difficult = SubElement(node_object, 'difficult')
                node_difficult.text = '0'
                node_bndbox = SubElement(node_object, 'bndbox')
                node_xmin = SubElement(node_bndbox, 'xmin')
                node_xmin.text = '%s' % left
                node_ymin = SubElement(node_bndbox, 'ymin')
                node_ymin.text = '%s' % top
                node_xmax = SubElement(node_bndbox, 'xmax')
                node_xmax.text = '%s' % right
                node_ymax = SubElement(node_bndbox, 'ymax')
                node_ymax.text = '%s' % bottom
            
            if write_infile:
                # serialize the annotation tree to pretty-printed bytes
                xml = tostring(node_root, pretty_print=True)
                # save_xml
                save_xml = os.path.join(Annotations_dir, image_name.replace('jpg', 'xml'))
                # use a distinct name so the label-list handle f is not shadowed
                with open(save_xml, 'wb') as xml_file:
                    xml_file.write(xml)

                cv2.imwrite(new_image_name, img)
                start_idx += 1
                
    print('*******************Have Annotations xmls:{} *******************'.format(start_idx))

if __name__ == '__main__':
    # dataset to convert
    # test_label lines are read as: filename x0 y0 w0 h0 x1 y1 w1 h1 ... class
    root_dir = '/opt/notebook_files/datasets/PedestrianDataset/Caltech'
    test_label='caltech_train_label.txt'
    # Voc dataset
    Annotations_dir='VOC2012/Annotations'
    JPEGImages_dir='VOC2012/JPEGImages'
    add_Annotations_xml(root_dir, test_label, Annotations_dir, JPEGImages_dir)
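Counting files only gives a safe starting index when the existing names are gapless. A more defensive variant, sketched below under the assumption that annotation files are named with digits only (for example `000000012.xml`), takes the largest existing index instead. `next_annotation_index` is a hypothetical helper, not part of the original script.

```python
import os
import re


def next_annotation_index(annotations_dir):
    """Return one more than the largest numeric XML file name, or 1 if there is none."""
    pattern = re.compile(r"^(\d+)\.xml$")
    indices = []
    for fname in os.listdir(annotations_dir):
        match = pattern.match(fname)
        if match:
            indices.append(int(match.group(1)))
    return max(indices) + 1 if indices else 1
```

Files that do not match the numeric pattern are ignored, so stray files in the directory do not shift the numbering.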

Step3 Convert Test

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os, sys
import cv2
import matplotlib.pyplot as plt
import xml.etree.ElementTree as ET
import numpy as np
# Jupyter notebook magic; remove this line when running as a plain script
%matplotlib inline

VOC_CLASSES = ["__background__", "person"]
classes = VOC_CLASSES

class_to_ind = dict(zip(VOC_CLASSES, range(len(VOC_CLASSES))))
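def AnnotationTransform(xml_path):
    """
    Parse a VOC XML file and return an (N, 5) array of
    [xmin, ymin, xmax, ymax, label_idx] rows.
    """
    # ET.parse accepts a path directly, so no file handle is left open
    tree = ET.parse(xml_path)
    # targets
    root = tree.getroot()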
    # Transforms a VOC annotation into a Tensor of bbox coords and label index
    res = np.empty((0, 5))
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        name = obj.find('name').text.lower().strip()
        if name not in classes or int(difficult) == 1:
            continue
        
        bbox = obj.find('bndbox')
        pts = ['xmin', 'ymin', 'xmax', 'ymax']
        bndbox = []
        for i, pt in enumerate(pts):
#             print('bbox.find(pt).text:', bbox.find(pt).text)
            cur_pt = int(bbox.find(pt).text) - 1
            bndbox.append(cur_pt)

        label_idx = class_to_ind[name]
        bndbox.append(label_idx)

        res = np.vstack((res, bndbox))
        
    return res

JPEGImages_dir = 'VOC2012/JPEGImages'
Annotations_dir = 'VOC2012/Annotations'

result = []
JPEGImages = os.listdir(JPEGImages_dir)
for img_file in JPEGImages[:10]:
    imgname = os.path.join(JPEGImages_dir, img_file)
    Annotation = os.path.join(Annotations_dir, img_file.replace('jpg', 'xml'))
    img = cv2.imread(imgname)
    bboxs = AnnotationTransform(Annotation)
    # default when an image has no usable boxes, so label is always defined
    label = None
    for bbox in bboxs:
        tx, ty, bx, by, label = [int(b) for b in bbox]
        cv2.rectangle(img, (tx, ty), (bx, by), (0, 0, 255), 1)
    result.append([img, label])
    

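win_idx = 1
for k in range(len(result)):
    try:
        plt.ion()
        plt.figure(win_idx)
        plt.title(str(result[k][1]))
        # OpenCV loads images as BGR; reverse the channel axis to get RGB
        plt.imshow(result[k][0][:,:,::-1])
    except Exception as e:
        # report the failure instead of silently swallowing it
        print('failed to show image', k, e)
    finally:
        win_idx+=1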
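The visual check above can be complemented by a simple programmatic one. The sketch below flags boxes that are empty or extend past the image; `find_bad_boxes` is a hypothetical helper, and whether the source labels count pixels from 0 or from 1 is an assumption that should be checked against the data.

```python
def find_bad_boxes(boxes, width, height):
    """Return the (xmin, ymin, xmax, ymax) boxes that are empty or leave the image."""
    bad = []
    for xmin, ymin, xmax, ymax in boxes:
        inside = xmin >= 0 and ymin >= 0 and xmax <= width and ymax <= height
        non_empty = xmin < xmax and ymin < ymax
        if not (inside and non_empty):
            bad.append((xmin, ymin, xmax, ymax))
    return bad


# One valid box, one with xmax < xmin, one wider than the image.
boxes = [(1, 84, 63, 375), (10, 10, 5, 20), (0, 0, 501, 10)]
print(find_bad_boxes(boxes, 500, 375))
```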

Step4 Generate the txt files under Main

That is, generate the test, validation and other image-set lists and store them as txt files.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import os, sys
import random
def make_ImageSets_label(ImageSets_Main_dir, Annotations_dir, label_list):
    """
    Write the base name of every XML file in Annotations_dir to
    ImageSets_Main_dir/label_list, one name per line.
    """
    # Note: these two ratios are defined but not used below;
    # every annotation goes into the single list file.
    trainval_percent = 0.1
    train_percent = 0.9
    total_xml = os.listdir(Annotations_dir)
    num = len(total_xml)
    print('image nums:', num)
    label_list_path = os.path.join(ImageSets_Main_dir, label_list)
    print('label_list_path:', label_list_path)
    with open(label_list_path, 'w') as f:
        for xml_name in total_xml:
            # strip the '.xml' extension
            name = xml_name[:-4] + '\n'
            f.write(name)
    
    print('******************* make ImageSets Main Done *******************')
    
if __name__ == '__main__':
    ImageSets_Main_dir='VOC2012/ImageSets/Main'
    Annotations_dir='VOC2012/Annotations'
    label_list ='trainval.txt'
    make_ImageSets_label(ImageSets_Main_dir, Annotations_dir, label_list)
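The script above writes every name into a single list. If separate train.txt and val.txt files are wanted, a random split along the following lines is one option. It is a sketch under assumptions: `split_ids` is a hypothetical helper, the 0.9 ratio mirrors the unused `train_percent` value, and the fixed seed is there only to make the split repeatable.

```python
import random


def split_ids(image_ids, train_percent=0.9, seed=0):
    """Shuffle the IDs reproducibly and split them into disjoint (train, val) lists."""
    ids = sorted(image_ids)            # fixed starting order, independent of os.listdir
    random.Random(seed).shuffle(ids)   # seeded shuffle so the split can be repeated
    n_train = int(len(ids) * train_percent)
    return ids[:n_train], ids[n_train:]


ids = ['%09d' % k for k in range(1, 11)]
train_ids, val_ids = split_ids(ids)
print(len(train_ids), len(val_ids))
```

Each returned list can then be written to ImageSets/Main one ID per line, in the same way the script above writes trainval.txt.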
