WiderPerson Pedestrian Detection Dataset

松菇

Category: Object Detection Dataset Conversion. Tags: object detection, pedestrian detection

Article link: https://blog.csdn.net/songwsx/article/details/102757137

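1. Introduction

WiderPerson is a benchmark dataset for pedestrian detection in crowded scenes. Its images are selected from a wide range of scenarios and are no longer limited to traffic scenes. It contains 13,382 images with about 400K annotations covering various kinds of occlusion. The images are randomly split into 8000/1000/4382 for the training, validation, and test sets. As with the CityPersons and WIDER FACE datasets, the annotation files for the test images are not released.

It can be downloaded from the official site: http://www.cbsr.ia.ac.cn/users/sfzhang/WiderPerson/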

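2. Annotation format

Opening any annotation file shows the following:

1. The first line gives the number of bounding boxes; the actual boxes start on the second line.

2. From the second line on, the first column is the class label of the box, followed by xmin ymin xmax ymax, i.e. the top-left and bottom-right corner coordinates.

[class_label, xmin, ymin, xmax, ymax]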
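The format described above can be parsed with a few lines of plain Python. This is only a minimal sketch: the function name `parse_annotation` and the sample text are made up for illustration, and it assumes every box line has exactly the five whitespace-separated fields listed above.

```python
from typing import List, Tuple

# (class_label, xmin, ymin, xmax, ymax)
Box = Tuple[int, int, int, int, int]


def parse_annotation(text: str) -> List[Box]:
    """Parse the contents of one annotation file (hypothetical helper)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    declared = int(lines[0])  # first line: number of boxes
    boxes = []
    for ln in lines[1:]:
        fields = ln.split()
        cls = int(fields[0])
        # int(float(...)) also tolerates values written as '10.0'
        xmin, ymin, xmax, ymax = (int(float(v)) for v in fields[1:5])
        boxes.append((cls, xmin, ymin, xmax, ymax))
    if len(boxes) != declared:
        raise ValueError('declared %d boxes, found %d' % (declared, len(boxes)))
    return boxes


# Made-up sample, not taken from the dataset
sample = "2\n1 10 20 50 120\n3 60 25 90 110\n"
print(parse_annotation(sample))
# [(1, 10, 20, 50, 120), (3, 60, 25, 90, 110)]
```

Raising on a count mismatch is a design choice here; a more forgiving reader could just log the mismatch and continue.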

I wrote a small visualization script to take a look:

import cv2


if __name__ == '__main__':
    path = '../WiderPerson/trainval.txt'
    with open(path, 'r') as f:
        img_ids = f.read().splitlines()

    for img_id in img_ids:  # e.g. '000040'
        img_path = '../WiderPerson/images/' + img_id + '.jpg'
        img = cv2.imread(img_path)
        if img is None:  # skip images that are missing or unreadable
            continue

        # Annotation files are named like '000040.jpg.txt'
        label_path = img_path.replace('images', 'Annotations') + '.txt'

        with open(label_path) as file:
            count = int(file.readline())  # number of boxes declared on the first line
            for line in file:
                fields = line.split()
                if len(fields) < 5:  # skip blank or malformed lines
                    continue
                cls = int(fields[0])
                # < class_label =1: pedestrians > pedestrians
                # < class_label =2: riders >      people riding bikes etc.
                # < class_label =3: partially-visible persons > partly occluded people
                # < class_label =4: ignore regions > "fake" people, e.g. people in pictures
                # < class_label =5: crowd > crowds covered by one large box
                if cls in (1, 2, 3):
                    xmin, ymin, xmax, ymax = (int(float(v)) for v in fields[1:5])
                    cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
        cv2.imshow('result', img)
        cv2.waitKey(0)  # press any key to show the next image

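As you can see, the images are mostly fairly crowded pedestrian scenes, which is closer to real-world conditions. Unlike some other datasets, which are all driving scenes shot abroad, this one feels like it should help with my own use case:

Note that 000040.jpg.txt seems to be broken; just delete it yourself.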
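The post does not say what exactly is wrong with that file, so rather than guessing, one option is to scan the annotation files for common problems before using them. The sketch below is an assumption about what "broken" might mean (wrong box count, wrong number of fields, non-numeric values, inverted or out-of-image boxes); the function name `find_annotation_problems` is hypothetical.

```python
from typing import List


def find_annotation_problems(text: str, width: int, height: int) -> List[str]:
    """Return human-readable problems found in one annotation file's text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ['empty file']
    try:
        declared = int(lines[0])
    except ValueError:
        return ['first line is not a box count: %r' % lines[0]]

    problems = []
    if declared != len(lines) - 1:
        problems.append('declared %d boxes, found %d' % (declared, len(lines) - 1))

    # Box lines start at line 2 of the file
    for i, ln in enumerate(lines[1:], start=2):
        fields = ln.split()
        if len(fields) != 5:
            problems.append('line %d: expected 5 fields, got %d' % (i, len(fields)))
            continue
        try:
            cls, xmin, ymin, xmax, ymax = (float(v) for v in fields)
        except ValueError:
            problems.append('line %d: non-numeric field' % i)
            continue
        if cls not in (1, 2, 3, 4, 5):
            problems.append('line %d: unknown class label' % i)
        if xmin >= xmax or ymin >= ymax:
            problems.append('line %d: empty or inverted box' % i)
        if xmin < 0 or ymin < 0 or xmax > width or ymax > height:
            problems.append('line %d: box extends outside the image' % i)
    return problems


# Made-up example: the count says 2 but only one (inverted) box follows
print(find_annotation_problems("2\n1 10 20 5 120\n", 200, 200))
```

Files that return a non-empty list can then be inspected by hand or left out of the image list.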

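3. Class breakdown

This is what the official site says, but I still did not fully understand it, so I plotted each class to see for myself.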

...
< class_label =1: pedestrians >
< class_label =2: riders >
< class_label =3: partially-visible persons >
< class_label =4: ignore regions >
< class_label =5: crowd >
...

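Class 1: pedestrians

Using my visualization code and drawing only the boxes with cls == 1, you can see these are all fairly complete pedestrians. This one is easy to understand.

Class 2: riders (people riding bikes and the like)

This one is easy to understand.

Class 3: partially-visible persons (people who are partly occluded)

This one is also fairly easy to understand: it is just occlusion, although sometimes the occlusion is so heavy that you can barely tell it is a person.

Class 4: ignore regions

This is the important one. At first I was confused about what it meant. Plotting it gives the following (these are "fake people"; my own application does not need them, although the VOC and COCO datasets label such cases as person):

Class 5: crowd

A crowd is simply covered by one large box. COCO does this in many places too, and still labels the box as person.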
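The visualization script earlier draws only classes 1 to 3 and skips classes 4 and 5. That grouping can be made explicit with a small helper. This is only a sketch of one possible policy, not something the dataset prescribes; the names `KEEP_CLASSES`, `IGNORE_CLASSES`, and `split_boxes` are made up for illustration.

```python
from typing import List, Sequence, Tuple

Coords = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax)

# Assumed policy: treat classes 1-3 as real people, 4-5 as regions to ignore
KEEP_CLASSES = {1, 2, 3}
IGNORE_CLASSES = {4, 5}


def split_boxes(boxes: Sequence[Tuple[int, int, int, int, int]]
                ) -> Tuple[List[Coords], List[Coords]]:
    """Split (class_label, xmin, ymin, xmax, ymax) boxes into two groups."""
    persons, ignored = [], []
    for box in boxes:
        cls, coords = box[0], tuple(box[1:5])
        if cls in KEEP_CLASSES:
            persons.append(coords)
        elif cls in IGNORE_CLASSES:
            ignored.append(coords)
        else:
            raise ValueError('unknown class label: %r' % cls)
    return persons, ignored


# Made-up boxes: one pedestrian, one ignore region, one crowd
persons, ignored = split_boxes([(1, 10, 20, 50, 120),
                                (4, 0, 0, 30, 30),
                                (5, 100, 40, 300, 200)])
print(persons)  # [(10, 20, 50, 120)]
print(ignored)  # [(0, 0, 30, 30), (100, 40, 300, 200)]
```

Keeping the ignored boxes separate, instead of dropping them, leaves room to mask those regions during training or evaluation if that turns out to be useful.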

4. Dataset conversion

Convert to VOC format

import os
import shutil
from lxml.etree import Element, SubElement, tostring
from xml.dom.minidom import parseString
import cv2

def make_voc_dir():
    # Create the VOC2007 directory layout if it does not exist yet
    # (existing directories are left as they are, not cleared)
    os.makedirs('../VOC2007/Annotations', exist_ok=True)
    os.makedirs('../VOC2007/ImageSets/Main', exist_ok=True)
    os.makedirs('../VOC2007/JPEGImages', exist_ok=True)

if __name__ == '__main__':
    classes = {'1': 'pedestrians',
               '2': 'riders',
               '3': 'partially',
               '4': 'ignore',
               '5': 'crowd'}
    VOCRoot = '../VOC2007'
    widerDir = '../WiderPerson'  # path to the dataset
    wider_path = '../WiderPerson/trainval.txt'
    make_voc_dir()
    with open(wider_path, 'r') as f:
        imgIds = [x for x in f.read().splitlines()]

    for imgId in imgIds:
        filename = imgId + '.jpg'
        img_path = '../WiderPerson/images/' + filename
        print('Img :%s' % img_path)
        img = cv2.imread(img_path)
        if img is None:  # skip images that are missing or unreadable
            continue
        width = img.shape[1]  # image width in pixels
        height = img.shape[0]  # image height in pixels

        node_root = Element('annotation')
        node_folder = SubElement(node_root, 'folder')
        node_folder.text = 'JPEGImages'
        node_filename = SubElement(node_root, 'filename')
        node_filename.text = 'VOC2007/JPEGImages/%s' % filename
        node_size = SubElement(node_root, 'size')
        node_width = SubElement(node_size, 'width')
        node_width.text = '%s' % width
        node_height = SubElement(node_size, 'height')
        node_height.text = '%s' % height
        node_depth = SubElement(node_size, 'depth')
        node_depth.text = '3'

        # Annotation files are named like '000040.jpg.txt'
        label_path = img_path.replace('images', 'Annotations') + '.txt'
        with open(label_path) as file:
            count = int(file.readline())  # number of boxes declared on the first line
            for line in file:
                fields = line.split()
                if len(fields) < 5:  # skip blank or malformed lines
                    continue
                cls_id = fields[0]
                # Shift by 1: VOC pixel indices are 1-based
                # (this assumes the WiderPerson boxes are 0-based)
                xmin = int(fields[1]) + 1
                ymin = int(fields[2]) + 1
                xmax = int(fields[3]) + 1
                ymax = int(fields[4]) + 1

                cls_name = classes[cls_id]

                obj_width = xmax - xmin
                obj_height = ymax - ymin

                difficult = 0
                if obj_height <= 6 or obj_width <= 6:
                    difficult = 1

                node_object = SubElement(node_root, 'object')
                node_name = SubElement(node_object, 'name')
                node_name.text = cls_name
                node_difficult = SubElement(node_object, 'difficult')
                node_difficult.text = '%s' % difficult
                node_bndbox = SubElement(node_object, 'bndbox')
                node_xmin = SubElement(node_bndbox, 'xmin')
                node_xmin.text = '%s' % xmin
                node_ymin = SubElement(node_bndbox, 'ymin')
                node_ymin.text = '%s' % ymin
                node_xmax = SubElement(node_bndbox, 'xmax')
                node_xmax.text = '%s' % xmax
                node_ymax = SubElement(node_bndbox, 'ymax')
                node_ymax.text = '%s' % ymax
                node_name = SubElement(node_object, 'pose')
                node_name.text = 'Unspecified'
                node_name = SubElement(node_object, 'truncated')
                node_name.text = '0'

        xml = tostring(node_root, pretty_print=True)  # serialize the 'annotation' tree to bytes
        parseString(xml)  # sanity check: raises if the XML is not well-formed
        xml_name = filename.replace('.jpg', '.xml')
        xml_path = VOCRoot + '/Annotations/' + xml_name
        with open(xml_path, 'wb') as f:
            f.write(xml)
        # Copy the image into the VOC JPEGImages directory
        shutil.copy(img_path, VOCRoot + '/JPEGImages/' + filename)
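The script above creates ImageSets/Main but never writes anything into it. VOC-style loaders usually expect a list of image ids there, one per line, so a possible follow-up step is sketched below. The helper name `write_image_set` is hypothetical, and the demo writes to a temporary directory rather than to ../VOC2007.

```python
import os
import tempfile
from typing import Iterable


def write_image_set(voc_root: str, split: str, img_ids: Iterable[str]) -> str:
    """Write <voc_root>/ImageSets/Main/<split>.txt with one image id per line."""
    main_dir = os.path.join(voc_root, 'ImageSets', 'Main')
    os.makedirs(main_dir, exist_ok=True)
    out_path = os.path.join(main_dir, split + '.txt')
    with open(out_path, 'w') as f:
        for img_id in img_ids:
            f.write(img_id + '\n')
    return out_path


# Demo with made-up ids in a throwaway directory
demo_root = tempfile.mkdtemp()
demo_path = write_image_set(demo_root, 'trainval', ['000040', '000041'])
with open(demo_path) as f:
    demo_ids = f.read().splitlines()
print(demo_ids)  # ['000040', '000041']
```

In the conversion script, the ids passed in would be the ones for which an XML file was actually written, so that skipped or deleted images do not end up in the list.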