VisDrone2019 Object Detection Data for Deep Learning (to YOLO / VOC / COCO) --- MMDetection Data Series

qq_41627642

Last modified: 2024-07-04 09:22:11


First published: 2022-05-11 18:36:02

Article link: https://blog.csdn.net/qq_41627642/article/details/124662888


1. Introduction to the VisDrone2019 Dataset

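Drones (unmanned aerial vehicles, UAVs) equipped with cameras have been rapidly deployed across a wide range of applications, including agriculture, aerial photography, fast delivery, and surveillance. As a result, demand for automatic understanding of the visual data collected from these platforms keeps growing, which ties computer vision ever more closely to drones. VisDrone is a large-scale benchmark with carefully annotated ground truth for several important computer vision tasks; the name reflects the goal of making vision meet drones. The VisDrone2019 dataset was collected by the AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University. The benchmark consists of 288 video clips containing 261,908 frames, plus 10,209 static images, captured by various drone-mounted cameras. It covers a wide range of aspects, including location (14 different cities in China, thousands of kilometers apart), environment (urban and rural), objects (pedestrians, vehicles, bicycles, etc.), and density (sparse and crowded scenes). Note that the dataset was collected with different drone platforms (i.e., different drone models), in different scenarios, and under different weather and lighting conditions. The frames are manually annotated with more than 2.6 million bounding boxes of frequently targeted objects such as pedestrians, cars, bicycles, and tricycles. Several important attributes, including scene visibility, object class, and occlusion, are also provided for better data utilization.
The challenge centers on the following tasks:
(1) Task 1: object detection in images. The task aims to detect objects of predefined categories (e.g., cars and pedestrians) in individual images taken by drones.
(2) Task 2: object detection in videos. The task is similar to Task 1, except that objects must be detected in videos.
(3) Task 3: single-object tracking.
(4) Task 4: multi-object tracking.
(5) Task 5: crowd counting. The task aims to count the number of people in each video frame.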

1. Introduction to the Object Detection Data

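The organizers announce the VisDrone2021 Object Detection in Images Challenge (Task 1). The competition is designed to push the state of the art in object detection on drone platforms. Teams are required to predict the bounding boxes of objects from ten predefined categories (pedestrian, person, car, van, bus, truck, motor, bicycle, awning-tricycle, and tricycle), each with a real-valued confidence score. Some rarely occurring special vehicles (e.g., machine-shop trucks, forklift trucks, and tankers) are ignored in evaluation.
According to the DeepBlueAI team, although the competition has been held several times, the following difficulties remain:

A large number of objects to detect
Some objects are very small
Differing data distributions
Severe occlusion of objects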

2. Data Download

3. Task 1: Object Detection Dataset

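As stated in the challenge announcement, Task 1 of VisDrone2021 (object detection in images) asks teams to predict bounding boxes, each with a real-valued confidence score, for objects of ten predefined categories (pedestrian, person, car, van, bus, truck, motor, bicycle, awning-tricycle, and tricycle). Some rarely occurring special vehicles (e.g., machine-shop truck, forklift truck, and tanker) are ignored in evaluation.
The challenge contains 10,209 static images (6,471 for training, 548 for validation, and 3,190 for testing), captured by drone platforms at different locations and altitudes and available from the download page. The bounding boxes of objects of different categories are manually annotated in each image. Two further useful annotations are provided: the occlusion ratio and the truncation ratio. Specifically, the occlusion ratio is defined as the fraction of the object that is occluded. The truncation ratio indicates the degree to which parts of an object lie outside the frame. If an object is not fully captured within a frame, its bounding box is annotated up to the frame boundary and the truncation ratio is estimated from the region outside the image. Notably, a target whose truncation ratio exceeds 50% is skipped during evaluation. The annotations for the training and validation sets are publicly available.
For the DET competition there are three sets of data and labels: training data, validation data, and test-challenge data. The three sets do not overlap.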
Number of images

Dataset                      Training       Validation    Test-Challenge
Object detection in images   6,471 images   548 images    1,580 images

1. Label Categories

Labels 0 through 11 are, in order: 'ignored regions', 'pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others'.

2. Annotation Format

<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>

Name Description

<bbox_left> The x coordinate of the top-left corner of the predicted bounding box

<bbox_top> The y coordinate of the top-left corner of the predicted object bounding box

<bbox_width> The width in pixels of the predicted object bounding box

<bbox_height> The height in pixels of the predicted object bounding box

<score> The score in the DETECTION file indicates the confidence of the predicted bounding box enclosing
an object instance.
The score in the GROUNDTRUTH file is set to 1 or 0. 1 indicates the bounding box is considered in evaluation,
while 0 indicates the bounding box will be ignored.

<object_category> The object category indicates the type of annotated object (i.e., ignored regions(0), pedestrian(1),
people(2), bicycle(3), car(4), van(5), truck(6), tricycle(7), awning-tricycle(8), bus(9), motor(10),
others(11))

<truncation> The score in the DETECTION result file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the degree to which object parts appear outside a frame
(i.e., no truncation = 0 (truncation ratio 0%), and partial truncation = 1 (truncation ratio 1% ~ 50%)).

<occlusion> The score in the DETECTION file should be set to the constant -1.
The score in the GROUNDTRUTH file indicates the fraction of the object being occluded (i.e., no occlusion = 0
(occlusion ratio 0%), partial occlusion = 1 (occlusion ratio 1% ~ 50%), and heavy occlusion = 2
(occlusion ratio 50% ~ 100%)).

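Two of these annotations deserve attention: truncation (the truncation ratio) and occlusion (the occlusion ratio).

The occlusion ratio is defined as the fraction of the object that is occluded.

The truncation ratio indicates the degree to which parts of the object lie outside the frame.

A target whose truncation ratio exceeds 50% is skipped during evaluation.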
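The field layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the VisDrone toolkit: the names `VisDroneBox` and `parse_visdrone_line` and the sample annotation line are made up for illustration, and the field order follows the table above.

```python
from dataclasses import dataclass

# Category ids as listed in the annotation description (0 through 11).
VISDRONE_CATEGORIES = (
    "ignored regions", "pedestrian", "people", "bicycle", "car", "van",
    "truck", "tricycle", "awning-tricycle", "bus", "motor", "others",
)


@dataclass
class VisDroneBox:  # hypothetical helper type, not an official class
    left: int
    top: int
    width: int
    height: int
    score: int
    category: int
    truncation: int
    occlusion: int

    @property
    def category_name(self) -> str:
        return VISDRONE_CATEGORIES[self.category]

    @property
    def used_in_evaluation(self) -> bool:
        # In ground-truth files a score of 1 means the box is evaluated,
        # while 0 means it is ignored.
        return self.score == 1


def parse_visdrone_line(line: str) -> VisDroneBox:
    """Parse one comma-separated ground-truth line into a VisDroneBox."""
    # Tolerate a trailing comma by dropping empty fields.
    fields = [f for f in line.strip().split(",") if f != ""]
    if len(fields) < 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    return VisDroneBox(*(int(f) for f in fields[:8]))


# Made-up sample line: a car, evaluated, not truncated, partially occluded.
box = parse_visdrone_line("10,20,30,40,1,4,0,1")
print(box.category_name, box.used_in_evaluation, box.occlusion)
```

Keeping the raw integer codes and exposing the name and evaluation flag as properties makes it easy to filter boxes, for example by dropping those with `used_in_evaluation` equal to `False`, before converting them to another format.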

3. Evaluation

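Each evaluated algorithm must output, in a predefined format, a list of detected bounding boxes with confidence scores for every test image; see the result format for details. Similar to the evaluation protocol of MS COCO [1], the AP, AP^{IoU=0.50}, AP^{IoU=0.75}, AR^{max=1}, AR^{max=10}, AR^{max=100}, and AR^{max=500} metrics are used to evaluate the results of detection algorithms. Unless otherwise specified, the AP and AR metrics are averaged over multiple intersection-over-union (IoU) values. Specifically, ten IoU thresholds [0.50:0.05:0.95] are used. All metrics are computed allowing at most the 500 top-scoring detections per image (across all categories). These criteria penalize both missed objects and duplicate detections (two detections for the same object instance). AP is the primary metric for ranking algorithms. The table below describes the metrics.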
Measure          Perfect   Description
AP               100%      The average precision over all 10 IoU thresholds (i.e., [0.5:0.05:0.95]) of all object categories
AP^{IoU=0.50}    100%      The average precision over all object categories when the IoU overlap with ground truth is larger than 0.50
AP^{IoU=0.75}    100%      The average precision over all object categories when the IoU overlap with ground truth is larger than 0.75
AR^{max=1}       100%      The maximum recall given 1 detection per image
AR^{max=10}      100%      The maximum recall given 10 detections per image
AR^{max=100}     100%      The maximum recall given 100 detections per image
AR^{max=500}     100%      The maximum recall given 500 detections per image
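The metrics above are computed over the ten object categories of interest. For a comprehensive evaluation, the performance of each object category is also reported. The evaluation code for object detection in images is available on the VisDrone GitHub.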

evalDET.m is the main function used to evaluate your detector: modify the dataset path and the result path, and use "isImgDisplay" to display the ground truth and detections.
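To make the IoU thresholds concrete, here is a small sketch of how the overlap between a detection and a ground-truth box is computed and compared against the ten thresholds. It only illustrates the thresholding idea and is not the official evalDET.m logic; the function names and the two example boxes are made up, and a detection is assumed to count as a match once its IoU reaches the threshold.

```python
def iou_xywh(a, b):
    """IoU of two boxes given as (left, top, width, height)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    # Width and height of the overlapping rectangle (0 if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds [0.50:0.05:0.95] used when averaging AP and AR.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(iou):
    """Thresholds at which a detection with this IoU would count as a match."""
    return [t for t in IOU_THRESHOLDS if iou >= t]


ground_truth = (0, 0, 10, 10)  # made-up boxes for illustration
detection = (2, 0, 10, 10)
iou = iou_xywh(ground_truth, detection)
print(f"IoU = {iou:.3f}")
print("counts as a match at thresholds:", matched_thresholds(iou))
```

In this example the detection overlaps the ground truth with an IoU of about 0.667, so it contributes to AP^{IoU=0.50} but not to AP^{IoU=0.75}; averaging over all ten thresholds is what rewards tighter localization.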

4. Converting the VisDrone2019 Object Detection Dataset

4.1 Converting to YOLO (TXT) Format

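A YOLO dataset folder has two subfolders, images and labels, which hold the images and the label txt files respectively. The directory structure under images and labels must match, because YOLO first reads the image paths and then finds each label file by simply replacing images with labels in the path.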
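The lookup described above can be sketched as a small path-mapping function. This is an illustration of the described behavior, not the YOLO source code: the function name `img2label_path` and the example file name are assumptions.

```python
import os


def img2label_path(img_path: str) -> str:
    """Map .../images/.../name.jpg to .../labels/.../name.txt."""
    images_part = f"{os.sep}images{os.sep}"
    labels_part = f"{os.sep}labels{os.sep}"
    # Replace only the last "images" folder in the path.
    head, sep, tail = img_path.rpartition(images_part)
    if not sep:
        raise ValueError(f"no 'images' folder in path: {img_path!r}")
    # Swap the image extension for .txt.
    return head + labels_part + os.path.splitext(tail)[0] + ".txt"


# Hypothetical image path inside one dataset split.
example = os.path.join("VisDrone2019-DET-train", "images", "0000002_00005_d_0000014.jpg")
print(img2label_path(example))
```

Because the label path is derived purely by string replacement, label files stored in a differently named folder (the conversion script in this post writes to Annotations_YOLO) would not be found unless that folder is renamed or copied to labels.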

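In the txt file for each image, every line has the format: cls_id x y w h, where (x, y) is the center of the box and w, h are its width and height. All four values are ratios relative to the image width and height, not absolute pixel coordinates.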
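A short worked example may help here. The sketch below converts a VisDrone-style box (left, top, width, height in pixels) into the normalized YOLO form and back to corner coordinates; the function names, the 1000x500 image size, and the box values are made up for illustration.

```python
def visdrone_to_yolo(box, img_w, img_h):
    """(left, top, width, height) in pixels -> normalized (cx, cy, w, h)."""
    left, top, w, h = box
    return ((left + w / 2) / img_w, (top + h / 2) / img_h, w / img_w, h / img_h)


def yolo_to_corners(yolo_box, img_w, img_h):
    """Normalized (cx, cy, w, h) -> (xmin, ymin, xmax, ymax) in pixels."""
    cx, cy, w, h = yolo_box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)


# A 200x100 pixel box whose top-left corner is at (100, 50) in a 1000x500 image.
yolo_box = visdrone_to_yolo((100, 50, 200, 100), img_w=1000, img_h=500)
# Class id 3 is used here as an arbitrary example value.
print("3 " + " ".join(f"{v:.6f}" for v in yolo_box))
print([round(v) for v in yolo_to_corners(yolo_box, 1000, 500)])
```

The box center is at pixel (200, 100), which is 0.2 of the image width and 0.2 of the image height, and the box itself spans 0.2 of each dimension, so all four normalized values come out as 0.2.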

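Recent versions of YOLOv5 already ship a configuration file for training on the VisDrone dataset, and it includes the dataset preprocessing, which is mainly the generation of the labels. Based on it, create a new file named visDrone2019_txt2txt_yolo.py.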

'''
Author: 刘鸿燕 13752614153@163.com
Date: 2022-05-09 14:05:05
LastEditors: 刘鸿燕 13752614153@163.com
LastEditTime: 2022-05-09 15:38:09
FilePath: \VisDrone2019\data_process\visDrone2019_txt2txt_yolo.py
Description: Convert VisDrone2019-DET annotations to YOLO-format label files.
'''
from pathlib import Path

from PIL import Image
from tqdm import tqdm


def visdrone2yolo(split_dir):
    """Convert the VisDrone annotations of one dataset split into YOLO label files."""

    def convert_box(size, box):
        # Convert a VisDrone (left, top, width, height) box into a YOLO
        # (cx, cy, w, h) box, normalized by the image width and height.
        dw = 1. / size[0]
        dh = 1. / size[1]
        return (box[0] + box[2] / 2) * dw, (box[1] + box[3] / 2) * dh, box[2] * dw, box[3] * dh

    out_dir = split_dir / 'Annotations_YOLO'  # use 'labels' instead to match the YOLO folder layout
    out_dir.mkdir(parents=True, exist_ok=True)  # make labels directory
    pbar = tqdm((split_dir / 'annotations').glob('*.txt'), desc=f'Converting {split_dir}')
    for f in pbar:
        with Image.open((split_dir / 'images' / f.name).with_suffix('.jpg')) as img:
            img_size = img.size  # (width, height)
        lines = []
        with open(f, 'r') as file:  # read annotation.txt
            for row in [x.split(',') for x in file.read().strip().splitlines()]:
                if row[4] == '0':  # score 0: box is ignored in evaluation (e.g. 'ignored regions', class 0)
                    continue
                cls = int(row[5]) - 1  # shift ids so that 'pedestrian' (1) becomes class 0
                box = convert_box(img_size, tuple(map(int, row[:4])))
                lines.append(f"{cls} {' '.join(f'{x:.6f}' for x in box)}\n")
        # Write once per image, after all rows are read; an image with no
        # remaining boxes gets an empty label file.
        with open(out_dir / f.name, 'w') as fl:
            fl.writelines(lines)  # write label.txt


root = Path(r'E:\DPL\DeepLearnData\目标检测\航空目标检测数据VisDrone\VisDrone2019')  # path to the VisDrone2019 folder inside the dataset folder
# Convert
for d in 'VisDrone2019-DET-train', 'VisDrone2019-DET-val', 'VisDrone2019-DET-test-dev':
    visdrone2yolo(root / d)  # convert VisDrone annotations to YOLO labels

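After the code runs successfully, a new Annotations_YOLO folder is created inside each of the three folders 'VisDrone2019-DET-train', 'VisDrone2019-DET-val', and 'VisDrone2019-DET-test-dev'. It holds the labels produced by converting the VisDrone annotations to the YOLOv5 format.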

Splitting a dataset with YOLO-format labels into training and validation sets

from xml.dom.minidom import Document
import os
import cv2


def makexml(picPath, txtPath, xmlPath):  # image folder, txt folder, folder where the xml files are saved
    """Convert YOLO-format txt annotation files into VOC-format xml annotation files.
    Create three subfolders named picture, txt, and xml inside your annotated image folder.
    """
    dic = {
        '0': "hat",  # dictionary that maps class ids to class names
        '1': "person",  # entries must match your own classes.txt, in the same order
    }
    files = os.listdir(txtPath)
    for i, name in enumerate(files):
        xmlBuilder = Document()
        annotation = xmlBuilder.createElement("annotation")  # create the annotation tag
        xmlBuilder.appendChild(annotation)
        with open(os.path.join(txtPath, name)) as txtFile:
            txtList = txtFile.readlines()
        img = cv2.imread(os.path.join(picPath, name[0:-4] + ".jpg"))
        Pheight, Pwidth, Pdepth = img.shape

        folder = xmlBuilder.createElement("folder")  # folder tag
        foldercontent = xmlBuilder.createTextNode("driving_annotation_dataset")
        folder.appendChild(foldercontent)
        annotation.appendChild(folder)  # end of folder tag

        filename = xmlBuilder.createElement("filename")  # filename tag
        filenamecontent = xmlBuilder.createTextNode(name[0:-4] + ".jpg")
        filename.appendChild(filenamecontent)
        annotation.appendChild(filename)  # end of filename tag

        size = xmlBuilder.createElement("size")  # size tag
        width = xmlBuilder.createElement("width")  # width child of size
        widthcontent = xmlBuilder.createTextNode(str(Pwidth))
        width.appendChild(widthcontent)
        size.appendChild(width)  # end of width

        height = xmlBuilder.createElement("height")  # height child of size
        heightcontent = xmlBuilder.createTextNode(str(Pheight))
        height.appendChild(heightcontent)
        size.appendChild(height)  # end of height

        depth = xmlBuilder.createElement("depth")  # depth child of size
        depthcontent = xmlBuilder.createTextNode(str(Pdepth))
        depth.appendChild(depthcontent)
        size.appendChild(depth)  # end of depth

        annotation.appendChild(size)  # end of size tag

        for j in txtList:
            oneline = j.strip().split(" ")
            obj = xmlBuilder.createElement("object")  # object tag
            picname = xmlBuilder.createElement("name")  # name tag
            namecontent = xmlBuilder.createTextNode(dic[oneline[0]])
            picname.appendChild(namecontent)
            obj.appendChild(picname)  # end of name tag

            pose = xmlBuilder.createElement("pose")  # pose tag
            posecontent = xmlBuilder.createTextNode("Unspecified")
            pose.appendChild(posecontent)
            obj.appendChild(pose)  # end of pose tag

            truncated = xmlBuilder.createElement("truncated")  # truncated tag
            truncatedContent = xmlBuilder.createTextNode("0")
            truncated.appendChild(truncatedContent)
            obj.appendChild(truncated)  # end of truncated tag

            difficult = xmlBuilder.createElement("difficult")  # difficult tag
            difficultcontent = xmlBuilder.createTextNode("0")
            difficult.appendChild(difficultcontent)
            obj.appendChild(difficult)  # end of difficult tag

            bndbox = xmlBuilder.createElement("bndbox")  # bndbox tag
            xmin = xmlBuilder.createElement("xmin")  # xmin tag
            # left edge in pixels: (center x - half width) * image width, plus 1
            mathData = int(((float(oneline[1])) * Pwidth + 1) - (float(oneline[3])) * 0.5 * Pwidth)
            xminContent = xmlBuilder.createTextNode(str(mathData))
            xmin.appendChild(xminContent)
            bndbox.appendChild(xmin)  # end of xmin tag

            ymin = xmlBuilder.createElement("ymin")  # ymin tag
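For comparison, the same YOLO-to-VOC idea can be sketched more compactly with xml.etree.ElementTree instead of minidom. This is a self-contained illustration under stated assumptions, not a continuation of the listing above: the function name `yolo_lines_to_voc`, the class list (the ten VisDrone categories after the id shift used in the YOLO conversion), the sample label line, and the image size are all assumptions, and pixel coordinates are rounded without the +1 offset used above.

```python
import xml.etree.ElementTree as ET

# Assumed class order: VisDrone categories 1..10 shifted to ids 0..9.
CLASS_NAMES = ["pedestrian", "people", "bicycle", "car", "van",
               "truck", "tricycle", "awning-tricycle", "bus", "motor"]


def yolo_lines_to_voc(yolo_lines, class_names, filename, width, height, depth=3):
    """Build a VOC-style <annotation> element from YOLO label lines."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)

    for line in yolo_lines:
        parts = line.split()
        if len(parts) != 5:  # skip blank or malformed lines
            continue
        cls_id = int(parts[0])
        cx, cy, w, h = (float(v) for v in parts[1:])
        # Convert normalized center/size back to pixel corners, clipped to the image.
        xmin = max(0, round((cx - w / 2) * width))
        ymin = max(0, round((cy - h / 2) * height))
        xmax = min(width, round((cx + w / 2) * width))
        ymax = min(height, round((cy + h / 2) * height))

        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = class_names[cls_id]
        ET.SubElement(obj, "pose").text = "Unspecified"
        ET.SubElement(obj, "truncated").text = "0"
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return root


# Made-up label line and image size for illustration.
annotation = yolo_lines_to_voc(
    ["3 0.200000 0.200000 0.200000 0.200000"],
    CLASS_NAMES, "example.jpg", width=1000, height=500,
)
print(ET.tostring(annotation, encoding="unicode"))
```

To save the result, `ET.ElementTree(annotation).write(path)` writes the element to an xml file. Passing the image size in as arguments keeps the function free of image-reading dependencies; in a real pipeline the size would come from the image itself.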
