Visdrone目标检测实验（含数据格式转换）

最新推荐文章于 2023-11-13 09:43:30 发布

肆十二

最新推荐文章于 2023-11-13 09:43:30 发布

阅读量5.2k

点赞数 3

分类专栏：目标检测文章标签：深度学习 python 计算机视觉人工智能

本文链接：https://blog.csdn.net/ECHOSON/article/details/109644666

版权

目标检测专栏收录该内容

9 篇文章 12 订阅

订阅专栏

Visdrone目标检测实验（含数据格式转换）

目前的初步计划是使用ssd或者fasterrcnn或者目前比较新的这个yolov4出来，来处理数据集

github地址：https://github.com/cmFighting/visdrone_detection

数据集介绍

主页：http://aiskyeye.com/challenge/object-detection/

We are pleased to announce the VisDrone2020 Object Detection in Images Challenge (Task 1). This competition is designed to push the state-of-the-art in object detection with drone platform forward. Teams are required to predict the bounding boxes of objects of ten predefined classes (i.e., pedestrian, person, car, van, bus, truck, motor, bicycle, awning-tricycle, and tricycle) with real-valued confidences. Some rarely occurring special vehicles (e.g., machineshop truck, forklift truck, and tanker) are ignored in evaluation.
The challenge containing 10,209 static images (6,471 for training, 548 for validation and 3,190 for testing) captured by drone platforms in different places at different height, are available on the download page. We manually annotate the bounding boxes of different categories of objects in each image. In addition, we also provide two kinds of useful annotations, occlusion ratio and truncation ratio. Specifically, we use the fraction of objects being occluded to define the occlusion ratio. The truncation ratio is used to indicate the degree of object parts appears outside a frame. If an object is not fully captured within a frame, we annotate the bounding box across the frame boundary and estimate the truncation ratio based on the region outside the image. It is worth mentioning that a target is skipped during evaluation if its truncation ratio is larger than 50%. Annotations on the training and validation sets are publicly available.

原始的数据集中一共提供了以下11个类，当用SSD代码进行训练的时候，需要额外添加一个背景类，也就是在ssd代码中使用了12个类，本文使用ssd300.

'pedestrian', 'people', 'bicycle', 'car', 'van','truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others'

官方提供的数据中训练集是6471、验证集是548、测试集是1610。其中训练集用来训练模型，验证集用来保存模型，测试集用来测试模型。

visdrone标注形式

visdrone也采用了xml作为标注文件，示例如下：

http://aiskyeye.com/evaluate/results-format/
684,8,273,116,0,0,0,0
406,119,265,70,0,0,0,0
255,22,119,128,0,0,0,0
1,3,209,78,0,0,0,0
从左到右依次是<bbox_left>,<bbox_top>,<bbox_width>,<bbox_height>,<score>,<object_category>,<truncation>,<occlusion>

需要注意的是在转化为voc标注形式的时候需要事先根据做左上角的位置求解出右下角的位置，后面的四个位置表示得分、物体类别、截断和遮挡，这部分内容在做转化的时候其实可以暂时不考虑，无所谓，直接使用默认的即可。

voc标注形式

voc官网：http://host.robots.ox.ac.uk/pascal/VOC/

voc数据集简介：https://blog.csdn.net/mzpmzk/article/details/88065416

组织结构

.
├── Annotations 进行 detection 任务时的标签文件，xml 形式，文件名与图片名一一对应
├── ImageSets 包含三个子文件夹 Layout、Main、Segmentation，其中 Main 存放的是分类和检测的数据集分割文件
├── JPEGImages 存放 .jpg 格式的图片文件
├── SegmentationClass 存放按照 class 分割的图片
└── SegmentationObject 存放按照 object 分割的图片

├── Main
│   ├── train.txt 写着用于训练的图片名称， 共 2501 个
│   ├── val.txt 写着用于验证的图片名称，共 2510 个
│   ├── trainval.txt train与val的合集。共 5011 个
│   ├── test.txt 写着用于测试的图片名称，共 4952 个

xml解析（以目标检测为例）

<annotation>
	<folder>VOC2007</folder>
	<filename>000001.jpg</filename>  # 文件名 
	<source>
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
		<image>flickr</image>
		<flickrid>341012865</flickrid>
	</source>
	<owner>
		<flickrid>Fried Camels</flickrid>
		<name>Jinky the Fruit Bat</name>
	</owner>
	<size>  # 图像尺寸, 用于对 bbox 左上和右下坐标点做归一化操作
		<width>353</width>
		<height>500</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>  # 是否用于分割
	<object>
		<name>dog</name>  # 物体类别
		<pose>Left</pose>  # 拍摄角度：front, rear, left, right, unspecified 
		<truncated>1</truncated>  # 目标是否被截断（比如在图片之外），或者被遮挡（超过15%）
		<difficult>0</difficult>  # 检测难易程度，这个主要是根据目标的大小，光照变化，图片质量来判断
		<bndbox>
			<xmin>48</xmin>
			<ymin>240</ymin>
			<xmax>195</xmax>
			<ymax>371</ymax>
		</bndbox>
	</object>
	<object>
		<name>person</name>
		<pose>Left</pose>
		<truncated>1</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>8</xmin>
			<ymin>12</ymin>
			<xmax>352</xmax>
			<ymax>498</ymax>
		</bndbox>
	</object>
</annotation>

分类标准

以本文的方法来说，我们主要是做一个目标检测的任务，其中一个xml中标注了多个object，name表示是物体的类别，然后采用左上角和右下角标注bbox。

yolo标注形式

YOLO数据集txt标注格式：

0 0.160938 0.541667 0.120312 0.386111
标注内容的类别、归一化后的中心点x坐标，归一化后的中心点y坐标，归一化后的目标框宽度w，归一化后的目标况高度h（此处归一化指的是除以图片宽和高）

重点 voc和coco之间的转化公式

import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join
import sys

sets=[('2018', 'train'), ('2018', 'val')]

classes = ["a", "b", "c", "d"]

# soft link your VOC2018 under here
root_dir = sys.argv[1]


def convert(size, box):
    dw = 1./(size[0])
    dh = 1./(size[1])
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)

def convert_annotation(year, image_id):
    in_file = open(os.path.join(root_dir, 'VOC%s/Annotations/%s.xml'%(year, image_id)))
    out_file = open(os.path.join(root_dir, 'VOC%s/labels/%s.txt'%(year, image_id)), 'w')
    tree=ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult)==1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w,h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

wd = getcwd()

for year, image_set in sets:
    labels_target = os.path.join(root_dir, 'VOC%s/labels/'%(year))
    print('labels dir to save: {}'.format(labels_target))
    if not os.path.exists(labels_target):
        os.makedirs(labels_target)
    image_ids = open(os.path.join(root_dir, 'VOC{}/ImageSets/Main/{}.txt'.format(year, image_set))).read().strip().split()
    list_file = open(os.path.join(root_dir, '%s_%s.txt'%(year, image_set)), 'w')
    for image_id in image_ids:
        img_f = os.path.join(root_dir, 'VOC%s/JPEGImages/%s.jpg\n'%(year, image_id))
        list_file.write(os.path.abspath(img_f))
        convert_annotation(year, image_id)
    list_file.close()

print('done.')

coco标注形式

参考这篇文章：https://blog.csdn.net/qq_41375609/article/details/94737915

[x,y,width,height] 边界框表示形式

Yolo实验

参考代码：https://github.com/eriklindernoren/PyTorch-YOLOv3

服务器位置：/mnt/data/scm/2020/ai/yolo/PyTorch-YOLOv3

pytorch 1.0

tensorflow 1.14

训练指令：

CUDA_VISIBLE_DEVICES=1 python train.py --model_def config/yolov3-custom.cfg --data_config config/custom.data --pretrained_weights weights/darknet53.conv.74

进程信息
(ai-yolo) supermicro@supermicro:/mnt/data/scm/2020/ai/yolo/PyTorch-YOLOv3$ CUDA_VISIBLE_DEVICES=1 nohup python train.py --model_def config/yolov3-custom.cfg --data_config config/custom.data --pretrained_weights weights/darknet53.conv.74 &
[1] 10816

实验结果

1610张测试图像
epoch98
Detecting objects: 100%|██████████| 202/202 [01:35<00:00,  2.12it/s]
Computing AP:   0%|          | 0/11 [00:00<?, ?it/s]
Computing AP:  36%|███▋      | 4/11 [00:00<00:00, 24.15it/s]
Computing AP: 100%|██████████| 11/11 [00:00<00:00, 52.46it/s]ap_class:[ 0  1  2  3  4  5  6  7  8  9 10]
class_names:['pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor', 'others']
+-------+-----------------+---------+
| Index | Class name      | AP      |
+-------+-----------------+---------+
| 0     | pedestrian      | 0.08822 |
| 1     | people          | 0.02340 |
| 2     | bicycle         | 0.00165 |
| 3     | car             | 0.43279 |
| 4     | van             | 0.07407 |
| 5     | truck           | 0.07747 |
| 6     | tricycle        | 0.00995 |
| 7     | awning-tricycle | 0.01727 |
| 8     | bus             | 0.25008 |
| 9     | motor           | 0.05405 |
| 10    | others          | 0.00366 |
+-------+-----------------+---------+
---- mAP 0.09387386786609676

fps: 19.17227602398063
83.97542357444763

SSD实验

参考代码：https://github.com/amdegroot/ssd.pytorch

服务器位置：/data1/scm/ssd/code/SSD0414/

实验结果

1610张测试图像
image_sets_file:/data1/scm/ssd/data/voc/VisDrone_ROOT/DET2019/ImageSets/Main/test.txt
2020-11-04 14:56:01,130 SSD.inference INFO: Evaluating VisDrone_2019__test dataset(1610 images):
100%|██████████████████████████████████████████████████████████████████| 161/161 [00:17<00:00,  9.22it/s]
2020-11-04 14:56:20,698 SSD.inference INFO: mAP: 0.1524
pedestrian      : 0.1170
people          : 0.0909
bicycle         : 0.0909
car             : 0.4377
van             : 0.1740
truck           : 0.2258
tricycle        : 0.1048
awning-tricycle : 0.0413
bus             : 0.3754
motor           : 0.0800
others          : 0.0909
24.98157835006714
FPS:64.44748916337679
测试图像结果
E:\datas\ai\voc\VisDrone_ROOT\VisDrone2019\Images_split\ssd_result

肆十二

关注

3
点赞
踩
27

收藏

觉得还不错? 一键收藏
打赏
3
评论
Visdrone目标检测实验（含数据格式转换）

Visdrone目标检测实验（含数据格式转换）目前的初步计划是使用ssd或者fasterrcnn或者目前比较新的这个yolov4出来，来处理数据集github地址：https://github.com/cmFighting/visdrone_detection数据集介绍主页：http://aiskyeye.com/challenge/object-detection/We are pleased to announce the VisDrone2020 Object Detection i
复制链接

扫一扫