Training YOLOv5 on the KAIST Dataset

胖刺客阿七

Last modified: 2023-05-15 21:17:38

First published: 2021-10-15 15:06:38

Article link: https://blog.csdn.net/onepunch_k/article/details/120749477

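YOLOv5 is very popular right now and there are plenty of resources online, so I will not repeat how to download and use the model. This article focuses on how to process the KAIST dataset into a format that YOLOv5 can use.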

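I. Getting the data

1. About the KAIST dataset:

Every sample in the KAIST pedestrian dataset consists of a visible-light image and its corresponding long-wave infrared (LWIR) image.
The KAIST training set consists of 50172 visible/LWIR image pairs (resolution 640x512) captured both by day and by night, with 13853 pedestrian bounding-box annotations.
The KAIST test set consists of 2252 visible/LWIR image pairs with 1356 pedestrian bounding-box annotations; 1455 of the pairs were captured during the day and 797 at night.

2. Where to get the KAIST dataset:

https://github.com/SoonminHwang/rgbt-ped-detection

Note: the dataset has 12 sets in total. set00-set05 form the train split and set06-set11 form the test split; within each split, the first 3 sets were captured during the day and the last 3 at night.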
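
The set layout described in the note can be written down as a small lookup. This is only a sketch: the function name `describe_set` is made up for illustration, and it assumes set folders are named `set00` through `set11` as in the note.

```python
def describe_set(set_name):
    """Map a KAIST set folder name (e.g. 'set03') to (split, time_of_day).

    Follows the layout stated above: set00-set05 are train, set06-set11 are
    test, and within each split the first three sets are day, the last three night.
    """
    index = int(set_name[3:])          # 'set03' -> 3
    if not 0 <= index <= 11:
        raise ValueError("KAIST only has set00 .. set11, got %r" % set_name)
    split = "train" if index <= 5 else "test"
    time_of_day = "day" if index % 6 < 3 else "night"
    return split, time_of_day


for name in ("set00", "set03", "set07", "set11"):
    print(name, describe_set(name))
```

A helper like this is handy when sorting the converted annotations into train and test folders before the merge step.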

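II. Processing the data

1. Format conversion: vbb to xml

The KAIST annotations are stored in vbb format, while YOLO uses txt labels, so they have to be converted before training. The dataset's GitHub repository also provides conversion code; I will simply paste a copy here.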

#-*- coding:utf-8 -*-
# change vbb to xml
# ------------kevin---------------

import os, glob
from scipy.io import loadmat
from collections import defaultdict
from lxml import etree, objectify

IMAGE_SIZE = (640, 512)  # KAIST Multispectral Benchmark


def vbb_anno2dict(vbb_file, sub_dir):
    vid_name = os.path.splitext(os.path.basename(vbb_file))[0]
    annos = defaultdict(dict)
    vbb = loadmat(vbb_file)
    # object info in each frame: id, pos, occlusion, lock, posv

    objLists = vbb['A'][0][0][1][0]
    objLbl = [str(v[0]) for v in vbb['A'][0][0][4][0]]

    nFrame = int(vbb['A'][0][0][0][0][0])
    maxObj = int(vbb['A'][0][0][2][0][0])
    objInit = vbb['A'][0][0][3][0]

    for frame_id, obj in enumerate(objLists):

        frame_name = '/'.join([sub_dir, vid_name, 'I{:05d}'.format(frame_id)])
        annos[frame_name] = defaultdict(list)
        annos[frame_name]["id"] = frame_name

        if len(obj[0]) > 0:
            for id, pos, occl, lock, posv in zip(
                    obj['id'][0], obj['pos'][0], obj['occl'][0],
                    obj['lock'][0], obj['posv'][0]):
                id = int(id[0][0]) - 1  # for matlab start from 1 not 0
                pos = pos[0].tolist()
                occl = int(occl[0][0])
                lock = int(lock[0][0])
                posv = posv[0].tolist()

                annos[frame_name]["label"].append(objLbl[id])
                annos[frame_name]["occlusion"].append(occl)
                annos[frame_name]["bbox"].append(pos)

    return annos


def instance2xml_base(anno, img_size, bbox_type='xyxy'):
    """bbox_type: xyxy (xmin, ymin, xmax, ymax); xywh (xmin, ymin, width, height)"""
    assert bbox_type in ['xyxy', 'xywh']

    E = objectify.ElementMaker(annotate=False)
    anno_tree = E.annotation(
        E.folder('KAIST Multispectral Ped Benchmark'),
        E.filename(anno['id']),
        E.source(
            E.database('KAIST pedestrian'),
            E.annotation('KAIST pedestrian'),
            E.image('KAIST pedestrian'),
            E.url('https://soonminhwang.github.io/rgbt-ped-detection/')
        ),
        E.size(
            E.width(img_size[0]),
            E.height(img_size[1]),
            E.depth(4)
        ),
        E.segmented(0),
    )
    for index, bbox in enumerate(anno['bbox']):
        bbox = [float(x) for x in bbox]
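        if bbox_type == 'xyxy':
            # the vbb 'pos' field is (x, y, w, h); convert it to corner format for the xml
            xmin, ymin, w, h = bbox
            xmax = xmin + w
            ymax = ymin + h
        else:
            # NOTE: with 'xywh' the width/height end up unchanged in the xmax/ymax tags
            xmin, ymin, xmax, ymax = bbox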

        E = objectify.ElementMaker(annotate=False)

        anno_tree.append(
            E.object(
                E.name(anno['label'][index]),
                E.bndbox(
                    E.xmin(xmin),
                    E.ymin(ymin),
                    E.xmax(xmax),
                    E.ymax(ymax)
                ),
                E.pose('unknown'),
                E.truncated(0),
                E.difficult(0),
                E.occlusion(anno["occlusion"][index])
            )
        )
    return anno_tree


def parse_anno_file(vbb_inputdir, vbb_outputdir):
    # annotation sub-directories in hda annotation input directory
    assert os.path.exists(vbb_inputdir)
    sub_dirs = os.listdir(vbb_inputdir)

    for sub_dir in sub_dirs:
        print("Parsing annotations (vbb): {}".format(sub_dir))
        vbb_files = glob.glob(os.path.join(vbb_inputdir, sub_dir, "*.vbb"))

        for vbb_file in vbb_files:
            annos = vbb_anno2dict(vbb_file, sub_dir)
            if annos:
                vbb_outdir = os.path.join(vbb_outputdir, sub_dir, os.path.basename(vbb_file).split('.')[0])
                print("vbb_outdir: {}".format(vbb_outdir))

                if not os.path.exists(vbb_outdir):
                    os.makedirs(vbb_outdir)

                for filename, anno in sorted(annos.items(), key=lambda x: x[0]):
                    # if "bbox" in anno:
                    anno_tree = instance2xml_base(anno, IMAGE_SIZE)
                    outfile = os.path.join(vbb_outputdir, os.path.splitext(filename)[0] + ".xml")

                    print("outfile: {}".format(outfile))
                    etree.ElementTree(anno_tree).write(outfile, pretty_print=True)


if __name__ == "__main__":
    vbb_inputdir = "/home/kevin/PycharmProjects/Detection/KAIST/annotations/"
    xml_outputdir = "/home/kevin/PycharmProjects/Detection/KAIST/Annotations/"


    parse_anno_file(vbb_inputdir, xml_outputdir)

You only need to change the input and output directories.
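
To sanity-check the conversion, it can help to read one of the generated xml files back. The sketch below uses only the standard library; the sample document is hand-written for illustration (its values are not taken from the dataset) and merely mirrors the tags that `instance2xml_base` emits, and `list_objects` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample that mirrors the tag structure produced by the script above.
SAMPLE_XML = """<annotation>
  <filename>set00/V000/I00000</filename>
  <size><width>640</width><height>512</height><depth>4</depth></size>
  <object>
    <name>person</name>
    <bndbox>
      <xmin>100.0</xmin><ymin>50.0</ymin><xmax>140.0</xmax><ymax>150.0</ymax>
    </bndbox>
    <occlusion>0</occlusion>
  </object>
</annotation>"""


def list_objects(xml_text):
    """Return [(label, (xmin, ymin, xmax, ymax)), ...] for one annotation document."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        corners = tuple(float(bndbox.find(tag).text)
                        for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.find("name").text, corners))
    return objects


print(list_objects(SAMPLE_XML))
```

For a real file, `ET.parse(path).getroot()` can be used in place of `ET.fromstring`.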
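Next, merge the generated xml files into a single folder and convert them to txt in one pass (alternatively, convert to txt first and then gather the files into one folder to form the labels folder).
Code for merging the xml files: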

#-*- coding:utf-8 -*-
# move all the xml to one folder
# ------------kevin---------------

import os
import shutil
from tqdm import tqdm

filepath = "/home/kevin/PycharmProjects/Detection/KAIST/Annotations/train"
output_path = "/home/kevin/PycharmProjects/Detection/KAIST_processed/annotations/train/"
os.makedirs(output_path, exist_ok=True)    # shutil.copy fails if the target folder is missing
files_1 = os.listdir(filepath)    # set

filetext = open("/home/kevin/PycharmProjects/Detection/KAIST_processed/annotations/train.txt", "w")
for filename_1 in tqdm(files_1):
    tmp_path_1 = os.path.join(filepath, filename_1)
    if os.path.isdir(tmp_path_1):
        files_2 = os.listdir(tmp_path_1)    # Vxxx
        for filename_2 in tqdm(files_2):
            tmp_path_2 = os.path.join(tmp_path_1, filename_2)
            if os.path.isdir(tmp_path_2):   # xml
                files_3 = os.listdir(tmp_path_2)
                for filename_3 in files_3:
                    tmp_path_3 = os.path.join(tmp_path_2, filename_3)
                    new_filename = filename_1 + filename_2 + "visible" +filename_3
                    new_path = output_path + new_filename
                    shutil.copy(tmp_path_3, new_path)
                    content = new_path + '\n'
                    filetext.write(content)

filetext.close()

The images need to be merged in the same way. YOLO accepts both png and jpg, so no image format conversion is needed here.
Code for merging the images:

#-*- coding:utf-8 -*-
# move all images to one folder
# ------------kevin---------------

import os
import shutil
from tqdm import tqdm

filepath = "/home/kevin/PycharmProjects/Detection/KAIST/images/test"
output_path = "/home/kevin/PycharmProjects/Detection/KAIST_processed/images/visible/val/"
os.makedirs(output_path, exist_ok=True)    # shutil.copy fails if the target folder is missing
files_1 = os.listdir(filepath)    # set

filetext = open("/home/kevin/PycharmProjects/Detection/KAIST_processed/images/visible/val.txt", "w")
for filename_1 in tqdm(files_1):
    tmp_path_1 = os.path.join(filepath, filename_1)
    if os.path.isdir(tmp_path_1):
        files_2 = os.listdir(tmp_path_1)    # Vxxx
        for filename_2 in tqdm(files_2):
            tmp_path_2 = os.path.join(tmp_path_1, filename_2)
            if os.path.isdir(tmp_path_2):   # lwir or visible
                files_3 = os.listdir(tmp_path_2)
                for filename_3 in files_3:
                    if filename_3 == "visible":     # choose lwir or visible
                        tmp_path_3 = os.path.join(tmp_path_2, filename_3)
                        files_4 = os.listdir(tmp_path_3)
                        for filename_4 in files_4:
                            tmp_path_4 = os.path.join(tmp_path_3, filename_4)
                            new_filename = filename_1 + filename_2 + filename_3 + filename_4
                            new_path = output_path + new_filename
                            shutil.copy(tmp_path_4, new_path)
                            content = new_path + '\n'
                            filetext.write(content)



filetext.close()
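
The nested loops above can also be expressed with `pathlib` globbing. The following is a rough equivalent, not the author's code: the function name `flatten_images` is invented, and it assumes the `setXX/VXXX/<modality>/<image>` folder layout that the loops above walk through. It produces the same flattened names (set + video + modality + file name).

```python
import shutil
from pathlib import Path


def flatten_images(src_root, dst_dir, modality="visible"):
    """Copy <src_root>/setXX/VXXX/<modality>/<image> into one flat folder.

    Returns the list of new paths, in sorted order, so the caller can write
    them to a train.txt / val.txt style list file.
    """
    src_root, dst_dir = Path(src_root), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for image in sorted(src_root.glob("set*/V*/{}/*".format(modality))):
        if not image.is_file():
            continue
        set_name, video = image.parts[-4], image.parts[-3]
        # Same naming scheme as the script above, e.g. set06V000visibleI00000.jpg
        target = dst_dir / "{}{}{}{}".format(set_name, video, modality, image.name)
        shutil.copy(image, target)
        copied.append(target)
    return copied
```

Calling it once with `modality="visible"` and once with `modality="lwir"` (with different target folders) would cover both image types.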

2. xml to txt

YOLO reads its labels from txt files, so the xml files still have to be converted to txt:

# -*- coding: utf-8 -*-
# change xml to txt
# ------------kevin---------------

import xml.etree.ElementTree as ET
from tqdm import tqdm
import os

sets = ['train', 'test']
classes = ["person", "people", "cyclist", "person?"]   # change to your own classes


def convert(size, box):
    dw = 1. / (size[0])
    dh = 1. / (size[1])
    x = (box[0] + box[1]) / 2.0 - 1
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return x, y, w, h

def convert_annotation(image_id):
    # take the label name from the xml file name instead of a fixed-width slice
    out_file_name = os.path.splitext(os.path.basename(image_id))[0]
    out_file = open('/home/kevin/PycharmProjects/Detection/KAIST_processed/labels/%s.txt' % (out_file_name), 'w')
    tree = ET.parse(image_id)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    for obj in root.iter('object'):
        cls = obj.find('name').text
        if cls not in classes:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text),
             float(xmlbox.find('ymax').text))
        b1, b2, b3, b4 = b
        # clamp boxes that extend past the right/bottom image border
        if b2 > w:
            b2 = w
        if b4 > h:
            b4 = h
        b = (b1, b2, b3, b4)
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
    out_file.close()


os.makedirs('/home/kevin/PycharmProjects/Detection/KAIST_processed/labels/', exist_ok=True)
for image_set in tqdm(sets):
    # read the xml list written by the merge script for this split (train.txt / test.txt)
    image_ids = open('/home/kevin/PycharmProjects/Detection/KAIST_processed/annotations/%s.txt' % image_set).read().strip().split()
    for image_id in tqdm(image_ids):
        convert_annotation(image_id)
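
To make the normalization in `convert` concrete, here is a small worked example. It is a sketch that reproduces the same arithmetic with named arguments (note the argument order differs from the tuple used in the script); the names `voc_box_to_yolo` and `yolo_label_line` are made up for illustration. The `pixel_offset` of 1 corresponds to the `- 1` in the script, which presumably comes from the 1-based pixel convention of VOC-style annotations.

```python
def voc_box_to_yolo(img_w, img_h, xmin, ymin, xmax, ymax, pixel_offset=1.0):
    """Convert corner coordinates in pixels to YOLO's normalized
    (x_center, y_center, width, height)."""
    x_center = ((xmin + xmax) / 2.0 - pixel_offset) / img_w
    y_center = ((ymin + ymax) / 2.0 - pixel_offset) / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, box):
    """Format one label line: '<class> <x_center> <y_center> <width> <height>'."""
    return "{} {}".format(class_id, " ".join("{:.6f}".format(v) for v in box))


# A 100x100 pixel box on a 640x512 KAIST frame (made-up numbers).
box = voc_box_to_yolo(640, 512, xmin=100, ymin=50, xmax=200, ymax=150)
print(yolo_label_line(0, box))
```

Here the center works out to (149/640, 99/512) and the size to (100/640, 100/512), so every value lies between 0 and 1, which is what YOLOv5 expects in a label file.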

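III. Organizing the data

This step took me quite a while: it was my first time training YOLOv5 on a custom dataset and also my first time working with KAIST, so I took plenty of detours.
To train YOLO on your own dataset, besides preparing the labels and images, you also need to set up the following:
① Put all the data in a single folder. It does not have to live under the YOLO directory, as long as the later configuration uses absolute paths.
[Screenshot: dataset folder layout]

The labels folder holds all the labels in txt format, one file per image, named after the image. train.txt lists the training images, val.txt lists the test images (used here for validation), and the images folder holds all the images.
labels folder:
[Screenshot: contents of the labels folder]
images folder:
[Screenshot: contents of the images folder]
train.txt / val.txt files:
[Screenshot: contents of train.txt and val.txt]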
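
Since every image is expected to have a label file with the same base name, a quick consistency check before training can save a failed run. This is a minimal sketch; `find_missing_labels` and the suffix set are assumptions for illustration, not part of YOLOv5.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_missing_labels(images_dir, labels_dir):
    """Return the names of images that have no '<stem>.txt' file in labels_dir."""
    labels_dir = Path(labels_dir)
    missing = []
    for image in sorted(Path(images_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip list files and anything else that is not an image
        if not (labels_dir / (image.stem + ".txt")).is_file():
            missing.append(image.name)
    return missing
```

Running it on the flattened images and labels folders should return an empty list; any names it reports point to images whose annotations were not converted.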

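② Add a KAIST.yaml file under the data directory
You can model it on the config of another dataset; I based mine on the coco one:
[Screenshot: contents of KAIST.yaml]
Only the train and val paths need to change. nc is the number of classes in your dataset (4 for KAIST), and names must be changed to the matching class names.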
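
As a rough idea of what such a file contains, the sketch below renders the four keys mentioned above (train, val, nc, names) as text. The paths are placeholders, not the author's actual paths, and `render_data_yaml` is a hypothetical helper; the class order is taken from the `classes` list in the xml-to-txt script, since the class ids in the label files index into that list.

```python
def render_data_yaml(train_path, val_path, names):
    """Build the text of a YOLOv5-style dataset config with train/val/nc/names."""
    lines = [
        "train: {}".format(train_path),
        "val: {}".format(val_path),
        "nc: {}".format(len(names)),      # number of classes
        "names: {!r}".format(names),      # order must match the class ids in the labels
    ]
    return "\n".join(lines) + "\n"


# Placeholder paths: replace with the absolute paths of your own list files or folders.
yaml_text = render_data_yaml(
    "/path/to/KAIST_processed/train.txt",
    "/path/to/KAIST_processed/val.txt",
    ["person", "people", "cyclist", "person?"],
)
print(yaml_text)
```

The printed text can be saved as data/KAIST.yaml. Deriving nc from the length of the names list keeps the two values from drifting apart.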
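③ In the models directory, pick the network you need and edit its config. I chose yolov5l, so the corresponding yolov5l.yaml is changed as follows:
[Screenshot: modified yolov5l.yaml]
Only the value of nc needs to change.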
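
If you prefer to script this change rather than edit the file by hand, a single regular-expression substitution is enough. This sketch assumes the model config contains a top-level line of the form `nc: 80`, as the stock YOLOv5 model files do; `set_num_classes` is an invented name.

```python
import re


def set_num_classes(yaml_text, nc):
    """Replace the value on the first top-level 'nc:' line, leaving the rest untouched."""
    new_text, count = re.subn(r"^nc:\s*\d+", "nc: {}".format(nc), yaml_text,
                              count=1, flags=re.MULTILINE)
    if count == 0:
        raise ValueError("no 'nc:' line found in the model config")
    return new_text


original = "nc: 80  # number of classes\ndepth_multiple: 1.0\n"
print(set_num_classes(original, 4))
```

Working on the text instead of parsing and re-dumping the YAML keeps the comments and layout of the original file intact.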

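IV. YOLO training arguments

[Screenshot: training arguments in train.py]
weights: the pretrained weights; pick the ones that match your model.
cfg: the yolov5l.yaml under models that was just modified.
data: the dataset config, here KAIST2.yaml (my first version had a problem, so I made a second one).
batch_size: I have two TITAN GPUs, so I set it to 64; with yolov5l that runs them at roughly 70% power.
device: selects the training devices. I set 0,1 explicitly, because otherwise, for some reason, my two GPUs would not train together...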
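
Put together, these settings correspond to a command line along the following lines. The sketch only assembles and prints the command; the file locations (`yolov5l.pt`, `models/yolov5l.yaml`, `data/KAIST2.yaml`) are assumptions about where the files sit relative to the YOLOv5 checkout, and `build_train_command` is a hypothetical helper.

```python
import shlex


def build_train_command(weights, cfg, data, batch_size, devices):
    """Assemble the argument list for YOLOv5's train.py from the settings above."""
    return [
        "python", "train.py",
        "--weights", weights,
        "--cfg", cfg,
        "--data", data,
        "--batch-size", str(batch_size),
        "--device", ",".join(str(d) for d in devices),
    ]


cmd = build_train_command("yolov5l.pt", "models/yolov5l.yaml",
                          "data/KAIST2.yaml", 64, [0, 1])
print(shlex.join(cmd))   # paste the printed line into a shell inside the YOLOv5 folder
```

Keeping the arguments in a list also makes it easy to pass them to `subprocess.run(cmd)` if you want to launch training from a script.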

That is basically it. I wrote this down as a record and summary of the experience, and I hope it offers a bit of help or reference to others.
