Training YOLOX on a Custom Dataset

科技军饷

Last modified: 2024-08-22 17:04:44

First published: 2024-07-31 17:14:15

Article link: https://blog.csdn.net/weixin_51612528/article/details/140826557

Contents

1. Image annotation
2. Format conversion
4. Summary

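1. Image annotation

The image dataset is annotated mainly with LabelImg, an open-source image annotation tool. It provides a user-friendly graphical interface for manually marking objects or regions in an image and writes the corresponding annotation files. It is commonly used in computer vision and machine learning projects, especially for object detection.
For how to use LabelImg, refer to the linked blogger's post. The tool also works on Ubuntu; I recommend creating a virtual environment with conda and installing LabelImg with pip.
Because the JSON that LabelImg generates does not fully match what YOLOX training expects, the images are annotated in VOC format here and the annotation files are then converted to COCO2017 format with Python. After that, you only need to put the label file and the dataset into the expected directory structure and adjust the training parameters to start training.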
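The directory structure mentioned above can be prepared with a few lines of Python. This is a minimal sketch, not the article's own procedure: it assumes the COCO-style layout that YOLOX's COCO dataset loader reads by default (an `annotations/` folder holding `instances_train2017.json` next to a `train2017/` image folder), and `build_coco_layout` is a hypothetical helper name. Check the paths against your YOLOX version and Exp file before relying on it.

```python
import shutil
import tempfile
from pathlib import Path


def build_coco_layout(root, json_src, image_src_dir, split="train2017"):
    """Copy one annotation file and its images into a COCO-style tree.

    Assumed layout (verify against your YOLOX version and Exp file):
        root/annotations/instances_<split>.json
        root/<split>/<image files>
    """
    root = Path(root)
    ann_dir = root / "annotations"
    img_dir = root / split
    ann_dir.mkdir(parents=True, exist_ok=True)
    img_dir.mkdir(parents=True, exist_ok=True)

    json_dst = ann_dir / f"instances_{split}.json"
    shutil.copy(json_src, json_dst)
    # Copy only image files; anything else in the source folder is skipped.
    for img in sorted(Path(image_src_dir).iterdir()):
        if img.suffix.lower() in {".png", ".jpg", ".jpeg"}:
            shutil.copy(img, img_dir / img.name)
    return json_dst, img_dir


# Demo on throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    src = tmp / "src"
    src.mkdir()
    (src / "000001.png").write_bytes(b"")
    (tmp / "train.json").write_text("{}", encoding="utf-8")
    json_dst, img_dir = build_coco_layout(tmp / "COCO", tmp / "train.json", src)
    print(json_dst.name, [p.name for p in img_dir.iterdir()])
```

A validation split would follow the same pattern with `split="val2017"`.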

A VOC-format label file looks like this:

<annotation>
	<folder>images</folder>
	<filename>000001.png</filename>
	<path>/home/build_datasets/images/000001.png</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>
		<width>1242</width>
		<height>375</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>car</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>386</xmin>
			<ymin>176</ymin>
			<xmax>422</xmax>
			<ymax>201</ymax>
		</bndbox>
	</object>
	<object>
		<name>car</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>597</xmin>
			<ymin>155</ymin>
			<xmax>631</xmax>
			<ymax>192</ymax>
		</bndbox>
	</object>
</annotation>
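To make the structure concrete, the sketch below reads the two `<object>` entries from a shortened copy of this sample and converts each VOC box (corner coordinates) into the COCO `[x, y, width, height]` form. The helper names `voc_box_to_coco` and `read_voc_objects` are illustrative only and do not come from LabelImg or YOLOX.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample annotation above (only the fields used here).
VOC_XML = """
<annotation>
    <filename>000001.png</filename>
    <size><width>1242</width><height>375</height><depth>3</depth></size>
    <object>
        <name>car</name>
        <bndbox><xmin>386</xmin><ymin>176</ymin><xmax>422</xmax><ymax>201</ymax></bndbox>
    </object>
    <object>
        <name>car</name>
        <bndbox><xmin>597</xmin><ymin>155</ymin><xmax>631</xmax><ymax>192</ymax></bndbox>
    </object>
</annotation>
"""


def voc_box_to_coco(xmin, ymin, xmax, ymax):
    """Convert VOC corner coordinates to a COCO [x, y, width, height] box."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def read_voc_objects(xml_text):
    """Return (class name, COCO bbox) pairs for every <object> in the XML."""
    root = ET.fromstring(xml_text.strip())
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        corners = [int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append((obj.find("name").text, voc_box_to_coco(*corners)))
    return objects


print(read_voc_objects(VOC_XML))
# [('car', [386, 176, 36, 25]), ('car', [597, 155, 34, 37])]
```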

2. Format conversion

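LabelImg writes one XML file per annotated image, whereas the COCO format stores the label information for many images in a single JSON file. The Python code below merges the information from multiple XML files into one JSON file, which satisfies YOLOX's training requirements.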
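As a rough picture of the target, here is what a COCO-style dictionary for the single sample image above might look like. It is a hand-written sketch with a hypothetical one-class `car` category and only the fields that matter for detection; it is not output produced by the article's script.

```python
import json

# Hypothetical one-class dataset built by hand from the sample XML above.
coco = {
    "images": [
        {"id": 0, "file_name": "000001.png", "width": 1242, "height": 375},
    ],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 0,
         "bbox": [386, 176, 36, 25], "area": 900, "iscrowd": 0},
        {"id": 1, "image_id": 0, "category_id": 0,
         "bbox": [597, 155, 34, 37], "area": 1258, "iscrowd": 0},
    ],
    "categories": [
        {"id": 0, "name": "car"},
    ],
}

# Every annotation refers to an image and a category by id.
image_ids = {img["id"] for img in coco["images"]}
category_ids = {cat["id"] for cat in coco["categories"]}
for ann in coco["annotations"]:
    assert ann["image_id"] in image_ids
    assert ann["category_id"] in category_ids

# The whole dataset serializes to one JSON document.
text = json.dumps(coco, indent=4)
print(len(coco["images"]), "image,", len(coco["annotations"]), "annotations")
```

The three top-level lists are linked only through ids, which is why annotation ids must be unique across the whole file rather than per image.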

The conversion code is as follows:

import xml.etree.ElementTree as ET
import json
import os


# Parse one XML file; each XML file holds the labels for one image.
def xml_to_coco_item(xml_file, image_id):
    tree = ET.parse(xml_file)
    root = tree.getroot()

    annotations = []
    # Adjust the classes to your dataset. The names and ids must match the
    # "categories" list in xml_files_to_coco_json. A class name that is missing
    # here (for example 'car' from the sample XML above) raises KeyError.
    category_id_map = {'pedestrian': 0, 'stoplight': 1, 'circle': 2, 'square': 3, 'triangle': 4}

    # Take one <object> element at a time; an image may contain several labeled objects.
    for obj in root.findall('object'):
        # Bounding-box element
        bbox = obj.find('bndbox')
        # Top-left corner
        xmin = int(bbox.find('xmin').text)
        ymin = int(bbox.find('ymin').text)
        # Bottom-right corner
        xmax = int(bbox.find('xmax').text)
        ymax = int(bbox.find('ymax').text)

        area = (xmax - xmin) * (ymax - ymin)
        category_id = category_id_map[obj.find('name').text]  # look up the id by class name

        # COCO annotation:
        # segmentation is null when no segmentation masks are labeled;
        # bbox is [top-left x, top-left y, box width, box height];
        # image_id is the index of the image in the dataset, starting at 0;
        # id is the index of the box within this image, starting at 0
        # (xml_files_to_coco_json overwrites it with a dataset-wide id).
        annotation = {
            "segmentation": None,
            "area": area,
            "iscrowd": 0,
            "image_id": image_id,
            "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],
            "category_id": category_id,
            "id": len(annotations)
        }

        annotations.append(annotation)

    return annotations


# Merge multiple XML files into one COCO-format JSON file.
def xml_files_to_coco_json(xml_dir, json_file):

    coco_format = {
        "images": [],
        "annotations": [],
        "categories": [
            {"id": 0, "name": "pedestrian", "supercategory": None},
            {"id": 1, "name": "stoplight", "supercategory": None},
            {"id": 2, "name": "circle", "supercategory": None},
            {"id": 3, "name": "square", "supercategory": None},
            {"id": 4, "name": "triangle", "supercategory": None}
        ]
    }

    # image_id is the index of the image being processed; 0 is the first one.
    image_id = 0
    annotation_id = 0  # dataset-wide annotation id counter

    # Iterate over all XML files; sorted() keeps the ids reproducible across runs.
    for xmlname in sorted(os.listdir(xml_dir)):
        if xmlname.endswith('.xml'):
            # Full path of the current XML file
            xml_path = os.path.join(xml_dir, xmlname)

            tree = ET.parse(xml_path)
            root = tree.getroot()

            filename = root.find('filename').text

            # file_name: name of the image file
            # height / width: image size, read from the <size> element of the XML
            # id: index of the image in the dataset
            image_info = {
                "license": None,
                "file_name": filename,
                "coco_url": None,
                "height": int(root.find('size/height').text),
                "width": int(root.find('size/width').text),
                "date_captured": None,
                "flickr_url": None,
                "id": image_id
            }

            # Read the annotations (labels) from this XML file
            annotations = xml_to_coco_item(xml_path, image_id)

            # Replace the per-image ids with dataset-wide unique ids
            for annotation in annotations:
                annotation['id'] = annotation_id
                annotation_id += 1

            coco_format["images"].append(image_info)
            coco_format["annotations"].extend(annotations)

            image_id += 1

    # Write the result to the JSON file (UTF-8, because ensure_ascii=False
    # may emit non-ASCII characters such as file names)
    with open(json_file, 'w', encoding='utf-8') as json_out:
        json.dump(coco_format, json_out, ensure_ascii=False, indent=4)


if __name__ == "__main__":

    xml_dir = 'path to your xml dir'
    json_file = 'path to your json file output dir'  # must include the JSON file name and its .json extension
    xml_files_to_coco_json(xml_dir, json_file)
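One thing to watch in the script above is that the class names are hard-coded twice (in `category_id_map` and in the `"categories"` list), and a label that is missing from the map stops the conversion with a `KeyError`. A possible way to avoid this, sketched here with the hypothetical helpers `collect_class_names` and `build_category_tables`, is to scan the XML files first and derive both tables from the names actually present. Ids are assigned in alphabetical order in this sketch, which may differ from an order you chose by hand.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def collect_class_names(xml_dir):
    """Return the sorted list of distinct <name> values in all XML files."""
    names = set()
    for fname in sorted(os.listdir(xml_dir)):
        if not fname.endswith(".xml"):
            continue
        root = ET.parse(os.path.join(xml_dir, fname)).getroot()
        for obj in root.findall("object"):
            names.add(obj.find("name").text)
    return sorted(names)


def build_category_tables(class_names):
    """Build the name-to-id map and the COCO "categories" list from one source."""
    category_id_map = {name: idx for idx, name in enumerate(class_names)}
    categories = [{"id": idx, "name": name, "supercategory": None}
                  for name, idx in category_id_map.items()]
    return category_id_map, categories


# Demo on two throwaway XML files.
with tempfile.TemporaryDirectory() as tmp:
    for i, cls in enumerate(["stoplight", "pedestrian"]):
        xml = f"<annotation><object><name>{cls}</name></object></annotation>"
        with open(os.path.join(tmp, f"{i}.xml"), "w", encoding="utf-8") as f:
            f.write(xml)
    id_map, cats = build_category_tables(collect_class_names(tmp))
    print(id_map)  # {'pedestrian': 0, 'stoplight': 1}
```

The resulting map and list could replace the two hard-coded tables, so that both always describe the same classes.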