Converting XML sample labels to txt

hi我是大嘴巴

Article link: https://blog.csdn.net/weixin_38740463/article/details/81867501

The annotation tool I use generates files like the following:

<?xml version='1.0' encoding='GB2312'?>
<info>
    <src width="480" height="640" depth="3">00ff0abc4818a309b51180264b830211.jpg</src>
    <object id="E68519DF-E8E1-4C55-9231-CB381DE1CC5A">
        <rect lefttopx="168" lefttopy="168" rightbottomx="313" rightbottomy="340"></rect>
        <type>21</type>
        <descriinfo></descriinfo>
        <modifydate>2018-05-08 17:04:07</modifydate>
    </object>
</info>
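
To make it clearer which attributes carry the coordinates, here is a minimal sketch that pulls every `<rect>` out of the sample above. It uses the standard-library `xml.etree.ElementTree` rather than lxml, and `extract_boxes` is a hypothetical helper name, not part of my script. The XML declaration is left out so that the sample can be parsed directly from a Python string.

```python
import xml.etree.ElementTree as ET

# The sample annotation shown above, minus the XML declaration.
SAMPLE = """<info>
    <src width="480" height="640" depth="3">00ff0abc4818a309b51180264b830211.jpg</src>
    <object id="E68519DF-E8E1-4C55-9231-CB381DE1CC5A">
        <rect lefttopx="168" lefttopy="168" rightbottomx="313" rightbottomy="340"></rect>
        <type>21</type>
        <descriinfo></descriinfo>
        <modifydate>2018-05-08 17:04:07</modifydate>
    </object>
</info>"""

# Attribute names of <rect>, in the order they should appear in label.txt.
RECT_KEYS = ("lefttopx", "lefttopy", "rightbottomx", "rightbottomy")


def extract_boxes(xml_text):
    """Hypothetical helper: return one (x1, y1, x2, y2) tuple per <rect>."""
    root = ET.fromstring(xml_text)
    boxes = []
    # "object/rect" matches every <rect> that sits directly under an <object>.
    for rect in root.findall("object/rect"):
        boxes.append(tuple(int(rect.get(key)) for key in RECT_KEYS))
    return boxes


if __name__ == "__main__":
    print(extract_boxes(SAMPLE))  # [(168, 168, 313, 340)]
```

Returning a list of tuples keeps every box when a file contains more than one `<object>`.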

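So I need to extract the detection-box coordinates from this file and arrange them into a standard form, one line per image with the image path followed by the top-left and bottom-right coordinates, and write the result to a label.txt file.
Given the XML format above, the conversion script is as follows: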

# -*- coding:utf-8 -*-

import os
from lxml import etree

##################### Read the XML file and return the top-left and bottom-right coordinates of the detection box ###################
def read_xml(in_path):
    tree = etree.parse(in_path)
    return tree

def find_nodes(tree, path):
    return tree.findall(path)

def get_obj(xml_path):
    # Returns [lefttopx, lefttopy, rightbottomx, rightbottomy] as ints.
    # If the file contains several <rect> elements, only the last one is kept;
    # if it contains none, an empty list is returned.
    tree = read_xml(xml_path)
    objects = []

    # Each <object> holds a <rect> whose attributes are the box corners.
    for obj in find_nodes(tree, "object"):
        for rec in find_nodes(obj, 'rect'):
            objects = [int(rec.get('lefttopx')), int(rec.get('lefttopy')),
                       int(rec.get('rightbottomx')), int(rec.get('rightbottomy'))]
    return objects

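################# Convert the XML information into the standard form ################
def listFile(data_dir, suffix):
    # Keep only the file names that end with the given suffix (e.g. ".jpg").
    # Other names are dropped from the list; nothing is deleted from disk.
    return [f for f in os.listdir(data_dir) if f.endswith(suffix)]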

def write_label(data_dir, xml_dir):
    images = listFile(data_dir, ".jpg")
    with open("label.txt", "w") as label:
        for image in images:
            image_path = data_dir + "/" + image
            # The annotation file shares the image's base name. This script
            # expects it to have a .txt extension even though it contains XML.
            xml_path = xml_dir + "/" + os.path.splitext(image)[0] + ".txt"
            objects = get_obj(xml_path)
            if len(objects) != 4:
                # No <rect> was found for this image, so there is nothing to write.
                continue
            line = image_path + " " + " ".join(str(v) for v in objects) + "\n"
            label.write(line)

################ Main ###################
if __name__ == '__main__':
    data_dir = "E:/MTCNN/Train/samples"
    xml_dir = "E:/MTCNN/Train/samples/annotation"
    write_label(data_dir, xml_dir)

The resulting label.txt looks like this:

E:/MTCNN/Train/samples/0019c3f356ada6bcda0b695020e295e6.jpg 102 87 311 417
E:/MTCNN/Train/samples/0043e38f303b247e50b9a07cb5887b39.jpg 156 75 335 295
E:/MTCNN/Train/samples/004e26290d2290ca87e02b737a740aee.jpg 105 122 291 381
E:/MTCNN/Train/samples/00ff0abc4818a309b51180264b830211.jpg 168 168 313 340
E:/MTCNN/Train/samples/015a7137173f29e2cd4663c7cbcad1cb.jpg 127 60 332 398
E:/MTCNN/Train/samples/0166ceba53a4bfc4360e1d12b33ecb61.jpg 149 82 353 378
E:/MTCNN/Train/samples/01e6deccb55b377985d2c4d72006ee34.jpg 185 100 289 249
E:/MTCNN/Train/samples/021e34448c0ed051db501156cf2b6552.jpg 204 91 359 289
......
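
As a quick sanity check on the generated file, the sketch below reads a label line back and verifies that the box is well formed. `parse_label_line` and `is_valid_box` are hypothetical helper names, and the only assumption is the line format shown above (image path followed by four integers).

```python
def parse_label_line(line):
    """Hypothetical helper: split one label.txt line into (image_path, box)."""
    # Split from the right so that spaces inside the image path are preserved.
    path, x1, y1, x2, y2 = line.strip().rsplit(" ", 4)
    return path, (int(x1), int(y1), int(x2), int(y2))


def is_valid_box(box):
    """Return True if the top-left corner is above and left of the bottom-right one."""
    x1, y1, x2, y2 = box
    return 0 <= x1 < x2 and 0 <= y1 < y2


if __name__ == "__main__":
    sample = "E:/MTCNN/Train/samples/00ff0abc4818a309b51180264b830211.jpg 168 168 313 340\n"
    path, box = parse_label_line(sample)
    print(path, box, is_valid_box(box))
```

Running this over every line of label.txt before training is a cheap way to catch annotations whose corners were swapped.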
