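Earlier, while preparing data for model training, I had to clean up the image data: the training images and their XML files are renamed to six-digit names (the original names may contain Chinese characters or otherwise not follow the usual convention). After renaming, the files are called 000001.jpg, 000001.xml, and so on.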
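As a minimal sketch of the naming scheme: a 1-based index is zero-padded to six digits and used as the shared stem for the image and its XML file. The helper name `new_stem` is only for illustration and is not part of the script.

```python
def new_stem(index):
    # zero-pad a 1-based index to six digits, e.g. 1 -> "000001"
    return "%06d" % index


for i in range(3):
    stem = new_stem(i + 1)
    # each image and its annotation share the same six-digit stem
    print(stem + ".jpg", stem + ".xml")
```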
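The code is below. Before running it, put the original images in an images folder and the corresponding XML files in a labels folder. Each image must be a .jpg with the same base name as its XML file: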
from xml.etree.ElementTree import ElementTree
from os import walk, path
import cv2
import os


def read_xml(in_path):
    tree = ElementTree()
    tree.parse(in_path)
    return tree


def write_xml(tree, out_path):
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
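

def get_path_prex(rootdir):
    # collect the path and the base name (without extension) of every xml file
    data_path = []
    prefixs = []
    for root, dirs, files in walk(rootdir, topdown=True):
        # sort so that the numbering within a folder is the same on every run
        for name in sorted(files):
            pre, ending = path.splitext(name)
            if ending != ".xml":
                continue
            data_path.append(path.join(root, name))
            prefixs.append(pre)
    return data_path, prefixs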


if __name__ == "__main__":
    # create the output folders used by VOC2007
    os.makedirs("Annotations", exist_ok=True)
    os.makedirs("JPEGImages", exist_ok=True)
    xml_paths, prefixs = get_path_prex("labels")
    for i in range(len(xml_paths)):
        new_name = "%06d" % (i + 1)
        # rename and save the corresponding xml, e.g. 000001.xml
        tree = read_xml(xml_paths[i])
        write_xml(tree, "Annotations/{}.xml".format(new_name))
        # rename and save the corresponding image, e.g. 000001.jpg
        img_path = path.join(os.getcwd(), "images", prefixs[i] + ".jpg")
        img = cv2.imread(img_path)
        if img is None:
            # cv2.imread returns None when the file is missing or unreadable
            print("cannot read image: " + img_path)
            continue
        cv2.imwrite("JPEGImages/{}.jpg".format(new_name), img)
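
Reading each image with cv2.imread and saving it with cv2.imwrite decodes and re-encodes the JPEG. If the images only need new names, a plain file copy may be enough. The following is a sketch of that alternative, not the method used in the script above: `copy_renamed` is a hypothetical helper that uses only the standard library and assumes every image is named `<stem>.jpg`.

```python
import os
import shutil


def copy_renamed(src_dir, dst_dir, stems, ext=".jpg"):
    """Copy src_dir/<stem><ext> to dst_dir/<six-digit index><ext>.

    Returns the stems whose source file was not found.
    """
    os.makedirs(dst_dir, exist_ok=True)
    missing = []
    for i, stem in enumerate(stems):
        src = os.path.join(src_dir, stem + ext)
        if not os.path.isfile(src):
            # remember the missing image instead of failing midway
            missing.append(stem)
            continue
        dst = os.path.join(dst_dir, "%06d%s" % (i + 1, ext))
        # byte-for-byte copy, so the image data is not re-encoded
        shutil.copyfile(src, dst)
    return missing
```

With the script's variables, this could be called as `copy_renamed("images", "JPEGImages", prefixs)`. The index still follows the order of the XML files, so image and annotation numbers stay aligned.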
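After running the script, two folders are generated, JPEGImages and Annotations, matching the two folders of the same name under VOC2007. They hold the renamed images and XML files respectively. Copy both folders into VOC2007, overwriting the existing ones.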
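
Note that the script changes only the file names, not the content of the XML files. Pascal VOC style annotations usually contain a `<filename>` element with the image name. If yours do, and your training code reads that element, it will still hold the old name. Below is a minimal sketch of updating it, assuming `<filename>` is a direct child of the root element. `set_filename` is a hypothetical helper, not part of the script above.

```python
import xml.etree.ElementTree as ET


def set_filename(tree, new_name):
    """Set the text of the root-level <filename> element.

    Returns True if the element exists and was updated, False otherwise.
    """
    node = tree.getroot().find("filename")
    if node is None:
        return False
    node.text = new_name
    return True


# Example with a small in-memory annotation
root = ET.fromstring("<annotation><filename>cat photo.jpg</filename></annotation>")
tree = ET.ElementTree(root)
set_filename(tree, "000001.jpg")
print(tree.getroot().find("filename").text)
```

In the script, this could be called between `read_xml` and `write_xml`, with the new image name as the second argument.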