Checking that JPG images and their XML label files correspond one-to-one
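After annotating your own dataset by hand, or having someone else annotate it, check two things: that every JPG file has a corresponding XML file (nothing was left unlabeled), and that every XML file has a corresponding JPG file (if an image was labeled and later deleted for being too blurry, an orphaned XML file is left behind).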
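As a minimal sketch of the idea, both checks come down to comparing the set of base names on the image side with the set on the annotation side. The helper name `find_unpaired` and the file names below are made up for illustration, and the sketch works on plain lists of names rather than real directories.

```python
from pathlib import PurePath


def find_unpaired(jpg_names, xml_names):
    """Return (jpgs_without_xml, xmls_without_jpg) as sorted lists of base names."""
    # Keep only the base name (stem) of files with the expected extension.
    jpg_stems = {PurePath(n).stem for n in jpg_names
                 if PurePath(n).suffix.lower() == ".jpg"}
    xml_stems = {PurePath(n).stem for n in xml_names
                 if PurePath(n).suffix.lower() == ".xml"}
    # Set differences give the unmatched files on each side.
    return sorted(jpg_stems - xml_stems), sorted(xml_stems - jpg_stems)


# Hypothetical file names, for illustration only.
missing_xml, missing_jpg = find_unpaired(
    ["0001.jpg", "0002.jpg", "0003.jpg"],
    ["0001.xml", "0003.xml", "0004.xml"],
)
print(missing_xml)  # ['0002']
print(missing_jpg)  # ['0004']
```

Here `0002` is an image that was never labeled, and `0004` is an annotation whose image no longer exists.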
Set JpgDir to the directory containing the JPG files and XmlDir to the directory containing the XML files. The code is as follows:
import os

# Directory of the XML annotations and directory of the JPG images
XmlDir = r"D:\Program Files (x86)\Pycharm\PycharmProject\pythonProject\yolo3-pytorch-master\VOCdevkit\VOC2007\Annotations"
JpgDir = r"D:\Program Files (x86)\Pycharm\PycharmProject\pythonProject\yolo3-pytorch-master\VOCdevkit\VOC2007\JPEGImages"

NoXml = []  # JPG files that have no matching XML
NoJpg = []  # XML files that have no matching JPG

# Every JPG should have an XML file with the same base name
for root, dirs, files in os.walk(JpgDir):
    for file in files:
        name, ext = os.path.splitext(file)
        if ext.lower() == ".jpg":
            if not os.path.exists(os.path.join(XmlDir, name + ".xml")):
                NoXml.append(os.path.join(root, file))

# Every XML should have a JPG file with the same base name
for root, dirs, files in os.walk(XmlDir):
    for file in files:
        name, ext = os.path.splitext(file)
        if ext.lower() == ".xml":
            if not os.path.exists(os.path.join(JpgDir, name + ".jpg")):
                NoJpg.append(os.path.join(root, file))

if len(NoXml) == 0:
    print("All jpg are labeled")
else:
    print("%d unlabeled" % len(NoXml))
    print(NoXml)

if len(NoJpg) == 0:
    print("All xml have a jpg")
else:
    print("%d xmls have no jpg" % len(NoJpg))
    print(NoJpg)
The output is as follows: