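While sorting through images annotated with labelImg, I found that a large number of them had never been labeled and therefore could not be used for training. So I wrote a script that keeps only the images whose file names match an existing label file: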
```python
'''
Description:
version: 1.0.0
Author: yuanruiyi
Date: Thu, 23 Feb 2023 09:57:00
LastEditors: yuanruiyi
LastEditTime: Thu, 23 Feb 2023 09:57:00
todo: keep only the images that have a label file with the same name
'''
# Usage: python fileFilter.py path1 path2 path3
#   path1  folder to be filtered (images)
#   path2  reference folder (label files)
#   path3  output folder
import os
import sys

import cv2
from rich.progress import track


def dirlist(path, allfile):
    """Recursively collect the paths of all files under `path` into `allfile`."""
    for filename in os.listdir(path):
        filepath = os.path.join(path, filename)
        if os.path.isdir(filepath):
            dirlist(filepath, allfile)
        else:
            allfile.append(filepath)
    return allfile


folderPath1 = sys.argv[1]
filePath1 = dirlist(folderPath1, [])
print("Folder to filter: " + folderPath1)

folderPath2 = sys.argv[2]
filePath2 = dirlist(folderPath2, [])
print("Reference folder: " + folderPath2)

folderPath3 = sys.argv[3]
os.makedirs(folderPath3, exist_ok=True)  # cv2.imwrite fails if the folder is missing
print("Output folder: " + folderPath3)

# Collect label file names without extension. splitext strips only the last
# suffix, and a set gives fast membership tests instead of a nested loop.
fileName2 = {os.path.splitext(os.path.basename(item))[0] for item in filePath2}

# Walk the images and save the ones whose name matches a label file
for path_in in track(filePath1):
    name, ext = os.path.splitext(os.path.basename(path_in))
    if ext.lower() != ".jpg" or name not in fileName2:
        continue
    img = cv2.imread(path_in)  # use the real path, including subfolders
    if img is None:  # unreadable or not a valid image
        continue
    # Show the current image
    # cv2.imshow("temp", img)
    # cv2.waitKey(1)
    path_out = os.path.join(folderPath3, name + ".jpg")
    print(path_out)
    cv2.imwrite(path_out, img)
```
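Reading each image with `cv2.imread` and writing it back with `cv2.imwrite` decodes and re-encodes it, which for JPEG is lossy and slower than a plain file copy. If the goal is only to collect the matching files, a minimal sketch could copy them byte for byte with `shutil.copy2` instead. The function names `stems_under` and `copy_labeled_images` and the default extension set are hypothetical choices, not part of the original script; unlike the script, this version needs neither cv2 nor rich.

```python
import os
import shutil

# Hypothetical default: file extensions treated as images (adjust as needed)
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def stems_under(folder):
    """Return the set of file names (without extension) found anywhere under folder."""
    stems = set()
    for _root, _dirs, files in os.walk(folder):
        for f in files:
            stems.add(os.path.splitext(f)[0])
    return stems


def copy_labeled_images(image_dir, label_dir, out_dir, exts=IMAGE_EXTS):
    """Copy every image whose stem matches a label file; return the copied paths."""
    labels = stems_under(label_dir)
    os.makedirs(out_dir, exist_ok=True)
    copied = []
    for root, _dirs, files in os.walk(image_dir):
        for f in sorted(files):
            stem, ext = os.path.splitext(f)
            if ext.lower() in exts and stem in labels:
                dst = os.path.join(out_dir, f)
                # copy2 copies the bytes plus timestamps, with no re-encoding
                shutil.copy2(os.path.join(root, f), dst)
                copied.append(dst)
    return copied
```

Usage would be along the lines of `copy_labeled_images("images", "labels", "out")`. Keep `out_dir` outside `image_dir`, otherwise the walk may revisit the copies.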
The overall approach of the script is based on this fellow blogger's post: Filtering Files with Duplicate Names.
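At its core, same-name filtering is a comparison between two sets of file stems. As a hedged sketch (the helper `split_by_label` and the sample file names are hypothetical, and the labels are assumed to be labelImg's Pascal VOC `.xml` files), the same comparison can also report which images are still unlabeled, which is exactly the group that cannot be used for training:

```python
import os


def split_by_label(image_names, label_names):
    """Split image file names into (labeled, unlabeled) by matching stems.

    A set makes each lookup O(1) on average, instead of the
    len(images) * len(labels) comparisons of a nested loop.
    """
    label_stems = {os.path.splitext(n)[0] for n in label_names}
    labeled, unlabeled = [], []
    for name in image_names:
        stem = os.path.splitext(name)[0]
        (labeled if stem in label_stems else unlabeled).append(name)
    return labeled, unlabeled


images = ["0001.jpg", "0002.jpg", "0003.jpg"]
labels = ["0001.xml", "0003.xml"]
labeled, unlabeled = split_by_label(images, labels)
print(labeled)    # ['0001.jpg', '0003.jpg']
print(unlabeled)  # ['0002.jpg']
```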
The filtering step uses the method from this post: Getting a File Name Without Its Extension.
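A quick illustration of why the choice matters (the helper names below are hypothetical): `split('.')[0]` cuts at the first dot, while `os.path.splitext` removes only the final extension. With the first approach, a file name that contains extra dots would be truncated and would no longer match its label file.

```python
import os


def stem_split(path):
    """Everything before the first dot of the file name."""
    return os.path.basename(path).split('.')[0]


def stem_splitext(path):
    """File name with only the last extension removed."""
    return os.path.splitext(os.path.basename(path))[0]


for p in ["images/0001.jpg", "images/cam1.2023.02.23.jpg"]:
    print(stem_split(p), stem_splitext(p))
# 0001 0001
# cam1 cam1.2023.02.23
```

`pathlib.Path(p).stem` behaves like `os.path.splitext` here and is an equally valid choice.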
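Note: the rich and cv2 (OpenCV) libraries must be installed for the script to run, e.g. with `pip install rich opencv-python`.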
If this post helped you, please give the project a star on GitHub (fileFilter)~