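Project scenario:
Use Python to traverse an image classification dataset directory and generate the corresponding label files.
Problem description:
The dataset ships without label files, so they have to be generated by traversing the class directories.
Dataset structure: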
modelnet40
  --train
    --airplane
      1.png
      2.png
      ...
    --bathtub
      1.png
      ...
    ...
  --test
    --airplane
      1.png
      2.png
    --bathtub
      1.png
      ...
    ...
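Solution:
Use Python's os module to traverse the dataset, read the entries in filename order, and write them to the label files.
Here I generate three files: train.txt for the training set, test.txt for the test set, and all.txt for everything.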
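One caveat about "filename order": Python's sorted() compares strings character by character, so 10.png sorts before 2.png. If numeric order matters for your files, a natural-sort key is one possible option. The sketch below is only an illustration; natural_key is a hypothetical helper name that does not appear in the script.

```python
import re


def natural_key(name):
    """Sort key that compares runs of digits as integers (hypothetical helper)."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so the odd positions are always the digit runs.
    parts = re.split(r'(\d+)', name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


files = ['10.png', '1.png', '2.png']
print(sorted(files))                   # ['1.png', '10.png', '2.png']
print(sorted(files, key=natural_key))  # ['1.png', '2.png', '10.png']
```

The script that follows uses plain sorted(); passing key=natural_key to those calls is optional and only changes the order of the lines, not the labels assigned to each class.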
import os


def traversal_and_write_img(path, mode):
    root = os.path.join(path, mode)  # ./modelnet40/train or ./modelnet40/test
    save = os.path.join(path, mode + '.txt')  # ./modelnet40/train.txt
    class_list = sorted(os.listdir(root))  # class folders under root, sorted by name
    # Open the label file in a with block so it is closed once writing finishes
    with open(save, 'w') as txt:
        for i, class_name in enumerate(class_list):  # i is the label of this class
            # All files in this class folder, sorted by name
            img_list = sorted(os.listdir(os.path.join(root, class_name)))
            for img in img_list:
                img_name = os.path.join(mode, class_name, img)  # relative image path: train/airplane/1.png
                # Image path and label separated by a space, terminated by a newline
                txt.write(img_name + ' ' + str(i) + '\n')
path = os.path.dirname(os.path.abspath(__file__))  # dataset root: ./modelnet40 (this script is placed there)
traversal_and_write_img(path, 'train')  # generate train.txt
traversal_and_write_img(path, 'test')  # generate test.txt

train_label = os.path.join(path, 'train.txt')
test_label = os.path.join(path, 'test.txt')
save = os.path.join(path, 'all.txt')
with open(train_label, 'r') as f:  # read train.txt
    train = f.readlines()
with open(test_label, 'r') as f:  # read test.txt
    test = f.readlines()
with open(save, 'w') as f:  # write all.txt: training lines first, then test lines
    for label in train:
        f.write(label)
    for label in test:
        f.write(label)
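To use the generated files later, each line can be split back into a relative image path and an integer label. The following is a minimal sketch that assumes the "path, space, label" line format written by the script above; parse_label_lines is a hypothetical helper name, not part of the script.

```python
def parse_label_lines(lines):
    """Turn lines like 'train/airplane/1.png 0' into (path, label) tuples."""
    samples = []
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        # Split on the last space only, so paths that contain spaces stay intact.
        img_path, label = line.rsplit(' ', 1)
        samples.append((img_path, int(label)))
    return samples


# Demo on two in-memory lines. With a real file, pass the open file object:
#     with open('train.txt') as f:
#         samples = parse_label_lines(f)
demo = parse_label_lines(['train/airplane/1.png 0\n', 'train/bathtub/1.png 1\n'])
print(demo)  # [('train/airplane/1.png', 0), ('train/bathtub/1.png', 1)]
```

Splitting from the right is a deliberate choice here: the label is always the last field, while the path may in principle contain spaces.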