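While training on a public dataset, I realized I only needed a few of its labels. I could not find a ready-made solution online, but one blog post, "yolov5只训练数据集中的某几个类别" (YOLOv5: train on only some of the classes in a dataset), gave me an idea. The second method in that post does not fit my case, because my dataset has four classes and I only want to use the last three. The author also notes that editing the label files by hand is tedious. So I wondered whether a small script could rewrite the label files quickly. Since I want the last three classes, the first step is to delete every row in each label file whose first value is 0.
The remaining rows then move up to fill the gap.
After that, the class ids of the last three classes are each reduced by one. Finally, update nc and the class names to match, and it is done.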
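To make the effect concrete, here is a minimal sketch of the same transformation applied to a few label rows held in memory. The function name `remap_label_lines` and the sample values are made up for illustration, and unlike the scripts in this post it rebuilds each row with single spaces rather than preserving the original spacing.

```python
def remap_label_lines(lines, drop_id=0):
    """Drop rows whose class id equals drop_id and shift higher ids down by one."""
    result = []
    for line in lines:
        parts = line.split()
        if not parts or not parts[0].isdigit():
            result.append(line)  # keep anything that is not a label row
            continue
        class_id = int(parts[0])
        if class_id == drop_id:
            continue  # this row belongs to the unwanted class
        if class_id > drop_id:
            class_id -= 1  # close the gap left by the removed class
        result.append(" ".join([str(class_id)] + parts[1:]) + "\n")
    return result


# Hypothetical label rows: class id followed by the box values.
sample = [
    "0 0.50 0.50 0.20 0.20\n",
    "1 0.30 0.40 0.10 0.10\n",
    "3 0.70 0.60 0.25 0.30\n",
]
converted = remap_label_lines(sample)
print("".join(converted), end="")
```

The row for class 0 disappears, class 1 becomes 0, and class 3 becomes 2, which is the result the two steps below are meant to produce on disk.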
The code is as follows.
It works in two steps.
Step 1: delete the class 0 rows and close the gap
import os


def process_file(file_path):
    with open(file_path, 'r') as file:
        lines = file.readlines()

    modified_lines = []
    for line in lines:
        parts = line.split()
        if len(parts) > 0 and parts[0].isdigit():
            x = int(parts[0])
            # Keep every row except those labelled with class 0
            if x != 0:
                modified_lines.append(line)
        else:
            # Rows that do not start with a class id are kept unchanged
            modified_lines.append(line)

    # Write modified content back to the original file
    with open(file_path, 'w') as file:
        file.writelines(modified_lines)


def process_folder(folder_path):
    # Process every .txt label file in the folder
    for filename in os.listdir(folder_path):
        if filename.endswith(".txt"):
            file_path = os.path.join(folder_path, filename)
            process_file(file_path)


if __name__ == "__main__":
    # Change this to the path of your folder; run the script once for each labels folder
    folder_path = "E:\\tmt\\val\\labels"
    process_folder(folder_path)
Step 2: reduce the remaining class ids by one
import os


def process_file(file_path):
    with open(file_path, 'r') as file:
        lines = file.readlines()

    modified_lines = []
    for line in lines:
        parts = line.split()
        if len(parts) > 0 and parts[0].isdigit():
            x = int(parts[0])
            if 1 <= x <= 3:
                # Replace the leading class id: 1 -> 0, 2 -> 1, 3 -> 2
                modified_line = str(x - 1) + line[len(parts[0]):]
            else:
                modified_line = line
        else:
            modified_line = line
        modified_lines.append(modified_line)

    # Write modified content back to the original file
    with open(file_path, 'w') as file:
        file.writelines(modified_lines)


def process_folder(folder_path):
    # Process every .txt label file in the folder
    for filename in os.listdir(folder_path):
        if filename.endswith(".txt"):
            file_path = os.path.join(folder_path, filename)
            process_file(file_path)


if __name__ == "__main__":
    # Change this to the path of your folder.
    # Run only once per folder, after step 1: a second run would shift the ids again.
    folder_path = "E:\\tmt\\train\\labels"
    process_folder(folder_path)
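After both steps have run, it may be worth checking that every class id left in the label files is below the new nc before training. The sketch below is one way to do that, not part of the original workflow. The helper names `count_class_ids` and `out_of_range_ids` are hypothetical, and the folder path simply reuses the one from the script above.

```python
import os
from collections import Counter


def count_class_ids(folder_path):
    """Count how often each class id appears across the .txt label files."""
    counts = Counter()
    for filename in os.listdir(folder_path):
        if not filename.endswith(".txt"):
            continue
        with open(os.path.join(folder_path, filename), "r") as file:
            for line in file:
                parts = line.split()
                if parts and parts[0].isdigit():
                    counts[int(parts[0])] += 1
    return counts


def out_of_range_ids(counts, nc):
    """Return the class ids that are not valid for a dataset with nc classes."""
    return sorted(class_id for class_id in counts if class_id >= nc)


if __name__ == "__main__":
    folder_path = "E:\\tmt\\train\\labels"  # assumed path, change to your folder
    if os.path.isdir(folder_path):
        counts = count_class_ids(folder_path)
        print("rows per class id:", dict(sorted(counts.items())))
        print("ids outside 0..2:", out_of_range_ids(counts, nc=3))
```

If the second list is not empty, some label file was probably skipped in step 2. If the counts look lower than expected for the highest class, step 2 may have been run twice on that folder.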