Ways to Traverse a Directory in Python

white-poplar

Category: Computer Vision Practice

Article link: https://blog.csdn.net/opencv_fjc/article/details/105581255

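Suppose the task is as follows:
Use Python to traverse an image folder at a given path and write the path of every file under that directory to a txt file.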

1. Traversing a folder with a function

import os
import sys


def listfiles(sourcedir, txtpath, label):
    # os.walk returns a generator that yields one (root, dirs, files)
    # tuple for sourcedir and for every directory below it.
    dir_count = 0   # number of directories under sourcedir
    file_count = 0  # total number of files under sourcedir
    # The with block closes the txt file even if an error occurs.
    with open(txtpath, 'w') as ftxtfile:
        for root, dirs, files in os.walk(sourcedir):
            dir_count += len(dirs)
            for file in files:
                filepath = os.path.join(root, file)
                file_count += 1
                # For image files, the label is written after the path.
                # A single space separates the two so the line can be parsed later.
                ftxtfile.write(filepath + ' ' + str(label) + '\n')
    print('The sourcedir has {} dirs,{} files'.format(dir_count, file_count))

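To run the function, pass three arguments: the source directory (sourcedir), the txt file to write (txtpath, given with its full location), and the label for the files (label).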

listfiles(sys.argv[1],sys.argv[2],sys.argv[3])
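Since the task is about an image folder, it may help to skip files that are not images. The following is a minimal sketch, not part of the original function: the name `list_image_files` and the extension set `IMAGE_EXTS` are assumptions chosen for illustration, and the set should be adjusted to the formats actually present in the dataset.

```python
import os

# Assumed set of image extensions (hypothetical; adjust to your dataset).
IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.bmp'}


def list_image_files(sourcedir):
    """Return the sorted paths of files under sourcedir whose extension is in IMAGE_EXTS."""
    paths = []
    for root, _dirs, files in os.walk(sourcedir):
        for name in files:
            # Compare the extension in lower case so .JPG and .jpg both match.
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS:
                paths.append(os.path.join(root, name))
    return sorted(paths)
```

Sorting the result makes the output independent of the order in which the file system returns directory entries.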

2. Traversing and processing a directory with a class

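The task is as follows:
Define a class that batch-processes the images in the folders under a given path. The class must provide these features:
(1) Convert all images to the jpg format.
(2) Build an image classification dataset from several image folders and generate a txt file in which each line has the form: image path followed by its label.
(3) Split the txt file into a training set and a test set at a ratio of 7:3, with the order shuffled randomly.
Analysis: because the task requires all images to be shuffled before the training and test sets are built, the following steps are needed:
(1) Traverse all the images.
(2) Convert all the images to one format (file extension): JPG.
(3) Shuffle all the images and build the dataset. Concretely, shuffle the order of all the image paths, and make sure the dataset covers every image class.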
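The shuffle-and-split step can be sketched on its own, independent of the class. This is an illustrative helper under stated assumptions: the name `split_lines` and the optional `seed` parameter are not from the original article, and the seed is only there to make a split repeatable.

```python
import random


def split_lines(lines, rate=0.7, seed=None):
    """Shuffle a copy of lines and split it into (train, test) according to rate."""
    shuffled = list(lines)
    # random.shuffle works in place and returns None, so a copy is shuffled
    # here and the caller's list is left untouched.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * rate)
    return shuffled[:n_train], shuffled[n_train:]
```

With `rate=0.7` the first 70% of the shuffled lines (rounded down) go to the training set and the rest to the test set, which corresponds to the 7:3 ratio in the task.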

import os
import random

import cv2  # OpenCV (opencv-python); used to read and rewrite the images


class GenerateDatasets():
    '''
    Initialization: needs the directory that holds the original dataset.
    '''
    def __init__(self, source_dir):
        self.root_dir = source_dir
        self.sub_dirs = []  # paths of the class folders under the source directory
        self.lines = []     # one "path label" line per image file

    def looksubdirs(self):
        '''
        Collect the class folders directly under the source directory
        and store them in self.sub_dirs.
        '''
        for root, dirs, files in os.walk(self.root_dir):
            for d in sorted(dirs):
                print('The current sub_dir is {}'.format(os.path.join(root, d)))
                self.sub_dirs.append(os.path.join(root, d))
            break  # top level only; reformat() walks each class folder itself

    def reformat(self):
        '''
        After looksubdirs(), self.sub_dirs holds the path of every class
        folder, so one pass over each of them is enough to unify the format.
        '''
        label = 0  # one subdirectory is one class; used to label the images
        for d in self.sub_dirs:
            for sub_root, sub_dirs, sub_files in os.walk(d):
                for name in sub_files:
                    source_file = os.path.join(sub_root, name)
                    print('The current file is {}'.format(source_file))
                    base, ext = os.path.splitext(source_file)
                    re_file = source_file
                    # Files ending in .jpg or .JPG are left as they are.
                    if ext.upper() != '.JPG':
                        img = cv2.imread(source_file)
                        if img is None:  # not an image OpenCV can read; skip it
                            continue
                        re_file = base + '.JPG'
                        print('The current file new name is {}'.format(re_file))
                        cv2.imwrite(re_file, img)  # the filename is the first argument
                        os.remove(source_file)  # delete the original image file
                    # The trailing newline keeps each entry on its own line in the txt file.
                    self.lines.append(re_file + ' ' + str(label) + '\n')
            label = label + 1  # one subdirectory is one class

    def generatetxe(self, trtxtpath, tstxtpath, rate):
        '''
        After reformat(), every image is in JPG format and has its label,
        so all that remains is to shuffle the lines and split them by rate.
        '''
        if len(self.lines):
            random.shuffle(self.lines)  # shuffles in place and returns None
            len_train = int(len(self.lines) * float(rate))
            with open(trtxtpath, 'w') as ftrain, open(tstxtpath, 'w') as ftest:
                ftrain.writelines(self.lines[:len_train])
                ftest.writelines(self.lines[len_train:])

        print('The dataset has been generated')

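To run it, pass the source directory, the training txt path, the test txt path, and the training ratio (for example 0.7) on the command line:

generage_data=GenerateDatasets(sys.argv[1])
generage_data.looksubdirs()
generage_data.reformat()
generage_data.generatetxe(sys.argv[2],sys.argv[3],float(sys.argv[4]))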
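To check the generated files, the lines can be read back and parsed. The sketch below assumes each line holds an image path and an integer label separated by a single space; the helper name `read_dataset` is hypothetical and is not part of the class above.

```python
def read_dataset(txtpath):
    """Parse a txt file of 'path label' lines into a list of (path, label) tuples."""
    samples = []
    with open(txtpath) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # Split on the last space only, so spaces inside the path survive.
            path, label = line.rsplit(' ', 1)
            samples.append((path, int(label)))
    return samples
```

Comparing the number of samples in the training and test files against the total number of images is a quick way to confirm that the split ratio came out as intended.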
