Printing a folder's directory structure in Python with a DFS approach

Article link: https://blog.csdn.net/qq_44614115/article/details/113089847

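This article shows how to use the os module to traverse the file system, count file types, and print the directory tree in depth-first search (DFS) order. File names are obtained with os.listdir (os.walk is considered but not used), file types are determined from the extension returned by os.path.splitext, the nesting level is tracked during the DFS, and the result is written to a CSV file so that a full output buffer is not a problem. The finished program counts file types and visualizes the directory structure.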


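Yesterday I used os.walk to write a fairly simple program that counts the types of files in a folder, and then wondered whether the same family of functions could print a directory tree. It turned out to be quite simple, so here are my notes.

First, a few basic problems need solving: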

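How to get the file names

The os.walk function

os.walk yields a three-element tuple (dirpath, dirnames, filenames) for every directory it visits. The third element lists the files directly inside that directory (not those in its subdirectories). We also need the names inside the subdirectories, nested under their parent. In principle os.walk can provide this, but we choose the simpler os.listdir function instead.

os.listdir(path) returns a list of the names of the files and folders contained in the given folder.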
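To make the difference concrete, here is a small sketch that builds a throwaway directory and inspects it with both functions. The directory layout and the helper name build_demo_tree are invented for this illustration and are not part of the original program.

```python
import os
import tempfile


def build_demo_tree(root):
    """Create root/a.txt and root/sub/b.md (a hypothetical layout for this demo)."""
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.md")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("demo")


with tempfile.TemporaryDirectory() as root:
    build_demo_tree(root)

    # os.listdir: names of files AND folders directly inside root, no recursion
    listed = sorted(os.listdir(root))

    # os.walk: one (dirpath, dirnames, filenames) tuple per directory, recursively
    walked = [
        (os.path.relpath(dirpath, root), sorted(dirnames), sorted(filenames))
        for dirpath, dirnames, filenames in os.walk(root)
    ]

print(listed)   # ['a.txt', 'sub']
print(walked)   # [('.', ['sub'], ['a.txt']), ('sub', [], ['b.md'])]
```

os.listdir gives one flat list per call, so the caller decides when to descend. That makes it easy to pass the current depth along explicitly, which is what the rest of this post relies on.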

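How to determine the file type

As mentioned in an earlier post, we can split the name with os.path.splitext and use the resulting extension to decide the type.

When we encounter a folder, os.path.isdir(current_file) tells us whether the current path is a folder. Its return value decides whether the entry needs further processing, that is, whether to treat the folder as a new top-level directory and traverse it in turn (recursion).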

    type_list = ['.md', '.csv', '.pdf', '.txt', '.png', '.html', '.xml', '.py', 'other', 'folder']
    type_count = [0] * len(type_list)
    
    try:
        # Look up the extension; list.index raises ValueError if it is not listed
        ind = type_list.index(os.path.splitext(files)[1])
    except ValueError:
        ind = 8  # index of 'other': unknown or missing extension
    type_count[ind] += 1
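A few edge cases of os.path.splitext are worth knowing before relying on it for classification. The sketch below uses made-up file names. The classify helper and its lowercasing step are my own additions for illustration, not something the original program does.

```python
import os

samples = ["notes.md", "archive.tar.gz", "README", ".gitignore", "photo.PNG"]
extensions = {name: os.path.splitext(name)[1] for name in samples}
print(extensions)
# {'notes.md': '.md', 'archive.tar.gz': '.gz', 'README': '', '.gitignore': '', 'photo.PNG': '.PNG'}

KNOWN = ('.md', '.csv', '.pdf', '.txt', '.png', '.html', '.xml', '.py')


def classify(name):
    """Return the known extension of name, or 'other' (hypothetical helper)."""
    # Lowercasing is an assumption added here so that '.PNG' matches '.png'
    ext = os.path.splitext(name)[1].lower()
    return ext if ext in KNOWN else 'other'


print(classify("photo.PNG"))  # .png
print(classify("README"))     # other
```

Only the last suffix counts, a leading dot does not start an extension, and the comparison against type_list is case-sensitive unless the extension is normalized first.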

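How to traverse and record the level

Since the output is a tree, two ways of traversing a tree come to mind: DFS and BFS. We want each directory's contents printed right below that directory. Every directory is a node, and we print what the node contains before moving on to its siblings, which is clearly the DFS idea. BFS would print all first-level entries first and only then the second-level ones, which is not what we need. So we use DFS, and to tell the levels apart we keep a marker that records the current DFS depth and use that depth when printing.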

    if os.path.isdir(current_file):  # the current path is a folder, so traverse it as a directory
        type_count[9] += 1
        get_list(current_file, index + 1, csv_write)
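The difference between the two orders is easier to see on a small in-memory tree. The following is a minimal sketch, assuming a nested dict stands in for the file system (a dict is a folder, None is a file). The tree contents and the function names dfs and bfs are made up for the demo.

```python
from collections import deque

# Hypothetical directory tree: a dict maps a folder name to its children,
# None marks a plain file.
tree = {
    "docs": {"guide.md": None, "img": {"logo.png": None}},
    "main.py": None,
}


def dfs(node, depth=0):
    """Depth-first: finish a folder's contents before moving to its siblings."""
    order = []
    for name, children in node.items():
        order.append((depth, name))
        if children is not None:
            order.extend(dfs(children, depth + 1))
    return order


def bfs(node):
    """Breadth-first: list every entry of one level before the next level."""
    order = []
    queue = deque((0, name, children) for name, children in node.items())
    while queue:
        depth, name, children = queue.popleft()
        order.append((depth, name))
        if children is not None:
            queue.extend((depth + 1, n, c) for n, c in children.items())
    return order


dfs_order = dfs(tree)
bfs_order = bfs(tree)

# Printing the DFS order with a depth prefix gives the tree layout
for depth, name in dfs_order:
    print("|---" * depth + name)
```

In the DFS order, guide.md and img follow docs directly, so the indented output reads as a tree. In the BFS order, main.py comes right after docs and the nested entries are separated from their parents.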

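The print buffer problem

If the directory contains many entries, the console output may fill the buffer. Writing the result to a file instead makes it easier to consult and also avoids the full buffer. A CSV file is used here.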

    def create_csv(pathname, index):
        path = "resList.csv"
        # newline='' lets the csv module control the line endings itself
        with open(path, 'w', newline='') as f:
            csv_write = csv.writer(f)
            get_list(pathname, index, csv_write)
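To see what ends up in the file, here is a small sketch that writes a few rows of the same shape into an in-memory buffer instead of resList.csv. The entry names are invented, and io.StringIO is only a stand-in for the real file.

```python
import csv
import io

# Rows shaped like the ones the program writes: one '|--------' cell per level,
# then the entry name. The names are invented for the demo.
rows = [
    ["|--------", "docs"],
    ["|--------", "|--------", "guide.md"],
    ["|--------", "main.py"],
]

buffer = io.StringIO(newline="")  # in-memory stand-in for the CSV file
writer = csv.writer(buffer)
writer.writerows(rows)

text = buffer.getvalue()
print(text)
```

Each depth marker lands in its own column, so in a spreadsheet the entry name shifts one column to the right per level. Opening the real file with newline='' follows the csv module's recommendation and avoids blank lines between rows on Windows.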

The complete code:

    import os
    import csv

    type_list = ['.md', '.csv', '.pdf', '.txt', '.png', '.html', '.xml', '.py', 'other', 'folder']
    type_count = [0] * len(type_list)


    def create_csv(pathname, index):
        path = "resList.csv"
        # newline='' lets the csv module control the line endings itself
        with open(path, 'w', newline='') as f:
            csv_write = csv.writer(f)
            get_list(pathname, index, csv_write)


    def get_list(pathname, index, csv_write):
        for files in os.listdir(pathname):
            # os.path.join picks the right separator on every platform
            current_file = os.path.join(pathname, files)
            # One '|--------' cell per level of depth, followed by the entry name
            list1 = ['|--------'] * index + [files]
            print(list1)
            csv_write.writerow(list1)
            if os.path.isdir(current_file):
                type_count[9] += 1  # count the folder itself
                # Each folder triggers a search one level deeper, hence index + 1
                get_list(current_file, index + 1, csv_write)
                continue  # a folder is not classified by extension
            try:
                ind = type_list.index(os.path.splitext(files)[1])
            except ValueError:
                ind = 8  # index of 'other': extension not in type_list
            type_count[ind] += 1  # tally the file type


    if __name__ == '__main__':
        current_dir = os.getcwd()
        create_csv(current_dir, 1)
        for name, count in zip(type_list, type_count):
            print(name + ':' + str(count))