Python: single-stepping through os.walk to understand topdown

In Python, walking a directory tree usually looks like this:

    import os
    from os.path import join, getsize
    for root, dirs, files in os.walk('python/Lib/email'):
        print root, "consumes",
        print sum([getsize(join(root, name)) for name in files]),
        print "bytes in", len(files), "non-directory files"
        if 'CVS' in dirs:
            dirs.remove('CVS')  # don't visit CVS directories

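root is the path of the directory currently being visited (the outermost folder on the first iteration), dirs is the list of subdirectory names directly under root, and files is the list of non-directory file names directly under root.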
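
The snippet above is Python 2. For readers on Python 3, here is a rough equivalent, offered as a sketch rather than the canonical form. The helper name `report_sizes` and the throwaway temp tree are my own assumptions, added only so the example runs anywhere.

```python
import os
import tempfile
from os.path import getsize, join


def report_sizes(top):
    """Return one (root, total_bytes, file_count) tuple per visited directory."""
    report = []
    for root, dirs, files in os.walk(top):
        total = sum(getsize(join(root, name)) for name in files)
        report.append((root, total, len(files)))
        if 'CVS' in dirs:
            dirs.remove('CVS')  # don't visit CVS directories
    return report


# Hypothetical demo tree: one file at the top and one inside a CVS folder.
with tempfile.TemporaryDirectory() as demo_top:
    os.mkdir(join(demo_top, 'CVS'))
    with open(join(demo_top, 'a.txt'), 'w') as f:
        f.write('hello')
    with open(join(demo_top, 'CVS', 'ignored.txt'), 'w') as f:
        f.write('skip me')
    demo_report = report_sizes(demo_top)

for root, total, count in demo_report:
    print(root, "consumes", total, "bytes in", count, "non-directory files")
```

Only the top directory shows up in the report, because CVS was removed from dirs before walk got a chance to descend into it.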


Reading the source today left me a bit lost, because it combines generators (yield) with recursion.

def walk(top, topdown=True, onerror=None, followlinks=False):
    
    import pdb             # I added these two lines myself to turn on single-step debugging.
    pdb.set_trace()        # n: next line, p xxx: print xxx, l: list the code around the current line
    
    islink, join, isdir = path.islink, path.join, path.isdir


    try:
        # Note that listdir and error are globals in this module due
        # to earlier import-*.
        names = listdir(top)
    except error, err:
        if onerror is not None:
            onerror(err)
        return

    dirs, nondirs = [], []
    for name in names:
        if isdir(join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)

    if topdown:
        yield top, dirs, nondirs
    for name in dirs:
        new_path = join(top, name)
        if followlinks or not islink(new_path):
            for x in walk(new_path, topdown, onerror, followlinks):
                yield x
    if not topdown:
        yield top, dirs, nondirs

My main question was about the topdown parameter:

When topdown is true, the caller can modify the dirnames list in-place
    (e.g., via del or slice assignment), and walk will only recurse into the
    subdirectories whose names remain in dirnames; this can be used to prune the
    search, or to impose a specific order of visiting. 

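So according to the docs, when topdown = True you can modify the list of subdirectories in place, and walk will only recurse into the ones that are still in the list, which cuts down the number of directories searched??

Well, that left me baffled. What on earth? The only way out was to single-step through it.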
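
Before stepping through, here is a small experiment on what the docstring claims. It is only a sketch in Python 3 syntax; the helper `visited_dirs` and the folder names `keep`, `skip`, and `deep` are made up for illustration.

```python
import os
import tempfile


def visited_dirs(top, skip_name=None, topdown=True):
    """Return the basenames of the directories os.walk yields under top."""
    visited = []
    for root, dirs, files in os.walk(top, topdown=topdown):
        visited.append(os.path.basename(root))
        if skip_name in dirs:
            dirs.remove(skip_name)  # in-place edit of the list walk handed us
    return visited


with tempfile.TemporaryDirectory() as tmp:
    top = os.path.join(tmp, 'top')
    os.makedirs(os.path.join(top, 'keep'))
    os.makedirs(os.path.join(top, 'skip', 'deep'))

    pruned = visited_dirs(top, skip_name='skip', topdown=True)
    not_pruned = visited_dirs(top, skip_name='skip', topdown=False)

print(sorted(pruned))      # ['keep', 'top']
print(sorted(not_pruned))  # ['deep', 'keep', 'skip', 'top']
```

With topdown=True the removed folder and everything under it are never visited. With topdown=False the same removal changes nothing, since the subdirectories have already been walked by the time the caller sees the list.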


    try:
        # Note that listdir and error are globals in this module due
        # to earlier import-*.
        names = listdir(top)
    except error, err:
        if onerror is not None:
            onerror(err)
        return

    dirs, nondirs = [], []
    for name in names:
        if isdir(join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)
This part just sorts the entries of the folder top into subdirectories (dirs) and files (nondirs). Nothing difficult here.


My directory layout is:
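
To poke at this step on its own, the loop can be lifted into a standalone function. This is a sketch in Python 3 syntax, not the library's code; the name `split_entries` and the temp folder contents are assumptions of mine.

```python
import os
import tempfile


def split_entries(top):
    """Split the entries of top into (dirs, nondirs), like the loop in walk."""
    dirs, nondirs = [], []
    for name in os.listdir(top):
        if os.path.isdir(os.path.join(top, name)):
            dirs.append(name)
        else:
            nondirs.append(name)
    return dirs, nondirs


with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, 'b'))
    with open(os.path.join(top, 'a.jnt'), 'w') as f:
        f.write('x')
    split_dirs, split_files = split_entries(top)

print(split_dirs, split_files)  # ['b'] ['a.jnt']
```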

E:\projects\myApp_emits\myApp
E:\projects\myApp_emits\myApp\a.jnt
E:\projects\myApp_emits\myApp\b
E:\projects\myApp_emits\myApp\b\c.txt

My calling code is:

import os
des_folder = 'e:/projects/myApp_emits/myApp'
a = os.walk(des_folder, topdown=True)

parent, dir, files = a.next()
print parent, dir, files

parent, dir, files = a.next()
print parent, dir, files


Next, look at the topdown=True case first. The source boils down to:

    if topdown:
        yield top, dirs, nondirs
    for name in dirs:
        new_path = join(top, name)
        if followlinks or not islink(new_path):
            for x in walk(new_path, topdown, onerror, followlinks):
                yield x

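A quick explanation:

Because os.walk contains yield, it is no longer an ordinary function: calling it returns a generator, which I'll call a. Calling a.next() over and over keeps handing back whatever follows each yield.

For example, with yield top, dirs, nondirs, a call to a.next() returns top, dirs, nondirs and then the whole generator is suspended. The next call to next() resumes execution at the statement right after that yield and runs until another yield is reached; if none is reached, the generator is finished. (Funny, how did I end up giving a yield tutorial...)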
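
The suspend-and-resume behavior is easy to see with a toy generator. This is a minimal sketch in Python 3 syntax, where the built-in next(gen) plays the role of Python 2's gen.next(); the names `two_steps` and `log` are invented for the demo.

```python
log = []


def two_steps():
    """A toy generator that records when each part of its body runs."""
    log.append('started')
    yield 1
    log.append('resumed after first yield')
    yield 2
    log.append('resumed after second yield')


gen = two_steps()            # creating the generator runs none of the body
log_before_first_next = list(log)

first = next(gen)            # runs up to the first yield, then suspends
second = next(gen)           # resumes right after the first yield
try:
    next(gen)                # no yield left: the generator finishes
    finished = False
except StopIteration:
    finished = True

print(log_before_first_next, first, second, finished)
print(log)
```

The log stays empty until the first next() call, and each later call picks up exactly where the previous yield left off.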


So with my calling code, the output is:

e:/projects/myApp_emits/myApp ['b'] ['a.jnt']
e:/projects/myApp_emits/myApp\b [] ['c.txt']


Now the topdown=False case. The source boils down to:

    for name in dirs:
        new_path = join(top, name)
        if followlinks or not islink(new_path):
            for x in walk(new_path, topdown, onerror, followlinks):
                yield x
    if not topdown:
        yield top, dirs, nondirs


The output is:

e:/projects/myApp_emits/myApp\b [] ['c.txt']
e:/projects/myApp_emits/myApp ['b'] ['a.jnt']


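Conclusion:

Comparing the two outputs, what topdown does is actually simple: with True the top-level directory is yielded first, with False the subdirectories are yielded first and the top-level directory comes last. That also explains the sentence in the docs. With topdown=True, yield top, dirs, nondirs runs before the for name in dirs loop, so whatever the caller removes from dirs in place is never recursed into; with topdown=False the recursion has already happened by the time dirs is yielded, so editing it changes nothing.


PS:

Go slowly when single-stepping through recursion, otherwise you get dizzy. This example is a gentle one. Try the yield + decorator code in tornado and you will lose your way in life within minutes.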

