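os.walk(directoryPath) takes a directory path as a string and walks the directory tree beneath it. For every directory it visits, it yields a 3-tuple: root (the path of the directory currently being visited), dirs (a list of the names of its subdirectories), and files (a list of the names of the files in it).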
Code 1-1.
import os
for root,dirs,files in os.walk('e://HIMYM//HIMYM-S5'):
    print root,dirs,files,'\n'
Output:
e://HIMYM//HIMYM-S5 [] ['How I Met Your Mother S05E01 Definitions 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E02 Double Date 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E03 Robin 101 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E04 The Sexless Innkeeper 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E05 Duel Citizenship 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E06 Bagpipes 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E07 The Rough Patch 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E08 The Playbook 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E10 The Window 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E11 The Last Cigarette Ever 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E12 Girls VS. Suits 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E13 Jenkins 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E14 The Perfect Week 720p WEB-DL DD5.1 H264-PeeWee.mkv', 'How I Met Your Mother S05E15 Rabbit or Duck 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E16 Hooked 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E17 Of Course 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E18 Say Cheese 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E19 Zoo or False 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E20 Home Wreckers 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E21 Twin Beds 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E22 Robots vs. Wrestlers 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E23 The Wedding Bride 720p WEB-DL DD5.1.mkv', 'How I Met Your Mother S05E24 Doppelgangers 720p WEB-DL DD5.1-PeeWee.mkv']
In Code 1-1, the argument 'e://HIMYM//HIMYM-S5' is a directory that contains only files and no subdirectories, so dirs is the empty list [].
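To see what dirs looks like when subdirectories do exist, here is a minimal sketch. It uses Python 3 syntax (unlike the Python 2 listings in this post) and builds a small temporary tree so it can run anywhere. The helper names build_sample_tree and walk_summary, and the file names a.txt, sub and b.txt, are made up for this illustration.

```python
import os
import tempfile


def build_sample_tree(base):
    """Create base/a.txt and base/sub/b.txt (hypothetical names for the demo)."""
    os.makedirs(os.path.join(base, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(base, rel), "w") as fp:
            fp.write("demo")


def walk_summary(top):
    """Return one (relative root, dirs, files) tuple per visited directory."""
    summary = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # sorting in place also makes the traversal order deterministic
        summary.append((os.path.relpath(root, top), list(dirs), sorted(files)))
    return summary


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_sample_tree(tmp)
        for rel_root, dirs, files in walk_summary(tmp):
            print(rel_root, dirs, files)
```

The top directory is reported first with dirs == ['sub'], and the walk then descends into sub, where dirs is empty again, just like in Code 1-1.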
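A common problem with os.walk in Python 2 is Chinese text: when a directory or file name contains Chinese characters, printing the returned lists shows escaped bytes instead of readable names, as below:
Code 1-2.
import os
for root,dirs,files in os.walk('E:\\WORK_FILE\\Python\\Python2'):
    print root,dirs,files,'\n'
Output: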
>>>
E:\WORK_FILE\Python\Python2 [] ['bkjw.py', 'calculator.py', 'cdclog.txt', 'cdctools.py', 'cdctools.pyc', 'class_login.py', 'class_test01.py', 'eight_queen.py', 'hehe.py', 'pycdc-v0.5.py', 'pyre_ebb9ce1c-e5e8-4219-a8ae-7ee620d5f9f1.png', 'renren.html', 'renren.py', 're_match.py', 're_test.py', 'szhxy\xd0\xde\xb8\xc4\xb0\xe6.py', 'szhxy\xd4\xad\xb0\xe6.py', 'table.html', 'test (2).py', 'test.py', 'test0.py', 'test1.py', 'YaYa', 'YaYa.html', 'YaYa.txt', 'yy1.py', 'yy2.py', '\xd5\xbb.py', '\xc0\xe0\xb5\xc4\xbc\xcc\xb3\xd0.py', '\xb1\xe0\xc2\xeb\xce\xca\xcc\xe2.py', '\xbc\xc7\xca\xc2\xb1\xbe.py']
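Solution: printing dirs and files directly, as in the code above, prints the repr of each list, which escapes every non-ASCII byte as \xNN. If you instead iterate over dirs and files and print each item on its own, the names display correctly:
Code 1-3:
import os
for root,dirs,files in os.walk('E:\\WORK_FILE\\Python\\Python2'):
    print 'root:' , root , '\n'
    print 'directory:\n'
    for directory in dirs:
        print directory , '\n'
    print 'file:\n'
    for f in files:
        print f , '\n'
Partial output: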
>>>
root: E:\WORK_FILE\Python\Python2
directory:
file:
....
szhxy修改版.py
szhxy原版.py
...
栈.py
类的继承.py
编码问题.py
记事本.py
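The difference between Code 1-2 and Code 1-3 comes down to printing the repr of a list versus printing each name as text. The sketch below illustrates this in Python 3 syntax, using two byte strings copied from the Code 1-2 output. The helper name decode_names is made up for this example, and GBK is an assumption: it is the usual ANSI code page on Simplified Chinese Windows, which matches the names shown in the Code 1-3 output.

```python
# Two raw file names copied from the Code 1-2 output.
raw_names = [b"\xd5\xbb.py", b"\xc0\xe0\xb5\xc4\xbc\xcc\xb3\xd0.py"]


def decode_names(names, encoding="gbk"):
    """Decode raw byte names to text; the GBK default is an assumption."""
    return [name.decode(encoding, errors="replace") for name in names]


if __name__ == "__main__":
    # Printing the list shows the repr of each item: bytes escaped as \xNN.
    print(raw_names)
    # Printing decoded items one by one shows readable names.
    for name in decode_names(raw_names):
        print(name)
```

Nothing is wrong with the data in either case; only the way it is displayed differs.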
Practical example (building on Code 1-3):
# -*- coding: utf-8 -*-
import os
import chardet
import re
#
# @param file_list  a list whose items are all strings
# Purpose: format the names as one string, five per line, each followed by '|#|'
#
def list2str(file_list):
    if file_list==[]:
        return 'null'
    tmp_file=''
    i=0
    for name in file_list:
        if i%5==0 and i!=0:
            tmp_file+='\n'
        tmp_file+=(name+'|#|')
        i+=1
    return tmp_file
#
# param directory  the directory to walk
#       save_file  the file in which the results of the walk are saved
#
def fileWalker(directory,save_file):
    fp=open(save_file,'w')
    for root,dirs,files in os.walk(directory):
        dirs=list2str(dirs)
        files=list2str(files)
        tmp='rootdir:'+root+'\n'+'dirs----'+dirs+'\n'+'files----'+files+'\n'
        fp.write(tmp)
        fp.write('+'*20+'\n'+'+'*20+'\n')
    fp.close()
#
# param directory  the directory to search
#       keyword    the keyword to look for
# Returns every directory and file under directory (at any depth) whose name contains keyword
#
def Grep(directory,keyword):
    tmp_dir=''
    for root,dirs,files in os.walk(directory):
        # Unused regex-based alternative, kept for reference:
        # dirs=list2str(dirs)
        # files=list2str(files)
        # re_find=re.compile(keyword)
        # re_find.findall(dirs)
        if chardet.detect(keyword)['encoding']!='ascii':
            # Non-ASCII keyword: decode both sides to unicode before comparing
            for dir_name in dirs:
                if chardet.detect(dir_name)['encoding']=='GB2312':
                    if keyword.decode('utf8') in dir_name.decode('GB2312'):
                        tmp_dir+=('d:'+root+'\\'+dir_name+'\n')
            for file_name in files:
                code=chardet.detect(file_name)['encoding']
                try:
                    if keyword.decode('utf8') in file_name.decode(code):
                        tmp_dir+=('f:'+root+'\\'+file_name+'\n')
                except (UnicodeError,LookupError,TypeError):
                    # Skip names whose encoding cannot be detected or decoded
                    pass
        else:
            # ASCII keyword: a plain byte-string comparison is enough
            for dir_name in dirs:
                if keyword in dir_name:
                    tmp_dir+=('d:'+root+'\\'+dir_name+'\n')
            for file_name in files:
                if keyword in file_name:
                    tmp_dir+=('f:'+root+'\\'+file_name+'\n')
    return tmp_dir
if __name__=="__main__":
    # directory="E:\\BaiduYunDownload"
    # fileWalker(directory,"E:\\WORK_FILE\\Python\\Python2\\cdclog.txt")
    dirs=Grep('E:\\BaiduYunDownload','韩寒')
    print dirs
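For comparison, here is a rough Python 3 sketch of the same name search. In Python 3, os.walk returns str (Unicode) names when it is given a str path, so the chardet detection and manual decoding used in Grep are not needed. This is not a port of the code above: the function name grep_names is made up, it returns a list rather than one string, and the demo tree uses hypothetical ASCII names.

```python
import os
import tempfile


def grep_names(directory, keyword):
    """Return 'd:<path>' / 'f:<path>' entries for names containing keyword."""
    matches = []
    for root, dirs, files in os.walk(directory):
        # Names are already Unicode str objects, so 'in' works for any language.
        matches += ["d:" + os.path.join(root, d) for d in dirs if keyword in d]
        matches += ["f:" + os.path.join(root, f) for f in files if keyword in f]
    return matches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical demo tree: one matching directory, one matching file.
        os.makedirs(os.path.join(tmp, "han_books"))
        for name in ("han.txt", "other.txt"):
            with open(os.path.join(tmp, name), "w") as fp:
                fp.write("demo")
        for line in grep_names(tmp, "han"):
            print(line)
```

os.path.join is used instead of concatenating '\\' by hand so that the paths are also correct on non-Windows systems.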