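Python Basics: the fileinput Module

Reposted from: Introduction to the fileinput Module in Python

The fileinput module iterates over the lines of one or more files.

Its input() function resembles the readlines() method of a file object, with one difference:

input() returns an iterable object that produces one line at a time, so it is consumed with a for loop.

readlines() reads every line into a list in one go. For large files, input() therefore uses far less memory.

fileinput makes it convenient to loop over files, format their output, and search or replace text in them.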
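
To make the comparison with readlines() concrete, here is a minimal sketch. It assumes a throwaway temporary file, and the helper names make_sample, read_eagerly and read_lazily are invented for this illustration. Both approaches return the same lines; the difference is that readlines() builds the whole list up front, while fileinput.input() hands out one line per loop step.

```python
import fileinput
import os
import tempfile


def make_sample(lines):
    """Write the given lines to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.writelines(line + "\n" for line in lines)
    return path


def read_eagerly(path):
    """readlines() builds the complete list of lines in memory at once."""
    with open(path) as handle:
        return handle.readlines()


def read_lazily(path):
    """fileinput.input() yields one line per iteration step."""
    with fileinput.input(files=[path]) as stream:
        for line in stream:
            yield line


sample_path = make_sample(["Python", "Java", "C/C++", "Shell"])
eager = read_eagerly(sample_path)
lazy = list(read_lazily(sample_path))  # list() is used here only to compare results
os.remove(sample_path)
print(eager == lazy)
```

In real code the generator would be consumed line by line rather than collected into a list, which is where the memory saving comes from.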

【Typical usage】

import fileinput
for line in fileinput.input():
    process(line)

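【Basic form】

fileinput.input([files[, inplace[, backup[, bufsize[, mode[, openhook]]]]]])

【Default values】

fileinput.input(files=None, inplace=False, backup='', bufsize=0, mode='r', openhook=None)

  1. files: # a path or a list of paths, e.g. ['1.txt', '2.txt', ...]; when omitted, the names in sys.argv[1:] are used, and stdin if there are none
  2. inplace: # whether standard output is written back to the file being read; off by default
  3. backup: # extension of the backup file, e.g. '.bak' (extension only); an existing backup file with that name is overwritten silently
  4. bufsize: # buffer size, 0 by default; the default is normally fine. This is the Python 2 signature: Python 3.6 and later ignore bufsize, and Python 3.8 removed it
  5. mode: # file mode, read-only text ('r') by default; only read modes such as 'r' and 'rb' are accepted
  6. openhook: # a function that takes (filename, mode) and returns the opened file object, used to control how every file is opened (for example its encoding); it cannot be combined with inplace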
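
The inplace and backup parameters are easiest to understand by trying them on a file that does not matter. The following is a minimal sketch under that assumption: the scratch directory and the file name demo.txt are made up for the illustration.

```python
import fileinput
import os
import tempfile

# Work in a scratch directory so that no real file is touched.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "demo.txt")  # hypothetical file name
with open(target, "w") as handle:
    handle.write("first\nsecond\n")

# With inplace=True, print() writes into the file instead of the console;
# backup=".bak" keeps the original content in demo.txt.bak.
with fileinput.input(files=[target], inplace=True, backup=".bak") as stream:
    for line in stream:
        print(line.rstrip("\n").upper())

with open(target) as handle:
    rewritten = handle.read()
with open(target + ".bak") as handle:
    original = handle.read()
```

After the loop, rewritten holds the upper-cased text and original holds the untouched content from the backup file. Without the backup argument, the temporary backup is deleted once the file has been processed.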
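【Common functions】

  1. fileinput.input() # returns an object that can be iterated with a for loop
  2. fileinput.filename() # returns the name of the file currently being read
  3. fileinput.lineno() # returns the cumulative number of the line just read, counted across all files
  4. fileinput.filelineno() # returns the number of the line just read within the current file
  5. fileinput.isfirstline() # checks whether the line just read is the first line of its file
  6. fileinput.isstdin() # checks whether the line just read came from stdin
  7. fileinput.close() # closes the sequence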
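
The difference between lineno() and filelineno() only shows up when more than one file is read. As a small sketch, assuming two made-up scratch files a.txt and b.txt with two lines each, the state functions can be recorded per line like this:

```python
import fileinput
import os
import tempfile

# Create two small scratch files (names are hypothetical).
workdir = tempfile.mkdtemp()
paths = []
for name, body in [("a.txt", "a1\na2\n"), ("b.txt", "b1\nb2\n")]:
    path = os.path.join(workdir, name)
    with open(path, "w") as handle:
        handle.write(body)
    paths.append(path)

# Record the module-level state after each line is read.
records = []
with fileinput.input(files=paths) as stream:
    for line in stream:
        records.append((
            os.path.basename(fileinput.filename()),
            fileinput.lineno(),        # keeps counting across files
            fileinput.filelineno(),    # restarts at 1 for each file
            fileinput.isfirstline(),
            fileinput.isstdin(),
        ))

for record in records:
    print(record)
```

For the first line of b.txt, lineno() reports 3 while filelineno() reports 1.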

【Common examples】

  • Example 01: read every line of a file with fileinput

    >>> import fileinput
    >>> for line in fileinput.input('data.txt'):
    ...     print(line, end='')
    ...
    # Output
    Python
    Java
    C/C++
    Shell
From the command line:

    # test.py
    import fileinput
    for line in fileinput.input():
        # end='' because each line already ends with a newline
        print(fileinput.filename(), '|', 'Line Number:', fileinput.lineno(), '|:', line, end='')

    c:>python test.py data.txt
    data.txt | Line Number: 1 |: Python
    data.txt | Line Number: 2 |: Java
    data.txt | Line Number: 3 |: C/C++
    data.txt | Line Number: 4 |: Shell
  • Example 02: process several files with fileinput and modify them in place

    #---Sample files---
    c:\Python27>type 1.txt
    first
    second
    c:\Python27>type 2.txt
    third
    fourth
    #---Sample files---
    # test.py
    import fileinput

    def process(line):
        return line.rstrip() + ' line'

    # inplace=True redirects print() into the file being read
    for line in fileinput.input(['1.txt', '2.txt'], inplace=True):
        print(process(line))
    #---Result---
    c:\Python27>type 1.txt
    first line
    second line
    c:\Python27>type 2.txt
    third line
    fourth line
    #---Result---
From the command line:

    # test.py
    import fileinput

    def process(line):
        return line.rstrip() + ' line'

    # With no files argument, the file names come from the command line
    for line in fileinput.input(inplace=True):
        print(process(line))
    # Run it with
    c:\Python27>python test.py 1.txt 2.txt
  • Example 03: replace text in a file with fileinput and keep a backup of the original

    # Sample file:
    # data.txt
    Python
    Java
    C/C++
    Shell
    # FileName: test.py
    import fileinput
    for line in fileinput.input('data.txt', backup='.bak', inplace=True):
        print(line.rstrip().replace('Python', 'Perl'))  # or: print(line.replace('Python', 'Perl'), end='')
    # Final result:
    # data.txt
    Perl
    Java
    C/C++
    Shell
    # The original content is kept in:
    # data.txt.bak

    # In-place editing is comparable to redirecting stdin and stdout by hand,
    # as with the following script, which prefixes every line with 'Tag:'
    import fileinput
    for line in fileinput.input():
        print('Tag:', line, end='')
    #---Test run:
    d:\>python Learn.py < data.txt > data_out.txt
 
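  • Example 04: convert a CRLF file to LF with fileinput

    import fileinput
    import sys
    # Convert Windows/DOS text files to the Linux line-ending format.
    # Binary mode keeps Python 3 from translating "\r\n" to "\n" while reading.
    for line in fileinput.input(inplace=True, mode='rb'):
        if line[-2:] == b"\r\n":
            line = line[:-2] + b"\n"  # drop the CR, keep a single LF
        sys.stdout.write(line)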
  • Example 05: simple file processing with fileinput

    # FileName: test.py
    import sys
    import fileinput
    for line in fileinput.input(r'C:\Python27\info.txt'):
        sys.stdout.write('=> ')
        sys.stdout.write(line)
    # Output
    >>>
    => The Zen of Python, by Tim Peters
    =>
    => Beautiful is better than ugly.
    => Explicit is better than implicit.
    => Simple is better than complex.
    => Complex is better than complicated.
    => Flat is better than nested.
    => Sparse is better than dense.
    => Readability counts.
    => Special cases aren't special enough to break the rules.
    => Although practicality beats purity.
    => Errors should never pass silently.
    => Unless explicitly silenced.
    => In the face of ambiguity, refuse the temptation to guess.
    => There should be one-- and preferably only one --obvious way to do it.
    => Although that way may not be obvious at first unless you're Dutch.
    => Now is better than never.
    => Although never is often better than *right* now.
    => If the implementation is hard to explain, it's a bad idea.
    => If the implementation is easy to explain, it may be a good idea.
    => Namespaces are one honking great idea -- let's do more of those!
  • Example 06: batch-process files with fileinput

    #---Test files: test.txt test1.txt test2.txt test3.txt---
    #---Script: test.py---
    import fileinput
    import glob
    # sorted() fixes the file order; glob.glob() returns names in arbitrary order
    for line in fileinput.input(sorted(glob.glob("test*.txt"))):
        if fileinput.isfirstline():
            print('-' * 20, 'Reading %s...' % fileinput.filename(), '-' * 20)
        print(str(fileinput.lineno()) + ': ' + line.upper(), end='')
    #---Output:
    >>>
    -------------------- Reading test.txt... --------------------
    1: AAAAA
    2: BBBBB
    3: CCCCC
    4: DDDDD
    5: FFFFF
    -------------------- Reading test1.txt... --------------------
    6: FIRST LINE
    7: SECOND LINE
    -------------------- Reading test2.txt... --------------------
    8: THIRD LINE
    9: FOURTH LINE
    -------------------- Reading test3.txt... --------------------
    10: THIS IS LINE 1
    11: THIS IS LINE 2
    12: THIS IS LINE 3
    13: THIS IS LINE 4
  • Example 07: log analysis with fileinput and re: extract every line that contains a date

    #---Sample file: error.log---
    aaa
    1970-01-01 13:45:30 Error: **** Due to System Disk space not enough...
    bbb
    1970-01-02 10:20:30 Error: **** Due to System Out of Memory...
    ccc
    #---Test script---
    import re
    import fileinput
    import sys
    pattern = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
    # inplace=True rewrites error.log with the matching lines only;
    # the original content is kept as error.log.bak
    for line in fileinput.input('error.log', backup='.bak', inplace=True):
        if re.search(pattern, line):
            sys.stdout.write("=> ")
            sys.stdout.write(line)
    #---Result (error.log after the run)---
    => 1970-01-01 13:45:30 Error: **** Due to System Disk space not enough...
    => 1970-01-02 10:20:30 Error: **** Due to System Out of Memory...
  • Example 08: analysis with fileinput and re: extract the phone numbers that meet a condition

    #---Sample file: phone.txt---
    010-110-12345
    800-333-1234
    010-99999999
    05718888888
    021-88888888
    #---Test script: test.py---
    import re
    import fileinput
    # Extract numbers with area code 010 or 021, in the form 010-12345678.
    # A group (010|021) is required; [010|021] would be a character class.
    pattern = r'(010|021)-\d{8}'
    for line in fileinput.input('phone.txt'):
        if re.search(pattern, line):
            print('=' * 50)
            print('Filename:' + fileinput.filename() + ' | Line Number:' + str(fileinput.lineno()) + ' | ' + line, end='')
    #---Output:---
    >>>
    ==================================================
    Filename:phone.txt | Line Number:3 | 010-99999999
    ==================================================
    Filename:phone.txt | Line Number:5 | 021-88888888
    >>>
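
A note on the pattern syntax used for the area codes: inside square brackets, `|` is a literal character, so a character class such as `[010|021]` matches one single character out of 0, 1, 2 and |, not the prefix 010 or 021. The sketch below compares it with a grouped alternation. The last entry in the list, 2-12345678, is a made-up input added only to expose the difference; the other numbers are the sample data above.

```python
import re

numbers = ["010-110-12345", "800-333-1234", "010-99999999",
           "05718888888", "021-88888888", "2-12345678"]

char_class = re.compile(r"[010|021]-\d{8}")  # one character out of 0, 1, 2, |
grouped = re.compile(r"(?:010|021)-\d{8}")   # the literal prefix 010 or 021

class_hits = [n for n in numbers if char_class.search(n)]
group_hits = [n for n in numbers if grouped.search(n)]

print(class_hits)
print(group_hits)
```

On the sample data alone both patterns select the same two numbers, which is why the problem is easy to miss; only the grouped form rejects 2-12345678.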
  • Example 09: a grep-like tool built on fileinput

    import sys
    import re
    import fileinput
    pattern = re.compile(sys.argv[1])
    # sys.argv[2:] covers every file name the shell expands from *.py
    for line in fileinput.input(sys.argv[2:]):
        if pattern.search(line):  # search() matches anywhere in the line, as grep does
            print(fileinput.filename(), fileinput.filelineno(), line, end='')
    $ ./test.py 'import.*re' *.py
    # Find every line matching import.*re in all .py files
    addressBook.py 2 import re
    addressBook1.py 10 import re
    addressBook2.py 18 import re
    test.py 238 import re
  • Example 10: regular-expression substitution with fileinput

    #---Sample input: input.txt
    * [Learning Python](#author:Mark Lutz)
    #---Script: test.py
    import fileinput
    import re
    for line in fileinput.input():
        # Turn "* [title](#anchor)" into an <h2> element whose id is the anchor
        line = re.sub(r'\* \[(.*)\]\(#(.*)\)', r'<h2 id="\2">\1</h2>', line.rstrip())
        print(line)
    #---Output:
    c:\Python27>python test.py input.txt
    <h2 id="author:Mark Lutz">Learning Python</h2>

  • Example 11: regular-expression substitution with fileinput, swapping fields within a line

    #---Sample input: test.txt
    [@!$First]&[*%-Second]&[Third]
    #---Script: test.py
    import re
    import fileinput
    regex = re.compile(r'^([^&]*)(&)([^&]*)(&)([^&]*)')
    # The line is split on &; the goal is to swap [@!$First] and [*%-Second]
    for line in fileinput.input('test.txt', inplace=True, backup='.bak'):
        print(regex.sub(r'\3\2\1\4\5', line), end='')
    #---Output:
    [*%-Second]&[@!$First]&[Third]
  • Example 12: replace text with fileinput according to command-line arguments (argv)

    #---Sample data: host.txt
    # localhost is used to configure the loopback interface
    # when the system is booting. Do not change this entry.
    127.0.0.1 localhost
    192.168.100.2 www.test2.com
    192.168.100.3 www.test3.com
    192.168.100.4 www.test4.com
    #---Script: test.py
    import sys
    import fileinput
    source = sys.argv[1]
    target = sys.argv[2]
    files = sys.argv[3:]
    # Decode the opened files as gb2312 (a Chinese character set).
    # openhook cannot be combined with inplace, so the result goes to the console.
    for line in fileinput.input(files, openhook=fileinput.hook_encoded("gb2312")):
        line = line.rstrip().replace(source, target)
        print(line)
    #---Output:
    c:\>python test.py 192.168.100 127.0.0 host.txt
    # Every 192.168.100 in host.txt is replaced with 127.0.0
    # localhost is used to configure the loopback interface
    # when the system is booting. Do not change this entry.
    127.0.0.1 localhost
    127.0.0.2 www.test2.com
    127.0.0.3 www.test3.com
    127.0.0.4 www.test4.com
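
The sample host.txt above contains only ASCII text, so the effect of hook_encoded is not visible there. As a hedged illustration, the sketch below writes a scratch file in gb2312 that does contain Chinese characters and reads it back through the hook. The file name hosts_gb.txt and its single line are invented for the example.

```python
import fileinput
import os
import tempfile

# Write a small gb2312-encoded scratch file (name and content are hypothetical).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "hosts_gb.txt")
with open(path, "w", encoding="gb2312") as handle:
    handle.write("192.168.100.2 测试主机\n")

# hook_encoded() makes fileinput decode each file with the given encoding.
decoded = []
with fileinput.input(files=[path],
                     openhook=fileinput.hook_encoded("gb2312")) as stream:
    for line in stream:
        decoded.append(line.rstrip("\n").replace("192.168.100", "127.0.0"))

print(decoded)
```

Reading the same file without the hook would fall back to the platform's default encoding, which may fail or produce garbled text if that default is not gb2312-compatible.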






