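The fileinput module lets you iterate over the contents of one or more files.
Its input() function resembles a file object's readlines() method, with one difference:
input() returns an iterable that produces one line at a time and is consumed with a for loop,
whereas readlines() reads every line into memory at once. For large files the former therefore needs far less memory.
fileinput makes it convenient to loop over files, format their output, and search or replace text.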
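The difference between the two approaches can be sketched as follows. This is a minimal illustration written for Python 3; the temporary file and the names `sample_path`, `all_lines` and `lazy_lines` are assumptions made up for the demonstration.

```python
import fileinput
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "data.txt")
with open(sample_path, "w") as f:
    f.write("Python\nJava\nC/C++\nShell\n")

# readlines() builds the complete list of lines in memory at once.
with open(sample_path) as f:
    all_lines = f.readlines()

# fileinput.input() hands back one line per loop iteration.
lazy_lines = []
with fileinput.input(files=[sample_path]) as stream:
    for line in stream:
        lazy_lines.append(line.rstrip("\n"))
```

Using the object returned by input() as a context manager closes it when the loop is done, so a later call to fileinput.input() starts from a clean state.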
【Typical usage】
    import fileinput
    for line in fileinput.input():
        process(line)
【Basic format】
fileinput.input([files[, inplace[, backup[, bufsize[, mode[, openhook]]]]]])
【Default values】
fileinput.input(files=None, inplace=False, backup='', bufsize=0, mode='r', openhook=None)
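- files: a file path or a list of paths, e.g. ['1.txt', '2.txt', ...]. When omitted, the files named in sys.argv[1:] are read, and stdin is used if there are none.
- inplace: whether standard output is written back into the file being read. Default is False, so the file is left untouched.
- backup: extension of the backup file, given as the extension only, e.g. .bak. It applies only together with inplace. An existing backup file with the same name is overwritten silently.
- bufsize: buffer size, default 0. It can be raised for very large files, but the default is normally fine. (The parameter was deprecated in Python 3.6 and removed in Python 3.8.)
- mode: mode used to open the files. Only read modes are accepted; the default 'r' opens the file as read-only text.
- openhook: a hook that controls how every file is opened, for example to set its encoding. It cannot be combined with inplace.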
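As an illustration of the openhook parameter, the sketch below reads a UTF-8 file through fileinput.hook_encoded(). It is written for Python 3; the file name `utf8.txt` and the variable `decoded` are assumptions chosen for the example.

```python
import fileinput
import os
import tempfile

# Hypothetical UTF-8 sample file, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
utf8_path = os.path.join(tmp_dir, "utf8.txt")
with open(utf8_path, "w", encoding="utf-8") as f:
    f.write("你好\nfileinput\n")

# hook_encoded() returns a hook that opens each file with the given encoding.
hook = fileinput.hook_encoded("utf-8")
with fileinput.input(files=[utf8_path], openhook=hook) as stream:
    decoded = [line.rstrip("\n") for line in stream]
```

Because the hook takes over opening the file, the lines arrive already decoded, whatever the platform's default encoding is.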
【Common functions】
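- fileinput.input(): returns an object that can be iterated over with a for loop
- fileinput.filename(): returns the name of the file currently being read
- fileinput.lineno(): returns the cumulative number of the line just read, counted across all files
- fileinput.filelineno(): returns the number of the line just read within the current file
- fileinput.isfirstline(): checks whether the line just read is the first line of its file
- fileinput.isstdin(): checks whether the last line was read from stdin
- fileinput.close(): closes the sequence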
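The status functions listed above can be observed side by side with a small sketch. It is written for Python 3, and the two temporary files `a.txt` and `b.txt` as well as the `records` list are assumptions invented for the illustration.

```python
import fileinput
import os
import tempfile

# Two hypothetical sample files, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "a1\na2\n"), ("b.txt", "b1\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

# Record what each status function reports right after a line is read.
records = []
with fileinput.input(files=paths) as stream:
    for line in stream:
        records.append((
            os.path.basename(fileinput.filename()),  # current file
            fileinput.lineno(),                      # cumulative line number
            fileinput.filelineno(),                  # line number within the file
            fileinput.isfirstline(),                 # first line of its file?
            fileinput.isstdin(),                     # read from stdin?
        ))
```

Note how lineno() keeps counting across files while filelineno() restarts at 1 for each new file.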
【Common examples】
- Example 01: Read every line of a file with fileinput
    >>> import fileinput
    >>> for line in fileinput.input('data.txt'):
    ...     print line,
    ...
    # Output
    Python
    Java
    C/C++
    Shell
Command-line form:
    # test.py
    import fileinput

    for line in fileinput.input():
        # The trailing comma keeps print from adding a second newline.
        print fileinput.filename(), '|', 'Line Number:', fileinput.lineno(), '|:', line,

    c:\>python test.py data.txt
    data.txt | Line Number: 1 |: Python
    data.txt | Line Number: 2 |: Java
    data.txt | Line Number: 3 |: C/C++
    data.txt | Line Number: 4 |: Shell
- Example 02: Process several files with fileinput and modify them in place
    # ---sample files---
    c:\Python27>type 1.txt
    first
    second

    c:\Python27>type 2.txt
    third
    fourth
    # ---sample files---

    # test.py
    import fileinput

    def process(line):
        return line.rstrip() + ' line'

    # With inplace=1, whatever is printed replaces the content of the file being read.
    for line in fileinput.input(['1.txt', '2.txt'], inplace=1):
        print process(line)

    # ---output---
    c:\Python27>type 1.txt
    first line
    second line

    c:\Python27>type 2.txt
    third line
    fourth line
    # ---output---
Command-line form:
    # test.py
    import fileinput

    def process(line):
        return line.rstrip() + ' line'

    for line in fileinput.input(inplace=True):
        print process(line)

    # Run the command
    c:\Python27>python test.py 1.txt 2.txt
- Example 03: Replace file contents with fileinput and keep a backup of the original
    # Sample file: data.txt
    Python
    Java
    C/C++
    Shell

    # FileName: test.py
    import fileinput

    for line in fileinput.input('data.txt', backup='.bak', inplace=1):
        print line.rstrip().replace('Python', 'Perl')  # or: print line.replace('Python', 'Perl'),

    # Final result: data.txt
    Perl
    Java
    C/C++
    Shell
    # The original content is kept in the generated file data.txt.bak

    # The inplace mechanism is comparable to redirecting stdin and stdout yourself.
    # The script below reads data.txt and writes the tagged lines to data_out.txt.
    # FileName: Learn.py
    import fileinput

    for line in fileinput.input():
        print 'Tag:', line,

    # ---test:
    d:\>python Learn.py < data.txt > data_out.txt
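For readers on Python 3, the in-place replacement with a backup might look like the sketch below. The helper name `replace_with_backup` and the temporary `data.txt` are assumptions made up for this illustration, not part of the fileinput module.

```python
import fileinput
import os
import tempfile


def replace_with_backup(path, old, new, backup_ext=".bak"):
    """Replace `old` with `new` in `path`, keeping the original as path + backup_ext.

    Hypothetical helper for illustration only.
    """
    with fileinput.input(files=[path], inplace=True, backup=backup_ext) as stream:
        for line in stream:
            # While inplace is active, print() writes into the file itself.
            print(line.replace(old, new), end="")


# Hypothetical sample file, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
data_path = os.path.join(tmp_dir, "data.txt")
original_text = "Python\nJava\nC/C++\nShell\n"
with open(data_path, "w") as f:
    f.write(original_text)

replace_with_backup(data_path, "Python", "Perl")

with open(data_path) as f:
    new_text = f.read()
with open(data_path + ".bak") as f:
    backup_text = f.read()
```

Passing end="" keeps the newline that each line already carries, so no rstrip() is needed.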
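- Example 04: Convert a CRLF file to LF with fileinput
    import fileinput
    import sys

    for line in fileinput.input(inplace=True):
        # Convert a Windows/DOS text file to a Linux text file.
        # Assumes lines are read without newline translation (e.g. Python 2 on Linux).
        if line[-2:] == "\r\n":
            # Drop the "\r\n" ending and put back a single "\n".
            line = line[:-2] + "\n"
        sys.stdout.write(line)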
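Newline translation differs between Python versions and platforms, so one cautious way to do the same conversion on Python 3 is to read raw bytes and write the result to a separate file. The sketch below assumes this approach; the helper `crlf_to_lf` and the file names `dos.txt` and `unix.txt` are hypothetical.

```python
import fileinput
import os
import tempfile


def crlf_to_lf(src_path, dst_path):
    """Copy src_path to dst_path, turning CRLF line endings into LF.

    Hypothetical helper for illustration only.
    """
    # mode="rb" yields bytes, so "\r\n" is not translated while reading.
    with fileinput.input(files=[src_path], mode="rb") as stream, \
            open(dst_path, "wb") as out:
        for line in stream:
            if line.endswith(b"\r\n"):
                line = line[:-2] + b"\n"
            out.write(line)


# Hypothetical DOS-style sample file, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
dos_path = os.path.join(tmp_dir, "dos.txt")
unix_path = os.path.join(tmp_dir, "unix.txt")
with open(dos_path, "wb") as f:
    f.write(b"first\r\nsecond\r\nlast")

crlf_to_lf(dos_path, unix_path)

with open(unix_path, "rb") as f:
    converted = f.read()
```

Writing to a second file trades the convenience of inplace for predictable behavior: nothing is modified until the copy has been written successfully.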
- Example 05: Simple file processing with fileinput
    # FileName: test.py
    import sys
    import fileinput

    for line in fileinput.input(r'C:\Python27\info.txt'):
        sys.stdout.write('=> ')
        sys.stdout.write(line)

    # Output
    >>>
    => The Zen of Python, by Tim Peters
    =>
    => Beautiful is better than ugly.
    => Explicit is better than implicit.
    => Simple is better than complex.
    => Complex is better than complicated.
    => Flat is better than nested.
    => Sparse is better than dense.
    => Readability counts.
    => Special cases aren't special enough to break the rules.
    => Although practicality beats purity.
    => Errors should never pass silently.
    => Unless explicitly silenced.
    => In the face of ambiguity, refuse the temptation to guess.
    => There should be one-- and preferably only one --obvious way to do it.
    => Although that way may not be obvious at first unless you're Dutch.
    => Now is better than never.
    => Although never is often better than *right* now.
    => If the implementation is hard to explain, it's a bad idea.
    => If the implementation is easy to explain, it may be a good idea.
    => Namespaces are one honking great idea -- let's do more of those!
- Example 06: Batch-process files with fileinput
    # ---test files: test.txt test1.txt test2.txt test3.txt---
    # ---script file: test.py---
    import fileinput
    import glob

    for line in fileinput.input(glob.glob("test*.txt")):
        # Print a banner whenever a new file starts.
        if fileinput.isfirstline():
            print '-'*20, 'Reading %s...' % fileinput.filename(), '-'*20
        print str(fileinput.lineno()) + ': ' + line.upper(),

    # ---output:
    >>>
    -------------------- Reading test.txt... --------------------
    1: AAAAA
    2: BBBBB
    3: CCCCC
    4: DDDDD
    5: FFFFF
    -------------------- Reading test1.txt... --------------------
    6: FIRST LINE
    7: SECOND LINE
    -------------------- Reading test2.txt... --------------------
    8: THIRD LINE
    9: FOURTH LINE
    -------------------- Reading test3.txt... --------------------
    10: THIS IS LINE 1
    11: THIS IS LINE 2
    12: THIS IS LINE 3
    13: THIS IS LINE 4
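Two details of batch processing are easy to miss: glob.glob() does not promise any particular order, and an empty file list makes fileinput fall back to stdin. The Python 3 sketch below guards against both; the helper `numbered_upper` and the temporary files are assumptions made up for the illustration.

```python
import fileinput
import glob
import os
import tempfile


def numbered_upper(pattern):
    """Return numbered, upper-cased lines of all files matching `pattern`.

    Hypothetical helper for illustration only.
    """
    # sorted() fixes the processing order, which glob.glob() leaves unspecified.
    paths = sorted(glob.glob(pattern))
    if not paths:
        # An empty list would make fileinput read from stdin instead.
        return []
    lines = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            if fileinput.isfirstline():
                name = os.path.basename(fileinput.filename())
                lines.append("--- Reading %s ---" % name)
            lines.append("%d: %s" % (fileinput.lineno(), line.rstrip("\n").upper()))
    return lines


# Hypothetical sample files, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
for name, text in [("test1.txt", "first line\nsecond line\n"),
                   ("test2.txt", "third line\n")]:
    with open(os.path.join(tmp_dir, name), "w") as f:
        f.write(text)

report = numbered_upper(os.path.join(tmp_dir, "test*.txt"))
no_match = numbered_upper(os.path.join(tmp_dir, "missing*.txt"))
```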
- Example 07: Log analysis with fileinput and re: extract every line that contains a date
    # ---sample file: error.log---
    aaa
    1970-01-01 13:45:30 Error: **** Due to System Disk space not enough...
    bbb
    1970-01-02 10:20:30 Error: **** Due to System Out of Memory...
    ccc

    # ---test script---
    import re
    import fileinput
    import sys

    pattern = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'

    # inplace=1 rewrites error.log with the matching lines only;
    # the original is saved as error.log.bak.
    for line in fileinput.input('error.log', backup='.bak', inplace=1):
        if re.search(pattern, line):
            sys.stdout.write("=> ")
            sys.stdout.write(line)

    # ---result (new content of error.log)---
    => 1970-01-01 13:45:30 Error: **** Due to System Disk space not enough...
    => 1970-01-02 10:20:30 Error: **** Due to System Out of Memory...
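When the log itself should stay untouched, the matching lines can be collected in a list instead of being written back. The Python 3 sketch below assumes this read-only variant; the helper `dated_lines` and the temporary `error.log` are hypothetical.

```python
import fileinput
import os
import re
import tempfile

# Timestamp of the form YYYY-MM-DD HH:MM:SS.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def dated_lines(path):
    """Return the lines of `path` that contain a timestamp, without modifying the file.

    Hypothetical helper for illustration only.
    """
    with fileinput.input(files=[path]) as stream:
        return [line.rstrip("\n") for line in stream if DATE_RE.search(line)]


# Hypothetical sample log, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
log_path = os.path.join(tmp_dir, "error.log")
with open(log_path, "w") as f:
    f.write("aaa\n"
            "1970-01-01 13:45:30 Error: disk full\n"
            "bbb\n"
            "1970-01-02 10:20:30 Error: out of memory\n"
            "ccc\n")

matches = dated_lines(log_path)
```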
- Example 08: Analysis with fileinput and re: extract phone numbers that meet a condition
    # ---sample file: phone.txt---
    010-110-12345
    800-333-1234
    010-99999999
    05718888888
    021-88888888

    # ---test script: test.py---
    import re
    import fileinput

    # Extract numbers with area code 010 or 021, format: 010-12345678.
    # (010|021) is an alternation; [010|021] would only be a character class.
    pattern = r'(010|021)-\d{8}'

    for line in fileinput.input('phone.txt'):
        if re.search(pattern, line):
            print '=' * 50
            print 'Filename:' + fileinput.filename() + ' | Line Number:' + str(fileinput.lineno()) + ' | ' + line,

    # ---output:---
    >>>
    ==================================================
    Filename:phone.txt | Line Number:3 | 010-99999999
    ==================================================
    Filename:phone.txt | Line Number:5 | 021-88888888
    >>>
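A pattern spelled `[010|021]-\d{8}` is a character class, not an alternation: it accepts any single character from the set 0, 1, 2 and | before the hyphen. The sketch below, using sample strings invented for the illustration, contrasts it with an anchored alternation.

```python
import re

# Character class: one character out of {0, 1, 2, |}, then "-" and eight digits.
char_class = re.compile(r"[010|021]-\d{8}")
# Alternation: the whole string is 010 or 021, then "-" and eight digits.
alternation = re.compile(r"^(?:010|021)-\d{8}$")

# Hypothetical sample numbers; "552-12345678" has an area code that should be rejected.
samples = ["010-99999999", "021-88888888", "552-12345678", "010-110-12345"]

class_hits = [s for s in samples if char_class.search(s)]
alt_hits = [s for s in samples if alternation.search(s)]
```

The character class also accepts "552-12345678", because the "2" directly before the hyphen is enough to satisfy it.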
- Example 09: A grep-like tool built with fileinput
    import sys
    import re
    import fileinput

    pattern = re.compile(sys.argv[1])
    # sys.argv[2:] covers every file name the shell expands from *.py
    for line in fileinput.input(sys.argv[2:]):
        # search() finds the pattern anywhere in the line, as grep does
        if pattern.search(line):
            print fileinput.filename(), fileinput.filelineno(), line,

    $ ./test.py 'import.*re' *.py
    # Find the lines containing "import re" in all .py files
    addressBook.py 2 import re
    addressBook1.py 10 import re
    addressBook2.py 18 import re
    test.py 238 import re
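The same idea can be packaged as a function that returns its hits instead of printing them, which makes it easier to reuse and to test. This is a Python 3 sketch; the function name `grep` and the sample files `a.py` and `b.py` are assumptions made up for the illustration.

```python
import fileinput
import os
import re
import tempfile


def grep(pattern, paths):
    """Return (file name, line number in file, line) for every matching line.

    Hypothetical helper for illustration only.
    """
    regex = re.compile(pattern)
    hits = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            # search() matches anywhere in the line; match() only at its start.
            if regex.search(line):
                hits.append((os.path.basename(fileinput.filename()),
                             fileinput.filelineno(),
                             line.rstrip("\n")))
    return hits


# Hypothetical sample sources, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
source_paths = []
for name, text in [("a.py", "import os\nimport re\n"),
                   ("b.py", "x = 1\n    import re\nprint(x)\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as f:
        f.write(text)
    source_paths.append(path)

found = grep(r"import.*re", source_paths)
```

The indented import in b.py is found only because search() is used; match() would skip it.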
- Example 10: Regular-expression substitution with fileinput
    # ---test sample: input.txt
    * [Learning Python](#author:Mark Lutz)

    # ---test script: test.py
    import fileinput
    import re

    for line in fileinput.input():
        # \1 is the link text, \2 the anchor after '#'
        line = re.sub(r'\* \[(.*)\]\(#(.*)\)', r'<h2 id="\2">\1</h2>', line.rstrip())
        print(line)

    # ---output:
    c:\Python27>python test.py input.txt
    <h2 id="author:Mark Lutz">Learning Python</h2>
- Example 11: Regular-expression substitution with fileinput, swapping fields within a line
    # ---test sample: test.txt
    [@!$First]&[*%-Second]&[Third]

    # ---test script: test.py
    import re
    import fileinput

    regex = re.compile(r'^([^&]*)(&)([^&]*)(&)([^&]*)')

    # Each line is split on &; the goal is to swap [@!$First] and [*%-Second]
    for line in fileinput.input('test.txt', inplace=1, backup='.bak'):
        print regex.sub(r'\3\2\1\4\5', line),

    # ---output:
    [*%-Second]&[@!$First]&[Third]
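The group numbering used in the substitution can be checked in isolation, without touching any file. The sketch below applies the same expression to a single string and, as an alternative made up for comparison, performs the swap with str.split(); the function names are hypothetical.

```python
import re

# Five groups: field 1, "&", field 2, "&", field 3.
FIELDS_RE = re.compile(r"^([^&]*)(&)([^&]*)(&)([^&]*)")


def swap_with_regex(line):
    """Swap the first two &-separated fields using backreferences."""
    return FIELDS_RE.sub(r"\3\2\1\4\5", line)


def swap_with_split(line):
    """Same swap without a regular expression (alternative for comparison)."""
    parts = line.split("&")
    if len(parts) >= 2:
        parts[0], parts[1] = parts[1], parts[0]
    return "&".join(parts)


sample = "[@!$First]&[*%-Second]&[Third]"
swapped_regex = swap_with_regex(sample)
swapped_split = swap_with_split(sample)
```

Either function could be called on each line inside a fileinput loop; the regular expression is the more flexible choice when the separator or the field layout varies.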
- Example 12: Replace text with fileinput based on command-line arguments (argv)
    # ---sample data: host.txt
    # localhost is used to configure the loopback interface
    # when the system is booting. Do not change this entry.
    127.0.0.1 localhost
    192.168.100.2 www.test2.com
    192.168.100.3 www.test3.com
    192.168.100.4 www.test4.com

    # ---test script: test.py
    import sys
    import fileinput

    source = sys.argv[1]
    target = sys.argv[2]
    files = sys.argv[3:]

    # backup only takes effect together with inplace, and inplace cannot be
    # combined with openhook, so the result is printed instead of written back.
    for line in fileinput.input(files, backup='.bak', openhook=fileinput.hook_encoded("gb2312")):
        # Decode the opened files with the gb2312 Chinese character set
        line = line.rstrip().replace(source, target)
        print line

    # ---output:
    c:\>python test.py 192.168.100 127.0.0 host.txt
    # Replace every 192.168.100 in host.txt with 127.0.0
    127.0.0.1 localhost
    127.0.0.2 www.test2.com
    127.0.0.3 www.test3.com
    127.0.0.4 www.test4.com