Python Modules, Part 2

Latest recommended article published on 2023-06-09 15:06:26

weixin_42225810

Standard library

fileinput

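fileinput.input()       # returns an object that a for loop can iterate over, line by line
fileinput.filename()    # returns the name of the file currently being read
fileinput.lineno()      # returns the cumulative number of the line just read, counted across all files
fileinput.filelineno()  # returns the number of the line just read within the current file
fileinput.isfirstline() # returns True if the line just read is the first line of its file
fileinput.isstdin()     # returns True if the last line was read from sys.stdin
fileinput.close()       # closes the sequence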
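
The difference between lineno() and filelineno() is easiest to see with two files. The following is a minimal sketch for Python 3 (the listings in this post use Python 2 syntax); the helper name describe_lines and the temporary sample files are assumptions made for illustration.

```python
import fileinput
import os
import tempfile


def describe_lines(paths):
    """Return (filename, lineno, filelineno, isfirstline) for every line read."""
    rows = []
    with fileinput.input(files=paths) as stream:
        for _ in stream:
            rows.append((
                os.path.basename(fileinput.filename()),
                fileinput.lineno(),       # cumulative across all files
                fileinput.filelineno(),   # restarts at 1 for each file
                fileinput.isfirstline(),  # True only on line 1 of each file
            ))
    return rows


# Hypothetical sample files, created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "1.txt")
    second = os.path.join(tmp, "2.txt")
    with open(first, "w") as f:
        f.write("first\nsecond\n")
    with open(second, "w") as f:
        f.write("third\nfourth\n")
    rows = describe_lines([first, second])

for row in rows:
    print(row)
```

The third line read is line 3 overall but line 1 of 2.txt, which is why the two counters diverge once the second file starts.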

[Common examples] (Python 2 syntax)

Example 01: Read every line of a file with fileinput

>>> import fileinput
>>> for line in fileinput.input('data.txt'):
        print line,
# Output
Python  
Java   
C/C++  
Shell  

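Command-line usage:

#test.py
import fileinput

# With no arguments, input() reads the files named on the command line
for line in fileinput.input():
    # The trailing comma keeps print from adding a second newline
    print fileinput.filename(),'|','Line Number:',fileinput.lineno(),'|: ',line,

c:\>python test.py data.txt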
data.txt | Line Number: 1 |:  Python  
data.txt | Line Number: 2 |:  Java  
data.txt | Line Number: 3 |:  C/C++  
data.txt | Line Number: 4 |:  Shell  

Example 02: Process several files with fileinput and modify them in place

#test.py
#---Sample files---
c:\Python27>type 1.txt
first
second

c:\Python27>type 2.txt
third
fourth
#---Sample files---
import fileinput

def process(line):
    return line.rstrip() + ' line'

# With inplace=1, standard output is redirected into the file being read
for line in fileinput.input(['1.txt','2.txt'],inplace=1):
    print process(line)

#---Output---
c:\Python27>type 1.txt
first line
second line

c:\Python27>type 2.txt
third line
fourth line
#---Output---

Command-line usage:

#test.py
import fileinput

def process(line):
    return line.rstrip() + ' line'

for line in fileinput.input(inplace = True):
    print process(line)

# Run it with
c:\Python27>python test.py 1.txt 2.txt

Example 03: Replace file content with fileinput and back up the original file

# Sample file:
#data.txt
Python
Java
C/C++
Shell

#FileName: test.py
import fileinput

for line in fileinput.input('data.txt',backup='.bak',inplace=1):
    print line.rstrip().replace('Python','Perl')  # or: print line.replace('Python','Perl'),

# Final result:
#data.txt
Perl
Java
C/C++
Shell
# A backup holding the original content is also created:
# data.txt.bak

In-place editing works much like a filter that reads standard input and writes standard output, with the shell doing the redirection:

import fileinput
for line in fileinput.input():
    print 'Tag:',line,


#---Test run:
d:\>python Learn.py < data.txt > data_out.txt
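
For readers on Python 3, the in-place replacement with a backup can be sketched as below. This is a minimal illustration rather than the original author's script: the helper name replace_in_place and the temporary data.txt are assumptions, while inplace and backup are the documented fileinput.input() parameters.

```python
import fileinput
import os
import tempfile


def replace_in_place(path, old, new, backup_suffix=".bak"):
    """Rewrite `path` in place, keeping the original as path + backup_suffix."""
    with fileinput.input(files=[path], inplace=True, backup=backup_suffix) as stream:
        for line in stream:
            # While inplace is active, print() writes into the file, not the console.
            print(line.replace(old, new), end="")


# Hypothetical sample file, created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "data.txt")
    with open(data_path, "w") as f:
        f.write("Python\nJava\nC/C++\nShell\n")

    replace_in_place(data_path, "Python", "Perl")

    with open(data_path) as f:
        new_text = f.read()
    with open(data_path + ".bak") as f:
        backup_text = f.read()

print(new_text)
```

Passing end="" matters here: each line already ends with a newline, so a plain print(line) would double the line breaks in the rewritten file.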

Example 04: Convert a CRLF file to LF with fileinput

import fileinput
import sys

for line in fileinput.input(inplace=True):
    # Convert a Windows/DOS text file to Unix (LF) line endings
    if line[-2:] == "\r\n":
        # Drop the trailing "\r\n" before appending "\n"
        line = line[:-2] + "\n"
    sys.stdout.write(line)

Example 05: Simple file processing with fileinput

#FileName: test.py
import sys
import fileinput

for line in fileinput.input(r'C:\Python27\info.txt'):
    sys.stdout.write('=> ')
    sys.stdout.write(line)

# Output
>>>   
=> The Zen of Python, by Tim Peters  
=>   
=> Beautiful is better than ugly.  
=> Explicit is better than implicit.  
=> Simple is better than complex.  
=> Complex is better than complicated.  
=> Flat is better than nested.  
=> Sparse is better than dense.  
=> Readability counts.  
=> Special cases aren't special enough to break the rules.  
=> Although practicality beats purity.  
=> Errors should never pass silently.  
=> Unless explicitly silenced.  
=> In the face of ambiguity, refuse the temptation to guess.  
=> There should be one-- and preferably only one --obvious way to do it.  
=> Although that way may not be obvious at first unless you're Dutch.  
=> Now is better than never.  
=> Although never is often better than *right* now.  
=> If the implementation is hard to explain, it's a bad idea.  
=> If the implementation is easy to explain, it may be a good idea.  
=> Namespaces are one honking great idea -- let's do more of those!  

Example 06: Batch-process files with fileinput

#---Test files: test.txt test1.txt test2.txt test3.txt---
#---Script file: test.py---
import fileinput
import glob

for line in fileinput.input(glob.glob("test*.txt")):
    if fileinput.isfirstline():
        print '-'*20, 'Reading %s...' % fileinput.filename(), '-'*20
    print str(fileinput.lineno()) + ': ' + line.upper(),


#---Output:
>>>   
-------------------- Reading test.txt... --------------------  
1: AAAAA  
2: BBBBB  
3: CCCCC  
4: DDDDD  
5: FFFFF  
-------------------- Reading test1.txt... --------------------  
6: FIRST LINE  
7: SECOND LINE  
-------------------- Reading test2.txt... --------------------  
8: THIRD LINE  
9: FOURTH LINE  
-------------------- Reading test3.txt... --------------------  
10: THIS IS LINE 1  
11: THIS IS LINE 2  
12: THIS IS LINE 3  
13: THIS IS LINE 4  

Example 07: Log analysis with fileinput and re: keep only the lines that contain a date

#--Sample file: error.log--
aaa
1970-01-01 13:45:30  Error: **** Due to System Disk space not enough...
bbb
1970-01-02 10:20:30  Error: **** Due to System Out of Memory...
ccc

#---Test script---
import re
import fileinput
import sys

# Raw string, so the backslashes reach the regex engine unchanged
pattern = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'

for line in fileinput.input('error.log',backup='.bak',inplace=1):
    if re.search(pattern,line):
        sys.stdout.write("=> ")
        sys.stdout.write(line)

#---Result (error.log after the run; the original is kept as error.log.bak)---
=> 1970-01-01 13:45:30  Error: **** Due to System Disk space not enough...
=> 1970-01-02 10:20:30  Error: **** Due to System Out of Memory...

Example 08: Analysis with fileinput and re: extract the phone numbers that match a rule

#---Sample file: phone.txt---
010-110-12345
800-333-1234
010-99999999
05718888888
021-88888888

#---Test script: test.py---
import re
import fileinput

# Extract numbers whose area code is 010 or 021, in the form 010-12345678.
# Alternation needs a group; [010|021] would be a character class matching one character.
pattern = r'(?:010|021)-\d{8}'

for line in fileinput.input('phone.txt'):
    if re.search(pattern,line):
        print '=' * 50
        print 'Filename:'+ fileinput.filename()+' | Line Number:'+str(fileinput.lineno())+' | '+line,

#---Output:---
>>>   
==================================================  
Filename:phone.txt | Line Number:3 | 010-99999999  
==================================================  
Filename:phone.txt | Line Number:5 | 021-88888888  
>>>   
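
Why the pattern needs a group rather than square brackets can be checked directly. The sketch below, written for Python 3, compares a character class with an anchored alternation on the sample numbers; the extra entry 5550-12345678 is a hypothetical value added only to expose the difference.

```python
import re

# A character class matches ONE character out of the set {0, 1, 2, |}.
char_class = re.compile(r'[010|021]-\d{8}')
# A group with alternation matches the literal area codes; ^ and $ pin the whole line.
alternation = re.compile(r'^(?:010|021)-\d{8}$')

numbers = [
    "010-110-12345",
    "800-333-1234",
    "010-99999999",
    "05718888888",
    "021-88888888",
    "5550-12345678",  # hypothetical entry: wrong area code, but ends in "0-" plus 8 digits
]

loose = [n for n in numbers if char_class.search(n)]
strict = [n for n in numbers if alternation.search(n)]

print("character class:", loose)
print("alternation:    ", strict)
```

On the original five sample lines both patterns happen to select the same two numbers, which is how the character-class version can go unnoticed.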

Example 09: A grep-like tool built on fileinput

import sys
import re
import fileinput

pattern= re.compile(sys.argv[1])
# sys.argv[2:] passes every file name, not just the first one
for line in fileinput.input(sys.argv[2:]):
    # search() finds the pattern anywhere in the line; match() only at its start
    if pattern.search(line):
        print fileinput.filename(), fileinput.filelineno(), line,
$ ./test.py 'import.*re' *.py
# Find the lines containing "import re" in all .py files
addressBook.py  2   import re
addressBook1.py 10  import re
addressBook2.py 18  import re
test.py         238 import re
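
The same idea can be packaged as a function for Python 3. This is a minimal sketch, not a full grep: the function name grep and the two temporary .py files are assumptions for illustration, and the result is returned as a list instead of being printed.

```python
import fileinput
import os
import re
import tempfile


def grep(pattern, paths):
    """Return (filename, line number within that file, line) for each matching line."""
    regex = re.compile(pattern)
    hits = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            if regex.search(line):  # search(), so indented matches are found too
                hits.append((
                    os.path.basename(stream.filename()),
                    stream.filelineno(),
                    line.rstrip("\n"),
                ))
    return hits


# Hypothetical sample files, created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    a_path = os.path.join(tmp, "a.py")
    b_path = os.path.join(tmp, "b.py")
    with open(a_path, "w") as f:
        f.write("import os\nimport re\n")
    with open(b_path, "w") as f:
        f.write("x = 1\n    import re  # indented\n")
    hits = grep(r"import.*re", [a_path, b_path])

for name, number, text in hits:
    print(name, number, text)
```

The indented line in b.py is reported only because search() is used; with match() it would be skipped, since match() anchors at the start of the line.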

Example 10: Regex substitution with fileinput

#---Test sample: input.txt
* [Learning Python](#author:Mark Lutz)

#---Test script: test.py
import fileinput
import re

for line in fileinput.input():
    # Group 1 is the link text, group 2 the anchor after '#'
    line = re.sub(r'\* \[(.*)\]\(#(.*)\)', r'<h2 id="\2">\1</h2>', line.rstrip())
    print(line)

#---Output:
c:\Python27>python test.py input.txt
<h2 id="author:Mark Lutz">Learning Python</h2>

Example 11: Regex substitution with fileinput: swapping fields within a line

#---Test sample: test.txt
[@!$First]&[*%-Second]&[Third]

#---Test script: test.py
import re
import fileinput

regex = re.compile(r'^([^&]*)(&)([^&]*)(&)([^&]*)')
# The line is split on '&'; the goal is to swap [@!$First] and [*%-Second]
for line in fileinput.input('test.txt',inplace=1,backup='.bak'):
    print regex.sub(r'\3\2\1\4\5',line),

#---Output:
[*%-Second]&[@!$First]&[Third]

Example 12: Replace text with fileinput according to command-line arguments (argv)

#---Sample data: host.txt
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
127.0.0.1      localhost
192.168.100.2  www.test2.com
192.168.100.3  www.test3.com
192.168.100.4  www.test4.com

#---Test script: test.py
import sys
import fileinput

source = sys.argv[1]
target = sys.argv[2]
files  = sys.argv[3:]

# Decode the input files as GB2312 (a Chinese character set).
# inplace cannot be combined with openhook, and backup only applies to
# in-place editing, so the result goes to standard output.
for line in fileinput.input(files,openhook=fileinput.hook_encoded("gb2312")):
    line = line.rstrip().replace(source,target)
    print line

#---Output:
c:\>python test.py 192.168.100 127.0.0 host.txt
# Every 192.168.100 in host.txt is replaced with 127.0.0
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
127.0.0.1      localhost
127.0.0.2  www.test2.com
127.0.0.3  www.test3.com
127.0.0.4  www.test4.com
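
A Python 3 sketch of the same replacement, reading a GB2312-encoded file through fileinput.hook_encoded(). The function name replace_decoded, the temporary host.txt and its Chinese comment line are assumptions for illustration; the lines are collected in a list rather than printed.

```python
import fileinput
import os
import tempfile


def replace_decoded(paths, source, target, encoding="gb2312"):
    """Read files in the given encoding and return their lines with source replaced."""
    result = []
    with fileinput.input(files=paths,
                         openhook=fileinput.hook_encoded(encoding)) as stream:
        for line in stream:
            result.append(line.rstrip().replace(source, target))
    return result


# Hypothetical sample file containing non-ASCII text, saved as GB2312.
with tempfile.TemporaryDirectory() as tmp:
    hosts = os.path.join(tmp, "host.txt")
    with open(hosts, "w", encoding="gb2312") as f:
        f.write("# 本机\n192.168.100.2  www.test2.com\n")
    lines = replace_decoded([hosts], "192.168.100", "127.0.0")

for line in lines:
    print(line)
```

Because openhook and inplace cannot be combined, a script that must both decode and rewrite a file would need to write its output to a separate file.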

time

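Python's time-related modules include time, datetime and calendar. This section focuses on the time module.

Before starting, a few points need to be made:

Python usually represents time in one of three ways: 1) a timestamp, 2) a formatted time string, 3) a tuple (struct_time) with nine elements. Because the time module is implemented mostly through calls to the C library, its behavior may differ between platforms.
UTC (Coordinated Universal Time), formerly known as Greenwich Mean Time (GMT), is the world's standard time. China is at UTC+8. DST (Daylight Saving Time) is summer time.
Timestamp: generally, a timestamp is the offset in seconds from 00:00:00 UTC on January 1, 1970. Running "type(time.time())" returns float. The functions that return timestamps are mainly time() and clock().
Tuple (struct_time): a struct_time has nine elements. The functions that return a struct_time are mainly gmtime(), localtime() and strptime(). Its elements are listed below: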

Index	Attribute	Values
0	tm_year (year)	e.g. 2011
1	tm_mon (month)	1 - 12
2	tm_mday (day of the month)	1 - 31
3	tm_hour (hour)	0 - 23
4	tm_min (minute)	0 - 59
5	tm_sec (second)	0 - 61
6	tm_wday (day of the week)	0 - 6 (0 is Monday)
7	tm_yday (day of the year)	1 - 366
8	tm_isdst (daylight saving flag)	0, 1 or -1 (-1 when unknown)
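
The fields can be read either by index or by attribute name. A short sketch for Python 3 using the Unix epoch, chosen because time.gmtime(0) does not depend on the local time zone:

```python
import time

epoch = time.gmtime(0)  # 1970-01-01 00:00:00 UTC as a struct_time

year_by_index = epoch[0]       # same value as epoch.tm_year
year_by_name = epoch.tm_year
weekday = epoch.tm_wday        # 0 is Monday, so 3 means Thursday

print(tuple(epoch))
print(year_by_index, year_by_name, weekday)
```

January 1, 1970 was a Thursday, which matches tm_wday == 3 under the Monday-is-zero convention.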

Next, the commonly used functions of the time module:

1) time.localtime([secs]): converts a timestamp to a struct_time in the local time zone. If secs is omitted, the current time is used.

>>> time.localtime()
time.struct_time(tm_year=2011, tm_mon=5, tm_mday=5, tm_hour=14, tm_min=14, tm_sec=50, tm_wday=3, tm_yday=125, tm_isdst=0)
>>> time.localtime(1304575584.1361799)
time.struct_time(tm_year=2011, tm_mon=5, tm_mday=5, tm_hour=14, tm_min=6, tm_sec=24, tm_wday=3, tm_yday=125, tm_isdst=0)

2) time.gmtime([secs]): like localtime(), but converts a timestamp to a struct_time in UTC (the zero time zone).

>>> time.gmtime()
time.struct_time(tm_year=2011, tm_mon=5, tm_mday=5, tm_hour=6, tm_min=19, tm_sec=48, tm_wday=3, tm_yday=125, tm_isdst=0)

Note: tm_wday counts from Monday as 0, so tm_wday=3 here means Thursday, not Wednesday.

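3) time.time(): returns the current time as a timestamp.

>>> time.time()
1304575584.1361799

4) time.mktime(t): converts a struct_time expressed in local time to a timestamp.

>>> time.mktime(time.localtime())
1304576839.0

5) time.sleep(secs): suspends the calling thread for the given number of seconds.

6) time.clock(): take care with this one, because its meaning differs between systems. On UNIX it returns the processor time of the process as a floating-point number of seconds. On Windows it returns the wall-clock seconds elapsed since the first call to the function, so the first call returns a value close to zero. (The Windows version is based on the Win32 function QueryPerformanceCounter(), whose resolution is typically better than one microsecond.) time.clock() was deprecated in Python 3.3 and removed in Python 3.8; time.perf_counter() and time.process_time() replace it.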

 
import time    
if __name__ == '__main__':    
    time.sleep(1)    
    print "clock1:%s" % time.clock()    
    time.sleep(1)    
    print "clock2:%s" % time.clock()    
    time.sleep(1)    
    print "clock3:%s" % time.clock()  

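Output (on Windows):

clock1:3.35238137808e-006
clock2:1.00004944763
clock3:2.00012040636

The first clock() call starts the counter, so it returns a value close to zero.
The second and third clock() calls report the time elapsed since that first call.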
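
On current Python 3 releases, where time.clock() is no longer available, the two meanings it mixed together are provided by separate functions. A minimal sketch, assuming a hypothetical helper named measure: time.perf_counter() measures elapsed wall-clock time and time.process_time() measures CPU time used by the process.

```python
import time


def measure(func):
    """Run func() and return (result, wall-clock seconds, CPU seconds)."""
    wall_start = time.perf_counter()   # high-resolution elapsed-time counter
    cpu_start = time.process_time()    # CPU time of this process; sleeping is not counted
    result = func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu


result, wall, cpu = measure(lambda: time.sleep(0.1))
print("wall: %.3f s, cpu: %.3f s" % (wall, cpu))
```

Sleeping takes wall-clock time but almost no CPU time, so the two numbers differ noticeably here; only differences between two readings of either counter are meaningful, not the absolute values.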

7) time.asctime([t]): converts a tuple or struct_time representing a time into a string of the form 'Sun Jun 20 23:21:05 1993'. If no argument is given, time.localtime() is used.

>>> time.asctime()
'Thu May  5 14:55:43 2011'

8) time.ctime([secs]): converts a timestamp (a floating-point number of seconds) into the same form as time.asctime(). If the argument is omitted or None, time.time() is used. It is equivalent to time.asctime(time.localtime(secs)).

>>> time.ctime()
'Thu May  5 14:58:09 2011'
>>> time.ctime(time.time())
'Thu May  5 14:58:39 2011'
>>> time.ctime(1304579615)
'Thu May  5 15:13:35 2011'

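9) time.strftime(format[, t]): converts a tuple or struct_time representing a time (such as the value returned by time.localtime() or time.gmtime()) into a formatted time string. If t is not given, time.localtime() is used. If any element of the tuple is out of range, ValueError is raised.

Directive	Meaning	Notes
%a	Locale's abbreviated weekday name
%A	Locale's full weekday name
%b	Locale's abbreviated month name
%B	Locale's full month name
%c	Locale's appropriate date and time representation
%d	Day of the month (01 - 31)
%H	Hour on a 24-hour clock (00 - 23)
%I	Hour on a 12-hour clock (01 - 12)
%j	Day of the year (001 - 366)
%m	Month (01 - 12)
%M	Minute (00 - 59)
%p	Locale's equivalent of AM or PM	1
%S	Second (00 - 61)	2
%U	Week number of the year (00 - 53), with Sunday as the first day of the week. All days before the first Sunday fall in week 0.	3
%w	Day of the week (0 - 6, 0 is Sunday)
%W	Same as %U, except that Monday is the first day of the week.	3
%x	Locale's appropriate date representation
%X	Locale's appropriate time representation
%y	Year without the century (00 - 99)
%Y	Year with the century
%Z	Time zone name (empty if no time zone exists)
%%	A literal '%' character

Notes:

1. When used with strptime(), "%p" only affects the parsed hour if "%I" is used to parse the hour.
2. The range really is 0 - 61 rather than 0 - 59: 60 is valid for timestamps that represent leap seconds, and 61 is supported for historical reasons.
3. When used with strptime(), %U and %W are only used in the calculation when the day of the week and the year are specified.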

For example:

>>> time.strftime("%Y-%m-%d %X", time.localtime())
'2011-05-05 16:37:06'

10) time.strptime(string[, format]): parses a formatted time string into a struct_time. It is effectively the inverse of strftime().

>>> time.strptime('2011-05-05 16:37:06', '%Y-%m-%d %X')
time.struct_time(tm_year=2011, tm_mon=5, tm_mday=5, tm_hour=16, tm_min=37, tm_sec=6, tm_wday=3, tm_yday=125, tm_isdst=-1)

In this function, format defaults to "%a %b %d %H:%M:%S %Y".
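
That strftime() and strptime() are inverses can be checked with a round trip. The sketch below, for Python 3, works in UTC so the result does not depend on the machine's time zone; it uses %H:%M:%S instead of the locale-dependent %X, and calendar.timegm() (the UTC counterpart of mktime()) to get back to a timestamp. The variable names are chosen for illustration.

```python
import calendar
import time

FORMAT = "%Y-%m-%d %H:%M:%S"
stamp = 1304575584                      # timestamp from the localtime() example

utc_struct = time.gmtime(stamp)         # timestamp -> struct_time (UTC)
text = time.strftime(FORMAT, utc_struct)    # struct_time -> string
parsed = time.strptime(text, FORMAT)        # string -> struct_time
round_trip = calendar.timegm(parsed)        # struct_time (UTC) -> timestamp

print(text)
print(parsed.tm_wday, parsed.tm_yday, parsed.tm_isdst)
print(round_trip == stamp)
```

strptime() fills in tm_wday and tm_yday from the date but cannot know about daylight saving time, so it leaves tm_isdst at -1.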

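Finally, a summary of the time module. As described above, Python represents time in three ways: 1) a timestamp, 2) a tuple or struct_time, 3) a formatted string.

They convert into one another as follows: localtime() and gmtime() turn a timestamp into a struct_time, and mktime() turns a struct_time back into a timestamp; strftime() and asctime() turn a struct_time into a string, and strptime() parses a string into a struct_time; ctime() turns a timestamp directly into a string.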
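
The whole chain of conversions can be exercised in a few lines. This is a sketch for Python 3 using local time; the printed strings depend on the machine's time zone, so only the relationships between the values are checked, and the variable names are chosen for illustration.

```python
import time

FORMAT = "%Y-%m-%d %H:%M:%S"
stamp = 1304575584

local_struct = time.localtime(stamp)         # timestamp -> struct_time
back_to_stamp = time.mktime(local_struct)    # struct_time -> timestamp

as_text = time.asctime(local_struct)         # struct_time -> fixed-format string
same_text = time.ctime(stamp)                # timestamp -> fixed-format string

custom = time.strftime(FORMAT, local_struct)     # struct_time -> custom string
reparsed = time.strptime(custom, FORMAT)         # custom string -> struct_time

print(as_text)
print(custom)
```

ctime(stamp) and asctime(localtime(stamp)) produce the same string, and the first six fields survive the strftime()/strptime() round trip unchanged.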