Python Study Notes 11

ToBeGeek

Article link: https://blog.csdn.net/u010545519/article/details/51244578

chapter_11

Iterators and Comprehensions, Part 1

Common iteration tools

for loops
list comprehensions
in membership tests
the map built-in function

File iterators

>>> f = open('script1.py')
>>> f.readline()
'import sys\n'
>>> f.readline()
'print(sys.path)\n'
>>> f.readline()
'x = 2\n'
>>> f.readline()
'print(2 ** 33)\n'
>>> f.readline()
''

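File objects also have a method named __next__ that has nearly the same effect: each call returns the next line of the file. The only notable difference is that at end of file, __next__ raises the built-in StopIteration exception instead of returning an empty string.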

>>> f = open('script1.py')
>>> f.__next__()
'import sys\n'
>>> f.__next__()
'print(sys.path)\n'
>>> f.__next__()
'x = 2\n'
>>> f.__next__()
'print(2 ** 33)\n'
>>> f.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

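Iteration protocol: an object with a __next__ method advances to the next result on each call and raises StopIteration at the end of the series of results. An object that behaves this way is an iterator; more generally, an object is iterable if the iter built-in function can obtain such an iterator from it.

Any iterable object can be stepped through with a for loop or another iteration tool, because all iteration tools work internally by calling __next__ on each iteration and catching the StopIteration exception to determine when to exit.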
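The protocol described above can be sketched with a small user-defined class. This is only an illustration: the Countdown class and its attribute names are hypothetical and do not come from the notes.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__, as a file object does.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the end of the series of results
        value = self.current
        self.current -= 1
        return value


# A list comprehension drives the protocol: it calls iter() once,
# then __next__ repeatedly until StopIteration is raised.
countdown_values = [n for n in Countdown(3)]
print(countdown_values)  # [3, 2, 1]
```

Because the class follows the protocol, it works in for loops, comprehensions, and the other iteration tools without any extra code.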

>>> for line in open('script1.py'): # uses the file iterator
...   print(line.upper(),end='')
... 
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)

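# The readlines method also works, but it is not the best approach: it loads the whole file into memory at once, which can be slow and memory-hungry for large files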
>>> for line in open('script1.py').readlines(): 
...   print(line.upper(),end='')
... 
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)

# Reading the file line by line with a while loop; this tends to run somewhat slower than the file iterator
>>> f = open('script1.py')
>>> while True:
...   line = f.readline()
...   if not line: break
...   print(line.upper(),end='')
... 
IMPORT SYS
PRINT(SYS.PATH)
X = 2
PRINT(2 ** 33)
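As a complement to the transcripts above, here is a minimal script-style sketch of the file iterator. It assumes a temporary stand-in for script1.py with the same four lines, and it uses a with statement (not shown in the notes) so that the file is closed promptly.

```python
import os
import tempfile

# Contents matching the script1.py used in the transcripts.
source = "import sys\nprint(sys.path)\nx = 2\nprint(2 ** 33)\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "script1.py")
    with open(path, "w") as out:
        out.write(source)

    upper_lines = []
    with open(path) as f:      # the file is closed when this block exits
        for line in f:         # the file iterator reads one line at a time
            upper_lines.append(line.upper())

print("".join(upper_lines), end="")
```

Only one line is held in memory per iteration, which is the reason the file iterator is preferred over readlines for large files.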

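Manual iteration: iter and next

The next function:

To support manual iteration code (with less typing), Python 3.0 also provides a built-in function, next, which automatically calls an object's __next__ method. Given an iterator object X, the call next(X) is equivalent to X.__next__(), but is noticeably simpler.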

>>> f = open('script1.py')
>>> f.__next__()
'import sys\n'
>>> f.__next__()
'print(sys.path)\n'


>>> f = open('script1.py')
>>> next(f)
'import sys\n'
>>> next(f)
'print(sys.path)\n'
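The next built-in also accepts an optional second argument that is returned instead of raising StopIteration once the iterator is exhausted. The notes do not cover this form, so treat the following as a small supplementary sketch; the variable names are illustrative.

```python
letters = iter("ab")

first = next(letters)         # 'a'
second = next(letters)        # 'b'
third = next(letters, None)   # exhausted: the default is returned, no exception

print(first, second, third)   # a b None
```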

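Technically, when a for loop begins, it first passes the iterable object to the iter built-in function to obtain an iterator; the object returned has the required __next__ method.

The iter function:

>>> L = [1,2,3]
>>> I = iter(L) # build an iterator
>>> I.__next__() # call the iterator's __next__ method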
1
>>> I.__next__()
2
>>> I.__next__()
3
>>> I.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

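Note

A file object is its own iterator: files have their own __next__ method, and calling iter on a file returns the file object itself.

>>> f = open('script1.py')
>>> iter(f) is f # iter returns the file object itself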
True
>>> f.__next__()
'import sys\n'

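Lists and many other built-in objects are not their own iterators, because they support multiple iterations; each call to iter creates a new iterator.

>>> L = [1,2,3]
>>> iter(L) is L # iter creates a separate iterator object
False
>>> L.__next__() # a list is iterable but is not an iterator, so it has no __next__ method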
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute '__next__'
>>> 
>>> I = iter(L) # create an iterator
>>> I.__next__() 
1
>>> next(I)
2
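A short sketch of what "supports multiple iterators" means in practice: two iterators built from the same list keep independent positions, while calling iter on an iterator simply returns that same iterator. The variable names are illustrative.

```python
L = [1, 2, 3]

I1 = iter(L)
I2 = iter(L)           # a second, independent iterator over the same list

a = next(I1)           # 1
b = next(I1)           # 2
c = next(I2)           # 1: I2 has its own position

same_object = iter(I1) is I1   # True: an iterator is its own iterator
print(a, b, c, same_object)    # 1 2 1 True
```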

Automatic and manual iteration

Automatic iteration

>>> L = [1,2,3]
>>> for X in L:
...   print(X ** 2,end=' ')
... 
1 4 9

Manual iteration

>>> L = [1,2,3]
>>> I = iter(L)
>>> while True:
...   try:
...      X = next(I) 
...   except StopIteration:
...      break
...   print(X ** 2,end=' ')
... 
1 4 9

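Dictionary iterators

The traditional way to step through a dictionary (the key order in these transcripts comes from an older Python 3 release; since Python 3.7, dictionaries preserve insertion order, so a current interpreter prints a, b, c)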

>>> D = {'a':1,'b':2,'c':3}
>>> for key in D.keys():
...   print(key,D[key])
... 
c 3
b 2
a 1

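In recent versions of Python, dictionaries have an iterator that automatically returns one key at a time in an iteration context

>>> I = iter(D) # build a dictionary iterator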
>>> next(I)
'c'
>>> next(I)
'b'
>>> next(I)
'a'
>>> next(I)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

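Because a for loop uses the iteration protocol (it builds an iterator, then calls __next__() to fetch the values one by one), we can step through a dictionary directly with a for loop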

>>> D
{'c': 3, 'b': 2, 'a': 1}
>>> for key in D:
...   print(key,D[key])
... 
c 3
b 2
a 1
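The same idea written as a script, as a minimal sketch. It assumes Python 3.7 or later, where dictionaries preserve insertion order, and it also uses the items method, which the notes do not show, to get each key together with its value.

```python
D = {'a': 1, 'b': 2, 'c': 3}

# Iterating a dictionary directly yields its keys.
keys_in_order = [key for key in D]

# items() yields (key, value) pairs, avoiding the D[key] lookup.
pairs = [(key, value) for key, value in D.items()]

print(keys_in_order)  # ['a', 'b', 'c']
print(pairs)          # [('a', 1), ('b', 2), ('c', 3)]
```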

List comprehensions

Two ways to update every item in a list

>>> L = [1,2,3,4,5]
>>> for i in range(len(L)):
...   L[i] += 10
... 
>>> L
[11, 12, 13, 14, 15]

>>> L = [x + 10 for x in L]
>>> L
[21, 22, 23, 24, 25]

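List comprehensions usually run faster than manual for loop statements (often cited as roughly twice as fast, though the gain depends on the Python version and on the work done per item), because their iterations are performed at C language speed inside the interpreter rather than by manually written Python code. Especially for larger data sets, this is a major performance advantage of list comprehensions.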
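The speed claim is easy to check on your own machine. The sketch below uses the standard timeit module; the function names and the data size are arbitrary choices for illustration, and the measured times will differ between systems and Python versions.

```python
import timeit


def add_ten_loop(values):
    """Build the result with an explicit for loop and append."""
    result = []
    for x in values:
        result.append(x + 10)
    return result


def add_ten_comprehension(values):
    """Build the same result with a list comprehension."""
    return [x + 10 for x in values]


data = list(range(1000))
same_result = add_ten_loop(data) == add_ten_comprehension(data)

loop_time = timeit.timeit(lambda: add_ten_loop(data), number=200)
comp_time = timeit.timeit(lambda: add_ten_comprehension(data), number=200)
print(f"same result: {same_result}")
print(f"loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```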

Using list comprehensions on files

File objects have a readlines method that loads the whole file into a list of line strings all at once:

>>> f = open('script1.py')
>>> lines = f.readlines()
>>> lines
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n']

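Strip the newline at the end of each line (rstrip() with no arguments removes all trailing whitespace, including the newline)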

>>> lines = [line.rstrip() for line in lines]
>>> lines
['import sys', 'print(sys.path)', 'x = 2', 'print(2 ** 33)']

Doing it all in one step

>>> lines = [line.rstrip() for line in open('script1.py')]
>>> lines
['import sys', 'print(sys.path)', 'x = 2', 'print(2 ** 33)']

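The comprehension above is an efficient way to process a file, because most of the work is done inside the Python interpreter, which is typically faster than the equivalent statements. For larger files in particular, the speed advantage of a list comprehension can be significant.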
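Here is a script-style sketch of the same one-step comprehension. It assumes a temporary stand-in for script1.py, and it passes "\n" to rstrip so that only the newline is removed; the notes use rstrip() with no arguments, which strips all trailing whitespace.

```python
import os
import tempfile

# Contents matching the script1.py used in the transcripts.
source = "import sys\nprint(sys.path)\nx = 2\nprint(2 ** 33)\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "script1.py")
    with open(path, "w") as out:
        out.write(source)

    # The comprehension pulls lines from the file iterator one at a time.
    with open(path) as f:
        stripped = [line.rstrip("\n") for line in f]

print(stripped)
```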

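Extended list comprehension syntax

The for loop nested in the expression can have an associated if clause, which filters out items for which the test is not true

>>> lines = [line.rstrip() for line in open('script1.py') if line[0] == 'p'] # keep only the lines that start with p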
>>> lines
['print(sys.path)', 'print(2 ** 33)']

The equivalent statement form

>>> res = []
>>> for line in open('script1.py'):
...   if line[0] == 'p':
...      res.append(line.rstrip())
... 
>>> res
['print(sys.path)', 'print(2 ** 33)']

Nested for clauses in a comprehension

>>> [x + y for x in 'abc' for y in 'lmn']
['al', 'am', 'an', 'bl', 'bm', 'bn', 'cl', 'cm', 'cn']

The equivalent statement form

>>> res = []
>>> for x in 'abc':
...   for y in 'lmn':
...     res.append(x+y)
... 
>>> res
['al', 'am', 'an', 'bl', 'bm', 'bn', 'cl', 'cm', 'cn']
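The if clause and nested for clauses can be combined. The following sketch, with illustrative names, pairs numbers from two ranges and keeps only the pairs where the first is smaller, alongside the equivalent nested statements.

```python
# Comprehension with two for clauses and a filter.
pairs = [(x, y) for x in range(3) for y in range(3) if x < y]

# Equivalent statement form: the clauses nest left to right.
res = []
for x in range(3):
    for y in range(3):
        if x < y:
            res.append((x, y))

print(pairs)         # [(0, 1), (0, 2), (1, 2)]
print(pairs == res)  # True
```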

Other iteration contexts

>>> map(str.upper,open('script1.py'))
<map object at 0xb743848c>
>>> list(map(str.upper,open('script1.py')))
['IMPORT SYS\n', 'PRINT(SYS.PATH)\n', 'X = 2\n', 'PRINT(2 ** 33)\n']

>>> sorted(open('script1.py')) # sorted iterates over the file and sorts its lines
['import sys\n', 'print(2 ** 33)\n', 'print(sys.path)\n', 'x = 2\n']

>>> list(zip(open('script1.py'),open('script1.py')))
[('import sys\n', 'import sys\n'), ('print(sys.path)\n', 'print(sys.path)\n'), ('x = 2\n', 'x = 2\n'), ('print(2 ** 33)\n', 'print(2 ** 33)\n')]

>>> list(enumerate(open('script1.py')))
[(0, 'import sys\n'), (1, 'print(sys.path)\n'), (2, 'x = 2\n'), (3, 'print(2 ** 33)\n')]

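The filter built-in function returns each item in an iterable for which the passed-in function returns a true value.

>>> list(filter(bool,open('script1.py'))) # keep the items for which the function is true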
['import sys\n', 'print(sys.path)\n', 'x = 2\n', 'print(2 ** 33)\n']

>>> filter(bool,['spam','','ni'])
<filter object at 0xb743858c>
>>> list(filter(bool,['spam','','ni']))
['spam', 'ni']

>>> import functools,operator
>>> operator.add('a','b') # concatenates the two objects
'ab'
>>> functools.reduce(operator.add,open('script1.py'))
'import sys\nprint(sys.path)\nx = 2\nprint(2 ** 33)\n'
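Several other built-ins accept any iterable in the same way. The sketch below is a supplement to the notes: it uses io.StringIO as an in-memory stand-in for the file, and open_script is a hypothetical helper that returns a fresh file-like object each time, since each one can be read only once.

```python
import io

SOURCE = "import sys\nprint(sys.path)\nx = 2\nprint(2 ** 33)\n"


def open_script():
    """Hypothetical stand-in for open('script1.py')."""
    return io.StringIO(SOURCE)


as_tuple = tuple(open_script())                 # build a tuple of lines
joined = "".join(open_script())                 # join iterates over the lines
longest = max(open_script(), key=len)           # longest line
has_x_line = "x = 2\n" in open_script()         # in membership test iterates too
first, second, third, fourth = open_script()    # sequence unpacking
total_chars = sum([len(line) for line in open_script()])

print(len(as_tuple), repr(longest), has_x_line, total_chars)
```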

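Like map, the zip, enumerate, and filter built-ins also return an iterable object.
Such an iterator is exhausted after a single pass over its results. One of these objects cannot support multiple iterators that hold different positions.

>>> M = map(abs,(-1,0,1)) # build an iterator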
>>> M
<map object at 0xb74386cc>
>>> next(M)
1
>>> next(M)
0
>>> next(M)
1
>>> next(M)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> 
>>> for x in M: print(x) # the iterator is already exhausted, so a second pass produces nothing; a new one must be built
...

>>> M = map(abs,(-1,0,1)) # build a new iterator
>>> for x in M: print(x)
... 
1
0
1
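When the results are needed more than once, one common approach is to store them in a list, which can be traversed repeatedly. A minimal sketch with illustrative names:

```python
M = map(abs, (-1, 0, 1))
first_pass = list(M)    # consumes the iterator
second_pass = list(M)   # the iterator is already exhausted

saved = list(map(abs, (-1, 0, 1)))   # a list supports any number of passes
total = sum(saved)
largest = max(saved)

print(first_pass, second_pass)  # [1, 0, 1] []
print(total, largest)           # 2 1
```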

Single versus multiple iterators

range supports multiple iterators

# range supports multiple iterators
>>> R = range(3) 
>>> next(R) # a range is not an iterator
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'range' object is not an iterator
>>> I1 = iter(R) # build iterator 1
>>> next(I1) 
0
>>> next(I1)
1
>>> I2 = iter(R) # build iterator 2
>>> next(I2)
0
>>> next(I1) 
2

zip, map, and filter do not support multiple iterators

>>> Z = zip((1,2,3),(10,11,12))
>>> I1 = iter(Z) # an attempt to open two iterators; both names refer to the same object
>>> I2 = iter(Z)
>>> next(I1)
(1, 10)
>>> next(I1)
(2, 11)
>>> next(I2)
(3, 12)

>>> M = map(abs,(-1,0,1))
>>> I1 = iter(M);I2 = iter(M)
>>> print(next(I1),next(I1),next(I1))
1 0 1
>>> print(next(I2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
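The difference between the two groups can be tested directly: an object supports only a single pass when iter returns the object itself. The helper name is_own_iterator below is hypothetical, chosen for this sketch.

```python
def is_own_iterator(obj):
    """Return True when iter(obj) hands back obj itself (single-pass object)."""
    return iter(obj) is obj


results = {
    'range': is_own_iterator(range(3)),
    'list': is_own_iterator([1, 2, 3]),
    'zip': is_own_iterator(zip((1, 2, 3), (10, 11, 12))),
    'map': is_own_iterator(map(abs, (-1, 0, 1))),
    'filter': is_own_iterator(filter(bool, ['spam', '', 'ni'])),
}

for name, single_pass in results.items():
    print(name, single_pass)
```

In this sketch range and list report False (each iter call builds a new iterator), while zip, map, and filter report True.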

Dictionary iterators

>>> D = dict(a=1,b=2,c=3)
>>> D
{'c': 3, 'b': 2, 'a': 1}
>>> 
>>> K = D.keys()
>>> K
dict_keys(['c', 'b', 'a'])
>>> next(K)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dict_keys' object is not an iterator
>>> I = iter(K) # build an iterator from the keys view; it yields the dictionary's keys
>>> next(I)
'c'
>>> next(I)
'b'
>>> for k in D.keys():print(k,end=' ')
... 
c b a
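One further property of the keys view, not stated in the notes, is that it reflects later changes to the dictionary, and each new iteration over it starts from the beginning. A small sketch, assuming Python 3.7 or later for the insertion ordering shown:

```python
D = dict(a=1, b=2, c=3)
K = D.keys()            # a view object, not an iterator

before = list(K)        # each pass over the view builds a fresh iterator
D['d'] = 4
after = list(K)         # the view sees the new key

sorted_keys = sorted(D, reverse=True)   # sorted accepts the dict directly

print(before)       # ['a', 'b', 'c']
print(after)        # ['a', 'b', 'c', 'd']
print(sorted_keys)  # ['d', 'c', 'b', 'a']
```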