A Deep Dive into Python yield

Feng Xi's Tech Blog (冯西的技术博客)

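To understand what yield does, you must first understand what generators are. And before we get to generators, let's look at iterables.

Iterables

When you create a list, you can read its items one by one. Reading its items one by one is called iteration: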

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)
1
2
3

mylist is an iterable. When you use a list comprehension, you create a list, and therefore an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)
0
1
4

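Anything you can use "for ... in ..." on is an iterable: lists, strings, files, and so on.

These iterables are handy because you can read them as many times as you wish, but all of their values are stored in memory, and that is not always what you want when there are a lot of values.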

Generators

Generators are iterables too, but you can only iterate over them once. That is because they do not store all the values in memory; they produce the values on the fly:

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)
0
1
4

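This is the same as the previous example, except that you used () instead of []. But you cannot run for i in mygenerator a second time, because a generator can only be used once: it computes 0, then discards it and computes 1, and so on one value at a time until it has computed 4.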
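
The one-shot behavior can be checked directly. The following is a minimal sketch (the variable names are illustrative): a list can be traversed repeatedly, while a generator expression is empty after its first full pass.

```python
# A list keeps its values, so it can be traversed any number of times.
squares_list = [x * x for x in range(3)]
first_pass_list = list(squares_list)
second_pass_list = list(squares_list)

# A generator expression computes values on demand and is used up after one pass.
squares_gen = (x * x for x in range(3))
first_pass_gen = list(squares_gen)
second_pass_gen = list(squares_gen)  # nothing left to produce

print(first_pass_list, second_pass_list)  # [0, 1, 4] [0, 1, 4]
print(first_pass_gen, second_pass_gen)    # [0, 1, 4] []
```

If you need the values more than once, either build a list or create a new generator.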

Yield

yield is a keyword that is used like return, except that the function will return a generator.

>>> def createGenerator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

This example is not very useful, but it is handy when you know your function will return a huge set of values that you only need to read once.
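
To make the memory argument concrete, here is a rough sketch comparing a list with a generator over the same values. The helper name squares and the size n are made up for illustration, and sys.getsizeof only measures the container object itself, not the integers it refers to, so treat the comparison as indicative rather than exact.

```python
import sys


def squares(n):
    """Yield the squares of 0..n-1 one at a time (hypothetical helper)."""
    for i in range(n):
        yield i * i


n = 100_000
as_list = [i * i for i in range(n)]  # all values exist in memory at once
as_gen = squares(n)                  # values are produced only when requested

# Size of the container objects themselves (elements are not included).
list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)

# The generator can still feed any consumer that reads values once.
total = sum(as_gen)

print(list_size > gen_size)   # True
print(total == sum(as_list))  # True
```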

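To master yield, you must understand that when you call the function, the code you wrote in the function body does not run. The function only returns a generator object, and this is the slightly tricky part.

Then, your code runs each time the for loop uses the generator.

Now the hard part:

The first time the for loop calls the generator object created from your function, it runs the code in your function from the beginning until it hits yield, then it returns the first value of the loop. Each subsequent call runs one more iteration of the loop you wrote in the function and returns the next value, until there is no value left to return.

The generator is considered empty once the function runs without hitting yield again. That can be because the loop has come to an end, or because an "if/else" condition is no longer satisfied.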
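
The sequence described above can be observed by recording what the function body does and when. This is a small sketch; traced_squares and the log list are names invented for the illustration.

```python
log = []  # records what the function body does, in order


def traced_squares():
    log.append("start")
    for i in range(2):
        log.append(f"yield {i * i}")
        yield i * i
    log.append("end")


gen = traced_squares()
log_after_call = list(log)   # calling the function ran none of its body

first = next(gen)            # runs the body up to the first yield
log_after_first = list(log)

rest = list(gen)             # resumes after the yield and runs to the end

print(log_after_call)          # []
print(first, log_after_first)  # 0 ['start', 'yield 0']
print(rest, log)               # [1] ['start', 'yield 0', 'yield 1', 'end']
```

Note that "end" is only logged when the consumer asks for a value after the last yield, which is also the moment the generator becomes empty.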

Your code explained

Generator

# Here you create a method of the node object that will return a generator
# (the def is meant to sit inside the node class body)
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that runs each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty
    # there is no more than two values: the left and the right children

Caller

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If distance is ok, then you can fill the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate in the candidates list
    # so the loop will keep running until it will have looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result

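This code contains a few smart parts:

The loop iterates over a list, but the list grows while the loop is running. It is a concise way to walk through all this nested data, even if it is a bit dangerous, since you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but while keeps creating new generator objects, which produce different values from the previous ones because they are not applied to the same node.
The extend() method is a list method that expects an iterable and adds all of its values to the list.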

Usually we pass a list to it:

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

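But in your code it gets a generator, which is good because:

You don't need to read the values twice.
You may have a lot of children and you don't want them all stored in memory.

And it works because Python does not care whether the argument of a method is a list or not. Python expects iterables, so it works with strings, lists, tuples, and generators! This is called duck typing, and it is one of the reasons why Python is so cool. But that is another story.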
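
The pattern discussed above can be tried end to end with a simplified stand-in. The Node class, its children() generator, and collect_values below are hypothetical and much simpler than the original code (there is no distance or median logic); the sketch only shows a while loop over a growing list, with extend() consuming a generator.

```python
class Node:
    """Hypothetical binary tree node; not the class from the code above."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def children(self):
        # Yields at most two values: the left child, then the right child.
        if self.left is not None:
            yield self.left
        if self.right is not None:
            yield self.right


def collect_values(root):
    result, candidates = [], [root]
    while candidates:
        node = candidates.pop()  # take the last candidate
        result.append(node.value)
        # extend() drains the generator; no intermediate list is built.
        candidates.extend(node.children())
    return result


tree = Node(1, Node(2, Node(4)), Node(3))
values = collect_values(tree)
print(values)  # [1, 3, 2, 4]

# extend() accepts any iterable, not only lists or generators.
mixed = []
mixed.extend("ab")
mixed.extend((1, 2))
print(mixed)  # ['a', 'b', 1, 2]
```

Because pop() takes the last candidate, the right subtree is visited before the left one here; the loop ends once every node has been popped and no generator yields new children.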

You can stop here, or read on for some advanced uses of generators:

Controlling a generator exhaustion

>>> class Bank(): # let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self):
...        while not self.crisis:
...            yield "$100"
...
>>> hsbc = Bank() # when everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(next(corner_street_atm))
$100
>>> print(next(corner_street_atm))
$100
>>> print([next(corner_street_atm) for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # crisis is coming, no more money!
>>> print(next(corner_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> wall_street_atm = hsbc.create_atm() # it's even true for new ATMs
>>> print(next(wall_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> hsbc.crisis = False # trouble is, even post-crisis the ATM remains empty
>>> print(next(corner_street_atm))
Traceback (most recent call last):
  ...
StopIteration
>>> brand_new_atm = hsbc.create_atm() # build a new one to get back in business
>>> for cash in brand_new_atm:
...    print(cash)
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

This can be useful for various things, such as controlling access to a resource.
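
When you pull values by hand with next(), an exhausted generator raises StopIteration. The sketch below shows two common ways to deal with that; countdown is a hypothetical generator used only for this illustration.

```python
def countdown(n):
    """Hypothetical generator: yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(2)
values = [next(gen), next(gen)]

# Option 1: catch the exception explicitly.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

# Option 2: pass a default, which is returned instead of raising.
fallback = next(gen, "empty")

print(values, exhausted, fallback)  # [2, 1] True empty
```

A for loop handles StopIteration for you, so these techniques only matter when you call next() yourself.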

Itertools, your best friend

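The itertools module contains special functions to manipulate iterables. Ever wished to duplicate a generator? Chain two generators? Group values in a nested list with a one-liner? Map or zip without creating another list?
Then just import itertools.

An example? Let's look at the possible orders of arrival in a four-horse race: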

>>> import itertools
>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]
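
Two of the questions asked earlier in this section (duplicating a generator and chaining two generators) can be sketched with itertools.tee and itertools.chain, plus itertools.islice for taking a slice of an endless iterator. The squares helper is a made-up example generator.

```python
import itertools


def squares(n):
    """Hypothetical example generator: squares of 0..n-1."""
    for i in range(n):
        yield i * i


# "Duplicate" a generator: tee returns independent iterators over the same
# values (it buffers values internally so each copy can see them).
first_copy, second_copy = itertools.tee(squares(3))
copies = (list(first_copy), list(second_copy))

# Chain two generators into one stream.
chained = list(itertools.chain(squares(2), squares(3)))

# Take the first few values of an endless iterator.
first_three = list(itertools.islice(itertools.count(10), 3))

print(copies)       # ([0, 1, 4], [0, 1, 4])
print(chained)      # [0, 1, 0, 1, 4]
print(first_three)  # [10, 11, 12]
```

After calling tee, use only the returned copies; advancing the original generator directly would make the copies miss values.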

Understanding the inner mechanisms of iteration

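Iteration is a process involving iterables (which implement the __iter__() method) and iterators (which implement the __next__() method). Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate over iterables.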
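
As a rough illustration of the two roles, the sketch below defines a hypothetical iterable (Countdown) and its iterator (CountdownIterator), then spells out by hand approximately what a for loop does with iter() and next().

```python
class Countdown:
    """Hypothetical iterable: __iter__ hands out a fresh iterator each time."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Hypothetical iterator: __next__ returns values until it is exhausted."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self  # iterators are themselves iterable

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


# Roughly what "for item in Countdown(3)" does behind the scenes.
collected = []
iterator = iter(Countdown(3))
while True:
    try:
        item = next(iterator)
    except StopIteration:
        break
    collected.append(item)

print(collected)                        # [3, 2, 1]
print(list(Countdown(3)) == collected)  # True
```

A generator function gives you both methods for free, which is why it is usually the shorter way to write an iterator.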

For more on this, see the article "how does the for loop work".

This article is translated from a Stack Overflow post: http://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do-in-python