什么是python中的惰性评估

最新推荐文章于 2022-11-12 20:37:29 发布

weixin_26721705

最新推荐文章于 2022-11-12 20:37:29 发布

阅读量455

点赞数

文章标签： python

原文链接：https://towardsdatascience.com/what-is-lazy-evaluation-in-python-9efb1d3bfed0

版权

If you’ve never heard of Lazy Evaluation before, Lazy Evaluation is an evaluation strategy which delays the evaluation of an expression until its value is needed and which also avoids repeated evaluations (From Wikipedia). It’s usually being considered as a strategy to optimize your code.

如果您以前从未听说过惰性评估，那么惰性评估是一种评估策略，它可以将表达式的评估延迟到需要其值之前，并且还可以避免重复评估(摘自Wikipedia)。通常认为它是优化代码的策略。

Let’s turn this theory into an example. For example, you have a simple expression sum = 1 + 2, Python would try to understand the real meaning of this expression and get the conclusion that sum = 3. This process is called Evaluation and it needs some sort of computation power. In this case, the evaluation is done immediately, therefore it has another name: Strict Evaluation.

让我们将此理论变成一个例子。例如，您有一个简单的表达式sum = 1 + 2 ，Python会尝试理解该表达式的真实含义并得出sum = 3的结论。此过程称为评估，它需要某种计算能力。在这种情况下，评估会立即进行，因此它有另一个名称：严格评估。

On the other hand, we have a non-strict evaluation which is called Lazy Evaluation. The difference is that Lazy Evaluation will not immediately evaluate the expression but only does it when the outcome is needed. It’s a bit like a lazy student who only does the homework when it needs to be submitted to the teacher.

另一方面，我们有一个非严格的评估，称为惰性评估。 区别在于，惰性评估不会立即评估表达，而仅在需要结果时才评估。 有点像一个懒惰的学生，只在需要将作业提交给老师时才做作业。

But being lazy here is not necessarily a bad thing, it can improve the efficiency of your code and save plenty of resources. Luckily, Python has silently applied Lazy Evaluation to many built-in functions in order to optimize your code. And I’m sure that you must be familiar with those functions even without being aware of Lazy Evaluation.

但是在这里懒惰不一定是一件坏事，它可以提高代码效率并节省大量资源。幸运的是，Python已将“惰性评估”静默地应用于许多内置函数，以优化您的代码。而且我敢肯定，即使您不了解惰性评估，也必须熟悉这些功能。

In this article, I will explain how Lazy Evaluation works in Python, which functions benefit from it, and the reason behind it. In the end, I will show you how you can write your own Lazy functions/classes. Let’s get started!

在本文中，我将解释Lazy Evaluation如何在Python中工作，哪些功能可以从中受益以及其背后的原因。最后，我将向您展示如何编写自己的Lazy函数/类。让我们开始吧！

范围() (Range())

Most of the time, Python still evaluates expression immediately. Let’s look at this example. How long do you think it’s gonna take?

大多数时候，Python仍然会立即评估表达式。让我们来看这个例子。您认为需要多长时间？

print([time.sleep(0), time.sleep(1), time.sleep(2)][0])

The answer is 3 seconds. This is because when you create a list, Python will immediately evaluate every item inside the list, even though you only need the first element.

答案是3秒。这是因为创建列表时，即使您只需要第一个元素，Python也会立即评估列表中的每个项目。

Since Python3, there has been a big improvement in making such list traversal more memory-efficient and time-efficient, which is range() function. I bet every Python developer has used this function at lease once in their life. In Python2, range(5) would return a list of 5 elements. As the size of the list increases, more memory is used.

自Python3以来，在使此类列表遍历更有效地利用内存和时间range()即range()函数range()方面有了很大的改进。我敢打赌，每个Python开发人员一生都曾使用过此功能。在Python2中， range(5)将返回5个元素的列表。随着列表大小的增加，将使用更多的内存。

Python 2.7.16
>>> range(5)
[0, 1, 2, 3, 4]
>>> import sys
>>> sys.getsizeof(range(5))
112
>>> sys.getsizeof(range(500))
4072

However in Python 3, range(5) returns a range type. This object can be iterated over to yield a sequence of numbers. No matter how big the range is, the object always has the same size. This is due to the fact that range(5) only stores the start, stop, step values, and calculates each item when it’s needed.

但是在Python 3中， range(5)返回范围类型。可以迭代该对象以产生数字序列。无论范围有多大，对象始终具有相同的大小。这是因为range(5)仅存储start ， stop， step值，并在需要时计算每个项目。

Python 3.7.7
>>> range(5)
range(0, 5)
>>> import sys
>>> sys.getsizeof(range(5))
48
>>> sys.getsizeof(range(500))
48

If you are not familiar with generator, iterator and their benefits, please continue reading this section. Otherwise, feel free to skip this section.

如果您不熟悉Generator，Iterator及其好处，请继续阅读本节。 否则，请随时跳过本节。

iterator > generator

迭代器>生成器

To make it simple, iterator is a bigger concept than generator. Iterator is an object whose class has a __next__ and __iter__ method. Every time you do next() call to the iterator object, you would get the next item in the sequence until the iterator object is exhausted and raise StopIteration. However, generator is a function that returns an iterator. It looks like a normal function except that it uses yield instead of return.

为简单起见，迭代器是一个比生成器更大的概念。 Iterator是一个对象，其类具有__next__和__iter__方法。每次对迭代器对象进行next()调用时，您将获得序列中的下一项，直到迭代器对象用尽并引发StopIteration为止。然而，发电机是返回一个迭代器的功能。它看起来像一个普通函数，除了它使用yield而不是return 。

When the yield statement is executed, the program would suspend the current function execution and returns the yielded value to the caller. This is the key idea of Lazy Evaluation where the value is calculated and returned when the caller is needed and the next value will still be quiet and doing nothing in the program.

执行yield语句时，程序将暂停当前函数的执行，并将yield的值返回给调用方。这是“惰性评估”的关键思想，其中当需要调用者时将计算并返回该值，而下一个值仍然是安静的，并且在程序中不执行任何操作。

To create a generator, there can be 2 ways:

要创建一个生成器，可以有两种方法：

# expression 1
generator1 = (time.sleep(x) for x in range(3))


def sleep():
    for x in range(3):
        yield time.sleep(x)


# expression 2
generator2 = sleep()

Then, let’s improve the first example using range(). Before we get to the final result, there is another “trap”. The following example has 2 functions: use_generator() and use_list(). They look almost the same except use_generator() uses () in islice() function, while use_list() uses []. Such small difference can, however, make a huge impact on the running time.

然后，让我们使用range()改进第一个示例。在获得最终结果之前，还有另一个“陷阱”。以下示例具有2个函数： use_generator()和use_list() 。除了use_generator()在islice()函数中使用() ，而use_list()使用[]之外，它们看起来几乎相同。但是，如此小的差异会对运行时间产生巨大影响。

The answer to this behaviour is already present in the function name. (time.sleep(x) for x in range(3)) is a generator while [time.sleep(x) for x in range(3)] is a list even though it uses range(). Because of that, the function that uses list takes way more time than the other one.

函数名称中已经存在此行为的答案。 (time.sleep(x) for x in range(3))是一个生成器，而[time.sleep(x) for x in range(3)]是一个列表，即使它使用range() 。因此，使用列表的功能比其他功能花费更多的时间。

import time
from itertools import islice


def timing(f):
    def wrap(*args, **kwargs):
        time1 = time.time()
        ret = f(*args, **kwargs)
        time2 = time.time()
        print(
            "{:s} function took {:.3f} ms".format(f.__name__, (time2 - time1) * 1000.0)
        )
        return ret
    return wrap


@timing
def use_generator():
    return list(islice((time.sleep(x) for x in range(3)), 1))


@timing
def use_list():
    return list(islice([time.sleep(x) for x in range(3)], 1))


print(use_generator())
# use_generator function took 0.048 ms
# [None]
print(use_list())
# use_list function took 3003.090 ms
# [None]


print(type((time.sleep(x) for x in range(3))))
# <class 'generator'>
print(type([time.sleep(x) for x in range(3)]))
# <class 'list'>

If you understand this part, then congratulations, you’ve already known 50% about Lazy Evaluation.

如果您了解这部分内容，那么恭喜您，您已经了解了50％的惰性评估。

压缩() (Zip())

A very similar use case is zip() which merges 2 iterables to produce a sequence of tuples. In Python2, zip(*iterables) would return a list of tuples.

一个非常相似的用例是zip() ，它合并2个可迭代对象以生成一个元组序列。在Python2中， zip(*iterables)将返回一个元组列表。

Python 2.7.16
>>> type(zip([1,2],[3,4]))
<type 'list'>
>>> import sys
>>> sys.getsizeof(zip([1,2],[3,4]))
88
>>> sys.getsizeof(zip([i for i in range(500)],[i for i in range(500)]))
4072

While since Python3, it has been improved to return a zip object which is similar to range object that can be iterated over.

自Python3以来，改进了返回一个zip对象的方法，该对象类似于可以迭代的range对象。

Python 3.7.7 
>>> type(zip([1,2],[3,4]))
<class 'zip'>
>>> import sys
>>> sys.getsizeof(zip([1,2],[3,4]))
72
>>> sys.getsizeof(zip([i for i in range(500)],[i for i in range(500)]))
72

I will not repeat the reason again because it has the same idea of range(). But if you want to know other aspects of zip, feel free to read my another article.

我不再重复这个原因，因为它具有range()的相同思想。但是，如果您想了解zip其他方面，请随时阅读我的另一篇文章。

打开() (Open())

This is another built-in function that we probably use everyday and take it for granted. When we open a file, we normally do:

这是我们可能每天都会使用的另一种内置函数，这是理所当然的。打开文件时，通常会执行以下操作：

with open("file.csv", "r") as f:
    for line in f:
      print(line)

with open(...) doesn’t read the entire file and store it in memory, instead it returns a file object that can be iterated over. Because of that, it’s able to efficiently read huge files and not hurt the memory.

with open(...)不会读取整个文件并将其存储在内存中，而是会返回一个可以迭代的文件对象。因此，它能够有效地读取大文件，而不会损害内存。

Lambda表达式 (Lambda expression)

A couple of weeks ago, I received a question from a reader about Lambda expression which actually triggered me to write this article. His question is:

几周前，我收到了一个读者关于Lambda表达的问题，这实际上促使我写了这篇文章。他的问题是：

Why does a lambda map object like x = map(lambda x: x*2, [1,2,3,4,5]) doesn’t take any space? But if you do list(x), it will print all the values and take space in the memory?

为什么像x = map(lambda x：x * 2，[1,2,3,4,5])这样的lambda地图对象不占用任何空间？但是，如果执行list(x)，它将打印所有值并占用内存空间吗？

I hope by far, you should have a clue on what’s going on here. The map object is also a lazy object that can be iterated over. The computation x*2 will be done for only 1 item in each loop. When you do list(x), you basically compute all the values at one time. If you just want to iterate over the map object, you don’t have to do list(x).

我希望到目前为止，您应该对这里发生的事情有所了解。 map对象也是可以迭代的惰性对象。每个循环中仅对一项执行计算x*2 。当您执行list(x) ，基本上可以一次计算所有值。如果只想遍历map对象，则不必执行list(x) 。

map_obj = map(lambda x: x*2, [1,2,3,4,5])
for i in map_obj:
    print(i)
# 2,4,6,8,10


import sys
print(sys.getsizeof(x))
# 48


print(list(x))
# [2, 4, 6, 8, 10]
print(sys.getsizeof(list(x)))
# 56

如何编写惰性评估函数/类？(How to write a Lazy Evaluation function/class?)

In the last part of the article, I want to bring us to the next level where we write our own Lazy Evaluation function/class. This helps us to extend the capability beyond the built-in functions.

在本文的最后一部分，我想将我们带入一个新的水平，在那里我们编写自己的惰性评估函数/类。这有助于我们将功能扩展到内置功能之外。

As we have understood that a key part of Lazy Evaluation is nothing more than a generator. Therefore, we can simply write our function as a generator.

如我们所知，懒惰评估的关键部分仅是生成器。因此，我们可以简单地将函数编写为生成器。

Lazy function — generator

惰性函数-生成器

def lazy_loading(items):
    for i in items:
        # you can have complex logic here
        yield i ** 2


items = [i for i in range(100)]
for i in lazy_loading(items):
    print(i)

Lazy property — decorator

懒惰的属性—装饰

Another common use case of customized Lazy Evaluation is the initialization of class properties. When we initialize a class, certain properties might take long time to calculate. In the following example, the property cities takes longer time because it needs to invoke an API to get a list of city names. Therefore, it would be a waste of time if we don’t actually need this value for some country objects.

自定义的惰性评估的另一个常见用例是类属性的初始化。当我们初始化一个类时，某些属性可能需要很长时间才能计算出来。在以下示例中，属性cities需要更长的时间，因为它需要调用API来获取城市名称列表。因此，如果我们实际上不需要某些国家对象的此值，那将是浪费时间。

A nice solution present in this blog is to create a decorator for such lazy properties, so that the expensive operation will be done only if this property is needed. As you can see from the console output, cities property is called is printed out after we print out china.cities.

本博客中存在的一个不错的解决方案是为此类惰性属性创建装饰器，以便仅在需要此属性时才执行昂贵的操作。从控制台输出中可以看到，在我们打印出china.cities之后，被打印出来的cities property is called 。

def lazy_property(fn):
    attr_name = '_lazy_' + fn.__name__


    @property
    def _lazy_property(self):
        if not hasattr(self, attr_name):
            setattr(self, attr_name, fn(self))
        return getattr(self, attr_name)
    return _lazy_property


class Country:
    def __init__(self, name, capital):
        self.name = name
        self.capital = capital


    @lazy_property
    def cities(self):
        # expensive operation to get all the city names (API call)
        print("cities property is called")
        return ["city1", "city2"]


china = Country("china", "beijing")
print(china.capital)
# beijing
print(china.cities)
# cities property is called
# ['city1', 'city2']

I hope this article can inspire you to have a new view on code optimization.

我希望本文能激发您对代码优化有新的看法。

参考： (Reference:)

翻译自: https://towardsdatascience.com/what-is-lazy-evaluation-in-python-9efb1d3bfed0

weixin_26721705

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
什么是python中的惰性评估

If you’ve never heard of Lazy Evaluation before, Lazy Evaluation is an evaluation strategy which delays the evaluation of an expression until its value is needed and which also avoids repeated evaluat...
复制链接

扫一扫