Python: the fastest way to find an item in a list

weixin_26640581

Original article: https://medium.com/@sebawitowski/python-the-fastest-way-to-find-an-item-in-a-list-19fd950664ec

If you want to find the first number that matches some criteria, what do you do? The easiest way is to write a loop that checks numbers one by one and returns when it finds the correct one.

Let’s say we want to get the first number that is divisible by both 42 and 43 (that’s 1806). If we don’t have a predefined set of elements (in this case, we want to check all the numbers starting from 1), we might use a “while loop”.

# find_item.py


def while_loop():
    item = 1
    # You don't need to use parentheses, but they improve readability
    while True:
        if (item % 42 == 0) and (item % 43 == 0):
            return item
        item += 1

It’s pretty straightforward:

1. Start from number 1.
2. Check if that number is divisible by both 42 and 43.
3. If yes, return it and stop the loop. Otherwise, check the next number.
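
As a side check on the expected answer: the smallest positive number divisible by two integers is their least common multiple. The sketch below is not part of the original benchmark, and the `lcm` helper is my own (written with `math.gcd` so it also runs on Python 3.8). It confirms 1806 for 42 and 43, and 9702 for the 98 and 99 pair used later in the article.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers (hypothetical helper)."""
    return a * b // gcd(a, b)


print(lcm(42, 43))  # 1806
print(lcm(98, 99))  # 9702
```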

If we have a list of items that we want to check, we will use a “for loop” instead. I know that the number I’m looking for is smaller than 10 000, so let’s use that as the upper limit:

# find_item.py


def for_loop():
    for item in range(1, 10000):
        if (item % 42 == 0) and (item % 43 == 0):
            return item

Let’s compare both solutions (benchmarks are done with Python 3.8):

$ python -m timeit -s "from find_item import while_loop" "while_loop()"
2000 loops, best of 5: 134 usec per loop


$ python -m timeit -s "from find_item import for_loop" "for_loop()"
2000 loops, best of 5: 103 usec per loop

“While loop” is around 30% slower than the “for loop” (134/103≈1.301).

“For loops” are optimized for iterating over a collection of elements. Trying to do the iteration manually (for example, by referencing elements in a list through an index variable) will be a slower and often over-engineered solution.
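
To make the "manual iteration" case concrete, here is a minimal sketch that searches the same list twice: once through an index variable and once by iterating directly. The names `numbers`, `index_loop`, and `direct_loop` are mine, not the article's, and the timings will vary by machine, so none are quoted here.

```python
import timeit

numbers = list(range(1, 10000))


def index_loop():
    # Manual iteration: keep an index and look each element up by position
    i = 0
    while i < len(numbers):
        if (numbers[i] % 42 == 0) and (numbers[i] % 43 == 0):
            return numbers[i]
        i += 1


def direct_loop():
    # Let the "for loop" hand us each element directly
    for item in numbers:
        if (item % 42 == 0) and (item % 43 == 0):
            return item


# Both versions must find the same item before comparing their speed
assert index_loop() == direct_loop() == 1806
print("index loop: ", timeit.timeit(index_loop, number=200))
print("direct loop:", timeit.timeit(direct_loop, number=200))
```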

Python 2 flashback

In Python 2, functions like range, filter, or zip were eager, so they would always create the whole collection when called. All those elements would be loaded into memory, increasing both the execution time of your code and its memory usage. To avoid this behavior, you had to use their lazy equivalents like xrange, ifilter, or izip.
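
The lazy behavior of Python 3 can be observed directly. This is a small illustration rather than a benchmark; the exact byte counts depend on the interpreter, so only their relative size matters.

```python
import sys

lazy = range(1, 10_000_000)       # Python 3: no numbers are created yet
eager = list(range(1, 100_000))   # builds every element up front

print(sys.getsizeof(lazy))   # small, and it does not grow with the range length
print(sys.getsizeof(eager))  # grows with the number of elements

# zip and filter also return lazy iterators in Python 3
pairs = zip(range(3), "abc")
print(next(pairs))  # (0, 'a')
```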

Out of curiosity, let’s see how slow the for_loop() function is if we run it with Python 2.7.18 (the final release of Python 2):

$ pyenv shell 2.7.18
$ python -m timeit -s "from find_item import for_loop" "for_loop()"
10000 loops, best of 3: 151 usec per loop

That’s almost 50% slower than running the same function in Python 3 (151/103≈1.466). Updating your Python version is one of the easiest performance wins you can get!

If you are wondering what pyenv is and how to use it to quickly switch Python versions, check out this section of my PyCon 2020 workshop on Python tools.

Let’s go back to our “while loop” vs. “for loop” comparison. Does it matter if the element we are looking for is at the beginning or at the end of the list?

def while_loop2():
    item = 1
    while True:
        if (item % 98 == 0) and (item % 99 == 0):
            return item
        item += 1


def for_loop2():
    for item in range(1, 10000):
        if (item % 98 == 0) and (item % 99 == 0):
            return item

This time, we are looking for number 9702, which is at the very end of our list. Let’s measure the performance:

$ python -m timeit -s "from find_item import while_loop2" "while_loop2()"
500 loops, best of 5: 710 usec per loop


$ python -m timeit -s "from find_item import for_loop2" "for_loop2()"
500 loops, best of 5: 578 usec per loop

The relative difference barely changes. The “while loop” is around 23% slower this time (710/578≈1.228). I performed a few more tests (up to a number close to 100 000 000), and the difference was always similar (the “while loop” was 20–30% slower).
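
If you want to repeat this kind of comparison from a script instead of the command line, the `timeit` module can be called directly. The sketch below is an assumption-laden variant of the article's functions: the divisors and the `limit` are turned into parameters (the originals hard-code them), and the lambda wrappers add a little overhead to both sides, so the ratios will not match the numbers above exactly.

```python
import timeit


def while_loop(a, b):
    item = 1
    while True:
        if (item % a == 0) and (item % b == 0):
            return item
        item += 1


def for_loop(a, b, limit=10000):
    for item in range(1, limit):
        if (item % a == 0) and (item % b == 0):
            return item


for a, b in [(42, 43), (98, 99)]:
    # Best of 3 runs, 20 calls each
    t_while = min(timeit.repeat(lambda: while_loop(a, b), number=20, repeat=3))
    t_for = min(timeit.repeat(lambda: for_loop(a, b), number=20, repeat=3))
    print(f"{a} and {b}: while/for ratio = {t_while / t_for:.2f}")
```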

So far, the collection of items we wanted to iterate over was limited to the first 10 000 numbers. But what if we don’t know the upper limit? In this case, we can use the count function from the itertools library.

from itertools import count


def count_numbers():
    for item in count(1):
        if (item % 42 == 0) and (item % 43 == 0):
            return item

count(start=0, step=1) will start counting numbers from the start parameter, adding the step in each iteration. In my case, I need to change the start parameter to 1, so it works the same as the previous examples.

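
A short illustration of the start and step parameters. Because count never stops on its own, `itertools.islice` is used here (my choice, not the article's) to take a finite number of values for printing.

```python
from itertools import count, islice

# count() is infinite, so islice() cuts off a finite slice of it
print(list(islice(count(1), 5)))          # [1, 2, 3, 4, 5]
print(list(islice(count(10, 5), 3)))      # [10, 15, 20]
print(list(islice(count(0.5, 0.25), 3)))  # [0.5, 0.75, 1.0]
```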
count works almost the same as the "while loop" that we made at the beginning. How about the speed?

$ python -m timeit -s "from find_item import count_numbers" "count_numbers()"
2000 loops, best of 5: 109 usec per loop

It’s almost the same as the “for loop” version. So count is a good replacement if you need an infinite counter.

A typical solution for iterating over a list of items is to use a list comprehension. But we want to exit the iteration as soon as we find our number, and that’s not easy to do with a list comprehension. It’s a great tool to go over the whole collection, but not in this case.

Let’s see how bad it is:

def list_comprehension():
    return [item for item in range(1, 10000) if (item % 42 == 0) and (item % 43 == 0)][0]

$ python -m timeit -s "from find_item import list_comprehension" "list_comprehension()"
500 loops, best of 5: 625 usec per loop

That’s really bad: it’s a few times slower than the other solutions! A list comprehension always builds the whole list before we can take its first element, so it takes the same amount of time no matter if we search for the first or the last element. And we can’t use count here, because the comprehension would never finish iterating over an infinite counter.
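
One way to see where the time goes is to count how often the condition is evaluated. In this sketch the `matches` helper and the global `checks` counter are hypothetical additions for illustration; the condition itself is the one used throughout the article.

```python
checks = 0


def matches(item):
    """The article's condition, plus a counter showing how often it runs."""
    global checks
    checks += 1
    return (item % 42 == 0) and (item % 43 == 0)


def for_loop_counted():
    for item in range(1, 10000):
        if matches(item):
            return item


checks = 0
first = [item for item in range(1, 10000) if matches(item)][0]
comprehension_checks = checks  # every element of the range was tested

checks = 0
for_loop_counted()
loop_checks = checks  # testing stopped at the first match

print(first, comprehension_checks, loop_checks)  # 1806 9999 1806
```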

But using a list comprehension points us in the right direction — we need something that returns the first element it finds and then stops iterating. And that thing is a generator! We can use a generator expression to grab the first element matching our criteria.

def generator():
    return next(item for item in count(1) if (item % 42 == 0) and (item % 43 == 0))

The whole code looks very similar to a list comprehension, but now we can actually use count. A generator expression executes only enough code to return the next element. Each time you call next(), it resumes work in the same place where it stopped the last time, grabs the next item, returns it, and stops again.
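
The suspend-and-resume behavior is easier to see on a small, finite range. The example also shows the optional second argument of next(), a default returned when the generator has nothing left; this matters when the search space is finite and a match is not guaranteed.

```python
gen = (item for item in range(1, 10) if item % 4 == 0)

print(next(gen))        # 4
print(next(gen))        # 8
# The generator is now exhausted: a bare next(gen) would raise StopIteration,
# but a default value can be passed as the second argument instead
print(next(gen, None))  # None
```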

$ python -m timeit -s "from find_item import generator" "generator()"
2000 loops, best of 5: 110 usec per loop

It takes almost the same amount of time as the best solution we have found so far. And I find this syntax much easier to read, as long as we don’t put too many ifs in there!

Generators have the additional benefit of being able to “suspend” and “resume” counting. We can call next() multiple times, and each time we get the next element matching our criteria. If we want to get the first three numbers that are divisible by both 42 and 43, here is how easily we can do this with a generator expression:

def generator_3_items():
    gen = (item for item in count(1) if (item % 42 == 0) and (item % 43 == 0))
    return [next(gen), next(gen), next(gen)]

Compare it with the “for loop” version:

def for_loop_3_items():
    items = []
    for item in count(1):
        if (item % 42 == 0) and (item % 43 == 0):
            items.append(item)
            if len(items) == 3:
                return items

Let’s benchmark both versions:

$ python -m timeit -s "from find_item import for_loop_3_items" "for_loop_3_items()"
1000 loops, best of 5: 342 usec per loop


$ python -m timeit -s "from find_item import generator_3_items" "generator_3_items()"
1000 loops, best of 5: 349 usec per loop

Performance-wise, both functions are almost identical. So when would you use one over the other? A “for loop” lets you write more complex code. You can’t put nested “if” statements or multiline code with side effects inside a generator expression. But if you only do simple filtering, generators can be much easier to read.
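
For the simple-filtering case, a variable number of matches can be taken with `itertools.islice` instead of writing out repeated next() calls. This is a sketch of one possible variant, not something benchmarked in the article; the function name and the `n` parameter are hypothetical.

```python
from itertools import count, islice


def first_n_matches(n):
    # n is a hypothetical parameter; the article's version hard-codes three next() calls
    gen = (item for item in count(1) if (item % 42 == 0) and (item % 43 == 0))
    return list(islice(gen, n))


print(first_n_matches(3))  # [1806, 3612, 5418]
```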

A generator expression combined with next() is a great way to grab one or more elements based on specific criteria. It's memory-efficient, fast, and easy to read, as long as you keep it simple. When the number of "if" clauses in the generator expression grows, it becomes much harder to read (and write).

With complex filtering criteria or many ifs, a "for loop" is a more suitable choice that doesn't sacrifice performance.
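
As an illustration of logic that fits a "for loop" better, the sketch below adds a made-up requirement: besides finding the first match, it records how many numbers were divisible by 42 but not by 43 along the way. The function name and the `skipped` list are hypothetical and only serve to show a multi-step body with a side effect.

```python
from itertools import count


def first_match_with_log():
    skipped = []  # side effect that would not fit in a generator expression
    for item in count(1):
        if item % 42 != 0:
            continue
        if item % 43 != 0:
            # Second step of the check: remember near misses divisible by 42 only
            skipped.append(item)
            continue
        return item, len(skipped)


print(first_match_with_log())  # (1806, 42)
```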

Originally published at https://switowski.com/blog/find-item-in-a-list.
