8 Python Iteration Skills That Data Scientists Shouldn't Miss Out

weixin_26749889

Original article: https://towardsdatascience.com/8-python-iteration-skills-that-data-scientists-shouldnt-miss-out-6b57fb35e00f

One of the main things our programs do for us is repeat particular tasks automatically. This is usually achieved with the for loop, whose most basic form is the following:

for item in iterable:
    # Your tasks go here

In theory, the basic form can handle every iteration need, but in many cases our code becomes more concise if we take advantage of functionality that Python already offers. In this article, I'd like to review eight useful techniques to consider in data science projects.

To illustrate the usefulness of these techniques, I'll contrast each of them with code that uses only the most basic form. These comparisons show a noticeable improvement in readability.

1. Track Iteration With enumerate()

Suppose that we need to keep count of the iterations. In other words, we want to know how many times the loop has run so far. In this case, we should consider the enumerate() function.

# An iterable to start with
numbers = ['one', 'two', 'three']


# The basic way
for i in range(len(numbers)):
    print(f"# {i + 1}: {numbers[i]}")
    
# Use enumerate()
for i, number in enumerate(numbers, 1):
    print(f"# {i}: {number}")

To get the index of each item in the sequence, the basic way involves creating a range object, because the typical form (i.e., for item in iterable) carries no index information. Although we could look up the index with the list's index() method, it returns the index of the first matching element. Thus, when the list contains duplicate items, it gives unintended results.

The enumerate() function creates an enumerate object, which is an iterator. It takes an optional argument, start, which specifies the initial value of the counter. By default, counting starts from 0. In our case, we start counting from 1 so that the first printed element is numbered 1. As you can see, enumerate() gives us the counter and the element directly.
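
The duplicate-item pitfall of index() mentioned above can be seen in a small sketch. The readings list is hypothetical sample data, not part of the original article.

```python
# Hypothetical sample data containing a duplicate value
readings = ["low", "high", "low"]

# list.index() always reports the first match, so the second "low"
# is wrongly reported as position 0
positions_via_index = [readings.index(r) for r in readings]

# enumerate() yields the true position of every element
positions_via_enumerate = [i for i, _ in enumerate(readings)]

print(positions_via_index)      # [0, 1, 0]
print(positions_via_enumerate)  # [0, 1, 2]
```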

2. Pair Iterables With zip()

When we have several iterables and need to retrieve the items at the same position from each of them, we should consider the zip() function, as shown in this example.

# Two iterables
students = ["John", "David", "Ashley"]
scores = [95, 93, 94]


# The basic way
for i in range(len(students)):
    student = students[i]
    score = scores[i]
    print(f"Student {student}: {score}")


# Use zip()
for student, score in zip(students, scores):
    print(f"Student {student}: {score}")

To get the elements at the same index, the basic way creates the index with the range() function, as we did in the previous section. Using indexing to retrieve the element from each iterable is a little tedious.

The zip() function joins multiple iterables, and on each iteration it produces a tuple that comprises the elements at the same position in each iterable. We can unpack the tuple to retrieve the elements very conveniently. The code looks much cleaner, doesn't it?

Another thing to note is that zip() stops as soon as the shortest iterable is exhausted. If you want the pairing to continue until the longest iterable is exhausted, use the zip_longest() function in the itertools module, which fills the missing values with None by default.
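
The difference between zip() and zip_longest() shows up when the inputs have unequal lengths. In this sketch, the shortened scores list and the "n/a" placeholder are assumptions made for illustration.

```python
from itertools import zip_longest

students = ["John", "David", "Ashley"]
scores = [95, 93]  # assumed for illustration: one score is missing

# zip() stops at the shortest iterable, so Ashley is silently dropped
shortest = list(zip(students, scores))

# zip_longest() keeps going and pads the gap with fillvalue
longest = list(zip_longest(students, scores, fillvalue="n/a"))

print(shortest)  # [('John', 95), ('David', 93)]
print(longest)   # [('John', 95), ('David', 93), ('Ashley', 'n/a')]
```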

3. Reverse Iteration With reversed()

When you need to iterate over a sequence in reverse order, it's best to use the reversed() function. Suppose that students arrive at the classroom at slightly different times and you want to check their assignments in reverse order, so that the first student to arrive gets checked last.

# The students arrival records
students_arrived = ["John", "David", "Ashley"]


# The typical ways
for i in range(1, len(students_arrived)+1):
    print(students_arrived[-i])


for student in students_arrived[::-1]:
    print(student)


# Use reversed()
for student in reversed(students_arrived):
    print(student)

If you stick with the range() function, you'll use negative indexing on the sequence. In other words, we use -1 to refer to the last item of the list, -2 to the one before it, and so on.

Alternatively, we can reverse the list with the slice [::-1] and then iterate over the newly created list object.

The best way is simply to use the reversed() function, which returns an iterator instead of copying the list. It is also flexible, because it accepts other sequence types, such as tuples and strings.
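
As a quick sketch of that flexibility, reversed() can be applied to a tuple, a string, and a range object. The variable names and sample values are only illustrative.

```python
# A tuple instead of a list
arrival_order = ("John", "David", "Ashley")
check_order = list(reversed(arrival_order))

# A string: reversed() yields its characters from last to first
letters = "".join(reversed("abc"))

# A range object also supports reversed()
countdown = list(reversed(range(1, 4)))

print(check_order)  # ['Ashley', 'David', 'John']
print(letters)      # cba
print(countdown)    # [3, 2, 1]
```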

4. Filter Elements With filter()

You don't always need to use every item in an iterable. In these cases, we usually check whether each item satisfies particular criteria before we apply the needed operations. This condition check and the creation of the needed iterator can be combined in a single function call: filter(). Let's see how it works in comparison to the typical way.

# A list of numbers to process
numbers = [1, 3, 4, 8, 9]


# The typical way
for number in numbers:
    if number % 2:
        print(f"Do operations with odd number: {number}")
        
# Use filter()
for number in filter(lambda x: x % 2, numbers):
    print(f"Do operations with odd number: {number}")

The typical way involves evaluating each element inside the loop body.

The filter() function evaluates the elements and yields those that pass the test, one at a time, as an iterator. In other words, the function returns an iterator that can be used directly in the for loop.

Depending on your needs, you can consider other filtering functions, such as filterfalse() in the itertools module, which does the opposite (i.e., it keeps the elements for which the function evaluates to false).
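
A minimal sketch of filter() and filterfalse() side by side, using the same numbers as above. The helper is_odd() is a hypothetical name introduced here in place of the lambda.

```python
from itertools import filterfalse

numbers = [1, 3, 4, 8, 9]


def is_odd(x):
    """Hypothetical helper: True for odd integers."""
    return x % 2 == 1


# filter() keeps the elements for which the predicate is true
odd_numbers = list(filter(is_odd, numbers))

# filterfalse() keeps the elements for which the predicate is false
even_numbers = list(filterfalse(is_odd, numbers))

print(odd_numbers)   # [1, 3, 9]
print(even_numbers)  # [4, 8]
```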

5. Chain Iterables With chain()

In a previous section, we talked about how to work with multiple iterables using the zip() function, which you can think of as joining iterables side by side. If you want to concatenate iterables head to tail instead, use the chain() function in the itertools module. Specifically, when you have multiple iterables and want to iterate over each of them in sequence, chain() is a good fit.

from itertools import chain


# A few iterables to begin with
odd_numbers = [1, 3]
even_numbers = [2, 4]


# The typical way
numbers = odd_numbers + even_numbers
for number in numbers:
    print(f"Operate with number: {number}")
    
# Use chain()
for number in chain(odd_numbers, even_numbers):
    print(f"Operate with number: {number}")

The typical way involves concatenating the iterables manually, for example by building an intermediate list. If you work with other iterables, such as dictionaries and sets, you need to know how to concatenate each of those types.

The chain() function accepts any number of iterables and returns an iterator that produces the elements of each iterable in turn. You don't need to manage a temporary object that holds all these elements.
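
To illustrate the point about mixed iterable types, here is a sketch that chains a list, a tuple, and a dictionary. The values are made up for the example. The + operator would raise a TypeError for a list and a tuple, whereas chain() only requires that each argument be iterable.

```python
from itertools import chain

odd_numbers = [1, 3]     # a list
even_numbers = (2, 4)    # a tuple
grades = {"John": 99}    # a dict: iterating it yields its keys

# chain() walks through each iterable in turn without building
# an intermediate container
combined = list(chain(odd_numbers, even_numbers, grades))

print(combined)  # [1, 3, 2, 4, 'John']
```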

6. Iterate Dictionaries

Dictionaries are a very common data type that stores data as key-value pairs. Because they are implemented with hash tables, looking up and retrieving items is very fast, which makes them a favorite data structure for many developers. Storing key-value pairs gives us several options for iterating over dictionaries.

# The dictionary object
grades = {"John": 99, "Danny": 95, "Ashley": 98}


# Iterate the keys
for name in grades:
    print(name)
for name in grades.keys():
    print(name)
    
# Iterate the values
for grade in grades.values():
    print(grade)
    
# Iterate the items: key-value pairs
for name, grade in grades.items():
    print(f"{name}: {grade}")

To iterate over the keys, we can call the keys() method on the dictionary object. Alternatively, we can use the dictionary object itself as the iterable, which yields the same keys as the view object created by keys().

To iterate over the values, we use the values() method.

To iterate over the items as key-value pairs, we use the items() method.

Notably, the objects created by these methods are dictionary view objects, which behave much like SQL views. In other words, these view objects reflect any later changes to the dict object. A trivial example is shown below.

>>> # Current keys
>>> names = grades.keys()
>>> print(f"Before updating: {names}")
Before updating: dict_keys(['John', 'Danny', 'Ashley'])
>>> # Add a new item and check the same view object
>>> grades['Jennifer'] = 97
>>> print(f"After updating: {names}")
After updating: dict_keys(['John', 'Danny', 'Ashley', 'Jennifer'])

7. Consider Comprehensions As Alternatives

If the purpose of the iteration is to create a new list, dictionary, or set object from the iterable, we should consider comprehensions, which are more concise and usually somewhat faster than the equivalent for loop.

# A list of numbers
primes = [2, 3, 5]


# List Comprehension
# Instead of the following
squares_list0 = list()
for prime in primes:
    squares_list0.append(prime * prime)
# Do this
squares_list1 = [x * x for x in primes]


# Dictionary Comprehension
# Instead of the following
squares_dict0 = dict()
for prime in primes:
    squares_dict0[prime] = prime*prime
# Do this
squares_dict1 = {x: x*x for x in primes}


# Set Comprehension
# Instead of the following
squares_set0 = set()
for prime in primes:
    squares_set0.add(prime * prime)
# Do this
squares_set1 = {x*x for x in primes}

The list comprehension has the following format: [expr for item in iterable]. It is the preferred way to create a list object from an iterable, compared to the for loop.

The dictionary comprehension has the following format: {key_expr: value_expr for item in iterable}. Similarly, it's the preferred way to create a dict object from an iterable.

The set comprehension has the following format: {expr for item in iterable}. It is the preferred way to create a set object from an iterable, compared to the for loop.
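
The formats above cover the simplest case. As a small extension that the original article does not discuss, each comprehension also accepts an optional if clause, which combines the filtering from section 4 with the construction of the new object. The sketch below reuses the numbers list from that section.

```python
numbers = [1, 3, 4, 8, 9]

# Keep only the odd numbers, then square them
odd_squares_list = [x * x for x in numbers if x % 2]
odd_squares_dict = {x: x * x for x in numbers if x % 2}
odd_squares_set = {x * x for x in numbers if x % 2}

print(odd_squares_list)  # [1, 9, 81]
print(odd_squares_dict)  # {1: 1, 3: 9, 9: 81}
```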

8. Consider the else Clause

Last but not least, consider using the else clause with the for loop. It isn't the most intuitive technique, and many people don't even know that the for loop can take an else clause. The following code shows a trivial example.

>>> def place_group_order(ordered_items):
...     menu_items = ['beef', 'pork', 'sausage', 'chicken']
...     for name, item in ordered_items.items():
...         if item not in menu_items:
...             print(f"Your group order can't be served, because {name}'s {item} isn't available.")
...             break
...     else:
...         # Runs only if the loop finished without hitting break
...         print("Your group order can be served.")
...
>>> print("Group 0")
Group 0
>>> group0_items = {"John": "beef", "Jack": "tuna", "Jacob": "chicken"}
>>> place_group_order(group0_items)
Your group order can't be served, because Jack's tuna isn't available.
>>> print("\nGroup 1")

Group 1
>>> group1_items = {"Aaron": "beef", "Ashley": "pork", "Anna": "sausage"}
>>> place_group_order(group1_items)
Your group order can be served.

Contrary to what some people mistakenly think, the code in the else clause runs after the for loop completes normally. However, if the loop is exited through a break statement, the else clause is skipped. In the first function call, the loop hit break at Jack's order, so the else clause didn't execute. In the second call, every item was on the menu, so the loop ran to completion and the else clause executed.
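
The same rule can be sketched as a small search function. The name first_negative() and its sample inputs are hypothetical. The last call shows a detail that is easy to miss: the else clause also runs when the iterable is empty, because the loop ends without a break.

```python
def first_negative(values):
    """Describe the first negative value in values, if there is one."""
    for value in values:
        if value < 0:
            result = f"Found negative value: {value}"
            break
    else:
        # Reached only when the loop ended without break,
        # which includes the case of an empty iterable
        result = "No negative values"
    return result


print(first_negative([3, -1, 4]))  # Found negative value: -1
print(first_negative([3, 1, 4]))   # No negative values
print(first_negative([]))          # No negative values
```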

Conclusions

In this article, we reviewed eight techniques to consider when using the for loop beyond its basic form. Applying them can make your code more concise and, in some cases, more performant.

Thanks for reading this piece.
