Map, Filter and Reduce in Pure Python

The Basics

Map, filter and reduce are functions that help you handle all kinds of collections. They are at the heart of modern technologies such as Spark and various other data manipulation and storage frameworks. But they can also be very powerful helpers when working with vanilla Python.

Map

Map is a function that takes as input a function, e.g. str.upper, and a collection, e.g. the list ['bacon', 'toast', 'egg']. It passes every element of the collection through this function and produces a new collection with the same number of elements. Let's look at an example.

map_obj = map(str.upper,['bacon','toast','egg'])
print(list(map_obj))
>>['BACON', 'TOAST', 'EGG']

What we did here is use map(some_function, some_iterable) together with str.upper, which converts every character of a string to uppercase. In Python 3, map returns a lazy iterator, which is why we wrap it in list() before printing. For every element in the input list we get exactly one element in the output list: we sent 3 in and received 3 out, which is why we call it an N to N function. Let's look at how one can use it.

def count_letters(x):
    # len() on a string returns its number of characters
    return len(x)

map_obj = map(count_letters, ['bacon', 'toast', 'egg'])
print(list(map_obj))
>>[5, 5, 3]

In this example we defined our own function count_letters(). The collection was passed through the function and the output holds the number of letters in each string! Let's make this a little bit sexier using a lambda expression.

map_obj = map(lambda x: len(x), ['bacon', 'toast', 'egg'])
print(list(map_obj))
>>[5, 5, 3]

A lambda expression is basically just a shorthand notation for defining a small, anonymous function. If you are not familiar with lambdas, it is worth reading up on how they work. However, it should be fairly easy to understand them from the following examples.
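To make the shorthand concrete, here is a minimal sketch that runs the same mapping three ways: with a named function, with a lambda, and with the built-in len passed directly. The variable names (words, with_def, with_lambda, with_builtin) are illustrative and not from the original article.

```python
# A named function and a lambda that do the same job: count characters.
def count_letters(word):
    return len(word)

words = ['bacon', 'toast', 'egg']

with_def = list(map(count_letters, words))
with_lambda = list(map(lambda word: len(word), words))
# len is already a function, so it can be passed to map without a wrapper.
with_builtin = list(map(len, words))

print(with_def)      # [5, 5, 3]
print(with_lambda)   # [5, 5, 3]
print(with_builtin)  # [5, 5, 3]
```

All three calls produce the same list, so the choice between them is mostly a matter of readability.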

Filter

In contrast to map, which is an N to N function, filter is an N to M function where N≥M. This means it can reduce the number of elements in the collection. In other words, it filters them! As with map, the notation is filter(some_function, some_collection). Let's check this out with an example.

def has_the_letter_a_in_it(x):
    return 'a' in x

# Let's first check out what happens with map
map_obj = map(has_the_letter_a_in_it, ['bacon', 'toast', 'egg'])
print(list(map_obj))
>>[True, True, False]

# What happens with filter?
map_obj = filter(has_the_letter_a_in_it, ['bacon', 'toast', 'egg'])
print(list(map_obj))
>>['bacon', 'toast']

As we can see, filter reduces the number of elements in the list. It does so by calling has_the_letter_a_in_it() on every element and keeping only the elements for which the function returns True.
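For comparison, the same selection can be written as a list comprehension. This is only a sketch to illustrate the N to M behaviour; the names words, filtered and comprehension are chosen for illustration.

```python
words = ['bacon', 'toast', 'egg']

# filter keeps the elements for which the predicate returns True.
filtered = list(filter(lambda word: 'a' in word, words))

# The equivalent list comprehension.
comprehension = [word for word in words if 'a' in word]

print(filtered)                     # ['bacon', 'toast']
print(filtered == comprehension)    # True
print(len(words) >= len(filtered))  # True, i.e. N >= M
```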

Again this looks much sexier using our all-time favorite lambda!

map_obj = filter(lambda x: 'a' in x, ['bacon','toast','egg'])
print(list(map_obj))
>>['bacon', 'toast']

Reduce

Let's meet the final enemy and probably the most complicated of the 3. But no worries, it is actually quite simple. It is an N to 1 relation, meaning that no matter how much data we pour into it, we get a single result out of it. It does this by applying the function we pass it repeatedly: the result of each call is fed into the next call together with the next element. Out of the 3, it is the only one we have to import, namely from the functools module. In contrast to the other two, it is most often used with three arguments, reduce(some_function, some_collection, some_starting_value). The starting value is optional, but it is usually a good idea to provide one. Let's have a look.

from functools import reduce

map_obj = reduce(lambda x, y: x + " loves " + y, ['bacon', 'toast', 'egg'], "Everyone")
print(map_obj)
>>Everyone loves bacon loves toast loves egg
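One reason the starting value is worth providing is the empty collection: with a starting value, reduce simply returns it, while without one it raises a TypeError. The sketch below illustrates this; join_with_loves is a hypothetical helper written for this example, not part of functools.

```python
from functools import reduce

# Hypothetical helper: wraps the reduce call from the example above.
def join_with_loves(words, start=None):
    combine = lambda x, y: x + " loves " + y
    if start is None:
        # No starting value: the first element acts as the initial x.
        return reduce(combine, words)
    return reduce(combine, words, start)

# With a starting value, an empty list just returns that value.
print(join_with_loves([], "Everyone"))  # Everyone

# Without a starting value, an empty list raises a TypeError.
try:
    join_with_loves([])
except TypeError as error:
    print("TypeError:", error)
```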

The lambda function passed to reduce takes two arguments at a time, namely x and y: x is the result accumulated so far (initially the starting value) and y is the next element of the list. Let's visualize how it goes through the list.

x="Everyone", y="bacon": return "Everyone loves bacon"
x="Everyone loves bacon", y="toast": return "Everyone loves bacon loves toast"
x="Everyone loves bacon loves toast", y="egg": return "Everyone loves bacon loves toast loves egg"

So our final result is "Everyone loves bacon loves toast loves egg". These are the basic concepts that let you move with more ease through your processing pipeline. One caveat is worth mentioning: you cannot assume in every programming language or framework that reduce will handle the elements in order. Python's functools.reduce works from left to right, but elsewhere the result could just as well be "Everyone loves egg loves toast loves bacon".
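The intermediate steps of the reduction can also be inspected in code. The sketch below uses itertools.accumulate, which yields every intermediate result instead of only the last one; the initial keyword assumes Python 3.8 or newer.

```python
from itertools import accumulate

words = ['bacon', 'toast', 'egg']

# accumulate applies the same two-argument function as reduce,
# but keeps each intermediate result (starting with the initial value).
steps = list(accumulate(words, lambda x, y: x + " loves " + y, initial="Everyone"))

for step in steps:
    print(step)
# Everyone
# Everyone loves bacon
# Everyone loves bacon loves toast
# Everyone loves bacon loves toast loves egg
```

The last element of steps is exactly what reduce returns for the same inputs.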

Combine

To make sure we understood the concepts, let's use them together and build a more complex example.

from functools import reduce

vals = [0,1,2,3,4,5,6,7,8,9]
# Let's add 1 to each element >> [1,2,3,4,5,6,7,8,9,10]
map_obj = map(lambda x: x+1,vals)
# Let's only take the odd ones >> [1, 3, 5, 7, 9]
map_obj = filter(lambda x: x%2 == 1,map_obj)
# Let's reduce them by summing them up, ((((0+1)+3)+5)+7)+9=25
map_obj = reduce(lambda x,y: x+y,map_obj,0)
print(map_obj)
>> 25
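As a point of comparison, the same map, filter and reduce steps can be folded into a single generator expression passed to the built-in sum. This is an alternative sketch, not a replacement for the pipeline above; the name total is illustrative.

```python
vals = list(range(10))  # [0, 1, ..., 9]

# map step:    x + 1
# filter step: keep the odd results
# reduce step: sum() adds them up, starting from 0
total = sum(x + 1 for x in vals if (x + 1) % 2 == 1)

print(total)  # 25
```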

As we can see, we can build pretty powerful things by combining the 3. Let's move on to one final example to illustrate what this might be used for in practice. To do so, we load a small subset of a dataset and print the cities that are capitals and have more than 10 million inhabitants!

from functools import reduce

# Let's define some data: [city, population, status]
data=[['Tokyo', 35676000.0, 'primary'], ['New York', 19354922.0, 'nan'], ['Mexico City', 19028000.0, 'primary'], ['Mumbai', 18978000.0, 'admin'], ['São Paulo', 18845000.0, 'admin'], ['Delhi', 15926000.0, 'admin'], ['Shanghai', 14987000.0, 'admin'], ['Kolkata', 14787000.0, 'admin'], ['Los Angeles', 12815475.0, 'nan'], ['Dhaka', 12797394.0, 'primary'], ['Buenos Aires', 12795000.0, 'primary'], ['Karachi', 12130000.0, 'admin'], ['Cairo', 11893000.0, 'primary'], ['Rio de Janeiro', 11748000.0, 'admin'], ['Ōsaka', 11294000.0, 'admin'], ['Beijing', 11106000.0, 'primary'], ['Manila', 11100000.0, 'primary'], ['Moscow', 10452000.0, 'primary'], ['Istanbul', 10061000.0, 'admin'], ['Paris', 9904000.0, 'primary']]

# Keep only capitals ('primary') with more than 10 million inhabitants
map_obj = filter(lambda x: x[2]=='primary' and x[1]>10000000,data)
# Keep only the city name
map_obj = map(lambda x: x[0], map_obj)
# Join the names into one string
map_obj = reduce(lambda x,y: x+", "+y, map_obj, 'Cities:')
print(map_obj)
>> Cities:, Tokyo, Mexico City, Dhaka, Buenos Aires, Cairo, Beijing, Manila, Moscow
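Note the stray comma after "Cities:" in the output: the starting value is joined to the first city with the same ", " separator as everything else. One possible way around this, sketched below, is to let str.join build the list and prepend the label afterwards. The data here is a shortened subset of the list above, and the names capitals and line are illustrative.

```python
# Shortened subset of the article's data: [city, population, status]
data = [
    ['Tokyo', 35676000.0, 'primary'],
    ['New York', 19354922.0, 'nan'],
    ['Moscow', 10452000.0, 'primary'],
    ['Paris', 9904000.0, 'primary'],
]

# Filter and map in one comprehension: capitals above 10 million inhabitants.
capitals = [
    name
    for name, population, status in data
    if status == 'primary' and population > 10_000_000
]

# str.join only places the separator between elements.
line = "Cities: " + ", ".join(capitals)
print(line)  # Cities: Tokyo, Moscow
```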