Five Python Gotchas

Many posts out there catalog the “gotchas” of Python and/or its most popular packages. This blog is yet another entry in that series, but with one difference: I’ve actually, genuinely made all these mistakes myself (some of them with embarrassing frequency). However, just knowing the definition and type of each classic Python object should help you avoid most (if not all) of these mistakes in your work!

Without any further ado, let’s begin.

1. Truthy Or Falsy: NumPy.nan and Pandas.nan

You probably know that for checking if an object’s value is True or False, you can do something like the following:

lst = [1, 2, 3]
a = None

# rather than this ...
if len(lst) > 0 or a is not None: print('success')

# you can simply do this ...
if lst or a: print('success')

That’s because empty lists (and all other empty sequences/collections), False, None, and 0 (of any numeric type) all evaluate to False. This set of objects and values is therefore known as “falsy”.
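
As a quick check of the list above, the following sketch passes a handful of falsy values to bool(). The name falsy_values is only illustrative, and the list is not exhaustive.

```python
# A few values that Python treats as falsy (illustrative, not exhaustive).
falsy_values = [[], (), {}, set(), '', range(0), False, None, 0, 0.0, 0j]

truthiness = [bool(value) for value in falsy_values]
print(truthiness)  # every entry is False

# Non-empty containers are truthy even when their contents are falsy.
print(bool([0]), bool('False'))  # True True
```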

Consider the following example: you have a dictionary of items and their costs, which you use to build a dataframe for some analysis.

import pandas as pd

d1 = {'item': ['foo', 'bar', 'baz'], 'cost': [100, None, 20]}
df = pd.DataFrame(d1)
# lots of analysis here ...

# if an item has a cost, print the item and its cost
for i, r in df.iterrows():
    if r['cost']:
        print(f"item = {r['item']}, cost = {r['cost']}")

You expect:

item = foo, cost = 100.0
item = baz, cost = 20.0

But you get:

item = foo, cost = 100.0
item = bar, cost = nan
item = baz, cost = 20.0

The reason is that Pandas treats None as a missing value and, in a numeric column like this one, represents it with nan. Since nan is not falsy, the row passes the check.

This may seem counterintuitive at first, but nans are missing values, and if something is missing, you don’t really know what it is. For example, could a given nan in a numeric column represent 100 (truthy) or 0 (falsy)? If it’s a string, is it ‘hello’ (truthy) or an empty string (which, you guessed it, is falsy)? Pandas isn’t sure, so it doesn’t assume falsiness.

Be careful not to treat nan as falsy.
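
One way to guard against this is to test for missing values explicitly instead of relying on truthiness. The sketch below uses only the standard library: the rows list stands in for the dataframe rows, and has_cost is a hypothetical helper, not part of Pandas. With a real dataframe, pd.notna(r['cost']) plays the same role.

```python
import math

# Stand-in for the dataframe rows; nan is what the None cost becomes.
rows = [
    {'item': 'foo', 'cost': 100.0},
    {'item': 'bar', 'cost': float('nan')},
    {'item': 'baz', 'cost': 20.0},
]

def has_cost(value):
    """Hypothetical helper: True only for a number that is not missing."""
    return value is not None and not math.isnan(value)

print(bool(float('nan')))  # True: nan is truthy

priced = [row['item'] for row in rows if has_cost(row['cost'])]
print(priced)  # ['foo', 'baz']
```

Note that this check keeps a cost of 0, which a plain truthiness test would silently drop.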

2. Is NaN == NaN?

Consider the example below:

>>> s1 = {True, 1, 1.0}
>>> s1
{True}

This is expected, as we know that 1==1.0==True evaluates to True.
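
A set keeps only one of these values because they compare equal and also hash the same. The sketch below checks both conditions and shows the same effect with dictionary keys; the variable names are illustrative.

```python
values = [True, 1, 1.0]

all_equal = (True == 1 == 1.0)
hashes = {hash(v) for v in values}
print(all_equal, hashes)  # True {1}

# The same rule applies to dict keys: the first key is kept and the
# value is overwritten by each later "equal" key.
d = {True: 'a', 1: 'b', 1.0: 'c'}
print(d)  # {True: 'c'}
```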

Now take a look at this case:

>>> s2 = {float('nan'), float('nan'), float('nan')}
>>> s2
{nan, nan, nan}

Consider the logic from the previous section: since nan stands for a missing value, it’s impossible to know whether the three missing values are the same. Consequently, nan == nan always evaluates to False, and the set keeps all three. (If you’d like to read more about this, check out PEP 754, “IEEE 754 Floating Point Special Values”.)
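
A short sketch of what this means in practice: equality never matches a nan, so use math.isnan to test for it. One subtlety worth knowing is that sets and the in operator check identity before equality, so the very same nan object is still deduplicated. The names a and b are illustrative.

```python
import math

a = float('nan')
b = float('nan')

print(a == b)         # False
print(a == a)         # False, even for the same object
print(a != a)         # True
print(math.isnan(a))  # True: the reliable way to test for nan

# Containers compare by identity first, then by equality.
print(len({a, a}))    # 1: same object
print(len({a, b}))    # 2: two distinct nan objects
print(a in [a])       # True
```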

3. All and Any

I had thought any and all work as follows: If you have an iterable, any will return True if any element of the iterable is true. And all will return True if all the elements of the iterable are true.

So, let’s check them out. First, any:

>>> any([0, None, False])  # expected False
False
>>> any([0, None, False, 'False']) # expected True
True
>>> any([float('nan'), 0]) # expected True
True
>>> any([]) # expected False
False

So far, so good! Let’s check all:

>>> all((1, 0, 0))  # expected False
False
>>> all([float('nan'), 1]) # expected True
True
>>> all([]) # expected False
True

I thought all the elements should be true and an empty list is clearly falsy. So why did all([]) evaluate to True?

The answer is in the Python documentation (my emphasis added):

all(iterable)

Return True if all elements of the iterable are true (or if the iterable is empty).

But why is that? Well, long story short, it’s because of vacuous truth. “If I am seven feet tall, I am also a trillionaire superhero” is always true for me. I’m about six inches short of that (disregard any of my colleagues’ testimony to the contrary), so the statement is true no matter what comes after the comma. It’s a statement that’s never false because I’m not seven feet tall, and therefore it’s impossible to evaluate my trillionaire superhero status.

Apart from reading the documentation more closely, the best way to remember this behavior of all() is to think of it not as “if all the elements of the iterable are true,” but as “if there is no false element in the iterable.” When the iterable is empty, there can be no false element within it, meaning all([]) evaluates to True.
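
The “no false element” reading can be written out as a loop. The function my_all below is a hypothetical stand-in that mirrors the documented behavior; it is not the built-in’s actual source.

```python
def my_all(iterable):
    """Return False at the first falsy element, otherwise True."""
    for element in iterable:
        if not element:
            return False
    # The loop found no falsy element, which is also the case
    # when there were no elements at all.
    return True

print(my_all([]))                 # True
print(my_all((1, 0, 0)))          # False
print(my_all([float('nan'), 1]))  # True

# If an empty list should count as a failure, check for it explicitly.
items = []
print(bool(items) and all(items))  # False
```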

4. Mutable Default Arguments

I think this is by far the most common Python “gotcha”. Let’s just dive in.

Consider the following function:

def foo(a, a_list=[]):
    a_list.append(a)
    return a_list

Let’s use this function foo to create two separate lists:

>>> my_list = foo(10)
>>> print(my_list) # expected [10]
[10]
>>> another_list = foo(20)
>>> print(another_list) # expected [20]
[10, 20]

You might have expected that each function call creates a new list, but the list created during the first invocation of foo is used in each successive call.

This happens because, in Python, default arguments are evaluated only when the function is defined, not each time the function is called (you can read more about that in Python’s documentation). If we use a mutable default argument (like the a_list=[] in foo) and change it in the function, that object is mutated whenever the function is called.
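
You can watch this happen by inspecting the function’s __defaults__ attribute, which holds the default values created at definition time. This sketch repeats the definition of foo so that it runs on its own; the names first and second are illustrative.

```python
def foo(a, a_list=[]):
    a_list.append(a)
    return a_list

print(foo.__defaults__)  # ([],) : one list, created when foo was defined

first = foo(10)
second = foo(20)

print(first is second)   # True: both calls returned the same list
print(foo.__defaults__)  # ([10, 20],) : the stored default was mutated
```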

The best way to avoid any such confusion is to use an immutable default argument in the function. Here’s an accordingly updated version of foo:

def foo_v2(a, a_list=None):
    if a_list is None:
        a_list = []
    a_list.append(a)
    return a_list

This definition of foo_v2 assumes that having a new list in each call is desirable. However, there are cases in which you might wish to intentionally pass a mutable object. One such situation is writing recursive functions, where an object’s state must be carried from one call to the next. The following implementation of the depth-first search (DFS) algorithm is an example of one such case:

def dfs(graph, node, path=[]):
    """dfs from a given node"""
    path += [node]
    for neighbor in graph[node]:
        if neighbor not in path:
            path = dfs(graph, neighbor, path)
    return path

>>> graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['G'],
             'D': ['F'], 'E': [], 'F': [], 'G': []}
>>> print(dfs(graph, 'A')) # give the path starting from node 'A'
['A', 'B', 'D', 'F', 'E', 'C', 'G']
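
One caveat with the version above: the default path list also survives between top-level calls, so a second dfs(graph, 'A') keeps appending to the earlier result and returns a list with an extra 'A' at the end. If the function may be called more than once, a variant like the following sketch still shares one list across the recursion but starts fresh on each top-level call. The name dfs_safe is hypothetical.

```python
def dfs_safe(graph, node, path=None):
    """DFS from a given node; a new path list is made per top-level call."""
    if path is None:
        path = []
    path.append(node)
    for neighbor in graph[node]:
        if neighbor not in path:
            # The same list object is passed down, so state is shared
            # within one traversal only.
            dfs_safe(graph, neighbor, path)
    return path

graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['G'],
         'D': ['F'], 'E': [], 'F': [], 'G': []}

first_run = dfs_safe(graph, 'A')
second_run = dfs_safe(graph, 'A')
print(first_run)   # ['A', 'B', 'D', 'F', 'E', 'C', 'G']
print(second_run)  # ['A', 'B', 'D', 'F', 'E', 'C', 'G']
```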

5. Modifying a List While Iterating Over It

I’ve already discussed this last “gotcha” in my bite-sized recipes post, but since I’ve seen several people fall into this trap firsthand, I’ll mention it here too.

Assume you want to remove all the numbers less than 5 from a list.

Wrong Implementation: Remove the elements while iterating!

nums = [1, 2, 3, 5, 6, 7, 0, 1]
for ind, n in enumerate(nums):
    if n < 5:
        del(nums[ind])

# expected: nums = [5, 6, 7]
>>> nums
[2, 5, 6, 7, 1]
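
To see why some elements survive, it helps to record which (index, value) pairs the loop actually visits. In this sketch, visited is an illustrative helper list. Each deletion shifts the remaining elements one position to the left, so the element that slides into the current index is never examined.

```python
nums = [1, 2, 3, 5, 6, 7, 0, 1]
visited = []

for ind, n in enumerate(nums):
    visited.append((ind, n))
    if n < 5:
        del nums[ind]  # everything after ind shifts left by one

print(visited)  # [(0, 1), (1, 3), (2, 6), (3, 7), (4, 0)]
print(nums)     # [2, 5, 6, 7, 1]
```

The values 2 and the trailing 1 were skipped entirely, which is why they remain in the list.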

Correct Implementation:

Use a list comprehension to create a new list containing only the elements you want:

>>> id(nums)  # before modification 
2090656472968
>>> nums = [n for n in nums if n >= 5]
>>> nums
[5, 6, 7]
>>> id(nums) # after modification
2090656444296

The id(nums) checks before and after show that the two lists are in fact different objects. So, if the list is used in other places and it’s important to mutate the existing list rather than create a new one with the same name, assign the result to a full slice:

>>> nums = [1, 2, 3, 5, 6, 7, 0, 1]
>>> id(nums) # before modification
2090656472008
>>> nums[:] = [n for n in nums if n >= 5]
>>> id(nums) # after modification
2090656472008
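
The difference matters as soon as a second name refers to the same list. In this sketch, alias is a hypothetical second reference, standing in for “the list is used in other places”.

```python
nums = [1, 2, 3, 5, 6, 7, 0, 1]
alias = nums  # a second reference to the same list object

# Rebinding: nums now points to a new list, alias still sees the old one.
nums = [n for n in nums if n >= 5]
print(nums)   # [5, 6, 7]
print(alias)  # [1, 2, 3, 5, 6, 7, 0, 1]

# Slice assignment: the existing list is changed in place.
nums = alias
nums[:] = [n for n in nums if n >= 5]
print(alias)          # [5, 6, 7]
print(alias is nums)  # True
```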

I hope you find this blog useful. I can be reached on Twitter and LinkedIn and I welcome any feedback.

Original article: https://towardsdatascience.com/five-python-gotchas-3073145fe083
