Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

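In pandas, filtering a DataFrame with a boolean condition can raise the "truth value is ambiguous" error. The remedies include a.empty, a.bool(), a.item(), a.any() and a.all(). The article explains how to use each of them correctly in different situations to avoid ambiguous boolean operations, and provides example code.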

Translated from: Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I am having trouble filtering my result DataFrame with an or condition. I want my result df to keep all rows whose var column value is above 0.25 or below -0.25.

The logic below gives me an ambiguous truth value error; however, it works when I split the filtering into two separate operations. What is happening here? I'm not sure where to use the suggested a.empty, a.bool(), a.item(), a.any() or a.all().

 result = result[(result['var']>0.25) or (result['var']<-0.25)]

Reply #1

Reference: https://stackoom.com/question/2uV5l/系列的真值含糊不清-使用a-empty-a-bool-a-item-a-any-或a-all


Reply #2

The or and and Python statements require truth values. For pandas these are considered ambiguous, so you should use the "bitwise" | (or) or & (and) operations:

result = result[(result['var']>0.25) | (result['var']<-0.25)]

These are overloaded for these kinds of data structures to yield the element-wise or (or and).
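As a minimal sketch of the difference, the snippet below builds a small made-up frame (the values are illustrative, not the asker's data) and contrasts the failing or with the element-wise |:

```python
import pandas as pd

# Hypothetical data standing in for the asker's `result` frame.
result = pd.DataFrame({"var": [0.5, 0.1, -0.3, 0.0, -0.9]})

high = result["var"] > 0.25    # boolean Series
low = result["var"] < -0.25    # boolean Series

try:
    result[high or low]        # `or` calls bool(high) first
    or_failed = False
except ValueError:
    or_failed = True           # the ambiguous-truth-value error

filtered = result[high | low]  # element-wise OR, one decision per row
print(or_failed, filtered["var"].tolist())
```

The two comparisons are stored in named variables here only to make the parenthesization question disappear; writing them inline with parentheses is equivalent.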


Just to add some more explanation to this statement:

The exception is thrown when you want to get the bool of a pandas.Series:

>>> import pandas as pd
>>> x = pd.Series([1])
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What you hit was a place where the operator implicitly converted the operands to bool (you used or, but it also happens for and, if and while):

>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Besides these 4 statements, there are several Python functions that hide some bool calls (like any, all, filter, ...). These are normally not problematic with pandas.Series, but I wanted to mention them for completeness.
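A short sketch of why those built-ins are usually harmless: they iterate over the Series and call bool on each element, not on the Series as a whole. The sample values are made up for illustration.

```python
import pandas as pd

s = pd.Series([0, 1, 2])

# The built-ins iterate over the Series and call bool() on each element,
# never on the Series itself, so no ValueError is raised.
has_truthy = any(s)                  # at least one non-zero element
all_truthy = all(s)                  # the 0 makes this False
non_zero = list(filter(None, s))     # keeps the truthy elements

print(has_truthy, all_truthy, non_zero)
```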


In your case the exception isn't really helpful, because it doesn't mention the right alternatives. For and and or (if you want element-wise comparisons) you can use the & and | operators or the equivalent NumPy functions.

If you're using the operators, make sure you set your parentheses correctly because of operator precedence.

There are several logical NumPy functions which should work on pandas.Series.
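For illustration, here is a sketch of the function form using np.logical_or, np.logical_and and np.logical_not on a made-up Series. The thresholds mirror the question; everything else is an assumption.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.5, 0.1, -0.3])

# Function form of the element-wise operators.
either = np.logical_or(s > 0.25, s < -0.25)    # same as (s > 0.25) | (s < -0.25)
both = np.logical_and(s > -0.25, s < 0.25)     # same as (s > -0.25) & (s < 0.25)
neither = np.logical_not(either)               # same as ~either

print(list(either), list(both), list(neither))
```

The function form sidesteps precedence mistakes because each condition is passed as a separate argument.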


The alternatives mentioned in the exception are more suited if you encountered it when doing if or while. I'll briefly explain each of these:

  • If you want to check if your Series is empty:

     >>> x = pd.Series([])
     >>> x.empty
     True
     >>> x = pd.Series([1])
     >>> x.empty
     False

    Python normally interprets the length of containers (like list, tuple, ...) as their truth value if they have no explicit boolean interpretation. So if you want the Python-like check, you could use if x.size or if not x.empty instead of if x.

  • If your Series contains one and only one boolean value:

     >>> x = pd.Series([100])
     >>> (x > 50).bool()
     True
     >>> (x < 50).bool()
     False

  • If you want to check the first and only item of your Series (like .bool(), but it works even for non-boolean contents):

     >>> x = pd.Series([100])
     >>> x.item()
     100

  • If you want to check if all or any item is non-zero, non-empty or not False:

     >>> x = pd.Series([0, 1, 2])
     >>> x.all()   # because one element is zero
     False
     >>> x.any()   # because one (or more) elements are non-zero
     True
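To show how these reductions fit into an if statement, here is a minimal sketch. The helper name describe and its labels are hypothetical. It deliberately avoids .bool(), which, as far as I know, newer pandas releases deprecate in favor of .item().

```python
import pandas as pd

def describe(s):
    """Return a short label for a Series without ever calling bool(s)."""
    if s.empty:                 # no elements at all
        return "empty"
    if s.size == 1:             # exactly one element: unwrap it
        return f"single value {s.item()}"
    if s.all():                 # every element is truthy
        return "all truthy"
    if s.any():                 # at least one element is truthy
        return "some truthy"
    return "none truthy"

print(describe(pd.Series([], dtype="float64")))
print(describe(pd.Series([100])))
print(describe(pd.Series([0, 1, 2])))
```

Each branch reduces the Series to a single boolean before if sees it, which is exactly what the error message asks for.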

Reply #3

For boolean logic, use & and |.

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))

>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

>>> df.loc[(df.C > 0.25) | (df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

To see what is happening, note that each comparison returns a column of booleans, e.g.

df.C > 0.25
0     True
1    False
2    False
3     True
4     True
Name: C, dtype: bool

When you have multiple criteria, you will get multiple columns returned. This is why the join logic is ambiguous. Using and or or treats each column separately, so you first need to reduce that column to a single boolean value. For example, to see if any value or all values in each of the columns is True:

# Any value in either column is True?
(df.C > 0.25).any() or (df.C < -0.25).any()
True

# All values in either column are True?
(df.C > 0.25).all() or (df.C < -0.25).all()
False

One convoluted way to achieve the same thing is to zip all of these columns together, and perform the appropriate logic.

>>> df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

For more details, refer to Boolean Indexing in the docs.


Reply #4

Or, alternatively, you could use the operator module. More detailed information is in the Python docs.

import operator
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]

          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
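One situation where the function form can be handy is combining a variable number of conditions. The sketch below folds a list of boolean Series with functools.reduce. The third condition (df.A > 2) is a made-up addition for illustration and is not part of the original answer.

```python
import functools
import operator

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# Hypothetical list of conditions; any number of boolean Series works.
conditions = [df.C > 0.25, df.C < -0.25, df.A > 2]

# Fold the list with the element-wise OR: c1 | c2 | c3
mask = functools.reduce(operator.or_, conditions)
selected = df.loc[mask]
print(selected.index.tolist())
```

Swapping operator.or_ for operator.and_ gives the "all conditions must hold" variant.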

Reply #5

This excellent answer explains very well what is happening and provides a solution. I would like to add another solution that might be suitable in similar cases, using the query method:

result = result.query("(var > 0.25) or (var < -0.25)")

See also http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-query .

(Some tests with a DataFrame I'm currently working with suggest that this method is a bit slower than using the bitwise operators on Series of booleans: 2 ms vs. 870 µs.)

A word of warning: At least one situation where this is not straightforward is when column names happen to be Python expressions. I had columns named WT_38hph_IP_2, WT_38hph_input_2 and log2(WT_38hph_IP_2/WT_38hph_input_2) and wanted to perform the following query: "(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"

I obtained the following exception cascade:

  • KeyError: 'log2'
  • UndefinedVariableError: name 'log2' is not defined
  • ValueError: "log2" is not a supported function

I guess this happened because the query parser was trying to compute something from the first two columns instead of identifying the expression with the name of the third column.

A possible workaround is proposed here.
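Independently of that workaround, one fallback that avoids the query parser entirely is plain boolean indexing with bracket access, sketched below. The numbers are made up; only the column names follow the example above.

```python
import pandas as pd

# Made-up numbers; only the column names follow the example above.
ratio_col = "log2(WT_38hph_IP_2/WT_38hph_input_2)"
df = pd.DataFrame({
    "WT_38hph_IP_2": [10, 25, 40],
    "WT_38hph_input_2": [10, 5, 80],
    ratio_col: [0.0, 2.32, -1.0],
})

# Bracket access treats the name as an opaque string, so nothing is parsed.
hits = df[(df[ratio_col] > 1) & (df["WT_38hph_IP_2"] > 20)]
print(hits.index.tolist())
```

The pandas query documentation also describes backtick quoting for column names that are not valid Python identifiers. Whether it covers a particular name may depend on the pandas version, so it is worth testing before relying on it.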


Reply #6

Well, pandas uses the bitwise '&' and '|' operators, and each condition should be wrapped in '()'.

For example, the following works:

data_query = data[(data['year'] >= 2005) & (data['year'] <= 2010)]

But the same query without proper brackets does not:

data_query = data[(data['year'] >= 2005 & data['year'] <= 2010)]
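The reason is operator precedence: & binds more tightly than the comparison operators. The sketch below, on a made-up data frame, shows how Python groups the unparenthesized expression and that it ends in the same ValueError.

```python
import pandas as pd

# Hypothetical data; only the column name follows the example above.
data = pd.DataFrame({"year": [2003, 2005, 2008, 2010, 2012]})

# With parentheses: two boolean Series combined element-wise.
ok = data[(data["year"] >= 2005) & (data["year"] <= 2010)]

# Without them, & binds tighter than >= and <=, so Python reads
#   data["year"] >= (2005 & data["year"]) <= 2010
# and the chained comparison inserts an implicit `and`.
try:
    data[data["year"] >= 2005 & data["year"] <= 2010]
    raised = False
except ValueError:
    raised = True

print(ok["year"].tolist(), raised)
```

The implicit and in the chained comparison is what calls bool on a Series, so the failure is the same ambiguity as in the original question.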