Implementing in and not in with pandas: usage and notes

Ch3nnn

Article link: https://blog.csdn.net/weixin_43064185/article/details/91374033

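This article shows how to use the pandas isin method to get the equivalent of SQL's IN and NOT IN. The examples select the rows of a DataFrame whose values are, or are not, in a given list, which is more concise than a merge-based workaround.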

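When processing data, I often need to pull a cleaned subset of rows out of a full dataset. Sometimes, though, I also need to account for the rows that were left out.

That is where this comes in: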

pandas.DataFrame.isin

Whether each element in the DataFrame is contained in values.

See the pandas documentation for DataFrame.isin.
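
As a small illustration of that description, the sketch below calls isin on a whole DataFrame and on a single column. The sample frame and its `code` column are made up for this example and are not part of the original scenario.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    'countries': ['US', 'UK', 'Germany', 'China'],
    'code': ['US', 'GB', 'DE', 'CN'],
})

# DataFrame.isin tests every cell and returns a boolean DataFrame
# with the same shape as df.
cell_mask = df.isin(['US', 'China'])

# Series.isin tests a single column and returns a boolean Series,
# which is the form needed to filter rows.
row_mask = df['countries'].isin(['US', 'China'])

print(cell_mask)
print(row_mask)
```

For row filtering, the column-level form (Series.isin) is usually the one you want, and it is the one used in the rest of this post.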

Example:

How can I implement the equivalent of SQL's IN and NOT IN?

I have a list containing the required values. Here is the scenario:

import pandas as pd

df = pd.DataFrame({'countries':['US','UK','Germany','China']})
countries = ['UK','China']

# pseudo-code (what I want to express; not valid pandas):
# df[df['countries'] not in countries]

This is how I used to do it:

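df = pd.DataFrame({'countries':['US','UK','Germany','China']})
# lookup table with a marker column to detect matches after the merge
countries = pd.DataFrame({'countries':['UK','China'], 'matched':True})

# IN: an inner merge keeps only rows whose country is in the lookup table
df.merge(countries,how='inner',on='countries')

# NOT IN: a left merge leaves 'matched' as NaN for rows without a match
not_in = df.merge(countries,how='left',on='countries')
not_in = not_in[pd.isnull(not_in['matched'])]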

That approach felt clumsy, though. After digging through the documentation, I found a better solution:

# IN
something.isin(somewhere)

# NOT IN
~something.isin(somewhere)
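
To tie this pattern back to the original motivation (counting both the selected rows and the leftover rows), here is a minimal sketch. It assumes the same `df` and `countries` as in the scenario above; the names `mask`, `selected` and `remaining` are my own.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# Boolean Series: True where the value appears in the list (IN).
mask = df['countries'].isin(countries)

selected = df[mask]     # rows IN the list
remaining = df[~mask]   # rows NOT IN the list; ~ negates element-wise

# Every row lands in exactly one of the two groups.
print(len(selected), len(remaining), len(df))
```

Note that the negation is written with `~`. Python's `not` operator does not work element-wise on a Series, so it cannot be used here.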

Example:

>>> df
  countries
0        US
1        UK
2   Germany
3     China
>>> countries
['UK', 'China']
>>> df.countries.isin(countries)
0    False
1     True
2    False
3     True
Name: countries, dtype: bool
>>> df[df.countries.isin(countries)]
  countries
1        UK
3     China
>>> df[~df.countries.isin(countries)]
  countries
0        US
2   Germany
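
As a sanity check, the sketch below compares the isin filters with the earlier merge-based approach on the same sample data. The variable names (`lookup`, `in_isin`, `not_in_merge`, and so on) are mine, and the comparison looks only at the `countries` values, since the merge results carry an extra `matched` column and a different index.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# isin-based filters
in_isin = df[df['countries'].isin(countries)]
not_in_isin = df[~df['countries'].isin(countries)]

# merge-based filters (the earlier approach)
lookup = pd.DataFrame({'countries': countries, 'matched': True})
in_merge = df.merge(lookup, how='inner', on='countries')
left = df.merge(lookup, how='left', on='countries')
not_in_merge = left[left['matched'].isna()]

# Compare the selected values, sorted so that row order does not matter.
same_in = sorted(in_isin['countries']) == sorted(in_merge['countries'])
same_not_in = sorted(not_in_isin['countries']) == sorted(not_in_merge['countries'])
print(same_in, same_not_in)
```

Both approaches pick the same rows here, but the isin version needs no helper table and leaves the original columns and index untouched.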