pandas.Series.str.contains explained

Latest recommended article published on 2025-03-27 16:30:45

沉观

Article tags: pandas python data analysis

Article link: https://blog.csdn.net/weixin_43484764/article/details/89847241

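Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)
Tests whether a pattern or regular expression is contained within the strings of a Series or Index.

Returns a boolean Series or Index, depending on whether the given pattern or regular expression is contained within each string of the Series or Index.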

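pat : str
Character sequence or regular expression.

case : bool, default True
If True, the match is case sensitive.

flags : int, default 0 (no flags)
Flags passed through to the re module, e.g. re.IGNORECASE.

na : default NaN
Fill value for missing values.

regex : bool, default True
If True, pat is assumed to be a regular expression.

If False, pat is treated as a literal string.

Returns:
Series or Index of boolean values
A Series or Index of boolean values indicating whether the given pattern is contained within the string of each element of the Series or Index.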
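
Since the method is described as working on an Index as well as a Series, here is a minimal sketch of the Index case. The sample values are hypothetical and chosen only for illustration. The result is converted to a plain list because, as far as I can tell, the exact container type returned for an Index can differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
idx = pd.Index(['Mouse', 'dog', 'house and parrot', '23.0', np.nan])

# regex=False treats '23' as a literal substring;
# na=False maps the missing entry to False so every element is a boolean.
idx_mask = idx.str.contains('23', regex=False, na=False)

# Convert to a list of Python bools, independent of the returned container type.
idx_flags = [bool(x) for x in idx_mask]
print(idx_flags)
```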

Examples:


>>> import numpy as np
>>> import pandas as pd
>>> s1 = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.nan])
>>> s1.str.contains('og', regex=False)
0    False
1     True
2    False
3    False
4      NaN
dtype: object

Use case to specify case sensitivity.


>>> s1.str.contains('oG', case=True, regex=True)
0    False
1    False
2    False
3    False
4      NaN
dtype: object

na=True converts missing values (NaN) to the boolean True.
na=False converts missing values (NaN) to the boolean False.

>>> s1.str.contains('og', na=False, regex=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool
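
A common reason to pass na=False is to use the result as a filter mask. The sketch below rebuilds the same sample Series and keeps only the matching rows; the variable names mask and matched are my own, not from the original example.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.nan])

# na=False treats the missing entry as "no match", so the mask holds only booleans.
mask = s1.str.contains('og', na=False)

# Boolean indexing keeps only the rows where the mask is True.
matched = s1[mask]
print(matched.tolist())
```

Without na=False, the missing entry may stay as NaN in the mask, which is usually not what you want when filtering.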

Use flags with a regular expression to ignore case.

>>> import re
>>> s1.str.contains('PARROT', flags=re.IGNORECASE, regex=True)
0    False
1    False
2     True
3    False
4      NaN
dtype: object
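
For comparison, the case parameter described earlier should give the same result as the re.IGNORECASE flag. This is a small sketch under that assumption; na=False is added only so that the two results are plain booleans and easy to compare.

```python
import re

import numpy as np
import pandas as pd

s1 = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.nan])

# Case-insensitive match through the regex flag.
by_flag = s1.str.contains('PARROT', flags=re.IGNORECASE, regex=True, na=False)

# Case-insensitive match through the case parameter.
by_case = s1.str.contains('PARROT', case=False, regex=True, na=False)

print(by_flag.tolist() == by_case.tolist())
```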

Use a regular expression to match any digit.

>>> s1.str.contains(r'\d', regex=True)
0    False
1    False
2    False
3     True
4      NaN
dtype: object
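
Because pat is a regular expression by default, alternation with | can test for several substrings in one call. A minimal sketch on the same sample data, with na=False added so the output is purely boolean:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.nan])

# 'house|dog' matches any string containing either 'house' or 'dog'.
either = s1.str.contains('house|dog', regex=True, na=False)
print(either.tolist())
```

Note that 'Mouse' does not match, since it contains 'ouse' but not 'house'.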

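When regex is set to True, make sure pat is not meant as a literal string. Note that in the following example, one might expect only s2[1] and s2[3] to return True. However, as a regular expression, '.0' matches any character followed by a 0.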

>>> s2 = pd.Series(['40','40.0','41','41.0','35'])
>>> s2.str.contains('.0', regex=True)
0     True
1     True
2    False
3     True
4    False
dtype: bool
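
If the intent is to match the literal text '.0', two options are to pass regex=False or to escape the pattern with re.escape from the standard library. The sketch below shows both on the same s2 data; the variable names literal and escaped are my own.

```python
import re

import pandas as pd

s2 = pd.Series(['40', '40.0', '41', '41.0', '35'])

# Option 1: treat the pattern as a literal substring.
literal = s2.str.contains('.0', regex=False)

# Option 2: keep regex matching but escape the dot so it matches only '.'.
escaped = s2.str.contains(re.escape('.0'), regex=True)

print(literal.tolist())
print(escaped.tolist())
```

Both variants mark only s2[1] and s2[3] as True.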