The script I wrote earlier keeps only the rows whose column contains a single substring:
area='AREA=2'
df2 = df[df[columnname].str.contains(area)]
[Screenshots: the DataFrame before and after filtering]
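A minimal, self-contained sketch of this single-substring filter. The sample rows and the column name "info" are made up for illustration; only `area`, `columnname`, `df` and `df2` come from the script above.

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame and column name are not shown in the post.
df = pd.DataFrame({
    "info": ["ID=1;AREA=2", "ID=2;AREA=5", "ID=3;AREA=2", "ID=4;AREA=7"],
})
columnname = "info"

area = "AREA=2"
# str.contains returns a boolean Series; indexing df with it keeps the True rows.
mask = df[columnname].str.contains(area)
df2 = df[mask]
print(df2)
```

Note that this is a substring test, so a pattern such as "AREA=2" would also match a value like "AREA=20".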
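To keep rows that contain any one of several substrings, write the pattern as a regular expression and separate the alternatives with `|` (regex=True is the default, so passing it explicitly is optional):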
area='AREA=5|AREA=6|AREA=7|AREA=8'
df2 = df[df[columnname].str.contains(area, regex=True)]
[Screenshots: the DataFrame before and after filtering]
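One way to build such an alternation pattern from a list instead of typing it by hand is sketched below. The sample data, the column name "info" and the `wanted` list are assumptions for illustration; `re.escape` is used so that any regex metacharacters in the literals are matched literally.

```python
import re
import pandas as pd

# Hypothetical sample data; the real DataFrame is not shown in the post.
df = pd.DataFrame({
    "info": ["ID=1;AREA=2", "ID=2;AREA=5", "ID=3;AREA=6", "ID=4;AREA=9"],
})
columnname = "info"

# Hypothetical list of substrings to look for.
wanted = ["AREA=5", "AREA=6", "AREA=7", "AREA=8"]

# Join the escaped literals with "|" so the regex matches any one of them.
area = "|".join(re.escape(w) for w in wanted)

df2 = df[df[columnname].str.contains(area, regex=True)]
print(df2)
```

For consecutive single-digit values, a character class such as 'AREA=[5-8]' should select the same rows with a shorter pattern.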
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.contains.html
Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)
regex : bool, default True
If True, assumes the pat is a regular expression.
If False, treats the pat as a literal string.
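The difference between the two settings shows up as soon as the pattern contains a regex metacharacter such as `|`. A small sketch, using a made-up Series `s`:

```python
import pandas as pd

# Hypothetical values; the first one contains a literal "|" character.
s = pd.Series(["AREA=5|AREA=6", "AREA=5", "AREA=9"])

pat = "AREA=5|AREA=6"

# regex=True: "|" means "or", so any value containing AREA=5 or AREA=6 matches.
as_regex = s.str.contains(pat, regex=True)

# regex=False: the whole pattern, including "|", must appear literally.
as_literal = s.str.contains(pat, regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```

With regex=True the first two values match; with regex=False only the first one does.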
Matching strings that contain either 'house' or 'parrot':
>>> s1.str.contains('house|parrot', regex=True)
0    False
1    False
2     True
3    False
4      NaN
dtype: object
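The NaN in that result matters when the mask is used for filtering: boolean indexing with a mask that contains NaN raises an error. A sketch of the usual workaround, passing na=False so that missing values count as "no match". The definition of `s1` here is an assumption, reconstructed to be consistent with the output shown above.

```python
import numpy as np
import pandas as pd

# Assumed definition of s1; the quoted excerpt does not include it.
s1 = pd.Series(["Mouse", "dog", "house and parrot", "23", np.nan])

# na=False replaces the missing-value result with False,
# which makes the mask safe to use for boolean indexing.
clean = s1.str.contains("house|parrot", regex=True, na=False)

filtered = s1[clean]
print(clean.tolist())
print(filtered.tolist())
```

The same na=False argument can be added to the `df[df[columnname].str.contains(area)]` filters when the column may contain missing values. How the default handles missing values may differ between pandas versions, so passing `na` explicitly is the safer choice.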