pandas.Series.str.contains: check whether each row contains a given string

In fact, a DataFrame column (which is a Series) also has str.contains. The code is as follows:

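import csv
import gzip
import os

import pandas as pd


def fillarea(area='PLMN=1,AREA=2', path=r'C:\Users\Administrator\Desktop\20181119\小时报表',
             columnname='网元', saveto=r'C:\Users\Administrator\Desktop\20181119\perhour'):

    # Strip leading and trailing whitespace
    path = path.strip()
    saveto = saveto.strip()
    # Strip any trailing backslash
    path = path.rstrip("\\")
    saveto = saveto.rstrip("\\")

    # If the input directory does not exist, give up and return -1
    if not os.path.exists(path):
        return -1

    # If the output directory does not exist, create it
    if not os.path.exists(saveto):
        # Creates intermediate directories as needed
        os.makedirs(saveto)

    # Iterate over every file in the input directory
    for filename in os.listdir(path):
        print(filename)
        abspath = os.path.join(path, filename)
        print(abspath)
        # Drop the last three characters (the '.gz' suffix) for the output name
        savepath = os.path.join(saveto, filename[:-3])
        # The b in 'wb'/'rb' means binary; the t in 'wt'/'rt' means text.
        # A CSV file is text, not binary, so text mode is the natural choice.
        # However, pandas is given the binary handle ('rb') and decodes it itself,
        # while csv.reader needs a text handle ('rt'); otherwise an error is raised.
        # The text handles use GBK explicitly to match the encoding passed to pandas.
        with gzip.open(abspath, 'rb') as f_in, \
                gzip.open(abspath, 'rt', encoding='GBK') as f_in2, \
                open(savepath, 'wt', newline='', encoding='GBK') as f_out:
            df = pd.read_csv(f_in, encoding='GBK', header=3, low_memory=False)
            df2 = df[df[columnname].str.contains(area)]
            # Open the CSV a second time to read the loose fields in its first 3 lines
            csv1 = csv.reader(f_in2)
            i = 0
            rows = []
            # Read only the first three lines
            for row in csv1:
                if i < 3:
                    print(row)
                    rows.append(row)
                else:
                    break
                i += 1
            # The fourth line is the table header
            rows.append(df2.columns.tolist())
            # From the fifth line on, append the data one row at a time
            # (for example, the cell table has many rows)
            for datarow in df2.values.tolist():
                rows.append(datarow)
            writer = csv.writer(f_out)
            writer.writerows(rows)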

Here:

df2 = df[df[columnname].str.contains(area)]  selects the rows whose columnname column contains the string stored in the area variable.
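
The same row filter can be tried on a small in-memory frame. This is only a minimal sketch: the sample values and the "value" column are made up for illustration, and the regex=False and na=False arguments are an optional variation rather than what the function above does. regex=False treats area as a literal string, and na=False counts missing values as non-matches so the mask is purely boolean.

```python
import pandas as pd

# Hypothetical sample data; only the column name '网元' comes from the code above.
df = pd.DataFrame({
    "网元": ["PLMN=1,AREA=2,CELL=7", "PLMN=1,AREA=3,CELL=1", None, "PLMN=1,AREA=2,CELL=9"],
    "value": [10, 20, 30, 40],
})

area = "PLMN=1,AREA=2"

# Boolean mask: True where the column contains the literal substring.
# na=False turns the missing entry into False instead of NaN.
mask = df["网元"].str.contains(area, regex=False, na=False)

# Keep only the matching rows.
df2 = df[mask]
print(df2)
```

Because this is a substring test, a pattern such as 'PLMN=1,AREA=2' would also match a value like 'PLMN=1,AREA=25'; a stricter pattern is needed if that matters.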


http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.contains.html


Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)

Test if pattern or regex is contained within a string of a Series or Index.

Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.

Parameters:

pat : str

Character sequence or regular expression.

case : bool, default True

If True, case sensitive.

flags : int, default 0 (no flags)

Flags to pass through to the re module, e.g. re.IGNORECASE.

na : default NaN

Fill value for missing values.

regex : bool, default True

If True, assumes the pat is a regular expression.

If False, treats the pat as a literal string.

Returns:

Series or Index of boolean values

A Series or Index of boolean values indicating whether the given pattern is contained within the string of each element of the Series or Index.

See also

match

analogous, but stricter, relying on re.match instead of re.search
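
To make the difference concrete, here is a minimal sketch comparing the two methods on a made-up Series (the sample strings are illustrative only). contains finds the pattern anywhere in the string, while match requires it at the start.

```python
import pandas as pd

# Illustrative data, not from the original documentation.
s = pd.Series(["dog", "hotdog", "dogma"])

# contains relies on re.search: the pattern may occur anywhere.
anywhere = s.str.contains("dog")

# match relies on re.match: the pattern must match at the beginning.
at_start = s.str.match("dog")

print(anywhere.tolist())
print(at_start.tolist())
```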

Examples

Returning a Series of booleans using only a literal pattern.

>>> s1 = pd.Series(['Mouse', 'dog', 'house and parrot', '23', np.nan])
>>> s1.str.contains('og', regex=False)
0    False
1     True
2    False
3    False
4      NaN
dtype: object

Returning an Index of booleans using only a literal pattern.

>>> ind = pd.Index(['Mouse', 'dog', 'house and parrot', '23.0', np.nan])
>>> ind.str.contains('23', regex=False)
Index([False, False, False, True, nan], dtype='object')

Specifying case sensitivity using case.

>>> s1.str.contains('oG', case=True, regex=True)
0    False
1    False
2    False
3    False
4      NaN
dtype: object
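
For contrast, a minimal sketch of the opposite setting: with case=False the same pattern 'oG' is matched without regard to case. The Series here omits the missing value so that the result is a plain boolean Series; that simplification is an assumption of this sketch, not part of the documentation example.

```python
import pandas as pd

# Same strings as s1, without the missing value.
s = pd.Series(["Mouse", "dog", "house and parrot", "23"])

# case=False makes the match case-insensitive, so 'oG' matches 'og' in 'dog'.
result = s.str.contains("oG", case=False)
print(result.tolist())
```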

Specifying na to be False instead of NaN replaces NaN values with False. If Series or Index does not contain NaN values the resultant dtype will be bool, otherwise, an object dtype.

>>> s1.str.contains('og', na=False, regex=True)
0    False
1     True
2    False
3    False
4    False
dtype: bool

Matching either ‘house’ or ‘parrot’ in a string, using the regex alternation operator |.

>>> s1.str.contains('house|parrot', regex=True)
0    False
1    False
2     True
3    False
4      NaN
dtype: object

Ignoring case sensitivity using flags with regex.

>>> import re
>>> s1.str.contains('PARROT', flags=re.IGNORECASE, regex=True)
0    False
1    False
2     True
3    False
4      NaN
dtype: object

Returning any digit using regular expression.

>>> s1.str.contains(r'\d', regex=True)
0    False
1    False
2    False
3     True
4      NaN
dtype: object

Ensure pat is not a literal pattern when regex is set to True. Note that in the following example one might expect only s2[1] and s2[3] to return True. However, ‘.0’ as a regex matches any character followed by a 0.

>>> s2 = pd.Series(['40','40.0','41','41.0','35'])
>>> s2.str.contains('.0', regex=True)
0     True
1     True
2    False
3     True
4    False
dtype: bool
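
If the literal text '.0' is what is wanted, two common ways to get it are sketched below, under the assumption that the same s2 values are used: pass regex=False, or keep regex=True and escape the pattern with re.escape from the standard library.

```python
import re

import pandas as pd

s2 = pd.Series(["40", "40.0", "41", "41.0", "35"])

# Option 1: treat the pattern as a plain substring.
literal = s2.str.contains(".0", regex=False)

# Option 2: keep regex matching but escape the dot so it only matches '.'.
escaped = s2.str.contains(re.escape(".0"), regex=True)

print(literal.tolist())
print(escaped.tolist())
```

Both variants flag only the entries at positions 1 and 3.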

