Python: deleting string columns from a data frame; Python / Pandas: dropping rows from a data frame based on string matches from a list...

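This post shows how, after importing a CSV file into a Pandas DataFrame, to find and drop rows that contain substrings from a given list. It offers two approaches: using `isin()` with a negated mask, and filtering with a regular expression via `str.contains()`. The author recommends filtering as a post-processing step on the DataFrame, since that is likely more efficient than filtering row by row while reading.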

I have a .csv file of contact information that I import as a pandas data frame.

>>> import pandas as pd
>>> df = pd.read_csv('data.csv')
>>> df.head()
  fName   lName                    email   title
0  John   Smith         jsmith@gmail.com     CEO
1   Joe   Schmo      jschmo@business.com  Bagger
2  Some  Person  some.person@hotmail.com   Clerk

After importing the data, I'd like to drop rows where one field contains one of several substrings in a list. For example:

to_drop = ['Clerk', 'Bagger']

for i in range(len(df)):
    for k in range(len(to_drop)):
        if to_drop[k] in df.title[i]:
            # some code to drop the rows from the data frame

df.to_csv("results.csv")

What is the preferred way to do this in Pandas? Should this even be a post-processing step, or is it preferred to filter this prior to writing to the data frame in the first place? My thought was that this would be easier to manipulate once in a data frame object.

Solution

Use isin and pass it your list of terms, then negate the resulting boolean mask with ~ to filter out those rows. Note that isin matches whole values only, so this works when the title is exactly equal to one of the terms:

In [6]:
to_drop = ['Clerk', 'Bagger']
df[~df['title'].isin(to_drop)]

Out[6]:
  fName  lName             email title
0  John  Smith  jsmith@gmail.com   CEO

Another method is to join the terms with | so that they form a single regex, and use the vectorised str.contains, which matches substrings rather than whole values:

In [8]:
df[~df['title'].str.contains('|'.join(to_drop))]

Out[8]:
  fName  lName             email title
0  John  Smith  jsmith@gmail.com   CEO
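
The two approaches differ once a title merely contains a term, or when the column has missing values. The sketch below is one way to see this. It is not part of the original answer: the frame is built inline, the "Senior Clerk" and missing-title rows are hypothetical, and the use of re.escape and na=False is an added precaution.

```python
import re

import pandas as pd

# Sample data loosely based on the question. The "Senior Clerk" title and
# the row with a missing title are hypothetical, added for illustration.
df = pd.DataFrame({
    "fName": ["John", "Joe", "Some", "Ann"],
    "title": ["CEO", "Bagger", "Senior Clerk", None],
})

to_drop = ["Clerk", "Bagger"]

# isin compares whole values, so "Senior Clerk" is not dropped.
exact = df[~df["title"].isin(to_drop)]

# Escape each term so regex metacharacters (".", "+", ...) match literally,
# then join the terms into one alternation pattern.
pattern = "|".join(re.escape(term) for term in to_drop)

# na=False treats a missing title as "no match", so that row is kept.
substring = df[~df["title"].str.contains(pattern, na=False)]
```

Here exact keeps the "Senior Clerk" row while substring drops it, and both keep the row whose title is missing. Which behaviour is wanted depends on the data.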

IMO it will be easier, and probably faster, to do the filtering as a post-processing step: if you filter while reading, you end up growing the dataframe iteratively, which is not efficient.

Alternatively, you can read the csv in chunks, filter out the rows you don't want, and append each filtered chunk to your output csv.
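
A minimal sketch of the chunked approach, assuming exact matching with isin as in the first snippet. The function name filter_csv_in_chunks, its parameters, and the default chunk size are hypothetical and chosen only for illustration.

```python
import pandas as pd


def filter_csv_in_chunks(src, dst, column, to_drop, chunksize=10_000):
    """Copy src to dst, skipping rows whose `column` equals a term in to_drop."""
    first = True
    # With chunksize set, read_csv yields DataFrames of at most that many rows.
    for chunk in pd.read_csv(src, chunksize=chunksize):
        kept = chunk[~chunk[column].isin(to_drop)]
        # Write the header once, then append the remaining chunks without it.
        kept.to_csv(dst, mode="w" if first else "a", header=first, index=False)
        first = False
```

A call such as filter_csv_in_chunks('data.csv', 'results.csv', 'title', to_drop) would then write the filtered rows without holding the whole file in memory. For substring matching, the isin line could be swapped for a str.contains mask.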
