I have a dataframe and I'd like to do a groupby() based on a column, and then sort the values within each group by a date column. Then, from each group, I'd like to remove records whose column_condition == 'B' until I reach a row whose column_condition == 'A'. For example, assume the table below is one of the groups:
ID, DATE, column_condition
--------------------------
1, jan 2017, B
1, Feb 2017, B
1, Mar 2017, B
1, Aug 2017, A
1, Sept 2017, B
So, I'd like to remove the first three rows and leave this group with only the last two rows. How can I do that?
Solution
I think I finally understand your question: you wish to group a dataframe by 'ID', sort by date, and keep only the rows from the first occurrence of 'A' in your condition column onwards. I've come up with the following one-liner solution:
Setting up dummy data
import pandas as pd
import datetime as dt
d = {
'ID': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2], # Assuming only two unique IDs for simplicity
'DATE': [ # Dates already sorted, but it would work anyways
dt.date(2018, 7, 19), dt.date(2018, 8, 18),
dt.date(2018, 9, 17), dt.date(2018, 10, 17),
dt.date(2018, 11, 16), dt.date(2018, 7, 19),
dt.date(2018, 8, 18), dt.date(2018, 9, 17),
dt.date(2018, 10, 17), dt.date(2018, 11, 16)
],
'condition': ['B', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'B']
}
# 'DATE' but with list comprehension:
# [dt.date.today() + dt.timedelta(days=30*x) for y in range(0, 2) for x in range(0, 5)]
df = pd.DataFrame(d)
Interpreter
>>> (df.sort_values(by='DATE') # we should call pd.to_datetime() first if...
... .groupby('ID') # 'DATE' is not datetime already
... .apply(lambda x: x[(x['condition'].values == 'A').argmax():]))
ID DATE condition
ID
1 3 1 2018-10-17 A
4 1 2018-11-16 B
2 8 2 2018-10-17 A
9 2 2018-11-16 B
You can also call reset_index(drop=True), if you need something like this:
ID DATE condition
0 1 2018-10-17 A
1 1 2018-11-16 B
2 2 2018-10-17 A
3 2 2018-11-16 B
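For reference, this is the full chain that produces the flat output above, using the same dummy data as before. reset_index(drop=True) discards the (ID, original index) MultiIndex that groupby().apply() creates:

```python
import datetime as dt
import pandas as pd

d = {
    'ID': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'DATE': [
        dt.date(2018, 7, 19), dt.date(2018, 8, 18),
        dt.date(2018, 9, 17), dt.date(2018, 10, 17),
        dt.date(2018, 11, 16), dt.date(2018, 7, 19),
        dt.date(2018, 8, 18), dt.date(2018, 9, 17),
        dt.date(2018, 10, 17), dt.date(2018, 11, 16)
    ],
    'condition': ['B', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'B']
}
df = pd.DataFrame(d)

result = (df.sort_values(by='DATE')
            .groupby('ID')
            .apply(lambda x: x[(x['condition'].values == 'A').argmax():])
            .reset_index(drop=True))  # flatten the MultiIndex into 0, 1, 2, ...
```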
(x['condition'].values == 'A') returns a bool np.array, and calling argmax() gives us the index where the first occurrence of True happens (where condition == 'A' in this case). Using that index, we're subsetting each of the groups with a slice.
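A minimal illustration of the argmax() trick, which also shows the caveat that motivates the edit below: argmax() returns 0 both when the first element is True and when no element is True at all, so a group with no 'A' would be kept in full.

```python
import numpy as np

cond = np.array(['B', 'B', 'A', 'B'])
mask = cond == 'A'          # array([False, False,  True, False])
first_a = mask.argmax()     # index of the first True
assert first_a == 2

# Caveat: argmax() on an all-False array also returns 0,
# so slicing with it would keep every row of such a group.
assert (np.array(['B', 'B']) == 'A').argmax() == 0
```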
EDIT: Added a filter for dealing with groups that only contain the undesired condition (without it, argmax() returns 0 for those groups and the slice would keep all of their rows).
d = {
'ID': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2], # Assuming only two unique IDs for simplicity
'DATE': [ # Dates already sorted, but it would work anyways
dt.date(2018, 7, 19), dt.date(2018, 8, 18),
dt.date(2018, 9, 17), dt.date(2018, 10, 17),
dt.date(2018, 11, 16), dt.date(2018, 7, 19),
dt.date(2018, 8, 18), dt.date(2018, 9, 17),
dt.date(2018, 10, 17), dt.date(2018, 11, 16)
],
'condition': ['B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'A', 'B'] # ID 1 only contains 'B'
}
df = pd.DataFrame(d)
Interpreter
>>> df
ID DATE condition
0 1 2018-07-19 B
1 1 2018-08-18 B
2 1 2018-09-17 B
3 1 2018-10-17 B
4 1 2018-11-16 B
5 2 2018-07-19 B
6 2 2018-08-18 B
7 2 2018-09-17 B
8 2 2018-10-17 A
9 2 2018-11-16 B
>>> (df.sort_values(by='DATE')
... .groupby('ID')
... .filter(lambda x: (x['condition'] == 'A').any())
... .groupby('ID')
... .apply(lambda x: x[(x['condition'].values == 'A').argmax():]))
ID DATE condition
ID
2 8 2 2018-10-17 A
9 2 2018-11-16 B
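As an alternative sketch (not part of the original one-liner), you can avoid both groupby().apply() and the separate filter step with a per-group cumulative sum over the boolean mask: within each ID, rows become True from the first 'A' onwards, and groups with no 'A' stay all False and are dropped automatically.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':        [1, 1, 1, 2, 2, 2],
    'DATE':      pd.to_datetime(['2018-07-19', '2018-08-18', '2018-09-17',
                                 '2018-07-19', '2018-08-18', '2018-09-17']),
    'condition': ['B', 'A', 'B', 'B', 'B', 'B'],  # ID 2 has no 'A' at all
})

df = df.sort_values(['ID', 'DATE'])
# Running count of 'A's seen so far within each ID; > 0 from the first 'A' onwards
keep = (df['condition'] == 'A').groupby(df['ID']).cumsum() > 0
result = df[keep]
```

Here result keeps only ID 1's rows from its first 'A' onwards, and ID 2 disappears entirely, so no extra .filter() pass is needed.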