pandas 如何删掉第一行_删除特定值之前的第一行-Pandas

最新推荐文章于 2024-08-26 16:32:27 发布

weixin_39933484

最新推荐文章于 2024-08-26 16:32:27 发布

阅读量422

点赞数

文章标签： pandas 如何删掉第一行

本文链接：https://blog.csdn.net/weixin_39933484/article/details/111531322

版权

I am trying to remove all rows before an initial value for a group. For instance, if my max_value = 250, then all rows for a group before that value should be removed. If a consequtive value of 250 or less appears again for that group, it is not removed.

import pandas as pd

df = pd.DataFrame({

'date': ['2019-01-01','2019-02-01','2019-03-01', '2019-04-01',

'2019-01-01','2019-02-01','2019-03-01', '2019-04-01',

'2019-01-01','2019-02-01','2019-03-01', '2019-04-01'],

'Asset': ['Asset A', 'Asset A', 'Asset A', 'Asset A', 'Asset A', 'Asset A', 'Asset B', 'Asset B',

'Asset B', 'Asset B', 'Asset B', 'Asset B'],

'Monthly Value': [100, 200, 300, 400, 500, 600, 100, 200, 300, 200, 300, 200]

})

unique_list = list(df['Asset'].unique())

max_value = 250

print(df)

date Asset Monthly Value

0 2019-01-01 Asset A 100

1 2019-02-01 Asset A 200

2 2019-03-01 Asset A 300

3 2019-04-01 Asset A 400

4 2019-01-01 Asset A 500

5 2019-02-01 Asset A 600

6 2019-03-01 Asset B 100

7 2019-04-01 Asset B 200

8 2019-01-01 Asset B 300

9 2019-02-01 Asset B 200

10 2019-03-01 Asset B 300

11 2019-04-01 Asset B 200

if the threshold or max_value is 250, then the dataframe should look like this (below). Notice the first time a value under 250 is detected for a group, all of those rows are removed. If the value 250 or higher is shown again, it is kept. Any help would be appreciated.

date Asset Monthly Value

2 2019-03-01 Asset A 300

3 2019-04-01 Asset A 400

4 2019-01-01 Asset A 500

5 2019-02-01 Asset A 600

8 2019-01-01 Asset B 300

9 2019-02-01 Asset B 200

10 2019-03-01 Asset B 300

11 2019-04-01 Asset B 200

解决方案

This should do the trick:

df[df.groupby('Asset')['Monthly Value'].apply(lambda x: x.gt(max_value).cumsum().ne(0))]

Yields:

date Asset Monthly Value

2 2019-03-01 Asset A 300

3 2019-04-01 Asset A 400

4 2019-01-01 Asset A 500

5 2019-02-01 Asset A 600

8 2019-01-01 Asset B 300

9 2019-02-01 Asset B 200

10 2019-03-01 Asset B 300

11 2019-04-01 Asset B 200

Additionally, if you store your max values in a dictionary like max_value = {'Asset A': 250, 'Asset B': 250}, you can do the following to achieve the same result:

df[df.groupby('Asset')['Monthly Value'].apply(lambda x: x.gt(max_value[x.name]).cumsum().ne(0))]