I am trying to remove all rows before an initial value for a group. For instance, if my max_value = 250, then all rows for a group before that value should be removed. If a consequtive value of 250 or less appears again for that group, it is not removed.
import pandas as pd
df = pd.DataFrame({
'date': ['2019-01-01','2019-02-01','2019-03-01', '2019-04-01',
'2019-01-01','2019-02-01','2019-03-01', '2019-04-01',
'2019-01-01','2019-02-01','2019-03-01', '2019-04-01'],
'Asset': ['Asset A', 'Asset A', 'Asset A', 'Asset A', 'Asset A', 'Asset A', 'Asset B', 'Asset B',
'Asset B', 'Asset B', 'Asset B', 'Asset B'],
'Monthly Value': [100, 200, 300, 400, 500, 600, 100, 200, 300, 200, 300, 200]
})
unique_list = list(df['Asset'].unique())
max_value = 250
print(df)
date Asset Monthly Value
0 2019-01-01 Asset A 100
1 2019-02-01 Asset A 200
2 2019-03-01 Asset A 300
3 2019-04-01 Asset A 400
4 2019-01-01 Asset A 500
5 2019-02-01 Asset A 600
6 2019-03-01 Asset B 100
7 2019-04-01 Asset B 200
8 2019-01-01 Asset B 300
9 2019-02-01 Asset B 200
10 2019-03-01 Asset B 300
11 2019-04-01 Asset B 200
if the threshold or max_value is 250, then the dataframe should look like this (below). Notice the first time a value under 250 is detected for a group, all of those rows are removed. If the value 250 or higher is shown again, it is kept. Any help would be appreciated.
date Asset Monthly Value
2 2019-03-01 Asset A 300
3 2019-04-01 Asset A 400
4 2019-01-01 Asset A 500
5 2019-02-01 Asset A 600
8 2019-01-01 Asset B 300
9 2019-02-01 Asset B 200
10 2019-03-01 Asset B 300
11 2019-04-01 Asset B 200
解决方案
This should do the trick:
df[df.groupby('Asset')['Monthly Value'].apply(lambda x: x.gt(max_value).cumsum().ne(0))]
Yields:
date Asset Monthly Value
2 2019-03-01 Asset A 300
3 2019-04-01 Asset A 400
4 2019-01-01 Asset A 500
5 2019-02-01 Asset A 600
8 2019-01-01 Asset B 300
9 2019-02-01 Asset B 200
10 2019-03-01 Asset B 300
11 2019-04-01 Asset B 200
Additionally, if you store your max values in a dictionary like max_value = {'Asset A': 250, 'Asset B': 250}, you can do the following to achieve the same result:
df[df.groupby('Asset')['Monthly Value'].apply(lambda x: x.gt(max_value[x.name]).cumsum().ne(0))]