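You have a Pandas DataFrame and need to drop the rows beyond a certain count from each group. For example, you have a DataFrame of horse racing data in which the horse's name is the index. You need to cut each horse's race records down to at most three, so that the DataFrame looks like this: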
                  line_date line_track  line_race  c1pos
horse_name
Grand Cicero     2013-03-10         GP          9      9
Clever Story     2013-09-13        BEL          7      7
Distorted Dream  2013-10-04        BEL          4      2
Distorted Dream  2013-09-13        BEL          7      5
Distorted Dream  2013-04-27        BEL          6      2
Mr. O'Leary      2013-10-13        BEL          5      5
Mr. O'Leary      2013-08-29        SAR          7      6
Mr. O'Leary      2013-05-27        BEL          6      5
In the Dark      2013-10-13        BEL          5      7
In the Dark      2013-09-22        BEL          5      7
In the Dark      2013-08-03        SAR          2      7
Bred to Boss     2013-10-26        PRX          3      5
Bred to Boss     2013-10-06        PRX          6      3
Bred to Boss     2012-08-18        SAR          4      1
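As a quick way to check whether a DataFrame already satisfies the cap, the rows per horse can be counted with groupby() and size(). The sketch below is only an illustration: it rebuilds four rows copied from the table above (Grand Cicero and Distorted Dream) instead of reading the real file, and the names rows_per_horse and within_cap are made up for this example.

```python
import pandas as pd

# Rows copied from the table above (Grand Cicero and Distorted Dream only).
df = pd.DataFrame(
    {
        "line_date": ["2013-03-10", "2013-10-04", "2013-09-13", "2013-04-27"],
        "line_track": ["GP", "BEL", "BEL", "BEL"],
        "line_race": [9, 4, 7, 6],
        "c1pos": [9, 2, 5, 2],
    },
    index=pd.Index(
        ["Grand Cicero", "Distorted Dream", "Distorted Dream", "Distorted Dream"],
        name="horse_name",
    ),
)

# Number of rows per horse, in order of first appearance.
rows_per_horse = df.groupby(level=0, sort=False).size()
print(rows_per_horse)

# True when every horse has at most three rows.
within_cap = bool((rows_per_horse <= 3).all())
print(within_cap)
```

If within_cap is False, at least one horse still has more than three records and needs trimming.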
2. Solution
You can do this with Pandas' groupby() combined with head() or tail(). For example, the following code sorts each horse's race records by race date and keeps at most three per horse:
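import pandas as pd

# Read the data, using the horse name as the index
df = pd.read_csv('horse_races.csv', index_col='horse_name', parse_dates=['line_date'])
# Group by horse (index level 0). A GroupBy object has no sort_values(),
# so sort inside each group (newest race first) and keep the first three rows
df = df.groupby(level=0, sort=False, group_keys=False).apply(
    lambda g: g.sort_values('line_date', ascending=False).head(3)
)
# Show the result
print(df)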
Output:
                  line_date line_track  line_race  c1pos
horse_name
Grand Cicero     2013-03-10         GP          9      9
Clever Story     2013-09-13        BEL          7      7
Distorted Dream  2013-10-04        BEL          4      2
Distorted Dream  2013-09-13        BEL          7      5
Distorted Dream  2013-04-27        BEL          6      2
Mr. O'Leary      2013-10-13        BEL          5      5
Mr. O'Leary      2013-08-29        SAR          7      6
Mr. O'Leary      2013-05-27        BEL          6      5
In the Dark      2013-10-13        BEL          5      7
In the Dark      2013-09-22        BEL          5      7
In the Dark      2013-08-03        SAR          2      7
Bred to Boss     2013-10-26        PRX          3      5
Bred to Boss     2013-10-06        PRX          6      3
Bred to Boss     2012-08-18        SAR          4      1
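If the rows do not need to stay grouped by horse in the output, one possible alternative is to sort the whole DataFrame once and then call head() on the groupby, or to build a boolean filter with cumcount(). The sketch below assumes this is acceptable for your data; the horses "A" and "B" and their races are invented purely for illustration and do not come from horse_races.csv.

```python
import pandas as pd

# Hypothetical sample data (not from the original file): horse "A" has four races.
df = pd.DataFrame(
    {
        "line_date": pd.to_datetime(
            ["2013-01-05", "2013-03-10", "2013-02-01", "2013-04-20", "2013-02-15"]
        ),
        "c1pos": [4, 2, 7, 1, 3],
    },
    index=pd.Index(["A", "A", "B", "A", "A"], name="horse_name"),
)

# Sort the whole frame newest first, then keep the first three rows per horse.
newest_first = df.sort_values("line_date", ascending=False)
top3 = newest_first.groupby(level=0, sort=False).head(3)

# Equivalent filter: number the rows within each group (0, 1, 2, ...)
# and keep only positions 0 to 2.
mask = newest_first.groupby(level=0).cumcount() < 3
top3_mask = newest_first[mask.to_numpy()]

print(top3)
```

Both variants keep the rows in the globally sorted order rather than grouped by horse; the cumcount() form can be handy when the cap is only one of several filter conditions.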
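You can also use tail() to select the last three rows of each group instead. For example, the following code sorts each horse's race records by race date and keeps the last three rows of each group: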
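import pandas as pd

# Read the data, using the horse name as the index
df = pd.read_csv('horse_races.csv', index_col='horse_name', parse_dates=['line_date'])
# Group by horse (index level 0), sort each group by race date (newest first)
# and keep the last three rows, i.e. the three oldest races of each horse
df = df.groupby(level=0, sort=False, group_keys=False).apply(
    lambda g: g.sort_values('line_date', ascending=False).tail(3)
)
# Show the result
print(df)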
Output:
                  line_date line_track  line_race  c1pos
horse_name
Grand Cicero     2013-03-10         GP          9      9
Clever Story     2013-09-13        BEL          7      7
Distorted Dream  2013-10-04        BEL          4      2
Distorted Dream  2013-09-13        BEL          7      5
Distorted Dream  2013-04-27        BEL          6      2
Mr. O'Leary      2013-10-13        BEL          5      5
Mr. O'Leary      2013-08-29        SAR          7      6
Mr. O'Leary