Python, pandas: drop duplicates in one column based on the value in another column

Latest recommended article published on 2022-04-30 18:57:12

李大雷

Article tags: python, checking two columns for duplicate data

I have a dataframe like this:

Date                 PlumeO      Distance
2014-08-13 13:48:00  754.447905  5.844577
2014-08-13 13:48:00  754.447905  6.888653
2014-08-13 13:48:00  754.447905  6.938860
2014-08-13 13:48:00  754.447905  6.977284
2014-08-13 13:48:00  754.447905  6.946430
2014-08-13 13:48:00  754.447905  6.345506
2014-08-13 13:48:00  754.447905  6.133567
2014-08-13 13:48:00  754.447905  5.846046
2014-08-13 16:59:00  754.447905  6.345506
2014-08-13 16:59:00  754.447905  6.694847
2014-08-13 16:59:00  754.447905  5.846046
2014-08-13 16:59:00  754.447905  6.977284
2014-08-13 16:59:00  754.447905  6.938860
2014-08-13 16:59:00  754.447905  5.844577
2014-08-13 16:59:00  754.447905  6.888653
2014-08-13 16:59:00  754.447905  6.133567
2014-08-13 16:59:00  754.447905  6.946430

For each date, I'm trying to keep only the row with the smallest distance: that is, drop the duplicate dates and keep the one with the smallest distance.

Is there a way to achieve this in pandas' df.drop_duplicates or am I stuck using if statements to find the smallest distance?

Solution

Sort by Distance, then drop duplicates on Date, keeping the first (smallest-distance) row for each date:

df.sort_values('Distance').drop_duplicates(subset='Date', keep='first')

Out:

    Date                 PlumeO      Distance
0   2014-08-13 13:48:00  754.447905  5.844577
13  2014-08-13 16:59:00  754.447905  5.844577
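As a minimal, self-contained sketch of the approach above, the snippet below rebuilds a small subset of the question's rows (six rows instead of all seventeen, so the index labels differ from the output shown) and applies the same sort-then-drop step. The names `closest` and `closest_alt`, the `kind="mergesort"` argument (a stable sort, so ties keep their original order), and the `groupby(...).idxmin()` variant are illustrative additions, not part of the original answer.

```python
import pandas as pd

# A subset of the rows from the question (assumption: six rows are enough to illustrate).
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2014-08-13 13:48:00"] * 3 + ["2014-08-13 16:59:00"] * 3
        ),
        "PlumeO": [754.447905] * 6,
        "Distance": [5.844577, 6.888653, 6.938860, 6.345506, 6.694847, 5.846046],
    }
)

# Sort so the smallest Distance comes first, then keep the first row per Date.
# sort_index() only restores the original row order for display.
closest = (
    df.sort_values("Distance", kind="mergesort")
    .drop_duplicates(subset="Date", keep="first")
    .sort_index()
)

# Alternative sketch: look up the index label of the minimum Distance per Date.
closest_alt = df.loc[df.groupby("Date")["Distance"].idxmin()]

print(closest)
```

Neither call modifies `df`; both return a new DataFrame. If two rows for the same date share the same minimum distance, which of them is kept depends on their order after sorting, so a stable sort makes the choice predictable.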
