import pandas as pd
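1. duplicated: flag duplicate values
By default, the first occurrence of a repeated value is marked as not a duplicate: duplicated(keep='first')
# duplicated flags duplicate values. Pass keep='first' or keep='last' to leave the first or the last occurrence unflagged; pass keep=False to flag every occurrence of a repeated value.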
animals = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama'])
animals1 = animals.duplicated(keep='first')
print(animals1)
animals2 = animals.duplicated(keep='last')
print(animals2)
animals3 = animals.duplicated(keep=False)
print(animals3)
2. drop_duplicates: remove duplicate values
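By default the first occurrence is kept; pass inplace=True to modify the data itself instead of returning a new object: drop_duplicates(keep='first', inplace=False)
# drop_duplicates removes duplicate values. Pass keep='first' or keep='last' to keep the first or the last occurrence.
animals_d1 = animals.drop_duplicates(keep='first')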
print(animals_d1)
animals_d2 = animals.drop_duplicates(keep='last')
print(animals_d2)
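As a cross-check on the two methods, here is a minimal sketch that reuses the same animals Series. The names mask_first, mask_last, mask_all, deduped, unique_only and scratch are illustrative only and do not come from the original post. It suggests that drop_duplicates(keep='first') matches filtering with the inverted duplicated mask, that drop_duplicates also accepts keep=False (dropping every value that occurs more than once), and that inplace=True changes the Series itself and returns None.

```python
import pandas as pd

animals = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama'])

# Boolean masks for the three keep settings.
mask_first = animals.duplicated(keep='first')  # first 'lama' left unflagged
mask_last = animals.duplicated(keep='last')    # last 'lama' left unflagged
mask_all = animals.duplicated(keep=False)      # every 'lama' flagged
print(mask_first.tolist())  # [False, False, True, False, True]
print(mask_last.tolist())   # [True, False, True, False, False]
print(mask_all.tolist())    # [True, False, True, False, True]

# Filtering with the inverted mask keeps the same rows as drop_duplicates.
deduped = animals[~mask_first]
print(deduped.equals(animals.drop_duplicates(keep='first')))  # True

# keep=False removes every value that appears more than once.
unique_only = animals.drop_duplicates(keep=False)
print(unique_only.tolist())  # ['cow', 'beetle']

# inplace=True modifies the Series it is called on and returns None.
scratch = animals.copy()
result = scratch.drop_duplicates(inplace=True)
print(result)            # None
print(scratch.tolist())  # ['lama', 'cow', 'beetle']
```

Working on a copy, as with scratch here, keeps the original animals Series available for the other examples.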
Original post: https://www.cnblogs.com/lgyxta/p/13293056.html