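1. Use cases
Use these functions when the sample data in a DataFrame contains duplicate values.
The duplicated() function detects duplicate rows.
The drop_duplicates() function removes duplicate rows.
Both operate on rows; columns are rarely duplicated in practice.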
2. Source data
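import pandas as pd

def make_df(indexs, columns):
    # Each cell is the column label followed by the row label, e.g. 'A1'
    data = [[str(j) + str(i) for j in columns] for i in indexs]
    df = pd.DataFrame(data=data, index=indexs, columns=columns)
    return df

df = make_df([1, 2, 3, 4], list('ABCD'))
# Overwrite the first row with the second so the two rows are identical
df.iloc[0] = df.iloc[1]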
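
To make the starting point concrete, the following is a minimal sketch that rebuilds the same frame and checks that the first two rows really are identical after the assignment. The names first_row and second_row are introduced here only for illustration.

```python
import pandas as pd

def make_df(indexs, columns):
    # Each cell is the column label followed by the row label, e.g. 'A1'
    data = [[str(j) + str(i) for j in columns] for i in indexs]
    return pd.DataFrame(data=data, index=indexs, columns=columns)

df = make_df([1, 2, 3, 4], list('ABCD'))
df.iloc[0] = df.iloc[1]

# Illustrative helper variables (not part of the original notes)
first_row = df.iloc[0].tolist()    # ['A2', 'B2', 'C2', 'D2']
second_row = df.iloc[1].tolist()   # ['A2', 'B2', 'C2', 'D2']
print(df)
```

The index labels stay 1 to 4; only the values in the row labelled 1 change, so the rows labelled 1 and 2 now hold the same data.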
3. duplicated() [detects duplicate rows]
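# Default keep='first': every repeat after the first occurrence is marked True
print(df.duplicated())
print("=====================================================")
# keep=False: every row that has a duplicate is marked True, including the first
print(df.duplicated(keep=False))
# Change one cell so the first two rows differ only in column D
df.iloc[0, 3] = 'DDD'
print(df)
# Compare only columns A, B and C when looking for duplicates
print(df.duplicated(subset=['A', 'B', 'C']))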
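
The effect of the subset and keep parameters can be sketched as follows. This example builds an equivalent frame from literal values (the state after the 'DDD' change) rather than reusing make_df, and the variable names are illustrative assumptions, not part of the original notes.

```python
import pandas as pd

# Same content as the frame above after df.iloc[0, 3] = 'DDD'
df = pd.DataFrame(
    [['A2', 'B2', 'C2', 'DDD'],
     ['A2', 'B2', 'C2', 'D2'],
     ['A3', 'B3', 'C3', 'D3'],
     ['A4', 'B4', 'C4', 'D4']],
    index=[1, 2, 3, 4],
    columns=list('ABCD'),
)

cols = ['A', 'B', 'C']

# Comparing all columns: no row is a full duplicate any more
full_row = df.duplicated().tolist()                           # [False, False, False, False]
# Comparing only A, B, C: the second row repeats the first
by_subset = df.duplicated(subset=cols).tolist()               # [False, True, False, False]
# keep='last' marks the earlier occurrence instead
by_subset_last = df.duplicated(subset=cols, keep='last').tolist()  # [True, False, False, False]
# keep=False marks every member of a duplicate group
by_subset_all = df.duplicated(subset=cols, keep=False).tolist()    # [True, True, False, False]

# The boolean result can be used as a mask to inspect the duplicated rows
dupes = df[df.duplicated(subset=cols, keep=False)]
print(dupes)
```

Using the mask with keep=False is a convenient way to look at all rows in a duplicate group before deciding which one to keep.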
4. drop_duplicates() [removes duplicate rows]
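# Keep the first row of each duplicate group (default keep='first')
df.drop_duplicates(subset=['A', 'B', 'C'])
# Keep the last row of each duplicate group instead
df.drop_duplicates(subset=['A', 'B', 'C'], keep='last')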
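
As a rough sketch of what these calls return, the example below rebuilds an equivalent frame from literal values and records which index labels survive for each keep option. The variable names are illustrative assumptions. Note that drop_duplicates() returns a new DataFrame by default, so the original frame is left unchanged unless the result is assigned back.

```python
import pandas as pd

# Same content as the frame used above (first row differs only in column D)
df = pd.DataFrame(
    [['A2', 'B2', 'C2', 'DDD'],
     ['A2', 'B2', 'C2', 'D2'],
     ['A3', 'B3', 'C3', 'D3'],
     ['A4', 'B4', 'C4', 'D4']],
    index=[1, 2, 3, 4],
    columns=list('ABCD'),
)

cols = ['A', 'B', 'C']

deduped_first = df.drop_duplicates(subset=cols)               # keeps labels 1, 3, 4
deduped_last = df.drop_duplicates(subset=cols, keep='last')   # keeps labels 2, 3, 4
deduped_none = df.drop_duplicates(subset=cols, keep=False)    # keeps labels 3, 4

print(deduped_first.index.tolist())   # [1, 3, 4]
print(deduped_last.index.tolist())    # [2, 3, 4]
print(deduped_none.index.tolist())    # [3, 4]
print(len(df))                        # 4, the original frame is untouched
```

Which row is kept matters here because the two duplicate candidates differ in column D: keep='first' retains 'DDD', while keep='last' retains 'D2'.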