Function Afternoon Tea (4): Detecting and Handling Duplicate Values

Article link: https://blog.csdn.net/qq_45473330/article/details/112968129


The DataFrame.drop_duplicates() function

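Introduction
pandas provides a deduplication method named drop_duplicates, available on DataFrame and Series objects. It keeps the surviving rows in their original order and with their original index labels, and it is both concise to write and reliable in behavior. Besides deduplicating a single feature (a Series), it can deduplicate a DataFrame based on one or several of its columns.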

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)
Series.drop_duplicates(keep='first', inplace=False)
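To make the single-feature case concrete, here is a minimal sketch of deduplicating a Series. The sample values and the variable names (s, unique_styles) are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical sample data, chosen only to illustrate the behavior
s = pd.Series(['cup', 'pack', 'cup', 'bowl', 'pack'], name='style')

# Returns a new Series; the first occurrence of each value is kept
unique_styles = s.drop_duplicates()

# Surviving values stay in their original order and keep their index labels
print(unique_styles.tolist())        # ['cup', 'pack', 'bowl']
print(unique_styles.index.tolist())  # [0, 1, 3]

# inplace defaults to False, so the original Series is unchanged
print(len(s))  # 5
```

Note that the index is not renumbered: the result keeps labels 0, 1 and 3, which is what "does not change the original order" means in practice.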

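Parameters

Parameter	Purpose
subset	A column label or a sequence of labels: the columns used to identify duplicates. Defaults to None, meaning all columns. (DataFrame only.)
keep	One of {'first', 'last', False}: which occurrence to keep when rows are duplicated. 'first' keeps the first occurrence, 'last' keeps the last occurrence, and False drops every row that has a duplicate. Defaults to 'first'.
inplace	A bool: whether to modify the original object in place. Defaults to False.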
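The two options that the example section of this post does not exercise, keep=False and inplace=True, can be sketched as follows. The DataFrame is the same sample data used in that example; the names no_dupes, work and result are illustrative only.

```python
import pandas as pd

# Same sample data as the example section of this post
df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]})

# keep=False: drop every row whose (brand, style) pair occurs more than once
no_dupes = df.drop_duplicates(subset=['brand', 'style'], keep=False)
print(no_dupes.index.tolist())  # [2]

# inplace=True: the object itself is modified and the call returns None
work = df.copy()
result = work.drop_duplicates(inplace=True)
print(result)               # None
print(work.index.tolist())  # [0, 2, 3, 4]
```

Because an in-place call returns None, writing `df = df.drop_duplicates(inplace=True)` would replace the DataFrame with None; use either assignment or inplace=True, not both.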

Example

import pandas as pd

# Build the sample data
df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]})
print(df)
#      brand style  rating
# 0  Yum Yum   cup     4.0
# 1  Yum Yum   cup     4.0
# 2  Indomie   cup     3.5
# 3  Indomie  pack    15.0
# 4  Indomie  pack     5.0

# Return the data with fully identical rows removed (all columns compared)
print(df.drop_duplicates())
#      brand style  rating
# 0  Yum Yum   cup     4.0
# 2  Indomie   cup     3.5
# 3  Indomie  pack    15.0
# 4  Indomie  pack     5.0

# Deduplicate on the brand column; the first occurrence is kept by default
print(df.drop_duplicates(subset=['brand']))
#      brand style  rating
# 0  Yum Yum   cup     4.0
# 2  Indomie   cup     3.5

# Deduplicate on the (brand, style) pair, keeping the last occurrence
print(df.drop_duplicates(subset=['brand', 'style'], keep='last'))
#      brand style  rating
# 1  Yum Yum   cup     4.0
# 2  Indomie   cup     3.5
# 4  Indomie  pack     5.0
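The title also mentions detecting duplicates, which the examples above do not show. One common way to do this in pandas is DataFrame.duplicated(), which takes the same subset and keep arguments and returns a boolean mask instead of a filtered frame. The sketch below reuses the sample data; the names mask and pair_mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]})

# True marks a row that repeats an earlier row (keep='first' by default)
mask = df.duplicated()
print(mask.tolist())    # [False, True, False, False, False]
print(int(mask.sum()))  # 1 duplicated row

# keep=False flags every member of a duplicated (brand, style) group
pair_mask = df.duplicated(subset=['brand', 'style'], keep=False)
print(df[pair_mask].index.tolist())  # [0, 1, 3, 4]

# Filtering out the flagged rows selects the same rows as drop_duplicates()
print(df[~mask].equals(df.drop_duplicates()))  # True
```

Inspecting the mask first is a cheap way to see how many rows would be removed, and which ones, before actually dropping anything.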