Python DataFrame: conditionally drop_duplicates

weixin_39874379

I want to drop duplicated rows from a DataFrame, depending on the type of the values in one column. For example, my DataFrame is:

    A    B
    3    4
    3    4
    3    5
    yes  8
    no   8
    yes  8

If df['A'] is a number, I want to drop_duplicates().

If df['A'] is a string, I want to keep the duplicates.

So the desired result would be:

    A    B
    3    4
    3    5
    yes  8
    no   8
    yes  8

Besides using for loops, is there a Pythonic way to do that? Thanks!

Source: https://stackoverflow.com/questions/33239863

2020-06-11 22:06

Accepted answer

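Create a new column C: if the value in column A is numeric, assign a common value in C; otherwise assign a unique value in C.

After that, just call drop_duplicates() as normal.

Note: the string method isnumeric(), used here as df.A.str.isnumeric(), is a convenient way to test whether a cell is number-like. It works on strings, so column A is expected to hold string values.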
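To make "number-like" concrete, here is a small sketch of what the isnumeric() check accepts. The sample values are made up for illustration and are not from the question. The point to notice is that only strings made entirely of numeric characters pass, so decimals and negative numbers are treated as non-numeric.

```python
import pandas as pd

# Hypothetical sample values, stored as strings (as column A is assumed to be).
samples = pd.Series(["3", "yes", "3.5", "-1", ""])

# Series.str.isnumeric() applies Python's str.isnumeric() to each element.
is_num = samples.str.isnumeric()

# "3" passes; "yes", "3.5", "-1" and the empty string do not.
print(is_num.tolist())
```

If column A can contain values such as "3.5" or "-1", a different test (for example pd.to_numeric with errors="coerce" followed by notna()) may be a better fit than isnumeric().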

In [47]:

import numpy as np

# Common key (1) for numeric rows, the row's own index label for the rest
df['C'] = np.where(df.A.str.isnumeric(), 1, df.index)

print(df)

     A  B  C
0    3  4  1
1    3  4  1
2    3  5  1
3  yes  8  3
4   no  8  4
5  yes  8  5

In [48]:

print(df.drop_duplicates()[['A', 'B']])  # reset the index if needed

     A  B
0    3  4
2    3  5
3  yes  8
4   no  8
5  yes  8
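For readers on Python 3, the approach above can be sketched as a self-contained script. The function names, the temporary _key column and the -1 sentinel are illustrative choices, not part of the original answer, and the astype(str) call is an added assumption so that the check also runs when column A mixes strings with other types. The second function is an alternative that is not from the answer: it builds a boolean mask with duplicated() instead of a helper column.

```python
import numpy as np
import pandas as pd

# Sample data from the question; column A is stored as strings.
df = pd.DataFrame({
    "A": ["3", "3", "3", "yes", "no", "yes"],
    "B": [4, 4, 5, 8, 8, 8],
})


def drop_numeric_duplicates(frame, col="A"):
    """Helper-column variant (hypothetical name): dedupe only numeric-looking rows."""
    numeric = frame[col].astype(str).str.isnumeric()
    # Shared key (-1) for numeric rows, a unique row position for all others,
    # so non-numeric rows can never compare equal to each other.
    key = np.where(numeric, -1, np.arange(len(frame)))
    return frame.assign(_key=key).drop_duplicates().drop(columns="_key")


def drop_numeric_duplicates_mask(frame, col="A"):
    """Mask variant (hypothetical name): drop rows that are numeric AND duplicated."""
    numeric = frame[col].astype(str).str.isnumeric()
    return frame[~(numeric & frame.duplicated())]


result = drop_numeric_duplicates(df)
print(result)
```

Both variants keep the rows with index labels 0, 2, 3, 4 and 5 for this sample. The mask variant avoids adding and removing a temporary column; the helper-column variant stays closer to the answer above.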

2015-10-20
