Sets: set(), unique(), and nunique()

Latest recommended article published on 2024-01-12 17:17:13

喜东东only

Tags: database, python, pandas

Article link: https://blog.csdn.net/weixin_50951788/article/details/135361351

Contents

I. Sets (set)
II. unique()
III. nunique()

I. Sets (set)

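Common built-in set methods

Method	Description
add()	Adds one element to the set
update()	Adds every element of one or more iterables (list, tuple, dict keys, and so on)
clear()	Removes all elements from the set
copy()	Returns a shallow copy of the set
difference()	Returns the elements of the set that are in none of the other sets
discard()	Removes the given element if it is present
intersection()	Returns the elements common to all the sets
pop()	Removes and returns an arbitrary element
remove()	Removes the given element; raises KeyError if it is absent
symmetric_difference()	Returns the elements that are in exactly one of the two sets
union()	Returns all the elements of the sets involved
len()	Built-in function (not a method) that returns the number of elements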

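1. A set is an unordered collection of unique elements

Ways to create one:

Use curly braces { } with the elements separated by commas ","
Use the set() function
An empty set must be created with set(), not { }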

set1 = {1, 2, 3, 4, 3, 5}            # create a set directly with curly braces; the duplicate 3 is dropped
set2 = set([4, 5, 6, 6, 7])      # create a set from a list with set()

2 in set1
# True
2 in set2
# False

Creating an empty set

An empty set must be created with set(), because { } creates an empty dict

set()
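The rule above follows from the fact that empty braces already denote an empty dictionary. A minimal sketch that checks the types and shows duplicates being dropped (the variable names are illustrative, not from the original post):

```python
# {} is an empty dict, so an empty set has to be written as set()
empty_braces = {}
empty_set = set()

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

# Duplicates are dropped when a set is built from any iterable
letters = set("hello")
print(len(letters))                 # 4 distinct characters: h, e, l, o
print({1, 2, 3, 4, 3, 5} == {1, 2, 3, 4, 5})  # True
```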

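2. Intersection, union, and difference

Intersection: the elements contained in both a and b
Union: all the elements contained in a or b
Difference: the elements contained in a but not in b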

set1 = {1, 2, 3, 4, 3, 5}        
set2 = set([4, 5, 6, 6, 7])  

set1-set2  # difference: elements in set1 that are not in set2
#{1, 2, 3}
set1&set2  # intersection: elements in both set1 and set2
#{4, 5}
set1|set2  # union: all elements in set1 or set2
#{1, 2, 3, 4, 5, 6, 7}
set1^set2  # symmetric difference: elements in exactly one of set1 and set2
#{1, 2, 3, 6, 7}
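The four operators are related to one another, which is a handy way to check that the definitions are understood. A small sketch using the same two sets (the name sym_diff is illustrative):

```python
set1 = {1, 2, 3, 4, 3, 5}
set2 = set([4, 5, 6, 6, 7])

# Symmetric difference is the union minus the intersection
sym_diff = set1 ^ set2
print(sym_diff == (set1 | set2) - (set1 & set2))  # True

# Equivalently, it is the union of the two one-sided differences
print(sym_diff == (set1 - set2) | (set2 - set1))  # True

# Difference depends on the order of the operands
print(set1 - set2 == {1, 2, 3})  # True
print(set2 - set1 == {6, 7})     # True
```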

3. difference(), intersection(), and union()

difference() returns a new set containing the elements that are in the first set but not in the second, that is, the difference.

set1=set([2,3,4])
set2=set([0,2,4,5])
set1.difference(set2)
#{3}

intersection() returns the elements contained in all of two or more sets, that is, the intersection.

set1 = {1, 2, 3, 4, 3, 5}         
set2 = set([4, 5, 6, 6, 7])
set.intersection(set1,set2)
#{4, 5}

union() returns all the elements of the sets involved, with each duplicate appearing only once, that is, the union.

set1 = {1, 2, 3, 4, 3, 5}         
set2 = set([4, 5, 6, 6, 7])
set.union(set1,set2)
#{1, 2, 3, 4, 5, 6, 7}
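One practical difference between these methods and the operators is worth a quick sketch: the methods accept any iterable and any number of arguments, while the operators only work between sets. The flag name operator_accepts_list below is just for illustration.

```python
set1 = {1, 2, 3, 4, 5}

# Methods accept any iterable, and more than one argument
print(set1.union([5, 6], (7, 8)) == {1, 2, 3, 4, 5, 6, 7, 8})  # True
print(set1.intersection([4, 5, 6], range(5, 10)) == {5})         # True
print(set1.difference([1, 2], [3]) == {4, 5})                     # True

# Operators require both operands to be sets
try:
    set1 | [5, 6]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False
print(operator_accepts_list)  # False

# All three methods return new sets; set1 itself is unchanged
print(set1 == {1, 2, 3, 4, 5})  # True
```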

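4. Adding, removing, updating, and looking up elements

Adding elements

set.add( x )

set.add( x ) adds x to the set; if the element already exists, nothing happens

set.update( x )

set.update( x ) adds every element of x, which can be a list, a tuple, a dict (its keys are added), or any other iterable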

a = set(('p','y','t','h','o','n'))  
#{'h', 'n', 'o', 'p', 't', 'y'}
a.add(5)  
#{5, 'h', 'n', 'o', 'p', 't', 'y'}
a.update((5,8)) 
#{5, 8, 'h', 'n', 'o', 'p', 't', 'y'}
a.update([5,6],[8,9]) 
#{5, 6, 8, 9, 'h', 'n', 'o', 'p', 't', 'y'}
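The difference between add() and update() is easiest to see with a string argument. The following is a small sketch with made-up values; the flag added_list is only there to record the outcome.

```python
a = set()

a.add("py")        # add() inserts its argument as one element
a.update("py")     # update() iterates over its argument: adds 'p' and 'y'
print(a == {"py", "p", "y"})  # True

# Updating from a dict adds its keys, not its values
a.update({"k1": 1, "k2": 2})
print("k1" in a, 1 in a)  # True False

# add() needs a hashable argument, so a list is rejected
try:
    a.add([1, 2])
    added_list = True
except TypeError:
    added_list = False
print(added_list)  # False
```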

Removing elements

set.remove( x ) removes element x from the set; if the element does not exist, a KeyError is raised

set.remove( x )

a = set(('p','y','t','h','o','n'))   
#{'h', 'n', 'o', 'p', 't', 'y'}
a.remove('o')
#{'h', 'n', 'p', 't', 'y'}

set.discard( x ) removes the element from the set; if the element does not exist, no error is raised

a = set(('p','y','t','h','o','n'))
a.discard( 'x' )
#{'h', 'n', 'o', 'p', 't', 'y'}

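set.pop() removes and returns an arbitrary element of the set; which element is removed is not something to rely on

a = set(('p','y','t','h','o','n'))
a.pop()
# a now has five elements, for example {'h', 'n', 'p', 't', 'y'}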
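The three removal methods differ mainly in how they treat a missing element. A short sketch of that behaviour (the flag names are illustrative):

```python
a = set("python")

a.discard("x")          # 'x' is absent: nothing happens
try:
    a.remove("x")       # 'x' is absent: KeyError
    remove_failed = False
except KeyError:
    remove_failed = True
print(remove_failed)    # True

# pop() returns the element it removed; which one is not specified
popped = a.pop()
print(popped in "python", popped not in a, len(a))  # True True 5

# pop() on an empty set also raises KeyError
empty = set()
try:
    empty.pop()
    pop_failed = False
except KeyError:
    pop_failed = True
print(pop_failed)       # True
```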

Counting the elements in a set

len(set)

a = set(('p','y','t','h','o','n'))
# {'h', 'n', 'o', 'p', 't', 'y'}
len(a)
# 6

Emptying a set

set.clear()

a = set(('p','y','t','h','o','n'))
# {'h', 'n', 'o', 'p', 't', 'y'}
a.clear()
#set()

Checking whether an element is in a set

x in set

set1 = {1, 2, 3, 4, 3, 5}          
set2 = set([4, 5, 6, 6, 7])   
2 in set1   
#True
2 in set2
#False

II. unique()

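unique() returns all the unique values of a column (all the distinct values of a feature) as an array (numpy.ndarray).
Applied to a list, np.unique returns a sorted array. It also takes three optional flags (return_index, return_inverse, return_counts) for extra statistics; when a flag is set, the result is a tuple of arrays.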

import numpy as np
a = [1,5,4,2,3,3,5,7]
# returns a sorted array of the unique values
print(np.unique(a))
# [1 2 3 4 5 7]

# also returns the index at which each unique value first appears in the list
print(np.unique(a,return_index=True))
# (array([1, 2, 3, 4, 5, 7]), array([0, 3, 4, 2, 1, 7], dtype=int64))

# also returns, for each element of the original list, its index in the unique array
print(np.unique(a,return_inverse=True))
# (array([1, 2, 3, 4, 5, 7]), array([0, 4, 3, 1, 2, 2, 4, 5], dtype=int64))
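The examples above cover return_index and return_inverse; the third flag, return_counts, can be sketched the same way. This also shows one use of the inverse indices: rebuilding the original data. The variable names are illustrative.

```python
import numpy as np

a = [1, 5, 4, 2, 3, 3, 5, 7]

# return_counts gives the number of occurrences of each unique value
values, counts = np.unique(a, return_counts=True)
print(values.tolist())  # [1, 2, 3, 4, 5, 7]
print(counts.tolist())  # [1, 1, 2, 1, 2, 1]

# The inverse indices rebuild the original data from the unique values
values, inverse = np.unique(a, return_inverse=True)
print(values[inverse].tolist() == a)  # True

# Several flags can be combined; the result is one tuple
values, first_index, counts = np.unique(a, return_index=True, return_counts=True)
print(first_index.tolist())  # [0, 3, 4, 2, 1, 7]
```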

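Applied to a Series, unique() also returns an array; it takes no other arguments, and the values keep their order of first appearance instead of being sorted.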

import pandas as pd
se = pd.Series([1,6,4,5,2,2,3])
print(se.unique())
# [1 6 4 5 2 3]
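The output above is not sorted, unlike the np.unique result earlier. A brief sketch comparing the two on the same data, with value_counts() shown as one assumed way to get per-value counts from a Series:

```python
import numpy as np
import pandas as pd

se = pd.Series([1, 6, 4, 5, 2, 2, 3])

# Series.unique() keeps the order of first appearance
print(se.unique().tolist())               # [1, 6, 4, 5, 2, 3]

# np.unique() sorts the values
print(np.unique(se.to_numpy()).tolist())  # [1, 2, 3, 4, 5, 6]

# value_counts() pairs each unique value with how often it occurs
twos = int(se.value_counts().loc[2])
print(twos)                               # 2
```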

III. nunique()

nunique() returns the number of unique elements in the object.
It can count the distinct values in each column of a DataFrame directly and also works on a Series, but not on a list.

df=pd.DataFrame({'A':[0,2,1],'B':[5,5,6]})
print(df)
print('\n')
print(df.nunique())

   A  B
0  0  5
1  2  5
2  1  6

A    3
B    2
dtype: int64
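Two parameters of nunique() are easy to miss: dropna (missing values are ignored by default) and axis. The sketch below uses a variant of the frame above with one NaN added, which is an assumption for illustration and not data from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 2, 1], "B": [5, 5, np.nan]})

print(df.nunique().tolist())              # [3, 1]  NaN is ignored by default
print(df.nunique(dropna=False).tolist())  # [3, 2]  NaN counted as a value
print(df.nunique(axis=1).tolist())        # [2, 2, 1]  distinct values per row

# For a Series without NaN, nunique() equals the length of unique()
print(df["A"].nunique() == len(df["A"].unique()))  # True

# A plain list has no nunique(); len(set(...)) is one way to count
print(len(set([5, 5, 6])))  # 2
```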

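It can also be combined with groupby to count the distinct values within each group.
dataframe.groupby().agg(): group-then-aggregate (the first call groups, the second aggregates)
df.groupby(by=['x1', 'x2', …])['Y'].agg(func): groups the rows by x1, x2, … and aggregates column Y within each group; func can be a function name such as 'nunique' or a list of such names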

df=pd.DataFrame({'name':['ab','ef','gg','hg'],
               'sex':[1,0,1,1],
               'class':[1,2,3,2]})

df
  name  sex  class
0   ab    1      1
1   ef    0      2
2   gg    1      3
3   hg    1      2

df.groupby(['sex'])['class'].agg(['nunique']).reset_index()
   sex  nunique
0    0        1
1    1        3
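The same count can be obtained in a couple of other ways. This sketch shows nunique() called directly on the grouped column, and named aggregation to choose the output column name; n_classes is a hypothetical name picked for this example.

```python
import pandas as pd

df = pd.DataFrame({"name": ["ab", "ef", "gg", "hg"],
                   "sex": [1, 0, 1, 1],
                   "class": [1, 2, 3, 2]})

# Calling nunique() on the grouped column returns a Series indexed by sex
per_sex = df.groupby("sex")["class"].nunique()
print(per_sex.index.tolist())  # [0, 1]
print(per_sex.tolist())        # [1, 3]

# Named aggregation sets the output column name (n_classes is hypothetical)
named = df.groupby("sex").agg(n_classes=("class", "nunique")).reset_index()
print(named.columns.tolist())       # ['sex', 'n_classes']
print(named["n_classes"].tolist())  # [1, 3]
```

Calling nunique() directly is the shortest form; agg() becomes useful once several statistics are needed for the same groups.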
