Python: Sets (set)

Latest recommended article published on 2022-12-31 23:00:00

Donquixote Corazon

Article link: https://blog.csdn.net/qq_39754970/article/details/103268866

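Sets (set):

A set is an unordered collection of unique, hashable (in practice, immutable) objects that supports the operations of mathematical set theory. An element appears in a set only once, no matter how many times it is added, which is why sets are widely used in numeric and database-oriented work.

Because a set is a collection of other objects, it shares some behavior with lists and dictionaries; for example, it is iterable.

A set behaves much like a dictionary that has keys but no values. However, because a set is unordered and does not map keys to values, it is neither a sequence nor a mapping type; it forms a type category of its own.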
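
As a small illustration of the points above, the following sketch shows a set being iterated and tested for membership, and failing on indexing because it is not a sequence. The variable names are only illustrative.

```python
s = {3, 1, 2}

# Iteration works, as with lists and dictionaries
total = sum(x for x in s)

# Membership testing uses the `in` operator
has_two = 2 in s

# Indexing is not supported because a set is not a sequence
try:
    s[0]
    indexable = True
except TypeError:
    indexable = False

print(total, has_two, indexable)
# Output: 6 True False
```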

Defining sets:

Python 2.6 and earlier:

set() -> new empty set object
set(iterable) -> new set object

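# Constructor form: the only form available in Python 2.6 and earlier
# (outputs below use the Python 3 repr; Python 2 prints e.g. set([1, 2, 3]))

# Defining an empty set
s = set()
print(s)
# Output: set()


# Defining a set from a list or any other iterable
s = set([1, 'a', (2, 3)])
print(s)
# A set is an unordered collection, so the output order is arbitrary
# Output: {(2, 3), 1, 'a'}

s = set(range(4))
print(s)
# Output: {0, 1, 2, 3}


# Elements that appear more than once are kept only once
s = set([1, 1, 2, 3, 1])
print(s)
print(len(s))
# Output:
# {1, 2, 3}
# 3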

Definition in Python 3.x and Python 2.7:

These versions support the set literal form, which uses curly braces to define a set.

# The literal form cannot define an empty set; {} is treated as a dict
s = {}
print(type(s))
# Output: <class 'dict'>

s = set()
print(type(s))
# Output: <class 'set'>

s = {1}
print(type(s))
# Output: <class 'set'>



# Correct literal form
s = {1, 'a', (2, 3)}
print(s)
# Output (order may vary): {(2, 3), 1, 'a'}

# Defining a set with a set comprehension
s = {x ** 2 for x in [1, 2, 3, 4]}
print(s)
# Output (order may vary): {16, 1, 4, 9}
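
A set comprehension also removes duplicates from the values it produces and can take a filter condition, just like a list comprehension. The sketch below uses illustrative names and sorts the results only to make the printed output deterministic.

```python
# Different inputs that produce the same value collapse into one element
squares = {x ** 2 for x in [-2, -1, 1, 2]}
print(sorted(squares))
# Output: [1, 4]

# A condition can filter the source values
evens = {x for x in range(10) if x % 2 == 0}
print(sorted(evens))
# Output: [0, 2, 4, 6, 8]
```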

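A set can contain only hashable objects (in practice, immutable types). Unhashable types such as list, dict, and bytearray therefore cannot be inserted into a set; attempting to do so raises a TypeError.

Hashable types:

Numeric: int, float, complex
Boolean: True, False
Strings: str, bytes
Tuple: tuple (provided every element it contains is itself hashable)
None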

# Testing hashability with the hash() function
l = [1, 2, 3]
print(hash(l))
# Raises: TypeError: unhashable type: 'list'

s = set([1, [2, 3]])
print(s)
# Raises: TypeError: unhashable type: 'list'
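
Two related cases are worth a quick sketch: a tuple is hashable only when all of its elements are, and a set itself is mutable and so cannot be an element of another set, whereas the built-in frozenset can. The variable names are illustrative.

```python
# A tuple of hashable elements can be a set member
ok = {(1, 2), (3, 4)}

# A tuple that contains a list is not hashable
try:
    bad = {(1, [2, 3])}
    tuple_with_list_ok = True
except TypeError:
    tuple_with_list_ok = False

# frozenset is the immutable, hashable variant of set,
# so it can be nested; equal frozensets count as one element
nested = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}

print(tuple_with_list_ok, len(nested))
# Output: False 2
```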

Set methods:

S.add(…): Add an element to a set. This has no effect if the element is already present.

# S.add()
s = {1, 2, 3}
s.add(4)
print(s)
# Output: {1, 2, 3, 4}

# add() accepts exactly one argument
s.add(4, 5)
print(s)
# Raises: TypeError: add() takes exactly one argument (2 given)

S.update(…): Update a set with the union of itself and others.

# S.update()
s = {1, 2, 3}
s.update([4, 5, 6], [7, 8])
print(s)
# Output: {1, 2, 3, 4, 5, 6, 7, 8}

# update() accepts only iterable objects
s.update(4)
print(s)
# Raises: TypeError: 'int' object is not iterable
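
The difference between add() and update() is easy to trip over with strings, because a string is itself an iterable. A minimal sketch, with illustrative variable names:

```python
s1 = set()
s1.add('abc')        # adds the whole string as a single element

s2 = set()
s2.update('abc')     # iterates over the string and adds each character

print(s1)
# Output: {'abc'}
print(sorted(s2))
# Output: ['a', 'b', 'c']
```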

S.remove(…): Remove an element from a set; it must be a member.

# S.remove()
# remove() accepts exactly one argument
s = {1, 2, 3, 4, 5}
s.remove(4)
s.remove(2)
print(s)
# Output: {1, 3, 5}

s = {'a', 'b'}
s.remove('a')
print(s)
# Output: {'b'}

# The argument passed to remove() must be a member of the set
s = {1, 2, 3}
s.remove(4)
print(s)
# Raises: KeyError: 4
# A set looks up elements by hash, like dictionary keys, so a missing element raises KeyError

S.discard(…): Remove an element from a set if it is a member. If the element is not a member, do nothing.

# S.discard()
# discard() accepts exactly one argument
s = {1, 2, 3}
s.discard(1)
print(s)
# Output: {2, 3}

# Discarding a value that is not in the set does nothing
s.discard(4)
print(s)
# Output: {2, 3}
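
The choice between remove() and discard() comes down to whether a missing element should be treated as an error. The sketch below, with illustrative names, puts the two side by side.

```python
s = {1, 2, 3}

# discard() is silent when the element is missing
s.discard(99)

# remove() reports the missing element, which can be handled explicitly
try:
    s.remove(99)
    removed = True
except KeyError:
    removed = False

print(sorted(s), removed)
# Output: [1, 2, 3] False
```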

S.pop(…) -> item: Remove and return an arbitrary set element. Raises KeyError if the set is empty.

# S.pop()
s = {1, 2, 3}
s.pop()
i = s.pop()

print(s)
print(i)
# Output (typical in CPython; which element is removed is arbitrary):
# {3}
# 2

# pop() takes no arguments
# s = {1, 2, 3}
# s.pop(1)
# print(s)
# Raises: TypeError: pop() takes no arguments (1 given)

# Calling pop() on an empty set raises an error
s = set()
s.pop()
print(s)
# Raises: KeyError: 'pop from an empty set'

S.clear(…): Remove all elements from this set.

# S.clear()
s = {1, 2, 3}
s.clear()
print(s)
# Output: set()

S.copy(…) -> new set: Return a shallow copy of a set.

# S.copy()
s = {1, 2, 3}
s1 = s.copy()
print(s1)
# Output: {1, 2, 3}
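
To make the effect of copy() concrete, the sketch below contrasts it with plain assignment, which only creates a second name for the same set. The names alias and clone are illustrative.

```python
s = {1, 2, 3}
alias = s          # another name for the same set object
clone = s.copy()   # a new set with the same elements

s.add(4)

print(alias is s, clone is s)
# Output: True False
print(sorted(alias), sorted(clone))
# Output: [1, 2, 3, 4] [1, 2, 3]
```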

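Sets versus linear structures:

In a linear structure such as a list, a lookup takes O(n) time, so the lookup time grows with the size of the data.

In structures such as set and dict, elements are located by their hash values, so a lookup takes O(1) time on average and is essentially independent of the data size, which makes it very efficient.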
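
One rough way to observe this difference is to time a membership test on a list and on a set holding the same data. This is only a sketch: the size n, the repeat count, and the variable names are arbitrary choices, and the actual timings depend on the machine, so no expected output is shown.

```python
import timeit

n = 100_000
data_list = list(range(n))
data_set = set(data_list)
target = n - 1  # worst case for the list: the last element

# Time 100 membership tests against each structure
list_time = timeit.timeit(lambda: target in data_list, number=100)
set_time = timeit.timeit(lambda: target in data_set, number=100)

print(f"list: {list_time:.6f}s, set: {set_time:.6f}s")
```

On typical hardware the set lookup should be faster by several orders of magnitude, and the gap widens as n grows.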

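Set operations (the operator forms work only between sets):

Intersection:

S.intersection(…) -> set: Return the intersection of two or more sets as a new set. Equivalent to set1 & set2.
S.intersection_update(…): Update a set with the intersection of itself and another. Equivalent to set1 &= set2.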

# S.intersection()
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}
s3 = s1.intersection(s2)
s4 = s1 & s2
print(s1, s2, s3, s4)
# Output: {1, 2, 3, 4} {3, 4, 5, 6} {3, 4} {3, 4}
# s1 and s2 are unchanged


# S.intersection_update()
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}
s3 = {5, 6, 7, 8}
s1.intersection_update(s2)
s2 &= s3
print(s1, s2)
# Output: {3, 4} {5, 6}
# s1 and s2 are modified in place

Union:

S.union(…) -> new set: Return the union of sets as a new set. Equivalent to set1 | set2.
S.update(…): Update a set with the union of itself and others. Equivalent to set1 |= set2.

# S.union()
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}
s3 = s1.union(s2)
s4 = s1 | s2
print(s1, s2, s3, s4)
# Output: {1, 2, 3, 4} {3, 4, 5, 6} {1, 2, 3, 4, 5, 6} {1, 2, 3, 4, 5, 6}


# S.update()
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}
s3 = {5, 6, 7, 8}
s1.update(s2)
s2 |= s3
print(s1, s2)
# Output: {1, 2, 3, 4, 5, 6} {3, 4, 5, 6, 7, 8}

Difference:

S.difference(…) -> new set: Return the difference of two or more sets as a new set, that is, all elements that are in this set but not the others. Equivalent to set1 - set2.
S.difference_update(…): Remove all elements of another set from this set. Equivalent to set1 -= set2.

# S.difference()
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}
s3 = s1.difference(s2)
s4 = s1 - s2
print(s1, s2, s3, s4)
# Output: {1, 2, 3, 4} {3, 4, 5, 6} {1, 2} {1, 2}


# S.difference_update()
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}
s3 = {5, 6, 7, 8}
s1.difference_update(s2)
s2 -= s3
print(s1, s2)
# Output: {1, 2} {3, 4}

Symmetric difference (XOR):

S.symmetric_difference(…) -> new set: Return the symmetric difference of two sets as a new set, that is, all elements that are in exactly one of the sets. Equivalent to set1 ^ set2.
S.symmetric_difference_update(…): Update a set with the symmetric difference of itself and another. Equivalent to set1 ^= set2.

# S.symmetric_difference()
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}
s3 = s1.symmetric_difference(s2)
s4 = s1 ^ s2
print(s1, s2, s3, s4)
# Output: {1, 2, 3, 4} {3, 4, 5, 6} {1, 2, 5, 6} {1, 2, 5, 6}


# S.symmetric_difference_update()
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}
s3 = {5, 6, 7, 8}
s1.symmetric_difference_update(s2)
s2 ^= s3
print(s1, s2)
# Output: {1, 2, 5, 6} {3, 4, 7, 8}

Other set operations:

S.issubset(…) -> bool: Report whether another set contains this set. Equivalent to set1 <= set2; to test for a proper subset, use set1 < set2.
S.issuperset(…) -> bool: Report whether this set contains another set. Equivalent to set1 >= set2; to test for a proper superset, use set1 > set2.
S.isdisjoint(…) -> bool: Return True if two sets have a null intersection.

# S.issubset()
s1 = {1, 2}
s2 = {1, 2, 3}
s3 = s2.copy()

print(s1.issubset(s2))
print(s1 <= s3)
print(s2 < s3)   # s2 equals s3, so it is not a proper subset
# Output:
# True
# True
# False


# S.issuperset()
s1 = {1, 2}
s2 = {1, 2, 3}
s3 = s2.copy()

print(s2.issuperset(s1))
print(s3 >= s1)
print(s3 > s2)   # s3 equals s2, so it is not a proper superset
# Output:
# True
# True
# False


# S.isdisjoint()
s1 = {1, 2, 3}
s2 = {4, 5, 6}
s3 = {2, 3, 4}

print(s1.isdisjoint(s2))
print(s1.isdisjoint(s3))
# Output:
# True
# False

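Operators versus methods for set operations:

Operators work only between two sets, whereas the methods also accept any iterable (including lists) as an argument. Either way, the result of the operation is a set.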

# Difference between methods and operators in set operations
s1 = {1, 2, 3, 4}
t = (3, 4, 5, 6)
# The method form is used here; t is a tuple
s2 = s1.intersection(t)
print(s2)
# Output: {3, 4}

s1 = {1, 2, 3, 4}
t = (3, 4, 5, 6)
# The operator form is used here
s2 = s1 & t
print(s2)
# Raises: TypeError: unsupported operand type(s) for &: 'set' and 'tuple'
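
When the operator form is preferred, the other operand can be converted to a set first. The method forms can also take several iterables in a single call. A short sketch, with illustrative names and results sorted only to make the printed output deterministic:

```python
s1 = {1, 2, 3, 4}
t = (3, 4, 5, 6)
lst = [4, 6, 8]

# Convert explicitly to use the operator form
print(sorted(s1 & set(t)))
# Output: [3, 4]

# Methods accept several iterables at once
print(sorted(s1.intersection(t, lst)))
# Output: [4]
print(sorted(s1.union(t, lst)))
# Output: [1, 2, 3, 4, 5, 6, 8]
```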

Why use sets:

Filtering duplicates out of a collection

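# Using a set to filter duplicates out of a collection

l = [1, 2, 3, 5, 1, 1, 4, 2]
l = list(set(l))
print(l)
# Output: [1, 2, 3, 4, 5]
# Sets are unordered, so the original order is not preserved;
# the result only happens to look sorted for small integers in CPython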
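
If the original order matters, a set alone is not enough. One common alternative (not from the original article) is dict.fromkeys, which relies on dictionaries preserving insertion order in Python 3.7 and later. If a sorted result is wanted instead, sorted() can be applied to the set. A sketch with illustrative names:

```python
l = [1, 2, 3, 5, 1, 1, 4, 2]

# Keep the first occurrence of each element, in original order
# (assumes Python 3.7+, where dict preserves insertion order)
unique_in_order = list(dict.fromkeys(l))
print(unique_in_order)
# Output: [1, 2, 3, 5, 4]

# Ask for sorted order explicitly rather than relying on set iteration order
unique_sorted = sorted(set(l))
print(unique_sorted)
# Output: [1, 2, 3, 4, 5]
```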

Order-independent equality tests

# Order-independent equality tests
l1, l2 = [1, 2, 3], [3, 2, 1]
print(l1 == l2)
print(set(l1) == set(l2))
print(sorted(set(l1)) == sorted(set(l2)))
# Output:
# False
# True
# True
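
One caveat: comparing sets ignores how many times each element occurs as well as the order. When counts matter, collections.Counter from the standard library or a comparison of sorted lists can be used instead. The sketch below is an addition to the original example, with illustrative data.

```python
from collections import Counter

l1, l2 = [1, 1, 2], [1, 2, 2]

# Sets discard duplicates, so these compare equal
print(set(l1) == set(l2))
# Output: True

# Counter compares element counts, ignoring order
print(Counter(l1) == Counter(l2))
# Output: False

# Sorting also keeps duplicates
print(sorted(l1) == sorted(l2))
# Output: False
```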

Extracting the differences between other iterables

# Extracting the differences between other iterables
str1 = 'hello world'
str2 = 'helloworld'

print(set(str1) - set(str2))
print(set(str1) & set(str2))
# Output (element order may vary between runs):
# {' '}
# {'h', 'r', 'e', 'd', 'o', 'w', 'l'}