Self-study Python Fish-C Note10 P40to41

weixin_46530456

Last modified 2024-06-13 17:07:29

First published 2024-06-13 16:37:12

Article link: https://blog.csdn.net/weixin_46530456/article/details/139658173

Sets

First look

Differences from and relationship to dictionaries:

type({})

dict

type({'a'})

set

type({1:'a'})

dict

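Properties of sets:
(1) Uniqueness: every element in a set is unique (dictionaries are unique too, but the uniqueness applies to their keys).
(2) Unorderedness: elements cannot be accessed by index.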
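Both properties can be checked directly. The following is a minimal sketch; the variable names (`letters`, `unique_count`, `indexing_error`) are made up for illustration.

```python
# Duplicates collapse to a single element.
letters = {'a', 'b', 'a', 'c', 'b'}
unique_count = len(letters)

# Sets do not support indexing, so letters[0] raises TypeError.
try:
    letters[0]
    indexing_error = None
except TypeError as exc:
    indexing_error = type(exc).__name__

print(unique_count, indexing_error)  # 3 TypeError
```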

Creating sets

(1) Using curly braces

{'a','b','c','d','e','f'}

{'a', 'b', 'c', 'd', 'e', 'f'}

se = {'a','b','c','d','e','f'}
type(se)

set

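(2) Set comprehension

{i for i in 'apple'}
# This result also shows that sets are unordered: 'apple' goes in, but the elements do not come out in the original order. (The displayed order may differ between machines and between runs.)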

{'a', 'e', 'l', 'p'}

(3) The type constructor `set()`

set('apple')

{'a', 'e', 'l', 'p'}

Testing whether one or more elements are in a set: `in` and `not in`

s1 = set('apple')
'a' in s1

True

'A' not in s1

True

[i in s1 for i in ['a', 'b', 'c', 'd', 'e', 'f']]

[True, False, False, False, True, False]

[i in s1 for i in {'a', 'b', 'c', 'd', 'e', 'f'}] # this again shows that sets are unordered: {'a', 'b', 'c', 'd', 'e', 'f'} is iterated in no fixed order

[False, True, False, False, False, True]

{i in s1 for i in ['a', 'b', 'c', 'd', 'e', 'f']} # this again shows that set elements are unique: duplicates are removed

{False, True}

Accessing the elements of a set (iteration)

s1 = set('apple')
for i in s1:
    print(i)

p
a
e
l

This again shows that sets are unordered and that their elements are unique.

Removing duplicates with a set (uniqueness)

set([0,1,2,3,4,5,6,7,8,9,9,9,9,9,9])

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

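Checking whether a list contains duplicate elements (False below means the lengths differ, so the list has duplicates):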

l1 = [0,1,2,3,4,5,6,7,8,9,9,9,9,9,9]
len(l1) == len(set(l1))

False
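The length comparison can be wrapped in a small helper. The sketch below uses two hypothetical function names, `has_duplicates` and `dedupe_keep_order`. The second one is an addition that is not part of the course notes: it relies on `dict.fromkeys()`, whose keys keep insertion order in Python 3.7+, to remove duplicates without losing the original order, which `set()` alone does not guarantee.

```python
def has_duplicates(items):
    """Return True if any element appears more than once (elements must be hashable)."""
    return len(items) != len(set(items))


def dedupe_keep_order(items):
    """Remove duplicates while keeping the first occurrence of each element."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


l1 = [3, 1, 3, 2, 1]
print(has_duplicates(l1))     # True
print(dedupe_keep_order(l1))  # [3, 1, 2]
```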

Set methods

`copy()` makes a shallow copy (as with sequences and dictionaries)

s1 = set('apple')
a = s1.copy()
a

{'a', 'e', 'l', 'p'}
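To see what the copy buys you, compare it with plain assignment. This is a small sketch; `alias` and `clone` are illustrative names.

```python
s1 = set('apple')
alias = s1         # another name for the same set object
clone = s1.copy()  # a new set holding the same elements

s1.add('z')        # modify the original

print('z' in alias, 'z' in clone)  # True False
```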

`isdisjoint()` checks whether two sets are disjoint (have no elements in common)

s1 = set('apple')
s2 = set('banana')
s1.isdisjoint(s2) # False here means the two sets are related, i.e. they intersect: the intersection is {'a'}

False

s1 = set('pple')
s2 = set('banana')
s1.isdisjoint(s2) # now there is no intersection

True

r1 = s1.isdisjoint(set('banana'))
print(r1)
r2 = s1.isdisjoint('banana') # `isdisjoint()` accepts any iterable directly; the argument does not have to be a set
print(r2)

True
True

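`issubset()` checks whether a set is a subset of another set

Subset: for two sets A and B, if every element of A is also an element of B, the two sets have an inclusion relationship and A is called a subset of B.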

s1 = set('135')
s1.issubset('02468')

False

s1 = set('135')
s1.issubset('13579')

True

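`issuperset()` checks whether a set is a superset of another set

Superset: for two sets A and B, if every element of B is also an element of A, the two sets have an inclusion relationship and A is called a superset of B.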

s1 = set('02468')
s1.issuperset('02')

True

s1 = set('02468')
s1.issuperset('021')

False
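The two definitions mirror each other: A is a subset of B exactly when B is a superset of A. A short sketch with illustrative names `a` and `b`, including two edge cases that follow from the definitions (every set is a subset of itself, and the empty set is a subset of every set):

```python
a = set('135')
b = set('13579')

forward = a.issubset(b)       # every element of a is in b
backward = b.issuperset(a)    # the same statement seen from b's side
reverse = b.issubset(a)       # b has elements ('7', '9') that a lacks

print(forward, backward, reverse)  # True True False
print(a.issubset(a))               # True: a set is a subset of itself
print(set().issubset(a))           # True: the empty set is a subset of any set
```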

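Computing the union, intersection, difference, and symmetric difference of the current set with other objects

Two sets to use in the examples: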

s1 = set('13579')
s2 = set('02468')
print(s1,s2)

{'7', '1', '9', '3', '5'} {'8', '4', '0', '2', '6'}

Union: the `union()` method

Union: for two sets A and B, the set formed by combining all of their elements is called the union of A and B.

s1.union(s2)

{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}

s1.union('0123456')

{'0', '1', '2', '3', '4', '5', '6', '7', '9'}

Intersection: the `intersection()` method

Intersection: for two sets A and B, the set of all elements that belong to both A and B is called the intersection of A and B.

s1.intersection('0123456')

{'1', '3', '5'}

Difference: the `difference()` method

Difference: for two sets A and B, the set of all elements that belong to A but not to B is called the difference of A and B.

s1.difference('0123456')

{'7', '9'}

s2.difference('0123456')

{'8'}

Note: union(), intersection(), and difference() all accept multiple arguments.

r1 = s1.union('a', '234','bc')
print(r1)

{'b', 'c', '5', 'a', '7', '1', '4', '9', '3', '2'}

r1 = s1.intersection('1','23456', '789')
print(r1) # the four sets have no element in common
r2 = s1.intersection('1','123', 'S1')
print(r2) # the intersection of all four sets is {'1'}

set()
{'1'}

r1 = s1.difference('1','23456', '789')
print(r1) 
r2 = s1.difference('1','35', '7')
print(r2)

set()
{'9'}

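Symmetric difference: the `symmetric_difference()` method

Symmetric difference: for two sets A and B, remove every element that A and B have in common; the set of remaining elements (those that belong to exactly one of the two sets) is called the symmetric difference of A and B.
Note: this method accepts only one argument!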

s1.symmetric_difference('12345')

{'2', '4', '7', '9'}

# notice how this differs from difference()
r1 = s1.symmetric_difference('135')
r2 = s1.difference('135')
print(r1,r2)
r1 = s1.symmetric_difference('12345')
r2 = s1.difference('12345')
print(r1,r2)

{'7', '9'} {'7', '9'}
{'7', '4', '9', '2'} {'7', '9'}
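The symmetric difference can also be rebuilt from the methods introduced earlier, which makes the contrast with `difference()` easier to see. The sketch below checks two standard set identities; the variable names are illustrative.

```python
a = set('13579')
b = set('12345')

sym = a.symmetric_difference(b)

# (A - B) united with (B - A): elements that are in exactly one of the sets.
via_differences = a.difference(b).union(b.difference(a))

# (A | B) minus (A & B): everything, minus what the two sets share.
via_union = a.union(b).difference(a.intersection(b))

print(sym == via_differences == via_union)  # True
```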

Using operators for the operations above

Note: with operators, both operands must be sets (set or frozenset); otherwise a TypeError is raised.
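A quick sketch of that difference between methods and operators: the method accepts a plain string, while the operator with the same string fails. `method_result` and `operator_error` are illustrative names.

```python
s1 = set('13579')

# The method accepts any iterable, including a string.
method_result = s1.union('02468')

# The operator requires a set (or frozenset) on both sides.
try:
    s1 | '02468'
    operator_error = None
except TypeError as exc:
    operator_error = type(exc).__name__

print(len(method_result), operator_error)  # 10 TypeError
```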

Testing for a subset `<=` and a proper subset `<`

set('123') <= set('123')

True

set('123') < set('123456')

True

Testing for a superset `>=` and a proper superset `>`

set('123') >= set('123')

True

set('123456') > set('123456')

False

set('123456') > set('135')

True

Union `|` (pipe)

set('123') | set('456') | {'789'}

{'1', '2', '3', '4', '5', '6', '789'}

Intersection `&`

set('123456') & set('123456') & {'1'}

{'1'}

Difference `-`

set('123456') - set('123') - {'4', '5'}

{'6'}

Symmetric difference `^` (caret)

s1 = set('13579')
s1^set('12345')

{'2', '4', '7', '9'}

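Immutable sets: `frozenset()`

Key point: unlike a set(), a frozenset() is immutable. The set methods discussed so far (computing intersections, testing for subsets, and so on) do not modify the set, so they work on both set() and frozenset().
From this point on, the methods modify the set's contents, so they work only on set() and not on frozenset().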

# example for frozenset()
r1 = frozenset('abcedf')
print(r1)

frozenset({'f', 'a', 'b', 'd', 'c', 'e'})
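A short sketch of what "immutable" means in practice: the non-mutating operations still work on a frozenset, but the mutating methods simply do not exist on it. The names `fs`, `union_result`, and `has_add` are illustrative.

```python
fs = frozenset('abc')

# Non-mutating operations work and return a new object.
# When set and frozenset are mixed, the result has the type of the left operand.
union_result = fs | {'d'}

# Mutating methods such as add() are not defined on frozenset.
has_add = hasattr(fs, 'add')

print(type(union_result).__name__, has_add)  # frozenset False
```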

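Methods that modify a set (not available on `frozenset()`)

The `update(*others)` method

Purpose: updates the set with the values given by the others arguments.
Tip: in the Python documentation, `others` means the method accepts multiple arguments, while `other` means it accepts only one.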

s1 = set('abc')
s1.update('def')
print(s1)

{'f', 'a', 'b', 'd', 'c', 'e'}

s1 = set('abc')
s1.update(['d', 'e', 1])
print(s1)

{1, 'a', 'b', 'd', 'c', 'e'}

s1 = set('abc')
s1.update([1, 1, 'def'], '123')
print(s1)

{1, 'a', '1', 'b', 'c', '3', 'def', '2'}

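The `intersection_update(*others)`, `difference_update(*others)`, and `symmetric_difference_update(other)` methods

Description: intersection_update(*others), difference_update(*others), and symmetric_difference_update(other) update the set in place with the intersection, difference, and symmetric difference respectively. Like symmetric_difference(), symmetric_difference_update() accepts only one argument. The intersection(), difference(), and symmetric_difference() methods covered earlier only return the result and leave the set unchanged. The in-place counterpart of union() is update().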

s1 = set('123456')
s1.intersection_update('135', '13')
print(s1)

{'3', '1'}

# compare intersection_update() with intersection()
s1 = set('123')
s2 = s1.intersection('1')
print(s1,s2)
s1 = set('123')
s2 = s1.intersection_update('1') # updates s1 in place and returns nothing, so s2 is assigned None
print(s1,s2)

{'3', '2', '1'} {'1'}
{'1'} None

s1 = set('123456')
s1.difference_update('135', '43')
print(s1)

{'2', '6'}

s1 = set('123456')
s1.symmetric_difference_update('13579')
print(s1)

{'7', '4', '9', '2', '6'}
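The in-place methods also have operator forms, the augmented assignments `|=`, `&=`, `-=`, and `^=`. They are not covered in the notes above, so treat this as a supplementary sketch. As with the other operators, the right-hand side must be a set.

```python
s = set('123456')

s &= set('12345')  # like intersection_update(): keeps '1'..'5'
s -= {'5'}         # like difference_update(): removes '5'
s |= {'9'}         # like update(): adds '9'
s ^= {'1', '7'}    # like symmetric_difference_update(): drops '1', adds '7'

print(sorted(s))   # ['2', '3', '4', '7', '9']
```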

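The `add(elem)` method

Description: simply adds a single item to the set.
Compared with update(): update() takes an iterable, so passing a string inserts each character as a separate element. add(), by contrast, inserts the whole string as a single element.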

s1 = set('123456')
s1.add('13579')
print(s1)

{'1', '4', '13579', '3', '2', '6', '5'}

# compare `update()` and `add()`
s1 = set('123')
s2 = set('123')
s1.update('456')
s2.add('456')
print(s1,s2)
# more examples for comparison
## s1.update([4,5,6]) works, but s1.add([4,5,6]) raises TypeError: unhashable type: 'list'
## example 2:
s1 = set('123')
s2 = s1.copy()
s1.update((1,2,3))
s2.add((1,2,3))
print(s1,s2)

{'1', '4', '3', '2', '6', '5'} {'3', '2', '1', '456'}
{1, 2, 3, '1', '3', '2'} {'3', '2', '1', (1, 2, 3)}

The `remove(elem)` and `discard(elem)` methods

Difference: if the specified element does not exist, remove(elem) raises an exception (KeyError), while discard(elem) silently does nothing.

s1 = set('123')
s1.remove('1')
print(s1)

{'3', '2'}

s1 = set('123')
s1.discard('1')
print(s1)

{'3', '2'}

s1 = set('123')
s1.discard(1) # silently ignored here; s1.remove(1) would raise KeyError
print(s1)

{'3', '2', '1'}
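The failing `remove()` call can be shown without crashing the script by catching the exception. A minimal sketch; `remove_error` is an illustrative name.

```python
s1 = set('123')

s1.discard(1)  # the integer 1 is not in the set: nothing happens

try:
    s1.remove(1)  # the same missing element: raises KeyError
    remove_error = None
except KeyError as exc:
    remove_error = type(exc).__name__

print(len(s1), remove_error)  # 3 KeyError
```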

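The `pop()` method

Purpose: removes an arbitrary element from the set and returns it. pop() takes no argument, and which element comes out is not something to rely on. Calling pop() on an empty set raises KeyError.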

s1 = set('123')
s1.pop()

'3'
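A common pattern is to pop until the set is empty. The sketch below also shows what happens on one pop too many; `popped` and `pop_error` are illustrative names, and the order of the popped elements is not guaranteed.

```python
s1 = set('123')
popped = []

# An empty set is falsy, so the loop stops when nothing is left.
while s1:
    popped.append(s1.pop())

try:
    s1.pop()  # the set is empty now
    pop_error = None
except KeyError as exc:
    pop_error = type(exc).__name__

print(sorted(popped), pop_error)  # ['1', '2', '3'] KeyError
```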

The `clear()` method

Description: empties the set.

s1 = set('123')
s1.clear()
print(s1)

set()

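Hashability

Creating sets and dictionaries has one requirement: dictionary keys and set elements must be hashable.
For an object to be hashable, its hash value must never change during its lifetime.
In Python, most immutable objects are hashable (such as strings), and most mutable objects are unhashable (such as lists).

Getting a hash value: the `hash()` function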

hash(1)

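print(hash(1)) # for a small integer like this one, the hash value equals the integer itself
print(hash(1.0)) # if two objects compare equal, even though they are different objects such as 1 (int) and 1.0 (float), their hash values must also be equal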
print(hash(1.0001))

1
1
230584300921345

print(hash('abc'))
print(hash(('abc','def'))) # tuples are hashable, but lists are not: hash([1,2,3]) raises an error

-8654782928733943188
7886174532616745686
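Whether an object is hashable can be probed by calling `hash()` and catching `TypeError`. The helper name `is_hashable` below is made up for this sketch. Note that a tuple is hashable only if everything inside it is hashable too.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


results = {
    'str': is_hashable('abc'),
    'tuple': is_hashable((1, 2)),
    'tuple containing a list': is_hashable((1, [2])),
    'list': is_hashable([1, 2]),
    'set': is_hashable({1, 2}),
    'frozenset': is_hashable(frozenset({1, 2})),
}

for name, ok in results.items():
    print(f'{name}: {ok}')
```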

Only hashable objects can be dictionary keys or set elements

# only hashable objects can be dictionary keys or set elements
d1 = {(1,2,3):1}
s1 = {(1,2,3), 1}
print(d1,s1)

{(1, 2, 3): 1} {1, (1, 2, 3)}

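Nested sets are not possible (a set is unhashable, so it cannot be an element of another set). To nest sets, use the immutable frozenset().

# {{1,2,3},2,3} raises an error, because sets are mutable and therefore unhashable.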
s1 = frozenset([1,2,3])
s2 = {s1,2,3}
print(s1, '\n', s2)

frozenset({1, 2, 3}) 
 {frozenset({1, 2, 3}), 2, 3}

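One small puzzle: why are hash(-1) and hash(-2) both -2 in Python? (In CPython, -1 is reserved as the error return value of the C-level hash functions, so hash(-1) is remapped to -2.)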

print(hash(-2))
print(hash(-1))
print(hash(-2) == hash(-1))

-2
-2
True
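Equal hash values do not make two objects the same element. A set compares hashes first and then falls back to `==`, so -1 and -2 still coexist. A small sketch: the equal hashes are CPython behavior, as the output above shows, while the set behavior holds regardless.

```python
same_hash = hash(-1) == hash(-2)  # True on CPython, per the output above

both = {-1, -2}  # hashes may collide, but -1 != -2, so both are kept

print(len(both), -1 in both, -2 in both)  # 2 True True
```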

Speed comparison: dictionaries and sets versus lists

Example:

import random
import timeit

haystack = [random.randint(1,10000000) for i in range(10000000)]
needles = [random.randint(1,1000) for i in range(1000)]

# Add one line of code here to make the lookup more than 10,000 times faster.

def find():
    found = 0
    for each in needles:
        if each in haystack:
            found+=1
    
    print(f'We found {found} times match in total')
    
t = timeit.timeit('find()', setup='from __main__ import find', number=1)
print(f'The whole process for find() take {t} seconds')

We found 640 times match in total
The whole process for find() take 68.0143387 seconds

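# After optimization (the fix: convert haystack to a set)
haystack = set(haystack)
t = timeit.timeit('find()', setup='from __main__ import find', number=1)
print(f'The whole process for find() take {t} seconds')
# With more data the gap becomes even more dramatic. Why does switching from a list to a set help so much?
## Because a set is backed by a hash table, while a list is not. As an analogy, think of looking up the character '在' in the Xinhua Dictionary: `in` on a list is like flipping through the dictionary from the first page and comparing every character with '在'; `in` on a set is like using the stroke index to get the page number and going straight to the character.
## But the efficiency of sets and dictionaries comes at a cost: they use considerably more memory, trading space for time.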

We found 640 times match in total
The whole process for find() take 0.0002829999999960364 seconds
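The experiment above takes over a minute for the list version. The following is a scaled-down sketch that runs in well under a second and times both containers side by side. The sizes, the fixed seed, and the names `haystack_list`, `haystack_set`, and `count_found` are choices made for this sketch, not part of the original exercise, and the measured times will vary by machine.

```python
import random
import timeit

random.seed(0)  # fixed seed so repeated runs search the same data

haystack_list = [random.randint(1, 100_000) for _ in range(100_000)]
haystack_set = set(haystack_list)  # same values, hash-table lookup
needles = [random.randint(1, 1_000) for _ in range(200)]


def count_found(haystack):
    """Count how many needles occur in the given container."""
    return sum(1 for each in needles if each in haystack)


found_list = count_found(haystack_list)
found_set = count_found(haystack_set)

t_list = timeit.timeit(lambda: count_found(haystack_list), number=1)
t_set = timeit.timeit(lambda: count_found(haystack_set), number=1)

print(f'list: found {found_list} in {t_list:.6f} seconds')
print(f'set:  found {found_set} in {t_set:.6f} seconds')
```

Both containers report the same number of matches; only the lookup time differs.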

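Postscript:
Title: Self-study Python Fish-C Note-10 P40-P41
These are notes I took while working through FishC's Python course on Bilibili. I looked up and added some concepts and examples of my own to understand the material better.
My knowledge is limited, so corrections and feedback are welcome!
Original video link: https://www.bilibili.com/video/BV1c4411e77t?p=8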