Using the collections Library in Python

KevinShi_BJ

Last modified 2024-09-05 11:03:08


Tags: python, programming languages

First published 2024-09-03 16:42:53

Article link: https://blog.csdn.net/m0_46699540/article/details/141865453


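collections is the Python standard library module for container data types. The official documentation (collections: Container datatypes, Python 3.12.5 documentation) describes it this way: "This module implements specialized container datatypes providing alternatives to Python's general purpose built-in containers, dict, list, set, and tuple."

Of these, Counter, defaultdict, and ChainMap are especially widely used.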

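namedtuple()	Factory function for creating tuple subclasses with named fields.
deque	List-like container with fast appends and pops on either end.
ChainMap	Dict-like class for creating a single view of multiple mappings.
Counter	Dict subclass for counting hashable objects.
OrderedDict	Dict subclass that remembers the order in which entries were added.
defaultdict	Dict subclass that calls a user-supplied factory function to provide default values for missing keys.
UserDict	Wrapper around dictionary objects for easier dict subclassing.
UserList	Wrapper around list objects for easier list subclassing.
UserString	Wrapper around string objects for easier string subclassing.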
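As a quick illustration of two entries in the table that the examples further down do not cover, here is a minimal sketch of namedtuple and deque. The `Point` type and all variable names are made up for this illustration.

```python
from collections import namedtuple, deque

# Point is a hypothetical record type used only for illustration
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)
coords = (p.x, p[1])  # fields are reachable by name or by index

# A deque supports appends and pops at both ends
d = deque([1, 2, 3])
d.appendleft(0)      # deque is now 0, 1, 2, 3
d.append(4)          # deque is now 0, 1, 2, 3, 4
first = d.popleft()  # removes the leftmost item
last = d.pop()       # removes the rightmost item
remaining = list(d)
```

A namedtuple instance is still a tuple, so it can be unpacked and compared like one; the field names only add readability.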

An article on Zhihu covers the topic in more depth:

https://zhuanlan.zhihu.com/p/343747724

The examples below show a few typical uses:

1. defaultdict

from collections import defaultdict


class Solution:
    # Extract the words from a sentence and return them sorted by word length
    def words_count(self, sentence: str):
        # Missing keys start out as an empty list, so append() works right away
        tmp = defaultdict(list)
        print(f"ori tmp -- {tmp}")
        # Lowercase the sentence, strip periods, and group the words by length
        for i in sentence.lower().replace(".", "").split():
            tmp[len(i)].append(i)
        print(f"after defaultdict handling -- {tmp}")
        result = []
        print(tmp.keys())
        len_list = sorted(tmp.keys())
        print(len_list)
        # Option 1: use extend()
        # for i in len_list:
        #     result.extend(tmp[i])
        # Option 2: use list.append()
        for i in len_list:
            for word in tmp[i]:
                result.append(word)
        return result

if __name__ == "__main__":
    sentence = "A yellow fox runs to the mountain."
    s = Solution()
    print(s.words_count(sentence))
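The example above never checks whether a key already exists, because defaultdict calls its factory (here `list`) the first time a missing key is accessed. The sketch below contrasts that with a plain dict and `setdefault`. Both helper names are hypothetical and exist only for this comparison.

```python
from collections import defaultdict


def group_by_length_plain(words):
    """Hypothetical helper: group words by length using a plain dict."""
    groups = {}
    for w in words:
        # setdefault inserts an empty list only when the key is missing
        groups.setdefault(len(w), []).append(w)
    return groups


def group_by_length_dd(words):
    """Hypothetical helper: the same grouping using defaultdict(list)."""
    groups = defaultdict(list)
    for w in words:
        groups[len(w)].append(w)
    return groups


words = ["a", "yellow", "fox", "runs", "to", "the", "mountain"]
plain = group_by_length_plain(words)
dd = group_by_length_dd(words)
```

One thing to keep in mind: merely reading a missing key from a defaultdict with `dd[key]` inserts that key, so use `key in dd` or `dd.get(key)` when you only want to test for membership.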

2. ChainMap

>>> from collections import ChainMap
>>> baseline = {'music': 'bach', 'art': 'rembrandt'}
>>> adjustments = {'art': 'van gogh', 'opera': 'carmen'}
>>> list(ChainMap(adjustments, baseline))
['music', 'art', 'opera']
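The listing above only shows the keys; their order comes from scanning the mappings from last to first. The following sketch, which reuses the same two dictionaries, illustrates how lookups and writes behave. The variable names `combined`, `art`, and `music` are chosen just for this illustration.

```python
from collections import ChainMap

baseline = {'music': 'bach', 'art': 'rembrandt'}
adjustments = {'art': 'van gogh', 'opera': 'carmen'}
combined = ChainMap(adjustments, baseline)

# Lookups search the mappings in order and return the first match
art = combined['art']      # found in adjustments, which shadows baseline
music = combined['music']  # not in adjustments, so it falls through to baseline

# Writes and updates only ever touch the first mapping
combined['music'] = 'mozart'
```

After the assignment, `adjustments` gains a `'music'` entry while `baseline` stays unchanged, which makes ChainMap a convenient fit for layering overrides on top of defaults.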

3. Counter

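Counter excels at counting how often each word occurs, and it is remarkably flexible. When you run into a counting problem, skim the official documentation first (collections: Container datatypes, Python 3.12.5 documentation) to see what it can do.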

>>> from collections import Counter
>>> # Tally occurrences of words in a list
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
...
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})

>>> # Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554),  ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
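Beyond the two documentation examples above, a few other Counter features come up often: building a counter directly from an iterable, reading missing keys, and combining counters with arithmetic. A small sketch follows; the word lists and variable names are made up for illustration.

```python
from collections import Counter

words = ['red', 'blue', 'red', 'green', 'blue', 'blue']
cnt = Counter(words)          # count directly from an iterable, no loop needed
top_two = cnt.most_common(2)  # the two most frequent items, highest count first
missing = cnt['purple']       # a missing key reads as 0 instead of raising KeyError

more = Counter(['red', 'purple'])
total = cnt + more            # add counts element-wise
diff = cnt - more             # subtract counts, keeping only positive results
```

Unlike defaultdict, reading a missing key from a Counter does not insert it, so `'purple'` is still absent from `cnt` after the lookup.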