The collections module: defaultdict, OrderedDict, deque, Counter, namedtuple

Latest recommended article published on 2021-11-18 15:09:38

panbaoran913


Article link: https://blog.csdn.net/panbaoran913/article/details/110004372


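Data structures in the collections module

Python has four built-in container types: list, tuple, set, and dict. Each has its own characteristics, but some of those characteristics (for example, dict did not guarantee any key order before Python 3.7) constrain how the type can be used, and in some scenarios they are inefficient or awkward, for instance when we need to maintain an ordered dictionary.
In such cases we can use Python's built-in collections module. It provides a number of useful container classes, and using them appropriately can make our code both simpler and faster.

Importing the collections module:

from collections import *     # option 1: import every public name (the examples below rely on this)
import collections            # option 2: import the module and use qualified names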

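1. `defaultdict`: a dictionary with default values

With a plain dict, reading a key that does not exist, or updating its value in place (for example, appending to it), raises a KeyError. defaultdict avoids this error. It extends dict with a default factory: when a key is missing, a value of the corresponding type is created automatically.

help(collections.defaultdict)
class defaultdict(builtins.dict)
 |  defaultdict(default_factory[, ...]) --> dict with default factory
 ---------
 default_factory defaults to None (in which case missing keys still raise KeyError); it can be any callable that takes no arguments, such as list, dict, set, or int

Split a list into two parts and build a new dictionary whose values are the resulting sub-lists.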

values = [11,22,33,44,55,66,77,88,99,90]
mydic = {}

for v in values:
    if v > 66:
        if 'k1' in mydic:
            mydic['k1'].append(v)
        else:
            mydic['k1'] = [v]    # the key does not exist yet, so insert it
    else:
        if 'k2' in mydic:
            mydic['k2'].append(v)
        else:
            mydic['k2'] = [v]
print(mydic)

With defaultdict this becomes simpler:

from collections import defaultdict
 
values = [11,22,33,44,55,66,77,88,99,90] 
my_dict = defaultdict(list) 

for v in values:
    if v>66:
        my_dict['k1'].append(v)
    else:
        my_dict['k2'].append(v)
print(my_dict)

An example: convert a two-dimensional list of genders and names into a dictionary keyed by gender.

from collections import defaultdict
members=[
    ['male','John'],
    ['male','Jack'],
    ['female','Pony'],
    ['female','Lucy']
]
result=defaultdict(list)
for sex,name in members:
    result[sex].append(name)
print(result)
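
The default factory does not have to be list. As a minimal sketch (the word list and variable names are made up for illustration), `defaultdict(int)` can count occurrences, while a plain dict raises KeyError for the same in-place update:

```python
from collections import defaultdict

words = ['apple', 'pear', 'apple', 'fig', 'pear', 'apple']  # illustrative data

# int() returns 0, so every missing key starts at 0.
counts = defaultdict(int)
for w in words:
    counts[w] += 1

# A plain dict raises KeyError for the same in-place update.
plain = {}
try:
    plain['apple'] += 1
    raised = False
except KeyError:
    raised = True

# Reading a missing key on a defaultdict inserts it as a side effect.
_ = counts['kiwi']

print(dict(counts))
print(raised)
```

Note the side effect in the last step: merely reading a missing key adds it to the dictionary, which can matter if you later iterate over the keys.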

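2. `OrderedDict`: an ordered dictionary

Before Python 3.6, dictionaries were unordered (since Python 3.7, the built-in dict is guaranteed to preserve insertion order). When we need ordering explicitly, OrderedDict adds it on top of dict; "ordered" here means that keys are kept in the order in which they were inserted. This makes it easy to build a first-in, first-out dict: when the capacity limit is exceeded, the earliest-added key is deleted first. OrderedDict does not enforce a capacity by itself; the eviction can be done with popitem(last=False).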

from collections import OrderedDict
original_dict = {'a': 2, 'b': 4, 'c': 5}
#for key,value in original_dict:  # raises ValueError: iterating a dict yields only its keys, so a key cannot be unpacked into (key, value)
for key,value in original_dict.items():    
    print (key,value)

ordered_dict=OrderedDict([('a',2),('b',4),('c',5)])    
for key,value in ordered_dict.items():
    print (key,value)
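
The first-in, first-out behaviour described above is not built in; it has to be written on top of OrderedDict. A minimal sketch, assuming a hypothetical helper class `BoundedDict` with a `capacity` parameter (neither name comes from the collections module):

```python
from collections import OrderedDict

class BoundedDict(OrderedDict):
    """Hypothetical helper: keeps at most `capacity` keys, evicting the oldest."""

    def __init__(self, capacity):
        super().__init__()
        self.capacity = capacity

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # popitem(last=False) removes the earliest-inserted item (FIFO).
        while len(self) > self.capacity:
            self.popitem(last=False)

cache = BoundedDict(capacity=2)
cache['a'] = 1
cache['b'] = 2
cache['c'] = 3           # 'a' is evicted
print(list(cache))       # ['b', 'c']

cache.move_to_end('b')   # move an existing key to the newest position
print(list(cache))       # ['c', 'b']
```

popitem(last=False) and move_to_end are the two methods that a plain dict does not offer, which is the main reason to still reach for OrderedDict on recent Python versions.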

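3. `deque`: a double-ended queue

A Python list is implemented as an array, so lookup by index is cheap, but inserting or deleting anywhere other than the end takes O(n) time.
deque is a double-ended list designed for efficient insertion and deletion at both ends (it is one of the four kinds of queue).

list only offers append and pop for adding and removing elements efficiently at the tail; deque adds appendleft/popleft and related methods, which insert or remove elements at the head just as efficiently.

help(collections.deque)
class deque(builtins.object)
 |  deque([iterable[, maxlen]]) --> deque object
Methods:
append() \ appendleft() \ clear() \ copy() \ count() \ extend() \ extendleft()
\ index() \ insert() \ pop() \ popleft() \ remove() \ reverse() \ rotate()
Attribute:
maxlen: the maximum size of the deque, or None if it is unbounded

Advantages
Compared with a queue built on a list, a deque has lower time complexity at the head. With a list, dequeuing from the front (pop(0)) and inserting at the front (insert(0, x)) both take about O(n) time, whereas with a deque, dequeuing (popleft or pop) and enqueuing (append or appendleft) take O(1).

So deque is the better choice here, and because it can act as both a queue and a stack, it is very convenient.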
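
As a small illustration of the queue and stack roles (the job names and sizes are made up), together with the maxlen argument shown in the help text:

```python
from collections import deque

# Queue (FIFO): append on the right, pop from the left.
queue = deque()
queue.append('job1')
queue.append('job2')
first = queue.popleft()      # 'job1'

# Stack (LIFO): append and pop on the same side.
stack = deque()
stack.append(1)
stack.append(2)
top = stack.pop()            # 2

# A bounded deque silently discards items from the opposite end when full.
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)

print(first, top, list(recent))   # job1 2 [2, 3, 4]
```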

deque supports the in operator

q = collections.deque([1, 2, 3, 4])
print(5 in q)  # False
print(1 in q)  # True

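Rotating right and left

#help(collections.deque.rotate)
#rotate(...)
#    Rotate the deque n steps to the right (default n=1).
#    If n is negative, rotates left.
# rotate right
q = collections.deque([1,2,3,4,5,6])
q.rotate(1)  # rotate right by 1 step (the default)
print(q)     # deque([6, 1, 2, 3, 4, 5])
q.rotate(2)  # rotate right by 2 more steps
print(q)     # deque([4, 5, 6, 1, 2, 3])
# rotate left
q = collections.deque([1,2,3,4,5,6])
q.rotate(-1)
print(q)     # deque([2, 3, 4, 5, 6, 1])
q.rotate(-2)
print(q)     # deque([4, 5, 6, 1, 2, 3])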

A deque can be copied and cleared

q = collections.deque([1,2,3,4,5,6])
q1=q.copy()  # shallow copy; q1 is not affected by the clear below
q.clear()    # empty deque q

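Adding an element (or several elements from an iterable) at the tail or the head

# add at the tail
q = collections.deque([1,2,3,4,5,6])
q.append('a')
q.extend(['b','c'])        # deque([1, 2, 3, 4, 5, 6, 'a', 'b', 'c'])
# add at the head
q = collections.deque([1,2,3,4,5,6])
q.appendleft('a')
q.extendleft(['b','c'])    # adds 'b' first, then 'c': deque(['c', 'b', 'a', 1, 2, 3, 4, 5, 6])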

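Finding the index of an element

# help(collections.deque.index)
# D.index(value, [start, [stop]]) -> integer -- return first index of value.
q = collections.deque([1,2,3,4,5,6])
q.index(3)   # returns the index of element 3, which is 2
# q.index(3,3,5)   (value 3, start index 3, stop index 5)
# Element 3 is not found in the index range 3:5 (start inclusive, stop exclusive), so ValueError is raised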

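Inserting an element at a given position

q = collections.deque([1,2,3,4,5,6])
q.insert(4,'a')     # deque([1, 2, 3, 4, 'a', 5, 6])

Reversing the order of the elements

q = collections.deque([1,2,3,4,5,6])
q.reverse()         # in place: deque([6, 5, 4, 3, 2, 1])

Removing elements

# remove the first occurrence of a given value
q = collections.deque(list('abcdef'))
q.remove('b')

# pop removes and returns the tail (rightmost) element
q = collections.deque(list('abcdef'))
q1=q.pop()
print(q)
print(q1)

# popleft removes and returns the head (leftmost) element
q = collections.deque(list('abcdef'))
q1=q.popleft()
print(q)
print(q1)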

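4. The `Counter` class

The purpose of Counter is to track how many times each value occurs. It is a dict subclass that stores elements as keys and their counts as values. Counts may be any integer, including zero and negative numbers. Counter is similar to the bags or multisets of other languages.

4.1 Basics

Creating

# create an empty Counter
c=Counter()

# from a dict
c = Counter({'a': 4, 'b': 2})   #Counter({'a': 4, 'b': 2})

# from keyword arguments
c = Counter(a=4, b=2)           #Counter({'a': 4, 'b': 2})

# from an iterable (list, tuple, string, etc.)
c = Counter('gallahad')         # Counter({'a': 3, 'l': 2, 'g': 1, 'h': 1, 'd': 1})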

Accessing

c = Counter('gallahad') # Counter({'a': 3, 'l': 2, 'g': 1, 'h': 1, 'd': 1})
print(c['a'])   #3
print(c['z'])   #0  # a missing key such as 'z' returns 0 instead of raising an error

Additive update with update()

update accepts either an iterable or another Counter and adds its counts to the existing ones.

c=Counter('which')
print(c)             #Counter({'h': 2, 'w': 1, 'i': 1, 'c': 1})
c.update('wich')     # update from an iterable
print(c)             #Counter({'h': 3, 'w': 2, 'i': 2, 'c': 2})

cc=Counter('wacth')
c.update(cc)         # update from another Counter
print(c)             #Counter({'h': 4, 'w': 3, 'c': 3, 'i': 2, 'a': 1, 't': 1})

Subtractive update with subtract()

c=Counter('which')
print(c)             #Counter({'h': 2, 'w': 1, 'i': 1, 'c': 1})
c.subtract('wich')   # subtract counts from an iterable
print(c)             #Counter({'h': 1, 'w': 0, 'i': 0, 'c': 0})

cc=Counter('wacth')
c.subtract(cc)      # subtract counts from another Counter
print(c)            #Counter({'h': 0, 'i': 0, 'w': -1, 'c': -1, 'a': -1, 't': -1})  # counts can go negative

Deleting keys

c=Counter({'a': 2, 'c': 2, 'b': 2, 'd': 1})
c['b']=0
print(c)   #Counter({'a': 2, 'c': 2, 'd': 1, 'b': 0})  # setting a count to 0 does not delete the key
del c['a']
print(c)   #Counter({'c': 2, 'd': 1, 'b': 0})

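elements()

Returns an iterator in which each element is repeated as many times as its count. Elements are returned in the order they were first encountered (Python 3.7+), and elements whose count is less than 1 are skipped.

c = Counter(a=4, b=2, c=0, d=-2)
c.elements()      #<itertools.chain at 0x16380d0>  returns an iterator
list(c.elements())  #['a', 'a', 'a', 'a', 'b', 'b']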

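most_common([n]): the most frequent elements

Returns a list of the n most common elements and their counts. If n is not given, all elements are returned. Elements with equal counts are ordered by when they were first encountered.

c = Counter('abracadabra')
print(c.most_common())     # returns all by default  #[('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
c.most_common(3)           # the 3 most common       #[('a', 5), ('b', 2), ('r', 2)]
c = Counter(a=3, b=1,c=7,d=5,e=2)
c.most_common()[:-4:-1]    # the 3 least common elements  #[('b', 1), ('e', 2), ('a', 3)]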
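
The slice used for the least common elements generalizes to any n. A short sketch (the variable names are illustrative, and the tie ordering shown assumes Python 3.7 or later):

```python
from collections import Counter

c = Counter('abracadabra')

# 'b' and 'r' both occur twice; 'b' was seen first, so it is listed first.
top3 = c.most_common(3)

# Generic idiom for the n least common elements: reverse-slice the full list.
n = 2
least = c.most_common()[:-n - 1:-1]

print(top3)    # [('a', 5), ('b', 2), ('r', 2)]
print(least)   # [('d', 1), ('c', 1)]
```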

Copying: copy() returns a shallow copy of the Counter.

Arithmetic and set operations

# +, -, &, |
c = Counter(a=3, b=1)
d = Counter(a=1, b=2)
c+d                  #Counter({'a': 4, 'b': 3})  # c[x] + d[x]
c-d                  #Counter({'a': 2})          # subtraction (keeps only positive counts)
c&d                  #Counter({'a': 1, 'b': 1})  # intersection: min(c[x], d[x])
c|d                  #Counter({'a': 3, 'b': 2})  # union: max(c[x], d[x])

Working with Python built-in functions

c = Counter(a=3, b=1)
sum(c.values())         # total of all counts        #4
list(c)                 # list of the keys of c      #['a', 'b']
set(c)                  # set of the keys of c       #{'a', 'b'}
dict(c)                 # plain dict of the pairs    #{'a': 3, 'b': 1}
c.items()               # view of (elem, cnt) pairs  #dict_items([('a', 3), ('b', 1)])
Counter(dict([('a',3),('b',3)]))   # build a Counter from a list of (elem, cnt) pairs  #dict(list_of_pairs)
c.clear()               # reset the Counter: all keys are removed, but the object itself remains
c = Counter(a=-3, b=1,c=0,d=5,e=2)
c += Counter()           # remove zero and negative counts
c                        #Counter({'d': 5, 'e': 2, 'b': 1})
# iterating
for key in c:
    print(key)           # b # d # e  # iteration yields the keys, just as list(c) does
Example 1:

# check whether two strings are anagrams, i.e. made of the same letters in a different order
def is_anagram(word1, word2):
  """
  Checks whether the words are anagrams.
  word1: string
  word2: string
  returns: boolean
  """
  return Counter(word1) == Counter(word2)

is_anagram('sbs','pjh')   # False

4.2 Applications

4.2.1 Multisets

Goal: a Multiset class with a built-in subset test.

# define a custom class Multiset with a built-in subset test
class Multiset(Counter):
  """A multiset is a set where elements can appear more than once."""
  def is_subset(self, other):
    """
    Checks whether self is a subset of other.
    other: Multiset
    returns: boolean
    """
    for char, count in self.items():
        if other[char] < count:
            return False
    return True
  # map the <= operator to is_subset
  __le__ = is_subset
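
A brief usage sketch of the subset test, with the class repeated in compact form so that the snippet runs on its own (the sample strings are made up). On Python 3.10 and later, Counter already supports <= with the same meaning, so the subclass is mainly useful on older versions or when the named method is wanted:

```python
from collections import Counter

class Multiset(Counter):
    """A multiset is a set where elements can appear more than once."""

    def is_subset(self, other):
        """Return True if every element of self occurs at least as often in other."""
        return all(other[elem] >= count for elem, count in self.items())

    # Map the <= operator to is_subset.
    __le__ = is_subset

small = Multiset('aab')
big = Multiset('aabbc')

print(small <= big)             # True: two 'a' and one 'b' are available in big
print(Multiset('aaa') <= big)   # False: big holds only two 'a'
```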

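4.2.2 Probability mass functions

Goal: a class that computes probability mass functions.

A probability mass function (PMF) gives the probability that a discrete random variable takes each particular value. A Counter can be used to represent a PMF.

normalize: rescales the probabilities of the values so that they sum to 1
add: returns a new PMF for the sum of two random variables, obtained by combining every pair of values from the two distributions
render: returns the values and their probabilities, sorted by value, which is convenient for plotting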

class Pmf(Counter):
    """A Counter with probabilities."""

    def normalize(self):
        """Normalizes the PMF so the probabilities add to 1."""
        total = float(sum(self.values()))
        for key in self:
            self[key] /= total

    def __add__(self, other):
        """Adds two distributions.

        The result is the distribution of sums of values from the
        two distributions.

        other: Pmf

        returns: new Pmf
        """
        pmf = Pmf()
        for key1, prob1 in self.items():
            for key2, prob2 in other.items():
                pmf[key1 + key2] += prob1 * prob2
        return pmf

    def __hash__(self):
        """Returns an integer hash value."""
        return id(self)

    def __eq__(self, other):
        return self is other

    def render(self):
        """Returns values and their probabilities, suitable for plotting."""
        return zip(*sorted(self.items()))

Usage

# outcomes and probabilities for one roll of a die
c=Pmf([1,2,3,4,5,6])
c.normalize()
c.name='one die'
print(c)
#Pmf({1: 0.16666666666666666, 2: 0.16666666666666666, 3: 0.16666666666666666, 4: 0.16666666666666666, 5: 0.16666666666666666, 6: 0.16666666666666666})

# sums and probabilities for two rolls
c_twice=c+c
c_twice.name='two dice'

for key,prob in c_twice.items():
    print(key,prob)
# output
#2 0.027777777777777776
#3 0.05555555555555555
#4 0.08333333333333333
#5 0.1111111111111111
#6 0.1388888888888889
#7 0.16666666666666669
#8 0.1388888888888889
#9 0.1111111111111111
#10 0.08333333333333333
#11 0.05555555555555555
#12 0.027777777777777776

# sums and probabilities for three rolls
c_thrice=c+c+c
c_thrice.name='three dice'

import matplotlib.pyplot as plt
# finally, use render to get the results and plot them with matplotlib:
for die in [c,c_twice,c_thrice]:
    xs,ys=die.render()
    plt.plot(xs,ys,label=die.name,linewidth=3,alpha=0.5)

plt.xlabel('Total')
plt.ylabel('Probability')
plt.legend()
plt.show()

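4.2.3 Bayesian statistics

We continue with the dice example to show how Counter can be used for Bayesian statistics. Suppose a box contains five different dice with 4, 6, 8, 12, and 20 sides. We draw one die from the box at random, roll it, and get a 6. What is the probability that the die we drew is each of the five dice?

$P(A|B)=\frac{P(B|A)P(A)}{P(B)}$

Event A: a particular die is drawn

Event B: the number rolled

(1) First, we need to generate the probability mass function of each die: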

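def make_die(num_sides):                    # distribution of outcomes for a fair die with num_sides sides
    die=Pmf(range(1,num_sides+1))
    die.name='d%d' % num_sides
    die.normalize()
    return die

dice=[make_die(x) for x in [4,6,8,12,20]]   # distributions for all five dice in the box
print (dice)                                # the structure is a list of dict-like Pmf objects: [{...}, {...}, ...]
dice[0][1]                                  # dice[list index][dict key]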

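(2) Next, define an abstract class Suite. A Suite is a probability mass function that represents a set of hypotheses and their probability distribution. The Suite class contains a bayesian_update method, which updates the probabilities of the hypotheses based on new data. It relies on a likelihood(data, hypo) method that a concrete subclass is expected to provide.

class Suite(Pmf):
    """Map from hypothesis to probability."""
    def bayesian_update(self, data):
        """Performs a Bayesian update.

        Note: called bayesian_update to avoid overriding dict.update.
        data: result of a die roll
        """
        for hypo in self:
            like = self.likelihood(data, hypo)   # likelihood() must be supplied by a subclass
            self[hypo] *= like
        self.normalize()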
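
The text stops before the question about the rolled 6 is answered. The following is a minimal, self-contained sketch of one way to finish the calculation. It uses plain dicts and exact fractions rather than the Pmf and Suite classes, and it assumes that every die is fair and equally likely to be drawn; the function names are illustrative only:

```python
from fractions import Fraction

def likelihood(data, hypo):
    """Probability of rolling `data` on a fair die with `hypo` sides (assumed model)."""
    return Fraction(1, hypo) if 1 <= data <= hypo else Fraction(0)

def bayesian_update(prior, data):
    """Return the normalized posterior after observing one roll."""
    unnormalized = {hypo: p * likelihood(data, hypo) for hypo, p in prior.items()}
    total = sum(unnormalized.values())
    return {hypo: p / total for hypo, p in unnormalized.items()}

sides = [4, 6, 8, 12, 20]
prior = {s: Fraction(1, len(sides)) for s in sides}   # each die equally likely
posterior = bayesian_update(prior, 6)

for s in sides:
    print(s, posterior[s], float(posterior[s]))
```

The 4-sided die is ruled out because it cannot show a 6, and among the remaining dice the one with the fewest sides is the most probable, since each of its faces carries the largest share of probability.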

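5. `namedtuple`

namedtuple creates tuple subclasses. As we know, an important property of tuples in Python is that their elements cannot be added, removed, or changed, and elements are normally looked up by index.
namedtuple(typename, field_names) gives a name to each element of the tuple, so values can then be looked up by name, somewhat like a dictionary lookup.

from collections import *

Mytuple=namedtuple('Mytuple',['x','y','z'])
# creates a tuple type 'Mytuple' whose fields are named 'x', 'y', 'z'
n=Mytuple(1,2,3)
#n=Mytuple(1,2)     # wrong number of arguments: raises TypeError
#n=Mytuple(1,2,3,4) # wrong number of arguments: raises TypeError
print(n.x)        # access by name
print(n[0])       # access by index

m=Mytuple(x=1,y=3,z=5)    # create a Mytuple instance using keyword arguments
print(m)
print(n._field_defaults)  # dict of default values for the fields (empty here, since none were set)
print(n._fields)          # tuple of the field names: ('x', 'y', 'z')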

Unpacking works the same way as for a tuple

Point = namedtuple('Point', ['x', 'y'])
p=Point(x=11,y=22)
x1,x2=p
print(x1,x2)

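Converting to and from dictionaries

# namedtuple to dict
d=p._asdict()
d['x']

# dict to namedtuple
dic1={'a':1,'b':2}
dic2={'x':3,'y':(4,5)}
#Point(**dic1)  # fails with TypeError: the keys do not match the field names
Point(**dic2)

Building instances from an iterable with _make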

websites = [
    ('Sohu', 'http://www.google.com/', u'张朝阳'),
    ('Sina', 'http://www.sina.com.cn/', u'王志东'),
    ('163', 'http://www.163.com/', u'丁磊')
]
 
Website = namedtuple('Website', ['name', 'url', 'founder'])
 
for website in websites:
    website_ = Website._make(website)
    print(website_)
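
Two further features are worth a quick sketch: default values and _replace. The type name `Point3D` and the values are made up for illustration, and the `defaults` argument assumes Python 3.7 or later:

```python
from collections import namedtuple

# `defaults` applies to the rightmost fields (available since Python 3.7).
Point3D = namedtuple('Point3D', ['x', 'y', 'z'], defaults=[0])

p = Point3D(1, 2)            # z falls back to 0
q = p._replace(y=20)         # returns a new tuple; p is unchanged

print(p)                        # Point3D(x=1, y=2, z=0)
print(q)                        # Point3D(x=1, y=20, z=0)
print(Point3D._field_defaults)  # {'z': 0}

try:
    p.x = 99                 # fields are read-only
    reassigned = True
except AttributeError:
    reassigned = False
print(reassigned)               # False
```

Because the fields cannot be reassigned, _replace is the way to obtain a modified copy.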

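The heapq module (priority queues)

A well-known data structure is the heap, which is a kind of priority queue. A priority queue lets you add objects in any order and, at any time (possibly between two additions), find (and remove) the smallest element. This is far more efficient than calling min on a list each time.
In fact, Python has no separate heap type, only a module containing heap-manipulation functions. The module is named heapq (the q stands for queue). The table below lists six of its functions, the first four of which relate directly to heap operations. The heap itself must be represented by a list.

Some important functions in the heapq module

Function	Description
heappush(heap,x)	Push x onto the heap
heappop(heap)	Pop the smallest element from the heap
heapify(heap)	Turn the list into a valid heap, in place
heapreplace(heap,x)	Pop the smallest element, then push x onto the heap
nlargest(n,iter)	Return the n largest elements of iter
nsmallest(n,iter)	Return the n smallest elements of iter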

heappush

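Note: heappush adds an element to a heap. Do not use it on an arbitrary list, only on a list built with the heap functions (an empty list is already a valid heap). The reason is that the order of the elements matters, even though it looks somewhat arbitrary and is not strictly sorted.

from heapq import *
from random import shuffle

data=list(range(10))
shuffle(data)   # shuffles the list in place and returns None
print(data)
heap=[]
for n in data:
    heappush(heap,n)
print(heap)

# sample output (varies from run to run because of the shuffle):
#[0, 7, 3, 1, 8, 5, 2, 6, 9, 4]
#[0, 1, 2, 6, 4, 5, 3, 7, 9, 8]

Note: the order of the elements is not as arbitrary as it looks. They are not strictly sorted, but one thing is guaranteed: with zero-based indexing, the element at position i is never smaller than the element at position (i - 1) // 2 (conversely, it is never larger than the elements at positions 2 * i + 1 and 2 * i + 2). This is the basis of the underlying heap algorithm and is called the heap property.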
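
The heap property can be checked directly. A minimal sketch, where `is_min_heap` is a hypothetical helper written for this illustration and not part of heapq:

```python
import heapq
import random

def is_min_heap(items):
    """Check items[k] <= items[2*k+1] and items[k] <= items[2*k+2] for every k."""
    n = len(items)
    for k in range(n):
        for child in (2 * k + 1, 2 * k + 2):
            if child < n and items[k] > items[child]:
                return False
    return True

data = list(range(10))
random.shuffle(data)

heap = []
for x in data:
    heapq.heappush(heap, x)

print(is_min_heap(heap))        # True
print(is_min_heap([3, 1, 2]))   # False: 3 is larger than its children
print(heap[0] == min(data))     # True: the smallest item is always at index 0
```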

heappop

heappop(heap)
#0
heappop(heap)
#1
heappop(heap)
#2

heapify

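heapify turns a list into a valid heap (one that satisfies the heap property) in place, using as few moves as possible. If your heap was not built with heappush, call this function before using heappush and heappop.

heap=[4,5,6,1,2,3]
heapify(heap)  # heapify first
heappop(heap)  # 1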

heapreplace

It pops the smallest element from the heap and then pushes a new element. This is more efficient than calling heappop followed by heappush.

heap=[4,5,6,1,2,3]
heapify(heap)
heapreplace(heap,0.5)
heap
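
Because heapreplace pops before it pushes, the value it returns may be larger than the item just added. The related function heapq.heappushpop, which is not covered in the table above, does the two steps in the opposite order. A short comparison sketch:

```python
import heapq

heap = [4, 5, 6, 1, 2, 3]
heapq.heapify(heap)

# heapreplace pops first, then pushes: the returned value can be larger than the new item.
popped = heapq.heapreplace(heap, 0.5)
print(popped, heap[0])        # 1 0.5

# heappushpop pushes first, then pops: it never returns something larger than the new item.
returned = heapq.heappushpop(heap, 0)
print(returned, heap[0])      # 0 0.5

# The heap size is unchanged by either call.
print(len(heap))              # 6
```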

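nlargest and nsmallest

These find the n largest and n smallest elements of the iterable iter, respectively. The same task can be done by sorting first (for example with sorted) and then slicing, but when n is small relative to the size of the input, the heap-based functions are faster and use less memory (and are also easier to use).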

heap=[4,5,6,1,2,3]
heapify(heap)
nlargest(3,heap)    #[6, 5, 4]

nsmallest(4,heap)   #[1, 2, 3, 4]
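
Both functions accept any iterable (it does not have to be heapified first) and an optional key function. A small sketch with made-up score data, checked against the sort-then-slice approach:

```python
import heapq

scores = [('ann', 82), ('bob', 95), ('cid', 67), ('dee', 90), ('eve', 74)]  # made-up data

top2 = heapq.nlargest(2, scores, key=lambda pair: pair[1])
low2 = heapq.nsmallest(2, scores, key=lambda pair: pair[1])

# Same results as sorting and slicing.
by_score = sorted(scores, key=lambda pair: pair[1])
assert top2 == by_score[::-1][:2]
assert low2 == by_score[:2]

print(top2)   # [('bob', 95), ('dee', 90)]
print(low2)   # [('cid', 67), ('eve', 74)]
```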
