How Python's Counter() Is Implemented

The collections.Counter source code

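The source of Counter lives in Lib/collections.py (in Python 3 the file is Lib/collections/__init__.py). The code quoted in this article is from Python 2.7 and can be viewed on GitHub.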

__init__

class Counter(dict):
    '''Dict subclass for counting hashable items. Sometimes called a bag
    or multiset. Elements are stored as dictionary keys and their counts
    are stored as dictionary values.

    '''

    def __init__(*args, **kwds):
        '''Create a new, empty Counter object. And if given, count elements
        from an input iterable. Or, initialize the count from another mapping
        of elements to their counts.

        >>> c = Counter()                   # a new, empty counter
        >>> c = Counter('gallahad')         # a new counter from an iterable
        >>> c = Counter({'a': 4, 'b': 2})   # a new counter from a mapping
        >>> c = Counter(a=4, b=2)           # a new counter from keyword args

        '''
        if not args:
            raise TypeError("descriptor '__init__' of 'Counter' object "
                            "needs an argument")
        self = args[0]
        args = args[1:]
        if len(args) > 1:
            raise TypeError('expected at most 1 arguments, got %d' % len(args))
        super(Counter, self).__init__()
        self.update(*args, **kwds)

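Counter is implemented as a subclass of dict. `__init__` first validates its arguments. Apart from self, args may hold at most one positional argument. Self is taken from `*args` rather than named explicitly, so a keyword such as `self` can itself be counted (for example `Counter(self=1)`). After validation, `__init__` calls the counter's own update method, which builds the counts.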
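Below is a quick check of the construction styles and the argument validation described above. It is a sketch run against the `collections.Counter` that ships with Python 3. This is an assumption, because the article quotes 2.7 and the error-message wording differs between versions, so only the exception type is checked:

```python
from collections import Counter

# The four construction styles from the __init__ docstring
empty = Counter()
from_iterable = Counter('gallahad')
from_mapping = Counter({'a': 4, 'b': 2})
from_kwargs = Counter(a=4, b=2)

print(from_iterable['a'])           # 3 ('gallahad' contains three 'a's)
print(from_mapping == from_kwargs)  # True

# More than one positional argument (besides self) is rejected
try:
    Counter('abc', 'def')
except TypeError as exc:
    print('TypeError:', exc)

# self is not a named keyword parameter, so 'self' can be counted as a key
print(Counter(self=1)['self'])      # 1
```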

update

def update(*args, **kwds):
    '''Like dict.update() but add counts instead of replacing them.

    '''
    if not args:
        raise TypeError("descriptor 'update' of 'Counter' object "
                        "needs an argument")
    self = args[0]
    args = args[1:]
    if len(args) > 1:
        raise TypeError('expected at most 1 arguments, got %d' % len(args))
    iterable = args[0] if args else None
    if iterable is not None:
        if isinstance(iterable, Mapping):
            if self:
                self_get = self.get
                for elem, count in iterable.iteritems():
                    self[elem] = self_get(elem, 0) + count
            else:
                super(Counter, self).update(iterable) # fast path when counter is empty
        else:
            self_get = self.get
            for elem in iterable:
                self[elem] = self_get(elem, 0) + 1
    if kwds:
        self.update(kwds)

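update also starts by checking its arguments: besides self, only one positional argument is allowed. Keyword arguments are handled last. When the call is `Counter(a=1, b=2)`, kwds is `{'a': 1, 'b': 2}`. update then calls itself with that dict as its single positional argument, so keyword arguments go through the mapping path.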

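If the positional argument is a mapping, as in `Counter({'a': 1, 'b': 2})`, update checks whether self is empty. When update runs from `__init__`, self is always empty, so the counts can simply be copied with `dict.update` (the fast path). The check is needed because update is not only called from `__init__`. It can also be called on an existing counter, and then the incoming counts must be added to the existing ones instead of replacing them: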

import collections

x1 = collections.Counter({'a': 1, 'b': 2})
x2 = collections.Counter(a=1, b=2)
x1.update(x2)  # a Counter is a dict subclass, so isinstance(iterable, Mapping) is True as well

# or called like this
x1 = collections.Counter({'a': 1, 'b': 2})
x1.update('aab')

If the argument is not a mapping, update iterates over it and uses each item as a key, adding 1 to that key's count.
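The three branches of update can be sketched as a stand-alone function on a plain dict. The name `counter_update` is hypothetical, and the code is Python 3 (`items()` instead of `iteritems()`, `Mapping` from `collections.abc`). It is meant to mirror the 2.7 logic, not to replace Counter:

```python
from collections.abc import Mapping

def counter_update(counts, iterable=None, **kwds):
    """Hypothetical stand-alone version of Counter.update on a plain dict."""
    if iterable is not None:
        if isinstance(iterable, Mapping):
            if counts:
                # Non-empty: add incoming counts to the existing ones
                for elem, count in iterable.items():
                    counts[elem] = counts.get(elem, 0) + count
            else:
                # Empty: a plain dict.update is enough (the "fast path")
                counts.update(iterable)
        else:
            # Any other iterable: each item adds 1 to its own count
            for elem in iterable:
                counts[elem] = counts.get(elem, 0) + 1
    if kwds:
        # Keyword arguments are re-fed as a single mapping argument
        counter_update(counts, kwds)
    return counts

d = {}
counter_update(d, {'a': 1, 'b': 2})  # mapping, empty target: fast path
counter_update(d, 'aab')             # plain iterable: +1 per item
counter_update(d, b=5)               # keywords: mapping path, counts added
print(d)  # {'a': 3, 'b': 8}
```

A real Counter should end up with the same counts after the same calls: `Counter({'a': 1, 'b': 2})`, then `update('aab')`, then `update(b=5)`.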

most_common

def most_common(self, n=None):
    '''List the n most common elements and their counts from the most
    common to the least. If n is None, then list all element counts.

    >>> Counter('abcdeabcdabcaba').most_common(3)
    [('a', 5), ('b', 4), ('c', 3)]

    '''
    # Emulate Bag.sortedByCount from Smalltalk
    if n is None:
        return sorted(self.iteritems(), key=_itemgetter(1), reverse=True)
    return _heapq.nlargest(n, self.iteritems(), key=_itemgetter(1))

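If most_common is called without n, it sorts all (key, value) pairs with `sorted()` and returns them as a list, by value in descending order. When n is given, it uses `_heapq.nlargest` to pick only the n pairs with the largest values.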
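The two paths of most_common can be mirrored in Python 3 with the public `heapq` and `operator` modules (the source imports them under private aliases). `most_common_sketch` is a hypothetical name used only for this illustration:

```python
import heapq
from collections import Counter
from operator import itemgetter

def most_common_sketch(counts, n=None):
    """Mirror of Counter.most_common: full sort if n is None, heap otherwise."""
    if n is None:
        # Sort every (key, count) pair by count, largest first
        return sorted(counts.items(), key=itemgetter(1), reverse=True)
    # Only the n largest pairs are needed, so avoid sorting everything
    return heapq.nlargest(n, counts.items(), key=itemgetter(1))

c = Counter('abcdeabcdabcaba')
print(most_common_sketch(c, 3))  # [('a', 5), ('b', 4), ('c', 3)]
```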

itemgetter

This uses the handy itemgetter, aliased as `_itemgetter` in the source, from the operator module. The code below gives a feel for it:

Examples (from the Python documentation):

After f = itemgetter(1), the call f(r) returns r[1].
After g = itemgetter(2, 5, 3), the call g(r) returns (r[2], r[5], r[3]).

Equivalent implementation:

def itemgetter(*items):
    if len(items) == 1:
        item = items[0]
        def g(obj):
            return obj[item]
    else:
        def g(obj):
            return tuple(obj[item] for item in items)
    return g

Common usage (a Python 2 session, where map returns a list):

>>> from operator import itemgetter
>>> itemgetter(1)('ABCDEFG')
'B'
>>> itemgetter(1,3,5)('ABCDEFG')
('B', 'D', 'F')
>>> itemgetter(slice(2,None))('ABCDEFG')
'CDEFG'
>>> inventory = [('apple', 3), ('banana', 2), ('pear', 5), ('orange', 1)]
>>> getcount = itemgetter(1)
>>> map(getcount, inventory)
[3, 2, 5, 1]
>>> sorted(inventory, key=getcount)
[('orange', 1), ('banana', 2), ('apple', 3), ('pear', 5)]
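The snippet below checks that the pure-Python itemgetter equivalent above behaves like the one in operator. The copy is renamed `itemgetter_sketch`, a hypothetical name, so it does not shadow the real function. It assumes Python 3, where map returns an iterator and is therefore wrapped in `list()`:

```python
from operator import itemgetter

def itemgetter_sketch(*items):
    # Same logic as the documented equivalent, under a different name
    if len(items) == 1:
        item = items[0]
        def g(obj):
            return obj[item]
    else:
        def g(obj):
            return tuple(obj[item] for item in items)
    return g

# Compare against operator.itemgetter on the documentation's examples
for args in [(1,), (1, 3, 5), (slice(2, None),)]:
    assert itemgetter_sketch(*args)('ABCDEFG') == itemgetter(*args)('ABCDEFG')

inventory = [('apple', 3), ('banana', 2), ('pear', 5), ('orange', 1)]
print(list(map(itemgetter_sketch(1), inventory)))  # [3, 2, 5, 1]
print(sorted(inventory, key=itemgetter_sketch(1)))
# [('orange', 1), ('banana', 2), ('apple', 3), ('pear', 5)]
```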

heapq

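heapq is Python's implementation of the heap queue algorithm, also known as the priority queue algorithm. The call `_heapq.nlargest(n, self.iteritems(), key=_itemgetter(1))` returns a list of the n (key, value) tuples with the largest values. See the heapq documentation for details.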
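The heapq documentation describes `nlargest(n, iterable, key=key)` as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`. It avoids sorting the whole input, which matters when n is small compared with the data. A small illustration, reusing the inventory list from the itemgetter examples:

```python
import heapq
from operator import itemgetter

inventory = [('apple', 3), ('banana', 2), ('pear', 5), ('orange', 1)]

# The two pairs with the largest counts, largest first
top2 = heapq.nlargest(2, inventory, key=itemgetter(1))
print(top2)  # [('pear', 5), ('apple', 3)]

# Same answer as the documented sorted(...)[:n] equivalent
print(top2 == sorted(inventory, key=itemgetter(1), reverse=True)[:2])  # True
```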

elements

The elements method returns each key repeated as many times as its count. Its implementation is neat: a single line.

def elements(self):
    '''Iterator over elements repeating each as many times as its count.

    >>> c = Counter('ABCABC')
    >>> sorted(c.elements())
    ['A', 'A', 'B', 'B', 'C', 'C']

    '''
    return _chain.from_iterable(_starmap(_repeat, self.iteritems()))

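The implementation uses three tools from itertools: repeat, starmap and chain. Each key is repeated as many times as its count, and the results are chained into a single iterator (not a list). Keys whose count is less than one produce nothing.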
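The one-liner can be unrolled into its two steps to see what each tool contributes. This is a Python 3 sketch (`items()` replaces `iteritems()`). The zero and negative counts show that such keys simply vanish from the output:

```python
from collections import Counter
from itertools import chain, repeat, starmap

c = Counter(a=2, b=1, c=0, d=-1)

# Step 1: starmap calls repeat(key, count) for every (key, count) pair
repeaters = starmap(repeat, c.items())
# Step 2: chain.from_iterable flattens those repeat iterators into one stream
flat = chain.from_iterable(repeaters)

print(list(flat))          # ['a', 'a', 'b']  (counts 0 and -1 yield nothing)
print(list(c.elements()))  # ['a', 'a', 'b']
```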

repeat

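repeat returns an iterator that yields its first argument over and over. It stops after the number of times given by the second argument, or never stops if that argument is omitted. The implementation is easy to follow; a roughly equivalent version: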

def repeat(object, times=None):
    # repeat(10, 3) --> 10 10 10
    if times is None:
        while True:
            yield object
    else:
        for i in xrange(times):
            yield object
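Here is a Python 3 port of the equivalent above (`xrange` becomes `range`), checked against `itertools.repeat`. `repeat_sketch` is a hypothetical name. The infinite branch never ends on its own, so `islice` takes a finite prefix:

```python
from itertools import islice, repeat as it_repeat

def repeat_sketch(obj, times=None):
    # Python 3 version of the pure-Python equivalent of itertools.repeat
    if times is None:
        while True:
            yield obj
    else:
        for _ in range(times):
            yield obj

print(list(repeat_sketch(10, 3)))           # [10, 10, 10]
print(list(it_repeat(10, 3)))               # [10, 10, 10]
print(list(islice(repeat_sketch('x'), 4)))  # ['x', 'x', 'x', 'x']
print(list(repeat_sketch('y', 0)))          # []  (what elements() relies on)
```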

starmap

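starmap takes a function as its first argument and returns an iterator. The iterator calls the function once for each item of the second argument, unpacking that item into the call's arguments. This is easier to see from the example than to describe. A roughly equivalent version: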

def starmap(function, iterable):
    # starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000
    for args in iterable:
        yield function(*args)
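A short Python 3 illustration follows. It runs the documented example, then uses repeat as the function, which is exactly what elements() does with each (key, count) pair. The final line uses map with parallel argument iterables for contrast:

```python
from itertools import repeat, starmap

# pow(2, 5), pow(3, 2), pow(10, 3)
print(list(starmap(pow, [(2, 5), (3, 2), (10, 3)])))  # [32, 9, 1000]

# Each (key, count) pair becomes repeat(key, count)
pairs = [('a', 2), ('b', 1)]
print([list(it) for it in starmap(repeat, pairs)])    # [['a', 'a'], ['b']]

# map, by contrast, takes its arguments from parallel iterables
print(list(map(pow, [2, 3, 10], [5, 2, 3])))          # [32, 9, 1000]
```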

chain.from_iterable

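chain.from_iterable takes an iterable of iterables and returns an iterator that yields every element of each inner iterable in turn. A roughly equivalent version: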

def from_iterable(iterables):
    # chain.from_iterable(['ABC', 'DEF']) --> A B C D E F
    for it in iterables:
        for element in it:
            yield element
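`chain(*iterables)` gives the same result for a finite list. from_iterable differs in that it reads the outer iterable lazily. A sketch, with an infinite generator of blocks as the illustrative input:

```python
from itertools import chain, count, islice

print(list(chain.from_iterable(['ABC', 'DEF'])))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(list(chain('ABC', 'DEF')))                  # same result

# An infinite generator of lists: [1], [2, 2], [3, 3, 3], ...
# chain(*blocks) would never return here; from_iterable pulls one block at a time
blocks = ([i] * i for i in count(1))
print(list(islice(chain.from_iterable(blocks), 6)))  # [1, 2, 2, 3, 3, 3]
```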

subtract

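subtract is implemented much like update. The difference is that counts for matching items are subtracted rather than added. There is also no fast path for an empty counter, and counts may drop to zero or below.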

def subtract(*args, **kwds):
    '''Like dict.update() but subtracts counts instead of replacing them.
    Counts can be reduced below zero. Both the inputs and outputs are
    allowed to contain zero and negative counts.

    Source can be an iterable, a dictionary, or another Counter instance.

    >>> c = Counter('which')
    >>> c.subtract('witch')             # subtract elements from another iterable
    >>> c.subtract(Counter('watch'))    # subtract elements from another counter
    >>> c['h']                          # 2 in which, minus 1 in witch, minus 1 in watch
    0
    >>> c['w']                          # 1 in which, minus 1 in witch, minus 1 in watch
    -1

    '''
    if not args:
        raise TypeError("descriptor 'subtract' of 'Counter' object "
                        "needs an argument")
    self = args[0]
    args = args[1:]
    if len(args) > 1:
        raise TypeError('expected at most 1 arguments, got %d' % len(args))
    iterable = args[0] if args else None
    if iterable is not None:
        self_get = self.get
        if isinstance(iterable, Mapping):
            for elem, count in iterable.items():
                self[elem] = self_get(elem, 0) - count
        else:
            for elem in iterable:
                self[elem] = self_get(elem, 0) - 1
    if kwds:
        self.subtract(kwds)
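The docstring's example can be replayed in Python 3 and contrasted with the binary `-` operator covered in the next section. subtract keeps zero and negative counts, while `-` drops every result that is not positive:

```python
from collections import Counter

c = Counter('which')
c.subtract('witch')           # from a plain iterable
c.subtract(Counter('watch'))  # from another counter
print(c['h'], c['w'])         # 0 -1  (the negative count is kept)

# The '-' operator discards non-positive results instead
d = Counter('which') - Counter('witch') - Counter('watch')
print(d['h'], d['w'], 'w' in d)  # 0 0 False  (missing keys read as 0)
```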


+, -, &, |

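Counter defines `__add__`, `__sub__`, `__or__` and `__and__`, which overload the `+`, `-`, `|` and `&` operators. This gives set-like (multiset) operations between counters. The code is straightforward. Note that any result whose count is zero or negative is dropped. In `__sub__`, an element that appears only in other with a negative count ends up with a positive count, because subtracting a negative number adds.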

def __add__(self, other):
    '''Add counts from two counters.

    >>> Counter('abbb') + Counter('bcc')
    Counter({'b': 4, 'c': 2, 'a': 1})

    '''
    if not isinstance(other, Counter):
        return NotImplemented
    result = Counter()
    for elem, count in self.items():
        newcount = count + other[elem]
        if newcount > 0:
            result[elem] = newcount
    for elem, count in other.items():
        if elem not in self and count > 0:
            result[elem] = count
    return result

def __sub__(self, other):
    ''' Subtract count, but keep only results with positive counts.

    >>> Counter('abbbc') - Counter('bccd')
    Counter({'b': 2, 'a': 1})

    '''
    if not isinstance(other, Counter):
        return NotImplemented
    result = Counter()
    for elem, count in self.items():
        newcount = count - other[elem]
        if newcount > 0:
            result[elem] = newcount
    for elem, count in other.items():
        if elem not in self and count < 0:
            result[elem] = 0 - count
    return result

def __or__(self, other):
    '''Union is the maximum of value in either of the input counters.

    >>> Counter('abbb') | Counter('bcc')
    Counter({'b': 3, 'c': 2, 'a': 1})

    '''
    if not isinstance(other, Counter):
        return NotImplemented
    result = Counter()
    for elem, count in self.items():
        other_count = other[elem]
        newcount = other_count if count < other_count else count
        if newcount > 0:
            result[elem] = newcount
    for elem, count in other.items():
        if elem not in self and count > 0:
            result[elem] = count
    return result

def __and__(self, other):
    ''' Intersection is the minimum of corresponding counts.

    >>> Counter('abbb') & Counter('bcc')
    Counter({'b': 1})

    '''
    if not isinstance(other, Counter):
        return NotImplemented
    result = Counter()
    for elem, count in self.items():
        other_count = other[elem]
        newcount = count if count < other_count else other_count
        if newcount > 0:
            result[elem] = newcount
    return result
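The docstring examples can be replayed in Python 3, where the four operators behave the same way. The last two lines exercise the corner cases: non-positive results are dropped, and in `-` an element found only in other with a negative count comes out positive:

```python
from collections import Counter

a, b = Counter('abbb'), Counter('bcc')
print(a + b)   # Counter({'b': 4, 'c': 2, 'a': 1})
print(a | b)   # Counter({'b': 3, 'c': 2, 'a': 1})  (per-key maximum)
print(a & b)   # Counter({'b': 1})                  (per-key minimum)
print(Counter('abbbc') - Counter('bccd'))  # Counter({'b': 2, 'a': 1})

# Non-positive counts never survive a binary operation
print(Counter(a=-2, b=3) + Counter())      # Counter({'b': 3})
# 0 - (-2) == 2 for an element only present in `other`
print(Counter(a=1) - Counter(b=-2))        # Counter({'b': 2, 'a': 1})
```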

Other

# When pickling an object it has no built-in rule for, pickle looks up __reduce__
# to learn how to rebuild it: here, call the class again with a plain dict of counts
def __reduce__(self):
    return self.__class__, (dict(self),)

# Override deletion: only delete the key if the Counter has it, avoiding KeyError
def __delitem__(self, elem):
    'Like dict.__delitem__() but does not raise KeyError for missing values.'
    if elem in self:
        super(Counter, self).__delitem__(elem)

# %s : String (converts any Python object using str()).
# %r : String (converts any Python object using repr()).
def __repr__(self):
    if not self:
        return '%s()' % self.__class__.__name__
    items = ', '.join(map('%r: %r'.__mod__, self.most_common()))
    return '%s({%s})' % (self.__class__.__name__, items)

@classmethod
def fromkeys(cls, iterable, v=None):
    # There is no equivalent method for counters because setting v=1
    # means that no element can have a count greater than one.
    raise NotImplementedError(
        'Counter.fromkeys() is undefined. Use Counter(iterable) instead.')

# Implement __missing__ so that Counter['no_field'] returns 0. dict itself does not
# define __missing__; dict.__getitem__ calls it only on subclasses that provide it,
# and otherwise raises KeyError for a missing key
def __missing__(self, key):
    'The count of elements not in the Counter is zero.'
    # Needed so that self[missing_item] does not raise KeyError
    return 0
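The following Python 3 sketch exercises each of these hooks. Python 3's Counter keeps the same `__missing__`, `__delitem__`, `__reduce__` and `fromkeys` behaviour. Its `__repr__` is written differently but produces the same format:

```python
import pickle
from collections import Counter

c = Counter('abca')
print(c['z'])      # 0 via __missing__; the key is NOT inserted
print('z' in c)    # False

del c['z']         # missing key: no KeyError thanks to __delitem__

# __reduce__ lets pickle rebuild the object as Counter(dict_of_counts)
restored = pickle.loads(pickle.dumps(c))
print(restored == c, type(restored).__name__)  # True Counter

# The '%r: %r'.__mod__ trick used by __repr__ formats one (key, count) pair
print('%r: %r'.__mod__(('a', 2)))  # 'a': 2
print(repr(c))                     # Counter({'a': 2, 'b': 1, 'c': 1})

try:
    Counter.fromkeys('abc')
except NotImplementedError as exc:
    print('NotImplementedError:', exc)
```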

Summary

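Overall, Counter is implemented by subclassing the built-in dict and overriding a handful of methods, and the result is concise with clear logic. Reading its source is also a good way to learn lesser-used standard-library tools such as operator.itemgetter, heapq and itertools. These tools can make your own code more concise and easier to follow.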
