Python Collections Counter

Python Collections Counter

Overview of the Collections Module

The Collections module implements high-performance container datatypes (beyond 
the built-in types list, dict and tuple) and contains many useful data structures
that you can use to store information in memory. 

This article will be about the Counter object.

Counter

A Counter is a container that tracks how many times equivalent values are added. 

It can be used to implement the same algorithms for which other languages commonly
use bag or multiset data structures.

Importing the module

Import collections makes the stuff in collections available as:
collections.something
import collections
Since we are only going to use the Counter, we can simply do this:
from collections import Counter

Initializing

Counter supports three forms of initialization. 

Its constructor can be called with a sequence of items (iterable), a dictionary
containing keys and counts (mapping, or using keyword arguments mapping string
names to counts (keyword args).
import collections

print collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])

print collections.Counter({'a':2, 'b':3, 'c':1})

print collections.Counter(a=2, b=3, c=1)
The results of all three forms of initialization are the same.
$ python collections_counter_init.py

Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})

Create and Update Counters

An empty Counter can be constructed with no arguments and populated via the 
update() method.
import collections

c = collections.Counter()
print 'Initial :', c

c.update('abcdaab')
print 'Sequence:', c

c.update({'a':1, 'd':5})
print 'Dict    :', c
The count values are increased based on the new data, rather than replaced. 

In this example, the count for a goes from 3 to 4.
$ python collections_counter_update.py

Initial : Counter()

Sequence: Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})

Dict    : Counter({'d': 6, 'a': 4, 'b': 2, 'c': 1})

Accessing Counters

Once a Counter is populated, its values can be retrieved using the dictionary API.
import collections

c = collections.Counter('abcdaab')

for letter in 'abcde':
    print '%s : %d' % (letter, c[letter])
Counter does not raise KeyError for unknown items. 

If a value has not been seen in the input (as with e in this example), 
its count is 0.
$ python collections_counter_get_values.py

a : 3
b : 2
c : 1
d : 1
e : 0

Elements

The elements() method returns an iterator over elements repeating each as many
times as its count. 

Elements are returned in arbitrary order.
import collections

c = collections.Counter('extremely')

c['z'] = 0

print c

print list(c.elements())
The order of elements is not guaranteed, and items with counts less than zero are
not included.
$ python collections_counter_elements.py

Counter({'e': 3, 'm': 1, 'l': 1, 'r': 1, 't': 1, 'y': 1, 'x': 1, 'z': 0})
['e', 'e', 'e', 'm', 'l', 'r', 't', 'y', 'x']

Most_Common

Use most_common() to produce a sequence of the n most frequently encountered
input values and their respective counts.
import collections

c = collections.Counter()
with open('/usr/share/dict/words', 'rt') as f:
    for line in f:
        c.update(line.rstrip().lower())

print 'Most common:'
for letter, count in c.most_common(3):
    print '%s: %7d' % (letter, count)
This example counts the letters appearing in all of the words in the system
dictionary to produce a frequency distribution, then prints the three most common
letters. 

Leaving out the argument to most_common() produces a list of all the items,
in order of frequency.
$ python collections_counter_most_common.py

Most common:
e:  234803
i:  200613
a:  198938

Arithmetic

Counter instances support arithmetic and set operations for aggregating results.
import collections

c1 = collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
c2 = collections.Counter('alphabet')

print 'C1:', c1
print 'C2:', c2

print '
Combined counts:'
print c1 + c2

print '
Subtraction:'
print c1 - c2

print '
Intersection (taking positive minimums):'
print c1 & c2

print '
Union (taking maximums):'
print c1 | c2
Each time a new Counter is produced through an operation, any items with zero or
negative counts are discarded. 

The count for a is the same in c1 and c2, so subtraction leaves it at zero.
$ python collections_counter_arithmetic.py

C1: Counter({'b': 3, 'a': 2, 'c': 1})
C2: Counter({'a': 2, 'b': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})

#Combined counts:
Counter({'a': 4, 'b': 4, 'c': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})

#Subtraction:
Counter({'b': 2, 'c': 1})

#Intersection (taking positive minimums):
Counter({'a': 2, 'b': 1})

#Union (taking maximums):
Counter({'b': 3, 'a': 2, 'c': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})

Counting words

Tally occurrences of words in a list. 
cnt = Counter()

for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
    cnt[word] += 1

print cnt

Counter({'blue': 3, 'red': 2, 'green': 1})
The counter takes an iterable and could also be written like this:
mywords = ['red', 'blue', 'red', 'green', 'blue', 'blue']

cnt = Counter(mywords)

print cnt

Counter({'blue': 3, 'red': 2, 'green': 1})

Find the most common words

Find the ten most common words in Hamlet
import re

words = re.findall('w+', open('hamlet.txt').read().lower())

print Counter(words).most_common(10)

[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554),  ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
Sources
Please don't forget to read the links below for more information.
http://www.doughellmann.com/PyMOTW/collections/
http://docs.python.org/2/library/collections.html#collections.Counter

 Counter objects

A counter tool is provided to support convenient and rapid tallies. For example:

>>>
>>> # Tally occurrences of words in a list
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})

>>> # Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554),  ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
class  collections. Counter ( [ iterable-or-mapping ] )

Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages.

Elements are counted from an iterable or initialized from another mapping (or counter):

>>>
>>> c = Counter()                           # a new, empty counter
>>> c = Counter('gallahad')                 # a new counter from an iterable
>>> c = Counter({'red': 4, 'blue': 2})      # a new counter from a mapping
>>> c = Counter(cats=4, dogs=8)             # a new counter from keyword args

Counter objects have a dictionary interface except that they return a zero count for missing items instead of raising a KeyError:

>>>
>>> c = Counter(['eggs', 'ham'])
>>> c['bacon']                              # count of a missing element is zero
0

Setting a count to zero does not remove an element from a counter. Use del to remove it entirely:

>>>
>>> c['sausage'] = 0                        # counter entry with a zero count
>>> del c['sausage']                        # del actually removes the entry

New in version 2.7.

Counter objects support three methods beyond those available for all dictionaries:

elements ( )

Return an iterator over elements repeating each as many times as its count. Elements are returned in arbitrary order. If an element’s count is less than one, elements() will ignore it.

>>>
>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> list(c.elements())
['a', 'a', 'a', 'a', 'b', 'b']
most_common ( [ n ] )

Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None,most_common() returns all elements in the counter. Elements with equal counts are ordered arbitrarily:

>>>
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]
subtract ( [ iterable-or-mapping ] )

Elements are subtracted from an iterable or from another mapping (or counter). Like dict.update() but subtracts counts instead of replacing them. Both inputs and outputs may be zero or negative.

>>>
>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> d = Counter(a=1, b=2, c=3, d=4)
>>> c.subtract(d)
>>> c
Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})

The usual dictionary methods are available for Counter objects except for two which work differently for counters.

fromkeys ( iterable )

This class method is not implemented for Counter objects.

update ( [ iterable-or-mapping ] )

Elements are counted from an iterable or added-in from another mapping (or counter). Like dict.update() but adds counts instead of replacing them. Also, the iterable is expected to be a sequence of elements, not a sequence of (key, value) pairs.

Common patterns for working with Counter objects:

sum(c.values())                 # total of all counts
c.clear()                       # reset all counts
list(c)                         # list unique elements
set(c)                          # convert to a set
dict(c)                         # convert to a regular dictionary
c.items()                       # convert to a list of (elem, cnt) pairs
Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
c.most_common()[:-n-1:-1]       # n least common elements
c += Counter()                  # remove zero and negative counts

Several mathematical operations are provided for combining Counter objects to produce multisets (counters that have counts greater than zero). Addition and subtraction combine counters by adding or subtracting the counts of corresponding elements. Intersection and union return the minimum and maximum of corresponding counts. Each operation can accept inputs with signed counts, but the output will exclude results with counts of zero or less.

>>>
>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d                       # add two counters together:  c[x] + d[x]
Counter({'a': 4, 'b': 3})
>>> c - d                       # subtract (keeping only positive counts)
Counter({'a': 2})
>>> c & d                       # intersection:  min(c[x], d[x])
Counter({'a': 1, 'b': 1})
>>> c | d                       # union:  max(c[x], d[x])
Counter({'a': 3, 'b': 2})

Note

 

Counters were primarily designed to work with positive integers to represent running counts; however, care was taken to not unnecessarily preclude use cases needing other types or negative values. To help with those use cases, this section documents the minimum range and type restrictions.

  • The Counter class itself is a dictionary subclass with no restrictions on its keys and values. The values are intended to be numbers representing counts, but you could store anything in the value field.
  • The most_common() method requires only that the values be orderable.
  • For in-place operations such as c[key] += 1, the value type need only support addition and subtraction. So fractions, floats, and decimals would work and negative values are supported. The same is also true for update() and subtract() which allow negative and zero values for both inputs and outputs.
  • The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison.
  • The elements() method requires integer counts. It ignores zero and negative counts.

 
 
 
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值