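groupby is itself an iterator, and the second element of each tuple it yields is a generator-like sub-iterator for that group. This imposes a design constraint: each group has to be iterated in order, before the groupby object moves on to the next one, as shown below.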
In [1]: from operator import itemgetter
In [2]: from itertools import groupby
In [3]: d1={'name':'zhangsan','age':20,'country':'China'}
...: d2={'name':'wangwu','age':19,'country':'USA'}
...: d3={'name':'lisi','age':22,'country':'JP'}
...: d4={'name':'zhaoliu','age':22,'country':'USA'}
...: d5={'name':'pengqi','age':22,'country':'USA'}
...: d6={'name':'lijiu','age':22,'country':'China'}
...: lst = [d1, d2, d3, d4, d5, d6]
In [4]: lstg = groupby(lst, key=itemgetter('country'))
...: for k, gs in lstg:
...:     print(k)
...:     for g in gs:
...:         print(g)
...:
China
{'name': 'zhangsan', 'age': 20, 'country': 'China'}
USA
{'name': 'wangwu', 'age': 19, 'country': 'USA'}
JP
{'name': 'lisi', 'age': 22, 'country': 'JP'}
USA
{'name': 'zhaoliu', 'age': 22, 'country': 'USA'}
{'name': 'pengqi', 'age': 22, 'country': 'USA'}
China
{'name': 'lijiu', 'age': 22, 'country': 'China'}
Now look at what happens when the groups are not iterated in order:
In [7]: lstg = groupby(lst, key=itemgetter('country'))
...: k1, gs1 = next(lstg)
...: k2, gs2 = next(lstg)
...: list(gs2)
...:
Out[7]: [{'name': 'wangwu', 'age': 19, 'country': 'USA'}]
In [8]: list(gs1)
Out[8]: []
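If the groups do have to be read later or out of order, the usual workaround (the one the itertools documentation also recommends) is to copy each group into a list while it is still the current group. The following is a minimal sketch that uses a shortened version of the data above; the variable names `groups`, `first_names` and `second_names` are only illustrative.

```python
from itertools import groupby
from operator import itemgetter

lst = [
    {'name': 'zhangsan', 'age': 20, 'country': 'China'},
    {'name': 'wangwu', 'age': 19, 'country': 'USA'},
    {'name': 'lisi', 'age': 22, 'country': 'JP'},
    {'name': 'zhaoliu', 'age': 22, 'country': 'USA'},
]

# Copy every group into a list while it is still the current group.
groups = [(k, list(gs)) for k, gs in groupby(lst, key=itemgetter('country'))]

# The lists no longer depend on the groupby object, so any access order works.
(k1, g1), (k2, g2) = groups[0], groups[1]
second_names = [d['name'] for d in g2]  # read the second group first
first_names = [d['name'] for d in g1]   # the first group is still intact
print(k2, second_names)
print(k1, first_names)
```

The cost is that every group is held in memory at once, which gives up the laziness of groupby.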
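In the In [7] and In [8] session, gs2 is iterated first and gs1 turns out to be invalid already (strictly speaking, it became invalid as soon as the second next(lstg) advanced the groupby object). This is a design constraint, and it is enforced by the first line of _grouper: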
while self.id is id and self.currkey == tgtkey:
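The check self.currkey == tgtkey alone is not enough to enforce this. Take my data as an example: lst is not sorted by 'country', so keys such as 'China' and 'USA' each show up twice while iterating the groupby object. Without a further check, a stale sub-iterator for the first 'China' group could resume once the second 'China' group becomes current. That is why the self.id marker is necessary.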
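To make the role of self.id concrete, here is a simplified sketch modelled on the rough pure-Python equivalent of groupby given in the itertools documentation. The class name `SketchGroupBy`, the `check_id` switch and the helper `stale_first_group` are hypothetical additions for this demonstration only; the real groupby has no such option, and CPython implements it in C.

```python
class SketchGroupBy:
    """Simplified groupby; check_id is a hypothetical demo-only switch."""

    def __init__(self, iterable, key=None, check_id=True):
        self.keyfunc = key if key is not None else (lambda x: x)
        self.it = iter(iterable)
        # A unique sentinel so that the first __next__ always fetches an item.
        self.tgtkey = self.currkey = self.currvalue = object()
        self.check_id = check_id
        self.id = None

    def __iter__(self):
        return self

    def __next__(self):
        self.id = object()  # fresh marker: every earlier group becomes stale
        while self.currkey == self.tgtkey:
            self.currvalue = next(self.it)  # StopIteration ends the iteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return self.currkey, self._grouper(self.tgtkey, self.id)

    def _grouper(self, tgtkey, id):  # 'id' shadows the builtin, as in the quoted line
        while (not self.check_id or self.id is id) and self.currkey == tgtkey:
            yield self.currvalue
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                return
            self.currkey = self.keyfunc(self.currvalue)


def stale_first_group(check_id):
    """Advance to the third group, then try to read the first one."""
    g = SketchGroupBy(['China', 'USA', 'China'], check_id=check_id)
    k1, gs1 = next(g)
    k2, gs2 = next(g)
    k3, gs3 = next(g)  # the current key is 'China' again
    return list(gs1)


print(stale_first_group(check_id=True))   # []
print(stale_first_group(check_id=False))  # ['China']
```

With the id check the stale gs1 yields nothing. With the key comparison alone, gs1 sees that the current key is 'China' again and yields an item that belongs to the third group.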