Python Notes for Professionals: Groupby

创帆云

Article link: https://blog.csdn.net/oSuiYing12/article/details/106211758

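In Python, the itertools.groupby() function lets you group the values of an iterable by a given key and then run computations on the items within each group. It only groups adjacent elements that share the same key, so the input usually needs to be sorted by that key first.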
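As a minimal sketch of the basic behavior (the sample string and the name `runs` are chosen here purely for illustration), groupby starts a new group every time the key changes, so equal values that are not adjacent end up in separate groups:

```python
from itertools import groupby

# groupby yields (key, group_iterator) pairs; a new group starts
# whenever the key differs from the previous element's key.
runs = [(key, list(group)) for key, group in groupby("AAABBA")]
print(runs)
# [('A', ['A', 'A', 'A']), ('B', ['B', 'B']), ('A', ['A'])]
```

The trailing 'A' forms its own group because it is separated from the first run of 'A's.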

Grouping a list of tuples

In this example we can see what happens when different types of iterable are used.

# Group a list of tuples, using the first item of each tuple as the key

from itertools import groupby

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "harley"),
          ("vehicle", "speed boat"), ("vehicle", "school bus")]

dic = {}
f = lambda x: x[0]  # key function: returns the item at index 0 of each element

# groupby only groups adjacent elements, so sort by the same key first
for key, group in groupby(sorted(things, key=f), f):
    dic[key] = list(group)

print(dic)

# Output:
{'animal': [('animal', 'bear'), ('animal', 'duck')], 'plant': [('plant', 'cactus')], 'vehicle': [('vehicle', 'harley'), ('vehicle', 'speed boat'), ('vehicle', 'school bus')]}

The next example is essentially the same as the one above. The only difference is that all the tuples have been changed to lists.

from itertools import groupby

things = [["animal", "bear"], ["animal", "duck"], ["vehicle", "harley"], ["plant", "cactus"],
          ["vehicle", "speed boat"], ["vehicle", "school bus"]]

dic = {}
f = lambda x: x[0]  # key function: first item of each inner list
for key, group in groupby(sorted(things, key=f), f):
    dic[key] = list(group)
print(dic)

# Output:
{'animal': [['animal', 'bear'], ['animal', 'duck']], 'plant': [['plant', 'cactus']], 'vehicle': [['vehicle', 'harley'], ['vehicle', 'speed boat'], ['vehicle', 'school bus']]}
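The introduction mentions computing over the items of each group. A minimal sketch of that idea, reusing the `things` data and counting the members of each category (the names `first_item` and `counts` are introduced here for illustration only):

```python
from itertools import groupby

things = [["animal", "bear"], ["animal", "duck"], ["vehicle", "harley"],
          ["plant", "cactus"], ["vehicle", "speed boat"], ["vehicle", "school bus"]]

def first_item(x):
    """Key function: the category stored at index 0."""
    return x[0]

counts = {}
for key, group in groupby(sorted(things, key=first_item), first_item):
    # sum(1 for ...) consumes the group iterator without building a list
    counts[key] = sum(1 for _ in group)

print(counts)
# {'animal': 2, 'plant': 1, 'vehicle': 3}
```

Any other reduction (sum, max, joining strings) could take the place of the count, as long as each group is consumed before the loop advances to the next one.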

Grouping with the default key

This example shows how groupby behaves when no key is specified:

from itertools import groupby

a = ['goat', 'dog', 'cow', 1, 1, 2, 3, 11, 10, ('persons', 'man', 'woman')]

c = groupby(a)  # no key given: each element's own value is the key
dic = {}
for k, v in c:
    dic[k] = list(v)

print(dic)

# Output:
{'goat': ['goat'],
 'dog': ['dog'],
 'cow': ['cow'],
 1: [1, 1],   # each value is its own default key; the two adjacent 1s fall into one group
 2: [2],
 3: [3],
 11: [11],
 10: [10],
 ('persons', 'man', 'woman'): [('persons', 'man', 'woman')]}

Note that the tuple as a whole is treated as a single key.
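Because the default key is the element itself, calling groupby without a key collapses runs of identical adjacent values. A small sketch of that use (the list `readings` and the name `run_lengths` are illustrative and not part of the original example):

```python
from itertools import groupby

readings = [1, 1, 2, 3, 3, 3, 1]

# Each group is a run of equal adjacent values; record (value, run_length).
run_lengths = [(value, len(list(run))) for value, run in groupby(readings)]
print(run_lengths)
# [(1, 2), (2, 1), (3, 3), (1, 1)]
```

The final 1 is reported as a separate run because it is not adjacent to the first two.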

Grouping and sorting

# Group by first item (the first letter, for strings) without sorting first, and see what happens

from itertools import groupby

list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']
c = groupby(list_things, key=lambda x: x[0])
dic = {}
for k, v in c:
    dic[k] = list(v)

print(dic)

# Output:
{'g': ['goat'],
 'd': ['dog', 'donkey'],
 'm': ['mongoose', 'malloo'],
 'c': ['camel'],
 'persons': [('persons', 'man', 'woman')],
 'w': ['wombat']}

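Note that in the example above, the elements 'mulato', 'cow' and 'cat' do not appear in the result. groupby only groups adjacent elements, and because the list is unsorted, the keys 'm' and 'c' each occur in two separate runs. The result dictionary is keyed by first letter, so the later assignment to dic['m'] (and to dic['c']) overwrites the group stored earlier, and the dictionary shows only the most recent one. Printing the intermediate steps makes this easier to follow.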
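To follow the suggestion of printing the intermediate steps, one possible sketch (the name `steps` is introduced here for illustration) lists every group in the order groupby yields it. The keys 'm' and 'c' each show up twice, and in the dictionary the later group replaces the earlier one:

```python
from itertools import groupby

list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat',
               ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']

# Collect every (key, members) pair in the order groupby produces them.
steps = [(k, list(v)) for k, v in groupby(list_things, key=lambda x: x[0])]
for k, members in steps:
    print(k, members)
# g ['goat']
# d ['dog', 'donkey']
# m ['mulato']
# c ['cow', 'cat']
# persons [('persons', 'man', 'woman')]
# w ['wombat']
# m ['mongoose', 'malloo']
# c ['camel']
```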

Sorted version

from itertools import groupby

list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'), 'wombat', 'mongoose', 'malloo', 'camel']

# Sort by the same key that groupby will use, so equal keys become adjacent
sorted_list = sorted(list_things, key=lambda x: x[0])
print(sorted_list)

c = groupby(sorted_list, key=lambda x: x[0])
dic = {}
for k, v in c:
    dic[k] = list(v)

print(dic)

# First, the sorted list:
# sorted_list = ['cow', 'cat', 'camel', 'dog', 'donkey', 'goat', 'mulato', 'mongoose', 'malloo', ('persons', 'man', 'woman'), 'wombat']

# Then the final result, dic:
{'c': ['cow', 'cat', 'camel'], 'd': ['dog', 'donkey'], 'g': ['goat'], 'm': ['mulato', 'mongoose', 'malloo'], 'persons': [('persons', 'man', 'woman')], 'w': ['wombat']}

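As you can see, because the list was sorted beforehand, all elements starting with m are adjacent and groupby yields them as a single group in one iteration. No dictionary entry is overwritten by a later one, and the output is complete.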
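If sorting the input first is not desirable, one alternative is to merge repeated keys instead of overwriting them. The following is only a sketch of that idea using collections.defaultdict, which the original example does not use; the name `merged` is introduced here for illustration:

```python
from collections import defaultdict
from itertools import groupby

list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat',
               ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']

merged = defaultdict(list)
for k, v in groupby(list_things, key=lambda x: x[0]):
    # extend instead of assign, so a key seen again in a later run
    # keeps the items collected from its earlier run
    merged[k].extend(v)

print(dict(merged))
```

With this approach 'mulato', 'cow' and 'cat' are kept, and the keys appear in the order in which they were first encountered rather than in sorted order.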

That's all for today. Reproduction is prohibited; violations will be pursued.