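Let's start with a problem.
Given an array of characters, compress it. The length of the compressed result must always be less than or equal to the length of the original array.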
Example 1:
Input:
["a","a","b","b","c","c","c"]
Output:
["a","2","b","2","c","3"]
Explanation: "aa" is replaced by "a2", "bb" by "b2", and "ccc" by "c3".
Alternatively, output a list of tuples describing each run:
Input:
["a","a","b","b","c","c","c"]
Output:
[("a","2"),("b","2"),("c","3")]
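This is where today's protagonist comes in: the groupby function. Note that it comes from the standard-library module itertools, and it works quite differently from the groupby aggregation method in pandas.
Let's first look at its "source". CPython implements itertools in C, so what an IDE such as PyCharm shows is a generated stub that carries the docstring: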
class groupby(object):
    """
    groupby(iterable, key=None) -> make an iterator that returns consecutive
    keys and groups from the iterable. If the key function is not specified or
    is None, the element itself is used for grouping.
    """
    def __getattribute__(self, *args, **kwargs): # real signature unknown
        """ Return getattr(self, name). """
        pass

    def __init__(self, iterable, key=None): # known case of itertools.groupby.__init__
        """ Initialize self. See help(type(self)) for accurate signature. """
        pass

    def __iter__(self, *args, **kwargs): # real signature unknown
        """ Implement iter(self). """
        pass
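The docstring gives the signature groupby(iterable, key=None). The first argument is an iterable; the second, key, is an optional function that computes a key for each element, and consecutive elements with the same key are grouped together. If key is omitted or None, the element itself is used as the key.
That may sound abstract, so let's look at an example.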
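To make "consecutive keys and groups" concrete, here is a simplified pure-Python sketch of the behaviour. It is not the real implementation: in CPython groupby is written in C and hands out lazy group iterators, whereas this sketch builds each group as a list. The name simple_groupby is hypothetical.

```python
def simple_groupby(iterable, key=None):
    """Simplified sketch of itertools.groupby: yield (key, list_of_items)
    for each run of consecutive items whose keys are equal."""
    keyfunc = key if key is not None else (lambda x: x)  # default: the item itself
    group = []
    current_key = None
    for item in iterable:
        k = keyfunc(item)
        # A different key ends the current run, so emit it and start a new one
        if group and k != current_key:
            yield current_key, group
            group = []
        current_key = k
        group.append(item)
    if group:  # emit the last run, if the input was not empty
        yield current_key, group

print(list(simple_groupby([1, 2, 2, 3, 2, 7, 5])))
# [(1, [1]), (2, [2, 2]), (3, [3]), (2, [2]), (7, [7]), (5, [5])]
```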
from itertools import groupby
lst = [1, 2, 2, 3, 2, 7, 5]
print(lst)
for k, v in groupby(lst):
    print(k, list(v))
Output:
[1, 2, 2, 3, 2, 7, 5]
1 [1]
2 [2, 2]
3 [3]
2 [2]
7 [7]
5 [5]
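Let me walk through this example. After importing groupby, we call it without a key. The iterable is consumed in order, and each run of consecutive equal elements becomes one group, so every group contains at least one element.
In each pair, the first value k is the group's key, and the second value v holds the elements of that group.
Two things to note:
1. The group v is not a list but an iterator that shares the underlying iterable with groupby. To use its contents, convert it (for example with list(v)) before moving on; once groupby advances to the next group, the previous group is no longer available.
2. The groups follow the order of the original iterable, and only consecutive equal elements are grouped together. That is why the 2 in the fifth position does not join the earlier group [2, 2].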
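The following sketch illustrates note 1: it keeps a group iterator around, advances groupby, and then tries to read the old group.

```python
from itertools import groupby

lst = [1, 2, 2, 3, 2, 7, 5]
groups = groupby(lst)

k1, g1 = next(groups)  # first group, key 1
k2, g2 = next(groups)  # advancing groupby invalidates g1

first = list(g1)   # g1 can no longer produce its elements
second = list(g2)  # g2 is the current group, so it still works
print(k1, first)   # 1 []
print(k2, second)  # 2 [2, 2]
```

This is why the loops in this article call list(v) inside the loop body rather than collecting the raw groups first.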
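That is why you usually sort the data before grouping. Note that key does not specify a sort order: it is a function called on each element to compute its grouping key. A call like groupby(lst, key=lst.sort()) only appears to work: list.sort() sorts lst in place and returns None, so key is None, and it is the in-place sort that brings the 2s together. It is clearer to pass a sorted copy: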
from itertools import groupby
lst = [1, 2, 2, 3, 2, 7, 5]
print(lst)
# sorted() returns a new sorted list, so equal elements become adjacent
for k, v in groupby(sorted(lst)):
    print(k, list(v))
Output:
[1, 2, 2, 3, 2, 7, 5]
1 [1]
2 [2, 2, 2]
3 [3]
5 [5]
7 [7]
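To show what key actually does, here is a small sketch that groups the same list by parity. The helper is_even is a hypothetical name chosen for illustration. Without sorting, key still only merges consecutive elements; sorting with the same key function first puts all equal keys next to each other.

```python
from itertools import groupby

def is_even(x):
    """Grouping key: True for even numbers, False for odd ones."""
    return x % 2 == 0

lst = [1, 2, 2, 3, 2, 7, 5]

# Without sorting, key still only groups *consecutive* elements
unsorted = [(k, list(v)) for k, v in groupby(lst, key=is_even)]
print(unsorted)
# [(False, [1]), (True, [2, 2]), (False, [3]), (True, [2]), (False, [7, 5])]

# Sort by the same key first so equal keys become adjacent;
# sorted() is stable, so each group keeps the original order
by_parity = {k: list(v) for k, v in groupby(sorted(lst, key=is_even), key=is_even)}
print(by_parity)  # {False: [1, 3, 7, 5], True: [2, 2, 2]}
```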
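Someone might ask: how is this different from sorting the list first and then grouping it?
For one-dimensional data like this there is no difference, because sorting is exactly what does the work. The key function becomes convenient with multi-dimensional data, such as tuples or records, where you want to group by one field.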
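A minimal sketch of grouping multi-dimensional data, using made-up (class, student, score) records for illustration. operator.itemgetter(0) serves as the key function, and the records are sorted by that same key before grouping.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records: (class_name, student, score)
records = [
    ("class2", "Tom", 88),
    ("class1", "Amy", 95),
    ("class2", "Bob", 72),
    ("class1", "Li", 80),
]

by_class = itemgetter(0)  # key function: pick the first field of each tuple
summary = {}
for cls, rows in groupby(sorted(records, key=by_class), key=by_class):
    rows = list(rows)  # consume the group before groupby moves on
    names = [name for _, name, _ in rows]
    average = sum(score for _, _, score in rows) / len(rows)
    summary[cls] = (names, average)

print(summary)
# {'class1': (['Amy', 'Li'], 87.5), 'class2': (['Tom', 'Bob'], 80.0)}
```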
With that, the problem from the beginning is easy to solve:
from itertools import groupby
chars = ["a", "a", "b", "b", "c", "c", "c"]
# One pass: k is the character, len(list(v)) is the length of its run
list_a = [(k, str(len(list(v)))) for k, v in groupby(chars)]
# Flatten the (character, count) pairs into a single list
list_b = []
for ch, cnt in list_a:
    list_b.append(ch)
    list_b.append(cnt)
print(list_a)
print(list_b)
Output:
[('a', '2'), ('b', '2'), ('c', '3')]
['a', '2', 'b', '2', 'c', '3']
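The code above writes a count for every run, so a lone character would still be followed by a count of 1, which can make the result longer than the input and break the length requirement in the problem statement. Here is a minimal sketch that avoids this, assuming the convention of the common LeetCode 443 formulation: runs of length 1 get no count, and multi-digit counts are split into one string per digit. The function name compress is hypothetical, and unlike that formulation this sketch returns a new list instead of modifying the input in place.

```python
from itertools import groupby

def compress(chars):
    """Sketch: run-length encode a list of single characters.

    Assumptions (not stated in the original problem text):
    - a run of length 1 is written without a count;
    - a multi-digit count is split into one string per digit.
    """
    result = []
    for ch, run in groupby(chars):
        n = sum(1 for _ in run)  # length of this run, without building a list
        result.append(ch)
        if n > 1:
            result.extend(str(n))  # e.g. 12 -> "1", "2"
    return result

print(compress(["a", "a", "b", "b", "c", "c", "c"]))  # ['a', '2', 'b', '2', 'c', '3']
print(compress(["a", "b", "c"]))                      # ['a', 'b', 'c']
print(compress(["a"] + ["b"] * 12))                   # ['a', 'b', '1', '2']
```

With these rules a run of length n > 1 takes 1 + (number of digits of n) entries, which is never more than n, so the result is never longer than the input.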
There are plenty of other ways to use groupby, so feel free to experiment on your own.