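I have been working on a project recently that needs a group-and-sum feature: given a two-dimensional list [['a', 'b', 1], ['c', 'b', 2], ['c', 'b', 3], ['a', 'c', 4], ['a', 'c', 5]], sum the third column grouped by the second column to get [['b', 6], ['c', 9]]. My first thought was to use pandas.DataFrame for the grouping. The code is as follows: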
from pandas import DataFrame


def add_repeat(a):
    # Pull out the grouping column (index 1) and the value column (index 2).
    data = []
    num = []
    for i in a:
        data.append(i[1])
        num.append(i[2])
    df = DataFrame({'name': data, 'num': num})
    # Sum 'num' within each 'name' group; groupby sorts the group keys by default.
    df1 = df.groupby('name')['num'].sum()
    df2 = dict(df1)
    key = list(df2.keys())
    value = list(df2.values())
    # Rebuild the result as a list of [key, total] pairs.
    d = []
    for i in range(0, len(value)):
        d.append([key[i], value[i]])
    return d


number_list = [['a', 'b', 1], ['c', 'b', 2], ['c', 'b', 3], ['a', 'c', 4], ['a', 'c', 5]]
numbers = add_repeat(number_list)
print(numbers)
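However, I found that once the list gets larger, the order of the result is shuffled and I do not get what I want (groupby sorts the group keys by default, so the order of first appearance is lost). So I wrote my own code. The approach may be rather simple, but it keeps the original order. The code is as follows: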
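def add_repeat2(a):
    # Pull out the grouping column (index 1) and the value column (index 2).
    data = []
    num = []
    for i in a:
        data.append(i[1])
        num.append(i[2])
    # Collect the distinct keys in the order they first appear.
    d = []
    for j in data:
        if j not in d:
            d.append(j)
    # For each key, record the row indices where it occurs.
    serials = []
    for k in d:
        serial = []
        for j in range(0, len(data)):
            if data[j] == k:
                serial.append(j)
        serials.append(serial)
    # Sum the values at those indices, one total per key.
    numbers = []
    for serial in serials:
        number = 0
        for i in serial:
            number += num[i]
        numbers.append(number)
    # Pair each key with its total.
    number_list = []
    for j in range(0, len(d)):
        number_list.append([d[j], numbers[j]])
    return number_list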
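
The same order-preserving sum can probably also be done in a single pass over the rows. The following is only a sketch, not a replacement for the code above: the name add_repeat3 is made up for illustration, and it assumes Python 3.7 or later, where a plain dict keeps keys in insertion order.

```python
def add_repeat3(rows):
    # Hypothetical helper (not from the original post).
    # Assumes Python 3.7+, where dict preserves insertion order.
    totals = {}
    for row in rows:
        key, value = row[1], row[2]
        # Start each new key at 0, then accumulate its values.
        totals[key] = totals.get(key, 0) + value
    # Keys come back in the order they were first seen.
    return [[key, total] for key, total in totals.items()]


number_list = [['a', 'b', 1], ['c', 'b', 2], ['c', 'b', 3], ['a', 'c', 4], ['a', 'c', 5]]
print(add_repeat3(number_list))  # [['b', 6], ['c', 9]]
```

Compared with add_repeat2, this avoids rescanning the whole list once per key, which should matter more as the list grows.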
If anyone knows a better way to do this, I would be glad to hear it.
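
One possible answer on the pandas side: groupby accepts a sort parameter, and passing sort=False keeps the groups in the order they first appear. A minimal sketch, assuming pandas is installed; the name add_repeat_unsorted is made up for illustration.

```python
from pandas import DataFrame


def add_repeat_unsorted(rows):
    # Hypothetical variant of add_repeat (not from the original post).
    df = DataFrame({'name': [r[1] for r in rows],
                    'num': [r[2] for r in rows]})
    # sort=False keeps group keys in first-appearance order
    # instead of sorting them.
    totals = df.groupby('name', sort=False)['num'].sum()
    # tolist() converts the NumPy scalars to plain Python numbers.
    return [[name, total] for name, total in zip(totals.index, totals.tolist())]


rows = [['x', 'c', 4], ['y', 'b', 1], ['z', 'c', 5]]
print(add_repeat_unsorted(rows))  # [['c', 9], ['b', 1]]
```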