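As usual with numpy, the solution relies on the magic of unique(), with no loops or list comprehensions:
from numpy import array, argsort, unique, split
records_array = array([1, 2, 3, 1, 1, 3, 4, 3, 2])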
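# indices that sort records_array, so that equal values end up adjacent
idx_sort = argsort(records_array)
sorted_records_array = records_array[idx_sort]
# distinct values, the position where each one starts, and how often it occurs
vals, idx_start, count = unique(sorted_records_array, return_counts=True,
                                return_index=True)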
# sets of indices
res = split(idx_sort, idx_start[1:])
# filter them by size, keeping only the items that occur more than once
vals = vals[count > 1]
# on Python 3, filter() is lazy, so wrap it in list() to get the groups
res = list(filter(lambda x: x.size > 1, res))
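The steps above can be wrapped into a small helper. This is only a sketch under a few assumptions: the function name `duplicate_index_groups` is made up for illustration, `kind="stable"` is added so that the indices inside each group come out in ascending order (the default sort does not guarantee that), and the final filtering uses a plain list comprehension for readability rather than staying loop-free.

```python
import numpy as np

def duplicate_index_groups(records):
    """Return (values, index_groups) for the values occurring more than once.

    Hypothetical helper name; the logic follows the argsort/unique/split steps.
    """
    records = np.asarray(records)
    # A stable sort keeps original indices in ascending order inside each group.
    idx_sort = np.argsort(records, kind="stable")
    sorted_records = records[idx_sort]
    vals, idx_start, count = np.unique(
        sorted_records, return_index=True, return_counts=True
    )
    # Cut the sorted index array at the first position of each distinct value.
    groups = np.split(idx_sort, idx_start[1:])
    repeated = count > 1
    return vals[repeated], [g for g, keep in zip(groups, repeated) if keep]

vals, groups = duplicate_index_groups([1, 2, 3, 1, 1, 3, 4, 3, 2])
print(vals)
print(groups)
```

Returning the values together with the groups keeps the two aligned: `groups[i]` holds the positions of `vals[i]` in the input.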
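Edit: below is my previous answer, which needs more memory and uses numpy broadcasting and two calls to unique:
from numpy import array, unique, where, newaxis, split
records_array = array([1, 2, 3, 1, 1, 3, 4, 3, 2])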
vals, inverse, count = unique(records_array, return_inverse=True,
                              return_counts=True)
idx_vals_repeated = where(count > 1)[0]
vals_repeated = vals[idx_vals_repeated]
rows, cols = where(inverse == idx_vals_repeated[:, newaxis])
_, inverse_rows = unique(rows, return_index=True)
res = split(cols, inverse_rows[1:])
As expected, res = [array([0, 3, 4]), array([1, 8]), array([2, 5, 7])]
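To illustrate why the broadcasting version needs more memory, here is a minimal sketch that only builds the intermediate comparison and inspects its shape. The name `mask` is introduced here for illustration; everything else follows the broadcasting snippet.

```python
import numpy as np

records_array = np.array([1, 2, 3, 1, 1, 3, 4, 3, 2])
vals, inverse, count = np.unique(records_array, return_inverse=True,
                                 return_counts=True)
idx_vals_repeated = np.where(count > 1)[0]

# Broadcasting compares every repeated value against every element at once,
# producing one boolean row per repeated value.
mask = inverse == idx_vals_repeated[:, np.newaxis]
print(mask.shape)  # (3, 9): 3 repeated values x 9 records
```

The temporary array grows with (number of repeated values) x (number of records), whereas the argsort-based version only allocates arrays of the same length as the input.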