Filtering out duplicates in a column with Python: finding duplicates in a Python list of tuples

I want to find the matching items in the list given below. My list may be very large.

The first item of the tuple, "N1_10", is duplicated: it also appears in a tuple in another sublist.

Tuple in the 1st sublist of ListA: ('N1_10', 'N2_28')

Tuple in the 2nd sublist of ListA: ('N1_10', 'N3_98')

ListA = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],
         [('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
         [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]

What I want as the output is:

output --> [('N1_10', 'N2_28', 'N3_98'), .... and the rest; whatever matches one of the keys goes into the same tuple]

If you think changing the data structure of ListA is a better option, please feel free to advise!

Thanks for helping out!

SIMPLIFIED VERSION

ListA = [[(a,x),(b,k),(c,l),(d,m)],[(e,d),(a,p),(g,s)],[...],[...]....]

wantedOutput --> [(a,x,p),(b,k),(c,l),(d,m,e),(g,s).....]

Solution

Update: After rereading your question, it appears that you're trying to create equivalence classes, rather than collecting values for keys. If [[(1, 2), (3, 4), (2, 3)]] should become [(1, 2, 3, 4)], then you're going to need to interpret your input as a graph and apply a connected components algorithm. You could turn your data structure into an adjacency list representation and traverse it with a breadth-first or depth-first search, or iterate over your list and build disjoint sets. In either case, your code is going to involve a lot of graph-related complexity, and it'll be hard to provide any output ordering guarantees based on the order of the input. Here's an algorithm based on a breadth-first search:

import collections

# build an adjacency list representation of your input
graph = collections.defaultdict(set)
for l in ListA:
    for first, second in l:
        graph[first].add(second)
        graph[second].add(first)

# breadth-first search the graph to produce the output
output = []
marked = set()  # a set of all nodes whose connected component is known
for node in graph:
    if node not in marked:
        # this node is not in any previously seen connected component
        # run a breadth-first search to determine its connected component
        frontier = set([node])
        connected_component = []
        while frontier:
            marked |= frontier
            connected_component.extend(frontier)
            # find all unmarked nodes directly connected to frontier nodes
            # they will form the new frontier
            new_frontier = set()
            for frontier_node in frontier:
                new_frontier |= graph[frontier_node] - marked
            frontier = new_frontier
        output.append(tuple(connected_component))

Don't just copy this without understanding it, though; understand what it's doing, or write your own implementation. You'll probably need to be able to maintain this. (I would've used pseudocode, but Python is practically as simple as pseudocode already.)
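
The disjoint-set alternative mentioned above can be sketched as follows. This is a minimal illustration rather than part of the original answer: merge_pairs, find, and union are hypothetical names chosen here, and the sketch assumes every item is hashable.

```python
def merge_pairs(groups):
    """Group items linked by any pair, using a disjoint-set (union-find) forest."""
    parent = {}  # maps each item to its parent; a root is its own parent

    def find(item):
        # Register unseen items as their own root.
        parent.setdefault(item, item)
        # Walk up to the root of this item's tree.
        root = item
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path directly at the root.
        while parent[item] != root:
            parent[item], item = root, parent[item]
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Every pair states that its two items belong to the same class.
    for group in groups:
        for first, second in group:
            union(first, second)

    # Collect the members of each class under their shared root.
    components = {}
    for item in parent:
        components.setdefault(find(item), []).append(item)
    return [tuple(members) for members in components.values()]


ListA = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],
         [('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
         [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]

output = merge_pairs(ListA)
print(output)
```

For the ListA from the question, 'N1_10', 'N2_28' and 'N3_98' end up in the same tuple. The order of tuples and of items inside each tuple follows dict insertion order, so it should not be treated as a guarantee about the input order.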

In case my original interpretation of your question was correct, and your input is a collection of key-value pairs that you want to aggregate, here's my original answer:

Original answer

import collections

clusterer = collections.defaultdict(list)
for l in ListA:
    for k, v in l:
        clusterer[k].append(v)

output = clusterer.values()

defaultdict(list) is a dict that automatically creates a list as the value for any key that wasn't already present. The loop goes over all the tuples, collecting all values that share the same key. The last line then takes the collected value lists from the defaultdict; the keys themselves are dropped, so use clusterer.items() instead if you want (key, value_list) pairs.

(The output of this code is not quite in the form you specified, but I believe this form is more useful. If you want to change the form, that should be a simple matter.)
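
As one possible way of changing the form, the sketch below flattens each key and its collected values into a single tuple, which is the shape asked for in the question. It is an illustration under the key-value interpretation only: it groups by the first element of each pair, so it would not merge (d, m) with (e, d) from the simplified version. The loop variable names are chosen here for readability.

```python
import collections

ListA = [[('N1_10', 'N2_28'), ('N1_35', 'N2_44')],
         [('N1_22', 'N3_72'), ('N1_10', 'N3_98')],
         [('N2_33', 'N3_28'), ('N2_55', 'N3_62'), ('N2_61', 'N3_37')]]

# Collect every value that shares the same first element (the key).
clusterer = collections.defaultdict(list)
for group in ListA:
    for key, value in group:
        clusterer[key].append(value)

# Flatten each key and its values into one tuple: (key, value1, value2, ...).
output = [(key, *values) for key, values in clusterer.items()]
print(output[0])  # ('N1_10', 'N2_28', 'N3_98')
```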
