Tree substructures in Python: extracting parent and child nodes from a Python tree representation

[Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]), Tree('.', ['.'])])]), Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), Tree('.', ['?'])])])]

I have many of these strings available in Python, which are actually tree representations. I want to extract the parent and child node for every word, e.g. for 'Hello' I want (INTJ, UH), and for 'My' it is (NP, PRP$).

This is the outcome I want:

(INTJ, UH), (NP, PRP$), (NP, NN), (VP, VBZ), (ADJP, JJ), (WHNP, WP), (SQ, VBZ), (NP, PRP$), (NP, NN)

How can I do that?

Solution

Your string is clearly the representation (repr) of a list of Tree objects. It would be much better to work with that list directly, if you have access to it or can reconstruct it some other way. If not, the most straightforward way to turn the string into a data structure you can work with is eval(), with the usual caveat that eval() executes arbitrary code, so it must never be called on untrusted or user-supplied data.

Since you don't say anything about your Tree class, I'll write a simple one that suffices for the purposes of this question:

```python
class Tree:
    def __init__(self, name, branches):
        self.name = name
        self.branches = branches
```
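The string in the question has exactly the shape that repr() would produce for such a class if it had a matching `__repr__`. A minimal sketch to illustrate this (the `__repr__` method is an addition for illustration; the question does not show the real class) that round-trips a fragment of the data through eval() and repr():

```python
class Tree:
    def __init__(self, name, branches):
        self.name = name
        self.branches = branches

    def __repr__(self):
        # repr() of the name and of the branch list gives e.g. Tree('UH', ['Hello'])
        return f"Tree({self.name!r}, {self.branches!r})"


original = "[Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])])]"
rebuilt = eval(original)  # only safe because the string is trusted here
print(repr(rebuilt) == original)  # True
```

If the real class prints itself the same way, this is why eval() can rebuild the objects from the string.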

Now we can recreate your data structure:

```python
data = eval("""[Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]), Tree('.', ['.'])])]), Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), Tree('.', ['?'])])])]""")
```
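Because eval() runs any expression it is given, a safer option for strings from an untrusted source is to parse them with the standard-library `ast` module and accept only `Tree(...)` calls, list literals and string literals. A minimal sketch, assuming Python 3.8+ (for `ast.Constant`) and the same simple Tree class; `parse_trees` and `_build` are hypothetical helper names:

```python
import ast


class Tree:
    """Simple stand-in for the questioner's Tree class (an assumption)."""

    def __init__(self, name, branches):
        self.name = name
        self.branches = branches


def _build(node):
    """Rebuild a value from an AST node, accepting only Tree(...) calls,
    list literals and string literals; anything else is rejected."""
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "Tree"):
        if len(node.args) != 2 or node.keywords:
            raise ValueError("Tree() expects exactly two positional arguments")
        return Tree(_build(node.args[0]), _build(node.args[1]))
    if isinstance(node, ast.List):
        return [_build(element) for element in node.elts]
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def parse_trees(text):
    """Parse the repr-style string into Tree objects without eval()."""
    # mode="eval" parses a single expression; .body is that expression's node
    return _build(ast.parse(text, mode="eval").body)


trees = parse_trees("[Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])])]")
print(trees[0].name)                        # NP
print([b.name for b in trees[0].branches])  # ['PRP$', 'NN']
```

Anything outside that whitelist, such as a call to a function other than `Tree`, raises ValueError instead of being executed.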

Once we have that, we can write a function that produces the list of 2-tuples you want:

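```python
def tails(items, path=()):
    """Yield (parent, tag) for every word found under the given items."""
    for item in items:
        if isinstance(item, Tree):
            if item.name in {".", ","}:  # ignore punctuation
                continue
            # Descend, remembering the names of the Trees above this point
            yield from tails(item.branches, path + (item.name,))
        else:
            # A plain string is a word: the last two names on the path
            # are its phrase label and its part-of-speech tag
            yield path[-2:]
```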

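This function descends recursively into the tree, skipping punctuation subtrees. Each time it reaches a word (a plain string leaf), it yields the last two Tree names on the path to that word: the phrase label and the part-of-speech tag.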
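The recursive version discards the word itself, and on extremely deep trees it could in principle hit Python's recursion limit. A hedged variant that walks the tree with an explicit stack and keeps each word next to its (parent, tag) pair; the names `tagged_words` and `PUNCTUATION` are hypothetical, and the Tree class is the same minimal one assumed above:

```python
class Tree:
    """Same minimal Tree class as above (an assumption about the real one)."""

    def __init__(self, name, branches):
        self.name = name
        self.branches = branches


PUNCTUATION = {".", ","}


def tagged_words(items):
    """Yield (word, (parent, tag)) for every word, depth-first, left to right.

    Uses an explicit stack instead of recursion. `items` must be a list.
    """
    # Each stack entry is (item, tuple of Tree names above that item).
    # Items are pushed in reverse so they are popped in left-to-right order.
    stack = [(item, ()) for item in reversed(items)]
    while stack:
        item, path = stack.pop()
        if isinstance(item, Tree):
            if item.name in PUNCTUATION:  # skip punctuation subtrees
                continue
            child_path = path + (item.name,)
            stack.extend((child, child_path) for child in reversed(item.branches))
        else:
            # A plain string is a word; its last two ancestors are (parent, tag)
            yield item, path[-2:]


sentence = [Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])])]
print(list(tagged_words(sentence)))
# [('My', ('NP', 'PRP$')), ('name', ('NP', 'NN'))]
```

Dropping the word from each result (`[pair for _, pair in tagged_words(data)]`) gives the same list of pairs as `tails`.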

Example use:

```python
>>> list(tails(data))
[('INTJ', 'UH'), ('NP', 'PRP$'), ('NP', 'NN'), ('VP', 'VBZ'), ('ADJP', 'JJ'), ('WHNP', 'WP'), ('SQ', 'VBZ'), ('NP', 'PRP$'), ('NP', 'NN')]
```
