In Python, it is sometimes necessary to convert a nested parenthesized tree into a nested list. For example, given the following tree:
( Satellite (span 69 74) (rel2par Elaboration)
( Nucleus (span 69 72) (rel2par span)
( Nucleus (span 69 70) (rel2par span)
( Nucleus (leaf 69) (rel2par span) (text _!MERRILL LYNCH READY ASSETS TRUST :_!) )
( Satellite (leaf 70) (rel2par Elaboration) (text _!8.65 % ._!) )
)
( Satellite (span 71 72) (rel2par Elaboration)
( Nucleus (leaf 71) (rel2par span) (text _!Annualized average rate of return_!) )
( Satellite (leaf 72) (rel2par Temporal) (text _!after expenses for the past 30 days ;_!) )
)
)
( Satellite (span 73 74) (rel2par Elaboration)
( Nucleus (leaf 73) (rel2par span) (text _!not a forecast_!) )
( Satellite (leaf 74) (rel2par Elaboration) (text _!of future returns ._!) )
)
)
The goal is to convert it into the following nested list:
['Satellite', '(span 69 74)', '(rel2par Elaboration)', ['Nucleus', '(span 69 72)', '(rel2par span)', ['Nucleus', '(span 69 70)', '(rel2par span)', ['Nucleus', '(leaf 69)', '(rel2par span)', '(text _!MERRILL LYNCH READY ASSETS TRUST :_!)'], ['Satellite', '(leaf 70)', '(rel2par Elaboration)', '(text _!8.65 % ._!)']], ['Satellite', '(span 71 72)', '(rel2par Elaboration)', ['Nucleus', '(leaf 71)', '(rel2par span)', '(text _!Annualized average rate of return_!)'], ['Satellite', '(leaf 72)', '(rel2par Temporal)', '(text _!after expenses for the past 30 days ;_!)']]], ['Satellite', '(span 73 74)', '(rel2par Elaboration)', ['Nucleus', '(leaf 73)', '(rel2par span)', '(text _!not a forecast_!)'], ['Satellite', '(leaf 74)', '(rel2par Elaboration)', '(text _!of future returns ._!)']]]
Solution
The following Python code converts a nested parenthesized tree into a nested list:
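import re

# Assumed pattern (the original does not show it): split on parentheses,
# keeping each one as its own token thanks to the capturing group.
resexp = re.compile(r'([()])')


def parse(s):
    def parse_helper(level=0):
        # Consume tokens until the current nesting level is closed.
        try:
            token = next(tokens)
        except StopIteration:
            if level:
                raise Exception('Missing close paren')
            else:
                return []
        if token == ')':
            if not level:
                raise Exception('Missing open paren')
            else:
                return []
        elif token == '(':
            # Parse the nested group, then the rest of the current level.
            return [parse_helper(level+1)] + parse_helper(level)
        else:
            return [token] + parse_helper(level)
    tokens = iter(filter(None, (i.strip() for i in resexp.split(s))))
    return parse_helper()


if __name__ == '__main__':
    with open('tree.thing', 'r') as treefile:
        tree = treefile.read()
    print(parse(tree))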
Here, resexp is a regular expression used to split the nested parenthesized tree into individual tokens.
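To make the tokenization step concrete, here is a minimal sketch. The original does not show the pattern behind resexp, so the one below is an assumption: it splits on parentheses and, because of the capturing group, keeps each parenthesis as a token of its own. The sample line is taken from the tree above.

```python
import re

# Assumption: the original post does not show this pattern.
# The capturing group makes re.split keep '(' and ')' in the result.
resexp = re.compile(r'([()])')

sample = '( Satellite (leaf 70) (rel2par Elaboration) (text _!8.65 % ._!) )'

# Same expression as in parse(): strip each piece and drop the empty ones.
tokens = list(filter(None, (piece.strip() for piece in resexp.split(sample))))
print(tokens)
```

Each parenthesis becomes a separate token, while everything between two parentheses, such as leaf 70, stays together as one string.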
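Running parse on the example tree produces a result of the following shape (shown here abbreviated, with deeper levels elided as [...], the way pprint renders a structure when given a limited depth):
[['Satellite',
  ['span 69 74'],
  ['rel2par Elaboration'],
  ['Nucleus',
   ['span 69 72'],
   ['rel2par span'],
   ['Nucleus', [...], [...], [...], [...]],
   ['Satellite', [...], [...], [...], [...]]],
  ['Satellite',
   ['span 73 74'],
   ['rel2par Elaboration'],
   ['Nucleus', [...], [...], [...]],
   ['Satellite', [...], [...], [...]]]]]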
This is close to the nested list we want. The tree structure is preserved, but each parenthesized attribute such as (span 69 74) comes back as a one-element sublist rather than a single string, and the whole result is wrapped in one extra outer list.
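The target list shown earlier keeps attributes such as (span 69 74) as single strings, whereas the parser returns them as one-element sublists inside an extra outer list. One possible way to bridge that gap is a small post-processing pass. The collapse helper below is hypothetical (it is not part of the original solution), and the parsed value is a hand-written excerpt in the shape the parser returns.

```python
def collapse(node):
    """Rewrite all-string sublists such as ['span 69 74'] as '(span 69 74)'."""
    if isinstance(node, str):
        return node
    if all(isinstance(child, str) for child in node):
        return '(' + ' '.join(node) + ')'
    return [collapse(child) for child in node]


# A small parser result, written out by hand in the shape parse() returns.
parsed = [['Satellite', ['span 73 74'], ['rel2par Elaboration'],
           ['Nucleus', ['leaf 73'], ['rel2par span'],
            ['text _!not a forecast_!']],
           ['Satellite', ['leaf 74'], ['rel2par Elaboration'],
            ['text _!of future returns ._!']]]]

# parse() wraps the tree in one extra list, so take element 0 first.
result = collapse(parsed[0])
print(result)
```

Note that this sketch treats any sublist containing only strings as an attribute, so a node with no parenthesized children at all would also be collapsed into a string. That does not occur in the example tree.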