First, I converted the sample data into a valid HTML page and pretty-printed it. That makes it easier to see what is going on:
Thomas A /Dumpling/
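One way to get a similar indented view is BeautifulSoup's `prettify()`, which puts each tag and text node on its own line. This is a minimal sketch only: the original sample markup is not reproduced in this answer, so the fragment below is hypothetical and merely mimics the structure the parsing code relies on (the `princ` class is invented for illustration).

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: NOT the original sample data, just the same rough shape
# (a person cell with a person-link span and an "events" table).
SAMPLE = (
    '<td class="princ"><span class="person-link">Thomas A /Dumpling/</span>'
    '<table class="events"><tr><td>event1:</td><td>4 February 1940</td></tr>'
    '</table></td>'
)

# Wrap the fragment in a full page, parse it with the stdlib-backed
# html.parser, and pretty-print it with one tag or string per line.
soup = BeautifulSoup(
    f'<html><body><table><tr>{SAMPLE}</tr></table></body></html>', 'html.parser'
)
pretty = soup.prettify()
print(pretty)
```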
Then I restructured your program.
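Only the actual parsing code is left:

```python
def get_string(node, default=''):
    """Join all non-blank text inside node with ', ', or return default if node is None."""
    if node is not None:
        return ', '.join(node.stripped_strings)
    return default


def get_data(td_princ):
    """Return (name, birth, hired) for one person cell."""
    # The surname is wrapped in slashes ("Thomas A /Dumpling/"), so strip them.
    name = get_string(td_princ.find('span', class_='person-link')).replace('/', '')
    birth = hired = '(missing)'
    events = td_princ.find('table', class_='events')
    # Each event row is a label cell followed by a value cell;
    # a missing events table simply leaves both dates as '(missing)'.
    for event in (events.find_all('tr') if events is not None else []):
        cells = [get_string(cell) for cell in event.find_all('td')]
        if len(cells) == 2:
            label, value = cells
            if label == 'event1:':
                birth = value
            elif label == 'event2:':
                hired = value
    return (name, birth, hired)
```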
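The driver that feeds these functions is not shown in this answer. As a hedged sketch, assuming hypothetical markup where each person sits in a `<td class="princ">` cell (the class name and the sample HTML are invented for illustration), the parsing functions can be wired to Python's `csv.writer` like this. The functions are repeated so the example runs on its own.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample data shaped the way get_data() expects it.
HTML = """
<table>
  <tr>
    <td class="princ">
      <span class="person-link">Thomas A /Dumpling/</span>
      <table class="events">
        <tr><td>event1:</td><td>4 February 1940</td></tr>
        <tr><td>event2:</td><td>9 October 2002<br/>Laplata, Md</td></tr>
      </table>
    </td>
  </tr>
</table>
"""


def get_string(node, default=''):
    if node is not None:
        return ', '.join(node.stripped_strings)
    return default


def get_data(td_princ):
    name = get_string(td_princ.find('span', class_='person-link')).replace('/', '')
    birth = hired = '(missing)'
    events = td_princ.find('table', class_='events')
    for event in (events.find_all('tr') if events is not None else []):
        cells = [get_string(cell) for cell in event.find_all('td')]
        if len(cells) == 2:
            label, value = cells
            if label == 'event1:':
                birth = value
            elif label == 'event2:':
                hired = value
    return (name, birth, hired)


def write_csv(html):
    """Hypothetical driver: parse html and return the CSV text as a string."""
    soup = BeautifulSoup(html, 'html.parser')
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(['Name', 'Born', 'Hired'])
    # 'princ' is an assumed class name for the per-person cells.
    for td_princ in soup.find_all('td', class_='princ'):
        writer.writerow(get_data(td_princ))
    return out.getvalue()


print(write_csv(HTML))
```

Writing to a `StringIO` keeps the sketch testable; in a real script you would pass an open file (opened with `newline=''`) to `csv.writer` instead.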
When run on the sample data, it produces this CSV file:

```
Name,Born,Hired
Thomas A Dumpling,4 February 1940,"9 October 2002, Laplata, Md"
```
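Assuming the file is written with Python's `csv` module, the quotes around the Hired value come from its default `QUOTE_MINIMAL` quoting: a field that contains the delimiter is wrapped in quotes so it stays a single column. A small stdlib-only check that reads those two lines back and writes them out again:

```python
import csv
import io

# The two CSV lines shown above, copied verbatim.
CSV_TEXT = (
    'Name,Born,Hired\n'
    'Thomas A Dumpling,4 February 1940,"9 October 2002, Laplata, Md"\n'
)

# Reading: the quoted field with embedded commas is still one column.
rows = list(csv.reader(io.StringIO(CSV_TEXT)))
for row in rows:
    print(len(row), row)

# Writing: QUOTE_MINIMAL (the default) re-adds quotes only where needed.
out = io.StringIO()
csv.writer(out, lineterminator='\n').writerows(rows)
round_trip = out.getvalue()
```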