Number | CODE | Name | Category | Variation | ||
3259 | ABC123 | LAND | 3 - Design Reference | 2 - Production Item | ||
Number 3259 - Reference Number ABC123 CODE ABC123 | ||||||
3260 | XYZ453 | WATER | 3 Control Reference | 2 Item |
The output should be like the lists described below. When I use this to get the column names from the 'td' tags in the first row:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('code.html'), 'lxml')
col = soup.find('tr').find_all('td')
for c in col:
    print(c.get_text())
all of the columns are printed. I only need
['Number', 'Code', 'Name']
I don't need the row that contains "colspan" (the third tr).
I also want to remove the last two column values:
tr = soup.findChildren('tr')
for t in tr:
    td = t.findChildren('td')
    for child in td:
        print(child.text)
Again, I get all of the data, including the column names and the values. What I expect is ['3259', 'ABC123', 'LAND']
and
['3260', 'XYZ453','WATER']
by removing
['3 - Design Reference','2 - Production Item']
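One possible approach, as a minimal sketch: skip any row that has a cell with a colspan attribute, and slice each remaining row down to its first three cells. The HTML string below is only an assumed reconstruction of code.html based on the table shown above, and the helper name first_cells is made up for illustration. The sketch uses the built-in "html.parser" so it runs without lxml; 'lxml' should work the same way here.

```python
from bs4 import BeautifulSoup

# Assumed reconstruction of code.html; the real file may differ.
HTML = """
<table>
  <tr><td>Number</td><td>CODE</td><td>Name</td><td>Category</td><td>Variation</td></tr>
  <tr><td>3259</td><td>ABC123</td><td>LAND</td><td>3 - Design Reference</td><td>2 - Production Item</td></tr>
  <tr><td colspan="5">Number 3259 - Reference Number ABC123 CODE ABC123</td></tr>
  <tr><td>3260</td><td>XYZ453</td><td>WATER</td><td>3 Control Reference</td><td>2 Item</td></tr>
</table>
"""


def first_cells(html, keep=3):
    """Return the first `keep` cell texts of every row that has no colspan cell."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        # Skip rows containing a spanning cell (the third tr in this sample).
        if any(td.has_attr("colspan") for td in cells):
            continue
        # Slicing drops the trailing columns (Category, Variation).
        rows.append([td.get_text(strip=True) for td in cells[:keep]])
    return rows


# The first surviving row is the header; the rest are data rows.
header, *data = first_cells(HTML)
print(header)
for row in data:
    print(row)
```

The header text comes out exactly as it is written in the file, so with the sample above the second name is 'CODE' rather than 'Code'. To read from the actual file, the string could be replaced with the contents of open('code.html').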