I want to scrape data from a website and write it to a CSV file. The sample table below contains the column names and the values.
Number  CODE    Name   Category              Variation
3259    ABC123  LAND   3 - Design Reference  2 - Production Item
3260    XYZ453  WATER  3 Control Reference   2 Item
The output should be the column names taken from the first 'td' tags. When I use this:
from bs4 import BeautifulSoup

soup = BeautifulSoup(open('code.html'), 'lxml')
col = soup.find('tr').find_all('td')
for c in col:
    print(c.get_text())
all of the columns are printed. Instead, I only need:
['Number', 'Code', 'Name']
I don't need the 'tr' that contains "colspan" (the 3rd tr).
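To make the header requirement concrete, here is a minimal sketch of what I am after. It assumes the table looks roughly like SAMPLE_HTML below, which is a hypothetical stand-in for code.html (the real markup may differ, including where the "colspan" row sits). The helper name header_names and the keep parameter are illustrative only, and 'html.parser' is used so the sketch runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for code.html; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr><td>Number</td><td>CODE</td><td>Name</td><td>Category</td><td>Variation</td></tr>
  <tr><td>3259</td><td>ABC123</td><td>LAND</td><td>3 - Design Reference</td><td>2 - Production Item</td></tr>
  <tr><td colspan="5">separator</td></tr>
  <tr><td>3260</td><td>XYZ453</td><td>WATER</td><td>3 Control Reference</td><td>2 Item</td></tr>
</table>
"""


def header_names(html, keep=3):
    """Return the text of the first `keep` cells of the first row without a colspan cell."""
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find_all("tr"):
        # Skip any row that has a <td> carrying a colspan attribute.
        if row.find("td", colspan=True) is not None:
            continue
        cells = row.find_all("td")
        # Slicing keeps only the leading columns and drops the rest.
        return [cell.get_text(strip=True) for cell in cells[:keep]]
    return []


print(header_names(SAMPLE_HTML))
```

Slicing the list of cells is the simplest way I can think of to keep only the first three columns; it relies on the unwanted columns always being the last ones.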
I also want to drop the last two column values. With this:
tr = soup.findChildren('tr')
for t in tr:
    td = t.findChildren('td')
    for child in td:
        print(child.text)
Again, I get the entire data set, including both the column names and the values. What I expect is
['3259', 'ABC123', 'LAND'] and ['3260', 'XYZ453','WATER']
with these values removed:
['3 - Design Reference','2 - Production Item']
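Putting the pieces together, the result I am hoping for could be sketched roughly as follows. This is only a sketch under assumptions: SAMPLE_HTML is a hypothetical stand-in for code.html, table_rows is an illustrative helper name, and the CSV is written to an in-memory buffer instead of a real file so the example has no side effects.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for code.html; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr><td>Number</td><td>CODE</td><td>Name</td><td>Category</td><td>Variation</td></tr>
  <tr><td>3259</td><td>ABC123</td><td>LAND</td><td>3 - Design Reference</td><td>2 - Production Item</td></tr>
  <tr><td colspan="5">separator</td></tr>
  <tr><td>3260</td><td>XYZ453</td><td>WATER</td><td>3 Control Reference</td><td>2 Item</td></tr>
</table>
"""


def table_rows(html, keep=3):
    """Return every row without a colspan cell, trimmed to the first `keep` columns."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Ignore rows that contain a <td> with a colspan attribute.
        if tr.find("td", colspan=True) is not None:
            continue
        cells = tr.find_all("td")
        rows.append([cell.get_text(strip=True) for cell in cells[:keep]])
    return rows


rows = table_rows(SAMPLE_HTML)
header, data = rows[0], rows[1:]

# Write to an in-memory buffer; for a real file, something like
# open('out.csv', 'w', newline='') could be used in its place.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(data)
print(buffer.getvalue())
```

Here the first surviving row is treated as the column names and the remaining rows as the values, which matches the sample table but is an assumption about the real page.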