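7. If extracting data by full XPath returns an empty result, remove the meaningless tbody steps from the path (browsers insert tbody into tables when rendering, but the raw HTML often contains none).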
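The tbody problem in item 7 can be reproduced with a small sketch. It assumes a hypothetical HTML string with no `<tbody>` in its source: lxml's `etree.HTML` parses the markup as served and does not add the `<tbody>` that browser DevTools display, so a full XPath copied from the browser matches nothing. `strip_tbody` is a hypothetical helper, not part of lxml, that deletes the `/tbody` steps with a regular expression.

```python
import re
from lxml import etree

# Hypothetical raw HTML as served: no <tbody>, although DevTools would show one
html = "<html><body><table><tr><td>A</td><td>B</td></tr></table></body></html>"
tree = etree.HTML(html)

# Full XPath as copied from the browser, including the browser-inserted tbody
copied = '/html/body/table/tbody/tr/td/text()'

def strip_tbody(xpath):
    """Hypothetical helper: remove every /tbody or /tbody[n] step from an XPath string."""
    return re.sub(r'/tbody(\[\d+\])?', '', xpath)

fixed = strip_tbody(copied)
print(tree.xpath(copied))  # []
print(fixed)               # /html/body/table/tr/td/text()
print(tree.xpath(fixed))   # ['A', 'B']
```

Because `strip_tbody` also drops indexed steps such as `tbody[2]`, the shortened path can match rows from more than one table section; if the source really contains several `<tbody>` elements, keep those steps instead.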
8. Scraping content when the id of the tbody or div is not known in advance:
from bs4 import BeautifulSoup
from lxml import etree
Get the ids:
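# all[0][we] is assumed to hold the page's HTML string (defined earlier in the author's script)
soup = BeautifulSoup(all[0][we], "lxml")
# Find every tbody that has an id attribute (id=True skips tbody tags without one,
# which would otherwise raise KeyError on tbody['id'])
tbody_list = soup.find_all('tbody', id=True)
id_list = []
for tbody in tbody_list:
    id_list.append(tbody['id'])
# Remove duplicate ids
id_list = list(set(id_list))
print(id_list)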
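As an alternative to the BeautifulSoup step above, lxml alone can collect the ids through the XPath attribute axis (`//tbody/@id`), which only returns tbody elements that carry an id. The sketch below uses a hypothetical HTML string; the ids `row_1` and `row_2` and the prefix `row_` are invented for illustration. Once the ids are known, a short relative XPath built from each one avoids depending on the full path, and `starts-with()` covers ids whose prefix is stable but whose suffix changes.

```python
from lxml import etree

# Hypothetical page: several <tbody> sections whose ids are not known in advance
html = """<html><body><table>
<tbody id="row_1"><tr><td>x1</td></tr></tbody>
<tbody id="row_2"><tr><td>x2</td></tr></tbody>
<tbody><tr><td>no id</td></tr></tbody>
</table></body></html>"""
tree = etree.HTML(html)

# The attribute axis yields ids only for tbody elements that actually have one
id_list = sorted(set(tree.xpath('//tbody/@id')))
print(id_list)  # ['row_1', 'row_2']

# Use each collected id in a short relative XPath instead of a full one
for tbody_id in id_list:
    print(tbody_id, tree.xpath(f'//tbody[@id="{tbody_id}"]/tr/td/text()'))

# If only the id prefix is stable, the XPath 1.0 function starts-with() matches it
print(tree.xpath('//tbody[starts-with(@id, "row_")]/tr/td/text()'))  # ['x1', 'x2']
```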
Scrape the data:
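# Parse the same HTML with lxml and apply a full XPath (no browser-inserted tbody steps)
etree_html = etree.HTML(all[0][we])
# Text of every header cell in the table's thead
sel = '/html/body/table/thead/tr/th/text()'
find = etree_html.xpath(sel)  # a list of strings; empty if the path matches nothing
print(find)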
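A minimal sketch of going one step further than the header query: pairing each data row with the header cells. The table is hypothetical; the header XPath is the same full XPath as above, while the rows use the relative path `//table//tr[td]`, which matches rows whether or not a `<tbody>` wraps them and skips the header row because it has `th` rather than `td` cells.

```python
from lxml import etree

# Hypothetical table: header in <thead>, body rows written without <tbody>
html = """<html><body><table>
<thead><tr><th>Name</th><th>Score</th></tr></thead>
<tr><td>Alice</td><td>90</td></tr>
<tr><td>Bob</td><td>85</td></tr>
</table></body></html>"""
etree_html = etree.HTML(html)

# Same full XPath as above: the header cells of the table
headers = etree_html.xpath('/html/body/table/thead/tr/th/text()')

# Relative XPath: any <tr> that has <td> cells, with or without a <tbody> around it
rows = []
for tr in etree_html.xpath('//table//tr[td]'):
    cells = tr.xpath('./td/text()')
    rows.append(dict(zip(headers, cells)))

print(headers)  # ['Name', 'Score']
print(rows)
```

With `td/text()`, a cell whose text sits only inside a nested tag such as `<a>` contributes nothing, which shifts the `zip` pairing; selecting `./td` and calling `td.xpath('string()')` on each cell keeps one value per cell.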
Python: scraping data by XPath (full XPath)