Using XPath to get all the content between each pair of identical tags, given markup like the following:
<h2>Heading One</h2>
<p>xxxx</p>
<p>xxxx</p>
<p>xxxx</p>
<p>xxxx</p>
<h2>Heading Two</h2>
<p>xxx</p>
<p>xxx</p>
<h2>Heading Three</h2>
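For example, to get the text of all the p tags between consecutive h2 tags, loop over the h2 elements and, for each one, select only the following siblings that come before the next h2. Inside a predicate, "." refers to the sibling being tested, not to the starting h2, so the rank of the current h2 is passed in as an XPath variable (con is assumed to be a parsed lxml tree, which supports variables):
results = con.xpath('//h2')
for result in results:
    # 1-based rank of this h2 among its h2 siblings (count() returns a float)
    n = result.xpath('count(preceding-sibling::h2)') + 1
    content = result.xpath('./following-sibling::*[not(self::h2)][count(preceding-sibling::h2)=$n]//text()', n=n)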
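This loops over the h2 tags and collects the content between each one and the next. The complete code, which instead finds the index of the next h2 in Python, is as follows: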
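# con is assumed to be a parsed lxml tree, e.g. con = lxml.etree.HTML(page_source)
results = con.xpath('//h2')
dit = {}
for result in results:
    # heading text, used as the dictionary key
    d0 = ''.join(result.xpath('.//text()')).strip()
    # every element sibling that follows this h2
    d1 = result.xpath('./following-sibling::*')
    s = [p.tag for p in d1]
    print(s)  # debug: tag names of the following siblings
    try:
        # 0-based index of the next h2 = number of siblings that precede it
        c = s.index('h2')
        b = result.xpath(f'./following-sibling::*[position()<{c+1}]//text()')
    except ValueError:
        # last h2: there is no later h2, so take all remaining siblings
        b = result.xpath('./following-sibling::*//text()')
    dit[d0] = b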
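
To check the idea end to end, here is a minimal self-contained sketch. It assumes lxml is installed and that all the h2 and p tags share one parent. The sample markup, the wrapping div, and the function name sections_by_heading are made up for illustration. Rather than computing an index in Python, it keeps only the siblings whose count of preceding h2 elements equals the rank of the current heading, passing that rank in as an XPath variable.

```python
from lxml import etree

# Hypothetical sample; the wrapping <div> just gives every tag the same parent.
HTML = """
<div>
<h2>Heading One</h2>
<p>a1</p>
<p>a2</p>
<h2>Heading Two</h2>
<p>b1</p>
<h2>Heading Three</h2>
</div>
"""


def sections_by_heading(root, tag="h2"):
    """Map each heading's text to the text found before the next heading."""
    sections = {}
    for heading in root.xpath(f"//{tag}"):
        title = "".join(heading.xpath(".//text()")).strip()
        # count() returns a float; adding 1 gives this heading's 1-based rank
        # among the headings that share its parent.
        rank = heading.xpath(f"count(preceding-sibling::{tag})") + 1
        # Keep non-heading siblings that have exactly `rank` headings before them,
        # i.e. the ones sitting between this heading and the next.
        texts = heading.xpath(
            f"following-sibling::*[not(self::{tag})]"
            f"[count(preceding-sibling::{tag}) = $rank]//text()",
            rank=rank,
        )
        sections[title] = [t.strip() for t in texts if t.strip()]
    return sections


con = etree.HTML(HTML)
print(sections_by_heading(con))
# {'Heading One': ['a1', 'a2'], 'Heading Two': ['b1'], 'Heading Three': []}
```

The last heading maps to an empty list because nothing follows it, so no special case is needed for it.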