from lxml import etree

wb_data = """
<div>
    <ul>
        <li class="item-0"><a href="link1.html">first item</a></li>
        <li class="item-1"><a href="link2.html">second item</a></li>
        <li class="item-inactive"><a href="link3.html">third item</a></li>
        <li class="item-1"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a>
    </ul>
</div>
"""

# Parse the string into an HTML element tree; lxml auto-completes the
# markup, adding the missing <html> and <body> wrappers
html = etree.HTML(wb_data)

# Extract data: the text of each <a> tag
# Method 1: the .text attribute
data1 = html.xpath('/html/body/div/ul/li/a')
for i in data1:
    print(i.text)

# Method 2
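The listing stops at the "Method 2" comment, so what the author wrote there is not known. As a minimal sketch of one common alternative to reading `.text`, and only an assumption about what might have followed, the XPath expression itself can end in `text()`, in which case `xpath()` returns the strings directly instead of element objects. The variable names `texts` and `li_count` are illustrative and do not come from the original.

```python
from lxml import etree

wb_data = """
<div>
    <ul>
        <li class="item-0"><a href="link1.html">first item</a></li>
        <li class="item-1"><a href="link2.html">second item</a></li>
        <li class="item-inactive"><a href="link3.html">third item</a></li>
        <li class="item-1"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a>
    </ul>
</div>
"""

# etree.HTML returns the root <html> element of the repaired document
html = etree.HTML(wb_data)

# Assumed alternative (not taken from the source): text() selects the
# text nodes themselves, so the result is a list of strings
texts = html.xpath('/html/body/div/ul/li/a/text()')
for t in texts:
    print(t)

# The unclosed final <li> is still parsed as a fifth list item
li_count = len(html.xpath('//li'))
print(li_count)
```

Compared with Method 1, this skips the loop over element objects when only the text is needed; Method 1 remains the better fit when other properties of each `<a>` element are also required.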
Web Crawlers: Using XPath