XPath即为XML路径语言(XML Path Language),它是一种用来确定XML文档中某部分位置的语言。
xpath基本语法:
- 1.路径查询 //:查找所有子孙节点,不考虑层级关系 / :找直接子节点
- 2.谓词查询 //div[@id] //div[@id=“maincontent”]
- 3.属性查询 //@class
- 4.模糊查询 //div[contains(@id, “he”)] //div[starts‐with(@id, “he”)]
- 5.内容查询 //div/h1/text()
- 6.逻辑运算 //div[@id=“head” and @class=“s_down”] //title | //price
代码示例如下:
import requests
from lxml import etree
url = "https://www.qbiqu.com/0_1/1.html"
response = requests.get(url)
response.encoding = 'gbk'
html = etree.HTML(response.text)
title = html.xpath('//div[@class="bookname"]/h1/text()')[0]
print(title)