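1. Download the XPath browser extension
2. Rename the downloaded file so it ends in .zip (xpath.zip)
3. Drag xpath.zip onto the browser's extensions page
4. Open any website and press Ctrl + Shift + X to open the XPath extension
5. Install the lxml library (pip install lxml)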
from lxml import etree
Parsing a local file:
html_tree = etree.parse('xx.html')
Parsing a server response:
html_tree = etree.HTML(response.read().decode('utf-8'))
html_tree.xpath(xpath_expression)
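The two parsing routes above can be sketched as follows. This is a minimal illustration, not the only way to do it: SAMPLE_HTML, the temporary file, and response_bytes (a stand-in for response.read()) are hypothetical names made up for the example. Note that etree.parse uses a strict XML parser by default, so the sketch passes etree.HTMLParser(), which tolerates ordinary, not perfectly well-formed HTML.

```python
import os
import tempfile

from lxml import etree

# Hypothetical sample page, used only for this illustration.
SAMPLE_HTML = """<html>
  <head><title>Demo</title></head>
  <body>
    <div id="maincontent"><h1>Hello</h1></div>
  </body>
</html>"""

# Local file: write the sample to a temporary file, then parse it from disk.
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as f:
    f.write(SAMPLE_HTML)
    path = f.name
try:
    # HTMLParser is lenient; the default XML parser rejects malformed markup.
    local_tree = etree.parse(path, etree.HTMLParser())
    local_titles = local_tree.xpath("//title/text()")
finally:
    os.remove(path)

# Server response: response_bytes stands in for response.read().
response_bytes = SAMPLE_HTML.encode("utf-8")
response_tree = etree.HTML(response_bytes.decode("utf-8"))
response_headings = response_tree.xpath("//div[@id='maincontent']/h1/text()")

print(local_titles)
print(response_headings)
```

In both cases xpath() returns a list, even when only one node matches.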
Basic XPath syntax:
1. Path queries:
//: selects all descendant nodes, regardless of depth
/: selects direct child nodes only
2. Predicate queries
//div[@id]
//div[@id="maincontent"]
3. Attribute queries
//@class
4. Fuzzy (partial-match) queries
//div[contains(@id,"he")]
//div[starts-with(@id,"the")]
5. Content queries
//div/h1/text()
6. Logical operations
//div[@id="head" and @class="s_down"]
//title | //price
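The six kinds of query listed above can be tried out against a small document. This is only a sketch: SAMPLE_HTML is a hypothetical page invented so that each expression has something to match, and the union example uses //h1 | //span because the sample contains no title or price elements.

```python
from lxml import etree

# Hypothetical document; ids and classes are chosen to exercise each query.
SAMPLE_HTML = """
<html><body>
  <div id="head" class="s_down"><h1>Top</h1></div>
  <div id="theme"><p><span>nested</span></p></div>
  <div><h1>No id</h1></div>
</body></html>
"""
tree = etree.HTML(SAMPLE_HTML)

results = {
    # 1. Path queries: // reaches any depth, / only direct children.
    "descendants": tree.xpath("//div//span/text()"),
    "direct_child": tree.xpath("//div/span"),
    # 2. Predicate queries: divs that have an id, or a specific id.
    "has_id": tree.xpath("//div[@id]/@id"),
    "id_equals": tree.xpath('//div[@id="head"]/@class'),
    # 3. Attribute queries: every class attribute value in the document.
    "attribute": tree.xpath("//@class"),
    # 4. Fuzzy queries: substring and prefix matches on the id.
    "contains": tree.xpath('//div[contains(@id, "he")]/@id'),
    "starts_with": tree.xpath('//div[starts-with(@id, "the")]/@id'),
    # 5. Content queries: text inside h1 elements that are children of a div.
    "text": tree.xpath("//div/h1/text()"),
    # 6. Logical operations: 'and' inside a predicate, '|' to union node sets.
    "and": tree.xpath('//div[@id="head" and @class="s_down"]'),
    "union": [el.tag for el in tree.xpath("//h1 | //span")],
}

for name, value in results.items():
    print(name, value)
```

The span sits inside a p, so it is found by //div//span but not by //div/span, which shows the difference between the two path operators.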