Python lxml usage explained: advanced Python (how to use lxml)


weixin_39876450


This section works with the following file, named webhtml.html:

[webhtml.html, text content only: 漏斗图 (funnel chart), 1111, logo, taobao, hahaha3333, taobao2, last... ..., 11111111111111111111111, 22222222222222222222222]

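I. Basics of lxml:

① You can check an XPath expression in the browser, for example with $x('//div') in the Chrome DevTools console.

② string() returns a str, while /text() returns a list.

③ /@attribute-name also returns a list.

④ .xpath() can also be called on the elements returned by an earlier XPath query on the etree object: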

Python_tree_list = Python_tree.xpath('//div[@class="article-intro"]/ul/li/a')  # a list of <a> Element objects

for i in Python_tree_list:
    print(i.xpath('text()'))    # ['Python 练习实例1']  -> list
    print(i.xpath('string()'))  # Python 练习实例1  -> str
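A self-contained sketch of the same pattern, assuming a small hypothetical HTML snippet in place of the page the author parsed: the first query returns <a> elements, and each element is then queried again with relative expressions, so the element itself is the context node.

```python
from lxml import etree

# Hypothetical markup standing in for the page used in the article
html = etree.HTML("""
<div class="article-intro">
  <ul>
    <li><a href="/ex1">Python exercise <b>1</b></a></li>
    <li><a href="/ex2">Python exercise <b>2</b></a></li>
  </ul>
</div>
""")

links = html.xpath('//div[@class="article-intro"]/ul/li/a')  # list of <a> elements

results = []
for a in links:
    # Relative expressions are evaluated with `a` as the context node
    direct_text = a.xpath('text()')   # list: only text directly inside <a>, not inside <b>
    full_text = a.xpath('string()')   # str: text of <a> plus all its descendants
    href = a.xpath('@href')[0]        # attribute values also come back as a list
    results.append((direct_text, full_text, href))

for row in results:
    print(row)
```

Note how text() stops at the nested <b>, while string() includes it.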

1. Creating an lxml object:

(1) From a requests response:

from lxml import etree
import requests

# Fetch the page and decode the response body
response1 = requests.get('https://www.baidu.com').content.decode('utf-8')

html_lxml = etree.HTML(response1)  # create the lxml object

(2) From a local file:

2. Serializing an lxml object:
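A minimal sketch for reading a local file, assuming the file is UTF-8 encoded: etree.parse() with an etree.HTMLParser parses an HTML file from disk. The demo writes a hypothetical stand-in for webhtml.html to a temporary directory so it runs anywhere; load_local_html is an illustrative helper name, not part of lxml.

```python
from lxml import etree
import os
import tempfile

def load_local_html(path):
    """Parse an HTML file from disk and return its root element (hypothetical helper)."""
    parser = etree.HTMLParser(encoding='utf-8')  # assumption: the file is UTF-8
    tree = etree.parse(path, parser)              # returns an ElementTree
    return tree.getroot()

# Demo: write a small stand-in for webhtml.html, then parse it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'webhtml.html')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('<html><head><title>漏斗图</title></head><body><p>1111</p></body></html>')
    root = load_local_html(path)

print(root.xpath('//title/text()'))
```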

result = etree.tostring(html_lxml,pretty_print=True,encoding='utf-8').decode('utf-8')

print(result)
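A small sketch contrasting the two ways to get text out of etree.tostring(), using a hypothetical snippet: with encoding='utf-8' the result is bytes (hence the .decode() above), while encoding='unicode' returns a str directly.

```python
from lxml import etree

html_lxml = etree.HTML('<div><p>hello</p></div>')

# encoding='utf-8' -> bytes, which must be decoded
as_bytes = etree.tostring(html_lxml, pretty_print=True, encoding='utf-8')

# encoding='unicode' -> str, no decode step needed
as_text = etree.tostring(html_lxml, pretty_print=True, encoding='unicode')

print(type(as_bytes).__name__, type(as_text).__name__)
print(as_text)
```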

II. XPath syntax:

1. Selecting nodes:

--------------- Note the difference between these three ----------------

last_div = html.xpath("//div[@class='bottom']")[0]  # the div whose class is "bottom"

print(last_div.xpath("span/text()"))     # <span> elements that are direct children of last_div

print(last_div.xpath("//span/text()"))   # every <span> in the whole document (// starts from the root)

print(last_div.xpath(".//span/text()"))  # every <span> anywhere below last_div
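The three expressions can be compared on a hypothetical document that has one <span> outside the target div, one direct child span, and one nested span:

```python
from lxml import etree

# Hypothetical markup: one span outside the target div, two inside it
html = etree.HTML("""
<div class="top"><span>top-span</span></div>
<div class="bottom">
  <span>direct-span</span>
  <p><span>nested-span</span></p>
</div>
""")

last_div = html.xpath("//div[@class='bottom']")[0]

children = last_div.xpath("span/text()")        # direct <span> children only
everywhere = last_div.xpath("//span/text()")    # // ignores last_div and searches the whole document
descendants = last_div.xpath(".//span/text()")  # all <span> descendants of last_div

print(children)
print(everywhere)
print(descendants)
```

The leading dot in .//span is what anchors the search at last_div.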

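2. Predicates:

//input[contains(@name,'na')] selects every input tag whose name attribute contains "na"; contains() tests whether one string contains another.

etree_obj.xpath("//input[contains(@name,'na')]")  # input elements whose name attribute contains "na"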
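A sketch of contains() next to a few other common predicates (starts-with(), an exact attribute match, and positional predicates), run against a hypothetical form:

```python
from lxml import etree

# Hypothetical form used only for illustration
form = etree.HTML("""
<form>
  <input name="username" type="text">
  <input name="nickname" type="text">
  <input name="password" type="password">
</form>
""")

contains_na = form.xpath("//input[contains(@name, 'na')]/@name")       # substring match
starts_with_pa = form.xpath("//input[starts-with(@name, 'pa')]/@name") # prefix match
exact_match = form.xpath("//input[@type='password']/@name")            # exact attribute value
first_input = form.xpath("//input[1]/@name")                           # first <input> child of its parent
last_input = form.xpath("//input[last()]/@name")                       # last <input> child of its parent

print(contains_na, starts_with_pa, exact_match, first_input, last_input)
```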

3. XPath wildcards:
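A short sketch of the standard XPath wildcards on a hypothetical snippet: * matches any element and @* matches any attribute.

```python
from lxml import etree

doc = etree.HTML('<div id="box" class="top"><p>a</p><span>b</span></div>')

# * : any element child, whatever its tag
any_child = [el.tag for el in doc.xpath("//div[@id='box']/*")]

# @* : the values of all attributes of the div
all_attrs = doc.xpath("//div[@id='box']/@*")

# //*[@*] : any element that has at least one attribute
has_any_attr = [el.tag for el in doc.xpath("//*[@*]")]

print(any_child, all_attrs, has_any_attr)
```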

4. Examples:

5. XPath operators:

Of these, | (union, which combines two result sets) is used most often.

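Comparison operators such as < and >= compare values such as a tag's text content or its position, as in the position() condition in the and example below.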
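A sketch of a comparison on tag content, assuming a hypothetical price list: when a node-set is compared with a number, XPath converts each node's text to a number and the predicate is true if any node satisfies the comparison.

```python
from lxml import etree

# Hypothetical price list
table = etree.HTML("""
<ul>
  <li><span class="name">apple</span><span class="price">3</span></li>
  <li><span class="name">mango</span><span class="price">12</span></li>
  <li><span class="name">melon</span><span class="price">25</span></li>
</ul>
""")

# Names of items whose price text, read as a number, is >= 12
expensive = table.xpath("//li[span[@class='price'] >= 12]/span[@class='name']/text()")

# Names of items whose price is below 10
cheap = table.xpath("//li[span[@class='price'] < 10]/span[@class='name']/text()")

print(expensive, cheap)
```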

How | differs from and / or:

----------------- and / or -------------------

print(html.xpath("//li[@class='first' or @class='last']/span/text()"))  # or: combine attribute conditions inside one predicate

print(html.xpath("//ul[@class='ul_list']/li[position()>2 and position()<6]/text()"))  # and: the 3rd to 5th <li>

----------------- | ------------------------

print(html.xpath("//div[@class='top']/ul/li | //ul[@class='ul_list']/li"))  # |: union of the two result sets
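The difference can be checked on a hypothetical document that reuses the class names above: and/or combine conditions inside a single predicate, while | merges two independent path expressions into one result, in document order.

```python
from lxml import etree

# Hypothetical markup reusing the class names from the examples above
html = etree.HTML("""
<div class="top"><ul><li>t1</li><li>t2</li></ul></div>
<ul class="ul_list">
  <li class="first">1</li><li>2</li><li>3</li><li>4</li><li>5</li><li class="last">6</li>
</ul>
""")

# or: one predicate, either condition may hold
first_or_last = html.xpath("//li[@class='first' or @class='last']/text()")

# and: both conditions must hold (positions 3, 4 and 5)
middle = html.xpath("//ul[@class='ul_list']/li[position()>2 and position()<6]/text()")

# |: two separate expressions, results merged
union = html.xpath("//div[@class='top']/ul/li/text() | //ul[@class='ul_list']/li[@class='last']/text()")

print(first_or_last, middle, union)
```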

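6. Getting attributes and text content with XPath:

These expressions return content, not the element itself.

① /text() returns only the text directly inside each matched element (text belonging to child elements is not included); the result is a list.

② /@attribute-name returns the attribute values; the result is also a list.

③ string() returns the text of the first matched node together with all of its descendants; the result is a str.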

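# Behavior of /text() in XPath

# 1. If an element contains only child elements and no text of its own, /text() on it returns an empty list:
#    it does not descend into the children. To get a child's text, step into that child first and
#    apply /text() there. The result is a list.

print(xpath_obj.xpath('//div[@class="actcont-auto"]/text()'))  # [] empty (or only whitespace strings if the markup is indented)

print(xpath_obj.xpath('//div[@class="actcont-auto"]')[0].xpath('.//a[@href="author_11549103"]/text()'))  # text of the matching <a> in the first div

# 2. string(): returns all the text inside the first matched element, including its descendants, as a str

print(xpath_obj.xpath('//div[@class="actcont-auto"]')[0].xpath('string()'))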
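A self-contained sketch of these rules, using hypothetical markup (author_1 and author_2 stand in for the real links): the divs hold only child elements, so /text() on them is empty, //text() reaches into descendants, and string() applied to a node-set uses only the first node.

```python
from lxml import etree

# Hypothetical markup: two divs that contain only child elements
xpath_obj = etree.HTML(
    '<div class="actcont-auto"><a href="author_1">Alice</a><span><b>Bob</b></span></div>'
    '<div class="actcont-auto"><a href="author_2">Carol</a></div>'
)

own_text = xpath_obj.xpath('//div[@class="actcont-auto"]/text()')  # no direct text -> []
deeper = xpath_obj.xpath('//div[@class="actcont-auto"]//text()')   # text of all descendants
hrefs = xpath_obj.xpath('//div[@class="actcont-auto"]/a/@href')    # attribute values, as a list

# string() of a node-set: only the first matching div, descendants included
first_string = xpath_obj.xpath('string(//div[@class="actcont-auto"])')

print(own_text, deeper, hrefs, first_string)
```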

7. Example:
