(python) How to extract an HTML tag with XPath (the tag and its content)

Latest recommended article published 2024-07-16 23:41:48

fishineye



Category: Python

Article link: https://blog.csdn.net/fishineye/article/details/106926153


Question: (python) How do I extract an HTML tag with XPath (the tag and its content)?
Description:

<div>
<table>
<tr>
<td>Row value 1</td>
<td>Row value 2</td>
</tr>
<tr>
<td>Row value 3</td>
<td>Row value 4</td>
</tr>
<tr>
<td>Row value 1</td>
<td>Row value 1</td>
</tr>
</table>
</div>

How can I extract the table element so that the result is:

<table>
<tr>
<td>Row value 1</td>
<td>Row value 2</td>
</tr>
<tr>
<td>Row value 3</td>
<td>Row value 4</td>
</tr>
<tr>
<td>Row value 1</td>
<td>Row value 1</td>
</tr>
</table>

My code:

from lxml import etree

# html holds the markup shown above
selector = etree.HTML(html)
content = selector.xpath('//div/table')[0]
print(content)
# <Element table at 0x1bce7463548>
# i.e. how do I convert this Element object into a str?

Solution 1:

Use BeautifulSoup's find(); calling str() on the Tag it returns gives that element's HTML.
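A minimal sketch of Solution 1, assuming the beautifulsoup4 package is installed. The "html.parser" backend (Python's built-in parser) is a choice made here, not something the answer specifies; the html string is the question's sample markup.

```python
from bs4 import BeautifulSoup

# Sample markup from the question
html = """<div>
<table>
<tr>
<td>Row value 1</td>
<td>Row value 2</td>
</tr>
<tr>
<td>Row value 3</td>
<td>Row value 4</td>
</tr>
<tr>
<td>Row value 1</td>
<td>Row value 1</td>
</tr>
</table>
</div>"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")  # first <table> Tag, or None if there is none
table_html = str(table)     # str() serializes the tag and everything inside it
print(table_html)
```

The result is a plain str that starts at `<table>` and ends at `</table>`, without the surrounding `<div>`.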

Solution 2:

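from lxml import etree
from lxml.html import fromstring, tostring
# fromstring returns an HtmlElement object and can be used instead:
# selector = fromstring(html)

selector = etree.HTML(html)
content = selector.xpath('//div/table')[0]
print(content)
# tostring returns the element's original HTML markup (as bytes by default;
# use tostring(content, encoding='unicode') to get a str)
original_html = tostring(content)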

Solution 3:

Something like [div/table] should do it, I think.
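Solution 3 is ambiguous. If it means an XPath predicate such as `//div[table]`, note that square brackets filter the node before them rather than step into it, so that expression selects the `<div>`, not the table. A minimal lxml sketch of the difference, using a one-row version of the question's markup:

```python
from lxml import etree

html = "<div><table><tr><td>Row value 1</td></tr></table></div>"
root = etree.HTML(html)

# '//div/table' is a location path: it selects the <table> children of <div>
print([el.tag for el in root.xpath('//div/table')])   # ['table']

# '//div[table]' uses [table] as a predicate: it selects <div> elements
# that have a <table> child, so the result is the <div> itself
print([el.tag for el in root.xpath('//div[table]')])  # ['div']
```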

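Solution 4:

from lxml import etree
# html holds the markup from the question
root = etree.HTML(html)
table = root.xpath('//div/table')[0]
# serialize to HTML; the keyword is pretty_print (bytes by default, add encoding='unicode' for a str)
content = etree.tostring(table, pretty_print=True, method='html')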

The above are the answers to the question "(python) How to extract an HTML tag with XPath (the tag and its content)"; hopefully they help anyone who needs them.
Source: http://www.codes51.com/itwd/4510100.html