Removing or Filtering Out Specific Tags with BeautifulSoup in Python

闲石观江

Published 2023-06-29 12:18:02

Tags: python beautifulsoup programming languages

Article link: https://blog.csdn.net/weixin_41059258/article/details/131452653

Syntax

tag.extract()

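The source code describes this method as "Destructively rips this element out of the tree": it removes the tag (together with everything inside it) from the Beautiful Soup parse tree. The method also returns the removed element, so it can still be inspected or reused afterwards.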
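To make the behaviour of extract() concrete, here is a minimal sketch. The markup and the variable name removed are made up for illustration, and "html.parser" is used only so the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the article's own example
soup = BeautifulSoup('<div><p>keep</p><span>drop</span></div>', "html.parser")

# extract() detaches the tag from the tree and returns it
removed = soup.span.extract()

print(removed)          # the detached <span> element
print(removed.parent)   # None, because it no longer belongs to the tree
print(soup)             # the remaining tree, without the <span>
```

Because the return value is the tag itself, the removed elements can be collected in a list if they are still needed.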

Example code

from bs4 import BeautifulSoup

html = '<html><body><div>Hello World!</div><div>Hello Python!</div><div id="html">Hello HTML!</div><div>Hello ' \
       'BeautifulSoup!</div></body></html> '
soup = BeautifulSoup(html, "lxml")

# Remove the div whose id is "html"
#      Plain loop version
# for tag in soup.select("#html"):
#     tag.extract()

#      List-comprehension version (collects the removed tags)
tag_lst = [tag.extract() for tag in soup.select("#html")]
print('tag_lst:\n', tag_lst, '\n')

# soup is now the tree with the specified tag removed
print('html:\n', str(soup))

Output:

tag_lst:
 [<div id="html">Hello HTML!</div>] 

html:
 <html><body><div>Hello World!</div><div>Hello Python!</div><div>Hello BeautifulSoup!</div></body></html>
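
The same idea extends to filtering out several kinds of tags at once. The sketch below is only one possible approach, not part of the original example: strip_tags is a hypothetical helper name, and it uses decompose() instead of extract(), on the assumption that the removed tags are not needed afterwards. "html.parser" is used so the snippet does not depend on lxml.

```python
from bs4 import BeautifulSoup


def strip_tags(html, names, parser="html.parser"):
    """Hypothetical helper: return html with every tag named in `names` removed."""
    soup = BeautifulSoup(html, parser)
    # find_all() accepts a list of tag names and returns all matches
    for tag in soup.find_all(names):
        # decompose() removes the tag and its contents and discards them
        tag.decompose()
    return str(soup)


cleaned = strip_tags(
    '<div><script>x()</script><p>Hello</p><style>p{}</style></div>',
    ["script", "style"],
)
print(cleaned)
```

If the removed tags are still needed later, tag.extract() can be used in place of tag.decompose(), as in the example above.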