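Syntax
tag.extract()
The bs4 source describes this method as "Destructively rips this element out of the tree": it removes the tag, together with everything inside it, from the Beautiful Soup tree and returns the removed element.
Example code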
from bs4 import BeautifulSoup
html = '<html><body><div>Hello World!</div><div>Hello Python!</div><div id="html">Hello HTML!</div><div>Hello ' \
'BeautifulSoup!</div></body></html> '
soup = BeautifulSoup(html, "lxml")
# Remove the div tag whose id is "html"
# Plain loop version
# for tag in soup.select("#html"):
#     tag.extract()
# List-comprehension version (also collects the removed tags)
tag_lst = [tag.extract() for tag in soup.select("#html")]
print('tag_lst:\n', tag_lst, '\n')
# soup is now the tree with the matched tag removed
print('html:\n', str(soup))
Output:
tag_lst:
[<div id="html">Hello HTML!</div>]
html:
<html><body><div>Hello World!</div><div>Hello Python!</div><div>Hello BeautifulSoup!</div></body></html>
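Because extract() returns the removed element, that element can be kept and reused. The following is a minimal sketch, not taken from the original post: the HTML snippet and the ids "a", "b" and "x" are made up for illustration, and it uses the built-in "html.parser" instead of "lxml" so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration
html = '<div id="a"><p id="x">one</p><p>two</p></div><div id="b"></div>'
soup = BeautifulSoup(html, "html.parser")

# extract() detaches the tag from the tree and returns it
removed = soup.select_one("#x").extract()

# The detached tag no longer has a parent
print(removed.parent is None)

# The returned tag is still a normal Tag and can be attached elsewhere
soup.select_one("#b").append(removed)
print(str(soup))
```

If the removed tag is not needed afterwards, the return value can simply be ignored, as in the plain loop version above.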
Reference: Python中Beautifulsoup去除/过滤掉特定标签_python使用soup过滤_春风化作秋雨的博客-CSDN博客