## Example 1: Removing `script` tags
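```python
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from BeautifulSoup import BeautifulSoup

html = '''
baba
<script>hi, world</script>
'''
soup = BeautifulSoup(html)
# soup('script') finds every <script> tag; extract() detaches each one,
# together with everything inside it, from the parse tree
[s.extract() for s in soup('script')]
print soup
```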
Output:

```
baba
```
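The same method removes any other tag along with its contents; just pass a different tag name, for example `soup('style')`.

You can also replace `[s.extract() for s in soup('script')]` with `[s.extract() for s in soup.findAll('script')]`: calling a soup object directly is shorthand for `findAll`.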
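BeautifulSoup 3 only runs on Python 2. On Python 3 without any third-party package, the same idea (drop a tag and everything inside it) can be sketched with the standard library's `html.parser`. This is a swapped-in technique, not BeautifulSoup's API: `DropTags` and `drop_tags` are hypothetical names, and because the sketch re-serializes what it parses, tag names come out lowercased and end tags are rebuilt rather than copied verbatim.

```python
from html.parser import HTMLParser


class DropTags(HTMLParser):
    """Rebuild HTML while skipping the listed elements and their contents."""

    def __init__(self, drop=("script", "style")):
        # keep entity/char references as separate events so they are re-emitted unchanged
        super().__init__(convert_charrefs=False)
        self.drop = {name.lower() for name in drop}
        self.depth = 0  # > 0 while inside a dropped element
        self.out = []

    def _emit(self, text):
        if self.depth == 0:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in self.drop:
            self.depth += 1
        else:
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br/>; a self-closing dropped tag has no contents
        if tag not in self.drop:
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.drop:
            self.depth = max(0, self.depth - 1)
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def drop_tags(html, drop=("script", "style")):
    """Return `html` with every element named in `drop` removed, contents included."""
    parser = DropTags(drop)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(drop_tags("\nbaba\n<script>hi, world</script>\n").strip())  # prints: baba
```

Counting nesting depth rather than using a single flag keeps the sketch correct when a dropped element (say a `div`) contains another element of the same name.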
## Example 2: Removing comments
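```python
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from BeautifulSoup import BeautifulSoup, Comment

# sample markup: a comment followed by plain text
data = """<!-- a comment -->cat dog sheep goat"""
soup = BeautifulSoup(data)
# select every text node that is a Comment and detach it from the tree
for element in soup(text=lambda text: isinstance(text, Comment)):
    element.extract()
print soup.prettify()
```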
Output:

```
cat dog sheep goat
```
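On Python 3, where BeautifulSoup 3 is unavailable, comment removal can be sketched with the standard library's `html.parser` instead. This is a substitute technique rather than BeautifulSoup's `Comment` API: `HTMLParser` reports each `<!-- ... -->` through `handle_comment`, so a subclass that writes everything else back out and ignores that callback drops the comments. `CommentStripper` and `strip_comments` are hypothetical names, and end tags are rebuilt, so tag names come out lowercased.

```python
from html.parser import HTMLParser


class CommentStripper(HTMLParser):
    """Re-emit HTML unchanged except that <!-- ... --> comments are omitted."""

    def __init__(self):
        # keep entity/char references as separate events so they are re-emitted unchanged
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_comment(self, data):
        pass  # comments are simply not written back out


def strip_comments(html):
    """Return `html` with all comments removed."""
    parser = CommentStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_comments("<!-- a comment -->cat dog sheep goat"))  # prints: cat dog sheep goat
```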