I modified an HTML file with BeautifulSoup by removing some tags. Now I want to write the result back to an HTML file. My code:
from bs4 import BeautifulSoup
from bs4 import Comment

soup = BeautifulSoup(open('1.html'), "html.parser")

# Remove unwanted tags and HTML comments from the tree
for x in soup.find_all(['script', 'style', 'meta', 'noscript']):
    x.extract()
for x in soup.find_all(text=lambda text: isinstance(text, Comment)):
    x.extract()

# Print each top-level node; this output has the layout I want
html = soup.contents
for i in html:
    print(i)

# Write the prettified document to disk
html = soup.prettify("utf-8")
with open("output1.html", "wb") as file:
    file.write(html)
Because I used soup.prettify, it generates HTML like this:
BATAM.TRIBUNNEWS.COM, BINTAN
- Tradisi pedang pora mewarnai serah terima jabatan pejabat di
Polres
Bintan
, Senin (3/10/2016).
The result I want looks like what the print loop outputs:
BATAM.TRIBUNNEWS.COM, BINTAN - Tradisi pedang pora mewarnai serah terima jabatan pejabat di Polres Bintan, Senin (3/10/2016).
Empat perwira baru Senin itu diminta cepat bekerja. Tumpukan pekerjaan rumah sudah menanti di meja masing masing.
How can I write the file so that it matches the output of the print loop (i.e. each tag and its content stay on the same line)? Thanks.
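One possible approach, offered as a minimal sketch rather than a definitive answer: prettify() puts every tag and string on its own line, whereas converting the soup with str() serializes it without adding line breaks. The helper names strip_and_serialize and clean_file below are hypothetical, chosen only for this illustration, and the sketch assumes Python 3 and a Beautiful Soup version that accepts the string= argument (older releases spell it text=, as in the code above).

```python
from bs4 import BeautifulSoup, Comment


def strip_and_serialize(markup):
    """Remove script/style/meta/noscript tags and comments.

    Returns the remaining HTML as a string without pretty-printing.
    (Hypothetical helper name, for illustration only.)
    """
    soup = BeautifulSoup(markup, "html.parser")
    for tag in soup.find_all(["script", "style", "meta", "noscript"]):
        tag.extract()
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    # str(soup) keeps the original inline layout; prettify() would not.
    return str(soup)


def clean_file(src_path, dst_path):
    """Read src_path, clean it, and write the result to dst_path.

    (Hypothetical helper; assumes both files are UTF-8 encoded.)
    """
    with open(src_path, encoding="utf-8") as src:
        cleaned = strip_and_serialize(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(cleaned)
```

With the file names from the question, the call would be clean_file("1.html", "output1.html"). If the file has to be opened in "wb" mode as in the original code, soup.encode("utf-8") should also return the unprettified document, as bytes.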