Modifying an HTML file in Python: how do I save changes made with BeautifulSoup back to the HTML file?

Python noob here...

I have the script below, which modifies the hrefs in an HTML file (in the future it will be a list of HTML files in a directory). Using BeautifulSoup I managed to access the tag values and modify them as I want, but I don't know how to save the changes back to the file. Any help will be greatly appreciated.

import os
import re

from bs4 import BeautifulSoup

htmlDoc = open('adding_computer_c.html',"r+")
soup = BeautifulSoup(htmlDoc)

replacements= [ ('_', '-'), ('../tasks/', prefixUrl), ('../concepts/', prefixUrl) ]

for link in soup.findAll('a', attrs={'href': re.compile("../")}):
    newlink=str(link)
    for k, v in replacements:
        newlink = newlink.replace(k, v)
    extrachars=newlink[newlink.find("."):newlink.find(">")]
    newlink=newlink.replace(extrachars,'')
    link=newlink
    print(link)
    ##How do I save the link I have modified back to the HTML file?

print(soup)##prints the original html tree

htmlDoc.close()

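Solution

Assigning to the loop variable (link=newlink) only rebinds the name; it does not change the tree. Read the href attribute from the tag, modify it, and assign it back to the tag:

newlink = link['href']
# .. make replacements
link['href'] = newlink # store it back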

Now print(soup.prettify()) will show changed links. To save the changes to a file:

htmlDoc.close()

html = soup.prettify("utf-8")
with open("output.html", "wb") as file:
    file.write(html)

To preserve the original character encoding of the document, you could use soup.original_encoding instead of "utf-8". See the Encodings section of the BeautifulSoup documentation.
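Putting the pieces together, here is a minimal end-to-end sketch rather than a definitive implementation. It makes some assumptions that are not in the question: PREFIX_URL is a made-up value (the question never defines prefixUrl), the helper names rewrite_links and rewrite_file are hypothetical, the regex is tightened to match only hrefs that start with a literal "../", and the question's extension-stripping step is left out.

```python
import re
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical value: the question never shows what prefixUrl is.
PREFIX_URL = "/docs/"

REPLACEMENTS = [
    ("_", "-"),
    ("../tasks/", PREFIX_URL),
    ("../concepts/", PREFIX_URL),
]


def rewrite_links(markup, replacements=REPLACEMENTS):
    """Parse markup and rewrite relative hrefs in place; return the soup."""
    soup = BeautifulSoup(markup, "html.parser")
    # Only touch hrefs that start with a literal "../" (the dots are escaped).
    for link in soup.find_all("a", href=re.compile(r"^\.\./")):
        href = link["href"]
        for old, new in replacements:
            href = href.replace(old, new)
        link["href"] = href  # assigning to the attribute updates the tree
    return soup


def rewrite_file(src, dst):
    """Read src as bytes, rewrite its links, and write the result to dst."""
    soup = rewrite_links(Path(src).read_bytes())
    # original_encoding is None when the input was already a str,
    # so fall back to UTF-8 in that case.
    encoding = soup.original_encoding or "utf-8"
    Path(dst).write_bytes(soup.prettify(encoding=encoding))
```

For the directory case mentioned in the question, something like `for path in Path("some_dir").glob("*.html"): rewrite_file(path, path)` should work, where "some_dir" is a placeholder; the file is read completely before it is overwritten. Note that prettify() reindents the markup, so the output will not be byte-for-byte identical to the input apart from the links.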
