我得到一个建议,使用beauthoulsoup从HTML中删除具有特定id的标记。例如,删除下面的
from bs4 import BeautifulSoup
cwd = os.getcwd()
print ('Now you are at this directory: \n' + cwd)
# find files that have an extension with HTML
Files = os.listdir(cwd)
print Files
def func(file):
for file in os.listdir(cwd):
if file.endswith('.html'):
print ('HTML files are \n' + file)
f = open(file, "r+")
soup = BeautifulSoup(f, 'html.parser')
matches = str(soup.find_all("div", id="jp-post-flair"))
#The soup.find_all part should be correct as I tested it to
#print the matches and the result matches the texts I want to delete.
f.write(f.read().replace(matches,''))
#maybe the above line isn't correct
f.close()
func(file)
你能帮我检查一下哪个部分有错误的代码,也许我该怎么处理它?
非常感谢你!!在