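With BeautifulSoup this problem is almost too easy. That said, I have heard that on some pages JSON data is still left in the output after calling bs's text method. I could not find an HTML file that reproduces the problem offhand.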
from bs4 import BeautifulSoup

if __name__ == '__main__':
    with open('tmp.html', 'r', encoding='utf-8') as html:
        bs = BeautifulSoup(html, 'html.parser')
        print(bs.text)
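
If the leftover JSON comes from `<script>` blocks (an assumption, since I have no page that reproduces it), one possible workaround is to remove those tags before extracting the text. Here is a minimal sketch. The function name `visible_text` and the sample HTML are made up for illustration and are not part of the original solution.

```python
from bs4 import BeautifulSoup


def visible_text(html: str) -> str:
    """Return the page text with <script> and <style> contents removed."""
    soup = BeautifulSoup(html, 'html.parser')
    # Drop tags whose contents are code or data rather than readable text.
    for tag in soup(['script', 'style']):
        tag.decompose()
    # Join the remaining text fragments with newlines, trimming each one.
    return soup.get_text(separator='\n', strip=True)


if __name__ == '__main__':
    # Hypothetical sample page: the <script> block holds embedded JSON.
    sample = (
        '<html><head><script type="application/json">{"id": 1}</script></head>'
        '<body><p>Hello</p><p>World</p></body></html>'
    )
    print(visible_text(sample))
```

Whether plain `bs.text` includes script contents may depend on the Beautiful Soup version and parser, so removing the tags explicitly keeps the result the same either way.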