1. Understand the format and content of an HTML file
Overall structure:
<html>
<head>
.....
</head>
<body>
......
</body>
</html>
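To see how this skeleton maps onto what Beautiful Soup exposes, here is a minimal sketch. The sample document, its title, and its paragraphs are made up for illustration; only the html / head / body layout comes from the outline above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, invented for illustration only.
html_doc = """
<html>
<head>
<title>Sample page</title>
</head>
<body>
<p>First paragraph.</p>
<p>Second paragraph.</p>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# <head> holds metadata such as the page title.
title_text = soup.head.title.string

# <body> holds the visible content.
paragraphs = [p.get_text() for p in soup.body.find_all('p')]

print(title_text)
print(paragraphs)
```

soup.head and soup.body give direct access to the two main parts of the tree, which is usually enough to decide where the text of interest lives.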
2. Install Beautiful Soup (pip install beautifulsoup4)
3. Install lxml (pip install lxml)
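from bs4 import BeautifulSoup
import re

# Open in binary mode so Beautiful Soup can detect the file's encoding itself.
with open('E://0000bee3dab9ec4085b36c8f99b34289.html', 'rb') as f:
    soup = BeautifulSoup(f, 'html.parser')

# Print every text string in the document, with surrounding whitespace stripped.
for string in soup.stripped_strings:
    print(repr(string))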
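The script passes 'html.parser' even though lxml is installed in step 3. A possible way to prefer lxml and fall back to the built-in parser is sketched below. The helper name make_soup and the inline sample markup are hypothetical and exist only so the example runs without the file on disk.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Hypothetical helper: use lxml if it is installed, else html.parser."""
    try:
        return BeautifulSoup(markup, 'lxml')
    except FeatureNotFound:
        # lxml is not available, so use the parser from the standard library.
        return BeautifulSoup(markup, 'html.parser')


# Made-up sample markup with extra whitespace around the text.
html_doc = "<html><body><h1>  Title  </h1>\n<p>Some <b>bold</b> text.</p>\n</body></html>"

soup = make_soup(html_doc)

# stripped_strings skips whitespace-only strings and trims the rest.
strings = list(soup.stripped_strings)
print(strings)
```

For well-formed markup like this sample both parsers yield the same strings. On broken HTML they may build different trees, so it is worth fixing one parser explicitly once the choice is made.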
4. Further processing
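The notes do not say what the further processing is. Since the script imports re, one plausible step is cleaning the extracted strings with regular expressions. The sketch below is an assumption, not the original author's method: it collapses runs of whitespace inside each string and drops strings that contain no word characters. The sample markup is made up.

```python
import re
from bs4 import BeautifulSoup

# Made-up sample markup for illustration.
html_doc = (
    "<html><body>"
    "<p>Price:   42\n yuan</p>"
    "<p>---</p>"
    "<p>Contact: 010-1234</p>"
    "</body></html>"
)

soup = BeautifulSoup(html_doc, 'html.parser')

cleaned = []
for s in soup.stripped_strings:
    # Collapse any run of whitespace (spaces, newlines, tabs) into one space.
    s = re.sub(r'\s+', ' ', s)
    # Keep only strings that contain at least one letter, digit, or underscore.
    if re.search(r'\w', s):
        cleaned.append(s)

print(cleaned)
```

The resulting list can then be joined into plain text or written to a file, depending on what the later steps need.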