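We use BeautifulSoup to parse the HTML pages we have crawled.
Documentation (Chinese version): https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html
BeautifulSoup is also a third-party library, so it has to be installed (pip install beautifulsoup4), but Anaconda already ships with it (Anaconda really is powerful).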
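If you are not sure whether your environment (Anaconda or otherwise) already has it, a quick check like the one below can help. This is a minimal sketch using only the standard library's importlib to look for the bs4 package before importing it; the printed messages are just illustrative.

```python
import importlib.util

# Look up the bs4 package without importing it, so a missing install doesn't raise
if importlib.util.find_spec("bs4") is None:
    print("bs4 is not installed; try: pip install beautifulsoup4")
else:
    import bs4
    # bs4 exposes its version string as bs4.__version__
    print("bs4 version:", bs4.__version__)
```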
Testing it:
Take this page as an example: https://python123.io/ws/demo.html
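import requests
# beautifulsoup4 is the full name of the library; it is imported as bs4, and we take the BeautifulSoup class from it
from bs4 import BeautifulSoup
r = requests.get('https://python123.io/ws/demo.html')
r.raise_for_status()  # raise an exception if the request failed (4xx/5xx)
demo = r.text  # the page source as a string
# 'html.parser' selects Python's built-in HTML parser
soup = BeautifulSoup(demo, 'html.parser')
# prettify() returns the parsed document re-indented, one tag per line
print(soup.prettify())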
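Once the page is parsed, the soup object can be navigated instead of just printed. The sketch below runs offline: it parses a small hypothetical HTML string (not the real content of demo.html) with the same 'html.parser' and pulls out the title text, every link's href, and the first p tag's class attribute using soup.title, find_all() and attribute indexing.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration; it is NOT the actual demo.html page
html = """
<html><head><title>Demo page</title></head>
<body>
<p class="title"><b>Hello</b></p>
<p class="course">Links:
<a href="http://example.com/a" id="link1">A</a> and
<a href="http://example.com/b" id="link2">B</a>
</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Text inside the first <title> tag
title = soup.title.string
# href attribute of every <a> tag, in document order
links = [a["href"] for a in soup.find_all("a")]
# class of the first <p> tag; bs4 returns class as a list because it is multi-valued
first_p_class = soup.p["class"]

print(title)
print(links)
print(first_p_class)
```

With the real page you would pass demo from the code above instead of the hypothetical html string.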