Contents
1 HTML Basics and BS4
<tag attribute="attribute value">marked-up content</tag>
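BS4 parsing works by locating a tag that can be identified uniquely (by its name and attributes) and then extracting the content of interest from it.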
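As a rough illustration of the tag / attribute / content structure described above, the sketch below uses Python's standard-library html.parser module (the parser BS4 drives when 'html.parser' is specified) rather than BS4 itself. The TagPrinter class and the sample link are made up for this example.

```python
from html.parser import HTMLParser


class TagPrinter(HTMLParser):
    """Hypothetical helper: collects (tag, attributes, text) triples."""

    def __init__(self):
        super().__init__()
        self.records = []   # finished (tag, attrs, text) triples
        self._open = []     # stack of [tag, attrs, text] for tags still open

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self._open.append([tag, dict(attrs), ""])

    def handle_data(self, data):
        # Text between an opening and closing tag is the marked-up content
        if self._open:
            self._open[-1][2] += data

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            name, attrs, text = self._open.pop()
            self.records.append((name, attrs, text.strip()))


parser = TagPrinter()
parser.feed('<a href="https://example.com">Example link</a>')
parser.close()
print(parser.records)
# [('a', {'href': 'https://example.com'}, 'Example link')]
```

Here "a" is the tag, href is the attribute, and "Example link" is the marked-up content; BS4 exposes the same three pieces through a friendlier interface.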
pip install bs4
2 BS4 Test Case
First, pass the page source to a BeautifulSoup object:
page = BeautifulSoup(resp.text, 'html.parser')
If the html.parser argument is omitted, the following warning is issued:
UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
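A minimal sketch of the difference, assuming bs4 is installed. The html string is a made-up stand-in for resp.text, and the warnings module is used only to record what BS4 emits in each case.

```python
import warnings

from bs4 import BeautifulSoup

# Hypothetical stand-in for resp.text
html = "<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"

# Case 1: the parser is named explicitly, so BS4 has nothing to guess
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    page = BeautifulSoup(html, "html.parser")
explicit_warnings = [w for w in caught if issubclass(w.category, UserWarning)]

# Case 2: no parser is given, so BS4 picks one and warns about the guess
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    guessed_page = BeautifulSoup(html)
guessed_warnings = [w for w in caught if issubclass(w.category, UserWarning)]

print(len(explicit_warnings))   # 0
print(len(guessed_warnings) > 0)
print(page.title.string)        # Demo page
```

Naming the parser also keeps the parse result consistent across machines, since the parser BS4 would otherwise choose depends on which parser libraries happen to be installed.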
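To look up data in page, use the find method. It can be called in the following two ways:
page.find("table", class_="hq_table")  # Note: class is a Python keyword, so an underscore is appended after it to avoid a conflict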