201211_Python Continuing Education_3

aZhuchooong

Article link: https://blog.csdn.net/aZhuchooong/article/details/111054547

The BeautifulSoup Library

Installation - it is a third-party library, so it has to be installed from the terminal

pip3 install BeautifulSoup4

Import

from bs4 import BeautifulSoup

Parsing data

test = BeautifulSoup(content.text,'html.parser')
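# test is a BeautifulSoup (bs for short) object
# content.text, the first argument, is the markup to parse, passed here as a string
# 'html.parser', the second argument, names the parser to use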
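
As a minimal sketch of the parsing step, the snippet below parses an inline HTML string instead of a downloaded page. The `html_doc` string and the name `soup` are made up for illustration; in the notes above the markup comes from `content.text`, which is presumably the text of a response fetched earlier (for example with `requests`).

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for content.text
html_doc = "<html><head><title>Book list</title></head><body><p>Hello</p></body></html>"

# Parse the string with Python's built-in HTML parser
soup = BeautifulSoup(html_doc, 'html.parser')

print(type(soup).__name__)   # BeautifulSoup
print(soup.title.text)       # Book list
```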

Extracting Tag objects - find() / find_all()

test = BeautifulSoup(content.text,'html.parser')

item = test.find('div',class_='books')
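# From the parsed bs object, extract only the first match, i.e. the first <div> tag whose class is 'books'
# item is now a Tag object (or None if nothing matches)
# Usage of find(): bs_object.find(tag_name, attribute)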

item = test.find_all('div',class_='books')
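# From the parsed bs object, extract every match, i.e. all <div> tags whose class is 'books'
# item is now a ResultSet object, a list of all the matching Tag objects
# Usage of find_all(): bs_object.find_all(tag_name, attribute)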

for i in item:
    # Each i is a Tag object
    # Looping over the ResultSet yields Tag objects one at a time, each the same kind of object that find() returns
    print(i)
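
The difference between the two calls can be sketched on a small made-up document. The markup, the `<h2>` titles, and the variable names below are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two matching <div> tags and one non-matching
html_doc = """
<div class="books"><h2>Book A</h2></div>
<div class="books"><h2>Book B</h2></div>
<div class="other"><h2>Not a book</h2></div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

first = soup.find('div', class_='books')          # first match only
all_books = soup.find_all('div', class_='books')  # every match

print(type(first).__name__)      # Tag
print(type(all_books).__name__)  # ResultSet
print(len(all_books))            # 2

# Each element of the ResultSet is a Tag, so find() works on it too
titles = [div.find('h2').text for div in all_books]
print(titles)                    # ['Book A', 'Book B']

# find() returns None when nothing matches, so check before using the result
missing = soup.find('div', class_='magazines')
print(missing is None)           # True
```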

Extracting information from a Tag object - Tag methods and attributes

tag.find()/tag.find_all()
# Get the Tags nested inside a Tag object

tag.text
# Get the text inside a Tag object

tag['attribute_name']
# Get the value of this attribute on the Tag object
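
A short sketch of these three operations on one made-up tag. The markup and names are hypothetical; `Tag.get()` is shown as well because indexing with a missing attribute name raises `KeyError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a link nested inside a <div>
html_doc = '<div class="books"><a href="/book/1" title="Book A">Book A <em>(new)</em></a></div>'
soup = BeautifulSoup(html_doc, 'html.parser')

div = soup.find('div', class_='books')
link = div.find('a')       # a Tag found inside another Tag

print(link.text)           # Book A (new)
print(link['href'])        # /book/1
print(link.get('rel'))     # None, whereas link['rel'] would raise KeyError
```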

Extracting information from a Tag object - chaining through several nested Tags
```
items = test.find('ul',class_='nav').find('ul').find_all('li')
```
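When printing the final result, you can use str.strip() to remove leading and trailing whitespace.

For example, calling .strip() on ' 我是吴枫\n' removes the space before the text and the newline after it.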
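
The chained lookup and the cleanup with `.strip()` can be sketched together. The nested list below is invented for illustration, and the chain assumes every intermediate `find()` succeeds; if one returned `None`, the next call would raise `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a nav list containing a nested list
html_doc = ('<ul class="nav"><li>Menu<ul>'
            '<li> First item\n</li><li> Second item\n</li>'
            '</ul></li></ul>')
soup = BeautifulSoup(html_doc, 'html.parser')

# Outer <ul class="nav">, then the <ul> nested inside it, then its <li> tags
items = soup.find('ul', class_='nav').find('ul').find_all('li')

print(repr(items[0].text))          # ' First item\n'
print(repr(items[0].text.strip()))  # 'First item'

cleaned = [li.text.strip() for li in items]
print(cleaned)                      # ['First item', 'Second item']
```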

Extracting information from a Tag object - one attribute of a tag holds several values

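# Example: in <p class="star-rating Two">, the 'class' attribute of the <p> tag holds two values, 'star-rating' and 'Two'
# To extract just one of them, index into the list: 'star-rating' is value 0 and 'Two' is value 1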
grade = item.find('p',class_='star-rating')['class'][1]
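
A self-contained sketch of the same lookup. The surrounding `<article>` element is an assumption added so that `item` has something to refer to. BeautifulSoup returns the `class` attribute as a list of strings, which is why indexing with `[1]` works.

```python
from bs4 import BeautifulSoup

# Hypothetical markup around the <p class="star-rating Two"> example
html_doc = '<article class="books"><p class="star-rating Two"></p></article>'
soup = BeautifulSoup(html_doc, 'html.parser')

item = soup.find('article', class_='books')

# class_='star-rating' matches because it is one of the tag's class values
classes = item.find('p', class_='star-rating')['class']
print(classes)   # ['star-rating', 'Two']

grade = classes[1]
print(grade)     # Two
```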
