Installing and Using BeautifulSoup

QINGMU150

Category: Python. Tags: crawler tools

Article link: https://blog.csdn.net/QINGMU150/article/details/86412193

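1. Installing BeautifulSoup

BeautifulSoup is distributed on PyPI as the beautifulsoup4 package and is imported under the name bs4. To install it, enter pip install beautifulsoup4 in the console. pip install bs4 also works, because the bs4 package on PyPI is a thin wrapper that pulls in beautifulsoup4.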
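To confirm that the installation worked, a quick check along these lines can be run. It is only a sketch and assumes the package was installed into the same interpreter that runs the script.

```python
# Sanity check: the library is imported under the name bs4.
import bs4
from bs4 import BeautifulSoup

print(bs4.__version__)  # version string of the installed library

# html.parser ships with Python, so no extra parser package is needed here.
soup = BeautifulSoup("<p>ok</p>", "html.parser")
print(soup.p.string)
```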

2. Parsing a Web Page

(1) Without specifying a parser

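When no parser is named, BeautifulSoup picks the best parser that is installed and issues a warning saying which one it chose. The output below is what lxml produces; with only Python's built-in html.parser available, the text comes back without the added html, body and p tags.

>>> from bs4 import BeautifulSoup
>>> BeautifulSoup("my first web")
<html><body><p>my first web</p></body></html>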

(2) Specifying a parser

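The parser can be named explicitly as the second argument. The document can be parsed as HTML ('html.parser' or 'lxml') or as XML ('xml'). Note that 'lxml' by itself selects lxml's HTML parser, not its XML parser.

soup1 = BeautifulSoup(html, 'html.parser')  # Python's built-in HTML parser
soup2 = BeautifulSoup(html, 'lxml')         # lxml's HTML parser (requires lxml)
soup3 = BeautifulSoup(html, 'xml')          # lxml's XML parser (requires lxml)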
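The difference between parsers shows up most clearly on incomplete markup. The sketch below feeds the same bare text fragment to html.parser and, if it happens to be installed, to lxml. The variable names are illustrative, and the lxml branch is wrapped in a try block because lxml is an optional dependency.

```python
from bs4 import BeautifulSoup, FeatureNotFound

markup = "my first web"  # a bare text fragment, not a complete document

# html.parser leaves the fragment as it is.
plain = BeautifulSoup(markup, "html.parser")
print(plain)

# lxml repairs the fragment into a full document by adding
# <html>, <body> and <p> around the text. Fall back if it is missing.
try:
    wrapped = BeautifulSoup(markup, "lxml")
    print(wrapped)
except FeatureNotFound:
    wrapped = None
    print("lxml is not installed")
```

Naming the parser explicitly keeps the result the same across machines, since the automatic choice depends on which parsers are installed.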

3. Traversing Nodes

BeautifulSoup converts a complex HTML document into a tree structure in which every node is a Python object.

(1) Accessing nodes with dot notation

>>> string = '<html><body><p>my first string</p><p>second paragraph</p></body></html>'
>>> soup = BeautifulSoup(string, 'html.parser')
>>> soup.p.contents  # locate the first p node through the soup object and get its children
['my first string']
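Dot access can also be chained, and there are several ways to get at the text of the node it returns. The following sketch reuses the same two-paragraph document; the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

html_doc = "<html><body><p>my first string</p><p>second paragraph</p></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

first_p = soup.p            # first <p> anywhere in the document
same_p = soup.body.p        # dot access can be chained down the tree

print(first_p.contents)     # list of the tag's direct children
print(first_p.string)       # the single text child of the tag
print(first_p.get_text())   # all text below the tag, as a plain str
print(soup.table)           # a tag that does not occur gives None
```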

Dot access only returns the first Tag with the given name. What if you want every node with that name?

(2) Finding nodes with find_all

>>> string = '<html><body><p>my first string</p><p>second paragraph</p></body></html>'
>>> soup = BeautifulSoup(string, 'html.parser')
>>> soup.find_all('p')
[<p>my first string</p>, <p>second paragraph</p>]
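find_all returns a list-like result that can be looped over, and it accepts filters beyond the tag name. The sketch below shows a few common ones. The class="intro" attribute is added to the sample markup purely for illustration and is not part of the original example.

```python
from bs4 import BeautifulSoup

# Sample markup; the "intro" class is a hypothetical addition.
html_doc = ('<html><body><p class="intro">my first string</p>'
            '<p>second paragraph</p></body></html>')
soup = BeautifulSoup(html_doc, "html.parser")

paragraphs = soup.find_all("p")                  # every <p> in the document
texts = [p.get_text() for p in paragraphs]       # text of each match
intro = soup.find_all("p", class_="intro")       # filter by CSS class
first_only = soup.find_all("p", limit=1)         # stop after one match
missing = soup.find_all("table")                 # no match gives an empty list
first = soup.find("p")                           # first match, or None

print(texts)
```

find is the single-result counterpart of find_all and returns the same node that dot access would.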

4. Getting Information from a Tag

(1) The name of a Tag

Tag.name returns the name of the tag object.
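As a small illustration, the name can be both read and assigned. This is a sketch on a one-line document; assigning to name renames the tag in the generated markup.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>my first string</p>", "html.parser")
tag = soup.p

original_name = tag.name   # the tag's name as a string
print(original_name)

tag.name = "div"           # renaming the tag changes the output markup
print(soup)
```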

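(2) The attributes of a Tag

Tag.attrs returns a dictionary of all the attributes of the Tag object, mapping each attribute name to its value.

The value of a specific attribute can be read and changed with Tag['attribute name'], for example: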

>>> string = '<html><body><p class="item" id="article" name="text1">my first string</p></body></html>'
>>> soup = BeautifulSoup(string, 'html.parser')
>>> tag = soup.p
>>> tag.attrs
{'class': ['item'], 'id': 'article', 'name': 'text1'}
>>> tag['class']  # get the tag's class attribute
['item']
>>> tag['class'] = "my_item"  # change the class attribute
>>> tag.attrs  # view the attributes after the change
{'class': 'my_item', 'id': 'article', 'name': 'text1'}
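Because a tag behaves like a dictionary for its attributes, the usual dictionary operations apply: reading with a default, adding, and deleting. The sketch below uses a made-up paragraph with two class values to show that class comes back as a list while id comes back as a string.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with a multi-valued class attribute.
html_doc = '<p class="item big" id="article">my first string</p>'
tag = BeautifulSoup(html_doc, "html.parser").p

classes = tag["class"]        # multi-valued attribute: a list of values
tag_id = tag["id"]            # single-valued attribute: a str
title = tag.get("title")      # get() returns None instead of raising KeyError

tag["id"] = "renamed"         # change an existing attribute
tag["title"] = "added"        # add a new attribute
del tag["class"]              # remove an attribute

print(tag.attrs)
print(tag)
```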

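(3) Checking whether a Tag has an attribute

Tag.has_attr('attribute name') returns True if the Tag has that attribute and False otherwise.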
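A short sketch of has_attr in use, on a made-up fragment where only one paragraph carries an id. It also shows has_attr used as a filter, since find_all accepts a function that takes a tag and returns True or False.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: only the first <p> has an id.
html_doc = '<div><p id="article">first</p><p>second</p></div>'
soup = BeautifulSoup(html_doc, "html.parser")

first, second = soup.find_all("p")
print(first.has_attr("id"))
print(second.has_attr("id"))

# Keep only the tags that carry an id attribute.
with_id = soup.find_all(lambda t: t.has_attr("id"))
print(with_id)
```

The same filter can be written as soup.find_all(id=True), which matches any tag that has an id regardless of its value.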
