A Detailed Guide to the BeautifulSoup Library for Python Web Scraping (Python Development Knowledge System)

Latest recommended article published on 2024-04-04 01:34:55

Web学习笔记


Column: 2024 Programmer Learning. Tags: python, web scraping, beautifulsoup

Article link: https://blog.csdn.net/qq194582923/article/details/136895665


html = """
<html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<p>
Once upon a time there were three little sisters; and their names were
<a>
<span>Elsie</span>
</a>
<a>Lacie</a>
and
<a>Tillie</a>
and they lived at the bottom of a well.
</p>
<p>...</p>
"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# .children iterates over the direct children of the first <p> only
print(soup.p.children)
for i, child in enumerate(soup.p.children):
    print(i, child)

Output:

<list_iterator object at 0x1064f7dd8>
0 
Once upon a time there were three little sisters; and their names were

1 <a>
<span>Elsie</span>
</a>
2 

3 <a>Lacie</a>
4 
and

5 <a>Tillie</a>
6 
and they lived at the bottom of a well.

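html = """
<html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<p>
Once upon a time there were three little sisters; and their names were
<a>
<span>Elsie</span>
</a>
<a>Lacie</a>
and
<a>Tillie</a>
and they lived at the bottom of a well.
</p>
<p>...</p>
"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# .descendants walks every node below the first <p>, depth-first
print(soup.p.descendants)
for i, child in enumerate(soup.p.descendants):
    print(i, child)

Output (the first print shows a generator object; the loop then prints):

0 
Once upon a time there were three little sisters; and their names were

1 <a>
<span>Elsie</span>
</a>
2 

3 <span>Elsie</span>
4 Elsie
5 

6 

7 <a>Lacie</a>
8 Lacie
9 
and

10 <a>Tillie</a>
11 Tillie
12 
and they lived at the bottom of a well.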
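The difference between the two attributes is easier to see on a smaller input. The following is a minimal sketch, not the article's own example: the one-line markup is hypothetical, and it assumes the standard-library `html.parser` backend instead of `lxml` so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical one-line markup, so no whitespace-only text nodes
# appear between the tags.
html = "<p>Names: <a><span>Elsie</span></a> and <a>Lacie</a></p>"

# html.parser ships with Python; the article itself uses 'lxml'.
soup = BeautifulSoup(html, "html.parser")

# Direct children of <p>: text, <a>, text, <a>
children = list(soup.p.children)

# Every node below <p>, depth-first: this also reaches <span>
# and the strings nested inside the links.
descendants = list(soup.p.descendants)

child_tags = [node.name for node in children if isinstance(node, Tag)]
descendant_tags = [node.name for node in descendants if isinstance(node, Tag)]

print(len(children), child_tags)        # 4 ['a', 'a']
print(len(descendants), descendant_tags)  # 7 ['a', 'span', 'a']
```

Only `.descendants` reports the nested `<span>`, because `.children` stops at the first level.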

3.7 Parent and Ancestor Nodes

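html = """
<html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<p>
Once upon a time there were three little sisters; and their names were
<a>
<span>Elsie</span>
</a>
<a>Lacie</a>
and
<a>Tillie</a>
and they lived at the bottom of a well.
</p>
<p>...</p>
"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# .parent returns the direct parent of the first <a>, here the <p> element
print(soup.a.parent)

Output:

<p>
Once upon a time there were three little sisters; and their names were
<a>
<span>Elsie</span>
</a>
<a>Lacie</a>
and
<a>Tillie</a>
and they lived at the bottom of a well.
</p>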

html = """
<html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<p>
Once upon a time there were three little sisters; and their names were
<a>
<span>Elsie</span>
</a>
<a>Lacie</a>
and
<a>Tillie</a>
and they lived at the bottom of a well.
</p>
<p>...</p>
"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# .parents walks upward from the first <a> through every ancestor
print(list(enumerate(soup.a.parents)))

Output (abridged; the full printout repeats the markup of each ancestor):

[(0, <p>...</p>), (1, <body>...</body>), (2, <html>...</html>), (3, <html>...</html>)]

Entry 0 is the enclosing <p>, entry 1 is <body> (which also contains the second paragraph), entry 2 is <html> (which adds the title), and entry 3 is the BeautifulSoup document object, which prints the same markup as <html>.
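Because each ancestor prints its entire subtree, the chain is easier to read when only the tag names are collected. A small sketch, assuming a shortened hypothetical version of the document and the standard-library `html.parser` backend:

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the story markup.
html = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p><a>Elsie</a></p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# .parent is the single direct parent of the first <a>.
direct_parent = soup.a.parent.name

# .parents yields every ancestor, ending with the document object,
# whose name is the special string '[document]'.
ancestor_names = [ancestor.name for ancestor in soup.a.parents]

print(direct_parent)    # p
print(ancestor_names)   # ['p', 'body', 'html', '[document]']
```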

3.8 Sibling Nodes

html = """
<html>
<head>
<title>The Dormouse's story</title>
</head>
<body>
<p>
Once upon a time there were three little sisters; and their names were
<a>
<span>Elsie</span>
</a>
<a>Lacie</a>
and
<a>Tillie</a>
and they lived at the bottom of a well.
</p>
<p>...</p>
"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# Siblings of the first <a>: nodes after it, then nodes before it
print(list(enumerate(soup.a.next_siblings)))
print(list(enumerate(soup.a.previous_siblings)))

Output:

[(0, '\n'), (1, <a>Lacie</a>), (2, '\nand\n'), (3, <a>Tillie</a>), (4, '\nand they lived at the bottom of a well.\n')]
[(0, '\nOnce upon a time there were three little sisters; and their names were\n')]
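As the output shows, the sibling lists include plain strings (often just whitespace) as well as tags. When only the element siblings are wanted, one option is to filter on `Tag` or to call `find_next_siblings`. This is a sketch on hypothetical shortened markup, using `html.parser` rather than `lxml`:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical shortened version of the story paragraph.
html = """<p>Once upon a time
<a>Elsie</a>
<a>Lacie</a>
and
<a>Tillie</a>
the end.</p>"""
soup = BeautifulSoup(html, "html.parser")

first_link = soup.a

# Every following sibling, strings included.
all_next = list(first_link.next_siblings)

# Option 1: keep only Tag objects.
tag_siblings = [node for node in all_next if isinstance(node, Tag)]

# Option 2: let bs4 search the following siblings for <a> tags.
link_siblings = first_link.find_next_siblings("a")

print(len(all_next))                               # 5
print([tag.get_text() for tag in tag_siblings])    # ['Lacie', 'Tillie']
print([tag.get_text() for tag in link_siblings])   # ['Lacie', 'Tillie']
```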

4 Standard Selectors

4.1 find_all(name, attrs, recursive, text, **kwargs)

Searches the document by tag name, attributes, or text content.
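The three kinds of lookup can be sketched side by side before going through them one at a time. The markup here is a hypothetical shortened list (the `list-2` id is an assumption made for illustration), the parser is the standard-library `html.parser`, and the text lookup uses `string`, which is the name newer Beautiful Soup releases give to the `text` argument in the signature above.

```python
from bs4 import BeautifulSoup

# Hypothetical shortened markup; the id "list-2" is made up.
html = """
<ul id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
<ul id="list-2">
<li class="element">Foo</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# By tag name: every <li> in the document.
by_name = soup.find_all("li")

# By attribute: every tag whose id equals "list-1".
by_attr = soup.find_all(attrs={"id": "list-1"})

# By text content: every string that is exactly "Foo".
by_text = soup.find_all(string="Foo")

print(len(by_name))                  # 3
print([tag.name for tag in by_attr])  # ['ul']
print(by_text)                       # ['Foo', 'Foo']
```

Note that the text lookup returns the matching strings themselves, not the tags that contain them.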

4.1.1 name

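html = '''
<h4>Hello</h4>
<ul id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul>
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# find_all returns a list-like result whose items are Tag objects
print(soup.find_all('ul'))
print(type(soup.find_all('ul')[0]))

Output:

[<ul id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>, <ul>
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>]
<class 'bs4.element.Tag'>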

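html = '''
<h4>Hello</h4>
<ul id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul>
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# Each result is a Tag, so find_all can be called again on it
for ul in soup.find_all('ul'):
    print(ul.find_all('li'))

Output:

[<li class="element">Foo</li>, <li class="element">Bar</li>, <li class="element">Jay</li>]
[<li class="element">Foo</li>, <li class="element">Bar</li>]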

4.1.2 attrs

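html = '''
<h4>Hello</h4>
<ul id="list-1" name="elements">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul>
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# attrs takes a dict of attribute names and values to match.
# The dict form is needed for an HTML attribute called "name",
# because name= already means the tag name in find_all.
print(soup.find_all(attrs={'id': 'list-1'}))
print(soup.find_all(attrs={'name': 'elements'}))

Output:

[<ul id="list-1" name="elements">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>]
[<ul id="list-1" name="elements">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>]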
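
html = '''
<h4>Hello</h4>
<ul id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul>
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# Common attributes can be passed as keyword arguments;
# class is a Python keyword, so the argument is spelled class_
print(soup.find_all(id='list-1'))
print(soup.find_all(class_='element'))

Output:

[<ul id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>]
[<li class="element">Foo</li>, <li class="element">Bar</li>, <li class="element">Jay</li>, <li class="element">Foo</li>, <li class="element">Bar</li>]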

4.1.3 text

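html = '''
<h4>Hello</h4>
<ul id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul>
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# Matches on text content and returns the strings, not the tags.
# Newer bs4 releases call this argument string; text is the older spelling.
print(soup.find_all(text='Foo'))

Output:

['Foo', 'Foo']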

4.2 find(name, attrs, recursive, text, **kwargs)

find returns a single element (the first match), whereas find_all returns all matching elements.

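html = '''
<h4>Hello</h4>
<ul id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul>
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
'''

from bs4 import BeautifulSoup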
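The difference between `find` and `find_all` stated above can be sketched as follows. This is an illustrative example on a shortened hypothetical list, parsed with the standard-library `html.parser`; it is not the remainder of the article's own listing.

```python
from bs4 import BeautifulSoup

# Hypothetical shortened list markup.
html = """<ul id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>"""
soup = BeautifulSoup(html, "html.parser")

# find returns the first matching Tag only.
first_li = soup.find("li")

# find_all returns every match in a list-like ResultSet.
all_li = soup.find_all("li")

# With no match, find gives None while find_all gives an empty result.
missing = soup.find("table")
missing_all = soup.find_all("table")

print(first_li.get_text())   # Foo
print(len(all_li))           # 2
print(missing)               # None
print(len(missing_all))      # 0
```

Because `find` can return `None`, code that chains calls such as `soup.find('table').find('tr')` should check the first result before using it.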