A Detailed Guide to the BeautifulSoup Library for Python Web Scraping

Tillie

;

and they lived at the bottom of a well.

The Dormouse’s story

3. Tag selectors


3.1 Selecting elements

html = """

The Dormouse's story

The Dormouse's story

Once upon a time there were three little sisters; and their names were

,

Lacie and

Tillie;

and they lived at the bottom of a well.

...

"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.title)

print(type(soup.title))

print(soup.head)

print(soup.p)

The Dormouse's story

<class 'bs4.element.Tag'>

The Dormouse's story

The Dormouse's story
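As a minimal sketch of attribute-style selection, the snippet below uses a shortened, hypothetical version of the markup and the built-in `html.parser` (an assumption, chosen so the example runs without lxml installed). It illustrates that `soup.<tag>` returns only the first matching tag, and `None` when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, kept on one logical line so no whitespace nodes appear.
html = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p name='dromouse'>The Dormouse's story</p>"
    "<p>...</p></body></html>"
)

soup = BeautifulSoup(html, "html.parser")

title = soup.title      # first <title> tag in the document
first_p = soup.p        # first <p> only; the second <p> is not returned
missing = soup.table    # there is no <table>, so this is None

print(title)
print(type(title).__name__)
print(first_p.get_text())
print(missing)
```

Because a missing tag yields `None` rather than an error, it is usually worth checking the result before chaining further attribute access on it.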

3.2 Getting the tag name

html = """

The Dormouse's story

The Dormouse's story

Once upon a time there were three little sisters; and their names were

,

Lacie and

Tillie;

and they lived at the bottom of a well.

...

"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.title.name)

title

3.3 Getting attributes

html = """

The Dormouse's story

The Dormouse's story

Once upon a time there were three little sisters; and their names were

,

Lacie and

Tillie;

and they lived at the bottom of a well.

...

"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.p.attrs['name'])

print(soup.p['name'])

dromouse

dromouse
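The two lookups above are equivalent. The sketch below, on a hypothetical one-line `<p>` tag parsed with the built-in `html.parser` (an assumption, so it runs without lxml), also shows the tag name, the list returned for a multi-valued attribute such as `class`, and `get()` as a way to avoid a `KeyError` for a missing attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical tag with a multi-valued class attribute.
html = "<p class='title big' name='dromouse'>The Dormouse's story</p>"
soup = BeautifulSoup(html, "html.parser")
p = soup.p

tag_name = p.name              # name of the tag itself
name_attr = p["name"]          # raises KeyError if the attribute is absent
same_attr = p.attrs["name"]    # same lookup through the attrs dict
classes = p["class"]           # class is multi-valued, so this is a list
no_id = p.get("id")            # returns None instead of raising KeyError

print(tag_name, name_attr, same_attr, classes, no_id)
```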

3.4 Getting the content

html = """

The Dormouse's story

The Dormouse's story

Once upon a time there were three little sisters; and their names were

,

Lacie and

Tillie;

and they lived at the bottom of a well.

...

"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.p.string)

The Dormouse’s story
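One detail worth illustrating: `.string` only returns text when a tag has a single child (or a single chain of single children). The sketch below uses hypothetical markup and the built-in `html.parser` (an assumption) to contrast `.string` with `get_text()`, which concatenates all nested text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <p> has one child, the second has two.
html = "<p><b>The Dormouse's story</b></p><p>Lacie and <a>Tillie</a></p>"
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("p")

single = first.string         # text found through the single <b> child
ambiguous = second.string     # None, because <p> has more than one child
full_text = second.get_text() # joins the text of every descendant

print(single)
print(ambiguous)
print(full_text)
```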

3.5 Nested selection

html = """

The Dormouse's story

The Dormouse's story

Once upon a time there were three little sisters; and their names were

,

Lacie and

Tillie;

and they lived at the bottom of a well.

...

"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.head.title.string)

The Dormouse’s story
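Chained selection works because each step returns a `Tag` that can be queried again. A rough sketch follows, on hypothetical markup with the built-in `html.parser` (an assumption); the guard at the end is one possible way to handle a step that returns `None`.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened document.
html = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p>...</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Each attribute access returns a Tag, so selections can be chained.
title_text = soup.head.title.string

# A missing tag gives None, and chaining on None would raise AttributeError,
# so check the intermediate result first.
table = soup.body.table
cell_text = table.td.string if table is not None else None

print(title_text)
print(cell_text)
```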

3.6 Children and descendants

html = """

The Dormouse's story

Once upon a time there were three little sisters; and their names were

Elsie

Lacie

and

Tillie

and they lived at the bottom of a well.

...

"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.p.contents)

['\n Once upon a time there were three little sisters; and their names were\n ',

Elsie

, '\n', Lacie, ' \n and\n ', Tillie, '\n and they lived at the bottom of a well.\n ']

html = """

The Dormouse's story

Once upon a time there were three little sisters; and their names were

Elsie

Lacie

and

Tillie

and they lived at the bottom of a well.

...

"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.p.children)

for i, child in enumerate(soup.p.children):
    print(i, child)

<list_iterator object at 0x1064f7dd8>

0

Once upon a time there were three little sisters; and their names were

1

Elsie

2

3 Lacie

4

and

5 Tillie

6

and they lived at the bottom of a well.
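To make the relationship between `.contents` and `.children` concrete, here is a small sketch on hypothetical whitespace-free markup, parsed with the built-in `html.parser` (an assumption). It shows that both expose the same direct children, one as a list and one as an iterator, and that text nodes can be filtered out with an `isinstance` check.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup without line breaks, so no whitespace-only nodes appear.
html = "<p>Once <a><span>Elsie</span></a> and <a>Lacie</a></p>"
soup = BeautifulSoup(html, "html.parser")
p = soup.p

contents = p.contents          # a list of the direct children
children = list(p.children)    # the same nodes, produced by an iterator

# Direct children mix text nodes and tags; keep only the tags.
child_tags = [child for child in p.children if isinstance(child, Tag)]
link_texts = [tag.get_text() for tag in child_tags]

print(len(contents))
print(link_texts)
```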

html = """

The Dormouse's story

Once upon a time there were three little sisters; and their names were

Elsie

Lacie

and

Tillie

and they lived at the bottom of a well.

...

"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.p.descendants)

for i, child in enumerate(soup.p.descendants):
    print(i, child)

<generator object descendants at 0x10650e678>

0

Once upon a time there were three little sisters; and their names were

1

Elsie

2

3 Elsie

4 Elsie

5

6

7 Lacie

8 Lacie

9

and

10 Tillie

11 Tillie

12

and they lived at the bottom of a well.
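The difference from `.children` is that `.descendants` walks the whole subtree depth-first, so a nested tag and its text each appear as separate items. A minimal sketch, assuming hypothetical markup and the built-in `html.parser`:

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup: the first <a> wraps a <span>, the second holds plain text.
html = "<p>Once <a><span>Elsie</span></a> and <a>Lacie</a></p>"
soup = BeautifulSoup(html, "html.parser")
p = soup.p

direct = list(p.children)      # only the nodes directly under <p>
nested = list(p.descendants)   # every node in the subtree, depth-first

# Names of the tags encountered, in document order.
tag_names = [node.name for node in nested if isinstance(node, Tag)]

print(len(direct), len(nested))
print(tag_names)
```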

3.7 Parent and ancestors

html = """

The Dormouse's story

Once upon a time there were three little sisters; and their names were

Elsie

Lacie

and

Tillie

and they lived at the bottom of a well.

...

"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.a.parent)

Once upon a time there were three little sisters; and their names were

Elsie

Lacie

and

Tillie

and they lived at the bottom of a well.

html = """

The Dormouse's story

Once upon a time there were three little sisters; and their names were

Elsie

Lacie

and

Tillie

and they lived at the bottom of a well.

...

"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(list(enumerate(soup.a.parents)))

[(0,

Once upon a time there were three little sisters; and their names were

Elsie

Lacie

and

Tillie

and they lived at the bottom of a well.

), (1,

Once upon a time there were three little sisters; and their names were

Elsie

Lacie

and

Tillie

and they lived at the bottom of a well.

...

), (2, The Dormouse's story

Once upon a time there were three little sisters; and their names were

Elsie

Lacie

and

Tillie

and they lived at the bottom of a well.

...

), (3, The Dormouse's story

Once upon a time there were three little sisters; and their names were

Elsie

Lacie

and

Tillie

and they lived at the bottom of a well.

...

)]
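Printing whole ancestors is verbose, so the sketch below collects only their names. It assumes hypothetical markup and the built-in `html.parser`; the last ancestor is the `BeautifulSoup` object itself, whose name is `[document]`, which matches the four entries in the output above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a single link nested three levels deep.
html = "<html><body><p><a>Elsie</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")
link = soup.a

parent_name = link.parent.name   # the immediate parent only

# .parents walks upward from the immediate parent to the document root.
ancestor_names = [ancestor.name for ancestor in link.parents]

print(parent_name)
print(ancestor_names)
```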

3.8 Siblings

html = """

The Dormouse's story

Once upon a time there were three little sisters; and their names were

Elsie

Lacie

and

Tillie

and they lived at the bottom of a well.

...

"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(list(enumerate(soup.a.next_siblings)))

print(list(enumerate(soup.a.previous_siblings)))

[(0, '\n'), (1, Lacie), (2, ' \n and\n '), (3, Tillie), (4, '\n and they lived at the bottom of a well.\n ')]

[(0, '\n Once upon a time there were three little sisters; and their names were\n ')]
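As the output shows, siblings include text nodes as well as tags. The following sketch, on hypothetical markup with the built-in `html.parser` (an assumption), filters the siblings down to tags and uses `find_next_sibling` to jump straight to the next `<a>`.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup: three links separated by short text nodes.
html = "<p>Once <a>Elsie</a>, <a>Lacie</a> and <a>Tillie</a>.</p>"
soup = BeautifulSoup(html, "html.parser")
first_link = soup.a

after = list(first_link.next_siblings)       # text nodes and tags after it
before = list(first_link.previous_siblings)  # nearest sibling comes first

# Keep only the sibling tags and read their text.
later_links = [s.get_text() for s in after if isinstance(s, Tag)]

# Skip the text nodes and go directly to the next <a> sibling.
next_link_text = first_link.find_next_sibling("a").get_text()

print(len(after), len(before))
print(later_links, next_link_text)
```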

4. Standard selectors


4.1 find_all(name, attrs, recursive, text, **kwargs)

Searches the document by tag name, attributes, or text content.
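A rough sketch of how these parameters can be combined, using hypothetical markup and the built-in `html.parser` (both assumptions). Note that the example passes `string=`, the name newer Beautiful Soup releases use for the `text` argument in the signature above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two lists inside one <div>.
html = (
    "<div><ul id='list-1'><li class='element'>Foo</li>"
    "<li class='element'>Bar</li></ul>"
    "<ul id='list-2'><li class='element'>Foo</li></ul></div>"
)
soup = BeautifulSoup(html, "html.parser")

by_name = soup.find_all("li")                     # match on tag name
by_attrs = soup.find_all(attrs={"id": "list-1"})  # match on an attribute dict
by_kwarg = soup.find_all(id="list-2")             # same idea via **kwargs
by_text = soup.find_all(string="Foo")             # match on text content

# recursive=False looks at direct children only; the <li> tags are
# grandchildren of <div>, so nothing is found.
direct_only = soup.div.find_all("li", recursive=False)

print(len(by_name), len(by_attrs), len(by_kwarg), len(by_text), direct_only)
```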

4.1.1 name

html = '''

Hello
    • Foo
    • Bar
    • Jay
      • Foo
      • Bar
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

print(soup.find_all('ul'))

print(type(soup.find_all('ul')[0]))
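Since every item returned by `find_all` is itself a `Tag`, the search can be repeated on each result. A minimal sketch, assuming hypothetical list markup and the built-in `html.parser`:

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup: two <ul> lists with a few items each.
html = (
    "<ul id='list-1'><li>Foo</li><li>Bar</li><li>Jay</li></ul>"
    "<ul id='list-2'><li>Foo</li><li>Bar</li></ul>"
)
soup = BeautifulSoup(html, "html.parser")

lists = soup.find_all("ul")               # every <ul> in the document
first_is_tag = isinstance(lists[0], Tag)  # each result is a Tag

# Call find_all again on each <ul> to collect the text of its items.
items_per_list = [
    [li.get_text() for li in ul.find_all("li")]
    for ul in lists
]

print(len(lists), first_is_tag)
print(items_per_list)
```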
