[bs4] Learning BeautifulSoup from the Official Documentation

神创

Categories: python, web scraping

Article link: https://blog.csdn.net/qq_19741181/article/details/79476956

>>> html_doc = """
... <html><head><title>The Dormouse's story</title></head>
... <body>
... <p class="title"><b>The Dormouse's story</b></p>
...
... <p class="story">Once upon a time there were three little sisters; and their names were
... <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
... <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
... <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
... and they lived at the bottom of a well.</p>
...
... <p class="story">...</p>
... """

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html_doc, 'html.parser')
>>>
>>> print(soup.prettify())
<html>
 <head>
  <title>
   The Dormouse's story
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The Dormouse's story
   </b>
  </p>
  <p class="story">
   Once upon a time there were three little sisters; and their names were
   <a class="sister" href="http://example.com/elsie" id="link1">
    Elsie
   </a>
   ,
   <a class="sister" href="http://example.com/lacie" id="link2">
    Lacie
   </a>
   and
   <a class="sister" href="http://example.com/tillie" id="link3">
    Tillie
   </a>
   ;
and they lived at the bottom of a well.
  </p>
  <p class="story">
   ...
  </p>
 </body>
</html>

>>>

----------------------

>>> soup.title
<title>The Dormouse's story</title>
>>> soup.title.name
'title'
>>> soup.title.string
"The Dormouse's story"
>>> soup.title.parent.name
'head'
>>> soup.p
<p class="title"><b>The Dormouse's story</b></p>
>>> soup.p['class']
['title']
>>> soup.a
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
>>> soup.find_all('a')
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
>>> soup.find(id="link3")
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

>>>
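The lookups above can also be narrowed by attribute. The following is a minimal sketch, assuming the same sample document with its `<p class="title">` and `<p class="story">` tags intact; the variable names (`sister_ids`, `story_paragraphs`, `link2`, `missing`, `empty`) are illustrative and not part of the original session.

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# "class" is a Python keyword, so the keyword filter is spelled class_
sister_ids = [a["id"] for a in soup.find_all("a", class_="sister")]
story_paragraphs = soup.find_all("p", class_="story")

# find() returns the first match, or None when nothing matches
link2 = soup.find("a", id="link2")
missing = soup.find(id="link4")

# find_all() returns an empty result rather than raising
empty = soup.find_all("table")

print(sister_ids)
print(len(story_paragraphs))
print(link2["href"])
print(missing, list(empty))
```

Checking the result of `find()` for `None` before indexing into it avoids a `TypeError` when a page does not contain the expected tag.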

----------------

>>> for link in soup.find_all('a'):
...     print(link.get('href'))
...
http://example.com/elsie
http://example.com/lacie
http://example.com/tillie
>>>
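The same loop can be wrapped in a small function when the links are needed as data rather than printed. This is a sketch only: `extract_links` is a hypothetical helper name, and the input is a shortened copy of the sample document.

```python
from bs4 import BeautifulSoup

html_doc = """
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that carries an href."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True matches only tags that have an href attribute at all
    return [(a.get_text(strip=True), a["href"])
            for a in soup.find_all("a", href=True)]


links = extract_links(html_doc)
for text, href in links:
    print(text, href)
```

Using `link.get('href')` as in the session returns `None` for anchors without an `href`; filtering with `href=True` skips those anchors instead.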

>>> print(soup.get_text())

The Dormouse's story

The Dormouse's story

Once upon a time there were three little sisters; and their names were
Elsie,
Lacie and
Tillie;
and they lived at the bottom of a well.

...

>>>
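The raw `get_text()` output keeps every newline from the markup. A short sketch of two ways to tidy it, assuming a two-paragraph fragment of the sample document; the names `lines` and `flat` are illustrative.

```python
from bs4 import BeautifulSoup

fragment = (
    "<p class=\"title\"><b>The Dormouse's story</b></p>\n\n"
    "<p class=\"story\">...</p>\n"
)
soup = BeautifulSoup(fragment, "html.parser")

# stripped_strings yields each text node with surrounding whitespace removed
# and skips nodes that are whitespace only
lines = list(soup.stripped_strings)

# Collapse all runs of whitespace into single spaces with plain str methods
flat = " ".join(soup.get_text().split())

print(lines)
print(flat)
```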
