Notes on the Beautiful Soup 4.2.0 Documentation (1): find(), find_all(), findAll()

Latest recommended article published 2024-08-04 22:11:35

HeatDeath





This is an original article by the blogger and may not be reproduced without permission. Hahaha

Article link: https://blog.csdn.net/HeatDeath/article/details/64923018


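Summary (generated automatically by CSDN): This article covers the find() and find_all() methods in Beautiful Soup 4.2.0, including the name and keyword arguments, searching by CSS class, and the text, limit, and recursive arguments, and shows how a tag can be called as if calling find_all().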

find()

find( name , attrs , recursive , text , **kwargs )

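The find_all() method returns every tag in the document that matches the filter, but sometimes we only want a single result. For example, if the document has only one <body> tag, searching for it with find_all() is a poor fit: instead of calling find_all() with limit=1, it is simpler to call find() directly. The following two lines are equivalent: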

soup.find_all('title', limit=1)
# [<title>The Dormouse's story</title>]

soup.find('title')
# <title>The Dormouse's story</title>
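A minimal runnable sketch of this equivalence, assuming Beautiful Soup 4 is installed and using Python's built-in "html.parser"; the one-line HTML string is made up for illustration.

```python
from bs4 import BeautifulSoup

# A tiny made-up document containing exactly one <title> tag
html = "<html><head><title>The Dormouse's story</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

# find_all() with limit=1 still returns a list (at most one element)
as_list = soup.find_all("title", limit=1)

# find() returns the first matching Tag itself, not a list
as_tag = soup.find("title")

print(len(as_list))          # 1
print(as_tag)                # <title>The Dormouse's story</title>
print(as_list[0] is as_tag)  # True: both refer to the same Tag in the tree
```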

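The only differences are:

find_all() returns a list, here containing just one element; if nothing matches, it returns an empty list.
find() returns the result itself; if nothing matches, it returns None.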

print(soup.find("nosuchtag"))
# None
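Because find() may return None, code that uses its result usually checks for None first. A small sketch under the same assumptions (Beautiful Soup 4 installed, "html.parser", a made-up HTML string):

```python
from bs4 import BeautifulSoup

html = "<html><head><title>The Dormouse's story</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

# The document contains no <nosuchtag> element
missing_list = soup.find_all("nosuchtag")  # empty list
missing_tag = soup.find("nosuchtag")       # None

print(missing_list)  # []
print(missing_tag)   # None

# Calling missing_tag.get_text() directly would raise AttributeError,
# so test for None before using the result of find()
if missing_tag is not None:
    print(missing_tag.get_text())
else:
    print("no <nosuchtag> found")
```

With find_all() no such check is needed: looping over an empty list simply does nothing.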

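Note: findAll() is the Beautiful Soup 3 name of this method. In Beautiful Soup 4 it was renamed find_all(); findAll() still works as an alias for backward compatibility.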
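A quick check that both names give the same result, assuming Beautiful Soup 4 is installed; the HTML snippet is made up. New code should prefer find_all().

```python
from bs4 import BeautifulSoup

html = '<p><a id="link1">Elsie</a> <a id="link2">Lacie</a></p>'
soup = BeautifulSoup(html, "html.parser")

new_style = soup.find_all("a")  # Beautiful Soup 4 name
old_style = soup.findAll("a")   # Beautiful Soup 3 name, kept as an alias

print([a["id"] for a in new_style])  # ['link1', 'link2']
print(new_style == old_style)        # True
```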

find_all()

find_all( name , attrs , recursive , text , **kwargs )

The find_all() method looks through all of the current tag's descendant tags and checks each one against the filters. Here are a few examples:
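"All descendants" means nested tags at any depth, not only direct children. The recursive parameter from the signature above controls this. A minimal sketch, assuming Beautiful Soup 4 and "html.parser", with a made-up HTML snippet:

```python
from bs4 import BeautifulSoup

# The first <b> is a grandchild of <div>; the second is a direct child
html = "<div><p><b>nested</b></p><b>direct</b></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# Default recursive=True: search every descendant of <div>
all_b = div.find_all("b")
# recursive=False: only look at the direct children of <div>
direct_b = div.find_all("b", recursive=False)

print([b.get_text() for b in all_b])     # ['nested', 'direct']
print([b.get_text() for b in direct_b])  # ['direct']
```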

soup.find_all("title")
# [<title>The Dormouse's story</title>]

soup.find_all("p", "title")
# [<p class="title"><b>The Dormouse's story</b></p>]

soup.find_all("a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.find_all(id="link2")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

import re
# In Beautiful Soup 4.4.0 and later this argument is also available as string=
soup.find(text=re.compile("sisters"))
# u'Once upon a time there were three little sisters; and their names were\n'
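The examples above assume a `soup` built from the "three sisters" sample document used throughout the Beautiful Soup documentation. The sketch below rebuilds a version of that document so the same searches can be run end to end. It assumes Beautiful Soup 4.4.0 or later, because it uses `string=`; in earlier versions the same argument is spelled `text=`.

```python
import re

from bs4 import BeautifulSoup

# A version of the "three sisters" sample document from the
# Beautiful Soup documentation
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

# 1. Filter by tag name
titles = soup.find_all("title")
# 2. Tag name plus a string second argument, matched against the CSS class
title_paras = soup.find_all("p", "title")
# 3. Every <a> tag in the document
links = soup.find_all("a")
# 4. Keyword argument: filter on the id attribute
link2 = soup.find_all(id="link2")
# 5. Regular expression searched against text nodes (first match only)
sisters_text = soup.find(string=re.compile("sisters"))

print([t.get_text() for t in titles])
print([p["class"] for p in title_paras])
print([a["id"] for a in links])
print([a.get_text() for a in link2])
print(sisters_text)
```

Note that `p["class"]` is a list, because Beautiful Soup treats class as a multi-valued attribute.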