https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-the-tree
find_all()
Signature: find_all(name, attrs, recursive, string, limit, **kwargs)
The find_all() method looks through a tag’s descendants and retrieves all descendants that match your filters. I gave several examples in Kinds of filters, but here are a few more:
Examples:
soup.find_all("title")
# [<title>The Dormouse's story</title>]
soup.find_all("p", "title")
# [<p class="title"><b>The Dormouse's story</b></p>]
soup.find_all("a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
soup.find_all(id="link2")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
import re
soup.find(string=re.compile("sisters"))
# u'Once upon a time there were three little sisters; and their names were\n'
Some of these should look familiar, but others are new. What does it mean to pass in a value for string, or id? Why does find_all("p", "title") find a <p> tag with the CSS class “title”? Let’s look at the arguments to find_all().
find_all("p", "title")
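Here "p" is the value passed to the name parameter and "title" is the value passed to the attrs parameter. When attrs receives a string instead of a dictionary, Beautiful Soup treats that string as a CSS class to match.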
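As a rough sketch of that behaviour, the three calls below should return the same tag. The HTML snippet is a made-up stand-in for the documentation's "three sisters" document, and the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet, not the full document used in the official docs
html = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time...</p>
"""
soup = BeautifulSoup(html, "html.parser")

# A string in the attrs position is interpreted as a CSS class
by_position = soup.find_all("p", "title")
# The same search written with the class_ keyword argument
by_keyword = soup.find_all("p", class_="title")
# The same search written with an explicit attrs dictionary
by_dict = soup.find_all("p", attrs={"class": "title"})

print(by_position == by_keyword == by_dict)  # True
print(len(by_position))                      # 1
```

The keyword is spelled class_ because class is a reserved word in Python.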
The name argument
Pass in a value for name and you’ll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names don’t match.
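The value passed to name is the name of the tag you want to find, such as <title> or <p>. Tags whose names do not match are left out of the result.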
This is the simplest usage:
soup.find_all("title")
# [<title>The Dormouse's story</title>]
Recall from Kinds of filters that the value to name can be a string, a regular expression, a list, a function, or the value True.
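To make the five kinds of filter concrete, here is a small sketch that tries each one as the name argument. The HTML and the tag_names helper are assumptions made up for this example; only find_all() itself comes from Beautiful Soup.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical document for illustration
html = "<html><head><title>T</title></head><body><b>bold</b><a href='#'>link</a></body></html>"
soup = BeautifulSoup(html, "html.parser")

def tag_names(tags):
    """Helper (not part of bs4): reduce a result list to tag names."""
    return [tag.name for tag in tags]

by_string = tag_names(soup.find_all("b"))                 # exact tag name
by_regex = tag_names(soup.find_all(re.compile("^b")))     # names starting with "b"
by_list = tag_names(soup.find_all(["a", "b"]))            # any name in the list
by_true = tag_names(soup.find_all(True))                  # every tag, no text strings
by_function = tag_names(soup.find_all(lambda tag: tag.has_attr("href")))  # custom test

print(by_string)    # ['b']
print(by_regex)     # ['body', 'b']
print(by_list)      # ['b', 'a']
print(by_true)      # ['html', 'head', 'title', 'body', 'b', 'a']
print(by_function)  # ['a']
```

Results come back in document order, which is why the list filter returns 'b' before 'a' even though "a" is listed first.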
The keyword arguments
Any argument that’s not recognized will be turned into a filter on one of a tag’s attributes. If you pass in a value for an argument called id, Beautiful Soup will filter against each tag’s ‘id’ attribute:
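Important usage: any argument passed to find_all() in keyword form (x="string") is treated as a tag attribute if x is not one of the named parameters (name, attrs, recursive, string, limit). For example, passing id='link2' selects the tags whose id attribute equals link2.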
soup.find_all(id='link2')
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
If you pass in a value for href, Beautiful Soup will filter against each tag’s ‘href’ attribute:
soup.find_all(href=re.compile("elsie"))
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
You can filter an attribute based on a string, a regular expression, a list, a function, or the value True.
This code finds all tags whose id attribute has a value, regardless of what the value is:
soup.find_all(id=True)
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
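The other filter kinds work on attribute values in the same way. The sketch below assumes a cut-down version of the sister links plus one extra link without an id. The texts helper is made up for this example. A function filter is called with the attribute's value, which can be None when the tag lacks the attribute, so the lambda checks for that first.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet: three links with ids, one without
html = (
    '<a id="link1" href="http://example.com/elsie">Elsie</a>'
    '<a id="link2" href="http://example.com/lacie">Lacie</a>'
    '<a id="link3" href="http://example.com/tillie">Tillie</a>'
    '<a href="http://example.com/none">No id</a>'
)
soup = BeautifulSoup(html, "html.parser")

def texts(tags):
    """Helper (not part of bs4): reduce a result list to the tags' text."""
    return [tag.get_text() for tag in tags]

by_list = texts(soup.find_all(id=["link1", "link3"]))        # id equals any listed value
by_regex = texts(soup.find_all(id=re.compile(r"[23]$")))     # id ends in 2 or 3
by_function = texts(
    soup.find_all(id=lambda value: value is not None and value.endswith("2"))
)
by_true = texts(soup.find_all(id=True))                      # id present, any value

print(by_list)      # ['Elsie', 'Tillie']
print(by_regex)     # ['Lacie', 'Tillie']
print(by_function)  # ['Lacie']
print(by_true)      # ['Elsie', 'Lacie', 'Tillie']
```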
You can filter multiple attributes at once by passing in more than one keyword argument:
soup.find_all(href=re.compile("elsie"), id='link1')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
Some attributes, like the data-* attributes in HTML 5, have names that can’t be used as the names of keyword arguments:
data_soup = BeautifulSoup('<div data-foo="value">foo!</div>', 'html.parser')
data_soup.find_all(data-foo="value")
# SyntaxError: keyword can't be an expression
You can use these attributes in searches by putting them into a dictionary and passing the dictionary into find_all() as the attrs argument:
data_soup.find_all(attrs={"data-foo": "value"})
# [<div data-foo="value">foo!</div>]
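The attrs dictionary can hold more than one attribute, and each value can be any of the filter kinds described earlier. The following is only a sketch: the HTML, including the data-bar attribute, is invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet; data-bar is an invented attribute
html = (
    '<div data-foo="value" data-bar="1">both</div>'
    '<div data-foo="value">foo only</div>'
    '<div data-foo="other">different</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Tags whose data-foo equals "value"
foo_matches = soup.find_all(attrs={"data-foo": "value"})
# Tags whose data-foo equals "value" AND that have a data-bar attribute at all
both_matches = soup.find_all(attrs={"data-foo": "value", "data-bar": True})

print([tag.get_text() for tag in foo_matches])   # ['both', 'foo only']
print([tag.get_text() for tag in both_matches])  # ['both']
```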
You can’t use a keyword argument to search for HTML’s ‘name’ attribute, because Beautiful Soup uses the name argument to contain the name of the tag itself. Instead, you can give a value to ‘name’ in the attrs argument.
name_soup = BeautifulSoup('<input name="email"/>', 'html.parser')
name_soup.find_all(name="email")
# []
name_soup.find_all(attrs={"name": "email"})
# [<input name="email"/>]
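Because the tag name and the ‘name’ attribute go through different arguments, they can be combined in one call. This is a small sketch on a made-up form. The markup is hypothetical and the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical form for illustration
html = (
    '<form>'
    '<input name="email"/>'
    '<input name="phone"/>'
    '<select name="email"></select>'
    '</form>'
)
soup = BeautifulSoup(html, "html.parser")

# name="email" looks for <email> tags, so nothing is found
as_tag_name = soup.find_all(name="email")
# Any tag whose 'name' attribute is "email"
any_tag = soup.find_all(attrs={"name": "email"})
# Only <input> tags whose 'name' attribute is "email"
inputs_only = soup.find_all("input", attrs={"name": "email"})

print(as_tag_name)                     # []
print([tag.name for tag in any_tag])   # ['input', 'select']
print(len(inputs_only))                # 1
```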