The commonly used find_all function in BeautifulSoup/bs4

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-the-tree

`find_all()`

Signature: find_all(name, attrs, recursive, string, limit, **kwargs)

The find_all() method looks through a tag’s descendants and retrieves all descendants that match your filters. I gave several examples in Kinds of filters, but here are a few more:
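The snippets in this post call methods on a `soup` object without showing how it is built. A minimal setup sketch follows, assuming the "three sisters" sample document from the Beautiful Soup documentation and the built-in `html.parser` (the choice of parser is an assumption, since the post does not name one):

```python
from bs4 import BeautifulSoup

# Sample document used throughout the Beautiful Soup documentation.
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# "html.parser" ships with Python; lxml or html5lib could be used instead.
soup = BeautifulSoup(html_doc, "html.parser")

print(soup.find_all("title"))
# [<title>The Dormouse's story</title>]
```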

Examples:

 
soup.find_all("title")
# [<title>The Dormouse's story</title>]

soup.find_all("p", "title")
# [<p class="title"><b>The Dormouse's story</b></p>]

soup.find_all("a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.find_all(id="link2")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

import re
soup.find(string=re.compile("sisters"))
# 'Once upon a time there were three little sisters; and their names were\n'
 
 

Some of these should look familiar, but others are new. What does it mean to pass in a value for string, or id? Why does find_all("p", "title") find a <p> tag with the CSS class “title”? Let’s look at the arguments to find_all().

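In find_all("p", "title"), "p" is the value passed to the name argument and "title" is the value passed to the attrs argument. When attrs is a plain string rather than a dictionary, Beautiful Soup treats it as a CSS class to match.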
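To make the positional form concrete, here is a small sketch showing that a string passed as attrs behaves like a filter on the class attribute. The HTML fragment and variable names are made up for illustration, and `html.parser` is an assumed parser choice:

```python
from bs4 import BeautifulSoup

# Hypothetical two-paragraph fragment, just for this comparison.
html = '<p class="title"><b>Heading</b></p><p class="story">Body</p>'
soup = BeautifulSoup(html, "html.parser")

positional = soup.find_all("p", "title")                # name="p", attrs="title"
by_keyword = soup.find_all("p", class_="title")         # class_ avoids the reserved word
by_dict = soup.find_all("p", attrs={"class": "title"})  # explicit attrs dictionary

print(positional == by_keyword == by_dict)
# True
print(positional)
# [<p class="title"><b>Heading</b></p>]
```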

The `name` argument

Pass in a value for name and you’ll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names don’t match.

The value passed to name is the name of the tag you want to find, such as <title> or <p>. Tags whose names do not match are not returned.

This is the simplest usage:

 
soup.find_all("title")
# [<title>The Dormouse's story</title>]

Recall from Kinds of filters that the value to name can be a string, a regular expression, a list, a function, or the value True.
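The five kinds of value accepted by name can be compared side by side. The sketch below uses a made-up document and a hypothetical helper `has_href`; it is meant only to illustrate each filter kind, with `html.parser` as an assumed parser:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical document containing a handful of differently named tags.
html = ("<html><head><title>T</title></head>"
        "<body><p>one</p><b>two</b><a href='#'>three</a></body></html>")
soup = BeautifulSoup(html, "html.parser")

def has_href(tag):
    # A function filter receives each tag and returns True to keep it.
    return tag.has_attr("href")

by_string = [t.name for t in soup.find_all("b")]              # exact tag name
by_regex = [t.name for t in soup.find_all(re.compile("^b"))]  # names starting with "b"
by_list = [t.name for t in soup.find_all(["a", "b"])]         # any name in the list
by_function = [t.name for t in soup.find_all(has_href)]       # tags the function accepts
every_tag = [t.name for t in soup.find_all(True)]             # every tag, no text nodes

print(by_string)    # ['b']
print(by_regex)     # ['body', 'b']
print(by_list)      # ['b', 'a']
print(by_function)  # ['a']
print(every_tag)    # ['html', 'head', 'title', 'body', 'p', 'b', 'a']
```

Results come back in document order, which is why the list filter returns the <b> tag before the <a> tag.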

The keyword arguments

Any argument that’s not recognized will be turned into a filter on one of a tag’s attributes. If you pass in a value for an argument called id, Beautiful Soup will filter against each tag’s ‘id’ attribute:

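Important usage: any argument passed to find_all in keyword form (x="string") whose name x is not in the parameter list (name, attrs, recursive, string, limit) is treated as a tag attribute. For example, passing id='link2' filters for the tags whose id attribute equals link2.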

 
soup.find_all(id='link2')
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

If you pass in a value for href, Beautiful Soup will filter against each tag’s ‘href’ attribute:

 
soup.find_all(href=re.compile("elsie"))
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

You can filter an attribute based on a string, a regular expression, a list, a function, or the value True.

This code finds all tags whose id attribute has a value, regardless of what the value is:

 
soup.find_all(id=True)
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
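The same five filter kinds apply to attribute values. A hedged sketch, using a made-up set of links and a hypothetical helper `ids` to keep the output short; the function filter guards against None because it may be called for tags that lack the attribute:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical links; the last one has no attributes at all.
html = (
    '<a href="http://example.com/elsie" id="link1">Elsie</a>'
    '<a href="http://example.com/lacie" id="link2">Lacie</a>'
    '<a href="/local" id="nav">Local</a>'
    '<a>No attributes</a>'
)
soup = BeautifulSoup(html, "html.parser")

def ids(tags):
    # Collect the id of each matched tag for compact display.
    return [t.get("id") for t in tags]

by_string = ids(soup.find_all(id="link2"))                  # exact value
by_regex = ids(soup.find_all(id=re.compile(r"^link\d$")))   # pattern match
by_list = ids(soup.find_all(id=["link1", "nav"]))           # any value in the list
by_function = ids(soup.find_all(
    href=lambda value: value is not None and value.startswith("/")))
has_id = ids(soup.find_all(id=True))                        # attribute is present

print(by_string)    # ['link2']
print(by_regex)     # ['link1', 'link2']
print(by_list)      # ['link1', 'nav']
print(by_function)  # ['nav']
print(has_id)       # ['link1', 'link2', 'nav']
```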

You can filter multiple attributes at once by passing in more than one keyword argument:

 
soup.find_all(href=re.compile("elsie"), id='link1')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

Some attributes, like the data-* attributes in HTML 5, have names that can’t be used as the names of keyword arguments:

 
from bs4 import BeautifulSoup
data_soup = BeautifulSoup('<div data-foo="value">foo!</div>', 'html.parser')
data_soup.find_all(data-foo="value")
# SyntaxError: keyword can't be an expression (exact wording varies by Python version)

You can use these attributes in searches by putting them into a dictionary and passing the dictionary into find_all() as the attrs argument:

 
data_soup.find_all(attrs={"data-foo": "value"})
# [<div data-foo="value">foo!</div>]
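The attrs dictionary can be combined with a tag name, and its values accept the same filter kinds as keyword arguments. A short sketch under those assumptions, with a made-up fragment and `html.parser` as the assumed parser:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mixing tags with and without a data-foo attribute.
html = (
    '<div data-foo="value">foo!</div>'
    '<div data-foo="other">bar</div>'
    '<span data-foo="value">baz</span>'
    '<div>plain</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Any tag whose data-foo attribute equals "value".
exact = soup.find_all(attrs={"data-foo": "value"})

# Only <div> tags whose data-foo attribute equals "value".
divs_only = soup.find_all("div", attrs={"data-foo": "value"})

# <div> tags that have a data-foo attribute, whatever its value.
any_value = soup.find_all("div", attrs={"data-foo": True})

print([t.name for t in exact])            # ['div', 'span']
print(divs_only)                          # [<div data-foo="value">foo!</div>]
print([t.get_text() for t in any_value])  # ['foo!', 'bar']
```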

You can’t use a keyword argument to search for HTML’s ‘name’ attribute, because Beautiful Soup uses the name argument to contain the name of the tag itself. Instead, you can give a value to ‘name’ in the attrs argument.

name_soup = BeautifulSoup('<input name="email"/>', 'html.parser')
name_soup.find_all(name="email")
# []
name_soup.find_all(attrs={"name": "email"})
# [<input name="email"/>]