BeautifulSoup/bs4: the commonly used find_all() function

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-the-tree

find_all()

Signature: find_all(name, attrs, recursive, string, limit, **kwargs)
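The signature also lists recursive and limit, which the examples in this post don't exercise. The following is a minimal sketch, assuming bs4 is installed (pip install beautifulsoup4) and using the standard-library "html.parser" backend; the HTML snippet is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: two <a> tags are direct children of <div>, one is nested inside <p>.
html = '<div><a id="x">1</a><p><a id="y">2</a></p><a id="z">3</a></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# Default (recursive=True): every <a> descendant of the <div>, in document order.
all_links = div.find_all("a")
# recursive=False: only the direct children of the <div> are examined.
direct_links = div.find_all("a", recursive=False)
# limit=2: stop searching after the first two matches.
first_two = div.find_all("a", limit=2)

print([a["id"] for a in all_links])     # ['x', 'y', 'z']
print([a["id"] for a in direct_links])  # ['x', 'z']
print([a["id"] for a in first_two])     # ['x', 'y']
```

recursive=False is handy when only the immediate children of a tag matter, and limit avoids scanning the rest of a large document once enough matches are found.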

The find_all() method looks through a tag’s descendants and retrieves all descendants that match your filters. I gave several examples in Kinds of filters, but here are a few more:

Examples:

soup.find_all("title")
# [<title>The Dormouse's story</title>]

soup.find_all("p", "title")
# [<p class="title"><b>The Dormouse's story</b></p>]

soup.find_all("a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.find_all(id="link2")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

import re
soup.find(string=re.compile("sisters"))
# u'Once upon a time there were three little sisters; and their names were\n'
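The examples above use a soup object that this post never defines. They appear to assume the "three sisters" document from the Beautiful Soup documentation; the sketch below reproduces that document as an assumption and reruns the same calls so they can be tried locally (bs4 installed, "html.parser" backend).

```python
import re
from bs4 import BeautifulSoup

# Assumed input: the "three sisters" example document from the bs4 docs.
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

titles = soup.find_all("title")             # search by tag name
title_paras = soup.find_all("p", "title")   # tag name plus CSS class
links = soup.find_all("a")                  # every <a> tag
link2 = soup.find_all(id="link2")           # keyword argument -> attribute filter
sisters_text = soup.find(string=re.compile("sisters"))  # search text, not tags

print(titles[0].get_text())           # The Dormouse's story
print([a.get_text() for a in links])  # ['Elsie', 'Lacie', 'Tillie']
print(link2[0].get_text())            # Lacie
```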

Some of these should look familiar, but others are new. What does it mean to pass in a value for string, or id? Why does find_all("p", "title") find a <p> tag with the CSS class “title”? Let’s look at the arguments to find_all().

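In find_all("p", "title"), “p” is the value passed to the name argument and “title” is the value passed to the attrs argument. When attrs is a plain string rather than a dictionary, Beautiful Soup treats it as a filter on the CSS class, which is why this call finds <p class="title">.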
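A small sketch of that equivalence, assuming bs4 with the "html.parser" backend and a made-up two-paragraph snippet: the positional string, the class_ keyword (bs4 uses class_ because class is a reserved word in Python), and an explicit attrs dictionary all select the same tag.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: two <p> tags that differ only in their CSS class.
html = '<p class="title">Heading</p><p class="story">Body</p>'
soup = BeautifulSoup(html, "html.parser")

# A non-dict value in the attrs position is treated as a CSS class filter.
by_position = soup.find_all("p", "title")
# The class_ keyword spells out the same filter.
by_class_kw = soup.find_all("p", class_="title")
# The most explicit form: an attrs dictionary.
by_attrs_dict = soup.find_all("p", attrs={"class": "title"})

print(by_position == by_class_kw == by_attrs_dict)  # True
print(by_position[0].get_text())                    # Heading
```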

The name argument

Pass in a value for name and you’ll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names don’t match.

The value passed to name is the name of the tag you want to find, such as <title> or <p>. Tags whose names don’t match are not returned.

This is the simplest usage:

soup.find_all("title")
# [<title>The Dormouse's story</title>]

Recall from Kinds of filters that the value to name can be a string, a regular expression, a list, a function, or the value True.
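Each of those five filter kinds can be tried against a tiny document. This is a sketch assuming bs4 and "html.parser"; the snippet and the one-letter-name function are illustrative only. A regular expression is matched with search() against each tag name, and a function receives each Tag and returns True to keep it.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet with a few differently named tags.
html = "<html><body><b>bold</b><p>para</p><blockquote>quote</blockquote></body></html>"
soup = BeautifulSoup(html, "html.parser")

def names(results):
    """Return the tag names of a find_all() result, in document order."""
    return [tag.name for tag in results]

by_string = names(soup.find_all("b"))                 # exact name
by_regex = names(soup.find_all(re.compile("^b")))     # names starting with "b"
by_list = names(soup.find_all(["p", "b"]))            # any listed name
by_func = names(soup.find_all(lambda tag: len(tag.name) == 1))  # custom test
by_true = names(soup.find_all(True))                  # every tag

print(by_string)  # ['b']
print(by_regex)   # ['body', 'b', 'blockquote']
print(by_list)    # ['b', 'p']
print(by_func)    # ['b', 'p']
print(by_true)    # ['html', 'body', 'b', 'p', 'blockquote']
```

Note that the list filter still returns tags in document order, not in the order the names were listed.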

The keyword arguments

Any argument that’s not recognized will be turned into a filter on one of a tag’s attributes. If you pass in a value for an argument called id, Beautiful Soup will filter against each tag’s ‘id’ attribute:

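Important usage: when you pass a keyword argument of the form x="string" to find_all() and x is not one of its own parameters (name, attrs, recursive, string, limit), x is treated as the name of a tag attribute. For example, passing id='link2' returns the tags whose id attribute equals link2.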


soup.find_all(id='link2')
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

If you pass in a value for href, Beautiful Soup will filter against each tag’s ‘href’ attribute:

soup.find_all(href=re.compile("elsie"))
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
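To make the keyword-to-attribute rule concrete, the sketch below (bs4 assumed, "html.parser" backend, made-up links) compares the keyword form with the explicit attrs dictionary; both produce the same result for a plain string and for a regular expression.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet with two links.
html = ('<a id="link1" href="http://example.com/elsie">Elsie</a>'
        '<a id="link2" href="http://example.com/lacie">Lacie</a>')
soup = BeautifulSoup(html, "html.parser")

# id and href are not find_all() parameters, so each becomes an attribute filter.
by_keyword = soup.find_all(id="link2")
by_regex = soup.find_all(href=re.compile("elsie"))

# The same filters written explicitly through the attrs dictionary.
by_attrs = soup.find_all(attrs={"id": "link2"})
by_attrs_regex = soup.find_all(attrs={"href": re.compile("elsie")})

print(by_keyword == by_attrs)        # True
print(by_regex == by_attrs_regex)    # True
print(by_keyword[0].get_text())      # Lacie
```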

You can filter an attribute based on a string, a regular expression, a list, a function, or the value True.

This code finds all tags whose id attribute has a value, regardless of what the value is:

soup.find_all(id=True)
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
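The list and function forms for attributes are not shown above. Here is a sketch, assuming bs4 and "html.parser", with a made-up third link that has no id. The function receives the attribute value; the lambda guards against None so it is safe for tags that lack the attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the third link has no id attribute.
html = ('<a id="link1" href="http://example.com/elsie">Elsie</a>'
        '<a id="link2" href="http://example.com/lacie">Lacie</a>'
        '<a href="http://example.com/tillie">Tillie</a>')
soup = BeautifulSoup(html, "html.parser")

# List: the attribute value must equal one of the listed strings.
listed = soup.find_all(id=["link1", "link2"])
# Function: called with the href value; keep every link except Lacie's.
not_lacie = soup.find_all(href=lambda h: h is not None and "lacie" not in h)
# True: the attribute only has to be present.
has_id = soup.find_all(id=True)

print([a.get_text() for a in listed])     # ['Elsie', 'Lacie']
print([a.get_text() for a in not_lacie])  # ['Elsie', 'Tillie']
print([a.get_text() for a in has_id])     # ['Elsie', 'Lacie']
```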

You can filter multiple attributes at once by passing in more than one keyword argument:

soup.find_all(href=re.compile("elsie"), id='link1')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
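Multiple keyword filters are combined with AND: a tag is returned only if every filter matches. A quick sketch (bs4 assumed, "html.parser", made-up links) shows a pair of filters that agree and a pair that contradict each other.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet with two links.
html = ('<a id="link1" href="http://example.com/elsie">Elsie</a>'
        '<a id="link2" href="http://example.com/lacie">Lacie</a>')
soup = BeautifulSoup(html, "html.parser")

# Both filters match the same tag, so it is returned.
both = soup.find_all(href=re.compile("elsie"), id="link1")
# No single tag satisfies both filters, so the result is empty.
neither = soup.find_all(href=re.compile("elsie"), id="link2")

print([a.get_text() for a in both])  # ['Elsie']
print(len(neither))                  # 0
```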

Some attributes, like the data-* attributes in HTML 5, have names that can’t be used as the names of keyword arguments:

data_soup = BeautifulSoup('<div data-foo="value">foo!</div>')
data_soup.find_all(data-foo="value")
# SyntaxError: keyword can't be an expression

You can use these attributes in searches by putting them into a dictionary and passing the dictionary into find_all() as the attrs argument:

data_soup.find_all(attrs={"data-foo": "value"})
# [<div data-foo="value">foo!</div>]
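The attrs dictionary can be combined with a tag name. As a Python-level alternative (not something the excerpt above mentions), dictionary unpacking with ** also reaches find_all()'s **kwargs, because Python accepts non-identifier keys there. A sketch assuming bs4 and "html.parser", with a made-up second div:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: two divs with different data-foo values.
data_soup = BeautifulSoup(
    '<div data-foo="value">foo!</div><div data-foo="other">bar</div>',
    "html.parser",
)

# attrs dictionary plus a tag name: works for hyphenated attribute names.
via_attrs = data_soup.find_all("div", attrs={"data-foo": "value"})
# ** unpacking passes "data-foo" through **kwargs, which becomes an attribute filter.
via_unpacking = data_soup.find_all(**{"data-foo": "value"})

print(via_attrs[0].get_text())     # foo!
print(via_attrs == via_unpacking)  # True
```

The attrs form is the one the documentation recommends and is usually easier to read; the unpacking form simply shows that the SyntaxError above comes from Python's parser, not from Beautiful Soup.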

You can’t use a keyword argument to search for HTML’s ‘name’ attribute, because Beautiful Soup uses the name argument to contain the name of the tag itself. Instead, you can give a value to ‘name’ in the attrs argument:

name_soup = BeautifulSoup('<input name="email"/>')
name_soup.find_all(name="email")
# []
name_soup.find_all(attrs={"name": "email"})
# [<input name="email"/>]
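The empty result happens because name="email" is read as the tag name, so Beautiful Soup looks for <email> tags. A short sketch (bs4 assumed, "html.parser", made-up second input) that combines the tag name with the HTML name attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical form fields.
name_soup = BeautifulSoup('<input name="email"/><input name="phone"/>', "html.parser")

# name= is find_all()'s tag-name parameter, so this searches for <email> tags.
wrong = name_soup.find_all(name="email")
# Tag name first, HTML name attribute in the attrs dictionary.
matches = name_soup.find_all("input", attrs={"name": "email"})

print(len(wrong))          # 0
print(matches[0]["name"])  # email
```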
