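IV. Searching with BeautifulSoup (locating exactly what we need)
1. find_all(name, attrs, recursive, text, limit, **kwargs)
(1) The name argument (matches tag names only; attributes and text content are not searched)
Finds every matching tag and returns a list whose elements are Tag objects
Rule form 1: a string (finds every tag whose name exactly matches the string)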
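# bs is assumed to be the BeautifulSoup object created in the earlier sections
tags = bs.find_all("a")
for item in tags:
    print(item)
print("-" * 50)
tags = bs.find_all("Skip to content")  # finds nothing: this is text, not a tag name
for item in tags:
    print(item)
print("-" * 50)
tags = bs.find_all("tr")
for item in tags:
    # print the cells of each row on one line, skipping the bare newlines between them
    for child in item.contents:
        if str(child.string) != "\n":
            print(child.string, end=" ")
    print()
print(type(item), sep="\n")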
<a class="skip-link screen-reader-text" href="#primary"><!--Skip to content--></a>
<a class="skip-link screen-reader-text" href="#primary">Skip to content</a>
--------------------------------------------------
--------------------------------------------------
Hull Type: Fin w/bulb & spade rudder
Rigging Type: Fractional Sloop
LOA: 33.80 ft / 10.30 m
LWL: 26.90 ft / 8.20 m
S.A. (reported): 450.00 ft² / 41.81 m²
Beam: 8.20 ft / 2.50 m
Displacement: 3,527.00 lb / 1,600 kg
Ballast: 1,598.00 lb / 725 kg
Max Draft: 5.90 ft / 1.80 m
Construction: FG w/core,Composite
Ballast Type: Lead
First Built: 1990
# Built: 350
Designer: Ron Holland & Rolf Gyhlenius
<class 'bs4.element.Tag'>
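To see the exact-match rule in isolation, here is a minimal sketch on a made-up fragment that imitates the page above. The `html` string and the variable names are assumptions for illustration, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the page used in this post.
html = """
<a class="skip-link" href="#primary">Skip to content</a>
<table>
  <tr><td>LOA:</td><td>33.80 ft / 10.30 m</td></tr>
  <tr><td>Beam:</td><td>8.20 ft / 2.50 m</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")                   # the tag name matches exactly
nothing = soup.find_all("Skip to content")   # text is not a tag name, so no match
rows = [
    [td.get_text() for td in tr.find_all("td")]  # the cell texts of one row
    for tr in soup.find_all("tr")
]

print(len(links), len(nothing))
print(rows)
```

Calling find_all("td") on each row is one way to avoid filtering out the newline strings by hand, as the loop over item.contents has to.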
Rule form 2: a regular expression
import re

tags = bs.find_all(re.compile("t"))
for item in tags:
    print(item)
    print("-" * 50)
<html lang="en-US">
<head>
<meta charset="utf-8"/>
......
</meta></meta></link></meta>
--------------------------------------------------
<title>11 METER - sailboatdata</title>
--------------------------------------------------
<meta content="en_US" property="og:locale">
<meta content="article" property="og:type">
<meta content="11 METER - sailboatdata" property="og:title"/>
<meta content="https://sailboatdata.com/sailboat/11-meter/" property="og:url"/>
<meta content="sailboatdata" property="og:site_name"/>
<meta content="https://www.facebook.com/sailboatdata" property="article:publisher"/>
<meta content="2023-05-12T18:11:13+00:00" property="article:modified_time"/>
<meta content="https://sailboatdata.com/wp-content/uploads/2023/03/11_meter_photo.jpg" property="og:image"/>
<meta content="483" property="og:image:width"/>
<meta content="640" property="og:image:height"/>
<meta content="image/jpeg" property="og:image:type"/>
</meta></meta>
--------------------------------------------------
<meta content="article" property="og:type">
<meta content="11 METER - sailboatdata" property="og:title"/>
<meta content="https://sailboatdata.com/sailboat/11-meter/" property="og:url"/>
<meta content="sailboatdata" property="og:site_name"/>
<meta content="https://www.facebook.com/sailboatdata" property="article:publisher"/>
<meta content="2023-05-12T18:11:13+00:00" property="article:modified_time"/>
<meta content="https://sailboatdata.com/wp-content/uploads/2023/03/11_meter_photo.jpg" property="og:image"/>
<meta content="483" property="og:image:width"/>
<meta content="640" property="og:image:height"/>
<meta content="image/jpeg" property="og:image:type"/>
</meta>
--------------------------------------------------
<meta content="11 METER - sailboatdata" property="og:title"/>
--------------------------------------------------
<meta content="https://sailboatdata.com/sailboat/11-meter/" property="og:url"/>
--------------------------------------------------
<meta content="sailboatdata" property="og:site_name"/>
--------------------------------------------------
<meta content="https://www.facebook.com/sailboatdata" property="article:publisher"/>
--------------------------------------------------
<meta content="2023-05-12T18:11:13+00:00" property="article:modified_time"/>
--------------------------------------------------
<meta content="https://sailboatdata.com/wp-content/uploads/2023/03/11_meter_photo.jpg" property="og:image"/>
--------------------------------------------------
<meta content="483" property="og:image:width"/>
--------------------------------------------------
<meta content="640" property="og:image:height"/>
--------------------------------------------------
<meta content="image/jpeg" property="og:image:type"/>
--------------------------------------------------
<table class="table">
<tbody class="table-light">
<!-- start hull section -->
<tr>
.....
<td>Ron Holland & Rolf Gyhlenius</td>
</tr>
<!-- end designer name section -->
</tbody>
</table>
--------------------------------------------------
<tbody class="table-light">
<!-- start hull section -->
<tr>
......
<td>Ron Holland & Rolf Gyhlenius</td>
</tr>
<!-- end designer name section -->
</tbody>
--------------------------------------------------
<tr>
<td>Hull Type:</td>
<td>Fin w/bulb & spade rudder</td>
</tr>
--------------------------------------------------
<td>Hull Type:</td>
--------------------------------------------------
<td>Fin w/bulb & spade rudder</td>
--------------------------------------------------
<tr>
<td>Rigging Type:</td>
<td>Fractional Sloop</td>
</tr>
--------------------------------------------------
<td>Rigging Type:</td>
--------------------------------------------------
<td>Fractional Sloop</td>
--------------------------------------------------
<tr>
<td>LOA:</td>
<td>33.80 ft / 10.30 m</td>
</tr>
--------------------------------------------------
<td>LOA:</td>
--------------------------------------------------
<td>33.80 ft / 10.30 m</td>
--------------------------------------------------
<tr>
<td>LWL:</td>
<td>26.90 ft / 8.20 m</td>
</tr>
--------------------------------------------------
<td>LWL:</td>
--------------------------------------------------
<td>26.90 ft / 8.20 m</td>
--------------------------------------------------
<tr>
<td>S.A. (reported):</td>
<td>450.00 ft² / 41.81 m²</td>
</tr>
--------------------------------------------------
<td>S.A. (reported):</td>
--------------------------------------------------
<td>450.00 ft² / 41.81 m²</td>
--------------------------------------------------
<tr>
<td>Beam:</td>
<td>8.20 ft / 2.50 m</td>
</tr>
--------------------------------------------------
<td>Beam:</td>
--------------------------------------------------
<td>8.20 ft / 2.50 m</td>
--------------------------------------------------
<tr>
<td>Displacement:</td>
<td>3,527.00 lb / 1,600 kg</td>
</tr>
--------------------------------------------------
<td>Displacement:</td>
--------------------------------------------------
<td>3,527.00 lb / 1,600 kg</td>
--------------------------------------------------
<tr>
<td>Ballast:</td>
<td>1,598.00 lb / 725 kg</td>
</tr>
--------------------------------------------------
<td>Ballast:</td>
--------------------------------------------------
<td>1,598.00 lb / 725 kg</td>
--------------------------------------------------
<tr>
<td>Max Draft:</td>
<td>5.90 ft / 1.80 m</td>
</tr>
--------------------------------------------------
<td>Max Draft:</td>
--------------------------------------------------
<td>5.90 ft / 1.80 m</td>
--------------------------------------------------
<tr>
<td>Construction:</td>
<td>FG w/core,Composite</td>
</tr>
--------------------------------------------------
<td>Construction:</td>
--------------------------------------------------
<td>FG w/core,Composite</td>
--------------------------------------------------
<tr>
<td>Ballast Type:</td>
<td>Lead</td>
</tr>
--------------------------------------------------
<td>Ballast Type:</td>
--------------------------------------------------
<td>Lead</td>
--------------------------------------------------
<tr>
<td>First Built:</td>
<td>1990</td>
</tr>
--------------------------------------------------
<td>First Built:</td>
--------------------------------------------------
<td>1990</td>
--------------------------------------------------
<tr>
<td># Built:</td>
<td>350</td>
</tr>
--------------------------------------------------
<td># Built:</td>
--------------------------------------------------
<td>350</td>
--------------------------------------------------
<tr>
<td>Designer:</td>
<td>Ron Holland & Rolf Gyhlenius</td>
</tr>
--------------------------------------------------
<td>Designer:</td>
--------------------------------------------------
<td>Ron Holland & Rolf Gyhlenius</td>
--------------------------------------------------
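As the output above shows, it found every tag whose name contains t: html, title, meta, meta, each remaining meta, table, tbody, each tr, and each td inside each tr. The compiled pattern is applied with its search() method, so the t may appear anywhere in the tag name.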
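The difference between "contains t" and "starts with t" is easy to check on a tiny document. This is only a sketch; the markup is invented for the purpose.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical document with a handful of tag names.
html = (
    "<html><head><title>demo</title></head>"
    "<body><table><tr><td>x</td></tr></table><p>y</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# The pattern is searched for anywhere in the tag name.
contains_t = [tag.name for tag in soup.find_all(re.compile("t"))]
# Anchoring with ^ keeps only names that begin with t.
starts_with_t = [tag.name for tag in soup.find_all(re.compile("^t"))]

print(contains_t)
print(starts_with_t)
```

Here html matches the first pattern but not the second, while head, body and p match neither.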
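Rule form 3: a function
Pass in a function that takes a tag and returns True or False; find_all keeps the tags for which it returns True:
# Reconstructed helper (its definition is not part of this listing): True when the tag has a name attribute
def nameIsExists(tag):
    return tag.has_attr("name")

tags = bs.find_all(nameIsExists)
for item in tags:
    print(item)
    print("-" * 50)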
<meta content="width=device-width, initial-scale=1" name="viewport"/>
--------------------------------------------------
<meta content="app-id=1545197438" name="apple-itunes-app"/>
--------------------------------------------------
<meta content="index, follow, max-image-preview:large, max-snippet:-1, max-video-preview:-1" name="robots">
<!-- This site is optimized with the Yoast SEO plugin v20.6 - https://yoast.com/wordpress/plugins/seo/ -->
<title>11 METER - sailboatdata</title>
<link href="https://sailboatdata.com/sailboat/11-meter/" rel="canonical">
<meta content="en_US" property="og:locale">
<meta content="article" property="og:type">
<meta content="11 METER - sailboatdata" property="og:title"/>
<meta content="https://sailboatdata.com/sailboat/11-meter/" property="og:url"/>
<meta content="sailboatdata" property="og:site_name"/>
<meta content="https://www.facebook.com/sailboatdata" property="article:publisher"/>
<meta content="2023-05-12T18:11:13+00:00" property="article:modified_time"/>
<meta content="https://sailboatdata.com/wp-content/uploads/2023/03/11_meter_photo.jpg" property="og:image"/>
<meta content="483" property="og:image:width"/>
<meta content="640" property="og:image:height"/>
<meta content="image/jpeg" property="og:image:type"/>
</meta></meta></link></meta>
--------------------------------------------------
It found every tag that has a name attribute.
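A filter function can combine several conditions. The sketch below runs two such functions on an invented fragment; the function names and the markup are assumptions, not part of the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two meta tags and two divs.
html = """
<meta content="width=device-width" name="viewport"/>
<meta content="en_US" property="og:locale"/>
<div class="spec" id="main"></div>
<div class="spec-bord"></div>
"""
soup = BeautifulSoup(html, "html.parser")

def has_name_attr(tag):
    # Keep tags that carry a name attribute, whatever its value.
    return tag.has_attr("name")

def has_class_but_no_id(tag):
    # Keep tags that have a class attribute and no id attribute.
    return tag.has_attr("class") and not tag.has_attr("id")

named = soup.find_all(has_name_attr)
classy = soup.find_all(has_class_but_no_id)

print(named)
print(classy)
```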
Rule form 4: a list of strings (matches every tag whose name equals any one of the strings)
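A minimal sketch of the list form, again on a made-up fragment (the markup and variable names are assumptions):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with a title and one table row.
html = (
    "<title>11 METER</title>"
    "<table><tr><td>LOA:</td><td>33.80 ft</td></tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

# A tag is returned when its name equals any entry of the list.
found = [tag.name for tag in soup.find_all(["title", "td"])]

print(found)
```

The table and tr tags are skipped because neither name is in the list.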
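(2) The kwargs arguments: an attribute with a given value, or the mere presence of an attribute
Returns a list whose elements are Tag objects
tags = bs.find_all(class_=True)
for item in tags:
    print(item)
    print("-" * 50)
print(bs.find_all(content="app-id=1545197438"))
(every tag that has a class attribute is printed here)
--------------------------------------------------
[<meta content="app-id=1545197438" name="apple-itunes-app"/>]
The last call found the tag whose content attribute equals the given value.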
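The keyword forms can be tried out on a small fragment. This is a sketch under the assumption that the markup below stands in for the real page; note that class has to be written class_ because class is a Python keyword.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one meta tag, a table with classes, a bare paragraph.
html = """
<meta content="app-id=1545197438" name="apple-itunes-app"/>
<table class="table"><tbody class="table-light"></tbody></table>
<p>no attributes here</p>
"""
soup = BeautifulSoup(html, "html.parser")

with_class = [t.name for t in soup.find_all(class_=True)]   # attribute present
by_value = soup.find_all(content="app-id=1545197438")       # attribute equals a value
by_class = [t.name for t in soup.find_all(class_="table")]  # one specific class

print(with_class)
print(by_value)
print(by_class)
```

class_="table" does not pick up the tbody, because table-light is a different class name rather than a longer spelling of table.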
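(3) The text argument (named string in Beautiful Soup 4.4.0 and later): matching text
Returns a list whose elements are NavigableString or Comment objects
tags = bs.find_all("Skip to content")  # finds nothing, because a bare string only matches tag names
for item in tags:
    print(item)
print("-" * 50)
strings = bs.find_all(text="Skip to content")  # found, because text= matches content
for item in strings:
    print(type(item), item)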
--------------------------------------------------
<class 'bs4.element.Comment'> Skip to content
<class 'bs4.element.NavigableString'> Skip to content
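Of course, used this way it is not very useful: the values returned look identical to the value we passed in.
In practice, we usually put a list or a regular expression after text=.
With a regular expression, we get the full content of every NavigableString (Comment included) in the whole HTML that satisfies the pattern.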
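Here is a sketch of the list and regular-expression forms. It uses string=, the current name of the text= argument, and the table below is an invented stand-in for the real one.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical spec table.
html = """
<table>
<tr><td>LOA:</td><td>33.80 ft / 10.30 m</td></tr>
<tr><td>Beam:</td><td>8.20 ft / 2.50 m</td></tr>
<tr><td>First Built:</td><td>1990</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# A list: keep strings that equal any entry.
labels = [str(s) for s in soup.find_all(string=["LOA:", "Beam:"])]
# A regular expression: keep whole strings in which the pattern is found.
lengths = [str(s) for s in soup.find_all(string=re.compile(r"\bft\b"))]

print(labels)
print(lengths)
```

The regular expression only has to match part of the string, yet the complete string comes back, which is what makes this form useful for pulling out values.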
(4) The attrs argument
print(bs.find_all(name="viewport"))
print(bs.find_all(attrs={"name":"viewport"}))
[]
[<meta content="width=device-width, initial-scale=1" name="viewport"/>]
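The first call finds nothing, because name is itself a parameter of find_all: name="viewport" is read as "tags named viewport", not as a filter on the name attribute.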
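The clash can be reproduced on two meta tags. A minimal sketch, assuming the fragment below; combining a tag name with attrs is shown as one possible way to narrow the search further.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with two meta tags that carry a name attribute.
html = """
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<meta content="app-id=1545197438" name="apple-itunes-app"/>
"""
soup = BeautifulSoup(html, "html.parser")

as_tag_name = soup.find_all(name="viewport")               # looks for <viewport> tags
as_attribute = soup.find_all(attrs={"name": "viewport"})   # filters on the attribute
combined = soup.find_all("meta", attrs={"name": "viewport"})

print(as_tag_name)
print(as_attribute)
print(combined)
```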
(5) The limit argument: caps the number of results, keeping only the first n
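A short sketch of limit on a generated table (the five-row table is an assumption made up for the example):

```python
from bs4 import BeautifulSoup

# Hypothetical table with five rows.
html = "<table>" + "".join(f"<tr><td>row {i}</td></tr>" for i in range(5)) + "</table>"
soup = BeautifulSoup(html, "html.parser")

first_two = soup.find_all("td", limit=2)   # the search stops after two matches
texts = [td.get_text() for td in first_two]

print(texts)
```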
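2. find()
Equivalent to find_all with limit=1, except that it returns the single match itself (or None when nothing matches) rather than a list.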
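The difference in return types is the part that tends to cause bugs, so here is a small sketch of it. The markup is invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical row with two cells.
html = "<table><tr><td>LOA:</td><td>33.80 ft</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

first = soup.find("td")                    # a single Tag
as_list = soup.find_all("td", limit=1)     # a list holding that Tag
missing = soup.find("h1")                  # None, not an empty list

print(first.get_text())
print(len(as_list))
print(missing)
```

Because find() can return None, it is worth checking the result before calling get_text() or indexing into it.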
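3. CSS selectors
(1) By tag name: select the tags directly and get their text
print("By tag name: select the tags directly and get their text")
tags = bs.select("a")
for item in tags:
    print(type(item.getText()), item.getText())
By tag name: select the tags directly and get their text
<class 'str'>
<class 'str'> Skip to content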
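The empty first line in the output comes from the link whose only child is a comment. A minimal sketch of that behaviour, assuming the two links below stand in for the real ones:

```python
from bs4 import BeautifulSoup

# Hypothetical links: one wraps a comment, the other wraps plain text.
html = """
<a href="#primary"><!--Skip to content--></a>
<a href="#primary">Skip to content</a>
"""
soup = BeautifulSoup(html, "html.parser")

# get_text() leaves comments out, so the first link yields an empty string.
texts = [a.get_text() for a in soup.select("a")]

print(texts)
```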
(2) By class name: prefix it with .
print("By class name: prefix it with .")
tags = bs.select(".boats-dimension")
for item in tags:
    print(item)
By class name: prefix it with .
<div class="boats-dimension">
<h1>11 METER</h1>
</div>
(3) By id: prefix it with #
print("By id: prefix it with #")
tags = bs.select("#primary")
for item in tags:
    print(item)
By id: prefix it with #
<main class="site-main" id="primary">
<div class="container">
<div class="row">
<div class="col-lg-8 sailboat-content-wrapper">
<div class="boats-dimension">
<h1>11 METER</h1>
</div>
<div class="spec">
<div class="spec-bord">
<table class="table">
<tbody class="table-light">
<!-- start hull section -->
<tr>
<td>Hull Type:</td>
<td>Fin w/bulb & spade rudder</td>
</tr>
<!-- end hull section -->
<!-- start rig section -->
<tr>
<td>Rigging Type:</td>
<td>Fractional Sloop</td>
</tr>
<!-- end rig section -->
<!-- start loa section -->
<tr>
<td>LOA:</td>
<td>33.80 ft / 10.30 m</td>
</tr>
<!-- end loa section -->
<!-- start lod section -->
<!-- end lod section -->
<!-- start lwl section -->
<tr>
<td>LWL:</td>
<td>26.90 ft / 8.20 m</td>
</tr>
<!-- end lwl section -->
<!-- start sa section -->
<tr>
<td>S.A. (reported):</td>
<td>450.00 ft² / 41.81 m²</td>
</tr>
<!-- end sa section -->
<!-- start beam section -->
<tr>
<td>Beam:</td>
<td>8.20 ft / 2.50 m</td>
</tr>
<!-- end beam section -->
<!-- start beam wl section -->
<!-- end beam wl section -->
<!-- start disp section -->
<tr>
<td>Displacement:</td>
<td>3,527.00 lb / 1,600 kg</td>
</tr>
<!-- end disp section -->
<!-- start ballast section -->
<tr>
<td>Ballast:</td>
<td>1,598.00 lb / 725 kg</td>
</tr>
<!-- end ballast section -->
<!-- start max draft section -->
<tr>
<td>Max Draft:</td>
<td>5.90 ft / 1.80 m</td>
</tr>
<!-- end max draft section -->
<!-- start min draft section -->
<!-- end min draft section -->
<!-- start hull construction section -->
<tr>
<td>Construction:</td>
<td>FG w/core,Composite</td>
</tr>
<!-- end hull construction section -->
<!-- start ballast type section -->
<tr>
<td>Ballast Type:</td>
<td>Lead</td>
</tr>
<!-- end ballast type section -->
<!-- start Bridgedeck Clearance section -->
<!-- end Bridgedeck Clearance section -->
<!-- start first built section -->
<tr>
<td>First Built:</td>
<td>1990</td>
</tr>
<!-- end first built section -->
<!-- start last built section -->
<!-- end last built section -->
<!-- start number built section -->
<tr>
<td># Built:</td>
<td>350</td>
</tr>
<!-- end number built section -->
<!-- start builder name section -->
<!-- end builder name section -->
<!-- start designer name section -->
<tr>
<td>Designer:</td>
<td>Ron Holland & Rolf Gyhlenius</td>
</tr>
<!-- end designer name section -->
</tbody>
</table>
</div>
</div>
<!-- Auxiliary Power/Tanks -->
</div>
</div>
</div>
</main>
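(4) Combined selectors
print("Combined selectors")
tags = bs.select("div .boats-dimension")  # the space means: .boats-dimension anywhere inside a div
for item in tags:
    print(item)
Combined selectors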
<div class="boats-dimension">
<h1>11 METER</h1>
</div>
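The space in a combined selector is easy to overlook. The sketch below contrasts "inside a tag" with "the tag itself" on an invented fragment, and also uses select_one, which returns the first match only.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the page structure.
html = """
<main id="primary">
  <div class="container">
    <div class="boats-dimension"><h1>11 METER</h1></div>
  </div>
</main>
"""
soup = BeautifulSoup(html, "html.parser")

by_class = soup.select(".boats-dimension")       # any tag with that class
by_id = soup.select("#primary")                  # the tag with that id
inside = soup.select("main .boats-dimension")    # with a space: a descendant of main
itself = soup.select("main.boats-dimension")     # no space: main must have the class
heading = soup.select_one("#primary h1").get_text()

print(len(by_class), len(by_id), len(inside), len(itself), heading)
```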
(5) By attribute
print("By attribute")
tags = bs.select('[href="#primary1"]')
for item in tags:
    print(item)
By attribute
<a class="skip-link screen-reader-text" href="#primary1"><!--Skip to content--></a>
(6) By tag and attribute
print("By tag and attribute")
tags = bs.select('a[href="#primary1"]')
for item in tags:
    print(item)
By tag and attribute
<a class="skip-link screen-reader-text" href="#primary1"><!--Skip to content--></a>
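Attribute selectors also come in a presence-only form and a prefix form. A minimal sketch, assuming the three tags below (the second link and its text are made up):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two links and one link element, all with href.
html = """
<a class="skip-link" href="#primary">Skip to content</a>
<a href="https://sailboatdata.com/">Home</a>
<link href="https://sailboatdata.com/sailboat/11-meter/" rel="canonical"/>
"""
soup = BeautifulSoup(html, "html.parser")

exact = soup.select('[href="#primary"]')        # the value must equal #primary
any_href = soup.select("[href]")                # the attribute only has to exist
a_prefix = soup.select('a[href^="https://"]')   # a tags whose href starts with https://

print(len(exact), len(any_href), len(a_prefix))
```

Adding the tag name in front, as in the last selector, is what excludes the link element even though its href has the same prefix.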
(7) By child tags
print("By child tags")
tags = bs.select('div > div > div > div > div')  # each > steps down to a direct child
print(tags)
By child tags
[<div class="spec-bord">
<table class="table">
<tbody class="table-light">
<!-- start hull section -->
<tr>
<td>Hull Type:</td>
<td>Fin w/bulb & spade rudder</td>
</tr>
<!-- end hull section -->
<!-- start rig section -->
<tr>
<td>Rigging Type:</td>
<td>Fractional Sloop</td>
</tr>
<!-- end rig section -->
<!-- start loa section -->
<tr>
<td>LOA:</td>
<td>33.80 ft / 10.30 m</td>
</tr>
<!-- end loa section -->
<!-- start lod section -->
<!-- end lod section -->
<!-- start lwl section -->
<tr>
<td>LWL:</td>
<td>26.90 ft / 8.20 m</td>
</tr>
<!-- end lwl section -->
<!-- start sa section -->
<tr>
<td>S.A. (reported):</td>
<td>450.00 ft² / 41.81 m²</td>
</tr>
<!-- end sa section -->
<!-- start beam section -->
<tr>
<td>Beam:</td>
<td>8.20 ft / 2.50 m</td>
</tr>
<!-- end beam section -->
<!-- start beam wl section -->
<!-- end beam wl section -->
<!-- start disp section -->
<tr>
<td>Displacement:</td>
<td>3,527.00 lb / 1,600 kg</td>
</tr>
<!-- end disp section -->
<!-- start ballast section -->
<tr>
<td>Ballast:</td>
<td>1,598.00 lb / 725 kg</td>
</tr>
<!-- end ballast section -->
<!-- start max draft section -->
<tr>
<td>Max Draft:</td>
<td>5.90 ft / 1.80 m</td>
</tr>
<!-- end max draft section -->
<!-- start min draft section -->
<!-- end min draft section -->
<!-- start hull construction section -->
<tr>
<td>Construction:</td>
<td>FG w/core,Composite</td>
</tr>
<!-- end hull construction section -->
<!-- start ballast type section -->
<tr>
<td>Ballast Type:</td>
<td>Lead</td>
</tr>
<!-- end ballast type section -->
<!-- start Bridgedeck Clearance section -->
<!-- end Bridgedeck Clearance section -->
<!-- start first built section -->
<tr>
<td>First Built:</td>
<td>1990</td>
</tr>
<!-- end first built section -->
<!-- start last built section -->
<!-- end last built section -->
<!-- start number built section -->
<tr>
<td># Built:</td>
<td>350</td>
</tr>
<!-- end number built section -->
<!-- start builder name section -->
<!-- end builder name section -->
<!-- start designer name section -->
<tr>
<td>Designer:</td>
<td>Ron Holland & Rolf Gyhlenius</td>
</tr>
<!-- end designer name section -->
</tbody>
</table>
</div>]
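The > combinator only follows direct parent-child links, which the following sketch checks on an invented three-level fragment:

```python
from bs4 import BeautifulSoup

# Hypothetical nesting: .spec > .spec-bord > table.
html = """
<div class="spec">
  <div class="spec-bord">
    <table class="table"><tr><td>LOA:</td></tr></table>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

direct = soup.select(".spec > div")        # spec-bord is a direct child
no_match = soup.select(".spec > table")    # the table is a grandchild, so no match
any_depth = soup.select(".spec table")     # a space allows any depth

print(len(direct), len(no_match), len(any_depth))
```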
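(8) By sibling tags
print("By sibling tags")
tags = bs.select('.spec ~ .boats-dimension')  # finds nothing: ~ only matches siblings that come later
tags = bs.select('.boats-dimension ~ .spec')  # order matters: .spec comes after .boats-dimension
print(tags)
By sibling tags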
[<div class="spec">
<div class="spec-bord">
<table class="table">
<tbody class="table-light">
<!-- start hull section -->
<tr>
<td>Hull Type:</td>
<td>Fin w/bulb & spade rudder</td>
</tr>
<!-- end hull section -->
<!-- start rig section -->
<tr>
<td>Rigging Type:</td>
<td>Fractional Sloop</td>
</tr>
<!-- end rig section -->
<!-- start loa section -->
<tr>
<td>LOA:</td>
<td>33.80 ft / 10.30 m</td>
</tr>
<!-- end loa section -->
<!-- start lod section -->
<!-- end lod section -->
<!-- start lwl section -->
<tr>
<td>LWL:</td>
<td>26.90 ft / 8.20 m</td>
</tr>
<!-- end lwl section -->
<!-- start sa section -->
<tr>
<td>S.A. (reported):</td>
<td>450.00 ft² / 41.81 m²</td>
</tr>
<!-- end sa section -->
<!-- start beam section -->
<tr>
<td>Beam:</td>
<td>8.20 ft / 2.50 m</td>
</tr>
<!-- end beam section -->
<!-- start beam wl section -->
<!-- end beam wl section -->
<!-- start disp section -->
<tr>
<td>Displacement:</td>
<td>3,527.00 lb / 1,600 kg</td>
</tr>
<!-- end disp section -->
<!-- start ballast section -->
<tr>
<td>Ballast:</td>
<td>1,598.00 lb / 725 kg</td>
</tr>
<!-- end ballast section -->
<!-- start max draft section -->
<tr>
<td>Max Draft:</td>
<td>5.90 ft / 1.80 m</td>
</tr>
<!-- end max draft section -->
<!-- start min draft section -->
<!-- end min draft section -->
<!-- start hull construction section -->
<tr>
<td>Construction:</td>
<td>FG w/core,Composite</td>
</tr>
<!-- end hull construction section -->
<!-- start ballast type section -->
<tr>
<td>Ballast Type:</td>
<td>Lead</td>
</tr>
<!-- end ballast type section -->
<!-- start Bridgedeck Clearance section -->
<!-- end Bridgedeck Clearance section -->
<!-- start first built section -->
<tr>
<td>First Built:</td>
<td>1990</td>
</tr>
<!-- end first built section -->
<!-- start last built section -->
<!-- end last built section -->
<!-- start number built section -->
<tr>
<td># Built:</td>
<td>350</td>
</tr>
<!-- end number built section -->
<!-- start builder name section -->
<!-- end builder name section -->
<!-- start designer name section -->
<tr>
<td>Designer:</td>
<td>Ron Holland & Rolf Gyhlenius</td>
</tr>
<!-- end designer name section -->
</tbody>
</table>
</div>
</div>]
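Why the order matters can be checked directly. In this sketch the wrapper and the notes div are assumptions added to make the sibling relationships visible; + is the stricter variant that accepts only the immediately following sibling.

```python
from bs4 import BeautifulSoup

# Hypothetical siblings inside one wrapper.
html = """
<div class="wrapper">
  <div class="boats-dimension"><h1>11 METER</h1></div>
  <div class="spec">specs</div>
  <div class="notes">notes</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

later = soup.select(".boats-dimension ~ .spec")     # .spec comes after: match
earlier = soup.select(".spec ~ .boats-dimension")   # nothing comes before: no match
all_later = soup.select(".boats-dimension ~ div")   # every later sibling div
adjacent = soup.select(".boats-dimension + div")    # only the very next sibling

print(len(later), len(earlier), len(all_later), len(adjacent))
```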