Python BS4 (Part 2)

Searching the tree

The get_text() method

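  • Returns the text contained in a tag, including the text of all its descendants
  • get_text() returns a plain string and is called on a single tag; the list returned by find_all() or select() has no get_text(), so call it on each element instead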
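
The two points above can be sketched as follows. This is a minimal illustration, not part of the original post: the HTML fragment is made up, and it uses the built-in 'html.parser' so that it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, used only for illustration
html = '<p class="story">Once upon a time <a id="link1">Elsie</a> and <a id="link2">Lacie</a></p>'
soup = BeautifulSoup(html, 'html.parser')

# On a single tag, get_text() joins the text of all descendants
text = soup.p.get_text()
print(text)    # Once upon a time Elsie and Lacie

# find_all() returns a list-like ResultSet, so call get_text() per element
names = [a.get_text() for a in soup.find_all('a')]
print(names)   # ['Elsie', 'Lacie']

# separator and strip are optional arguments of get_text()
joined = soup.p.get_text(separator='|', strip=True)
print(joined)  # Once upon a time|Elsie|and|Lacie
```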

The select() method

  • select() finds elements using CSS selector syntax and returns a list of all matches
  • CSS selector reference: https://www.w3school.com.cn/cssref/css_selectors.asp
from bs4 import BeautifulSoup
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
bs = BeautifulSoup(html_doc, 'lxml')

# Select by tag name
print(bs.select('b'))			# [<b>The Dormouse's story</b>]
# Select by class name
print(bs.select('.title'))		# [<p class="title"><b>The Dormouse's story</b></p>]
# Select by id
print(bs.select('#link1'))		# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
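
Selectors can also be combined. The sketch below goes a little beyond the three cases above: it shows a descendant selector, an attribute selector, and select_one(), which returns the first match or None instead of a list. It assumes a shortened copy of the same document and uses 'html.parser' to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Shortened version of the sample document (an assumption for brevity)
html = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
"""
soup = BeautifulSoup(html, 'html.parser')

# Descendant selector: every <a> inside <p class="story">
links = soup.select('p.story a')
ids = [a['id'] for a in links]
print(ids)                 # ['link1', 'link2']

# Attribute selector: href ending in "lacie"
lacie = soup.select('a[href$="lacie"]')
print(lacie[0].get_text()) # Lacie

# select_one() returns the first match, or None when nothing matches
first = soup.select_one('.sister')
missing = soup.select_one('#no-such-id')
print(first['id'], missing)  # link1 None
```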

Modifying the tree (rarely needed)

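  • Change a tag's name and attributes
  • Assign to the .string attribute, which replaces the tag's current content with the new content
  • append() adds content to the end of a tag (similar to list.append())
  • decompose() removes a tag and everything inside it from the tree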

Changing a tag's name and attributes

from bs4 import BeautifulSoup
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
bs = BeautifulSoup(html_doc, 'lxml')
print(bs.p)			# <p class="title"><b>The Dormouse's story</b></p>

# Rename the tag
bs.p.name = 'z'		
print(bs.z)			# <z class="title"><b>The Dormouse's story</b></z>
# Change an attribute
bs.z['class'] = 'name'	
print(bs.z)			# <z class="name"><b>The Dormouse's story</b></z>
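
Attributes behave like dictionary entries, so they can be added and deleted as well as changed. A minimal sketch, assuming a made-up one-line fragment and 'html.parser':

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, used only for illustration
soup = BeautifulSoup('<p class="title"><b>text</b></p>', 'html.parser')
tag = soup.p

# Add a new attribute by assignment
tag['id'] = 'intro'
# Delete an existing attribute with del
del tag['class']

print(tag)               # <p id="intro"><b>text</b></p>
# get() returns None for a missing attribute instead of raising KeyError
print(tag.get('class'))  # None
```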

Changing the .string value

from bs4 import BeautifulSoup
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
bs = BeautifulSoup(html_doc, 'lxml')
print(bs.title)		# <title>The Dormouse's story</title>

# Assign a new value to .string
bs.title.string = 'Book Name'
print(bs.title)		# <title>Book Name</title>
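
One thing to watch: assigning to .string replaces everything inside the tag, including child tags. The sketch below illustrates this on a made-up fragment (parsed with 'html.parser'); the nested <b> is gone after the assignment.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with a nested tag
soup = BeautifulSoup('<p class="title"><b>Old title</b></p>', 'html.parser')

# Assigning .string discards the existing children of <p>
soup.p.string = 'Book Name'

print(soup.p)  # <p class="title">Book Name</p>
print(soup.b)  # None
```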

The append() method

from bs4 import BeautifulSoup
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
bs = BeautifulSoup(html_doc, 'lxml')
print(bs.p)		# <p class="title"><b>The Dormouse's story</b></p>

# append() adds content to the end of the tag
bs.p.append('abc')
print(bs.p)		# <p class="title"><b>The Dormouse's story</b>abc</p>
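
append() also accepts a tag, not only a string. A minimal sketch, assuming a made-up fragment and 'html.parser': new_tag() creates the element, and append() attaches it as the last child.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, used only for illustration
soup = BeautifulSoup('<p class="story">Sisters: </p>', 'html.parser')

# Build a new <a> tag; keyword arguments become attributes
link = soup.new_tag('a', href='http://example.com/elsie')
link.string = 'Elsie'

# Attach it as the last child of <p>
soup.p.append(link)
print(soup.p)  # <p class="story">Sisters: <a href="http://example.com/elsie">Elsie</a></p>
```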

The decompose() method

from bs4 import BeautifulSoup
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
bs = BeautifulSoup(html_doc, 'lxml')
print(bs.p)			# <p class="title"><b>The Dormouse's story</b></p>

# decompose() removes the <b> tag and its contents from the tree
bs.b.decompose()
print(bs.p)			# <p class="title"></p>
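
decompose() destroys the removed tag. When the removed element is still needed, Beautiful Soup's extract() method, which the walkthrough above does not cover, removes it from the tree and returns it. A minimal sketch on a made-up fragment, parsed with 'html.parser':

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, used only for illustration
soup = BeautifulSoup('<p class="title"><b>Bold</b> tail</p>', 'html.parser')

# extract() detaches the tag from the tree and hands it back
removed = soup.b.extract()

print(removed)  # <b>Bold</b>
print(soup.p)   # <p class="title"> tail</p>
```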