Beautiful Soup Usage (3): Working with bs4.element.Tag

go2coding

Article link: https://blog.csdn.net/weixin_40425640/article/details/124032576

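In the previous section we analyzed how the different elements in Beautiful Soup relate to one another, and ended up at the key class bs4.element.Tag. Most information extraction goes through bs4.element.Tag, so this section looks at the methods it offers for pulling data out of a document.

First, what exactly is a bs4.element.Tag?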

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body>
</html>
"""

# Parse the document with the lxml parser (requires the lxml package).
soup = BeautifulSoup(html, "lxml")
# find() returns the first matching element, here the first <a> tag.
a_tag = soup.find('a')
print('a_tag type:', type(a_tag))
print(a_tag)

The output is as follows:

a_tag type: <class 'bs4.element.Tag'>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>

Put simply, a Tag is one of the individual tags in the HTML document.
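To make the idea of "a Tag is an HTML tag" a little more concrete, here is a minimal sketch that separates the tags inside a paragraph from the plain text around them. The HTML fragment is a shortened, hypothetical variant of the sample document, and the built-in html.parser is used instead of lxml only so that the snippet has no extra dependency.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical fragment modelled on the article's sample document.
html = (
    '<p class="story">Once upon a time '
    '<a href="http://example.com/lacie" id="link2">Lacie</a>'
    ' and friends.</p>'
)

# html.parser is an assumption made here to avoid the lxml dependency.
soup = BeautifulSoup(html, "html.parser")
p_tag = soup.find("p")

# Each direct child of <p> is either an element (Tag) or a piece of text
# (NavigableString).
child_kinds = []
for child in p_tag.children:
    if isinstance(child, Tag):
        child_kinds.append("Tag")
    elif isinstance(child, NavigableString):
        child_kinds.append("NavigableString")

print(child_kinds)

# find() returns None when nothing matches, so check before using the result.
missing = soup.find("table")
print(missing is None)
```

Checking for None before touching the result of find() avoids an AttributeError when the tag you are looking for is not on the page.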

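Getting the tag name

The name is rarely the main target when extracting key information, but name returns the name of the tag, for example a:

print(a_tag.name)

The output is:

a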

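Getting attribute values

Being able to read a tag's attributes is a useful feature. For example, to get the link address from an a tag, index the tag with [] and the attribute name, such as 'href':

print(a_tag['href'])

The output is: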

http://example.com/lacie
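Indexing with [] raises a KeyError when the attribute is missing, so it can help to know the alternatives. The sketch below is not from the original article: it assumes a single a tag copied from the sample document, uses html.parser instead of lxml to stay dependency-free, and shows get(), has_attr() and attrs next to the [] lookup.

```python
from bs4 import BeautifulSoup

# One <a> tag copied from the article's sample document.
html = '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>'

# html.parser is an assumption made here to avoid the lxml dependency.
soup = BeautifulSoup(html, "html.parser")
a_tag = soup.find("a")

# [] works like a dictionary lookup and raises KeyError for a missing attribute.
href = a_tag["href"]

# get() returns None (or a supplied default) instead of raising.
title = a_tag.get("title")
title_or_default = a_tag.get("title", "no title")

# class can hold several values, so Beautiful Soup returns it as a list.
classes = a_tag["class"]

# attrs exposes all attributes as a dictionary.
attrs = a_tag.attrs
has_id = a_tag.has_attr("id")

print(href)
print(title, title_or_default)
print(classes)
print(attrs, has_id)
```

When scraping pages where an attribute may or may not be present, get() is usually the safer choice of the two.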

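Getting the text value

The most common task during analysis is reading the data that is actually displayed on the page, that is, the text content of an HTML element. A bs4.element.Tag exposes this through text or get_text():

print(a_tag.text)

The result is: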

Lacie
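On a tag that contains other tags, text and get_text() return the text of all descendants joined together. As a rough illustration, the sketch below uses a shortened, hypothetical paragraph (not the article's exact HTML) with html.parser, and also shows the optional separator and strip arguments of get_text().

```python
from bs4 import BeautifulSoup

# Hypothetical paragraph, shortened from the article's sample document.
html = (
    '<p class="story">Sisters: <a id="link2">Lacie</a>'
    ' and <a id="link3">Tillie</a></p>'
)

# html.parser is an assumption made here to avoid the lxml dependency.
soup = BeautifulSoup(html, "html.parser")
p_tag = soup.find("p")
a_tag = soup.find("a")

# On a leaf tag, text and get_text() give the same string.
a_text = a_tag.text
a_get_text = a_tag.get_text()

# On a parent tag, the text of every descendant is concatenated.
p_text = p_tag.get_text()

# A separator plus strip=True trims each piece and joins the pieces explicitly.
p_pieces = p_tag.get_text("|", strip=True)

print(a_text)
print(p_text)
print(p_pieces)
```

The separator form is handy when the pieces of text should stay distinguishable after extraction.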

With name, [], and text we have extracted the key properties of an HTML tag, which is both very useful and very convenient.
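Putting the three accessors together, a small sketch like the following could collect the name, link, and text of every a tag in the story paragraph. Note that find_all() and html.parser do not appear in the article's own code; they are assumptions made here so that the example covers both links and runs without lxml.

```python
from bs4 import BeautifulSoup

# The story paragraph from the article's sample document.
html = """
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

# html.parser is an assumption made here to avoid the lxml dependency.
soup = BeautifulSoup(html, "html.parser")

links = []
# find_all() returns every matching Tag, not just the first one.
for tag in soup.find_all("a"):
    links.append({
        "name": tag.name,                    # tag name
        "href": tag.get("href"),             # attribute value, None if absent
        "text": tag.get_text(strip=True),    # visible text
    })

for link in links:
    print(link)
```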