3.16 (Learning Python with a Senior Student)

禾太阳

Last modified 2022-03-21 23:08:35

Tags: python, programming languages

First published 2022-03-17 00:02:05

Article link: https://blog.csdn.net/qq_58181376/article/details/123539507

Supplementary notes

BeautifulSoup

I. BeautifulSoup converts a complex HTML document into a tree structure in which every node is a Python object. All of these objects fall into four types:

-Tag

-NavigableString

-BeautifulSoup

-Comment
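
As a minimal sketch of these four types, the snippet below parses a small inline HTML string and prints the type name of one object of each kind. The string sample_html is a made-up stand-in, not the baidu.html file used in this post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used here instead of baidu.html
sample_html = (
    "<html><head><title>Demo</title></head>"
    '<body><a href="/x"><!--hidden--></a></body></html>'
)
soup = BeautifulSoup(sample_html, "html.parser")

tag = soup.title            # a Tag
text = soup.title.string    # a NavigableString
comment = soup.a.string     # a Comment (the only child of <a> is a comment)

# Collect the type name of each kind of object, then print them
type_names = [type(obj).__name__ for obj in (tag, text, soup, comment)]
for name in type_names:
    print(name)
```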

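1. Tag: a tag together with its contents. Accessing a tag by name, such as bs.title, returns only the first match. Tag is the second most commonly used of the four types.

1.1 Print the title

from bs4 import BeautifulSoup
with open("./baidu.html", "rb") as file:  # the file is closed automatically
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # html.parser is Python's built-in HTML parser
print(bs.title)

Result: <title>百度一下，你就知道</title>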

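1.2 Print the first <a> tag, from its opening <a> to its closing </a>

from bs4 import BeautifulSoup
with open("./baidu.html", "rb") as file:  # the file is closed automatically
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # html.parser is Python's built-in HTML parser
print(bs.a)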

Result: <a class="mnav" href="http://news.baidu.com" name="tj_trnews"><!--新闻--></a>

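1.3 Print the <head> tag, from its opening <head> to its closing </head>

from bs4 import BeautifulSoup
with open("./baidu.html", "rb") as file:  # the file is closed automatically
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # html.parser is Python's built-in HTML parser
print(bs.head)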

Result: the entire <head> element, including its opening and closing tags

1.4 Type

from bs4 import BeautifulSoup
with open("./baidu.html", "rb") as file:  # the file is closed automatically
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # html.parser is Python's built-in HTML parser
print(type(bs.title))
print(type(bs.a))
print(type(bs.head))

Result:

<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
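
Because attribute access such as bs.a returns only the first matching tag, getting every match needs a different call. The sketch below contrasts the two using find_all; the inline sample_html string is an assumption made for illustration and is not taken from baidu.html.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with two links
sample_html = '<p><a href="/news">News</a><a href="/map">Map</a></p>'
soup = BeautifulSoup(sample_html, "html.parser")

first_link = soup.a              # only the first <a>
all_links = soup.find_all("a")   # every <a>, in document order
hrefs = [link["href"] for link in all_links]
missing = soup.table             # None, because there is no <table>

print(first_link)
print(len(all_links))
print(hrefs)
print(missing)
```

Checking for None before using the result avoids an AttributeError when a page does not contain the tag you expect.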

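2. NavigableString: the text inside a tag, as a string

from bs4 import BeautifulSoup
with open("./baidu.html", "rb") as file:  # the file is closed automatically
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # html.parser is Python's built-in HTML parser
print(bs.title.string)
print(type(bs.title.string))

Result: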

百度一下，你就知道
<class 'bs4.element.NavigableString'>
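
One detail worth knowing about .string: it only returns text when the tag has a single child. The sketch below, which uses a made-up inline HTML string rather than baidu.html, shows that case alongside get_text(), which joins all the text inside a tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: <p> holds both text and a nested <b>
sample_html = "<p>Hello <b>world</b></p>"
soup = BeautifulSoup(sample_html, "html.parser")

inner = soup.b.string        # <b> has one child, so this is a NavigableString
outer = soup.p.string        # <p> has two children, so this is None
joined = soup.p.get_text()   # all text inside <p>, concatenated

print(inner)
print(outer)
print(joined)
print(isinstance(inner, str))  # NavigableString is a subclass of str
```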

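3. BeautifulSoup: represents the whole document. This is the most commonly used of the four types.

from bs4 import BeautifulSoup
with open("./baidu.html", "rb") as file:  # the file is closed automatically
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # html.parser is Python's built-in HTML parser
print(bs.name)
print(type(bs))

Result: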

[document]
<class 'bs4.BeautifulSoup'>

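from bs4 import BeautifulSoup
with open("./baidu.html", "rb") as file:  # the file is closed automatically
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # html.parser is Python's built-in HTML parser
print(bs)

Result: the entire document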
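
The BeautifulSoup object can be searched much like a Tag. The following is a small sketch under the assumption of a made-up inline document (sample_html is not baidu.html); it also uses prettify(), which returns the document as an indented string.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup
sample_html = "<html><head><title>Demo</title></head><body><p>Hi</p></body></html>"
soup = BeautifulSoup(sample_html, "html.parser")

document_name = soup.name           # the special name "[document]"
is_tag = isinstance(soup, Tag)      # BeautifulSoup is a subclass of Tag
first_paragraph = soup.find("p")    # search the whole document
pretty = soup.prettify()            # indented text form of the document

print(document_name)
print(is_tag)
print(first_paragraph)
print(pretty)
```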

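4. Comment: a special kind of NavigableString. When printed, its content does not include the comment markers (<!-- and -->).

from bs4 import BeautifulSoup
with open("./baidu.html", "rb") as file:  # the file is closed automatically
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # html.parser is Python's built-in HTML parser
print(bs.a.string)
print(type(bs.a.string))

Result: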

新闻
<class 'bs4.element.Comment'>
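
Since a Comment prints just like ordinary text, it is easy to mistake commented-out content for visible content. One possible way to tell them apart is an isinstance check, sketched below on a made-up inline HTML string (an assumption for illustration, not baidu.html).

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical sample markup: the first link holds a comment, the second holds text
sample_html = '<a href="/news"><!--News--></a><a href="/map">Map</a>'
soup = BeautifulSoup(sample_html, "html.parser")

kinds = []
for link in soup.find_all("a"):
    content = link.string
    # Comment is a subclass of NavigableString, so check for Comment explicitly
    kind = "comment" if isinstance(content, Comment) else "text"
    kinds.append((kind, str(content)))
    print(kind, content)
```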

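5. Supplement: tag attributes as a dict

from bs4 import BeautifulSoup
with open("./baidu.html", "rb") as file:  # the file is closed automatically
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # html.parser is Python's built-in HTML parser
print(bs.a.attrs)  # get all attributes of a tag
print(type(bs.a.attrs))

Result: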

{'class': ['mnav'], 'href': 'http://news.baidu.com', 'name': 'tj_trnews'}
<class 'dict'>
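
Individual attributes can also be read directly from the tag, without going through attrs. The sketch below assumes an inline copy of the link shown in the result above, with the placeholder text News added, rather than reading baidu.html.

```python
from bs4 import BeautifulSoup

# Inline markup modeled on the <a> tag printed above; the link text is a placeholder
sample_html = '<a class="mnav" href="http://news.baidu.com" name="tj_trnews">News</a>'
soup = BeautifulSoup(sample_html, "html.parser")
link = soup.a

href = link["href"]            # dictionary-style access; raises KeyError if absent
target = link.get("target")    # .get() returns None when the attribute is absent
classes = link["class"]        # class is multi-valued, so it comes back as a list

print(href)
print(target)
print(classes)
```

Using .get() is the safer choice when an attribute may be missing on some tags.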