3.16 (Learning Python with a senior classmate)

Supplementary notes

BeautifulSoup

I. BeautifulSoup converts a complex HTML document into a tree structure in which every node is a Python object. All of these objects fall into four types:

-Tag

-NavigableString

-BeautifulSoup

-Comment

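1. Tag: a tag together with its contents. Accessing a tag by name (for example bs.title) returns only the first matching tag. This is the second most commonly used type.

1.1 Print the title

from bs4 import BeautifulSoup

with open("./baidu.html", "rb") as file:  # read the saved page as bytes
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # "html.parser" selects Python's built-in HTML parser
print(bs.title)

Result: <title>百度一下,你就知道</title>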

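1.2 Print the first <a> element, from its opening <a> tag to its closing </a> tag

from bs4 import BeautifulSoup

with open("./baidu.html", "rb") as file:  # read the saved page as bytes
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # "html.parser" selects Python's built-in HTML parser
print(bs.a)

Result: <a class="mnav" href="http://news.baidu.com" name="tj_trnews"><!--新闻--></a>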

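1.3 Print the <head> element, from its opening <head> tag to its closing </head> tag

from bs4 import BeautifulSoup

with open("./baidu.html", "rb") as file:  # read the saved page as bytes
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # "html.parser" selects Python's built-in HTML parser
print(bs.head)

Result: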

<head>
<meta content="text/html;charest=utf-8" http-equiv="content-type"/>
<meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
<meta content="always" name="referrer"/>
<link href="https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/bdorz/baidu.min.css" rel="stylesheet" type="text/css"/>
<title>百度一下,你就知道</title>
</head>

1.4 Types

from bs4 import BeautifulSoup

with open("./baidu.html", "rb") as file:  # read the saved page as bytes
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # "html.parser" selects Python's built-in HTML parser
print(type(bs.title))
print(type(bs.a))
print(type(bs.head))

Result:

<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
<class 'bs4.element.Tag'>
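
To make the "first match only" behaviour concrete, here is a minimal sketch. It uses a small inline HTML string (hypothetical markup, not the baidu.html file from these notes) so it runs on its own, and compares attribute access with find_all, which returns every match.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for baidu.html
html = """
<html><head><title>Demo</title></head>
<body>
<a class="mnav" href="http://news.example.com">News</a>
<a class="mnav" href="http://map.example.com">Map</a>
</body></html>
"""

bs = BeautifulSoup(html, "html.parser")

first_link = bs.a             # attribute access: only the first <a>
all_links = bs.find_all("a")  # find_all: every <a>, in document order
missing = bs.video            # a tag that does not exist gives None

print(first_link["href"])
print([link["href"] for link in all_links])
print(missing)
```

So bs.a is a shortcut for the first match; when every occurrence is needed, find_all is the usual choice.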

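2. NavigableString: the text inside a tag, as a string

from bs4 import BeautifulSoup

with open("./baidu.html", "rb") as file:  # read the saved page as bytes
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # "html.parser" selects Python's built-in HTML parser
print(bs.title.string)
print(type(bs.title.string))

Result: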

百度一下,你就知道
<class 'bs4.element.NavigableString'>
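
A short sketch of two details worth knowing about .string, again on hypothetical inline markup: a NavigableString is also an ordinary Python str, and .string is None when a tag has more than one child, in which case get_text() can be used instead.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical sample markup, not taken from baidu.html
html = "<div><title>Demo page</title><p>Hello <b>world</b></p></div>"
bs = BeautifulSoup(html, "html.parser")

title_text = bs.title.string
print(isinstance(title_text, NavigableString))
print(isinstance(title_text, str))  # NavigableString subclasses str
print(title_text.upper())           # normal str methods work

mixed = bs.p.string           # <p> has two children (text and <b>), so this is None
full_text = bs.p.get_text()   # concatenates all text inside <p>
print(mixed)
print(full_text)
```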

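3. BeautifulSoup: represents the whole document; this is the most commonly used type

from bs4 import BeautifulSoup

with open("./baidu.html", "rb") as file:  # read the saved page as bytes
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # "html.parser" selects Python's built-in HTML parser

print(bs.name)
print(type(bs))

Result: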

[document]
<class 'bs4.BeautifulSoup'>

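from bs4 import BeautifulSoup

with open("./baidu.html", "rb") as file:  # read the saved page as bytes
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # "html.parser" selects Python's built-in HTML parser
print(bs)

Result: the entire document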
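
Because the BeautifulSoup object stands for the whole document, it can be searched and traversed much like a Tag. The sketch below, on hypothetical inline markup, shows this; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup, not taken from baidu.html
html = "<html><head><title>Demo</title></head><body><p>one</p><p>two</p></body></html>"
bs = BeautifulSoup(html, "html.parser")

is_tag_like = isinstance(bs, Tag)  # BeautifulSoup is a subclass of Tag
paragraphs = [p.get_text() for p in bs.find_all("p")]  # search from the document root
top_level = [child.name for child in bs.children]      # direct children of the document

print(bs.name)
print(is_tag_like)
print(paragraphs)
print(top_level)
```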

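4. Comment: a special kind of NavigableString; the printed content does not include the comment markers (<!-- and -->)

from bs4 import BeautifulSoup

with open("./baidu.html", "rb") as file:  # read the saved page as bytes
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # "html.parser" selects Python's built-in HTML parser
print(bs.a.string)
print(type(bs.a.string))

Result: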

新闻
<class 'bs4.element.Comment'>
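
Since a printed Comment looks exactly like ordinary text, it can be useful to check the type before treating .string as visible content. A minimal sketch on hypothetical inline markup:

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Hypothetical sample markup: the first link holds a comment, the second real text
html = '<a href="#news"><!--News--></a><a href="#map">Map</a>'
bs = BeautifulSoup(html, "html.parser")

kinds = []
for link in bs.find_all("a"):
    text = link.string
    # Comment is a subclass of NavigableString, so test for Comment first
    kind = "comment" if isinstance(text, Comment) else "text"
    kinds.append(kind)
    print(kind, text)
```

Here the first link is reported as a comment and the second as text, even though both print without any markers.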

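5. Supplement: tag attributes as a dict

from bs4 import BeautifulSoup

with open("./baidu.html", "rb") as file:  # read the saved page as bytes
    html = file.read()
bs = BeautifulSoup(html, "html.parser")  # "html.parser" selects Python's built-in HTML parser
print(bs.a.attrs)  # get all attributes of the tag as a dict

print(type(bs.a.attrs))

Result: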

{'class': ['mnav'], 'href': 'http://news.baidu.com', 'name': 'tj_trnews'}
<class 'dict'>
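
Individual attributes can also be read straight from the tag without going through attrs. The sketch below uses hypothetical inline markup modelled on the output above; note that class comes back as a list because it may hold several values.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the <a> tag shown above
html = '<a class="mnav top" href="http://news.example.com" name="tj_trnews">News</a>'
bs = BeautifulSoup(html, "html.parser")
link = bs.a

href = link["href"]               # dict-style access; raises KeyError if absent
classes = link["class"]           # multi-valued attribute, returned as a list
target = link.get("target")       # .get returns None for a missing attribute
has_name = link.has_attr("name")  # check for presence without reading the value

print(href)
print(classes)
print(target)
print(has_name)
```

Using link.get is the safer choice when an attribute may be missing, because link["target"] would raise a KeyError here.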

 
