BeautifulSoup Parsing Library


立乱来


Article link: https://blog.csdn.net/qq_42107431/article/details/121468236


Introduction

A Python library for extracting data from HTML or XML files.
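As a minimal sketch of what "extracting data" looks like, the snippet below parses an inline HTML string and pulls out the page title and the text of each list item. The markup is made up for illustration, and it uses Python's built-in "html.parser" so it runs even without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this example
html = """
<html><head><title>Demo page</title></head>
<body>
  <ul>
    <li>apple</li>
    <li>banana</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Text of the first <title> tag
title = soup.title.get_text()
# Text of every <li> element, in document order
items = [li.get_text() for li in soup.find_all("li")]

print(title)  # Demo page
print(items)  # ['apple', 'banana']
```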

Installation

BeautifulSoup 4

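pip install beautifulsoup4

(pip install bs4 also works: the bs4 package on PyPI is a placeholder that installs beautifulsoup4.)

# Install lxml, the parser used in the examples below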

pip install lxml
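To confirm both installs worked, a quick check like the one below prints the installed Beautiful Soup version and whether lxml can be imported. The helper name parser_available is hypothetical, introduced only for this sketch.

```python
import importlib.util

import bs4


def parser_available(module_name: str) -> bool:
    """Hypothetical helper: True if the top-level module can be imported."""
    return importlib.util.find_spec(module_name) is not None


print("beautifulsoup4 version:", bs4.__version__)
print("lxml installed:", parser_available("lxml"))
```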

Introducing and creating a BeautifulSoup object

# Import the module
from bs4 import BeautifulSoup

def test():
    # Create a BeautifulSoup object, naming the parser explicitly
    soup = BeautifulSoup('<html>data</html>', 'lxml')
    print(soup)

if __name__ == '__main__':
    test()
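Different parsers can build different trees from the same input. As a sketch, the snippet below parses the fragment from test() with the built-in "html.parser", and with "lxml" only if it is installed. html.parser keeps the fragment as written, while lxml typically fills in the missing document structure (such as a body element). Because the exact lxml output can vary between versions, it is printed rather than relied on.

```python
import importlib.util

from bs4 import BeautifulSoup

markup = "<html>data</html>"

# Built-in parser: no extra dependency, keeps the input structure as-is
builtin_soup = BeautifulSoup(markup, "html.parser")
print("html.parser:", builtin_soup)

# lxml (the parser used in this note) is optional here; only try it if installed
if importlib.util.find_spec("lxml") is not None:
    lxml_soup = BeautifulSoup(markup, "lxml")
    print("lxml:       ", lxml_soup)
```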

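Avoiding the warning: name a parser such as 'lxml' when creating the object. If none is given, BeautifulSoup picks the best parser installed and warns that no parser was explicitly specified, which also means the same code may parse differently on another machine.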
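A sketch of that warning in action: warnings are recorded while building the object once without a parser name and once with one. "html.parser" is used so the example does not depend on lxml. The exact warning class and message depend on the installed bs4 version, so the code only looks for a warning that mentions the parser.

```python
import warnings

from bs4 import BeautifulSoup

markup = "<html>data</html>"

# Record every warning raised while building the object without a parser
with warnings.catch_warnings(record=True) as no_parser:
    warnings.simplefilter("always")
    BeautifulSoup(markup)

# Same markup, parser named explicitly
with warnings.catch_warnings(record=True) as with_parser:
    warnings.simplefilter("always")
    BeautifulSoup(markup, "html.parser")

print("warnings without parser:", [w.category.__name__ for w in no_parser])
print("warnings with parser:   ", [w.category.__name__ for w in with_parser])
```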

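The find method of the BeautifulSoup object

What find does: it searches the document tree and returns the first matching element, or None if nothing matches. find_all returns every match.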
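To make the difference concrete, the sketch below runs find and find_all on a small made-up document. find returns the first matching Tag (or None), and find_all returns a list-like ResultSet of every match, which is empty when nothing matches. The limit argument stops find_all after that many results.

```python
from bs4 import BeautifulSoup

# Made-up markup modelled on the div elements used in this note
html = '<div id="d1">div1</div><div id="d2">div2</div><div id="d3">div3</div>'
soup = BeautifulSoup(html, "html.parser")

first = soup.find("div")             # first match only
all_divs = soup.find_all("div")      # every match
two = soup.find_all("div", limit=2)  # stop after two matches
missing = soup.find("span")          # no match, so None
none_found = soup.find_all("span")   # no match, so an empty list

print(first)                        # <div id="d1">div1</div>
print([d["id"] for d in all_divs])  # ['d1', 'd2', 'd3']
print(len(two))                     # 2
print(missing, none_found)          # None []
```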


# Import the module
from bs4 import BeautifulSoup
# Prepare the document string
htmltxt = '''<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document object</title>
</head>
<body>

<div id="d1">div1</div>
<div id="d2">div2</div>
<div id="d3">div3</div>

<div class="cls1">div4</div>
<div class="cls2">div5</div>
<input type="text" name="username">

<script>
    /**
     *
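     Document: the document object

     1. Creating (obtaining) it: in the HTML DOM model it is obtained through the window object
         1. window.document
         2. document
     2. Methods:
         1. Getting Element objects:
             1. getElementById()	finds the element with the given unique ID. id values are normally unique
             2. getElementsByTagName()	returns all element nodes with the given tag name, as an array-like HTMLCollection
             3. getElementsByClassName(): gets the elements with the given class attribute value, as an array-like HTMLCollection
             4. getElementsByName()	returns the elements with the given name attribute, as an array-like NodeList
         2. Creating other DOM objects
             1. createAttribute(name)	creates an attribute node with the given name and returns a new Attr object
             2. createComment()	creates a comment node
             3. createElement()	creates an element node
             4. createTextNode()	creates a text node

     3. Properties

     */
    // Returns all element nodes with the given tag name (an array-like HTMLCollection)
    var ids = document.getElementsByTagName("div");
    // alert(ids.length);

    // Gets the elements with the given class attribute value (an array-like HTMLCollection)
    var div_cls = document.getElementsByClassName("cls1");
    // alert(div_cls.length);

    // Returns the elements with the given name attribute (an array-like NodeList)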
    var names = document.getElementsByName("username");
    alert(names.length);
</script>

</body>
</html>'''

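# Create the BeautifulSoup object
soup = BeautifulSoup(htmltxt, 'lxml')
# Find a tag by name (first match only)
script = soup.find('script')
# Find all div tags
d_s = soup.find_all('div')

# Search by attribute
# Find the tag whose id is d1
a = soup.find(attrs={'id': 'd1'})
# a = soup.find(id='d1')  # equivalent keyword form

# Search by text
# Find the string "div3" (string= is the current name of the older text= argument)
txt = soup.find(string='div3')

print(txt)
print(type(txt))  # a NavigableString, not a Tag

# Commonly used Tag attributes
print(a.name)  # the tag's name
print("All attributes of the tag:", a.attrs)
print("Text content of the tag:", a.text)
# if(txt):
#     print("OK")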
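A few more ways to filter by attribute, sketched on a made-up copy of the div and input elements above. class_ carries a trailing underscore because class is a reserved word in Python. The attrs dictionary is needed for an attribute called name, since find's own first parameter, name, means the tag name. Indexing a Tag reads one attribute; .get() returns None instead of raising KeyError. class is multi-valued, so it comes back as a list. A string found with string= links back to its tag through .parent.

```python
from bs4 import BeautifulSoup

html = '''
<div class="cls1">div4</div>
<div class="cls2">div5</div>
<input type="text" name="username">
'''
soup = BeautifulSoup(html, "html.parser")

by_class = soup.find("div", class_="cls2")        # class is reserved, so class_
by_attrs = soup.find(attrs={"name": "username"})  # name= would mean the tag name
found = soup.find(string="div5")                  # a NavigableString

print(by_class.text)                # div5
print(by_class["class"])            # ['cls2']
print(by_attrs["type"])             # text
print(by_attrs.get("placeholder"))  # None
print(found.parent.name)            # div
```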

Implementation


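import requests
from bs4 import BeautifulSoup

# Page used in this note (DXY COVID-19 tracker); its layout may have changed since
url = 'https://ncov.dxy.cn/ncovh5/view/pneumonia'
res = requests.get(url, timeout=10)
res.raise_for_status()  # stop early on HTTP errors
home_p = res.content.decode('utf-8')
# print(home_p)
soup = BeautifulSoup(home_p, 'lxml')
# Find the <script> tag whose id is getListByCountryTypeService2true
script = soup.find(attrs={'id': 'getListByCountryTypeService2true'})
if script is None:
    print('script tag not found; the page layout may have changed')
else:
    print(script.text)  # the content of the script tag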
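The script tag grabbed above holds JavaScript, not plain data. Assuming its body looks like "try { window.getListByCountryTypeService2true = [...] }catch(e){}" (an assumption about the page, which may have changed), the JSON array can be cut out with a regular expression and decoded with json. The sketch runs offline on a made-up sample, so the markup and the field names provinceName and confirmedCount are hypothetical. It reads the tag with .string, which returns the tag's single text node.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical page fragment; the real page's script format is an assumption
page = """
<html><body>
<script id="getListByCountryTypeService2true">
try { window.getListByCountryTypeService2true = [{"provinceName": "A", "confirmedCount": 1}, {"provinceName": "B", "confirmedCount": 2}]}catch(e){}
</script>
</body></html>
"""

soup = BeautifulSoup(page, "html.parser")
script = soup.find(id="getListByCountryTypeService2true")
js_text = (script.string or "") if script is not None else ""

# Greedy match: everything from the first '[' to the last ']' in the script
match = re.search(r"\[.*\]", js_text, re.S)
records = json.loads(match.group()) if match else []

for record in records:
    print(record["provinceName"], record["confirmedCount"])
```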

