BeautifulSoup: find_all()

Gao__xi

Category: Python爬虫基础 (Python crawler basics). Tags: find_all

Article link: https://blog.csdn.net/Gao__xi/article/details/88652172

Code

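import requests
from bs4 import BeautifulSoup

path = "https://blog.csdn.net/Gao__xi/article/details/88607021"
header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.108 Safari/537.36 2345Explorer/8.1.0.14126"}
# Pass the headers by keyword: the second positional argument of requests.get() is params, not headers
html = requests.get(path, headers=header)
# The "lxml" parser requires the lxml package to be installed
soup = BeautifulSoup(html.text, "lxml")
print(soup.prettify())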
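'''
1. find_all() returns the result set of everything that matches the given conditions
'''
"""
find_all() parameters
1. name: match Tag objects by tag name, e.g. "a", "p", "div"
2. attrs={} or keyword arguments (class_, id, href): match Tag objects that have the given attributes
3. string="": match strings in the HTML document that equal the given string
4. limit: cap the number of results
5. recursive: set to False to search only direct children
"""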

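'''
1. Get tags by tag name
'''
## Find all <a> tags: the result is a ResultSet (a list subclass) of Tag objects
print(type(soup.find_all("a")[0]))  # <class 'bs4.element.Tag'>
print(type(soup.find_all("a")))  # <class 'bs4.element.ResultSet'>

## Find all tags whose name is any of two or more given names
print(soup.find_all(["title","a"]))
print(soup.find_all(["title","a"])[0])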

'''
2. Filter by tag attributes
'''
### 1. Search for Tag objects with a given attribute value
# Form 1: keyword arguments
print(soup.find_all(href="https://blog.csdn.net/gao__xi/article/category/8764721"))
# Note: class is a reserved word in Python, so the keyword must be written class_
# A value containing spaces matches only when the class attribute is exactly this string, in this order
print(soup.find_all(class_="hover-show text text3"))
# Form 2: attrs dictionary
print(soup.find_all(attrs={"href":"https://blog.csdn.net/gao__xi/article/category/8764721"}))
print(soup.find_all(attrs={"class":"hover-show text text3"}))

### Filter on several attributes at once, e.g. [<a class="clearfix" href="https://blog.csdn.net/gao__xi/article/category/8764721">]
# Form 1
print(soup.find_all(class_="clearfix",href="https://blog.csdn.net/gao__xi/article/category/8764721"))
# Form 2
print(soup.find_all(attrs={"class":"clearfix","href":"https://blog.csdn.net/gao__xi/article/category/8764721"}))
### Search for tags on which an attribute is present, whatever its value
print(soup.find_all(class_=True,href="https://blog.csdn.net/gao__xi/article/category/8764721"))

'''
1 & 2. Combine a tag name with an attribute filter
'''
print(soup.find_all("span",attrs={"class":"title"}))

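'''
3. Search by string: matches strings contained in the document; usually combined with a tag-name filter
'''
# text= is the older name for the string= argument; prefer string= in new code
print(soup.find_all(text="Python爬虫基础"))
print(type(soup.find_all(text="Python爬虫基础")))
print(soup.find_all(string='Python爬虫基础'))
print(type(soup.find_all(string="Python爬虫基础")))
## Combined with a tag name: returns the <span> tags whose string matches
print(soup.find_all("span",string="Python爬虫基础"))
print(soup.find_all("span",text="Python爬虫基础"))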

'''
Output of the first four print() calls in section 3:
['Python爬虫基础']
<class 'bs4.element.ResultSet'>
['Python爬虫基础']
<class 'bs4.element.ResultSet'>
'''
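'''
4. Limit the number of results
'''
## Limit the number of results
print(len(soup.find_all("a")))  # all <a> tags: 241 on this page when the post was written
print(len(soup.find_all("a",limit=3)))  # stop after the first 3 matches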


'''
5. recursive: search only direct children
When find_all() is called on a tag, Beautiful Soup examines all of that tag's descendants.
To search only the tag's direct children,
pass recursive=False.
'''
print(soup.head.find_all("meta",recursive=False))
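
The script above depends on a live page, so its results change whenever the page does. As a rough offline sketch, the same five filters can be tried on a small inline document. The HTML below, its ids, classes and hrefs are all made up for illustration, and the built-in "html.parser" is used instead of "lxml" only so that no extra package is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, invented for this sketch
SAMPLE_HTML = """
<html>
<head><title>Demo</title><meta name="author" content="demo"></head>
<body>
<div id="nav">
<a class="clearfix" href="/category/1">Python爬虫基础</a>
<a class="hover-show text" href="/category/2">Other</a>
<p><a href="/category/3">Nested</a></p>
</div>
<span class="title">Python爬虫基础</span>
</body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# 1. Filter by tag name
links = soup.find_all("a")

# 2. Filter by attributes
by_attrs = soup.find_all(attrs={"class": "clearfix", "href": "/category/1"})
one_class = soup.find_all("a", class_="text")              # a single class among several
exact_order = soup.find_all("a", class_="hover-show text")  # the full attribute string
wrong_order = soup.find_all("a", class_="text hover-show")  # same classes, different order
has_class = soup.find_all("a", class_=True)                 # attribute present, any value

# 3. Filter by string
strings = soup.find_all(string="Python爬虫基础")            # NavigableString objects
spans = soup.find_all("span", string="Python爬虫基础")      # Tag objects

# 4. Limit the number of results
first_two = soup.find_all("a", limit=2)

# 5. Direct children only
nav = soup.find(id="nav")
direct = nav.find_all("a", recursive=False)
all_descendants = nav.find_all("a")
head_meta = soup.head.find_all("meta", recursive=False)

print(len(links), len(by_attrs), len(has_class))
print(len(one_class), len(exact_order), len(wrong_order))
print(strings, spans)
print(len(first_two), len(direct), len(all_descendants), len(head_meta))
```

The class_ lines are the part most worth checking by hand: a value with spaces is compared against the whole class attribute string, so reordering the classes stops it from matching, while a single class name matches any tag that carries that class.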

Summary

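find_all() exists to retrieve the tags and content you want by constraining the search with name, attrs, string, limit and so on. The related methods find(), find_parents() and find_parent(), find_next_siblings() and find_next_sibling(), and others are used in much the same way.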
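
To make "used in much the same way" concrete, here is a minimal sketch of the related methods on a tiny inline document. The markup and its ids are hypothetical, chosen only for this example, and "html.parser" is used so that nothing beyond bs4 is required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, invented for this sketch
SAMPLE_HTML = (
    '<div id="outer"><ul id="menu">'
    '<li class="item">one</li><li class="item">two</li><li class="item">three</li>'
    '</ul></div>'
)
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() takes the same filters as find_all() but returns the first match, or None
first = soup.find("li", class_="item")
missing = soup.find("table")

# find_parent() returns the nearest matching ancestor; find_parents() returns all of them
parent = first.find_parent("ul")
ancestors = first.find_parents("div")

# find_next_sibling() returns the next matching sibling; find_next_siblings() returns all later ones
next_one = first.find_next_sibling("li")
rest = first.find_next_siblings("li")

print(first.get_text(), missing)
print(parent["id"], [tag["id"] for tag in ancestors])
print(next_one.get_text(), [tag.get_text() for tag in rest])
```

Because find() returns None when nothing matches, it is worth checking the result before indexing into it or calling methods on it.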
