Python study notes: BeautifulSoup object operations, part 2

Imadone

Original article: https://blog.csdn.net/ShewMi/article/details/79471469

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>python learning</title>
</head>

<body>
	<h1>知足</h1>
	<div class="newClass" style="border: #000">
		一阵风
		<p id="song">  知足的快乐  </p>
		<p id="song">  快乐  </p>
		<p >知足的快乐</p>
		<p >  知足的快乐  </p>
		吹来
		<a href="http://www.baidu.com" target="blank">才是真的永久</a>
		<div>
			<p>五月天</p>
		</div>
	</div>
</body>
</html>

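#!/usr/bin/env python
# coding: utf-8
# Written for Python 2 (print statements); needs bs4 and lxml installed.
print 'beautifulsoup文档树操作'

from bs4 import BeautifulSoup

# Parse the demo page shown above; the with-block closes the file afterwards.
with open('d:\\python\\demo.html') as html:
    bs = BeautifulSoup(html, 'lxml')

# find_all: search the descendants of the current node (the node itself is excluded)
# find_all(self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)

# name: match by tag name
print bs.div.find_all('p')

# **kwargs: match by an attribute passed as a keyword argument
print bs.div.find_all(id='song')[0]

# attrs: match by a dictionary of attributes
print bs.div.find_all(attrs={'target': "blank"})

# recursive: search all descendants (default True); if False, only direct children
# limit: maximum number of results (default: no limit)
print bs.div.find_all(['a', 'p'], id='song', recursive=False, limit=10)

# text: match text nodes and return the strings themselves (compare the .string attribute).
# The match is exact, so the paragraphs with padded text are not returned.
print bs.div.find_all(text='知足的快乐')

print '====================================='
# The find_all parameters above mostly apply to the other find_* methods as well
# Walk up through the ancestors of the first <p>
for par in bs.p.find_parents():
    print par.name

print '====================================='
# Walk forward: every tag that appears after the first <p> in document order
for nxt in bs.p.find_all_next():
    print nxt

# The sibling methods (find_next_siblings, find_previous_siblings) work the same way
# find differs from find_all in that it returns only the first match
print '====================================='

# Search with CSS-style selectors
print bs.select('p', limit=1)
print bs.select('.newClass')
print bs.select('#song')
print '======================================'

# Child combinator and combined selectors
print bs.select('div > a')
print bs.select('div a[target="blank"]')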

beautifulsoup文档树操作
[<p id="song">\xa0\xa0\u77e5\u8db3\u7684\u5feb\u4e50\xa0\xa0</p>, <p id="song">\xa0\xa0\u5feb\u4e50\xa0\xa0</p>, <p>\u77e5\u8db3\u7684\u5feb\u4e50</p>, <p>\xa0\xa0\u77e5\u8db3\u7684\u5feb\u4e50\xa0\xa0</p>, <p>\u4e94\u6708\u5929</p>]
<p id="song">  知足的快乐  </p>
[<a href="http://www.baidu.com" target="blank">\u624d\u662f\u771f\u7684\u6c38\u4e45</a>]
[<p id="song">\xa0\xa0\u77e5\u8db3\u7684\u5feb\u4e50\xa0\xa0</p>, <p id="song">\xa0\xa0\u5feb\u4e50\xa0\xa0</p>]
[u'\u77e5\u8db3\u7684\u5feb\u4e50']
=====================================
div
body
html
[document]
=====================================
<p id="song">  快乐  </p>
<p>知足的快乐</p>
<p>  知足的快乐  </p>
<a href="http://www.baidu.com" target="blank">才是真的永久</a>
<div>
<p>五月天</p>
</div>
<p>五月天</p>
=====================================
[<p id="song">\xa0\xa0\u77e5\u8db3\u7684\u5feb\u4e50\xa0\xa0</p>]
[<div class="newClass" style="border: #000">\n\t\t\u4e00\u9635\u98ce\n\t\t<p id="song">\xa0\xa0\u77e5\u8db3\u7684\u5feb\u4e50\xa0\xa0</p>\n<p id="song">\xa0\xa0\u5feb\u4e50\xa0\xa0</p>\n<p>\u77e5\u8db3\u7684\u5feb\u4e50</p>\n<p>\xa0\xa0\u77e5\u8db3\u7684\u5feb\u4e50\xa0\xa0</p>\n\t\t\u5439\u6765\n\t\t<a href="http://www.baidu.com" target="blank">\u624d\u662f\u771f\u7684\u6c38\u4e45</a>\n<div>\n<p>\u4e94\u6708\u5929</p>\n</div>\n</div>]
[<p id="song">\xa0\xa0\u77e5\u8db3\u7684\u5feb\u4e50\xa0\xa0</p>, <p id="song">\xa0\xa0\u5feb\u4e50\xa0\xa0</p>]
======================================
[<a href="http://www.baidu.com" target="blank">\u624d\u662f\u771f\u7684\u6c38\u4e45</a>]
[<a href="http://www.baidu.com" target="blank">\u624d\u662f\u771f\u7684\u6c38\u4e45</a>]
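
The find_all arguments used in the script can also be tried in isolation. The following is only a minimal sketch, not the author's code: it assumes Python 3, uses the built-in html.parser instead of lxml, and parses a hypothetical inline snippet (the `HTML` string and its English text are made up here) that mirrors the structure of demo.html.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup that mirrors the structure of demo.html.
HTML = """
<div class="newClass">
  <p id="song">first</p>
  <p id="song">second</p>
  <p>third</p>
  <a href="http://www.baidu.com" target="blank">link</a>
  <div><p>nested</p></div>
</div>
"""

# Assumption: html.parser (bundled with Python) stands in for lxml.
soup = BeautifulSoup(HTML, "html.parser")
outer = soup.div

# name: match by tag name; descendants at any depth are searched by default.
all_p = outer.find_all("p")

# recursive=False: only direct children of `outer` are considered.
direct_p = outer.find_all("p", recursive=False)

# Keyword arguments and attrs both filter on attributes.
songs = outer.find_all(id="song")
links = outer.find_all(attrs={"target": "blank"})

# limit caps the number of results; find() returns just the first match
# (or None) instead of a list.
first_two = outer.find_all("p", limit=2)
first_p = outer.find("p")

# string= (spelled text= in the script above) matches text nodes exactly
# and returns the strings, not the tags that contain them.
exact = outer.find_all(string="third")

print(len(all_p), len(direct_p), len(songs), len(links))
print([p.get_text() for p in first_two], first_p.get_text(), exact)
```

The gap between `all_p` and `direct_p` comes from the `<p>` inside the inner `<div>`, which only the recursive search reaches.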

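The navigation methods and CSS selectors from the second half of the script can be sketched the same way. Again this is an illustration under assumptions rather than the original code: Python 3, html.parser, a bs4 version whose select() accepts `limit`, and a hypothetical `HTML` snippet. Because html.parser does not add an `<html>` wrapper, the ancestor list is shorter than in the lxml output above.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup; the element names follow demo.html.
HTML = """
<body>
<div class="newClass">
  <p id="song">first</p>
  <p>second</p>
  <a href="http://www.baidu.com" target="blank">link</a>
  <div><p>nested</p></div>
</div>
</body>
"""

soup = BeautifulSoup(HTML, "html.parser")
first_p = soup.p

# find_parents(): ancestors from the nearest one up to the document object.
parents = [tag.name for tag in first_p.find_parents()]

# find_all_next(): every tag after first_p in document order,
# including tags nested inside later siblings.
following = [tag.name for tag in first_p.find_all_next()]

# find_next_siblings(): only later tags that share the same parent.
siblings = [tag.name for tag in first_p.find_next_siblings()]

# CSS selectors: tag name, class, id, child combinator, attribute filter.
one_p = soup.select("p", limit=1)
by_class = soup.select(".newClass")
by_id = soup.select_one("#song")          # first match or None
descendants = soup.select(".newClass p")  # <p> at any depth
children = soup.select(".newClass > p")   # direct <p> children only
link = soup.select('div a[target="blank"]')

print(parents, following, siblings)
print(len(one_p), len(by_class), by_id.get_text())
print(len(descendants), len(children), link[0]["href"])
```

`select_one` plays the same role for selectors that `find` plays for `find_all`: it returns a single element instead of a list.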