Learning Python Web Scraping (5): Traversing the Document Tree with BeautifulSoup (.contents, .children, .descendants, etc.)


Article link: https://blog.csdn.net/weixin_31315135/article/details/88824657

Author: IT小样
This post shows how to access and traverse elements with BeautifulSoup, using the HTML from the earlier tutorials as the example:

html_doc = '''
<html><head><title>hello,tester</title></head><body>
<p class="title"><b><h1>Hello,welcome</h1></b></p>
<p class="documentation">Tester, welcome! This is a new partion of your job's life. With python, you can finnish your work easier and faster.How, <a href="http://example.com/easier" class="easier" id="link1"> easier </a> and <a href="http://example.com/faster" class="faster" id = "link2">faster</a> Now, you have a initial impression about python.</p>
<p class="documention">let's go!!!</p> 
</body></html>
'''
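from bs4 import BeautifulSoup
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html_doc, "html.parser")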

As the definition of html_doc above shows, a tag can contain child nodes. So how do we access and traverse them?
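As a quick preview of the three attributes named in the title, here is a minimal sketch. It assumes a trimmed-down, single-line stand-in for html_doc (the variable `html` below is illustrative, not the author's document) so that no whitespace-only text nodes get in the way, and it assumes the built-in "html.parser".

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Illustrative stand-in for html_doc, kept on one line so that there are
# no whitespace-only text nodes between the tags.
html = (
    '<html><head><title>hello,tester</title></head><body>'
    '<p class="title"><b>Hello,welcome</b></p>'
    '<p>go to <a id="link1">easier</a></p>'
    '</body></html>'
)
soup = BeautifulSoup(html, "html.parser")
body = soup.body

# .contents: the direct children of a tag, as a list
direct = body.contents

# .children: the same direct children, but as an iterator
child_names = [child.name for child in body.children]

# .descendants: every nested node in document order, text nodes included;
# here we keep only the Tag objects and skip the strings
descendant_tags = [node.name for node in body.descendants if isinstance(node, Tag)]

print(len(direct))
print(child_names)
print(descendant_tags)
```

In this sketch `.contents` and `.children` stop at the two `<p>` tags, while `.descendants` also reaches the `<b>` and `<a>` tags nested inside them.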

1. Working with the document

1.1 Getting elements

Access an element through the tag's name:
>>> soup.head
<head><title>hello,tester</title></head>
>>> soup.body.p
<h1>Hello,welc
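Since the output above is cut off, the following self-contained sketch illustrates the same idea of reaching elements by tag name. It assumes a simplified stand-in for html_doc (the variable `html` is illustrative and omits the nested `<h1>`, whose handling can differ between parsers) and the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Illustrative stand-in for html_doc; not the author's exact document.
html = (
    '<html><head><title>hello,tester</title></head><body>'
    '<p class="title"><b>Hello,welcome</b></p>'
    '<p class="documentation">more '
    '<a href="http://example.com/easier" id="link1">easier</a></p>'
    '</body></html>'
)
soup = BeautifulSoup(html, "html.parser")

# Attribute access by tag name returns the FIRST matching tag
head = soup.head
title_text = soup.head.title.string  # the text inside <title>

# Names can be chained to walk down the tree
first_p = soup.body.p                # only the first <p> in <body>
first_p_class = first_p["class"]     # class is multi-valued, so this is a list

# Tag attributes are read like dictionary entries
link = soup.a
link_href = link["href"]
link_id = link.get("id")

# A tag name that does not occur in the document yields None
missing = soup.table

print(head)
print(title_text, first_p_class, link_href, link_id, missing)
```

Because attribute access only ever returns the first match, collecting every `<p>` or `<a>` needs `find_all` or one of the traversal attributes instead.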
