17 - 05 - 26 Python: the difference between contents / children / descendants

Sodaoo

Category: Python. Tags: python, BeautifulSoup, descendants

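Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/SoDaoo/article/details/70230128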


Let's start with navigating the tree.

# Navigating Trees

The findAll function finds tags based on their names and attributes.

But what if you need to find a tag based on its position in the document?

An HTML file can be mapped to a tree with explicit parent-child relationships, like this:

html
  - body
    - div.wrapper
      - h1
      - div.content
      - table#giftList
        - tr
          - th
          - th
          - th
          - th
        - tr.gift#gift1
          - td
          .......

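In general, BeautifulSoup functions always work on the descendants of the currently selected tag. For example, bsObj.body.h1 selects the first h1 tag that is a descendant of the body tag.

Similarly, bsObj.div.findAll("img") finds the first div tag in the document, then retrieves a list of all img tags that are descendants of that div.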
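To make the "descendants by default" behaviour concrete, here is a minimal sketch. The HTML fragment, the image file names, and the variable names are made up for illustration and are not from the original page. It uses find_all, the current spelling of the older findAll alias, and the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, invented for this example.
html = """
<html><body>
<div class="wrapper">
  <h1>Gifts</h1>
  <div class="content"><p><img src="a.jpg"></p></div>
  <img src="b.jpg">
</div>
<div class="footer"><img src="c.jpg"></div>
</body></html>
"""

bsObj = BeautifulSoup(html, "html.parser")

# Attribute access searches descendants: <h1> is found even though
# it is nested inside div.wrapper rather than directly under <body>.
heading = bsObj.body.h1

# bsObj.div is the first <div>; find_all then searches everything below it,
# including the <img> nested inside div.content.
images = bsObj.div.find_all("img")
image_sources = [img["src"] for img in images]

print(heading.get_text())
print(image_sources)
```

The image inside div.footer is not returned, because that div is a sibling of the first div rather than one of its descendants.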

But if you only want the direct children, use .children:

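>>> from urllib.request import urlopen
>>> from bs4 import BeautifulSoup
>>> # The URL is truncated in the source; urlopen needs a full "http://..." address.
>>> html = urlopen("www.pyth..ng.com/pages/page3.html")
>>> # Naming the parser explicitly avoids the "no parser specified" warning.
>>> bsObj = BeautifulSoup(html, "html.parser")
>>> for child in bsObj.find("table", {"id": "giftList"}).children:
...     print(child)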

This code prints the list of product rows in the giftList table, that is, every direct child of the table, including each child's tags, attributes, and text.
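Since the page above has to be fetched over the network, the following offline sketch may be easier to experiment with. The table content is a hypothetical stand-in, not the real page. It also shows a detail that is easy to miss: .children yields the whitespace strings between rows as well as the row tags.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical stand-in for the giftList table.
html = """
<table id="giftList">
<tr><th>Item</th><th>Cost</th></tr>
<tr class="gift" id="gift1"><td>Vegetable Basket</td><td>$15.00</td></tr>
<tr class="gift" id="gift2"><td>Russian Nesting Dolls</td><td>$10,000.52</td></tr>
</table>
"""

bsObj = BeautifulSoup(html, "html.parser")
table = bsObj.find("table", {"id": "giftList"})

# Direct children only: the <tr> tags plus the newline strings between them.
children = list(table.children)

# Keep only the tags to get the rows themselves.
rows = [child for child in children if isinstance(child, Tag)]
row_ids = [row.get("id") for row in rows]

# <td> cells are grandchildren, so they never appear in .children,
# but a descendant search such as find_all still reaches them.
cells = table.find_all("td")

print(len(children), len(rows), row_ids, len(cells))
```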

# Note the difference between .contents / .children / .descendants:

A tag's .contents attribute returns the tag's child nodes as a list:

>>> head_tag
<head><title>The Dormouse's story</title></head>
>>> head_tag.contents
[<title>The Dormouse's story</title>]
>>> title_tag = head_tag.contents[0]
>>> title_tag.contents
["The Dormouse's story"]

The BeautifulSoup object itself has children. In this case, the <html> tag is a child of the BeautifulSoup object:

>>> soup.contents[0].name
'html'

A string has no .contents attribute, because a string has no child nodes.
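A small sketch of that point, assuming the same "Dormouse" snippet used in the examples here and the built-in "html.parser". The variable names are illustrative. It checks for the attribute with hasattr instead of triggering the AttributeError directly.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

soup = BeautifulSoup(
    "<head><title>The Dormouse's story</title></head>", "html.parser"
)

# The only child of <title> is a string node, not a tag.
text = soup.title.contents[0]

is_string = isinstance(text, NavigableString)

# Strings are leaves of the tree, so they carry no .contents list.
has_contents = hasattr(text, "contents")

print(type(text).__name__, has_contents)
```

Checking the node type (or using hasattr) before touching .contents is one way to walk a mixed list of tags and strings without raising an exception.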

You can loop over a tag's child nodes with the .children generator:

>>> for child in title_tag.children:
...     print(child)
The Dormouse's story
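The practical difference between the two is that .contents is a list, while .children is a one-shot iterator over the same nodes. A minimal sketch, assuming the same "Dormouse" snippet and illustrative variable names:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<head><title>The Dormouse's story</title></head>", "html.parser"
)
head_tag = soup.head

# .contents is a real list: it supports indexing and len().
contents_is_list = isinstance(head_tag.contents, list)

# .children walks the same direct children, lazily.
children_iter = head_tag.children
same_nodes = list(children_iter) == head_tag.contents

# The iterator is used up after one pass; a second pass yields nothing.
exhausted = list(children_iter) == []

print(contents_is_list, same_nodes, exhausted)
```

Use .contents when you need random access or a length, and .children when you only need to loop once.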

To sum up, the .contents and .children attributes only cover a tag's direct children.

For example, the <head> tag has a single direct child, the <title> tag:

>>> head_tag.contents
[<title>The Dormouse's story</title>]

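But the <title> tag itself has a child: the string "The Dormouse's story".

In this case, the string "The Dormouse's story" is a descendant of the <head> tag.

.contents and .children cannot reach this "grandchild" node,

whereas the .descendants attribute iterates recursively over all of a tag's descendants: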

>>> for child in head_tag.descendants:
...     print(child)
<title>The Dormouse's story</title>
The Dormouse's story
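The difference shows up most clearly in the node counts. The sketch below wraps the same snippet in an <html> tag (an assumption made for this example) and compares direct children with all descendants, both for <head> and for the whole document.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<html><head><title>The Dormouse's story</title></head></html>",
    "html.parser",
)
head_tag = soup.head

# Direct children of <head>: just the <title> tag.
direct = head_tag.contents

# All descendants of <head>: the <title> tag, then the string inside it.
all_below = list(head_tag.descendants)

# The document object has one child (<html>) but several descendants:
# <html>, <head>, <title>, and the title string.
document_children = len(list(soup.children))
document_descendants = len(list(soup.descendants))

print(len(direct), len(all_below))
print(document_children, document_descendants)
```

.descendants visits nodes depth-first in document order, so a parent tag always appears before the nodes it contains.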

-------------------- Source: "Web scraping..." / the official BeautifulSoup documentation.
