遍历的实例如下:
from bs4 import BeautifulSoup
import requests
url="https://www.kaola.com/"
r=requests.get(url)
r.encoding=r.apparent_encoding
soup=BeautifulSoup(r.text,"html.parser")
print(soup.head)
print(soup.body.contents) #结果为一个列表,子节点的列表
print(len(soup.body.contents))
#遍历儿子节点
for child in soup.body.children:
print(child)
#遍历子孙节点
for child in soup.header.descendants:
print(child)
#标签树的上行遍历
for parent in soup.a.parents:
if parent is None:
print(parent)
else:
print(parent.name)
#平行遍历
print(soup.a.next_sibling) #遍历后续节点
print(soup.a.previous_sibling) #遍历前续节点
更好地阅读:
(soup.a.prettify()) #规范化,更好的阅读
结果
<a class="app" data-param="from=topDownload&zn=top" href="//app.kaola.com" rel="nofollow">
手机考拉
<span class="m-notice">
<img height="116px" src="https://haitao.nos.netease.com/9c6b0d6c-4a71-4018-8fb6-46496f51caff.png" width="116px"/>
<strong class="txt">
下载APP
<br/>
领1000元新人礼包
</strong>
<span class="arrow">
<span class="arr">
</span>
<span class="arr1">
</span>
</span>
</span>
</a>
它在每个标签后都加了\n,也能对中文有很好的支持。