Python Web Scraping Exercise 2

kleine23fpts_zz

Article link: https://blog.csdn.net/kleine23fpts_zz/article/details/80152792

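Based on the book 《Python网络数据采集》 (Web Scraping with Python) by Ryan Mitchell.

Chapter 2, Section 2.2.3

Child tags, sibling tags, and parent tags.

1. Working with child tags. There are two attributes to choose from: children and descendants. children yields only the tags one level below the parent, while descendants yields the tags at every level beneath it.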

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"html.parser")

# children covers only the next level down; descendants covers every level below.
for child in bsObj.find("table",{"id":"giftList"}).children:
	print(child)
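
To make the difference between children and descendants easier to see without hitting the network, here is a minimal sketch. The SAMPLE_HTML string is a made-up miniature table, not the real page3.html, and the variable names are only for illustration. Text nodes are filtered out so that only tags are compared.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup standing in for the giftList table on the practice page.
SAMPLE_HTML = (
    '<table id="giftList">'
    '<tr><th>Item</th><th>Price</th></tr>'
    '<tr><td>Vase</td><td>$15.00</td></tr>'
    '</table>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", {"id": "giftList"})

# children: only the nodes directly under <table> (here, the two rows).
child_tags = [node.name for node in table.children if isinstance(node, Tag)]

# descendants: every node at any depth under <table>, in document order.
descendant_tags = [node.name for node in table.descendants if isinstance(node, Tag)]

print(child_tags)
print(descendant_tags)
```

In this sample, children stops at the rows, while descendants also reaches the header and data cells inside each row.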

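2. Working with sibling tags

A tag is never its own sibling. Sibling lookup comes in two forms: next_siblings returns the siblings that follow a tag, and previous_siblings returns the ones that precede it.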

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"html.parser")

# To walk a group of siblings starting from the first one, use next_siblings; starting from the last one, use previous_siblings.
for sibling in bsObj.find("table",{"id":"giftList"}).tr.next_siblings:
	print(sibling)
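
The two directions can be compared side by side in a small offline sketch. The SAMPLE_HTML below is an invented three-row table (an assumption, not the real page), chosen so the output is easy to check by eye.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup (not the real practice page): one header row, two gift rows.
SAMPLE_HTML = (
    '<table id="giftList">'
    '<tr><th>Item</th><th>Price</th></tr>'
    '<tr><td>Vase</td><td>$15.00</td></tr>'
    '<tr><td>Clock</td><td>$10.00</td></tr>'
    '</table>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", {"id": "giftList"})

# Start from the first row: next_siblings yields the rows after it, never the row itself.
header_row = table.tr
rows_after_header = [
    row.get_text(" ") for row in header_row.next_siblings if isinstance(row, Tag)
]

# Start from the last row: previous_siblings walks backwards toward the first row.
last_row = table.find_all("tr")[-1]
rows_before_last = [
    row.get_text(" ") for row in last_row.previous_siblings if isinstance(row, Tag)
]

print(rows_after_header)
print(rows_before_last)
```

Because the starting tag is excluded, starting from the header row is a convenient way to skip the title row of a table. Note that previous_siblings returns rows in reverse document order.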

3. Working with parent tags

The principle is the same as above.

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"html.parser")

print(bsObj.find("img",{"src":"../img/gifts/img1.jpg"}).parent.previous_sibling.get_text())
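# Here the <td> is the parent of the <img> whose src is ../img/gifts/img1.jpg: parent selects that <td>, previous_sibling moves to the preceding <td>, which holds the price, and get_text() outputs its text.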
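
One thing worth knowing when chaining parent and previous_sibling: if the markup has whitespace between cells, previous_sibling can land on a text node rather than a tag. The sketch below uses an invented one-row table (SAMPLE_HTML is an assumption, not the real page) with a newline between the two cells, and uses find_previous_sibling("td") to skip over it.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: note the newline between the price cell and the image cell.
SAMPLE_HTML = (
    '<table id="giftList"><tr>'
    '<td>Vase</td>'
    '<td>$15.00</td>\n'
    '<td><img src="../img/gifts/img1.jpg"></td>'
    '</tr></table>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

img = soup.find("img", {"src": "../img/gifts/img1.jpg"})
cell = img.parent  # the <td> that wraps the image

# Here previous_sibling is the "\n" text node, not the price cell.
print(repr(cell.previous_sibling))

# find_previous_sibling("td") skips text nodes and returns the nearest earlier <td>.
price = cell.find_previous_sibling("td").get_text()
print(price)
```

Whether plain previous_sibling is enough depends on how the page's HTML is laid out, so this is a defensive variant rather than a replacement for the code above.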
