Python Web Scraping Study Notes: Day 12

可惜没有如果

Category: Study Notes; Tags: python

Article link: https://blog.csdn.net/qq_34194478/article/details/76736070

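Today I learned how to use BeautifulSoup functions to get specific nodes, and how to start from the current node and follow the tree to its child, descendant, sibling, and parent nodes.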

Exercise 1: using findAll to extract only the text inside <span class="green"> tags
While at it, I also extracted the content of the class='red' tags.

from urllib.request import urlopen
from bs4 import BeautifulSoup

r = urlopen('http://www.pythonscraping.com/pages/warandpeace.html')
# Name the parser explicitly so bs4 does not have to guess one
bsObj = BeautifulSoup(r, 'html.parser')
# Collect every <span> with class 'green', then every <span> with class 'red'
persons = bsObj.findAll('span', {'class': 'green'})
conversations = bsObj.findAll('span', {'class': 'red'})
for name in persons:
    print(name.get_text())
print('\n')
for talk in conversations:
    print(talk.get_text())
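
To try the same class filter without a network connection, here is a minimal sketch that runs against an inline HTML string. The SAMPLE_HTML snippet and the texts_by_class helper are made up for illustration and are not part of the original exercise; find_all is the current spelling of findAll in bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the real page (an assumption).
SAMPLE_HTML = """
<div>
  <span class="red">How do you do?</span>
  <span class="green">Anna Pavlovna</span>
  <span class="red">I am well, thank you.</span>
  <span class="green">Prince Vasili</span>
</div>
"""


def texts_by_class(html, css_class):
    """Return the text of every <span> whose class matches css_class."""
    soup = BeautifulSoup(html, 'html.parser')
    return [span.get_text() for span in soup.find_all('span', {'class': css_class})]


names = texts_by_class(SAMPLE_HTML, 'green')
speech = texts_by_class(SAMPLE_HTML, 'red')
print(names)
print(speech)
```

Returning plain lists of strings keeps the extraction separate from the printing, which makes the result easy to check or reuse.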

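Exercise 2: finding HTML elements by their text content
Finding HTML elements was already practiced yesterday with the find/findAll functions.
The tag and attributes arguments of these two functions cover most tag lookups. In addition, the text argument (used in the example below) matches on content: given a plain string, it returns the text nodes whose content is exactly that string.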

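from urllib.request import urlopen
from bs4 import BeautifulSoup

r = urlopen('http://www.pythonscraping.com/pages/warandpeace.html')
# Name the parser explicitly so bs4 does not have to guess one
bsObj = BeautifulSoup(r, 'html.parser')
# text= matches text nodes equal to 'the prince'; newer bs4 releases also accept string=
test = bsObj.findAll(text='the prince')
# Number of exact matches found on the page
print(len(test))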
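
The difference between an exact match and a "contains" match is easy to miss, so here is a small offline sketch of both. The SAMPLE_HTML string is invented for illustration. It uses string=, the newer name for the text argument in bs4, and a compiled regular expression for the substring case.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical inline HTML (an assumption, not the real page).
SAMPLE_HTML = """
<p><span class="green">the prince</span> said that
<span class="green">the prince's daughter</span> met
<span class="green">the prince</span>.</p>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# A plain string only matches text nodes that are exactly equal to it.
exact = soup.find_all(string='the prince')

# A compiled regex matches any text node in which the pattern is found.
partial = soup.find_all(string=re.compile('the prince'))

print(len(exact), len(partial))

# The results are text nodes, not tags; .parent gives the enclosing tag.
enclosing_tags = [node.parent.name for node in exact]
print(enclosing_tags)
```

Because the results are text nodes rather than tags, going through .parent is how to get back to the element that contains the match.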

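Exercise 3: child tags and descendant tags (note the difference)

A child tag sits exactly one level below its parent, while descendant tags are the tags at every level below that parent. Every child is a descendant, but not every descendant is a child.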

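from urllib.request import urlopen
from bs4 import BeautifulSoup

r = urlopen('http://www.pythonscraping.com/pages/page3.html')

# Name the parser explicitly so bs4 does not have to guess one
bsObj = BeautifulSoup(r, 'html.parser')
# .children yields only the nodes directly under the table
for child in bsObj.find('table', {'id': 'giftList'}).children:
    print(child)
print('\n')
# .descendants walks every level below the table
for descendant in bsObj.find('table', {'id': 'giftList'}).descendants:
    print(descendant)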
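
Counting nodes makes the child/descendant distinction concrete. The sketch below uses a tiny hand-written table (an assumption, written without whitespace between tags so that no extra whitespace text nodes show up) instead of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table, kept on one line to avoid whitespace nodes.
SAMPLE_HTML = (
    '<table id="giftList">'
    '<tr><th>Item</th><th>Cost</th></tr>'
    '<tr><td>Vegetable Basket</td><td>$15.00</td></tr>'
    '</table>'
)

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
table = soup.find('table', {'id': 'giftList'})

# Direct children: just the two <tr> rows.
children = list(table.children)

# Descendants: rows, cells, and the text nodes inside the cells.
descendants = list(table.descendants)

child_names = [child.name for child in children]
print(child_names)
print(len(children), len(descendants))
```

Note that descendants includes text nodes as well as tags, which is why its count grows much faster than the number of children.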

Exercise 4: getting sibling nodes with next_siblings

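from urllib.request import urlopen
from bs4 import BeautifulSoup

r = urlopen('http://www.pythonscraping.com/pages/page3.html')
# Name the parser explicitly so bs4 does not have to guess one
bsObj = BeautifulSoup(r, 'html.parser')
# .tr is the first row; next_siblings yields what follows it, not the row itself
for sibling in bsObj.find('table', {'id': 'giftList'}).tr.next_siblings:
    print(sibling)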
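
One thing to watch with next_siblings is that whitespace between tags also counts as a sibling. The following sketch, run on a made-up table (an assumption), keeps only the Tag siblings so that the result is the data rows after the header row.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical table with newlines between rows, as real pages often have.
SAMPLE_HTML = """<table id="giftList">
<tr><th>Item</th></tr>
<tr><td>Vegetable Basket</td></tr>
<tr><td>Russian Nesting Dolls</td></tr>
</table>"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
header_row = soup.find('table', {'id': 'giftList'}).tr

# next_siblings never includes header_row itself; the isinstance check
# drops the newline text nodes that sit between the rows.
data_rows = [node for node in header_row.next_siblings if isinstance(node, Tag)]

row_texts = [row.get_text() for row in data_rows]
print(row_texts)
```

Starting from the first row and reading its next siblings is a convenient way to skip a table's title row.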

Exercise 5: working with parent nodes via parent/parents

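from urllib.request import urlopen
from bs4 import BeautifulSoup

r = urlopen('http://www.pythonscraping.com/pages/page3.html')
# Name the parser explicitly so bs4 does not have to guess one
bsObj = BeautifulSoup(r, 'html.parser')
# From the image, go up to its parent cell, then to the node just before that cell
money = bsObj.find('img', {'src': '../img/gifts/img1.jpg'}).parent.previous_sibling
print(money.get_text())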
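
The same upward-then-sideways walk can be tried offline. In this sketch the one-row table is invented for illustration, and find_previous_sibling('td') is swapped in for previous_sibling as a possibly more robust choice, since it skips any whitespace text node between cells. The parents generator is shown as well, because the exercise title mentions it.

```python
from bs4 import BeautifulSoup

# Hypothetical single-row table (an assumption, not the real page).
SAMPLE_HTML = (
    '<table id="giftList"><tr>'
    '<td>Vegetable Basket</td>'
    '<td>$15.00</td>'
    '<td><img src="../img/gifts/img1.jpg"/></td>'
    '</tr></table>'
)

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
img = soup.find('img', {'src': '../img/gifts/img1.jpg'})

# .parent is the <td> holding the image; the previous <td> holds the price.
price_cell = img.parent.find_previous_sibling('td')
price = price_cell.get_text()
print(price)

# .parents walks all the way up: cell, row, table, then the document itself.
ancestor_names = [ancestor.name for ancestor in img.parents]
print(ancestor_names[:3])
```

Using parent returns a single node, while parents iterates over every ancestor up to the document root.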
