BeautifulSoup

小木头1209

Category: python学习 (Python learning). Tags: python

Article link: https://blog.csdn.net/jiasudu1234/article/details/72082583


# -*- coding: utf-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Example page from pythonscraping.com (fetched here, but not parsed below)
html = urlopen('http://pythonscraping.com/pages/page1.html')
html1 = urlopen('https://book.douban.com/')
# r = html1.read().decode('utf-8')
# Name the parser explicitly; from_encoding tells bs4 how to decode the raw bytes
bsobj = BeautifulSoup(html1, 'html.parser', from_encoding='utf-8')
# Collect every <h4 class="title"> tag
g1 = bsobj.findAll('h4', {'class': 'title'})
for g in g1:
    print(g.get_text())  # strip the tags and keep only the text

#####
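findAll(tag, attributes, recursive, text, limit, keywords)
find(tag, attributes, recursive, text, keywords)  # the same as findAll with limit=1
tag: a single tag name or several tag names. For example, findAll({'h2','h3','h4'}).
attributes: a Python dictionary holding a tag's attributes and the values they should match.
For example, findAll('span',{'class':{'green','red'}}) returns both the green and the red span tags in the document.
recursive: a Boolean. With True, the search covers all children of the tag and the children's own descendants. With False, it covers only the tag's direct children.
text: matches on the text content of tags rather than on their attributes. For example, to count how many times '广西科学技术出版社' appears as tag text:
namelist=bsobj.findAll(text='广西科学技术出版社')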

print(len(namelist))
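
The parameters above can be tried offline on a small inline document. What follows is a minimal sketch, assuming bs4 is installed. The HTML snippet, its class names and all variable names are invented for illustration and have nothing to do with the Douban page. It uses find_all and string, which are the current bs4 spellings of findAll and text (string is available from Beautiful Soup 4.4.0 on).

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, invented for this sketch
SAMPLE_HTML = """
<html><body>
<h2>Books</h2>
<h3>Science</h3>
<p><span class="green">Alice</span> met <span class="red">Bob</span> and
<span class="blue">Carol</span>.</p>
<p><span class="green">Alice</span></p>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# tag: a collection of tag names matches any one of them
headings = soup.find_all(['h2', 'h3', 'h4'])
heading_names = [h.name for h in headings]

# attributes: a dict mapping an attribute name to the accepted values
green_or_red = soup.find_all('span', {'class': ['green', 'red']})
colored_text = [s.get_text() for s in green_or_red]

# text/string: match text nodes by their exact content, not by tag
alice_nodes = soup.find_all(string='Alice')

# limit: stop after the first n matches; find() behaves like limit=1
first_two_spans = soup.find_all('span', limit=2)
first_span = soup.find('span')

print(heading_names)
print(colored_text)
print(len(alice_nodes), len(first_two_spans), first_span.get_text())
```

Note that a plain string passed as text/string must equal the whole text node. To match a substring, a compiled regular expression can be passed instead.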

Working with children and other descendants

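A child is a tag exactly one level below its parent, whereas a descendant is a tag at any level below that parent. For example, the tr tags are children of the table tag, while tr, th, td, img and span are all descendants of the table tag (at least in our example page). Every child is a descendant, but not every descendant is a child.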
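
The distinction can be checked with the .children and .descendants attributes of a tag. The sketch below assumes bs4 is installed and uses a made-up table (the id, the cell contents and the variable names are hypothetical). It also shows that recursive=False limits a search to direct children.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical table, invented for this sketch
SAMPLE_HTML = """
<table id="gifts">
<tr><th>Item</th><th>Image</th></tr>
<tr><td><span>Vase</span></td><td><img src="vase.jpg"></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
table = soup.find('table', {'id': 'gifts'})

# .children yields only the next level down; the isinstance check
# filters out the whitespace text nodes between the tags
child_tags = [c.name for c in table.children if isinstance(c, Tag)]

# .descendants walks every level below the table
descendant_tags = [d.name for d in table.descendants if isinstance(d, Tag)]

# recursive=False restricts the search to direct children
direct_rows = table.find_all('tr', recursive=False)
direct_cells = table.find_all('td', recursive=False)

print(child_tags)
print(descendant_tags)
print(len(direct_rows), len(direct_cells))
```

Here the td tags are descendants of the table but not its children, so the non-recursive search for td finds nothing while the search for tr finds both rows.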
