BeautifulSoup usage examples

conwinner

Tags: python

Article link: https://blog.csdn.net/kangnianguo/article/details/106745240

First, a beginner example:

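from urllib.request import urlopen
from bs4 import BeautifulSoup


url = ''  # URL of the page to scrape (left blank here)

html = urlopen(url)
bs = BeautifulSoup(html, 'html.parser')

# .children yields only the direct children of the table
# (e.g. its rows, plus the whitespace strings between them)
for child in bs.find('table', {'id':'giftList'}).children:
    print(child)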
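Because the URL above is left blank, here is an offline sketch of the same idea. The HTML string and the row contents are hypothetical stand-ins for a downloaded page. It shows that .children only reaches direct children, while .descendants walks the whole subtree down to the cells.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical stand-in for the page that urlopen(url) would download
html = """
<table id="giftList">
  <tr><th>Item</th><th>Price</th></tr>
  <tr class="gift"><td>Vegetable Basket</td><td>$15.00</td></tr>
  <tr class="gift"><td>Russian Nesting Dolls</td><td>$10,000.52</td></tr>
</table>
"""

bs = BeautifulSoup(html, "html.parser")
table = bs.find("table", {"id": "giftList"})

# .children yields direct children only; keep the Tag objects and skip
# the whitespace strings that sit between the rows
rows = [child for child in table.children if isinstance(child, Tag)]

# .descendants walks the whole subtree, so it also reaches the th/td cells
cells = [node for node in table.descendants
         if isinstance(node, Tag) and node.name in ("th", "td")]

print(len(rows), len(cells))
```

With html.parser no tbody element is inserted, so the rows really are direct children of the table here; other parsers (such as html5lib) may add a tbody and change what .children returns.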

More examples, explained

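find_all returns a list of every matching tag, so its result can be indexed like a list and the calls chained to drill down into nested elements: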

bs.find_all('table')[4].find_all('tr')[2].find('td').find_all('div')[1].find('a')
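A minimal runnable sketch of this chaining pattern, assuming a small hypothetical page (the indices are smaller than in the line above so the HTML stays short):

```python
from bs4 import BeautifulSoup

# Hypothetical page: the wanted link sits in the second div of the
# first cell of the second row of the second table
html = """
<table><tr><td>unrelated</td></tr></table>
<table>
  <tr><td>header</td></tr>
  <tr><td><div>first</div><div><a href="/gift/1">Vegetable Basket</a></div></td></tr>
</table>
"""
bs = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like ResultSet, so it can be indexed;
# find() returns the first match (or None), so calls can be chained
link = bs.find_all("table")[1].find_all("tr")[1].find("td").find_all("div")[1].find("a")
print(link["href"], link.get_text())
```

Chains like this are brittle: if the page layout changes, an index can raise IndexError or a find() can return None and the next call raises AttributeError, so searching by id or class is usually more robust when the page offers one.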

find_all: get every tag with a given class name

# Get all span tags whose class is green
nameList = bs.find_all('span', {'class':'green'})
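A small offline sketch, using hypothetical markup, of how class matching behaves: class is a multi-valued attribute in BeautifulSoup, so a tag with class="green big" also matches 'green', and the keyword form class_= gives the same result as the attribute dict.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; class="green big" also counts as class "green"
html = """
<span class="green">Anna Pavlovna</span>
<span class="red">Heavens! what a virulent attack!</span>
<span class="green big">Prince Vasili</span>
"""
bs = BeautifulSoup(html, "html.parser")

nameList = bs.find_all("span", {"class": "green"})
# Equivalent keyword form; class_ avoids the Python reserved word "class"
sameList = bs.find_all("span", class_="green")

print([tag.get_text() for tag in nameList])
```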

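get_text strips out all tags and returns only the text. Because find_all returns a list of tags, get_text has to be called on each tag rather than on the list:

nameList = [name.get_text() for name in bs.find_all('span', {'class':'green'})]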
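A sketch on hypothetical markup showing why get_text() has to be called per tag, and how its separator and strip arguments tidy up text that is split across nested tags:

```python
from bs4 import BeautifulSoup

html = ('<div><span class="green"> Anna <b>Pavlovna</b> </span>'
        '<span class="green">Prince Vasili</span></div>')
bs = BeautifulSoup(html, "html.parser")
spans = bs.find_all("span", {"class": "green"})

# The ResultSet returned by find_all() has no get_text(); only each Tag does
print(hasattr(spans, "get_text"))

# separator=" " joins the text pieces; strip=True trims each piece
# and drops pieces that are only whitespace
names = [span.get_text(" ", strip=True) for span in spans]
print(names)
```

Without the separator, the first span would come out as "AnnaPavlovna", because the text inside the b tag is a separate piece.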

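find_all can also return tags that satisfy any one of several conditions, by passing a list of tag names or a list of attribute values:

bs.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
bs.find_all('span', {'class': ['green', 'red']})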
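A runnable sketch of both forms on hypothetical markup; results come back in document order, not in the order of the list:

```python
from bs4 import BeautifulSoup

html = """
<h1>War and Peace</h1><h2>Book One</h2><p>Well, Prince...</p><h3>Chapter I</h3>
<span class="green">Anna</span><span class="red">Heavens!</span><span class="blue">aside</span>
"""
bs = BeautifulSoup(html, "html.parser")

# A list of tag names matches a tag named any one of them
headings = bs.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])
# A list of attribute values matches a tag whose class is any one of them
colored = bs.find_all("span", {"class": ["green", "red"]})

print([h.name for h in headings])
print([s.get_text() for s in colored])
```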

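Find the strings whose entire text is exactly "the prince" (the match is exact and case-sensitive, and the results are NavigableString objects rather than tags):

# string= is the current name of the older text= argument
nameList = bs.find_all(string='the prince')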
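A sketch on hypothetical markup contrasting an exact string match with a compiled regular expression, which BeautifulSoup tests with .search() and therefore also finds the phrase inside longer strings:

```python
import re
from bs4 import BeautifulSoup

html = ("<p>Hello <b>the prince</b> said, to the prince.</p>"
        "<i>the prince</i><i>The Prince</i>")
bs = BeautifulSoup(html, "html.parser")

# Exact, case-sensitive match against whole strings
exact = bs.find_all(string="the prince")
# .parent gives the tag that contains each matched string
print(len(exact), [s.parent.name for s in exact])

# A regex matches any string that contains the phrase somewhere
partial = bs.find_all(string=re.compile("the prince"))
print(len(partial))
```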

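Return all tags whose class attribute includes the word text and whose id is title (class_ has a trailing underscore because class is a reserved word in Python; use find instead of find_all to get only the first match):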

title = bs.find_all(id='title', class_='text')
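A sketch on hypothetical markup showing that both keyword conditions must hold, and how find() differs from find_all():

```python
from bs4 import BeautifulSoup

html = """
<h1 id="title" class="text main">Chapter One</h1>
<h2 id="title2" class="text">Not this one: id differs</h2>
<p class="text">Nor this one: no id</p>
"""
bs = BeautifulSoup(html, "html.parser")

# Both conditions must hold; class_ matches if "text" is one of the classes
title = bs.find_all(id="title", class_="text")
# find() returns only the first match, or None when nothing matches
first = bs.find(id="title", class_="text")
missing = bs.find(id="title", class_="bold")

print(len(title), first.get_text(), missing)
```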

Note that the following two forms are equivalent:

bs.find_all(id='text')
bs.find_all(attrs={'id':'text'})
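A sketch on hypothetical markup confirming the two forms match the same tags, and showing a case where only the attrs dict works: an attribute such as data-role cannot be written as a Python keyword argument.

```python
from bs4 import BeautifulSoup

html = '<div id="text">A</div><p id="note">B</p><span data-role="note">C</span>'
bs = BeautifulSoup(html, "html.parser")

by_keyword = bs.find_all(id="text")
by_attrs = bs.find_all(attrs={"id": "text"})
print([t.get_text() for t in by_keyword], [t.get_text() for t in by_attrs])

# "data-role" is not a valid keyword argument name, so it goes through attrs
notes = bs.find_all(attrs={"data-role": "note"})
print([t.get_text() for t in notes])
```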

An example using regular expressions

import re
from urllib.request import urlopen
from bs4 import BeautifulSoup


url = ''  # URL of the page to scrape (left blank here)

html = urlopen(url)
bs = BeautifulSoup(html, 'html.parser')

# Match img tags whose src looks like ../img/gifts/img<anything>.jpg
images = bs.find_all('img', {'src': re.compile(r'\.\./img/gifts/img.*\.jpg')})

for image in images:
    print(image['src'])
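An offline sketch of the same regex filter, using hypothetical img tags so it runs without the blank URL. The raw string avoids invalid-escape warnings, and "/" needs no escaping in a Python regex.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup with image paths in the same style as above
html = """
<img src="../img/gifts/img1.jpg">
<img src="../img/gifts/img2.jpg">
<img src="../img/gifts/logo.png">
<img src="../img/other/img3.jpg">
"""
bs = BeautifulSoup(html, "html.parser")

pattern = re.compile(r"\.\./img/gifts/img.*\.jpg")
# The compiled pattern is applied to each tag's src attribute value
images = bs.find_all("img", {"src": pattern})
srcs = [image["src"] for image in images]
print(srcs)
```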
