Study now: https://edu.csdn.net/course/play/24756/280700?utm_source=blogtoedu
1. find_all: returns a list (a ResultSet, which is a subclass of list)
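from bs4 import BeautifulSoup
html = """ """  # the HTML source to parse goes here
soup = BeautifulSoup(html, "lxml")  # the "lxml" parser requires the lxml package to be installed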
(1) Get all tr tags:
trs = soup.find_all('tr')
for tr in trs:
    print(tr)
    print('-' * 50)  # separator line between rows
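To make the "returns a list" point concrete, here is a minimal sketch. The sample table is hypothetical (the notes leave the html string empty), and it assumes the built-in "html.parser" instead of "lxml" so that no extra package is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the original notes leave the HTML string empty.
html = """
<table>
  <tr><td>Title</td><td>Category</td></tr>
  <tr class="even"><td>Engineer</td><td>Tech</td></tr>
  <tr class="odd"><td>Designer</td><td>Design</td></tr>
</table>
"""

# Assumption: "html.parser" ships with Python; the notes use "lxml".
soup = BeautifulSoup(html, "html.parser")

trs = soup.find_all("tr")
print(type(trs).__name__)     # ResultSet
print(isinstance(trs, list))  # True
print(len(trs))               # 3
```

Because the result behaves like a list, indexing, slicing, len() and for loops all work on it directly.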
(2) Get the 2nd tr tag:
tr = soup.find_all('tr', limit=2)[1]
print(tr)
(find_all('tr', limit=2) returns at most the first two matching tags, so index 1 is the second one.)
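A small sketch of how limit behaves, using a hypothetical three-row table and the built-in "html.parser" (both are assumptions, not part of the notes). It also guards the [1] lookup, since indexing raises IndexError when fewer than two rows exist.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with three rows.
html = "<table><tr><td>r1</td></tr><tr><td>r2</td></tr><tr><td>r3</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

# limit=2 stops the search after two matches.
first_two = soup.find_all("tr", limit=2)
print(len(first_two))  # 2

# Guard the index so a page with fewer than two rows does not raise IndexError.
second = first_two[1] if len(first_two) > 1 else None
if second is not None:
    print(second.get_text())  # r2
```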
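(3) Get all tr tags with class=even:
trs = soup.find_all('tr', class_='even')  # class_ is used because class is a reserved keyword in Python; writing class='even' is a syntax error
or:
trs = soup.find_all('tr', attrs={'class': 'even'})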
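The two spellings can be compared side by side. This is a sketch on hypothetical markup, again assuming "html.parser"; note that a tag carrying several classes still matches when any one of them equals the filter.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the third row has two CSS classes.
html = """
<table>
  <tr class="odd"><td>a</td></tr>
  <tr class="even"><td>b</td></tr>
  <tr class="even highlight"><td>c</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

by_kwarg = soup.find_all("tr", class_="even")
by_attrs = soup.find_all("tr", attrs={"class": "even"})

# Both forms select the same rows, including the multi-class one.
texts = [tr.get_text(strip=True) for tr in by_kwarg]
print(texts)                 # ['b', 'c']
print(by_kwarg == by_attrs)  # True
```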
(4) Extract all a tags with id=test and class=test:
alist = soup.find_all('a', id='test', class_='test')  # avoid naming the variable list, which would shadow the built-in list()
for a in alist:
    print(a)
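When several filters are passed, a tag has to satisfy all of them. A minimal sketch with hypothetical links (the hrefs and the "html.parser" choice are assumptions):

```python
from bs4 import BeautifulSoup

# Hypothetical links: only the first one has both id="test" and class="test".
html = """
<a id="test" class="test" href="/one">one</a>
<a id="test" class="other" href="/two">two</a>
<a class="test" href="/three">three</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Keyword filters are combined with AND.
matches = soup.find_all("a", id="test", class_="test")

# The same query written with an attrs dictionary.
matches_attrs = soup.find_all("a", attrs={"id": "test", "class": "test"})

print(len(matches))        # 1
print(matches[0]["href"])  # /one
```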
(5) Get the href attribute of all a tags:
alist = soup.find_all('a')
for a in alist:
    href = a['href']  # raises KeyError if the tag has no href attribute
    print(href)
Or:
for a in alist:
    href = a.attrs['href']
    print(href)
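Both a['href'] and a.attrs['href'] raise KeyError when a tag has no href. One possible way around that, sketched on hypothetical markup with "html.parser", is to use get(), or to ask find_all only for tags that actually carry the attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <a> has no href attribute.
html = '<a href="/jobs/1">Job 1</a><a name="anchor">no link</a><a href="/jobs/2">Job 2</a>'
soup = BeautifulSoup(html, "html.parser")

# get() returns None instead of raising KeyError.
hrefs = [a.get("href") for a in soup.find_all("a")]
print(hrefs)  # ['/jobs/1', None, '/jobs/2']

# href=True keeps only the tags that have an href attribute at all.
hrefs_only = [a["href"] for a in soup.find_all("a", href=True)]
print(hrefs_only)  # ['/jobs/1', '/jobs/2']
```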
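(6) Get all the job information as plain text:
trs = soup.find_all('tr')[1:]  # skip the first row (presumably the table header)
for tr in trs:
    tds = tr.find_all('td')
    name = tds[0].string
    print(name)
Or:
for tr in trs:
    infos = list(tr.stripped_strings)  # all non-empty text fragments in the row, with whitespace stripped
    print(infos)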
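The two approaches differ when a cell holds more than one child node: .string then returns None, while stripped_strings still yields every text fragment. A sketch on a hypothetical job table (the rows and the "html.parser" choice are assumptions):

```python
from bs4 import BeautifulSoup

# Hypothetical job table; the last row's first cell mixes a link and plain text.
html = """
<table>
  <tr><td>Title</td><td>Category</td></tr>
  <tr><td>Engineer</td><td>Tech</td></tr>
  <tr><td><a href="/d">Designer</a> (remote)</td><td>Design</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = soup.find_all("tr")[1:]  # skip the header row

# .string is None when a tag has more than one child node.
names = [tr.find_all("td")[0].string for tr in rows]
print(names)  # ['Engineer', None]

# stripped_strings walks all descendants and yields each non-empty text piece.
infos = [list(tr.stripped_strings) for tr in rows]
print(infos)  # [['Engineer', 'Tech'], ['Designer', '(remote)', 'Design']]
```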