1. Scraping the elements with a given class from a specified website:
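from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/warandpeace.html")
bsObj = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
nameList = bsObj.findAll("span", {"class": "green"})  # every <span class="green">
for name in nameList:
    print(name.get_text())  # text inside the tag, with the markup stripped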
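The example above needs network access. Below is a minimal offline sketch of the same idea; the HTML string is a made-up stand-in for the real page (an assumption), and "html.parser" is the parser built into Python.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the warandpeace.html page
html = """
<p>
  <span class="red">A line of dialogue</span>
  <span class="green">Anna Pavlovna Scherer</span>
  <span class="green">the prince</span>
</p>
"""

bsObj = BeautifulSoup(html, "html.parser")
# Every <span> whose class attribute includes "green"
nameList = bsObj.findAll("span", {"class": "green"})
names = [name.get_text() for name in nameList]
for n in names:
    print(n)
```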
2. The find and findAll functions
findAll(tag, attributes, recursive, text, limit, keywords)
find(tag, attributes, recursive, text, keywords)
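The practical difference between the two is what comes back. A minimal sketch with a made-up HTML snippet: findAll returns a list of every match (optionally capped by limit), while find returns the first matching tag directly, or None if nothing matches, which is why its signature has no limit parameter.

```python
from bs4 import BeautifulSoup

html = "<div><p>first</p><p>second</p></div>"
bsObj = BeautifulSoup(html, "html.parser")

all_p = bsObj.findAll("p")             # every <p>, as a list-like ResultSet
limited = bsObj.findAll("p", limit=1)  # a list holding at most one <p>
first_p = bsObj.find("p")              # the first <p> itself, not a list
missing = bsObj.find("table")          # no <table> in the markup, so None

print([p.get_text() for p in all_p])    # ['first', 'second']
print([p.get_text() for p in limited])  # ['first']
print(first_p.get_text())               # first
print(missing)                          # None
```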
tag is the tag name; several names can be passed at once as a set, for example:
bsObj.findAll({"h1","h2","h3"})
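A small sketch of matching several tag names at once, using hypothetical markup. Even though the names are given as an unordered set, the results come back in the order the tags appear in the document, and tags not in the set (here <p> and <h4>) are skipped.

```python
from bs4 import BeautifulSoup

html = "<h1>Title</h1><p>intro</p><h2>Part A</h2><h3>Detail</h3><h4>Skipped</h4>"
bsObj = BeautifulSoup(html, "html.parser")

# Any tag whose name is in the set matches; order follows the document
headings = bsObj.findAll({"h1", "h2", "h3"})
print([h.name for h in headings])  # ['h1', 'h2', 'h3']
```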
attributes is a dict mapping attribute names to the values to match, for example:
nameList = bsObj.findAll("span", {"class":"green"})
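An attribute value in that dict does not have to be a single string. A sketch with made-up markup, assuming you want spans whose class is either green or red: passing a set of values matches a tag if any one of them matches.

```python
from bs4 import BeautifulSoup

html = """
<span class="green">Natasha</span>
<span class="red">A warning</span>
<span class="blue">Pierre</span>
"""
bsObj = BeautifulSoup(html, "html.parser")

# class is green OR red; the blue span is left out
spans = bsObj.findAll("span", {"class": {"green", "red"}})
print([s.get_text() for s in spans])  # ['Natasha', 'A warning']
```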
recursive is a boolean:
True (the default) searches all descendant tags
False searches only the top-level tags (the direct children)
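The effect of recursive is easiest to see when calling findAll on a tag that has nested children. A sketch with hypothetical markup: one <p> is a direct child of the <div>, the other sits one level deeper inside a <section>.

```python
from bs4 import BeautifulSoup

html = "<div id='outer'><p>child</p><section><p>grandchild</p></section></div>"
bsObj = BeautifulSoup(html, "html.parser")
outer = bsObj.find("div", {"id": "outer"})

deep = outer.findAll("p")                      # recursive=True (default): all descendants
shallow = outer.findAll("p", recursive=False)  # direct children of the <div> only

print([p.get_text() for p in deep])     # ['child', 'grandchild']
print([p.get_text() for p in shallow])  # ['child']
```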
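text matches against the text content of tags instead of the tag name (BeautifulSoup 4.4.0 and later call this argument string; text still works as the older name).
For example: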
nameList = bsObj.findAll(text="the prince")
print(len(nameList))
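A sketch of how text matching behaves, using made-up markup. It uses string=, the newer name for the text argument, and the results are the matching text nodes themselves rather than tags. A plain string must equal the whole text node exactly (case included); a compiled regular expression, shown here as one possible way to loosen the match, is searched against each text node.

```python
import re
from bs4 import BeautifulSoup

html = """
<p><span>the prince</span> said to <span>Anna</span></p>
<p><span>The Prince</span> laughed; <span>the prince</span> left.</p>
"""
bsObj = BeautifulSoup(html, "html.parser")

# Exact, case-sensitive match on whole text nodes
exact = bsObj.findAll(string="the prince")
# Regex match that ignores case
loose = bsObj.findAll(string=re.compile(r"^the prince$", re.IGNORECASE))

print(len(exact))  # 2
print(len(loose))  # 3
```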
limit