BeautifulSoup features

# coding: utf-8
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

from bs4 import BeautifulSoup


def getinfo(url):
    """Return (title, nameList, all_theprince) for the page, or None on failure."""
    try:
        html = urlopen(url)  # open the page; html.read() returns its source
        bsobj = BeautifulSoup(html.read(), "lxml")  # parse the source with BeautifulSoup
        title = bsobj.h1  # the page's first <h1> heading
        nameList = bsobj.find_all("span", {"class": "green"})  # every <span class="green">
        all_theprince = bsobj.find_all(string="the prince")  # text nodes exactly equal to "the prince"
    except (HTTPError, URLError, AttributeError):  # HTTP error, unreachable server, or attribute access on a missing object
        return None
    return title, nameList, all_theprince


url_Info = getinfo("http://www.pythonscraping.com/pages/warandpeace.html")
if url_Info is None:
    print("URL could not be found")
else:
    title, nameList, all_theprince = url_Info  # unpack the three results from getinfo
    print(title)
    for name in nameList:  # iterate over the matching tags
        print(name.get_text())  # strip the markup and print only the text
    print(len(all_theprince))
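The same three lookups can be tried without a network connection. The following is only a minimal sketch: the `SAMPLE_HTML` snippet and the `extract` helper are hypothetical and not part of the original script, and it assumes the built-in `html.parser` backend instead of `lxml` so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the downloaded page.
SAMPLE_HTML = """
<html><body>
<h1>War and Peace</h1>
<p><span class="green">Anna Pavlovna</span> spoke to
<span class="green">Prince Vasili</span> about <span class="red">the war</span>.</p>
<p><i>the prince</i> answered, and <b>the prince</b> left.</p>
</body></html>
"""


def extract(html):
    """Hypothetical helper: heading text, green-span texts, and exact-match count."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.h1 is None when the page has no <h1>, so guard before calling get_text().
    heading = soup.h1.get_text() if soup.h1 else None
    # Filter by tag name and by class attribute at the same time.
    names = [tag.get_text() for tag in soup.find_all("span", {"class": "green"})]
    # string= matches text nodes whose content equals the given string exactly.
    matches = soup.find_all(string="the prince")
    return heading, names, len(matches)


heading, names, count = extract(SAMPLE_HTML)
print(heading)
print(names)
print(count)
```

Note that `string="the prince"` only matches text nodes that consist of exactly that string; a longer sentence that merely contains the phrase is not counted.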

Reposted from: https://www.cnblogs.com/kylechen/p/8557589.html
