Python Web Scraping: BeautifulSoup

Latest recommended article published on 2021-05-29 11:02:39

xinmengsiyuan


Category: python

Article link: https://blog.csdn.net/xinmengsiyuan/article/details/54972691


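from urllib.request import urlopen

# Request the page; urlopen returns a response object
html = urlopen("http://pythonscraping.com/pages/page1.html")
# read() returns the body of the page as raw bytes
print(html.read())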
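Note that `html.read()` returns raw bytes rather than a string, so `print` shows a `b'...'` literal. A minimal sketch of one way to get text instead: decode the bytes with the charset declared in the response's Content-Type header, falling back to UTF-8 when none is declared (the fallback is an assumption, not something every page guarantees). The helper `fetch_text` is hypothetical, and a `data:` URL stands in for the pythonscraping.com page so the sketch runs without network access.

```python
from urllib.request import urlopen


def fetch_text(url, default_charset="utf-8"):
    """Hypothetical helper: fetch url and return its body as str instead of bytes."""
    with urlopen(url) as response:
        raw = response.read()  # bytes, just like html.read() above
        # Use the charset from the Content-Type header if one was declared;
        # otherwise fall back to default_charset (an assumption).
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset)


# A data: URL keeps this runnable offline; use a real page URL in practice.
page = fetch_text("data:text/html;charset=utf-8,<h1>Hello</h1>")
print(type(page).__name__, page)  # str <h1>Hello</h1>
```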

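from bs4 import BeautifulSoup
from urllib.request import urlopen

html = urlopen("http://pythonscraping.com/pages/page1.html")
# Parse the raw bytes with Python's built-in 'html.parser' backend
bsobj = BeautifulSoup(html.read(), 'html.parser')
# Printing the BeautifulSoup object outputs the whole parsed document as HTML
print(bsobj)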
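Printing `bsobj` dumps the entire parsed document; in practice you usually navigate to specific tags. A minimal sketch of common BeautifulSoup access patterns, run against a hypothetical inline HTML string rather than the live page: attribute access such as `bsobj.h1` returns the first matching tag, `find()` and `find_all()` search by tag name and attributes, and a missing tag comes back as `None`. The last lines show why the next example guards against `AttributeError`: with `'html.parser'`, a fragment that has no `<body>` tag gives `body == None`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup so the sketch runs offline.
SAMPLE_HTML = """
<html>
<head><title>Sample Page</title></head>
<body>
<h1>Sample Heading</h1>
<p class="intro">First paragraph.</p>
<p>Second paragraph.</p>
</body>
</html>
"""

bsobj = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attribute access returns the first matching tag anywhere in the tree.
print(bsobj.h1)            # <h1>Sample Heading</h1>
print(bsobj.title.string)  # Sample Page

# find() returns the first match (or None); find_all() returns a list.
intro = bsobj.find("p", class_="intro")
print(intro.get_text())    # First paragraph.
print([p.get_text() for p in bsobj.find_all("p")])  # ['First paragraph.', 'Second paragraph.']

# A tag that does not exist yields None.
print(bsobj.h2)            # None

# html.parser does not add missing <html>/<body> wrappers, so for a bare
# fragment .body is None and chaining .h1 onto it would raise AttributeError.
fragment = BeautifulSoup("<p>No body here</p>", "html.parser")
print(fragment.body)       # None
```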

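# Two kinds of errors: 1) the page does not exist on the server (HTTPError),
# 2) the server itself cannot be reached (URLError)
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup

def getTitle(url):
    try:
        html = urlopen(url)
    except (HTTPError, URLError):
        # The page or the server is unavailable: return None and skip the parsing below
        return None
    try:
        bsobj = BeautifulSoup(html.read(), 'html.parser')
        # If <body> is missing, bsobj.body is None and .h1 raises AttributeError;
        # if only <h1> is missing, title is simply None
        title = bsobj.body.h1
    except AttributeError:
        return None
    return title

title = getTitle("http://pythonscraping.com/pages/page1.html")
if title is None:
    print("Title is none")
else:
    print(title.text)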
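`HTTPError` is a subclass of `URLError`, so the two failure modes described in the comment can also be told apart by catching them in separate `except` clauses, provided `HTTPError` comes first. A sketch under that assumption: `classify_fetch` is a hypothetical helper and the 10-second timeout is an arbitrary choice. The `data:` and `file:` URLs in the usage lines exist only to exercise the success and `URLError` branches offline; the `HTTPError` branch needs a real server that answers with an error status, such as a missing page.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def classify_fetch(url):
    """Hypothetical helper: return (body_bytes, None) on success or (None, reason) on failure."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.read(), None
    except HTTPError as e:
        # The server was reached but answered with an error status (e.g. 404, 500).
        # This clause must come before URLError because HTTPError subclasses it.
        return None, f"HTTP error {e.code}"
    except URLError as e:
        # The resource could not be reached at all (DNS failure, refused connection,
        # or, for file: URLs, a missing file).
        return None, f"URL error: {e.reason}"


for url in ("data:,hello", "file:///no/such/dir/page1.html"):
    print(url, classify_fetch(url))
```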