Using the Python Standard Library urllib2, with a Website Status Check Example

Latest recommended article published on 2023-02-15 17:25:11

weixin_39760619


Article link: https://blog.csdn.net/weixin_39760619/article/details/111426978


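The urllib2 module in the Python 2.7 standard library offers a very simple interface in the form of the urlopen function. It can be used to fetch the content of a website, for example when writing a web crawler. urllib2 also provides a more elaborate interface for more complex situations such as basic authentication, cookies, and proxies.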

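Basic usage

urlopen accepts either a URL string or a Request object.

The object returned on success has the following main methods.

read(): returns the response body, which for a web page is its full HTML source.

info(): returns meta-information about the response, such as the headers sent by the server.

geturl(): returns the URL that was actually opened. urllib2 follows redirects automatically, so comparing this value with the requested URL usually shows whether the site redirects.

getcode(): returns the HTTP status code.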
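The four methods can be tried without touching a real website. The sketch below is only an illustration under several assumptions: it starts a throwaway HTTP server on the loopback interface, and the names DemoHandler and fetch_via_local_server as well as the /old and /new paths are invented for this example. The import fallback is there so the snippet also runs on Python 3, where urllib2 became urllib.request. An empty ProxyHandler is passed so that proxy settings in the environment do not interfere with the local request.

```python
import threading

try:                                    # Python 2.7
    import urllib2
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:                     # Python 3
    import urllib.request as urllib2
    from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /old redirects to /new, everything else returns HTML."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'<html>hello</html>'
            self.send_response(200)
            self.send_header('Content-Type', 'text/html')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass                            # keep the demo output quiet


def fetch_via_local_server():
    # Port 0 lets the operating system pick a free port.
    server = HTTPServer(('127.0.0.1', 0), DemoHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    # An empty ProxyHandler disables any proxy configured in the environment.
    opener = urllib2.build_opener(urllib2.ProxyHandler({}))
    try:
        response = opener.open('http://127.0.0.1:%d/old' % port, timeout=5)
        try:
            return {
                'code': response.getcode(),          # status of the final response
                'final_url': response.geturl(),      # URL after following the redirect
                'content_type': response.info().get('Content-Type'),
                'body': response.read(),
            }
        finally:
            response.close()
    finally:
        server.shutdown()
        server.server_close()


result = fetch_via_local_server()
print(result['code'])
print(result['final_url'])
print(result['content_type'])
print(result['body'])
```

The request goes to /old, but geturl() reports the /new address, which is how a redirect can be detected.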

1. Opening a URL directly

    import urllib2

    response = urllib2.urlopen('https://zhangnq.com/')
    html = response.read()
    print html

2. Using a Request object

    import urllib2

    url = 'https://zhangnq.com/'
    req = urllib2.Request(url)
    response = urllib2.urlopen(req, timeout=30)
    html = response.read()
    print html

Here urlopen is given a timeout of 30 seconds.
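A Request object can also be inspected before it is sent, which is a convenient way to see what urlopen would transmit. The following is a minimal sketch that touches no network. The helper name build_request and the User-Agent value are assumptions made up for illustration, and the import fallback only exists so the snippet also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2                      # Python 2.7
except ImportError:
    import urllib.request as urllib2    # Python 3: urllib2 was merged into urllib.request


def build_request(url, user_agent='example-client/0.1'):
    """Build a Request without sending it (user_agent is a made-up value)."""
    req = urllib2.Request(url)
    req.add_header('User-Agent', user_agent)
    return req


req = build_request('https://zhangnq.com/')
print(req.get_full_url())              # the URL the request targets
print(req.get_method())                # GET, because no data was attached
print(req.get_header('User-agent'))    # header names are stored capitalized
```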

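3. Passing the data parameter

To send data to a URL, for example for a user login, HTTP usually uses a POST request. A browser does this for you when you submit an HTML form. To POST arbitrary data from a Python program, first encode the data into the standard format, then pass it as the data argument to the Request object, and finally submit the request. The encoding is done with the urlencode function in urllib.

    import urllib
    import urllib2

    url = 'https://zhangnq.com/'
    values = {'username': 'sijitao',
              'password': 'passw0rd'}
    # Encode the form fields as key=value pairs joined by '&'.
    data = urllib.urlencode(values)
    # Supplying data makes urllib2 send a POST instead of a GET.
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)
    html = response.read()
    print html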
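To see what the encoding step produces and how attaching data changes the request method, here is a small sketch that builds the request without sending it. The field values are the placeholders from the example above; a list of pairs is used instead of a dict only to keep the output order fixed. The import fallback is an addition so the snippet also runs on Python 3, where the request body must be bytes.

```python
try:
    import urllib2                            # Python 2.7
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2          # Python 3
    from urllib.parse import urlencode

# A list of pairs keeps the field order fixed.
fields = [('username', 'sijitao'), ('password', 'passw0rd')]
data = urlencode(fields)
print(data)    # username=sijitao&password=passw0rd

# Python 3 requires the body as bytes; on Python 2 encode() is harmless.
req = urllib2.Request('https://zhangnq.com/', data.encode('ascii'))
print(req.get_method())    # POST
```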

To send the data with a GET request instead, append the encoded data to the URL as a query string and submit that.

    import urllib
    import urllib2

    url = "http://www.baidu.com/"
    data = {}
    data['wd'] = 'site:blog.nbhao.org'
    url_values = urllib.urlencode(data)
    # Result: http://www.baidu.com/s?wd=...
    furl = url + 's?' + url_values
    req = urllib2.Request(furl)
    response = urllib2.urlopen(req, timeout=5)
    html = response.read()
    print html
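The URL construction can be checked on its own, without sending anything. This is a minimal sketch; build_search_url is a hypothetical helper introduced for illustration, and the import fallback is there so the snippet also runs on Python 3.

```python
try:
    from urllib import urlencode          # Python 2.7
except ImportError:
    from urllib.parse import urlencode    # Python 3


def build_search_url(base, path, params):
    """Join base URL, path and urlencoded params (hypothetical helper)."""
    return base + path + '?' + urlencode(params)


furl = build_search_url('http://www.baidu.com/', 's', {'wd': 'site:blog.nbhao.org'})
print(furl)    # http://www.baidu.com/s?wd=site%3Ablog.nbhao.org
```

Note that urlencode percent-encodes the colon in the search term, so the value can be placed in the query string safely.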

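4. Exception handling

URLError is the exception to catch in most cases. When there is no network connection or the server does not exist, the URLError usually carries a reason attribute. When the page does not exist or the server returns some other error status, urlopen raises HTTPError, a subclass of URLError, and the HTTP status is available in its code attribute.

    import urllib2

    url = 'https://zhangnq.com/'
    req = urllib2.Request(url)
    response = None
    try:
        response = urllib2.urlopen(req, timeout=5)
        print response.getcode()
        print response.geturl()
        print response.info()
        #print response.read()
    except urllib2.URLError as e:
        print e
        # HTTPError (a subclass of URLError) carries the HTTP status in e.code.
        if hasattr(e, 'code'):
            print 'Error code:', e.code
            #print e.read()
            print e.geturl()
            print e.info()
        # A plain URLError only describes why the server could not be reached.
        elif hasattr(e, 'reason'):
            print 'Reason:', e.reason
    except Exception as e:
        # Anything that is not a URLError; report it instead of hiding it.
        print 'Unexpected error:', e
    finally:
        if response:
            response.close()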
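The relationship between the two exception types can be illustrated without a network connection. In the sketch below the errors are constructed by hand purely for demonstration (urlopen would normally raise them), and describe_error is a hypothetical helper. It uses isinstance rather than hasattr, which is an alternative to the check above, not the approach the original code takes. The import fallback lets the snippet run on Python 3, where these classes live in urllib.error.

```python
try:
    from urllib2 import HTTPError, URLError        # Python 2.7
except ImportError:
    from urllib.error import HTTPError, URLError   # Python 3


def describe_error(e):
    """Return a short description; HTTPError must be tested before URLError."""
    if isinstance(e, HTTPError):
        return 'HTTP error %d' % e.code
    if isinstance(e, URLError):
        return 'Failed to reach server: %s' % e.reason
    return 'Unexpected error: %r' % (e,)


# Errors built by hand for illustration; the URL is a placeholder.
not_found = HTTPError('http://example.invalid/', 404, 'Not Found', None, None)
unreachable = URLError('connection refused')

print(issubclass(HTTPError, URLError))   # True
print(describe_error(not_found))         # HTTP error 404
print(describe_error(unreachable))       # Failed to reach server: connection refused
```

Because HTTPError is a subclass of URLError, the more specific case has to be checked first, just as the code above checks for code before reason.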

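That covers the basic usage of urllib2.

Example: checking website status

The background: how can a program decide whether the sites listed in a web directory (http://www.hostunion.net/) are still working? The basic usage and exception handling described above are generally enough. The example uses the pickle module to keep a failure count between runs. Once a site has failed 5 times it is deleted, which in this code means setting its status to 1. The example follows.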

    import pickle
    import urllib2

    # sqlExecute is the author's own database helper; it is not defined in this article.

    def webCheck(timeout=60):
        result = sqlExecute("select id,url from websites where status = 3")
        webCheck_pkl = 'data/webCheck.pkl'
        # Load the results of earlier runs; start empty if the file is missing or unreadable.
        try:
            f = open(webCheck_pkl, 'rb')
            web_dict = pickle.load(f)
            f.close()
        except Exception:
            web_dict = {}
        l = []  # ids of sites that have failed 5 or more times
        if result:
            for row in result:
                url = 'http://' + row['url']
                req = urllib2.Request(url)
                req.add_header('User-Agent', "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36")
                response = None
                try:
                    response = urllib2.urlopen(req, timeout=timeout)
                    #print "Url: %s\t%s" % (url,response.getcode())
                    # Make sure the site has a complete record before updating it.
                    try:
                        web_dict[row['id']]['result_code']
                        web_dict[row['id']]['fail_cnt']
                    except Exception:
                        web_dict[row['id']] = {}
                        web_dict[row['id']]['fail_cnt'] = 0
                    web_dict[row['id']]['result_code'] = response.getcode()
                    # A successful request resets the failure count.
                    web_dict[row['id']]['fail_cnt'] = 0
                except urllib2.URLError as e:
                    if hasattr(e, 'code'):
                        # HTTP error status returned by the server.
                        print "Url: %s\t%s" % (url, e.code)
                        try:
                            web_dict[row['id']]['result_code']
                            web_dict[row['id']]['fail_cnt']
                        except Exception:
                            web_dict[row['id']] = {}
                            web_dict[row['id']]['fail_cnt'] = 0
                        web_dict[row['id']]['result_code'] = e.code
                        web_dict[row['id']]['fail_cnt'] = web_dict[row['id']]['fail_cnt'] + 1
                        if web_dict[row['id']]['fail_cnt'] >= 5:
                            l.append(row['id'])
                    elif hasattr(e, 'reason'):
                        # The server could not be reached at all.
                        print "Url: %s\t%s" % (url, 'error')
                        try:
                            web_dict[row['id']]['result_code']
                            web_dict[row['id']]['fail_cnt']
                        except Exception:
                            web_dict[row['id']] = {}
                            web_dict[row['id']]['fail_cnt'] = 0
                        web_dict[row['id']]['result_code'] = e.reason
                        web_dict[row['id']]['fail_cnt'] = web_dict[row['id']]['fail_cnt'] + 1
                        if web_dict[row['id']]['fail_cnt'] >= 5:
                            l.append(row['id'])
                except Exception:
                    # Ignore any other error for this site and move on to the next one.
                    pass
                finally:
                    if response:
                        response.close()
        # Mark every site that reached the failure limit.
        for key in l:
            sql = 'update websites set status=1 where id=%s' % key
            sqlExecute(sql)
        # Save the counters for the next run.
        f = open(webCheck_pkl, 'wb')
        pickle.dump(web_dict, f)
        f.close()
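The failure-counting logic in the example is repeated in three places. As a sketch of how it could be isolated and tried without a database or network, the snippet below factors it into small functions. The names record_check, load_state, save_state, and FAIL_LIMIT are hypothetical and not part of the original code, the state file is written to a temporary directory, and the site id 42 and status 500 are made-up test values.

```python
import os
import pickle
import tempfile

FAIL_LIMIT = 5  # same threshold as the example above


def record_check(web_dict, site_id, result_code, ok):
    """Update the per-site record; return True once the site reached the limit."""
    entry = web_dict.setdefault(site_id, {'fail_cnt': 0})
    entry['result_code'] = result_code
    # A success resets the counter, a failure increments it.
    entry['fail_cnt'] = 0 if ok else entry.get('fail_cnt', 0) + 1
    return entry['fail_cnt'] >= FAIL_LIMIT


def load_state(path):
    """Load saved counters, or start empty if the file is missing or unreadable."""
    try:
        with open(path, 'rb') as f:
            return pickle.load(f)
    except (IOError, OSError, EOFError, pickle.UnpicklingError):
        return {}


def save_state(path, web_dict):
    with open(path, 'wb') as f:
        pickle.dump(web_dict, f)


# Simulate five separate runs in which site 42 keeps returning 500.
state_path = os.path.join(tempfile.mkdtemp(), 'webCheck.pkl')
flags = []
for _ in range(5):
    state = load_state(state_path)
    flags.append(record_check(state, 42, 500, ok=False))
    save_state(state_path, state)

print(flags)                                   # [False, False, False, False, True]
print(load_state(state_path)[42]['fail_cnt'])  # 5
```

Because the counters are reloaded from the pickle file on every run, the fifth consecutive failure is detected even though each run only sees one request.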

Reference: http://www.pythontab.com/html/2014/pythonhexinbiancheng_1128/928.html
