Comparing common Python libraries for checking web/URL status


banrieen


Column: Creating applications and virtualization services | Tags: python, url

Article link: https://blog.csdn.net/banrieen/article/details/60772918


Comparison of common Python libraries for URL handling

A brief comparison of several ways to check whether a website or URL exists:

webbrowser
httplib (http.client in Python 3)
urllib, urllib2 (urllib.request in Python 3)
requests
Selenium WebDriver

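1. Use a HEAD request instead of GET: only the headers are downloaded, not the whole content, and you still get the response status.
Note: a HEAD request can fail even when the URL exists. For example, requesting the Amazon front page with HEAD returns status 405 (Method Not Allowed); in that case an additional GET is needed to confirm.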
HTTP status code classes (from Wikipedia, https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods):
1xx - informational
2xx - success
3xx - redirection
4xx - client error
5xx - server error
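Because the class is determined by the first digit alone, a tiny helper can label any status code. This is a minimal sketch; the names `STATUS_CLASSES` and `status_class` are hypothetical and not part of any library.

```python
# Hypothetical helper: map an HTTP status code to its class (its first digit)
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class name for an HTTP status code such as 200 or 405."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


if __name__ == "__main__":
    for code in (200, 301, 405, 503):
        print(code, status_class(code))
```

This is also why the `< 400` checks in this article accept both 2xx and 3xx responses as "the site exists".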

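import http.client  # Python 3 name of httplib

c = http.client.HTTPConnection('www.example.com', timeout=10)
c.request('HEAD', '/')  # ask for the headers of the root page only
if c.getresponse().status == 200:
    print('web site exists')
c.close()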

Or:

import httplib2  # third-party package: pip install httplib2

h = httplib2.Http(timeout=10)
resp, content = h.request("http://www.google.com", "HEAD")
assert resp.status < 400  # resp.status is the numeric status code
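The 405 caveat in the note above can be handled by falling back to GET when HEAD is refused. A minimal sketch using only the standard library (`http.client`, the Python 3 name of httplib); the function name `head_status` is hypothetical, and treating 501 (Not Implemented) like 405 is an assumption, since some servers (Python's own `http.server`, for one) answer an unsupported HEAD with 501.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the status code for url, trying HEAD first and GET as a fallback.

    Redirects are not followed; network errors (DNS failure, refused
    connection, timeout) propagate as exceptions.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    for method in ("HEAD", "GET"):
        conn = conn_cls(parts.netloc, timeout=timeout)
        try:
            conn.request(method, path)
            status = conn.getresponse().status
        finally:
            conn.close()
        # 405 Method Not Allowed / 501 Not Implemented: HEAD was refused, so try GET
        if status not in (405, 501):
            break
    return status
```

Since redirects are not followed, a moved page reports its 3xx code; compare the result with 400 as in the snippets above.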

2. If you want to download the whole page anyway, simply send a normal GET request and check the status code.

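import requests  # third-party package: pip install requests

response = requests.get('http://google.com', timeout=10)
assert response.status_code < 400  # redirects are followed, so this is the final page's status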

Or:

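import requests

try:
    response = requests.get('http://www.example.com', timeout=10)
    exists = response.status_code == 200
except requests.exceptions.RequestException:
    exists = False  # no response at all: DNS failure, refused connection, timeout
print('Web site exists' if exists else 'Web site does not exist')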

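3. With webbrowser.open(url, new=0, autoraise=True) it is hard to tell whether the page opened or even exists: the call returns True as soon as the (system default) browser is launched, even for a URL that does not exist.
Moreover, the browser controller objects mainly provide open methods and have no close method.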

>>> import webbrowser
>>> # the URL "http://192.168.99.74" does not exist
>>> webbrowser.open("http://192.168.99.74", new=0, autoraise=True)
True

4. With urllib or urllib2, checking what urlopen() returns (or raises) is a fairly simple way to confirm whether a website or URL exists or can be opened.

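import urllib.error
import urllib.request  # Python 3 home of urllib2.urlopen

try:
    urllib.request.urlopen('http://www.example.com/some_page', timeout=10)
except urllib.error.HTTPError as e:
    print(e.code)    # the server answered with an error status, e.g. 404
except urllib.error.URLError as e:
    print(e.reason)  # no usable response: DNS failure, refused connection, ...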
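A slightly fuller version of the urlopen() check returns the status code instead of printing it. A minimal sketch under these assumptions: Python 3's `urllib.request`, a HEAD request by default (pass `method="GET"` for servers that refuse HEAD), and a hypothetical function name `url_status`. Because `urlopen()` raises `HTTPError` for 4xx/5xx responses, the code is read from the exception.

```python
import urllib.error
import urllib.request
from typing import Optional


def url_status(url: str, timeout: float = 10.0, method: str = "HEAD") -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    req = urllib.request.Request(url, method=method)
    try:
        # urlopen() follows redirects, so this is the status of the final URL
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # The server answered with 4xx/5xx; urlopen() reports this as an exception
        return e.code
    except OSError:
        # URLError (DNS failure, refused connection) or a socket timeout
        return None
```

For example, `url_status('http://www.example.com/some_page')` yields an int for any server reply and None when nothing answered at all.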

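5. Calling get(url) on a Selenium WebDriver object can also tell you whether a URL exists or can be opened, but you must handle the exceptions it raises, otherwise the script aborts. WebDriver does not expose the HTTP status code, so an error page such as a 404 loads without raising.
Selenium must be installed separately (pip install selenium), as must httplib2 and requests; httplib and urllib/urllib2 (http.client and urllib.request in Python 3) are part of the standard library.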

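from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.chrome.service import Service


class WebPageKeywords:  # hypothetical class name; the original omits the class line
    ROBOT_LIBRARY_SCOPE = 'GLOBAL'  # Robot Framework: one library instance per test run

    def __init__(self):
        pass

    def web_open_page(self, url, browser="chrome"):
        # Only Chrome is handled; `browser` is kept from the original but unused
        service = Service(r'C:\Python27\selenium\webdriver\chromedriver')  # adjust to your chromedriver path
        driver = webdriver.Chrome(service=service)
        try:
            driver.get(url)
            return True
        except WebDriverException:  # e.g. the host cannot be resolved or reached
            print("Open fail!")
            return False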
