Actually, it looks like urllib2 can perform an HTTP HEAD request after all.
The question that @reto linked to above shows how to get urllib2 to issue a HEAD request.
Here's my take on it:

    import urllib2

    # Derive from the Request class and override get_method to allow a HEAD request.
    class HeadRequest(urllib2.Request):
        def get_method(self):
            return "HEAD"

    myurl = 'http://bit.ly/doFeT'
    request = HeadRequest(myurl)

    try:
        response = urllib2.urlopen(request)
        response_headers = response.info()

        # This just prints all the header key-value pairs. Replace this
        # line with something useful.
        print(response_headers.dict)

    except urllib2.HTTPError as e:
        # Prints the HTTP status code of the response, but only if there was a
        # problem.
        print("Error code: %s" % e.code)
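For readers on Python 3, where urllib2 was folded into urllib.request, the same idea can be sketched without a subclass, because Request accepts a method argument there (Python 3.3 and later). This is a port under that assumption rather than part of the code above, and the helper names build_head_request and fetch_headers are made up for illustration.

```python
import urllib.error
import urllib.request


def build_head_request(url):
    # Hypothetical helper: Request(..., method="HEAD") needs Python 3.3+.
    return urllib.request.Request(url, method="HEAD")


def fetch_headers(url, timeout=10):
    """Send a HEAD request; return (status, headers), or (code, None) on an HTTP error."""
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=timeout) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as e:
        # Mirrors the urllib2 version: only the status code is reported on failure.
        return e.code, None
```

Calling fetch_headers('http://bit.ly/doFeT') would then play the role of the try/except block in the urllib2 version.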
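If you inspect the traffic from the urllib2 code above with something like the Wireshark network protocol analyzer, you can see that it really is sending a HEAD request rather than a GET.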
This is the HTTP request and response from the code above, as captured by Wireshark:

    HEAD /doFeT HTTP/1.1
    Accept-Encoding: identity
    Host: bit.ly
    Connection: close
    User-Agent: Python-urllib/2.7

    HTTP/1.1 301 Moved
    Server: nginx
    Date: Sun, 19 Feb 2012 13:20:56 GMT
    Content-Type: text/html; charset=utf-8
    Cache-control: private; max-age=90
    Location: http://www.kidsidebyside.org/?p=445
    MIME-Version: 1.0
    Content-Length: 127
    Connection: close
    Set-Cookie: _bit=4f40f738-00153-02ed0-421cf10a;domain=.bit.ly;expires=Fri Aug 17 13:20:56 2012;path=/; HttpOnly
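However, as mentioned in one of the comments on the other question, if the URL in question involves a redirect, urllib2 will send a GET request to the destination, not a HEAD. This could be a major shortcoming if you really only want to make HEAD requests.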
The request above involves a redirect. Here is the request to the destination, as captured by Wireshark:

    GET /2009/05/come-and-draw-the-circle-of-unity-with-us/ HTTP/1.1
    Accept-Encoding: identity
    Host: www.kidsidebyside.org
    Connection: close
    User-Agent: Python-urllib/2.7
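One possible way around the GET-on-redirect behaviour is to plug in a redirect handler that rebuilds the follow-up request as a HEAD. The sketch below is written against Python 3's urllib.request (an assumption, since the code above is Python 2), and the names HeadPreservingRedirectHandler and head_following_redirects are hypothetical. Whether the stock handler carries the method over may depend on the Python version, so the sketch forces it explicitly.

```python
import urllib.request


class HeadPreservingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that re-issues a redirected HEAD request as HEAD."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the stock handler validate the redirect and build the follow-up.
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None and req.get_method() == "HEAD":
            # Rebuild the follow-up request with the method forced to HEAD.
            new_req = urllib.request.Request(
                new_req.full_url,
                headers=new_req.headers,
                origin_req_host=new_req.origin_req_host,
                unverifiable=True,
                method="HEAD",
            )
        return new_req


def head_following_redirects(url, timeout=10):
    """Send HEAD to url, following redirects with HEAD; return (status, final_url)."""
    # Passing the subclass replaces the default HTTPRedirectHandler in the opener.
    opener = urllib.request.build_opener(HeadPreservingRedirectHandler)
    request = urllib.request.Request(url, method="HEAD")
    with opener.open(request, timeout=timeout) as response:
        return response.status, response.geturl()
```

Requests that were not HEAD to begin with are left to the stock handler, so ordinary GET redirects behave as before.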
An alternative to urllib2 is Joe Gregorio's httplib2 library:

    import httplib2

    url = "http://bit.ly/doFeT"
    http_interface = httplib2.Http()

    try:
        response, content = http_interface.request(url, method="HEAD")
        print("Response status: %d - %s" % (response.status, response.reason))

        # This just prints all the attributes of the response object. Replace
        # this line with something useful.
        print(response.__dict__)

    except httplib2.ServerNotFoundError as e:
        print(e)
This has the advantage of using HEAD for both the initial HTTP request and the redirected request to the destination URL.
Here's the first request:

    HEAD /doFeT HTTP/1.1
    Host: bit.ly
    accept-encoding: gzip, deflate
    user-agent: Python-httplib2/0.7.2 (gzip)
Here's the second request, to the destination:

    HEAD /2009/05/come-and-draw-the-circle-of-unity-with-us/ HTTP/1.1
    Host: www.kidsidebyside.org
    accept-encoding: gzip, deflate
    user-agent: Python-httplib2/0.7.2 (gzip)