urllib2 is a commonly used Python 2 library for requesting URL resources.
Official documentation: urllib2 — extensible library for opening URLs
Commonly used functions:
urllib2.urlopen(url[, data[, timeout[, cafile[, capath[, cadefault[, context]]]]]])
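Parameters and example:
url: either a string or a Request object
timeout: the request timeout, in seconds
The returned object provides the geturl(), info(), and read() methods:
geturl() returns the URL that was actually retrieved
info() returns the metadata (headers) of the response
read() returns the body of the response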
    import urllib2

    url = 'http://www.csdn.net/'
    # urlopen returns a file-like response object, not the page text itself
    html = urllib2.urlopen(url, timeout=5)
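To make the three response methods concrete without touching the network, here is a minimal sketch that opens a temporary local file through a file: URL. The helper names fetch and demo, the temporary file, and the fallback import (urllib.request is where these functions live in Python 3) are assumptions for illustration, not part of the original example.

```python
import os
import tempfile

try:
    import urllib2                          # Python 2
    from urllib import pathname2url
except ImportError:
    import urllib.request as urllib2        # Python 3 equivalent (assumption for portability)
    from urllib.request import pathname2url


def fetch(url, timeout=5):
    """Hypothetical helper: return (final_url, info, body) for a URL."""
    response = urllib2.urlopen(url, timeout=timeout)
    try:
        return response.geturl(), response.info(), response.read()
    finally:
        response.close()


def demo():
    """Fetch a throwaway local file so the sketch needs no network access."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello urllib2")
        return fetch("file:" + pathname2url(path))
    finally:
        os.remove(path)


if __name__ == "__main__":
    final_url, info, body = demo()
    print(final_url)
    print(info["Content-Length"])
    print(body)
```

The same fetch helper should work unchanged with an http:// URL such as the one in the example above; info() returns a message object, so individual headers can be looked up by name.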
urllib2.Request(url [, data][, headers][, origin_req_host][, unverifiable])
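Parameters and example:
url: a valid URL, given as a string
headers: a dictionary of request headers, such as the browser's User-Agent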
    import urllib2

    url = "http://www.csdn.net/"
    headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"}
    req = urllib2.Request(url, headers=headers)
    html = urllib2.urlopen(req)
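A Request object can be inspected before it is sent, which is a convenient way to check what urlopen will actually transmit. The sketch below is only an illustration: build_request and USER_AGENT are hypothetical names, and the fallback import lets the same code run on Python 3, where these classes live in urllib.request.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent (assumption for portability)

# Example browser string; any User-Agent value could be used here.
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def build_request(url, data=None):
    """Hypothetical helper: attach a browser-like User-Agent to a Request."""
    return urllib2.Request(url, data=data, headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    get_req = build_request("http://www.csdn.net/")
    post_req = build_request("http://www.csdn.net/", data=b"q=python")

    print(get_req.get_full_url())
    print(get_req.get_method())     # no data, so the request is a GET
    print(post_req.get_method())    # supplying data turns it into a POST
    # Request stores header names via str.capitalize(), hence "User-agent".
    print(get_req.get_header("User-agent"))
```

Nothing is sent over the network here; the request only goes out once the object is passed to urllib2.urlopen.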
Error handling:
URLError
    import urllib2

    try:
        html = urllib2.urlopen("http://www.csdn.net/")
    except urllib2.URLError as e:
        print e.reason
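HTTPError
    import urllib2

    try:
        html = urllib2.urlopen("http://www.csdn.net")
    except urllib2.HTTPError as e:
        print e.code
        print e.reason
socket.error
urllib2 does not define a SocketError class; socket-level failures that are not wrapped in URLError are raised as socket.error, which has no reason attribute.
    import socket
    import urllib2

    try:
        html = urllib2.urlopen("http://www.csdn.net")
    except socket.error as e:
        print e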
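Because HTTPError is a subclass of URLError, the order of the except clauses matters when both are handled together: the HTTPError clause has to come first, or the URLError clause catches everything. The sketch below illustrates this under some assumptions: safe_fetch, fake_404, and fake_down are hypothetical names, and the opener parameter is an invented hook that lets the handlers be exercised without a network connection.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent (assumption for portability)


def safe_fetch(url, timeout=5, opener=None):
    """Hypothetical helper: return (body, error_message); one of them is None."""
    if opener is None:
        opener = urllib2.urlopen
    try:
        response = opener(url, timeout=timeout)
    except urllib2.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        return None, "HTTP %d: %s" % (e.code, e.reason)
    except urllib2.URLError as e:
        # The server could not be reached at all (DNS failure, refused connection, ...).
        return None, "could not reach server: %s" % (e.reason,)
    try:
        return response.read(), None
    finally:
        response.close()


def fake_404(url, timeout=None):
    """Stand-in for urlopen that simulates a 404 response."""
    raise urllib2.HTTPError(url, 404, "Not Found", None, None)


def fake_down(url, timeout=None):
    """Stand-in for urlopen that simulates an unreachable server."""
    raise urllib2.URLError("connection refused")


if __name__ == "__main__":
    print(safe_fetch("http://www.csdn.net/missing", opener=fake_404))
    print(safe_fetch("http://www.csdn.net/", opener=fake_down))
```

In real use the opener argument would simply be omitted, so that urllib2.urlopen performs the request.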
Catching connection timeouts
    import socket
    import urllib2

    try:
        urllib2.urlopen("http://example.com", timeout=1)
    except urllib2.URLError as e:
        if isinstance(e.reason, socket.timeout):
            print "There was an error: %r" % e
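A timeout does not always arrive wrapped in URLError: one that fires while the response is being read can, in some cases, surface as a bare socket.timeout. A minimal sketch that treats both forms the same way might look like the following; is_timeout, fetch_or_none, and always_times_out are hypothetical names, and the opener parameter is an invented hook used only to simulate a timeout without a network.

```python
import socket

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent (assumption for portability)


def is_timeout(exc):
    """Return True if exc is a timeout, either bare or wrapped in URLError."""
    if isinstance(exc, socket.timeout):
        return True
    return (isinstance(exc, urllib2.URLError)
            and isinstance(exc.reason, socket.timeout))


def fetch_or_none(url, timeout=1, opener=None):
    """Hypothetical helper: return the body, or None if the request timed out."""
    if opener is None:
        opener = urllib2.urlopen
    try:
        response = opener(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except (urllib2.URLError, socket.error) as e:
        if is_timeout(e):
            return None
        raise  # anything other than a timeout is left to the caller


def always_times_out(url, timeout=None):
    """Stand-in for urlopen that simulates a read timeout."""
    raise socket.timeout("timed out")


if __name__ == "__main__":
    print(fetch_or_none("http://example.com", opener=always_times_out))
```

Errors that are not timeouts are re-raised, so they can still be handled by the URLError and HTTPError clauses shown earlier.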