The urllib2 module extends the urllib module; both belong to the Python 2 standard library. urllib2 is used as follows:
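1. Requesting a web page
import urllib2
req = urllib2.urlopen(url)   # url: address of the page; returns a response object
html = req.read()            # page content as a string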
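urllib2 exists only in Python 2; in Python 3 its functionality was moved into `urllib.request` (together with `urllib.parse` and `urllib.error`). The sketch below is a Python 3 equivalent of step 1. So that it runs without network access, it assumes a `data:` URL (RFC 2397) as a stand-in for a real page; with an `http://` URL the calls are the same.

```python
from urllib.request import urlopen

# Assumed offline stand-in for a real page: a data: URL that urlopen can open.
url = "data:text/plain,Hello%20world"

with urlopen(url) as req:   # Python 3 counterpart of urllib2.urlopen(url)
    html = req.read()       # the body comes back as bytes in Python 3

print(html.decode("ascii"))
```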
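2. Inspecting the response
req.geturl() returns the URL of the page actually retrieved (it differs from the requested URL after a redirect)
req.info() returns the page's meta-information, i.e. the response headers
req.getcode() returns the HTTP status code, e.g. 200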
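A minimal Python 3 sketch of these inspection methods (urllib2's `urlopen` lives in `urllib.request` in Python 3). It again assumes an offline `data:` URL instead of a real page. Python 3 still provides `geturl()`, `info()` and `getcode()`, though newer documentation prefers the `url`, `headers` and `status` attributes. Note that `getcode()` only carries a status for HTTP(S) responses; for a `data:` URL it returns `None`.

```python
from urllib.request import urlopen

url = "data:text/plain,Hello%20world"   # assumed offline stand-in for a page

with urlopen(url) as req:
    final_url = req.geturl()                        # URL actually retrieved
    content_type = req.info().get("Content-Type")   # info() returns the headers
    length = req.info()["Content-Length"]           # header values are strings
    status = req.getcode()                          # None here; e.g. 200 for HTTP

print(final_url, content_type, length, status)
```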
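3. Requesting a page with a Request object
request = urllib2.Request(url)        # wrap the URL in a Request object
response = urllib2.urlopen(request)   # urlopen accepts a Request as well as a URL string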
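The same two steps in Python 3, where `Request` and `urlopen` are imported from `urllib.request`. As a sketch it assumes an offline `data:` URL so that nothing is fetched over the network.

```python
from urllib.request import Request, urlopen

url = "data:text/plain,Hello%20world"   # assumed offline stand-in for a page

request = Request(url)                  # build the Request object first
with urlopen(request) as response:      # then pass it to urlopen
    body = response.read()

print(request.get_method(), body)
```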
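4. Working with a Request instance (a Request object lets you send data and other extra information to the server)
request.add_data(data) sets the data to send (a URL-encoded string); a request with data is sent as POST
request.has_data() returns whether the request has data attached
request.get_data() returns the attached data
request.get_method() returns the request method, 'GET' or 'POST' ('POST' when data is attached)
request.add_header(key, value) adds a request header as a key/value pair
e.g. request.add_header('User-Agent', 'Mozilla/5.0')
request.has_header(name) returns whether the request has the given header
request.add_unredirected_header(key, value) adds a header that is not passed on to redirected requests
request.get_full_url() returns the full URL
request.get_type() returns the URL scheme, e.g. 'http'
request.get_host() returns the host that will be connected to
request.set_proxy(host, type) routes the request through a proxy
request.is_unverifiable() returns whether the request is unverifiable, as defined by RFC 2965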
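A sketch of the same operations on a Python 3 `urllib.request.Request`. Python 3.4 removed `add_data`, `has_data`, `get_data`, `get_type`, `get_host` and `is_unverifiable`; the sketch uses their attribute replacements (`data`, `type`, `host`, `unverifiable`). The URL and the proxy address are hypothetical, and the request is only built and inspected, never sent. One detail worth knowing: header names are stored via `str.capitalize()`, so `has_header` must be asked for `"User-agent"`, not `"User-Agent"`.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical URL: the request is built and inspected only.
request = Request("http://www.example.com/login")
method_before = request.get_method()          # "GET": no body attached yet

# The .data attribute replaces add_data() / has_data() / get_data().
request.data = urlencode({"key": "value"}).encode("ascii")
has_data = request.data is not None
method_after = request.get_method()           # "POST": a body is now attached

# add_header() capitalizes names: "User-Agent" is stored as "User-agent".
request.add_header("User-Agent", "Mozilla/5.0")
has_ua = request.has_header("User-agent")

# This header is kept on this request but not copied to redirected requests.
request.add_unredirected_header("Referer", "http://www.example.com/")

full_url = request.get_full_url()
scheme, host = request.type, request.host     # replace get_type() / get_host()
unverifiable = request.unverifiable           # replaces is_unverifiable()

# Hypothetical proxy; set_proxy() rewrites host and selector in place.
request.set_proxy("proxy.example.com:8080", "http")
proxy_host = request.host
```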
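6. Requesting pages with special-purpose handlers
1. Create a handler object for the situation, for example one of:
import urllib2
handler = urllib2.HTTPCookieProcessor()   # stores cookies and sends them back
handler = urllib2.ProxyHandler()          # uses a proxy (by default, from the proxy environment variables)
handler = urllib2.HTTPHandler()           # handles plain HTTP requests
handler = urllib2.HTTPRedirectHandler()   # follows HTTP redirects
2. Build an opener
opener = urllib2.build_opener(handler)
3. Install the opener
urllib2.install_opener(opener)   # urlopen() now uses this opener by default
4. Request page data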
urllib2.urlopen(url)
urllib2.urlopen(request)
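The four steps above, sketched with Python 3's `urllib.request` (the successor of urllib2) and an explicit `http.cookiejar.CookieJar` for the cookie handler. The `data:` URL is an assumption so the sketch runs offline; since the cookie handler only acts on HTTP(S) traffic, the jar stays empty here.

```python
import http.cookiejar
from urllib.request import HTTPCookieProcessor, build_opener, install_opener, urlopen

cookie_jar = http.cookiejar.CookieJar()     # where received cookies would be kept
handler = HTTPCookieProcessor(cookie_jar)   # 1. special-purpose handler
opener = build_opener(handler)              # 2. opener = this handler + default handlers
install_opener(opener)                      # 3. urlopen() now goes through this opener

with urlopen("data:text/plain,Hello%20world") as resp:   # 4. request the page
    body = resp.read()

handler_names = [type(h).__name__ for h in opener.handlers]
print(body, handler_names)
```

Without `install_opener`, the same request could be made with `opener.open(url)`, which leaves the global `urlopen` untouched.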
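7. Submitting form data with urllib2
Form data sent with urllib2 must first be encoded with urllib.urlencode
import urllib
import urllib2
url = 'http://www.xxxx.com'
data = {'key1': 'value1', 'key2': 'value2'}      # placeholder form fields
postdata = urllib.urlencode(data)                # URL-encoded string such as key1=value1&key2=value2
request = urllib2.Request(url, data=postdata)    # supplying data makes this a POST request
request.add_header('User-Agent', 'Mozilla/5.0')  # headers are added one key/value pair at a time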
response = urllib2.urlopen(request)
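In Python 3, `urlencode` moved to `urllib.parse` and returns a `str`, while the request body must be `bytes`, so it has to be encoded. This sketch keeps the placeholder URL from the text and uses hypothetical form fields; the final `urlopen` call is left out so that nothing is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "http://www.xxxx.com"                    # placeholder URL from the text
data = {"key1": "value1", "key2": "value2"}    # hypothetical form fields

postdata = urlencode(data).encode("utf-8")     # Python 3: encode str to bytes
request = Request(url, data=postdata)          # a body makes this a POST
request.add_header("User-Agent", "Mozilla/5.0")

# urlopen(request) would submit the form; omitted so the sketch runs offline.
print(request.get_method(), postdata)
```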