urllib2 Source Code Walkthrough, Part 4 (Opening Your URL with the opener)

Original link: http://www.cnblogs.com/lexus/archive/2013/01/07/2850279.html

Source: the5fire的技术博客

Author: 胡阳 | Published: 2012-12-20 23:45 | Category: source code walkthroughs

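The two previous articles, "urllib2 Source Code Walkthrough, Part 2 (A Simple urlopen)" and "urllib2 Source Code Walkthrough, Part 3", already built an opener. My analysis may have looked laborious, but once you understand it, there is not much logic to it. With the opener in hand, we can use it to open and read a URL. The whole process lives in the opener.open(url) method.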

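The flow of this method is clear. It accepts three parameters: fullurl, data, and timeout. fullurl comes in two forms: a URL string or a Request object. The data parameter decides which kind of HTTP request is sent: GET when data is None, POST otherwise. The rough steps for handling a URL are: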

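1. Build a Request object.
2. Pre-process the Request, mainly to fill in its details, such as headers or cookies.
3. Hand the Request to the class in httplib that matches the protocol. (httplib is Python's client-side implementation of the HTTP protocol, used to talk to HTTP servers.)
4. Then wrap up: check whether the returned Response indicates an error, and if so handle it, for example by raising an error like "urlopen error…".
5. If the previous step found no error, you get the Response object produced by httplib. It behaves much like a file object, so you can simply call read() on it.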
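
The steps above can be tried out without touching the network. This is a minimal sketch, assuming Python 3, where urllib2 lives on as urllib.request. TraceHandler and FakeResponse are hypothetical names of my own, not part of the library, and the fake response stands in for what httplib would normally return.

```python
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for the file-like object httplib would return."""

    def __init__(self, body):
        self._body = body

    def read(self):
        return self._body


class TraceHandler(urllib.request.BaseHandler):
    """Hypothetical handler that records each stage instead of using the network."""

    def __init__(self):
        self.calls = []

    def http_request(self, req):
        # Step 2: pre-process the Request (headers, cookies, ...).
        self.calls.append("http_request")
        return req

    def http_open(self, req):
        # Step 3: "send" the Request; here we only note the HTTP method.
        self.calls.append("http_open:" + req.get_method())
        return FakeResponse(b"fake body")

    def http_response(self, req, response):
        # Step 4: post-process the Response.
        self.calls.append("http_response")
        return response


handler = TraceHandler()
opener = urllib.request.OpenerDirector()
opener.add_handler(handler)

# fullurl as a plain string, no data: a GET request.
body = opener.open("http://example.invalid/").read()

# fullurl as a Request object, with data: a POST request.
request = urllib.request.Request("http://example.invalid/form")
opener.open(request, data=b"key=value")

print(body)
print(handler.calls)
```

The recorded calls show the request, open, and response stages running in that order for each call, with GET for the first and POST for the second.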

Based on the steps above, here is the code:

def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
    '''
    First build the Request object,
    then run the request through the handlers that process requests;
    once a response comes back, run it through the handlers that process responses.

    Processing the request uses the chain of responsibility mentioned earlier.
    '''
    # accept a URL or a Request object
    if isinstance(fullurl, basestring):
        req = Request(fullurl, data)
    else:
        req = fullurl
        if data is not None:
            req.add_data(data)

    req.timeout = timeout
    protocol = req.get_type()

    # Pre-process the Request
    meth_name = protocol + "_request"
    for processor in self.process_request.get(protocol, []):
        meth = getattr(processor, meth_name)
        req = meth(req)

    # Process the Request
    response = self._open(req, data)

    # Post-process the Response
    meth_name = protocol + "_response"
    for processor in self.process_response.get(protocol, []):
        meth = getattr(processor, meth_name)
        response = meth(req, response)

    return response

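You only need to look at those three blocks. All of them build on the three OpenerDirector attributes covered in the earlier articles, and each works the same way: iterate over the handlers registered for the protocol and let each one process, or more aptly refine, the object in turn, much like a factory assembly line.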

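Take the first stage, request processing. It first fetches from process_request the handlers that can process http requests. By default, the http entry in process_request maps to a single handler, HTTPHandler, so its http_request method does the processing. The other two stages work the same way.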
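
To check which handlers a default opener registers for each stage, you can inspect the three attributes directly. This is a small sketch assuming Python 3's urllib.request (the successor of urllib2); the exact lists may vary with the Python version and with proxy settings in the environment, so treat the output as something to look at rather than a guarantee.

```python
import urllib.request

# build_opener() only assembles the handlers; it does not touch the network.
opener = urllib.request.build_opener()


def handler_names(table, protocol):
    """Class names of the handlers registered for one protocol in a lookup table."""
    return [type(h).__name__ for h in table.get(protocol, [])]


request_processors = handler_names(opener.process_request, "http")
open_handlers = handler_names(opener.handle_open, "http")
response_processors = handler_names(opener.process_response, "http")

print("process_request :", request_processors)
print("handle_open     :", open_handlers)
print("process_response:", response_processors)
```

HTTPHandler should appear under process_request, which matches the description above.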

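This code is not the whole story, because it also calls a private method (judging by its name), self._open(req, data). Let's look at the code first, and then I will raise my questions about it.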

def _open(self, req, data=None):
    # Try the default open first
    result = self._call_chain(self.handle_open, 'default',
                              'default_open', req)
    if result:
        return result
    # Then try the open that matches the protocol
    protocol = req.get_type()
    result = self._call_chain(self.handle_open, protocol, protocol +
                              '_open', req)
    if result:
        return result
    # Nothing came back: the protocol is wrong or not covered by any of the
    # expected handlers, so fall back to the unknown handler.
    return self._call_chain(self.handle_open, 'unknown',
                            'unknown_open', req)

def _call_chain(self, chain, kind, meth_name, *args):
    """
    A chain-of-responsibility-like pattern: iterate over all handlers in the
    list to find a method that can handle the request.
    """
    # Handlers raise an exception if no one else should try to handle
    # the request, or return None if they can't but another handler
    # could. Otherwise, they return the response.
    handlers = chain.get(kind, ())
    for handler in handlers:
        func = getattr(handler, meth_name)
        result = func(*args)
        if result is not None:
            return result
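
The fallback order in _open (default_open, then the protocol's own open, then unknown_open) can be observed with three toy handlers. This is a sketch assuming Python 3's urllib.request; CacheHandler, FakeHTTPHandler, FallbackHandler, and FakeResponse are hypothetical names of my own, and no network access happens.

```python
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for a response; remembers which method produced it."""

    def __init__(self, source):
        self.source = source


class CacheHandler(urllib.request.BaseHandler):
    def default_open(self, req):
        # Tried first for every protocol; returning None passes the request on.
        if req.full_url.endswith("/cached"):
            return FakeResponse("default_open")
        return None


class FakeHTTPHandler(urllib.request.BaseHandler):
    def http_open(self, req):
        # Tried second, only for http URLs.
        return FakeResponse("http_open")


class FallbackHandler(urllib.request.BaseHandler):
    def unknown_open(self, req):
        # Tried last, when nothing above produced a response.
        return FakeResponse("unknown_open")


opener = urllib.request.OpenerDirector()
for handler in (CacheHandler(), FakeHTTPHandler(), FallbackHandler()):
    opener.add_handler(handler)

sources = [
    opener.open(url).source
    for url in (
        "http://example.invalid/cached",  # answered by default_open
        "http://example.invalid/page",    # default_open declines, http_open answers
        "foo://example.invalid/page",     # no foo_open exists, unknown_open answers
    )
]
print(sources)  # ['default_open', 'http_open', 'unknown_open']
```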

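The comments in _open are detailed enough, and _call_chain is essentially the for processor in self.process_request.get(protocol, [])… loop moved into a function.
With the picture roughly clear, here are my questions.
Question 1: why write a separate _open function? The three stages look much alike, so putting them all in one function should be clear enough.
My own answer: opening a Request is probably somewhat more involved, so extracting that code keeps things clearer. I find this reason convincing.
Question 2: since _call_chain was extracted, why not use it for processing the Request and the Response as well?
I suspect the reason is again to keep the three stages independent and more clearly separated. One difference is also visible in the code: _call_chain returns as soon as one handler yields a non-None result, whereas the two processing loops pass the object through every processor.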
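
The two looping styles discussed in these questions can be compared side by side. This is a standalone sketch in plain Python, not library code; call_chain and run_pipeline are my own simplified names for the two behaviours.

```python
def call_chain(handlers, *args):
    """Chain of responsibility: stop at the first handler that returns non-None."""
    for handler in handlers:
        result = handler(*args)
        if result is not None:
            return result
    return None


def run_pipeline(processors, value):
    """Pipeline: every processor runs, each receiving the previous output."""
    for processor in processors:
        value = processor(value)
    return value


def decline(request):
    # Cannot handle the request, so let the next handler try.
    return None


def answer_first(request):
    return "first:" + request


def answer_second(request):
    # Never reached in the chain below, because answer_first already answered.
    return "second:" + request


chain_result = call_chain([decline, answer_first, answer_second], "req")
pipeline_result = run_pipeline([str.strip, str.upper], "  req  ")

print(chain_result)     # first:req
print(pipeline_result)  # REQ
```

In the chain, only one handler's answer survives; in the pipeline, every processor leaves its mark on the value.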

These questions will be answered through continued practice. That concludes the analysis of urllib2.

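The biggest gain from studying this code is learning a way of organizing a program: using the builder pattern, or perhaps the chain-of-responsibility pattern (if you know which pattern it really is, please tell me), to handle many kinds of requests. One more point: a function is not more beautiful for being shorter, but for being clearer.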

–EOF–

Posted on 2013-01-07 22:45 by lexus

Reposted from: https://www.cnblogs.com/lexus/archive/2013/01/07/2850279.html
