urllib2 function reference

The urllib2 module defines the following functions.

urlopen

Purpose: open the URL url, which can be either a string or a Request object.
urllib2.urlopen(url[, data[, timeout[, cafile[, capath[, cadefault[, context]]]]])
urllib2 sends HTTP/1.1 requests with the header Connection: close included.

data
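data may be a string specifying additional data to send to the server, or None if no such data is needed.
Currently HTTP requests are the only ones that use data; when data is provided, the request is sent as a POST rather than a GET. data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or a sequence of 2-tuples and returns a string in this format.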
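
As a minimal sketch of the encoding step described above, the snippet below turns a sequence of 2-tuples into an application/x-www-form-urlencoded body and attaches it to a request. The URL and field names are made up for illustration, and the try/except import is an assumption added so the snippet also runs on Python 3, where urllib2 was split into urllib.request and urllib.parse.

```python
try:  # Python 2
    from urllib import urlencode
    import urllib2
except ImportError:  # Python 3 equivalents (assumption for portability)
    from urllib.parse import urlencode
    import urllib.request as urllib2

# Hypothetical form fields; a list of 2-tuples keeps the field order stable.
form = [("q", "urllib2"), ("page", "1")]
body = urlencode(form)
if not isinstance(body, bytes):
    body = body.encode("ascii")  # Python 3 expects the POST body as bytes

# No network traffic happens here; the request is only constructed.
req = urllib2.Request("http://example.com/search", body)
method = req.get_method()  # the presence of data switches the method to POST
```

Passing `req` (or the URL together with `body`) to urlopen would then send the POST.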

timeout
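Optional. Specifies a timeout in seconds for blocking operations such as the connection attempt; if omitted, the global default timeout setting is used. It only takes effect for HTTP, HTTPS and FTP connections.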

context
If context is specified, it must be an ssl.SSLContext instance describing the various SSL options. See HTTPSConnection for more details.

cafile and capath
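The cafile and capath parameters are optional and specify a set of trusted CA certificates for HTTPS requests. cafile should point to a single file containing a bundle of CA certificates, whereas capath should point to a directory of hashed certificate files. More information can be found in ssl.SSLContext.load_verify_locations().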
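
To make the context parameter more concrete, here is a small sketch that builds an ssl.SSLContext with the standard library helper ssl.create_default_context() (available from Python 2.7.9). The certificate path and URL in the comments are hypothetical, and nothing is actually fetched.

```python
import ssl

# Build a context that trusts the system's default CA certificates.
context = ssl.create_default_context()

# To trust a private CA instead, cafile= or capath= could be passed, e.g.
# ssl.create_default_context(cafile="/path/to/ca-bundle.pem")  # hypothetical path

# The default context verifies the server certificate and its host name.
verifies_peer = context.verify_mode == ssl.CERT_REQUIRED
checks_hostname = context.check_hostname

# The context would then be handed to urlopen, for example:
# urllib2.urlopen("https://example.com/", context=context)
```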

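This function returns a file-like object with three additional methods:
geturl(): returns the URL of the resource retrieved, commonly used to determine whether a redirect was followed.

info(): returns the meta-information of the page, such as headers.

getcode(): returns the HTTP status code of the response.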
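
The three methods can be tried without touching the internet. The sketch below starts a throwaway HTTP server on the loopback interface and reads from it. The handler class, port selection and response body are all assumptions made for the demonstration, as is the Python 2/3 import fallback. An opener with an empty ProxyHandler is used so that proxy environment variables cannot reroute the local request; urlopen returns the same kind of response object.

```python
import threading

try:  # Python 2
    import urllib2
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:  # Python 3 equivalents (assumption for portability)
    import urllib.request as urllib2
    from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a fixed body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()

url = "http://127.0.0.1:%d/" % server.server_address[1]
opener = urllib2.build_opener(urllib2.ProxyHandler({}))  # ignore proxy settings
response = opener.open(url, timeout=5)
try:
    final_url = response.geturl()                       # URL actually retrieved
    status = response.getcode()                         # HTTP status code
    content_type = response.info().get("Content-Type")  # header lookup
    payload = response.read()
finally:
    response.close()
    server.shutdown()
    server.server_close()
```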

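In addition, if proxy settings are detected (for example, when a *_proxy environment variable such as http_proxy is set), ProxyHandler is installed by default and makes sure the requests are handled through the proxy.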

urllib2.install_opener(opener)

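Install an OpenerDirector instance as the default global opener. Installing an opener is only necessary if you want urlopen to use that opener; otherwise, simply call OpenerDirector.open() instead of urlopen(). The code does not check for a real OpenerDirector, and any class with the appropriate interface will work.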
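
A short sketch of installing an opener follows. The User-Agent value is invented for illustration, and the import fallback is an assumption so the snippet also runs on Python 3. No request is sent.

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3 equivalent (assumption for portability)
    import urllib.request as urllib2

opener = urllib2.build_opener()
# addheaders lists default headers sent with every request made by this opener.
opener.addheaders = [("User-Agent", "example-client/0.1")]  # hypothetical value

# After this call, urllib2.urlopen(...) is routed through `opener`.
urllib2.install_opener(opener)
```

If only a few requests need the custom behaviour, calling `opener.open(url)` directly avoids changing global state.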

urllib2.build_opener([handler, …])

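Return an OpenerDirector instance, which chains the handlers in the order given. Each handler can be either an instance of BaseHandler or a subclass of BaseHandler (in which case it must be possible to call the constructor without any parameters). Instances of the following classes will be placed in front of the handlers, unless the handlers contain them, instances of them or subclasses of them: ProxyHandler (if proxy settings are detected), UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, HTTPErrorProcessor.
If the Python installation has SSL support (that is, if the ssl module can be imported), HTTPSHandler will also be added.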
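
To illustrate how a handler plugs into the chain, the sketch below passes a BaseHandler subclass (the class itself, not an instance) to build_opener. The "echo" scheme and the EchoHandler class are invented for this example, and returning a plain io.BytesIO is a simplification: real handlers return a richer response object that also offers geturl(), info() and getcode().

```python
import io

try:  # Python 2
    import urllib2
except ImportError:  # Python 3 equivalent (assumption for portability)
    import urllib.request as urllib2


class EchoHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up "echo" URL scheme."""

    def echo_open(self, req):
        # The opener dispatches to <scheme>_open for URLs with that scheme.
        text = req.get_full_url()
        return io.BytesIO(text.encode("ascii"))


# build_opener instantiates the class and chains it with the default handlers.
opener = urllib2.build_opener(EchoHandler)
reply = opener.open("echo://hello/world").read()
```

Because the scheme is handled entirely in-process, no network access is involved.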

class urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])

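This class is an abstraction of a URL request. url should be a string containing a valid URL. data may be a string specifying additional data to send to the server. Currently HTTP requests are the only ones that use data; if the data parameter is provided, the HTTP request will be a POST instead of a GET.
headers should be a dictionary, and is treated as if add_header() were called with each key and value as arguments. It is often used to "spoof" the User-Agent header, which a browser uses to identify itself, because some HTTP servers only allow requests coming from common browsers as opposed to scripts. For example, Mozilla Firefox may identify itself as "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", while urllib2's default user agent string is "Python-urllib/2.6" (on Python 2.6).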
The final two arguments are only of interest for correct handling of third-party HTTP cookies:
origin_req_host should be the request-host of the origin transaction, as defined by RFC 2965. It defaults to cookielib.request_host(self). This is the host name or IP address of the original request that was initiated by the user. For example, if the request is for an image in an HTML document, this should be the request-host of the request for the page containing the image.

unverifiable should indicate whether the request is unverifiable, as defined by RFC 2965. It defaults to False. An unverifiable request is one whose URL the user did not have the option to approve. For example, if the request is for an image in an HTML document, and the user had no option to approve the automatic fetching of the image, this should be true.
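
The headers argument can be sketched as follows, using the Firefox string quoted above. The URL is a placeholder and the import fallback is an assumption for Python 3. One detail worth knowing: CPython stores each header name via str.capitalize(), so "User-Agent" is kept as "User-agent" and has to be looked up under that spelling.

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3 equivalent (assumption for portability)
    import urllib.request as urllib2

firefox_ua = "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"
req = urllib2.Request("http://example.com/", headers={"User-Agent": firefox_ua})

# Header names are normalised with str.capitalize() when they are stored.
stored_names = [name for name, _ in req.header_items()]
ua = req.get_header("User-agent")
```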

Request Object

Request.add_data(data)
Set the Request data to data. This is ignored by all handlers except HTTP handlers, where it should be a byte string and will change the request to be POST rather than GET.

Request.get_method()
Return a string indicating the HTTP request method. This is only meaningful for HTTP requests, and currently always returns 'GET' or 'POST'.
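
The effect of attaching data on the request method can be sketched like this. The URL and payload are placeholders. Note the assumption for portability: add_data() exists in urllib2 but was removed in Python 3.4, where assigning to the data attribute plays the same role.

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3 equivalent (assumption for portability)
    import urllib.request as urllib2

req = urllib2.Request("http://example.com/form")
before = req.get_method()  # no data yet

if hasattr(req, "add_data"):   # urllib2 (Python 2)
    req.add_data(b"name=value")
else:                          # Python 3.4+ dropped add_data()
    req.data = b"name=value"

after = req.get_method()  # data is present now
```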

Request.has_data()

Return whether the instance has a non-None data.

Request.get_data()

Return the instance’s data.

Request.add_header(key, val)

Add another header to the request. Headers are currently ignored by all handlers except HTTP handlers, where they are added to the list of headers sent to the server. Note that there cannot be more than one header with the same name, and later calls will overwrite previous calls in case the key collides. Currently, this is no loss of HTTP functionality, since all headers which have meaning when used more than once have a (header-specific) way of gaining the same functionality using only one header.

Request.add_unredirected_header(key, header)
Add a header that will not be added to a redirected request.

Request.has_header(header)
Return whether the instance has the named header (checks both regular and unredirected).
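
A brief sketch of the three header methods above, with placeholder header values and an import fallback assumed for Python 3:

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3 equivalent (assumption for portability)
    import urllib.request as urllib2

req = urllib2.Request("http://example.com/")

req.add_header("Accept", "text/html")
req.add_header("Accept", "application/json")  # same name: replaces the first value

# This header is sent with the original request but not with a redirected one.
req.add_unredirected_header("Authorization", "Basic dXNlcjpwYXNz")  # dummy credentials

accept = req.get_header("Accept")
has_accept = req.has_header("Accept")
has_auth = req.has_header("Authorization")  # unredirected headers are checked too
```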

Request.get_full_url()
Return the URL given in the constructor.

Request.get_type()
Return the type of the URL — also known as the scheme.

Request.get_host()
Return the host to which a connection will be made.

Request.get_selector()
Return the selector — the part of the URL that is sent to the server.
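
The URL accessors can be compared side by side in a small sketch. The URL is a placeholder and url_parts is a hypothetical helper. It exists only because get_type(), get_host() and get_selector() were removed in Python 3.4 in favour of the type, host and selector attributes, which is an assumption added here for portability.

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3 equivalent (assumption for portability)
    import urllib.request as urllib2


def url_parts(req):
    """Hypothetical helper returning (scheme, host, selector) for a Request."""
    if hasattr(req, "get_type"):  # urllib2 (Python 2)
        return req.get_type(), req.get_host(), req.get_selector()
    return req.type, req.host, req.selector  # Python 3.4+


req = urllib2.Request("http://example.com:8080/docs/index.html?lang=en")
full_url = req.get_full_url()
scheme, host, selector = url_parts(req)
```

The host keeps its port, and the selector includes the query string.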

Request.get_header(header_name, default=None)
Return the value of the given header. If the header is not present, return the default value.

Request.header_items()
Return a list of tuples (header_name, header_value) of the Request headers.
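
A minimal sketch of the default argument and of header_items(), with made-up header names. In CPython the returned list combines regular and unredirected headers.

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3 equivalent (assumption for portability)
    import urllib.request as urllib2

req = urllib2.Request("http://example.com/", headers={"Accept": "text/html"})
req.add_unredirected_header("Cookie", "session=abc")  # dummy cookie value

present = req.get_header("Accept")
missing = req.get_header("X-missing", "fallback")  # header absent: default returned

# Sorted only to make the order predictable for display.
items = sorted(req.header_items())
```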

Request.set_proxy(host, type)
Prepare the request by connecting to a proxy server. The host and type will replace those of the instance, and the instance’s selector will be the original URL given in the constructor.
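
What set_proxy changes can be seen in the sketch below. The proxy address is hypothetical, nothing is connected, and the attribute fallback is an assumption for Python 3.4+, where get_host() and get_selector() no longer exist.

```python
try:  # Python 2
    import urllib2
except ImportError:  # Python 3 equivalent (assumption for portability)
    import urllib.request as urllib2

original = "http://example.com/index.html"
req = urllib2.Request(original)
req.set_proxy("proxy.example.net:3128", "http")  # hypothetical proxy address

if hasattr(req, "get_host"):   # urllib2 (Python 2)
    host, selector = req.get_host(), req.get_selector()
else:                          # Python 3.4+
    host, selector = req.host, req.selector

# The connection target is now the proxy, and the selector sent to it
# is the complete original URL.
full_url = req.get_full_url()
```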

Request.get_origin_req_host()
Return the request-host of the origin transaction, as defined by RFC 2965. See the documentation for the Request constructor.

Request.is_unverifiable()
Return whether the request is unverifiable, as defined by RFC 2965. See the documentation for the Request constructor.
