Python Study Notes: parse in the urllib Library

weixin_34107739

Tags: python, operations

1 urllib.parse

The urllib package contains the following:

Package contents

error

parse

request

response

robotparser

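Of these, urllib.parse is mainly used to parse URLs (Uniform Resource Locators).

The urllib.parse module defines a standard interface for breaking a URL string into components such as the addressing scheme, network location, and path. It can also combine components back into a URL string and convert a relative URL into an absolute URL given a base URL.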
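The relative-to-absolute conversion described above can be sketched with urljoin, which is part of urllib.parse. The base URL and the relative paths here are hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical base URL, used only for illustration.
base = "https://www.example.com/docs/guide/index.html"

# A bare file name replaces the last path segment of the base.
sibling = urljoin(base, "intro.html")

# ".." climbs one directory before resolving the rest.
parent = urljoin(base, "../api/ref.html")

# A leading "/" resolves against the root of the site.
rooted = urljoin(base, "/about")

print(sibling)  # https://www.example.com/docs/guide/intro.html
print(parent)   # https://www.example.com/docs/api/ref.html
print(rooted)   # https://www.example.com/about
```

Note that the result depends on whether the base path ends in a file name or a directory, so it is worth checking the output for your own URLs.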

urllib.parse is designed to match the Internet RFC on relative Uniform Resource Locators. It supports the following URL schemes:

file, ftp, gopher, hdl, http, https, imap, mailto, mms, news, nntp, prospero, rsync, rtsp, rtspu, sftp, shttp, sip, sips, snews, svn, svn+ssh, telnet, wais, ws, wss.

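The functions provided by the urllib.parse module fall into two categories:

URL parsing: splitting a URL string into its components, or combining URL components into a URL string

URL quoting: making program data safe for use in a URL component by escaping special characters and encoding non-ASCII text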
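To make the second category concrete, here is a minimal sketch of quoting using quote, unquote, and urlencode from urllib.parse. The sample strings and the parameter names in the dictionary are illustrative assumptions.

```python
from urllib.parse import quote, unquote, urlencode

# Non-ASCII text is encoded as UTF-8 and then percent-escaped.
encoded = quote("算法")
print(encoded)  # %E7%AE%97%E6%B3%95

# unquote reverses the escaping.
decoded = unquote(encoded)
print(decoded)  # 算法

# Spaces and reserved characters such as "&" are escaped as well.
escaped = quote("a b&c")
print(escaped)  # a%20b%26c

# urlencode builds a whole query string from a mapping (hypothetical keys).
query = urlencode({"search": "算法", "pageIndex": 1})
print(query)  # search=%E7%AE%97%E6%B3%95&pageIndex=1
```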

2 URL parsing

2.1 urlparse

urlparse(url, scheme='', allow_fragments=True)
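The two optional parameters in this signature are easy to overlook, so here is a small sketch of what they do. The URLs are hypothetical. As far as I understand it, scheme is only a fallback used when the URL itself carries no scheme, and allow_fragments=False leaves the "#..." part attached to the preceding component instead of splitting it off.

```python
from urllib.parse import urlparse

# The URL has no scheme, so the default passed in is used.
with_default = urlparse("//www.example.com/index.html", scheme="https")
print(with_default.scheme)  # https
print(with_default.netloc)  # www.example.com

# A scheme already present in the URL takes precedence over the default.
explicit = urlparse("http://www.example.com/", scheme="https")
print(explicit.scheme)  # http

# With allow_fragments=False the fragment is not separated out.
no_frag = urlparse("https://www.example.com/index.html#top", allow_fragments=False)
print(no_frag.path)            # /index.html#top
print(repr(no_frag.fragment))  # ''
```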

It parses a URL into six parts:

Scheme (scheme)

Network location (netloc)

Path (path)

Path parameters (params)

Query (query)

Fragment (fragment)

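Note:

These six items are also the attributes (not methods) of the ParseResult object, ParseResult(scheme, netloc, path, params, query, fragment): a 6-tuple that contains the components of a parsed URL.

In the help() output, these six attributes are listed as data descriptors ("Data descriptors inherited from ParseResult:").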
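Because ParseResult is a named tuple, its fields can be read by name, by index, or by unpacking. The sketch below uses a hypothetical URL that fills in all six components, including the rarely seen params part after ";".

```python
from urllib.parse import urlparse

# Hypothetical URL that exercises all six components.
result = urlparse("https://www.example.com/a/b;v=1?x=1#frag")

# Access by attribute name and by position gives the same value.
print(result.scheme, result[0])  # https https

# The result behaves like an ordinary 6-tuple.
print(isinstance(result, tuple), len(result))  # True 6

# Unpack all six fields at once.
scheme, netloc, path, params, query, fragment = result
print(path, params, query, fragment)  # /a/b v=1 x=1 frag

# geturl() reassembles the components into a URL string.
rebuilt = result.geturl()
print(rebuilt)  # https://www.example.com/a/b;v=1?x=1#frag
```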

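from urllib import parse

# Split the URL into its six components
urlp = parse.urlparse('https://www.icourse163.org/search.htm?search=%E7%AE%97%E6%B3%95#type=10&orderBy=0&pageIndex=1')
print(urlp)          # the full ParseResult named tuple
print(urlp.scheme)   # the scheme component only
print(urlp.path)     # the path component only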

Output:

ParseResult(scheme='https', netloc='www.icourse163.org', path='/search.htm', params='', query='search=%E7%AE%97%E6%B3%95', fragment='type=10&orderBy=0&pageIndex=1')
https
/search.htm
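The query in this output is still percent-encoded. As a possible follow-up step, the sketch below decodes it with parse_qs and unquote, both from urllib.parse. Treating the fragment as a query-style string is an assumption based on how this particular URL happens to be written, not a general rule.

```python
from urllib.parse import urlparse, parse_qs, unquote

urlp = urlparse('https://www.icourse163.org/search.htm?search=%E7%AE%97%E6%B3%95#type=10&orderBy=0&pageIndex=1')

# parse_qs splits "key=value" pairs and percent-decodes them;
# each value is a list because a key may appear more than once.
query_args = parse_qs(urlp.query)
print(query_args)  # {'search': ['算法']}

# This fragment happens to use the same key=value&... layout (assumption).
fragment_args = parse_qs(urlp.fragment)
print(fragment_args)  # {'type': ['10'], 'orderBy': ['0'], 'pageIndex': ['1']}

# unquote decodes the raw string without splitting it.
raw_query = unquote(urlp.query)
print(raw_query)  # search=算法
```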