Python 3 URL parsing with urllib.parse

Python urllib.parse

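The urllib package in the Python 3 standard library handles URL requests over various protocols. Its parse module is the part that parses URLs, and it mainly provides the following names: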

__all__ = ["urlparse", "urlunparse", "urljoin", "urldefrag",
           "urlsplit", "urlunsplit", "urlencode", "parse_qs",
           "parse_qsl", "quote", "quote_plus", "quote_from_bytes",
           "unquote", "unquote_plus", "unquote_to_bytes",
           "DefragResult", "ParseResult", "SplitResult",
           "DefragResultBytes", "ParseResultBytes", "SplitResultBytes"]
urlparse

Splits a complete URL string into six components. For example, https://www.baidu.com/topic/ breaks down as follows:

urlparse -> (scheme, netloc, path, params, query, fragment)
  • scheme = "https"
  • netloc = "www.baidu.com"
  • path = "/topic/"
  • params = ""
  • query = ""
  • fragment = ""

The result behaves like a tuple, and each component can also be read as an attribute:

from urllib.parse import urlparse

url = "https://www.baidu.com/topic/"  # the URL broken down above
proxy_result = urlparse(url)
query = proxy_result.query
netloc = proxy_result.netloc
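To see all six components populated at once, here is a minimal sketch. The URL is a made-up example (www.example.com and its path, parameters, and query are assumptions chosen for illustration, not taken from the original post).

```python
from urllib.parse import urlparse

# Hypothetical URL chosen so that every component is non-empty.
url = "https://www.example.com/topic/list;type=a?page=2&sort=asc#comments"
result = urlparse(url)

print(result.scheme)    # https
print(result.netloc)    # www.example.com
print(result.path)      # /topic/list
print(result.params)    # type=a
print(result.query)     # page=2&sort=asc
print(result.fragment)  # comments

# The result also unpacks like an ordinary 6-tuple.
scheme, netloc, path, params, query, fragment = result
```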
urlsplit

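Similar to urlparse, except that it returns a 5-tuple: compared with urlparse, it does not return params. The result supports the same attribute access, and it is used in the same way as urlparse.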

'''
urlsplit -> (scheme, netloc, path, query, fragment)
'''
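The difference between the two functions only shows up when the path carries ;parameters. A small sketch, using a hypothetical URL on www.example.com:

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL whose last path segment carries ";type=a" parameters.
url = "https://www.example.com/topic/list;type=a?page=2#comments"

split_result = urlsplit(url)
parse_result = urlparse(url)

# urlsplit leaves the parameters inside the path.
print(split_result.path)    # /topic/list;type=a

# urlparse moves them into a separate params component.
print(parse_result.path)    # /topic/list
print(parse_result.params)  # type=a

print(len(split_result), len(parse_result))  # 5 6
```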
urlunparse

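The inverse of urlparse: it reassembles URL components into a complete URL. The argument is a tuple holding the six URL components; components that do not exist are left empty. It pairs naturally with urlparse.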

'''
urlunparse(components) -> Put a parsed URL back together again.
'''
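A minimal sketch of both directions follows: building a URL from a hand-written 6-tuple, and editing one component of a parsed URL before putting it back together. The host and values are hypothetical.

```python
from urllib.parse import urlparse, urlunparse

# (scheme, netloc, path, params, query, fragment); absent parts are "".
components = ("https", "www.example.com", "/topic/", "", "page=2", "top")
url = urlunparse(components)
print(url)  # https://www.example.com/topic/?page=2#top

# Round trip: parse, change one field, and reassemble.
parts = urlparse(url)
rebuilt = urlunparse(parts._replace(query="page=3"))
print(rebuilt)  # https://www.example.com/topic/?page=3#top
```

Using _replace works because the parse result is a named tuple; it returns a new tuple and leaves parts unchanged.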
urlunsplit

The inverse of urlsplit; the two can be used together in the same way.

'''
urlunsplit -> Combine the elements of a tuple as returned by urlsplit() into a complete URL as a string.
'''
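As a short illustration (the URL and the variable names are assumptions), urlunsplit accepts either the result of urlsplit or a plain 5-tuple:

```python
from urllib.parse import urlsplit, urlunsplit

url = "https://www.example.com/topic/?page=2#top"  # hypothetical URL
parts = urlsplit(url)

# Splitting and recombining gives back the same string here.
same = urlunsplit(parts)
print(same)  # https://www.example.com/topic/?page=2#top

# Drop the fragment before recombining.
no_fragment = urlunsplit(parts._replace(fragment=""))
print(no_fragment)  # https://www.example.com/topic/?page=2

# A hand-written (scheme, netloc, path, query, fragment) tuple also works.
built = urlunsplit(("https", "www.example.com", "/search", "q=python", ""))
print(built)  # https://www.example.com/search?q=python
```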
urljoin

Joins a base URL with a path, which may be relative. When path is itself a complete URL, the result is path.

'''
urljoin -> Join a base URL and a possibly relative URL to form an absolute interpretation of the latter.
'''
urljoin(base, path)
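The cases below sketch how the second argument is interpreted relative to the base. The base URL and file names are hypothetical.

```python
from urllib.parse import urljoin

base = "https://www.example.com/topic/index.html"  # hypothetical base URL

# A relative name replaces the last segment of the base path.
relative = urljoin(base, "detail.html")
print(relative)  # https://www.example.com/topic/detail.html

# A path starting with "/" replaces the whole base path.
rooted = urljoin(base, "/about")
print(rooted)  # https://www.example.com/about

# ".." climbs one directory level.
parent = urljoin(base, "../img/a.png")
print(parent)  # https://www.example.com/img/a.png

# A complete URL wins over the base.
absolute = urljoin(base, "https://docs.python.org/3/")
print(absolute)  # https://docs.python.org/3/
```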
urldefrag

Splits a URL into two parts, the URL and the fragment. The fragment is whatever follows # in the URL.

urldefrag -> Removes any existing fragment from URL. Returns a tuple of the defragmented URL and the fragment.
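A brief sketch with a hypothetical URL. The result is a named tuple, so it can be read by attribute or unpacked:

```python
from urllib.parse import urldefrag

result = urldefrag("https://www.example.com/topic/?page=2#comments")
print(result.url)       # https://www.example.com/topic/?page=2
print(result.fragment)  # comments

# Without a fragment, the URL comes back unchanged and the fragment is "".
url, fragment = urldefrag("https://www.example.com/topic/")
print(repr(url), repr(fragment))
```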
unquote_to_bytes

Decodes the percent-escapes in a URL and returns bytes. If the argument is a str, it is first encoded to bytes as UTF-8.

unquote_to_bytes -> unquote_to_bytes('abc%20def') -> b'abc def'.
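Getting bytes back is useful when the encoding of the escaped data is not known in advance. A minimal sketch (the sample strings are assumptions):

```python
from urllib.parse import unquote_to_bytes

raw = unquote_to_bytes("abc%20def")
print(raw)  # b'abc def'

# Percent-escapes of a UTF-8 encoded string come back as the raw bytes;
# decoding them is left to the caller.
raw_cn = unquote_to_bytes("%E4%B8%AD%E6%96%87")
text = raw_cn.decode("utf-8")
print(text)  # 中文
```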
unquote

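Decodes the percent-escapes in the given string. The decoded bytes are interpreted as UTF-8 by default; the encoding parameter changes this.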

unquote -> unquote('abc%20def') -> 'abc def'.
unquote(string, encoding='utf-8', errors='replace')
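The encoding parameter matters when the escapes were produced from a non-UTF-8 encoding. The sketch below uses GBK as an assumed example encoding:

```python
from urllib.parse import unquote

plain = unquote("abc%20def")
print(plain)  # abc def

# Escapes produced from UTF-8 bytes decode with the default settings.
utf8_text = unquote("%E4%B8%AD%E6%96%87")
print(utf8_text)  # 中文

# The same text escaped from GBK bytes needs encoding="gbk".
gbk_text = unquote("%D6%D0%CE%C4", encoding="gbk")
print(gbk_text)  # 中文

# With the wrong encoding, errors="replace" (the default) substitutes
# U+FFFD replacement characters instead of raising an exception.
garbled = unquote("%D6%D0%CE%C4")
```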
parse_qs

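Parses a query string into a dictionary of the form {query1: [value1, value2, …]}. Each value is a list, because one parameter may appear with several values. By default, parameters with blank values are dropped; set the boolean parameter keep_blank_values to True to keep them.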

parse_qs -> Parse a query given as a string argument.
parse_qs(qs, keep_blank_values=False, strict_parsing=False,
             encoding='utf-8', errors='replace')
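A small sketch of the list values and of keep_blank_values, using a made-up query string:

```python
from urllib.parse import parse_qs

qs = "tag=python&tag=url&page=2&debug="  # hypothetical query string

# Repeated names are collected into one list; the blank "debug" is dropped.
default = parse_qs(qs)
print(default)  # {'tag': ['python', 'url'], 'page': ['2']}

# keep_blank_values=True keeps "debug" with an empty-string value.
with_blanks = parse_qs(qs, keep_blank_values=True)
print(with_blanks)  # {'tag': ['python', 'url'], 'page': ['2'], 'debug': ['']}
```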
parse_qsl

Parses a query string into a list of two-element tuples: [(query1, value1), …]

parse_qsl -> Parse a query given as a string argument.
parse_qsl(qs, keep_blank_values=False, strict_parsing=False,
              encoding='utf-8', errors='replace')
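Compared with parse_qs, the list form preserves order and repeated names. A minimal sketch with a made-up query string:

```python
from urllib.parse import parse_qsl

pairs = parse_qsl("tag=python&tag=url&page=2")  # hypothetical query string
print(pairs)  # [('tag', 'python'), ('tag', 'url'), ('page', '2')]

# Converting to a dict keeps only the last value of a repeated name.
as_dict = dict(pairs)
print(as_dict)  # {'tag': 'url', 'page': '2'}
```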
unquote_plus

Replaces each + with a space before applying unquote.

unquote_plus('%7e/abc+def') -> '~/abc def'
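The sketch below contrasts the two decoders on the same input. The last input string is an assumption added to show that an escaped plus sign survives.

```python
from urllib.parse import unquote, unquote_plus

s = "%7e/abc+def"

kept_plus = unquote(s)
print(kept_plus)   # ~/abc+def

as_space = unquote_plus(s)
print(as_space)    # ~/abc def

# A literal plus sign must arrive as %2B to survive unquote_plus.
literal_plus = unquote_plus("a%2Bb+c")
print(literal_plus)  # a+b c
```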
quote

Percent-encodes a string, using UTF-8 by default.

quote('abc def') -> 'abc%20def'
quote(string, safe='/', encoding=None, errors=None)
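The safe and encoding parameters in the signature above change which characters are escaped and how. A minimal sketch; the sample strings and the choice of GBK are assumptions:

```python
from urllib.parse import quote

basic = quote("abc def")
print(basic)  # abc%20def

# "/" is in the default safe set, so path separators are left alone.
path = quote("/topic/中文")
print(path)  # /topic/%E4%B8%AD%E6%96%87

# safe="" escapes "/" as well.
strict = quote("/topic/a b", safe="")
print(strict)  # %2Ftopic%2Fa%20b

# A different encoding yields different escapes for non-ASCII text.
gbk = quote("中文", encoding="gbk")
print(gbk)  # %D6%D0%CE%C4
```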
quote_plus

Works like quote, but spaces in the string are encoded as + instead of %20.

Like quote(), but also replace ' ' with '+', as required for quoting HTML form values.
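A short comparison on made-up inputs. Note that quote_plus also escapes "/" by default, because its default safe set is empty:

```python
from urllib.parse import quote, quote_plus

with_quote = quote("a b/c")
print(with_quote)  # a%20b/c

with_quote_plus = quote_plus("a b/c")
print(with_quote_plus)  # a+b%2Fc

# A literal plus sign is escaped so it is not confused with a space.
literal_plus = quote_plus("a+b")
print(literal_plus)  # a%2Bb
```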
quote_from_bytes

Percent-encodes a bytes or bytearray object and returns a str.

Like quote(), but accepts a bytes object rather than a str, and does
not perform string-to-bytes encoding.
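This is handy when the data is already encoded. A minimal sketch with assumed sample bytes:

```python
from urllib.parse import quote_from_bytes

# Bytes outside the unreserved ASCII set are escaped one by one.
escaped = quote_from_bytes(b"a&\xef")
print(escaped)  # a%26%EF

# Encoding the text yourself gives the same result as quote() would.
escaped_cn = quote_from_bytes("中文".encode("utf-8"))
print(escaped_cn)  # %E4%B8%AD%E6%96%87

# bytearray is accepted as well.
escaped_ba = quote_from_bytes(bytearray(b"abc def"))
print(escaped_ba)  # abc%20def
```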
urlencode

Converts a dictionary or a sequence of two-element tuples into a query string.

Encode a dict or sequence of two-element tuples into a URL query string.
urlencode(query, doseq=False, safe='', encoding=None, errors=None,
              quote_via=quote_plus)
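The sketch below walks through the doseq and quote_via parameters from the signature above. The keys and values are hypothetical.

```python
from urllib.parse import quote, urlencode

# Values are converted to str and escaped with quote_plus by default.
simple = urlencode({"q": "python url", "page": 2})
print(simple)  # q=python+url&page=2

# A sequence of pairs allows repeated names.
pairs = urlencode([("tag", "a"), ("tag", "b")])
print(pairs)  # tag=a&tag=b

# doseq=True expands a list value into one pair per item.
expanded = urlencode({"tag": ["a", "b"]}, doseq=True)
print(expanded)  # tag=a&tag=b

# Without doseq, the list is stringified as a whole and then escaped.
stringified = urlencode({"tag": ["a", "b"]})
print(stringified)  # tag=%5B%27a%27%2C+%27b%27%5D

# quote_via=quote encodes spaces as %20 instead of +.
percent_spaces = urlencode({"q": "python url"}, quote_via=quote)
print(percent_spaces)  # q=python%20url
```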
to_bytes
to_bytes(u"URL") --> 'URL'.
unwrap
unwrap('<URL:type://host/path>') --> 'type://host/path'.
splittype
splittype('type:opaquestring') --> 'type', 'opaquestring'.
splithost
splithost('//host[:port]/path') --> 'host[:port]', '/path'.
splituser
splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'.
splitpasswd
splitpasswd('user:passwd') -> 'user', 'passwd'.
splitport
splitport('host:port') --> 'host', 'port'.
splitnport
Split host and port, returning numeric port.
splitquery
splitquery('/path?query') --> '/path', 'query'.
splittag
splittag('/path#tag') --> '/path', 'tag'.
splitattr
splitattr('/path;attr1=value1;attr2=value2;...') ->
    '/path', ['attr1=value1', 'attr2=value2', ...].
splitvalue
splitvalue('attr=value') --> 'attr', 'value'.