quote()
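quote() is a function in the urllib.parse module that percent-encodes (URL-encodes) a string. URL encoding converts non-ASCII characters and certain reserved characters into an escaped form so that they can be transmitted and processed safely inside a URL.
Its signature is:
quote(string, safe='/', encoding=None, errors=None)
Parameters:
- string: the string (str or bytes) to encode.
- safe: additional ASCII characters that should not be encoded. The default is '/'. Pass all such characters together in one string, for example '/:'.
- encoding: the encoding used to turn non-ASCII characters into bytes before they are escaped. The default None means 'utf-8'.
- errors: how encoding errors are handled. The default None means 'strict'.
The return value is the percent-encoded string. Letters, digits, and the characters _.-~ are never encoded.
Example code: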
from urllib.parse import quote
string = "中文"
encoded_string = quote(string)
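print(encoded_string) # Output: %E4%B8%AD%E6%96%87
In this example, quote() encodes the string "中文" as "%E4%B8%AD%E6%96%87". Non-ASCII characters and reserved characters are replaced by "%XX" sequences, where XX is the hexadecimal value of one byte of the character's encoded form (UTF-8 by default), not an ASCII code. Each of the two Chinese characters occupies three bytes in UTF-8, so each becomes three "%XX" sequences.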
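To make the "%XX" rule concrete, the following sketch compares the output of quote() with the raw UTF-8 bytes of the same text and then reverses the encoding with unquote(). The helper name percent_bytes is hypothetical and exists only for this illustration.

```python
from urllib.parse import quote, unquote

def percent_bytes(text, encoding="utf-8"):
    """Hypothetical helper: write every byte of the encoded text as %XX."""
    return "".join(f"%{byte:02X}" for byte in text.encode(encoding))

text = "中文"
encoded = quote(text)

print(text.encode("utf-8"))   # b'\xe4\xb8\xad\xe6\x96\x87'
print(encoded)                # %E4%B8%AD%E6%96%87
print(percent_bytes(text))    # %E4%B8%AD%E6%96%87, same as quote() for this input

# unquote() reverses the encoding (UTF-8 is assumed by default).
decoded = unquote(encoded)
print(decoded == text)        # True

# A different encoding yields different bytes, so both sides must agree on it.
gbk_encoded = quote(text, encoding="gbk")
gbk_decoded = unquote(gbk_encoded, encoding="gbk")
print(gbk_decoded == text)    # True
```

The helper only matches quote() here because every byte of this input needs escaping; for ASCII letters and digits quote() leaves the characters as they are.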
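The safe parameter lists characters that should be left unencoded. Because its default is '/', slashes are preserved unless you override it. The calls below show how different safe values change the result:
Example code: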
from urllib.parse import quote
url_string = "http://example.com/path/to/file?name=张三"
# Default (safe='/'): slashes are kept, but ':', '?' and '=' are encoded
encoded_string = quote(url_string)
print(encoded_string) # Output: http%3A//example.com/path/to/file%3Fname%3D%E5%BC%A0%E4%B8%89
# Passing safe='/' explicitly gives the same result as the default
encoded_string = quote(url_string, safe='/')
print(encoded_string) # Output: http%3A//example.com/path/to/file%3Fname%3D%E5%BC%A0%E4%B8%89
# safe=':' replaces the default, so '/' is now encoded
encoded_string = quote(url_string, safe=':')
print(encoded_string) # Output: http:%2F%2Fexample.com%2Fpath%2Fto%2Ffile%3Fname%3D%E5%BC%A0%E4%B8%89
# Several safe characters are passed together in one string
encoded_string = quote(url_string, safe='/:')
print(encoded_string) # Output: http://example.com/path/to/file%3Fname%3D%E5%BC%A0%E4%B8%89
encoded_string = quote(url_string, safe='/:?')
print(encoded_string) # Output: http://example.com/path/to/file?name%3D%E5%BC%A0%E4%B8%89
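The results above suggest that quote() is easier to use on individual URL components than on a whole URL, since a single safe set cannot tell the delimiters apart from the data. As a minimal sketch under that assumption, the snippet below encodes one path segment and one query value separately. The host example.com and the sample values are placeholders.

```python
from urllib.parse import quote, quote_plus

segment = "reports/2024 Q1"   # one path segment that itself contains '/'
value = "张三 & 李四"          # a query value containing spaces and '&'

# safe='' also encodes '/', so the segment cannot be mistaken for two segments.
encoded_segment = quote(segment, safe="")
print(encoded_segment)        # reports%2F2024%20Q1

# quote_plus() is meant for query values: spaces become '+', and '/' is encoded.
encoded_value = quote_plus(value)
print(encoded_value)          # %E5%BC%A0%E4%B8%89+%26+%E6%9D%8E%E5%9B%9B

url = f"http://example.com/files/{encoded_segment}?name={encoded_value}"
print(url)
```

Encoding each piece before assembling the URL keeps the real delimiters ('/', '?', '=') unescaped while any such characters inside the data are escaped.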
urlencode()
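urlencode() is a function in the urllib.parse module that converts a dict, or a sequence of two-element tuples, into a percent-encoded query string.
Example code: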
from urllib.parse import urlencode
data = {'name': '张三', 'age': 25, 'city': '北京'}
encoded_data = urlencode(data)
print(encoded_data) # name=%E5%BC%A0%E4%B8%89&age=25&city=%E5%8C%97%E4%BA%AC
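Output: name=%E5%BC%A0%E4%B8%89&age=25&city=%E5%8C%97%E4%BA%AC
urlencode() converts each key/value pair to key=value form, percent-encodes both sides, and joins the pairs with &. Non-string values such as 25 are converted with str(), and Chinese characters are escaped, so '张三' becomes %E5%BC%A0%E4%B8%89.
The resulting string can be used as a URL query string or as the parameters of a POST request.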
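A few variations are worth seeing next to the basic call. The sketch below uses a list of tuples, the quote_via and doseq parameters of urlencode(), and parse_qs() to decode the result. The sample keys and values are made up for illustration.

```python
from urllib.parse import urlencode, quote, parse_qs

# A list of tuples preserves order and allows repeated keys.
pairs = [("name", "张三"), ("tag", "a b")]

# By default spaces become '+' (quote_plus is used internally).
plus_form = urlencode(pairs)
print(plus_form)      # name=%E5%BC%A0%E4%B8%89&tag=a+b

# quote_via=quote produces %20 for spaces instead.
percent_form = urlencode(pairs, quote_via=quote)
print(percent_form)   # name=%E5%BC%A0%E4%B8%89&tag=a%20b

# doseq=True expands a sequence value into one key=value pair per element.
multi = urlencode({"city": ["北京", "上海"]}, doseq=True)
print(multi)          # city=%E5%8C%97%E4%BA%AC&city=%E4%B8%8A%E6%B5%B7

# parse_qs() reverses the encoding and returns a dict of lists.
decoded = parse_qs(multi)
print(decoded)        # {'city': ['北京', '上海']}
```

Without doseq=True the list would be converted with str() and encoded as a single value, which is rarely what a server expects.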
urlparse()
urlparse() splits a URL into six components and returns them as a named tuple (scheme, netloc, path, params, query, fragment). The components can be put back together with urlunparse(). The function is defined in urllib.parse as follows:
def urlparse(url, scheme='', allow_fragments=True):
    """Parse a URL into 6 components:
    <scheme>://<netloc>/<path>;<params>?<query>#<fragment>
    Return a 6-tuple: (scheme, netloc, path, params, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes."""
    url, scheme, _coerce_result = _coerce_args(url, scheme)
    splitresult = urlsplit(url, scheme, allow_fragments)
    scheme, netloc, url, query, fragment = splitresult
    if scheme in uses_params and ';' in url:
        url, params = _splitparams(url)
    else:
        params = ''
    result = ParseResult(scheme, netloc, url, params, query, fragment)
    return _coerce_result(result)
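Note: the tuple returned by urlparse() can be used to determine the network protocol (HTTP, FTP, and so on), the server address, the file path, and the other parts of the URL.
Example code: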
from urllib.parse import urlparse
url = urlparse('http://www.baidu.com/index.php?username=dgw')
print(url)
print(url.netloc)
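For reference, the sketch below shows what the example prints and how the fields can be read by name or by index. The second URL is made up to exercise every component, along with the derived attributes username, hostname, and port.

```python
from urllib.parse import urlparse

result = urlparse('http://www.baidu.com/index.php?username=dgw')
print(result)
# Printed on one line:
# ParseResult(scheme='http', netloc='www.baidu.com', path='/index.php',
#             params='', query='username=dgw', fragment='')

# The result is a named tuple, so fields work by name or by index.
print(result.scheme, result[1], result.path)     # http www.baidu.com /index.php

# A made-up URL that fills in every component.
full = urlparse('http://user@example.com:8080/a/b;type=x?k=v#top')
print(full.params, full.query, full.fragment)    # type=x k=v top
print(full.username, full.hostname, full.port)   # user example.com 8080
```

Note that port is returned as an int, while the other attributes are strings.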
urlunparse()
urlunparse() takes a six-item tuple (scheme, netloc, path, params, query, fragment) and assembles it into a correctly formatted URL.
def urlunparse(components):
    """Put a parsed URL back together again.  This may result in a
    slightly different, but equivalent URL, if the URL that was parsed
    originally had redundant delimiters, e.g. a ? with an empty query
    (the draft states that these are equivalent)."""
    scheme, netloc, url, params, query, fragment, _coerce_result = (
                                                  _coerce_args(*components))
    if params:
        url = "%s;%s" % (url, params)
    return _coerce_result(urlunsplit((scheme, netloc, url, query, fragment)))
Example code:
from urllib.parse import urlparse, urlunparse
url = urlparse('http://www.baidu.com/index.php?username=dgw')
print(url)
url_join1 = urlunparse(url)
print(url_join1)
url_tuple = ("http", "www.baidu.com", "index.php", "", "username=dgw", "")
url_join2 = urlunparse(url_tuple)
print(url_join2)
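Both calls above produce the same URL. The sketch below records those outputs and adds one common pattern: changing a single component with _replace(), a standard namedtuple method, before reassembling the URL. The extra page parameter is made up for illustration.

```python
from urllib.parse import urlparse, urlunparse, urlencode

parts = urlparse('http://www.baidu.com/index.php?username=dgw')

# Round trip: parsing and unparsing gives back the original URL here.
round_trip = urlunparse(parts)
print(round_trip)    # http://www.baidu.com/index.php?username=dgw

# A plain 6-tuple works too; the missing leading '/' of the path is added.
url_tuple = ("http", "www.baidu.com", "index.php", "", "username=dgw", "")
from_tuple = urlunparse(url_tuple)
print(from_tuple)    # http://www.baidu.com/index.php?username=dgw

# _replace() returns a copy of the named tuple with some fields changed.
new_query = urlencode({"username": "dgw", "page": 2})
updated = urlunparse(parts._replace(scheme="https", query=new_query))
print(updated)       # https://www.baidu.com/index.php?username=dgw&page=2
```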
urlsplit()
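urlsplit() is mainly used to analyze a URL string. It returns a five-item named tuple (scheme, netloc, path, query, fragment). It is similar to urlparse(), except that it does not split the params (the part after ';') out of the path.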
def urlsplit(url, scheme='', allow_fragments=True):
    """Parse a URL into 5 components:
    <scheme>://<netloc>/<path>?<query>#<fragment>
    Return a 5-tuple: (scheme, netloc, path, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes."""
    url, scheme, _coerce_result = _coerce_args(url, scheme)
    allow_fragments = bool(allow_fragments)
    key = url, scheme, allow_fragments, type(url), type(scheme)
    cached = _parse_cache.get(key, None)
    ......
Example code:
from urllib.parse import urlparse, urlsplit
url = urlparse('http://www.baidu.com/index.php?username=dgw')
print(url)
url2 = urlsplit('http://www.baidu.com/index.php?username=dgw')
print(url2)
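For the URL in the example the two functions differ only in the number of fields, because it has no params. The difference becomes visible with a URL that contains a ';' in its last path segment, as in this sketch (the URL is made up for illustration):

```python
from urllib.parse import urlparse, urlsplit

url = 'http://example.com/a/b;type=x?k=v#top'   # made-up URL with params

parsed = urlparse(url)
split = urlsplit(url)

print(len(parsed), len(split))       # 6 5
print(parsed.path, parsed.params)    # /a/b type=x
print(split.path)                    # /a/b;type=x
print(hasattr(split, "params"))      # False
```

urlsplit() leaves the ';type=x' part inside path, so it is the safer choice when the path may legitimately contain semicolons.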
urlunsplit()
urlunsplit() combines a five-item tuple (scheme, netloc, path, query, fragment), such as the one returned by urlsplit(), into a complete URL string.
def urlunsplit(components):
    """Combine the elements of a tuple as returned by urlsplit() into a
    complete URL as a string. The data argument can be any five-item iterable.
    This may result in a slightly different, but equivalent URL, if the URL that
    was parsed originally had unnecessary delimiters (for example, a ? with an
    empty query; the RFC states that these are equivalent)."""
    scheme, netloc, url, query, fragment, _coerce_result = (
                                          _coerce_args(*components))
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
        if url and url[:1] != '/': url = '/' + url
    ......
Example code:
from urllib.parse import urlparse, urlsplit, urlunsplit
url = urlparse('http://www.baidu.com/index.php?username=dgw')
print(url)
url2 = urlsplit('http://www.baidu.com/index.php?username=dgw')
print(url2)
url3 = urlunsplit(url2)
print(url3)
url_tuple = ("http", "www.baidu.com", "index.php", "username=dgw", "")
url4 = urlunsplit(url_tuple)
print(url4)
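The sketch below records what the urlunsplit() calls return and illustrates the remark in the docstring about redundant delimiters. The URL with the trailing '?' is made up for illustration, and the outputs assume the functions are called with their default arguments.

```python
from urllib.parse import urlsplit, urlunsplit

# Round trip of the URL used above.
parts = urlsplit('http://www.baidu.com/index.php?username=dgw')
round_trip = urlunsplit(parts)
print(round_trip)    # http://www.baidu.com/index.php?username=dgw

# Any five-item iterable is accepted; a leading '/' is added to the path
# because a netloc is present.
from_list = urlunsplit(["http", "www.baidu.com", "index.php", "username=dgw", ""])
print(from_list)     # http://www.baidu.com/index.php?username=dgw

# A redundant delimiter (a '?' with an empty query) is dropped, so the result
# is equivalent to the input but not identical to it.
original = 'http://example.com/path?'
rebuilt = urlunsplit(urlsplit(original))
print(rebuilt)       # http://example.com/path
```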
urljoin()
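urljoin() combines a base URL with a possibly relative URL to form the absolute URL of the latter.
Note: if the base URL does not end with /, the last path segment of the base URL is replaced by the relative path.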
def urljoin(base, url, allow_fragments=True):
    """Join a base URL and a possibly relative URL to form an absolute
    interpretation of the latter."""
    if not base:
        return url
    if not url:
        return base
    base, url, _coerce_result = _coerce_args(base, url)
    ......
Example code:
from urllib.parse import urljoin
url = urljoin('http://www.baidu.com/test/', 'index.php?username=dgw')
print(url)
url2 = urljoin('http://www.baidu.com/test', 'index.php?username=dgw')
print(url2)
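The trailing-slash rule is easiest to see side by side. The sketch below repeats the two calls above and adds a few other kinds of relative reference; the extra inputs (including example.com) are made up for illustration.

```python
from urllib.parse import urljoin

base_dir = 'http://www.baidu.com/test/'    # ends with '/'
base_file = 'http://www.baidu.com/test'    # no trailing '/'

cases = [
    (base_dir, 'index.php?username=dgw'),   # appended below /test/
    (base_file, 'index.php?username=dgw'),  # replaces the last segment 'test'
    (base_dir, '../index.php'),             # '..' climbs one directory
    (base_dir, '/index.php'),               # absolute path replaces the whole path
    (base_dir, '//example.com/index.php'),  # keeps only the scheme of the base
    (base_dir, 'https://example.com/index.php'),  # absolute URL wins entirely
]

results = [urljoin(base, relative) for base, relative in cases]
for joined in results:
    print(joined)
# http://www.baidu.com/test/index.php?username=dgw
# http://www.baidu.com/index.php?username=dgw
# http://www.baidu.com/index.php
# http://www.baidu.com/index.php
# http://example.com/index.php
# https://example.com/index.php
```

Because an absolute second argument replaces the base entirely, urljoin() should not be relied on to keep untrusted input on the original host.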