python3 urllib.parse.urlencode()_urlparse()_parse_qs()_quote()_unquote().py

学无止境慢慢来

Original link: https://blog.csdn.net/weixin_37989267/article/details/79432344

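This article looks at URL encoding and decoding in Python, covering functions such as urlencode, quote and unquote, and explains why URLs need to be encoded and which encoding rules apply in different scenarios. Examples show parameter serialization, special-character handling and the choice of character encoding.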

"""
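Module: python3 urllib.parse.urlencode()_urlparse()_parse_qs()_quote()_unquote().py
Purpose: demonstrate urllib.parse.urlencode(), urlparse(), parse_qs(), quote(),
unquote(), quote_plus() and unquote_plus() in Python 3.
Reference:
https://blog.csdn.net/weixin_37989267/article/details/79432344
Notes:
0.Why do URLs need to be encoded and decoded?
    When something has to be encoded, it usually means that it is not suitable
    for transmission in its raw form.
    The reasons vary: the data may be too large, or it may contain private information.
    URLs are encoded because some characters in them would otherwise be ambiguous.

    For example, a URL query string passes parameters as key=value pairs separated by &,
    as in /s?q=abc&ie=utf-8. If a value itself contains = or &, the server receiving
    the URL will parse it incorrectly, so an ambiguous & or = must be escaped, that is, encoded.

    In addition, a URL may contain only ASCII characters, not arbitrary Unicode,
    so non-ASCII characters such as Chinese cannot appear in a URL directly.
    Otherwise, if the client browser and the server support different character sets,
    the Chinese characters may be misinterpreted.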

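1.urllib.parse.urlencode(query, doseq=False, safe='', encoding=None, errors=None,
quote_via=quote_plus)
    Encode a dict or a sequence of two-element tuples into a URL query string.

    If any value in query is a sequence and doseq is True,
    each element of that sequence is converted into a separate parameter.

    If query is a sequence of two-element tuples,
    the order of the output parameters matches the order of the input.

    The components of query may be either str or bytes.

    The safe, encoding and errors parameters are passed down to the function
    given by quote_via (encoding and errors are only passed when a component is a str).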

2.urllib.parse.urlparse(url, scheme='', allow_fragments=True)
    Parse a URL into 6 components:
    <scheme>://<netloc>/<path>;<params>?<query>#<fragment>
    Return a 6-tuple: (scheme, netloc, path, params, query, fragment).
    Note that we don't break the components up in smaller bits
    (e.g. netloc is a single string) and we don't expand % escapes.

3.urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False,
encoding='utf-8', errors='replace')
    Parse a query given as a string argument.

    Arguments:
    qs: percent-encoded query string to be parsed
    keep_blank_values: flag indicating whether blank values in
        percent-encoded queries should be treated as blank strings.
        A true value indicates that blanks should be retained as
        blank strings.  The default false value indicates that
        blank values are to be ignored and treated as if they were
        not included.
    strict_parsing: flag indicating what to do with parsing errors.
        If false (the default), errors are silently ignored.
        If true, errors raise a ValueError exception.
    encoding and errors: specify how to decode percent-encoded sequences
        into Unicode characters, as accepted by the bytes.decode() method.

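4.urllib.parse.quote(string, safe='/', encoding=None, errors=None)
    quote('abc def') -> 'abc%20def'
    Percent-encode a URL (or URL component) and return the escaped string.
    string: the string to escape.
    safe: additional characters that must not be escaped.
    string and safe may each be a str or a bytes object.
    If string is a bytes object, encoding and errors must not be specified.

    Each part of a URL, such as the path or the query, has its own set of
    reserved characters that must be escaped.
    RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax (since superseded
    by RFC 3986), lists the following reserved characters.
    reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                  "$" | ","
    Each of these characters is reserved in some component of a URL, but not necessarily in all of them.

    By default, quote() is intended for quoting the path part of a URL,
    so it does not encode '/'.
    That character is reserved, but in typical usage quote() is called on a path
    in which the existing slashes already act as separators.

    The optional encoding and errors parameters specify how non-ASCII characters
    are handled, as accepted by the str.encode() method.
    By default, encoding='utf-8' (characters are encoded as UTF-8) and
    errors='strict' (unsupported characters raise UnicodeEncodeError).

5.urllib.parse.unquote(string, encoding='utf-8', errors='replace')
    Replace %xx escapes with their single-character equivalents.
    The optional encoding and errors parameters specify how percent-encoded
    sequences are decoded into Unicode characters, as accepted by bytes.decode().
    By default, percent-encoded sequences are decoded as UTF-8 and invalid
    sequences are replaced by a placeholder character.
    unquote('abc%20def') -> 'abc def'.

6.urllib.parse.quote_plus(string, safe='', encoding=None, errors=None)
    Like quote(), but also replaces ' ' with '+', as required for quoting HTML form values.
    Plus signs in the original string are escaped unless they are included in safe.
    Unlike quote(), safe does not default to '/'.

7.urllib.parse.unquote_plus(string, encoding='utf-8', errors='replace')
    Like unquote(), but also replaces '+' with ' ', as required for unquoting HTML form values.

    unquote_plus('%7e/abc+def') -> '~/abc def'

8.Rule of thumb for common Chinese characters: if one original character corresponds
to three %XX escapes, the text is probably UTF-8 encoded;
if it corresponds to two, it is probably GB2312 (or the compatible GBK).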
"""
import string
from urllib import parse

# 1.parse.urlencode()
params = {"name": "gsj", "age": 32}
print(parse.urlencode(params))
# name=gsj&age=32
# Chinese characters are percent-encoded (urlencode applies quote_plus by default).
params = {"p1": "温故知新"}
queryStr = parse.urlencode(params)
print("queryStr:", queryStr)
# queryStr: p1=%E6%B8%A9%E6%95%85%E7%9F%A5%E6%96%B0
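# A minimal sketch of the doseq and quote_via parameters described in the
# docstring above. The demo_* names and sample values are illustrative
# assumptions, not part of the original script.
from urllib import parse

demo_multi = [("tag", ["a", "b"]), ("q", "x y")]
# Without doseq, the list value is converted with str() and then quoted as a whole.
demo_plain = parse.urlencode(demo_multi)
# With doseq=True, each list element becomes its own key=value pair.
demo_doseq = parse.urlencode(demo_multi, doseq=True)
# quote_via=parse.quote encodes a space as %20 instead of '+'.
demo_pct = parse.urlencode({"q": "x y"}, quote_via=parse.quote)
print(demo_plain)  # tag=%5B%27a%27%2C+%27b%27%5D&q=x+y
print(demo_doseq)  # tag=a&tag=b&q=x+y
print(demo_pct)  # q=x%20y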

# 2.parse.quote()_unquote()
# UTF-8 is used for encoding and decoding by default.
print("\n2.1.")
print(parse.quote('abc def'))  # abc%20def
print(parse.unquote('abc%20def'))  # abc def
print(parse.quote("风"))  # %E9%A3%8E
print(parse.unquote(parse.quote("风")))  # 风
# 2.2. Encoding and decoding with gb2312.
print("\n2.2.")
print(parse.quote("风", encoding='gb2312'))  # %B7%E7
print(parse.unquote(parse.quote("风", encoding='gb2312'), encoding='gb2312'))  # 风
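# A small sketch of the rule of thumb from the docstring (three escapes per
# character for UTF-8, two for GB2312). count_escapes is a hypothetical helper
# introduced only for this illustration.
from urllib import parse


def count_escapes(text, encoding):
    """Return the number of %XX escapes produced when quoting text."""
    return parse.quote(text, encoding=encoding).count("%")


print(count_escapes("风", "utf-8"))  # 3
print(count_escapes("风", "gb2312"))  # 2
# Decoding GB2312 escapes with the default UTF-8 decoder does not raise,
# because errors='replace' substitutes U+FFFD for the invalid bytes.
demo_wrong = parse.unquote("%B7%E7")
print("\ufffd" in demo_wrong)  # True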
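# 2.3. Using the safe parameter.
# Note: string.printable also contains the space, '%' and control characters,
# so passing it as safe leaves those unescaped; only non-ASCII characters are encoded.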
print("\n2.3.")
url = 'http://api.map.baidu.com/telematics/v3/weather?location=郑州市&output=json&ak=TueGDhCvwI6fOrQnLM0qmXxY9N0OkOiQ&callback=?'
print(parse.quote(url, safe=string.printable))
# http://api.map.baidu.com/telematics/v3/weather?location=%E9%83%91%E5%B7%9E%E5%B8%82&output=json&ak=TueGDhCvwI6fOrQnLM0qmXxY9N0OkOiQ&callback=?
print(parse.quote(url))
# http%3A//api.map.baidu.com/telematics/v3/weather%3Flocation%3D%E9%83%91%E5%B7%9E%E5%B8%82%26output%3Djson%26ak%3DTueGDhCvwI6fOrQnLM0qmXxY9N0OkOiQ%26callback%3D%3F
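# An alternative sketch, not from the original script: encode each URL
# component separately and assemble the result with urlunparse(), instead of
# quoting the whole URL with safe=string.printable. The demo_* names are
# illustrative, and the query is shortened to two parameters.
import string
from urllib import parse

demo_path = parse.quote("/telematics/v3/weather")  # '/' is kept by default
demo_query = parse.urlencode({"location": "郑州市", "output": "json"})
demo_url = parse.urlunparse(
    ("http", "api.map.baidu.com", demo_path, "", demo_query, "")
)
print(demo_url)
# http://api.map.baidu.com/telematics/v3/weather?location=%E9%83%91%E5%B7%9E%E5%B8%82&output=json
# With safe=string.printable a space is left unescaped:
print(parse.quote("a b", safe=string.printable))  # a b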

# 3.parse.quote()_quote_plus()_unquote()_unquote_plus()
print("\n3.1.")
print(parse.quote("9/2"))  # 9/2
print(parse.quote_plus("9/2"))  # 9%2F2
print(parse.quote("9 2"))  # 9%202
print(parse.quote_plus("9 2"))  # 9+2

print("\n3.2.")
print(parse.unquote("9+2"))  # 9+2
print(parse.unquote_plus("9+2"))  # 9 2
print(parse.unquote_plus("9/2"))  # 9/2
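# A short round-trip sketch for a value that contains both a literal '+' and
# spaces. demo_raw is an assumed sample value, not part of the original script.
from urllib import parse

demo_raw = "1+1 = 2"
demo_enc = parse.quote_plus(demo_raw)
print(demo_enc)  # 1%2B1+%3D+2
# unquote_plus() restores the original; plain unquote() leaves the '+' signs
# that stood for spaces untouched.
print(parse.unquote_plus(demo_enc))  # 1+1 = 2
print(parse.unquote(demo_enc))  # 1+1+=+2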

# # 4. Printable characters.
# print("\n4.")
# print(string.printable)
# # '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
# # The line above and the lines below show the same value (above: the repr shown in interactive mode; below: the output of print()).
# # 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
# # 
# print("\x0b")
# # (vertical tab, rendered as whitespace)
# print("\x0c")
# # 
#
# # 5. Test examples.
# print("\n5.1.")
# url = "http://localhost:8080/floorsNodes/风管.json"
# print(parse.quote(url, safe=string.printable))
# # http://localhost:8080/floorsNodes/%E9%A3%8E%E7%AE%A1.json
# print(parse.unquote(parse.quote(url, safe=string.printable)))
# # http://localhost:8080/floorsNodes/风管.json
#
# print("\n5.2.")
# url = "http://localhost:8080/floorsNodes/%25E9%25A3%258E%25E7%25AE%25A1.json"
# print(parse.unquote(url))
# # http://localhost:8080/floorsNodes/%E9%A3%8E%E7%AE%A1.json
# print(parse.unquote(parse.unquote(url)))
# # http://localhost:8080/floorsNodes/风管.json
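
# A minimal sketch for the double-encoding case shown in 5.2: decode repeatedly
# until the string stops changing. unquote_fully and max_rounds are hypothetical
# names introduced here. Caution: this also decodes a %25 that was meant to stay
# a literal percent sign, so it is only suitable when over-encoding is known.
from urllib import parse


def unquote_fully(text, max_rounds=5):
    """Apply parse.unquote() until the result is stable or max_rounds is reached."""
    for _ in range(max_rounds):
        decoded = parse.unquote(text)
        if decoded == text:
            break
        text = decoded
    return text


demo_double = "http://localhost:8080/floorsNodes/%25E9%25A3%258E%25E7%25AE%25A1.json"
print(unquote_fully(demo_double))
# http://localhost:8080/floorsNodes/风管.json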

# 6.parse.urlparse()_parse_qs()
print("\n6.")
url = r'https://docs.python.org/3.5/search.html?q=parse&check_keywords=yes&area=default'
pr = parse.urlparse(url)
print("pr:", pr)
# pr: ParseResult(scheme='https', netloc='docs.python.org', path='/3.5/search.html',
# params='', query='q=parse&check_keywords=yes&area=default', fragment='')
qd = parse.parse_qs(pr.query)
print("qd:", qd)
# qd: {'q': ['parse'], 'check_keywords': ['yes'], 'area': ['default']}
print(qd['area'])  # ['default']
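# A brief sketch of further ParseResult features: the result is a named tuple,
# so it also offers hostname/port attributes, _replace() and geturl().
# The example.com URL and the demo_* names are assumptions for illustration.
from urllib import parse

demo_pr = parse.urlparse("https://example.com:8443/a/b?x=1#top")
print(demo_pr.hostname, demo_pr.port)  # example.com 8443
# Swap out the query and rebuild the URL string.
demo_new = demo_pr._replace(query=parse.urlencode({"x": 2})).geturl()
print(demo_new)  # https://example.com:8443/a/b?x=2#top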
# Note: parse_qs() decodes a plus sign to a space!
qd = parse.parse_qs('proxy=183.222.102.178:8080&task=XXXXX|5-3+2')
print("qd:", qd)
# qd: {'proxy': ['183.222.102.178:8080'], 'task': ['XXXXX|5-3 2']}
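
# A sketch of how a literal '+' can survive the round trip: build the query
# string with urlencode(), which escapes '+' as %2B, rather than by hand.
# The demo_* names are illustrative and not part of the original script.
from urllib import parse

demo_task = "XXXXX|5-3+2"
demo_qs = parse.urlencode({"task": demo_task})
print(demo_qs)  # task=XXXXX%7C5-3%2B2
print(parse.parse_qs(demo_qs))  # {'task': ['XXXXX|5-3+2']}
# keep_blank_values controls whether parameters with empty values are kept.
print(parse.parse_qs("a=&b=1"))  # {'b': ['1']}
print(parse.parse_qs("a=&b=1", keep_blank_values=True))  # {'a': [''], 'b': ['1']}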