当有些请求,或者地址中的汉字以及特殊符号不编码使用不了时候,则需要去把中文进行编码,有些地址拿到之后,需要进行解码,不然中文会变成百分号加几个字母和数字的形式
1.url编码
from urllib.parse import quote # 将字符串‘程序设计’进行编码 text = quote("程序设计", 'utf-8') print(text) # 打印结果 # %E7%A8%8B%E5%BA%8F%E8%AE%BE%E8%AE%A1
2.url解码
from urllib.parse import unquote # 对字符串‘%E7%A8%8B%E5%BA%8F%E8%AE%BE%E8%AE%A1’进行解密 text = unquote("%E7%A8%8B%E5%BA%8F%E8%AE%BE%E8%AE%A1", 'utf-8') print(text) # 打印结果 # 程序设计
3.源码
def quote(string, safe='/', encoding=None, errors=None): """quote('abc def') -> 'abc%20def' Each part of a URL, e.g. the path info, the query, etc., has a different set of reserved characters that must be quoted. The quote function offers a cautious (not minimal) way to quote a string for most of these parts. RFC 3986 Uniform Resource Identifier (URI): Generic Syntax lists the following (un)reserved characters. unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" reserved = gen-delims / sub-delims gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" Each of the reserved characters is reserved in some component of a URL, but not necessarily in all of them. The quote function %-escapes all characters that are neither in the unreserved chars ("always safe") nor the additional chars set via the safe arg. The default for the safe arg is '/'. The character is reserved, but in typical usage the quote function is being called on a path where the existing slash characters are to be preserved. Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings. Now, "~" is included in the set of unreserved characters. string and safe may be either str or bytes objects. encoding and errors must not be specified if string is a bytes object. The optional encoding and errors parameters specify how to deal with non-ASCII characters, as accepted by the str.encode method. By default, encoding='utf-8' (characters are encoded with UTF-8), and errors='strict' (unsupported characters raise a UnicodeEncodeError). """ if isinstance(string, str): if not string: return string if encoding is None: encoding = 'utf-8' if errors is None: errors = 'strict' string = string.encode(encoding, errors) else: if encoding is not None: raise TypeError("quote() doesn't support 'encoding' for bytes") if errors is not None: raise TypeError("quote() doesn't support 'errors' for bytes") return quote_from_bytes(string, safe)
def unquote(string, encoding='utf-8', errors='replace'): """Replace %xx escapes by their single-character equivalent. The optional encoding and errors parameters specify how to decode percent-encoded sequences into Unicode characters, as accepted by the bytes.decode() method. By default, percent-encoded sequences are decoded with UTF-8, and invalid sequences are replaced by a placeholder character. unquote('abc%20def') -> 'abc def'. """ if '%' not in string: string.split return string if encoding is None: encoding = 'utf-8' if errors is None: errors = 'replace' bits = _asciire.split(string) res = [bits[0]] append = res.append for i in range(1, len(bits), 2): append(unquote_to_bytes(bits[i]).decode(encoding, errors)) append(bits[i + 1]) return ''.join(res)
Python中url的编码以及解码
于 2021-09-07 08:58:27 首次发布