encode and decode in Python 3

Article link: https://blog.csdn.net/gulang03/article/details/82562935

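A few words about encoding:

Garbled text (mojibake) is a common problem in development, especially on Windows. It arises when text is decoded with a different encoding from the one used to encode it. A Chinese-locale Windows system defaults to GBK, while development work generally uses UTF-8. When a program runs in an IDE's built-in console, that console typically defaults to GBK as well (even if you have changed the system console's default encoding through the registry, the IDE's console still uses GBK by default), so Chinese or other non-ASCII output comes out garbled. The garbled text you see therefore does not necessarily mean that the encoding conversion inside your program is wrong.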
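The mismatch described above can be reproduced in a few lines. This is only a minimal sketch: the sample string is an arbitrary choice, and errors="replace" is passed so that the deliberately wrong decoding never raises. It encodes text as UTF-8 and then decodes the bytes as GBK, which is roughly what a GBK console does with UTF-8 output.

```python
import sys

text = "中文"  # arbitrary sample string, chosen only for illustration

# The program produces UTF-8 bytes ...
utf8_bytes = text.encode("utf-8")

# ... but the reader interprets them as GBK, so the characters come out wrong.
garbled = utf8_bytes.decode("gbk", errors="replace")

# Decoding with the same encoding that was used for encoding restores the text.
restored = utf8_bytes.decode("utf-8")

print(garbled == text)   # False: mismatched encode/decode pair
print(restored == text)  # True: matched encode/decode pair

# The encoding Python uses when writing to the current console.
print(sys.stdout.encoding)
```

If the last line reports something other than utf-8, garbled console output may come from the console itself and not from the program's own conversions.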

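For an analysis of garbled text on Windows and how to change the console encoding (temporarily, or permanently through the registry), see my other post: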

https://blog.csdn.net/gulang03/article/details/81771343

For a brief introduction to several common encodings, see:

https://blog.csdn.net/gulang03/article/details/79328868

Encoding (encode) and decoding (decode) in Python 3:

str.encode():

Source:

    def encode(self, encoding='utf-8', errors='strict'): # real signature unknown; restored from __doc__
        """
        S.encode(encoding='utf-8', errors='strict') -> bytes
        
        Encode S using the codec registered for encoding. Default encoding
        is 'utf-8'. errors may be given to set a different error
        handling scheme. Default is 'strict' meaning that encoding errors raise
        a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
        'xmlcharrefreplace' as well as any other name registered with
        codecs.register_error that can handle UnicodeEncodeErrors.
        """
        return b""

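As the source above shows, the return value is a bytes object, i.e. binary data. As the name suggests, encode encodes a str with the specified encoding and returns the resulting bytes. The source also shows that the default encoding is UTF-8, which is Python 3's default both for str.encode() and for source files. Python 2 was different: its default encoding was ASCII.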
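A short sketch of str.encode() in use may help here. The sample string is an arbitrary choice, and ASCII is used as the target codec only because it cannot represent Chinese characters, which makes the effect of each errors value visible.

```python
text = "中文"  # arbitrary sample string, chosen only for illustration

utf8_bytes = text.encode()      # encoding defaults to 'utf-8'
gbk_bytes = text.encode("gbk")  # same text, different byte sequence

print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'
print(gbk_bytes)   # b'\xd6\xd0\xce\xc4'

# ASCII cannot represent these characters, so `errors` decides what happens.
ignored = text.encode("ascii", errors="ignore")              # characters dropped
replaced = text.encode("ascii", errors="replace")            # each becomes b'?'
xml_refs = text.encode("ascii", errors="xmlcharrefreplace")  # numeric XML references

# With the default errors='strict', the same call raises instead.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(ignored, replaced, xml_refs, strict_failed)
```

Note that 'ignore' and 'replace' lose information, so the original text cannot be recovered from their output.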

bytes.decode():

Source:

    def decode(self, *args, **kwargs): # real signature unknown
        """
        Decode the bytes using the codec registered for encoding.
        
          encoding
            The encoding with which to decode the bytes.
          errors
            The error handling scheme to use for the handling of decoding errors.
            The default is 'strict' meaning that decoding errors raise a
            UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
            as well as any other name registered with codecs.register_error that
            can handle UnicodeDecodeErrors.
        """
        pass

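As the name suggests, decode does the reverse: it decodes bytes into a str object using the specified encoding and error-handling scheme. Although this stub hides the real signature, the defaults match those of encode: encoding='utf-8' and errors='strict'.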
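The following sketch shows bytes.decode() under the same assumptions as before: the byte literals are simply "中文" encoded as UTF-8 and as GBK, picked for illustration. It also shows what happens when the bytes are decoded with the wrong codec, and how to convert GBK bytes to UTF-8 bytes by going through str.

```python
utf8_bytes = b"\xe4\xb8\xad\xe6\x96\x87"  # "中文" encoded as UTF-8
gbk_bytes = b"\xd6\xd0\xce\xc4"           # "中文" encoded as GBK

from_utf8 = utf8_bytes.decode()     # encoding defaults to 'utf-8'
from_gbk = gbk_bytes.decode("gbk")  # name the encoding the bytes were written in

print(from_utf8 == from_gbk)  # True: both byte sequences represent the same text

# Decoding GBK bytes as UTF-8 fails under the default errors='strict'.
try:
    gbk_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' substitutes U+FFFD for the bytes that cannot be decoded.
replaced = gbk_bytes.decode("utf-8", errors="replace")

# Converting between encodings always goes through str: decode, then encode.
transcoded = gbk_bytes.decode("gbk").encode("utf-8")

print(strict_failed, transcoded == utf8_bytes)
```

Decoding only works reliably when the encoding passed to decode matches the one the bytes were produced with. The bytes themselves do not record which encoding that was.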