Python String Encoding Conversion

Author: 来自江南的你

Article link: https://blog.csdn.net/qq_41556318/article/details/88828357

String Encoding Conversion

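ASCII is one of the earliest widely used character encodings. It covers the digits 0-9, the letters A-Z and a-z, punctuation, and control characters such as space and tab: 128 characters in total, because ASCII is a 7-bit code. The 256-character tables often called "extended ASCII" are 8-bit extensions, not ASCII itself.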
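The 128-character limit can be checked directly in Python. This is a minimal sketch using only built-ins: byte values 0-127 decode as ASCII, while byte 0x80 and a non-ASCII character such as "人" are rejected.

```python
# Every byte value from 0 to 127 is a valid ASCII code...
all_ascii = bytes(range(128)).decode("ascii")
print(len(all_ascii))  # 128

# ...but 0x80 (128) is already outside the 7-bit ASCII range.
try:
    bytes([0x80]).decode("ascii")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)  # decode failed: ordinal not in range(128)

# Likewise, a Chinese character has no ASCII code at all.
try:
    "人".encode("ascii")
except UnicodeEncodeError as exc:
    print("encode failed:", exc.reason)  # encode failed: ordinal not in range(128)
```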

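As information technology spread, the writing systems of every country needed encodings too. This led to encodings such as GB2312/GBK for Chinese, and UTF-8, an encoding of Unicode that can represent any Unicode character.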
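One way to see the difference is to encode the same character both ways. In this sketch, "人" takes 2 bytes in GBK and 3 bytes in UTF-8; these values match the byte dumps shown later in this article. Plain ASCII characters are encoded as the same single byte in both encodings. The emoji U+1F600 is an illustrative choice not taken from the original: UTF-8 can encode it, but GBK has no code for it.

```python
ch = "人"
gbk_bytes = ch.encode("gbk")     # GBK: 2 bytes per Chinese character
utf8_bytes = ch.encode("utf-8")  # UTF-8: 3 bytes for this character
print(gbk_bytes, len(gbk_bytes))    # b'\xc8\xcb' 2
print(utf8_bytes, len(utf8_bytes))  # b'\xe4\xba\xba' 3

# Both encodings store plain ASCII characters as the same single byte.
assert "A".encode("gbk") == "A".encode("utf-8") == b"A"

# UTF-8 covers all of Unicode; GBK has no code for emoji, for example.
emoji = "\U0001F600"
print(emoji.encode("utf-8"))  # b'\xf0\x9f\x98\x80'
try:
    emoji.encode("gbk")
except UnicodeEncodeError:
    print("GBK cannot encode U+1F600")
```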

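In Python 3, str holds Unicode text, source files are read as UTF-8 by default, and str.encode() and bytes.decode() use UTF-8 when no encoding is given.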
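The UTF-8 default applies to encode() and decode(). The built-in open() is a separate case: when encoding= is omitted, its default depends on the platform and locale. This sketch therefore names the encoding explicitly when writing a file. The temporary file name demo.txt is arbitrary.

```python
import os
import sys
import tempfile

print(sys.getdefaultencoding())  # utf-8

s = "人生若只如初见"
# With no argument, str.encode() and bytes.decode() use UTF-8.
encoded = s.encode()
assert encoded == s.encode("utf-8")
assert encoded.decode() == s

# open()'s default text encoding is platform-dependent, so pass encoding= explicitly.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(s)
with open(path, "rb") as f:  # read the raw bytes back
    raw = f.read()
print(raw == encoded)  # True
```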

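Python has two commonly used string types: str, which holds text as a sequence of Unicode characters, and bytes, which holds raw byte values (0-255).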
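The difference between the two types shows up in their lengths and in what indexing returns. This is a short sketch using the first two characters of the sample text from this article:

```python
text = "人生"                  # str: two Unicode characters
data = text.encode("utf-8")   # bytes: the UTF-8 encoding of those characters

print(type(text).__name__, len(text))  # str 2
print(type(data).__name__, len(data))  # bytes 6 (3 bytes per character in UTF-8)
print(data[0])                         # 228: indexing bytes gives an int (0xE4)
print(b"abc"[0])                       # 97: bytes literals use the b"" prefix
```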

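The two types cannot be concatenated with each other. When text must be sent over a network or saved to disk, the str has to be converted to bytes.

This conversion is done with the encode() method.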
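Mixing str and bytes raises a TypeError, so one side has to be converted explicitly. In this sketch, the label "Sent: " is an arbitrary example value. The bytes result is what would be written to a socket or binary file, and the str result is what would be displayed.

```python
label = "Sent: "
payload = "人生".encode("utf-8")  # bytes

try:
    label + payload  # str + bytes is not allowed
except TypeError as exc:
    print("TypeError:", exc)

# Convert one operand so both sides have the same type.
as_bytes = label.encode("utf-8") + payload  # bytes, for transmission or storage
as_text = label + payload.decode("utf-8")   # str, for display
print(as_bytes)  # b'Sent: \xe4\xba\xba\xe7\x94\x9f'
```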

1. encode()

str.encode([encoding = "utf-8"][, errors = "strict"])

Python's built-in help text for this method (the exact wording varies between Python versions):

encode(...)
S.encode(encoding='utf-8', errors='strict') -> bytes

Encode S using the codec registered for encoding. Default encoding
is 'utf-8'. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that can handle UnicodeEncodeErrors.
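The errors schemes named in the help text can be compared side by side. This sketch encodes a mixed string to ASCII, which can represent only "caf" and the space. The sample string is an assumption chosen for illustration; "é" is written as U+00E9.

```python
s = "人生 caf\u00e9"  # "人生 café"

print(s.encode("ascii", errors="ignore"))             # b' caf'
print(s.encode("ascii", errors="replace"))            # b'?? caf?'
print(s.encode("ascii", errors="xmlcharrefreplace"))  # b'&#20154;&#29983; caf&#233;'

try:
    s.encode("ascii")  # errors="strict" is the default
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)  # strict: ordinal not in range(128)
```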

For example:

str1 = "人生若只如初见"
byte1 = str1.encode("GBK")  # encode with GBK
byte2 = str1.encode("utf-8")  # encode with UTF-8
print("Original string:", str1)
print("GBK encoded:", byte1)
print("UTF-8 encoded:", byte2)

Output:
Original string: 人生若只如初见
GBK encoded: b'\xc8\xcb\xc9\xfa\xc8\xf4\xd6\xbb\xc8\xe7\xb3\xf5\xbc\xfb'
UTF-8 encoded: b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa5\xe5\x8f\xaa\xe5\xa6\x82\xe5\x88\x9d\xe8\xa7\x81'
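The two byte strings above differ in length. The 7 characters take 2 bytes each in GBK (14 bytes) and 3 bytes each in UTF-8 (21 bytes). This sketch checks that arithmetic and prints the GBK bytes as hex pairs. Note that bytes.hex() with a separator argument requires Python 3.8 or later.

```python
str1 = "人生若只如初见"
byte1 = str1.encode("GBK")
byte2 = str1.encode("utf-8")

# 7 characters -> 7 x 2 bytes in GBK, 7 x 3 bytes in UTF-8
print(len(str1), len(byte1), len(byte2))  # 7 14 21

# One hex pair per byte (the sep argument needs Python 3.8+)
print(byte1.hex(" "))  # c8 cb c9 fa c8 f4 d6 bb c8 e7 b3 f5 bc fb
```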

2. decode()

bytes.decode([encoding = "utf-8"][, errors = "strict"])

Python's built-in help text for this method:

decode(self, /, encoding='utf-8', errors='strict')
Decode the bytes using the codec registered for encoding.

encoding
The encoding with which to decode the bytes.
errors
The error handling scheme to use for the handling of decoding errors.
The default is 'strict' meaning that decoding errors raise a
UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
as well as any other name registered with codecs.register_error that
can handle UnicodeDecodeErrors.
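The decoding error schemes can be compared on a byte string containing 0xFF, which never appears in valid UTF-8. This sketch also uses 'backslashreplace', another standard handler that Python registers, in addition to 'ignore' and 'replace'. The sample bytes are an assumption chosen for illustration, and ascii() is used only to print U+FFFD in a visible, escaped form.

```python
data = b"abc\xff"  # 0xFF is never valid in UTF-8

replaced = data.decode("utf-8", errors="replace")          # bad byte -> U+FFFD
ignored = data.decode("utf-8", errors="ignore")            # bad byte dropped
escaped = data.decode("utf-8", errors="backslashreplace")  # bad byte -> text "\xff"

print(ascii(replaced))  # 'abc\ufffd'
print(ignored)          # abc
print(escaped)          # abc\xff

try:
    data.decode("utf-8")  # errors="strict" is the default
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)  # strict: invalid start byte
```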

For example:

str1 = "人生若只如初见"
byte1 = str1.encode("GBK")  # encode with GBK
byte2 = str1.encode("utf-8")  # encode with UTF-8
print("Original string:", str1)
print("GBK encoded:", byte1)
print("UTF-8 encoded:", byte2)

str2 = byte1.decode("GBK")  # decode the GBK bytes
str3 = byte2.decode("utf-8")  # decode the UTF-8 bytes
print("Decoded:", str2)
print("Decoded:", str3)

Output:
Original string: 人生若只如初见
GBK encoded: b'\xc8\xcb\xc9\xfa\xc8\xf4\xd6\xbb\xc8\xe7\xb3\xf5\xbc\xfb'
UTF-8 encoded: b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa5\xe5\x8f\xaa\xe5\xa6\x82\xe5\x88\x9d\xe8\xa7\x81'
Decoded: 人生若只如初见
Decoded: 人生若只如初见
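When the same codec is used on both sides, encode() and decode() are inverses. The small helper below, round_trips, is a hypothetical name introduced only for this sketch. It checks the property for GBK, UTF-8, and UTF-16. UTF-16 is an extra codec not used in the original example.

```python
def round_trips(text, encoding):
    """Hypothetical helper: True if encoding then decoding with one codec restores text."""
    return text.encode(encoding).decode(encoding) == text

str1 = "人生若只如初见"
for enc in ("GBK", "utf-8", "utf-16"):
    print(enc, round_trips(str1, enc))  # True for each codec
```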

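Note that bytes must be decoded with the same encoding that was used to encode them. In the following example, the GBK-encoded bytes are decoded as UTF-8, which raises an error: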

str1 = "人生若只如初见"
byte1 = str1.encode("GBK")  # encode with GBK
byte2 = str1.encode("utf-8")  # encode with UTF-8
print("Original string:", str1)
print("GBK encoded:", byte1)
print("UTF-8 encoded:", byte2)

str2 = byte1.decode("utf-8")  # decode with the wrong encoding
print("Decoded:", str2)

Output:
Original string: 人生若只如初见
GBK encoded: b'\xc8\xcb\xc9\xfa\xc8\xf4\xd6\xbb\xc8\xe7\xb3\xf5\xbc\xfb'
UTF-8 encoded: b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa5\xe5\x8f\xaa\xe5\xa6\x82\xe5\x88\x9d\xe8\xa7\x81'
Traceback (most recent call last):
  File "C:/Users/XiangyangDai/Desktop/1.py", line 8, in <module>
    str2 = byte1.decode("utf-8")  # decode with the wrong encoding
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
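A mismatched codec does not always raise an error. Latin-1, which is used here only as an illustration, maps every byte to some character. Decoding UTF-8 bytes with it therefore "succeeds" and silently produces garbled text. The helper decode_first_match is a hypothetical name for this sketch. It tries candidate codecs in order and catches UnicodeDecodeError. A successful decode only shows that the bytes are valid in that codec, not that the sender actually used it, so the order of candidates matters.

```python
data = "人".encode("utf-8")  # b'\xe4\xba\xba'

# Latin-1 accepts any byte, so the wrong codec gives garbled text, not an error.
garbled = data.decode("latin-1")
print(garbled)  # äºº

def decode_first_match(raw, encodings=("utf-8", "gbk")):
    """Hypothetical helper: return (text, codec) for the first codec that decodes raw."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this codec; try the next candidate
    raise ValueError("no candidate encoding could decode the data")

gbk_bytes = "人生若只如初见".encode("GBK")
text, used = decode_first_match(gbk_bytes)  # UTF-8 fails on 0xc8, GBK succeeds
print(used)  # gbk
```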
