Python encoding and decoding: encode, decode

Latest recommended article published on 2024-10-12 18:30:39

hhjhh76

Category: python    Tags: encoding/decoding, ASCII, Unicode, UTF-8

Article link: https://blog.csdn.net/hhjhh76/article/details/86529840

I. Encoding and Decoding

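Python provides two methods, encode and decode, for converting between the str type and the bytes type: str.encode() turns text into bytes, and bytes.decode() turns bytes back into text.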

1. Converting between str and bytes

words = 'python编码'
print('words',words,type(words))
words1 = words.encode('utf-8')
print('words1',words1,type(words1))
words2 = words1.decode('utf-8')
print('words2',words2,type(words2))

Output:
words python编码 <class 'str'>
words1 b'python\xe7\xbc\x96\xe7\xa0\x81' <class 'bytes'>
words2 python编码 <class 'str'>
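To make the str/bytes difference more concrete, here is a small illustration of my own (not from the original post) that compares the two objects: a str counts characters, the encoded bytes count bytes, and indexing a bytes object yields an int rather than a one-character string.

```python
words = 'python编码'
words1 = words.encode('utf-8')

# A str is a sequence of characters: 6 ASCII letters + 2 Chinese characters
print(len(words))       # 8
# In UTF-8 each ASCII letter takes 1 byte and each of these Chinese characters 3 bytes
print(len(words1))      # 12

# Indexing a str gives a one-character str; indexing bytes gives an int
print(repr(words[6]))   # '编'
print(words1[6])        # 231, i.e. 0xe7, the first UTF-8 byte of 编
```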

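Note that bytes must be decoded with the same encoding that produced them. Otherwise the decoded characters are meaningless (mojibake), or, if the bytes are not valid in the chosen encoding, decoding fails with a UnicodeDecodeError. For example, decoding the UTF-8 bytes above as GBK: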

words3 = words1.decode('gbk')
print('words3',words3,type(words3))

Output:
words3 python缂栫爜 <class 'str'>
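A wrong codec does not always fail silently like the GBK case above. The sketch below (an illustration I am adding, using ASCII as the deliberately wrong codec) shows the error case and the standard errors parameter of bytes.decode(), which controls how undecodable bytes are handled.

```python
data = 'python编码'.encode('utf-8')

# Decoding with the wrong codec may silently produce mojibake...
print(data.decode('gbk'))

# ...or fail outright when the bytes are invalid in that codec
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print('UnicodeDecodeError:', exc.reason)   # ordinal not in range(128)

# errors='replace' substitutes U+FFFD for each undecodable byte instead of raising
print(data.decode('ascii', errors='replace'))
```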

2. Converting bytes between different encodings

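Use str as the intermediary: decode the bytes with their original encoding to get a str, then encode that str with the target encoding.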

words = b'python\xe7\xbc\x96\xe7\xa0\x81'   # UTF-8 encoded
print('words',words,type(words))
words1 = words.decode('utf-8')
print('words1',words1,type(words1))
words2 = words1.encode('gbk')
print('words2',words2,type(words2))
words3 = words2.decode('gbk')
print('words3',words3,type(words3))

Output:
words b'python\xe7\xbc\x96\xe7\xa0\x81' <class 'bytes'>
words1 python编码 <class 'str'>
words2 b'python\xb1\xe0\xc2\xeb' <class 'bytes'>
words3 python编码 <class 'str'>
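The decode-then-encode steps above can be wrapped in a small helper. This is a minimal sketch; the name transcode and its parameters are hypothetical, chosen for illustration, and are not part of Python's standard library.

```python
def transcode(data: bytes, src: str, dst: str, errors: str = 'strict') -> bytes:
    """Hypothetical helper: re-encode bytes from codec `src` to codec `dst` via str."""
    text = data.decode(src, errors)   # bytes in the source encoding -> str
    return text.encode(dst, errors)   # str -> bytes in the target encoding


utf8_bytes = b'python\xe7\xbc\x96\xe7\xa0\x81'
gbk_bytes = transcode(utf8_bytes, 'utf-8', 'gbk')
print(gbk_bytes)                                          # b'python\xb1\xe0\xc2\xeb'
print(transcode(gbk_bytes, 'gbk', 'utf-8') == utf8_bytes)  # True
```

The round trip only works when every character exists in the target encoding. An emoji, for example, has no GBK representation, so encoding it to GBK with errors='strict' raises UnicodeEncodeError.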

II. Common Encodings

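An encoding defines the mapping between numeric codes and the characters of a language. ASCII, the earliest widely adopted encoding, stores each character in one 8-bit byte whose highest bit is 0, so it can represent only 128 characters, such as uppercase and lowercase English letters, digits, and other special characters.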
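The 128-character limit is easy to observe from Python. In this illustrative sketch, the sample characters are arbitrary choices: every ASCII code fits in 7 bits (the highest of the 8 bits is 0), ASCII encoding yields one byte per character, and characters outside that range cannot be encoded at all.

```python
# Every ASCII character has a code below 128, so the highest of its 8 bits is 0
for ch in 'Az0~':
    code = ord(ch)
    print(ch, code, format(code, '08b'))

# ASCII encoding produces exactly one byte per character
print('Hello'.encode('ascii'))   # b'Hello'

# Characters outside the 128-character range cannot be encoded as ASCII
try:
    '编'.encode('ascii')
except UnicodeEncodeError as exc:
    print('UnicodeEncodeError:', exc.reason)
```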

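But English is not the only language in the world, and different languages need different numbers of characters, so individual countries defined their own encodings, such as GBK in China. These national encodings are incompatible with one another, which leads to "the same code for different characters, and different codes for the same character." Solving this required a single encoding shared by all languages, and that is Unicode: it assigns every character of every language its own unique number, called a code point.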
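Both kinds of incompatibility can be reproduced with Python's built-in codecs. In this sketch, Latin-1 (ISO-8859-1, a Western European encoding) is my own choice to stand in for "another country's encoding"; the original post does not mention it.

```python
ch = '编'

# Same character, different codes: each encoding assigns its own bytes
print(ch.encode('gbk'))       # b'\xb1\xe0'
print(ch.encode('utf-8'))     # b'\xe7\xbc\x96'

# Same code, different characters: the GBK bytes for 编 read as Latin-1
print(b'\xb1\xe0'.decode('gbk'))
print(b'\xb1\xe0'.decode('latin-1'))   # two characters: U+00B1 and U+00E0

# Unicode gives the character one code point, independent of how it is stored
print(hex(ord(ch)))           # 0x7f16
```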

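Unicode is only a character set: it defines the mapping between characters and numeric codes, but not how those codes are stored (for example, whether 0x0000 is stored as one byte, 00000000, or as two bytes, 00000000 00000000). This gave rise to several implementations of Unicode, the Unicode Transformation Formats (UTF): UTF-8, UTF-16, UTF-32, and so on. The most widely used is UTF-8, a variable-length encoding that uses a different number of bytes for different ranges of code points. Code points 0x00 to 0x7F are encoded exactly as in ASCII, so the 128 characters ASCII can represent have identical ASCII and UTF-8 encodings. A UTF-8 encoded character is at most 4 bytes long.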
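The variable-length layout can be sketched by hand. The function utf8_encode_codepoint below is a hypothetical, illustrative implementation of the standard UTF-8 bit patterns (it does not reject surrogate code points the way a real codec does); the loop checks it against Python's built-in codec and prints the UTF-8, UTF-16, and UTF-32 lengths of a few sample characters for comparison.

```python
def utf8_encode_codepoint(cp: int) -> bytes:
    """Hypothetical helper: encode one code point with the UTF-8 bit layout.

    Illustrative only: unlike Python's codec, it does not reject surrogates.
    """
    if cp < 0x80:            # 1 byte:  0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:           # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:         # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:       # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError(f'code point out of range: {cp:#x}')


for ch in ['A', 'é', '编', '\U0001F600']:
    manual = utf8_encode_codepoint(ord(ch))
    assert manual == ch.encode('utf-8')   # matches Python's built-in UTF-8 codec
    # The '-le' codecs are used so that no byte-order mark is counted
    print(f'U+{ord(ch):04X}', len(manual),
          len(ch.encode('utf-16-le')), len(ch.encode('utf-32-le')))
```

The printed lengths are 1/2/4 bytes for U+0041, 2/2/4 for U+00E9, 3/2/4 for U+7F16, and 4/4/4 for U+1F600 (UTF-8/UTF-16/UTF-32). UTF-8 is the only one of the three that stays byte-for-byte compatible with ASCII.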