Encoding Study Notes

Latest recommended article published on 2021-02-26 15:58:03

不要社工我

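Common encodings:
   ascii
       English letters, digits and symbols only; no Chinese
           A - 01000001   1 byte, 8 bits
   unicode (here: the fixed-width UTF-32 form, 4 bytes per character)
           A - 00000000 00000000 00000000 01000001    4 bytes, 32 bits
           中 - 00000000 00000000 01001110 00101101    4 bytes, 32 bits
   utf-8
           A - 01000001   1 byte, 8 bits
           中 - 11100100 10111000 10101101   3 bytes, 24 bits
   gbk
           A - 01000001   1 byte, 8 bits
           中 - 11010110 11010000   2 bytes, 16 bits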
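The bit patterns above can be checked directly in Python. This is a minimal sketch: `to_bits` is a hypothetical helper written for this note, and the codec name `utf-32-be` is an assumption standing in for the fixed 4-byte "unicode" row (big-endian, so no byte-order mark is added).

```python
def to_bits(data: bytes) -> str:
    """Hypothetical helper: render each byte as 8 binary digits, space-separated."""
    return " ".join(f"{byte:08b}" for byte in data)

# 'utf-32-be' is assumed to match the fixed 4-byte "unicode" row above.
for encoding in ("ascii", "utf-32-be", "utf-8", "gbk"):
    for ch in ("A", "中"):
        try:
            data = ch.encode(encoding)
        except UnicodeEncodeError:
            # ASCII has no code for Chinese characters.
            print(f"{encoding:9} {ch}: cannot be encoded")
            continue
        print(f"{encoding:9} {ch}: {to_bits(data)}  ({len(data)} byte(s))")
```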

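1. The bytes produced by different encodings are not interchangeable: decoding bytes with an encoding other than the one that produced them yields mojibake (garbled text).
2. Files are stored and transmitted as binary, but not as fixed 4-byte Unicode, because 32 bits per character takes too much space; variable-width encodings such as utf-8 or gbk are used instead.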
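Both points can be seen in a short sketch. The sample text "中文" is an arbitrary choice, and `errors="replace"` is used so that the wrong decode prints garbled characters instead of raising an exception.

```python
text = "中文"

# Point 1: bytes written in one encoding but read back in another come out garbled.
utf8_bytes = text.encode("utf-8")                      # 3 bytes per Chinese character
garbled = utf8_bytes.decode("gbk", errors="replace")   # wrong encoding on purpose
print(utf8_bytes, "->", garbled)

# Point 2: a fixed 4-byte form (UTF-32) needs more space than utf-8 or gbk.
for encoding in ("utf-32-be", "utf-8", "gbk"):
    print(encoding, len(text.encode(encoding)), "bytes")
```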

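In Python 3, strings (str) hold Unicode text in memory, so they must be converted (encoded to bytes) before they can be transmitted or stored.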
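A minimal sketch of that conversion cycle, assuming UTF-8 as the storage encoding and a temporary file standing in for "storage": encode before writing, then decode with the same encoding after reading.

```python
import os
import tempfile

text = "中文 abc"              # str: Unicode text in memory

data = text.encode("utf-8")     # str -> bytes before storing

# A temporary file stands in for storage in this sketch.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(data)
    path = f.name

with open(path, "rb") as f:
    restored = f.read().decode("utf-8")   # bytes -> str with the SAME encoding
os.remove(path)

print(restored == text)  # True
```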

====================
# For English
# str encoding: Unicode
# bytes encoding: utf-8, gbk, etc.

# str literal form: s = "abc"
# encoding: Unicode
# bytes literal form: s = b"abc"
# encoding: utf-8, gbk, etc.

# s1 = 'abc'
# s2 = b'abc'
# print(s1, type(s1))
# print(s2, type(s2))
# abc <class 'str'>
# b'abc' <class 'bytes'>
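The two types behave differently even when they print alike. A small sketch reusing the names from the commented example: indexing a bytes object returns an integer byte value, and a str never compares equal to a bytes object until it is encoded.

```python
s1 = "abc"
s2 = b"abc"

print(type(s1), type(s2))   # <class 'str'> <class 'bytes'>
print(s1[0], s2[0])         # a 97  (indexing bytes gives the byte value as an int)
print(s1 == s2)             # False: str and bytes are different types
print(s1.encode() == s2)    # True once the str is encoded
```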

# For Chinese
# str encoding: Unicode
# bytes encoding: utf-8, gbk, etc.

# str literal form: s = "中"
# encoding: Unicode
# bytes literal form: s = b"\xe4\xb8\xad"  (the utf-8 bytes of "中")
# encoding: utf-8, gbk, etc.

# s1 = '中'
# s2 = b'中'
# print(s1, type(s1))
# print(s2, type(s2))
# SyntaxError: bytes can only contain ASCII literal characters.
# In Python 3, Chinese characters cannot appear directly in a bytes literal
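The same bytes can still be written out, either as `\x` escapes or by calling `encode()`. A minimal sketch assuming UTF-8:

```python
# b'中' is a SyntaxError, but the UTF-8 bytes of '中' can be written as escapes.
s2 = b"\xe4\xb8\xad"
print(s2 == "中".encode("utf-8"))  # True
print(s2.decode("utf-8"))          # 中
```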

# ===============

# English
# Converting str to bytes
s1 = 'abc'
s2 = s1.encode()  # with no argument, encode() uses UTF-8
print(s1, type(s1))
print(s2, type(s2))
   # abc <class 'str'>
   # b'abc' <class 'bytes'>
   # after conversion the data can be transmitted and stored
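For pure English text the choice of encoding makes no difference to the bytes: utf-8 and gbk both keep ASCII's one-byte codes, so all three produce the same result. A quick check:

```python
s1 = "abc"
# utf-8 and gbk reuse ASCII's one-byte codes for English letters.
for encoding in ("ascii", "utf-8", "gbk"):
    print(encoding, s1.encode(encoding))   # b'abc' for every encoding
```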

# Chinese
s1 = '中'
s2 = s1.encode("utf-8")
print(s1, type(s1))
print(s2, type(s2))
   # 中 <class 'str'>
   # b'\xe4\xb8\xad' <class 'bytes'>
   # after conversion the data can be transmitted and stored
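For Chinese, the byte sequence depends on the encoding chosen. A sketch comparing utf-8 with gbk for the same character, decoding each with the encoding that produced it:

```python
s1 = "中"
utf8 = s1.encode("utf-8")   # b'\xe4\xb8\xad' (3 bytes)
gbk = s1.encode("gbk")      # b'\xd6\xd0'     (2 bytes)
print(utf8, len(utf8))
print(gbk, len(gbk))

# Each byte string decodes correctly only with its own encoding.
print(utf8.decode("utf-8"), gbk.decode("gbk"))   # 中 中
```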