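Notes:
1. In Python 2 the default encoding is ASCII. In Python 3, str is Unicode text and the default source-file encoding is UTF-8.
2. Unicode has several encoding forms: UTF-32 (always 4 bytes per code point), UTF-16 (2 or 4 bytes), and UTF-8 (1 to 4 bytes). UTF-16 is widely used for in-memory strings (for example in Windows and Java), but files are usually stored as UTF-8 because it takes less space for text that is mostly ASCII.
3. In Python 3, encode converts the text to the target encoding and also turns the str into bytes; decode interprets the bytes in the given encoding and turns them back into a str.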
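The byte counts in note 2 and the type changes in note 3 can be checked directly. The following is a minimal Python 3 sketch; the helper name `byte_lengths` and the sample text are made up for illustration. The "-le" codec variants are used so that no byte-order mark is added to the counts.

```python
# Python 3: compare how many bytes each encoding needs for the same text.
def byte_lengths(text):
    """Return the encoded length of `text` in several encodings."""
    encodings = ["utf-8", "utf-16-le", "utf-32-le", "gbk"]
    return {name: len(text.encode(name)) for name in encodings}

sample = "A中"  # one ASCII letter and one CJK character

encoded = sample.encode("utf-8")    # str -> bytes
decoded = encoded.decode("utf-8")   # bytes -> str

print(type(sample), type(encoded), type(decoded))
print(byte_lengths(sample))
```

With this sample, UTF-8 needs 1 byte for the letter and 3 for the Chinese character, while GBK needs 1 and 2.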
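Approach for converting between GBK and UTF-8:
Use Unicode as the bridge: decode the source bytes to Unicode first, then encode the Unicode text into the target encoding (see the flowchart at the end of the article).
Example code: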
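# In Python2
# -*- coding: utf-8 -*-
# The coding line above must be on line 1 or 2 of the file,
# otherwise Python 2 rejects the non-ASCII literal below.
msg = "GBK,UTF-8编码的转换"  # a byte string, UTF-8 encoded like the source file
# UTF-8 bytes -> unicode -> GB2312 bytes
msg_gb2312 = msg.decode("utf-8").encode("gb2312")
# GBK is a superset of GB2312, so GB2312 bytes also decode as GBK
gb2312_to_gbk = msg_gb2312.decode("gbk").encode("gbk")
print(msg)
print(msg_gb2312)
print(gb2312_to_gbk)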
# In Python3
msg = "GBK,UTF-8编码的转换"
# Python 2 style; fails in Python 3 because str has no decode():
# msg_gb2312 = msg.decode("utf-8").encode("gb2312")
msg_gb2312 = msg.encode("gb2312")  # str is already Unicode, so no decode is needed
gb2312_to_unicode = msg_gb2312.decode("gb2312")  # bytes -> str
gb2312_to_utf8 = msg_gb2312.decode("gb2312").encode("utf-8")  # GB2312 bytes -> str -> UTF-8 bytes
print(msg)
print(msg_gb2312)
print(gb2312_to_unicode)
print(gb2312_to_utf8)
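The "Unicode as the bridge" idea can also be wrapped in one small function. This is only a sketch for Python 3: the name `transcode` is hypothetical and not part of the standard library. It relies only on the built-in bytes.decode and str.encode.

```python
# Python 3: re-encode bytes from one encoding to another via str (Unicode).
def transcode(data, src, dst):
    """Decode `data` from encoding `src`, then encode it as `dst`."""
    text = data.decode(src)    # bytes -> str (Unicode), the bridge step
    return text.encode(dst)    # str -> bytes in the target encoding

gbk_data = "GBK,UTF-8编码的转换".encode("gbk")
utf8_data = transcode(gbk_data, "gbk", "utf-8")
back_to_gbk = transcode(utf8_data, "utf-8", "gbk")

print(utf8_data.decode("utf-8"))
print(back_to_gbk == gbk_data)
```

If the source bytes are not valid in `src`, decode raises UnicodeDecodeError; if a character does not exist in `dst`, encode raises UnicodeEncodeError. The sketch lets both propagate to the caller.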
Recommended reading: ASCII, GB2312, GBK, GB18030, Unicode, UTF-8, BIG5 encodings explained in detail
This article is based on: https://www.cnblogs.com/alex3714/articles/5717620.html