Summary of Python Encoding Issues

Latest recommended article published on 2024-05-22 20:26:52

BYR_jiandong

Category: Python Basics. Tags: python, encoding, unicode, utf-8

Article link: https://blog.csdn.net/lujiandong1/article/details/52960411

1. str and unicode

In short, Python 2.x has two string types: str and unicode.

Converting str to unicode requires decode, and converting unicode to str requires encode. Using 'utf-8' as an example:
str.decode('utf-8') -> unicode

str <- unicode.encode('utf-8')

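Summary: unicode acts as the bridge between encodings. A str holding UTF-8 or GBK bytes can be decoded to unicode, and unicode can be encoded to UTF-8, GBK, or any other supported encoding. In practice there are two kinds of strings: unicode objects, which hold decoded text, and str objects, which hold bytes in one specific encoding such as UTF-8 or GBK. To convert between UTF-8 and GBK, first decode to unicode, then encode to the target encoding.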
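The bridge idea above can be sketched as a small helper. This is a minimal illustration, not part of the original article: the function name `transcode` is hypothetical, and the code is written so that it should run on both Python 2.7 and Python 3 (where the byte type is `bytes` rather than `str`).

```python
# -*- coding: utf-8 -*-

def transcode(data, source_encoding, target_encoding):
    """Re-encode a byte string by going through unicode (hypothetical helper)."""
    text = data.decode(source_encoding)   # bytes -> unicode
    return text.encode(target_encoding)   # unicode -> bytes

text = u'\u4e2d\u6587'                    # the two characters of 中文
gbk_bytes = text.encode('gbk')            # GBK-encoded byte string
utf8_bytes = transcode(gbk_bytes, 'gbk', 'utf-8')

# The round trip through unicode preserves the text in both directions.
assert utf8_bytes == text.encode('utf-8')
assert transcode(utf8_bytes, 'utf-8', 'gbk') == gbk_bytes
```

Note that the helper never converts GBK bytes to UTF-8 bytes directly; every conversion passes through the decoded text.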

# -*- coding: utf-8 -*-
# The file must be saved as UTF-8 for the decode('utf-8') call to succeed.
print "Type of '中文' is %s" % type('中文')
print "Type of '中文'.decode('utf-8') is %s" % type('中文'.decode('utf-8'))
print "Type of u'中文' is %s" % type(u'中文')
print "Type of u'中文'.encode('utf-8') is %s" % type(u'中文'.encode('utf-8'))

Output:

Type of '中文' is <type 'str'>
Type of '中文'.decode('utf-8') is <type 'unicode'>
Type of u'中文' is <type 'unicode'>
Type of u'中文'.encode('utf-8') is <type 'str'>

2. Avoiding encoding problems

Recommendation 1: use a character-encoding declaration, and use the same declaration in every source file of the project.

#encoding=utf-8

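Note: the declaration tells the interpreter which encoding the source file itself is saved in, so that non-ASCII literals are decoded correctly. It does not make print convert text to UTF-8; print encodes unicode with the encoding of the output stream (sys.stdout.encoding). Without a declaration, Python 2 assumes the source is ASCII, so this file is rejected with a SyntaxError about a non-ASCII character:

test2 = u'汉字'
print test2

With the declaration, the same code runs:

#encoding=utf-8
test2 = u'汉字'
print test2

Note: the declaration must match the encoding the file is actually saved in. If the two differ, the literal is decoded with the wrong encoding and the output is garbled.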
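To make the output side explicit, here is a minimal sketch of encoding text for a stream by hand instead of relying on what print picks. The helper name `encode_for_output` and the fallback to UTF-8 are assumptions made for illustration; the code is intended to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import sys

def encode_for_output(text, encoding=None):
    """Encode unicode text for an output stream (hypothetical helper).

    Falls back to the stream's own encoding, then to UTF-8 (an assumption).
    Characters the target encoding cannot represent become '?'.
    """
    target = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    return text.encode(target, 'replace')

text = u'\u6c49\u5b57'                          # the two characters of 汉字
utf8_out = encode_for_output(text, 'utf-8')     # 3 bytes per character
gbk_out = encode_for_output(text, 'gbk')        # 2 bytes per character
ascii_out = encode_for_output(text, 'ascii')    # not representable in ASCII
```

The same text yields different byte strings depending on the target encoding, which is why output looks garbled when the bytes written do not match what the terminal expects.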

3. Reading and writing files

Read from the target file, decode the contents to unicode, encode them as UTF-8, and write the result back to the file.

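When a file is opened with the built-in open(), read() returns a str. That str may hold GBK, UTF-8, or any other encoding, so after reading you must decode() it with the correct encoding. When writing with write(): if the argument is unicode, encode() it with the encoding you want in the file; if it is a str in some other encoding, first decode() it with its own encoding to get unicode, then encode() it with the target encoding. If you pass unicode directly to write(), Python 2 encodes it with the default encoding (sys.getdefaultencoding(), normally ASCII), not the encoding declared in the source file, so non-ASCII text raises UnicodeEncodeError.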

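# coding: UTF-8

# Binary mode returns the raw bytes without newline translation
f = open('test.txt', 'rb')
s = f.read()
f.close()
print type(s)  # <type 'str'>
# The file is known to be GBK-encoded, so decode it to unicode
u = s.decode('GBK')

f = open('test.txt', 'wb')
# Encode to a UTF-8 str before writing
s = u.encode('UTF-8')
f.write(s)
f.close()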
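As an alternative to decoding and encoding by hand, the standard library's io.open() accepts an encoding argument and performs both steps itself. The sketch below is not from the original article: the helper name `convert_file` and the temporary-file demo are illustrative assumptions, and the code is intended to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

def convert_file(path, source_encoding, target_encoding):
    """Rewrite a text file in place in another encoding (hypothetical helper)."""
    # Text mode with an encoding: read() returns unicode, already decoded.
    with io.open(path, 'r', encoding=source_encoding) as f:
        text = f.read()
    # write() takes unicode and encodes it with the target encoding.
    with io.open(path, 'w', encoding=target_encoding) as f:
        f.write(text)
    return text

# Demo: create a small GBK file, convert it, then inspect the raw bytes.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
sample = u'\u6c49\u5b57'                  # the two characters of 汉字
with io.open(path, 'wb') as f:
    f.write(sample.encode('gbk'))

convert_file(path, 'gbk', 'utf-8')

with io.open(path, 'rb') as f:
    raw = f.read()                        # bytes now on disk
os.remove(path)
```

After the conversion, `raw` holds the UTF-8 bytes of the sample text. Because the file is read fully before it is reopened for writing, the helper suits small files; very large files would need a streaming approach.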