Garbled Chinese Text in Python

Survivior_Y

Category: Python    Tags: python

Article link: https://blog.csdn.net/weixin_43533825/article/details/86075174

1. The difference between encode and decode, and common errors

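Python represents text internally as Unicode. In Python 2, which this article targets, a plain str holds bytes in some encoding, while unicode is the decoded text type.

decode: converts a string in some other encoding into Unicode. E.g. str.decode('utf8') converts the UTF-8 encoded string str into Unicode.

encode: converts a Unicode string into some other encoding. E.g. str.encode('utf8') converts the Unicode string str into UTF-8.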
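The two directions can be illustrated with a short round trip. This is a minimal sketch, not taken from the original article: the variable names are made up for illustration, and the code is written so that it runs unchanged on Python 2.7 and Python 3 (where the same roles are played by bytes and str).

```python
# -*- coding: utf-8 -*-
# Runs unchanged on Python 2.7 and Python 3.

text = u'\u4e2d\u6587'  # the Unicode text "中文", written with escapes

# encode: Unicode text -> bytes in a specific encoding
utf8_bytes = text.encode('utf8')
gb_bytes = text.encode('gb2312')

# decode: bytes in a KNOWN encoding -> Unicode text
roundtrip_utf8 = utf8_bytes.decode('utf8')
roundtrip_gb = gb_bytes.decode('gb2312')

# The same text yields different bytes under different encodings
print(repr(utf8_bytes))
print(repr(gb_bytes))
```

The round trip only works because each decode uses the same codec that produced the bytes.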

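Before calling decode or encode, make sure you know which encoding str is actually in; otherwise you will hit errors such as:

UnicodeEncodeError:'ascii' codec can't encode characters in position: a Unicode string containing non-ASCII characters was implicitly encoded with the default ascii codec, for example by calling decode('utf8') on a string that is already Unicode. The mirror-image mistake, calling str.encode('utf8') on a byte string that is not Unicode, makes Python 2 first decode it as ascii and fails with the closely related UnicodeDecodeError: 'ascii' codec can't decode byte;
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc8 in position: the bytes are not valid UTF-8 (here str is actually gb2312 encoded), so the wrong codec was chosen for decoding.
\xe4\xb8\xad\xe6\x96\x87: the escaped byte form of a UTF-8 encoded string, shown when the console falls back to an ASCII representation. Call decode('utf8') on it and then check the actual output.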
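The wrong-codec case can be reproduced in a few lines. The sketch below is an illustration under assumed names (gb_bytes, wrong_codec_failed); it builds gb2312 bytes itself instead of reading them from a file or the console, and it runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
text = u'\u4e2d\u6587'            # "中文"
gb_bytes = text.encode('gb2312')  # bytes that are NOT valid UTF-8

try:
    gb_bytes.decode('utf8')       # wrong codec for these bytes
    wrong_codec_failed = False
except UnicodeDecodeError as exc:
    wrong_codec_failed = True
    print(exc)                    # "'utf-8' codec can't decode byte ..."

# Decoding with the codec that actually produced the bytes succeeds
right = gb_bytes.decode('gb2312')
```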

If you are not sure which encoding a string uses, do not reach for encode and decode casually; guessing makes the code fragile.
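One way to keep the guessing in a single place is a small helper that tries a list of candidate encodings in order. This is only a sketch: decode_with_fallback and its default candidate list are hypothetical names chosen for illustration, not part of the original article or of any library.

```python
# -*- coding: utf-8 -*-

def decode_with_fallback(data, encodings=('utf8', 'gb2312')):
    """Hypothetical helper: return Unicode text, trying each encoding in order."""
    if isinstance(data, type(u'')):
        return data  # already Unicode text, nothing to decode
    for name in encodings:
        try:
            return data.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding, try the next
    raise ValueError('none of the candidate encodings matched: %r' % (encodings,))


text = u'\u4e2d\u6587'
from_utf8 = decode_with_fallback(text.encode('utf8'))
from_gb = decode_with_fallback(text.encode('gb2312'))
```

The order of the candidates matters, and a decode that succeeds does not prove the guess was right, so knowing the real encoding of the input remains the more reliable fix.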

2. The difference between #coding:utf-8 and setdefaultencoding

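Python falls back to a default encoding (defaultencoding) whenever it has to convert between byte strings and Unicode implicitly. In Python 2.x that default is ascii, which is why the first error above is so common.

#coding:utf-8: declares the encoding of the source file. It is required whenever the source code or its comments contain Chinese characters; with it, a literal such as u"中文" no longer raises an error.

setdefaultencoding: sets the encoding Python uses for implicit conversions between str and unicode.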

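# coding:utf-8
# Python 2 only: reload(sys) and sys.setdefaultencoding do not exist in Python 3.
import os
import sys

import chardet  # third-party package that guesses the encoding of a byte string

reload(sys)  # restores sys.setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf8')

if __name__ == "__main__":
    # Raw string, so the backslashes in the Windows path are never treated as escapes
    Filepath = os.path.join(r"D:\软件安装\eclipse\eclipse_jee\eclipse\workspace\pytest\src")

    # Here getcwd() returns gb2312 bytes, so decode them to unicode first to avoid garbled output
    dirpath = os.getcwd().decode('gb2312')
    dirpath = dirpath.encode('utf8')
    print Filepath
    print sys.getdefaultencoding()  # the current default encoding
    print chardet.detect(Filepath)  # the guessed encoding of this byte string
    print chardet.detect(dirpath)
    print dirpath
    s = '中文'
    s = s.decode('utf8')  # UTF-8 bytes -> unicode; one decode is enough
    print s
    # print chardet.detect(s)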

The output is:

D:\软件安装\eclipse\eclipse_jee\eclipse\workspace\pytest\src
utf8
{'confidence': 0.938125, 'language': '', 'encoding': 'utf-8'}
{'confidence': 0.938125, 'language': '', 'encoding': 'utf-8'}
D:\软件安装\eclipse\eclipse_jee\eclipse\workspace\pytest\src
中文
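The getcwd step in the script is an explicit gb2312 to UTF-8 conversion, and the same pattern works without changing the default encoding at all. The sketch below assumes a hypothetical helper named transcode and a made-up path standing in for the bytes that os.getcwd() returned; it runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import sys

def transcode(data, source, target):
    """Hypothetical helper: re-encode bytes from one named encoding to another."""
    return data.decode(source).encode(target)


# Stand-in for a gb2312 byte path; the path itself is an assumption
path_text = u'D:\\\u4e2d\u6587\\src'
gb_path = path_text.encode('gb2312')

utf8_path = transcode(gb_path, 'gb2312', 'utf8')

# The interpreter-wide default is left untouched
print(sys.getdefaultencoding())
```

Naming both encodings at the call site keeps the conversion visible where it happens, whereas setdefaultencoding changes the behaviour of every implicit conversion in the process.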
