Notes on garbled Chinese output in Python 2.7: UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 0

激动的兔子

Category column: python学习手记 (Python study notes); tags: python

Article link: https://blog.csdn.net/u014685432/article/details/130638631

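Problem description:

When I first ran the code below, everything it printed was in English and nothing came out garbled. I then replaced the English output with Chinese, and that also displayed correctly. But once I changed the Chinese text to something else, the output became garbled, as shown below:

Code before the change:
Note the lines that print Chinese.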

#coding:utf-8
'''
Use the class keyword to define a class.
An optional parent (base) class may be given;
if there is no suitable base class,
use object as the base class.
The class line is followed by an optional docstring, static member definitions, and method definitions.
'''
class FooClass(object):
    # class version: a static variable shared by the four methods below
    Version=0.3  #class (data) attribute
    # initialize attributes
    '''__init__() has a special name; every method whose name begins and ends with two underscores is a special method.'''
    def __init__(self,nm="Miao"):
        "constructor"
        self.name=nm  #class instance (data) attribute
        print "My name is",self.name,nm

    def ShowName(self):
        #display instance attribute and class name
        print 'My name is:',self.name
        print 'My instance class name is',self.__class__.__name__  # name of the class this instance was created from
        print '我的名字是',self.name
        print '我实例化的类名是:',self.__class__.__name__  # name of the class this instance was created from

    def ShowVersion(self):
        # display class(static) attribute
        print 'The current Version is :',self.Version
        print "当前类的版本：",self.Version  #reference FooClass.Version

    def addMe2Me(self,x):
        "apply + operation to argument "
        return x+x

# instantiate the class
a=FooClass("Tom")
a.ShowName()
b= a.addMe2Me("xyz")
print b

a.ShowVersion()
print a.Version

Code after the change:

    def ShowName(self):
        #display instance attribute and class name
        print '我的名字',self.name
        print '这个变量表示实例化它的类的名字：',self.__class__.__name__  # name of the class this instance was created from
        print 'My name is:',self.name
        print 'My instance class name is',self.__class__.__name__  # name of the class this instance was created from

    def ShowVersion(self):
        # display class(static) attribute
        print "当前类的版本号：",self.Version  #reference FooClass.Version
        print 'The current Version is :',self.Version

Output screenshots:

[Screenshots: run before the code change; run after the code change]

Solution:

To fix the garbled Chinese, put a u prefix in front of the modified Chinese string literals. For example:

    def ShowName(self):
        #display instance attribute and class name
        print u'我的名字',self.name
        print u'这个变量表示实例化它的类的名字：',self.__class__.__name__  # name of the class this instance was created from
        print 'My name is:',self.name
        print 'My instance class name is',self.__class__.__name__  # name of the class this instance was created from

    def ShowVersion(self):
        # display class(static) attribute
        print u"当前类的版本号：",self.Version  #reference FooClass.Version
        print 'The current Version is :',self.Version

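Summary

Here is a short summary of the common Chinese encoding problems.
Note that my encoding settings are the ones shown in the screenshot below.
[Screenshot: the author's encoding settings]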

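Python works with Unicode internally, while external encodings vary widely: gbk, gb2312, utf8, and so on. So how do we convert text in these encodings to unicode?
First we need to look at how strings are used in the source file. By default Python 2 assumes the source file is ASCII-encoded. For example, take a variable assignment in the code: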

test = 'abc'
print test

Python treats this 'abc' as ASCII-encoded characters. As long as only English characters are used, everything works. But if Chinese is used, for example:

test2='你好'
print test2

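Running this file is likely to fail or to display garbled text. Python 2 treats the source as ASCII by default, and ASCII has no Chinese characters. In Python 2.5 and later, a non-ASCII byte in a source file with no declared encoding is in fact rejected with a SyntaxError (the behavior specified by PEP 263) rather than printed as garbage.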
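A minimal sketch of why ASCII cannot hold these characters, written so that it should run under both Python 2.7 and Python 3. The names `text`, `utf8_bytes`, and `ascii_ok` are illustrative and do not come from the code above.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

text = u'你好'                      # a unicode object: two characters
utf8_bytes = text.encode('utf-8')   # six bytes: e4 bd a0 e5 a5 bd

try:
    utf8_bytes.decode('ascii')      # ASCII only covers bytes 0x00-0x7f
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    print('ASCII decode failed:', exc)

print(len(text), len(utf8_bytes))
```

Every byte of the UTF-8 form is above 0x7f, so the ASCII codec rejects the very first one.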
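The fix is to tell Python which encoding the file uses. For Chinese text the common choices are utf-8, gbk, and gb2312. Just add the following on the first (or second) line of the source file: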

# -*- coding: utf-8 -*-

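This tells Python that the text in this file is UTF-8 encoded. Python then reads the characters as UTF-8, and u-prefixed literals are converted to unicode objects for internal use. Plain (non-u) string literals in Python 2 stay as byte strings holding the file's original bytes.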
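The effect of the declaration can be sketched by compiling a small piece of source given as raw bytes. This is only an illustration under assumptions: it relies on Python 3's `compile()` accepting a bytes object and honoring the coding line, and the names `source`, `namespace`, and `decoded` are made up for the example.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Source code as raw bytes: the u'' literal holds the GBK bytes of u'你好',
# and the first line declares the source encoding as gbk.
source = b"# -*- coding: gbk -*-\ntext = u'\xc4\xe3\xba\xc3'\n"

namespace = {}
exec(compile(source, '<gbk-source>', 'exec'), namespace)

decoded = namespace['text']         # the literal, decoded per the declaration
print(decoded == u'你好')
```

With the declaration matching the bytes actually stored in the file, the u-prefixed literal comes out as the intended two characters.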
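However, if you run this code in a Windows console, the program runs but what appears on screen is not '你好'. The cause is a mismatch between the encoding of the bytes Python writes and the encoding the console expects. The Windows console (on a Simplified Chinese system) uses gbk, while the source uses utf-8, so UTF-8 bytes printed to a GBK console do not display as the right Chinese characters.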
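The mismatch can be reproduced without a Windows console by decoding the UTF-8 bytes as GBK by hand. A small sketch, with illustrative variable names:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

text = u'你好'
utf8_bytes = text.encode('utf-8')   # what a UTF-8 source file holds
gbk_bytes = text.encode('gbk')      # what a GBK console expects

# A GBK console interprets the UTF-8 bytes as GBK and shows mojibake.
seen_on_gbk_console = utf8_bytes.decode('gbk')

print(repr(utf8_bytes))
print(repr(gbk_bytes))
print(seen_on_gbk_console == u'浣犲ソ')
```

The six UTF-8 bytes happen to form three valid GBK characters, which is why the console shows three wrong characters instead of raising an error.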

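Solution 1: change the source encoding to gbk as well (the file itself must also be saved as GBK), that is, change the first line of the code to: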

# -*- coding: gbk -*-

Solution 2: keep the source file in utf-8 and put a u in front of '你好', that is:

test2=u'你好'
print test2

This prints '你好' correctly.
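Roughly speaking, when Python 2 prints a unicode object to a terminal it encodes the text with the terminal's encoding (`sys.stdout.encoding`), so the console receives bytes it can display. The sketch below imitates that step; `encode_for_console` is a hypothetical helper, not a standard library function, and the utf-8 fallback is an assumption made for the example.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys

def encode_for_console(text, encoding=None):
    """Hypothetical helper: encode unicode text for a console encoding."""
    # Fall back to utf-8 when stdout reports no encoding (assumption).
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    return text.encode(encoding)

print(repr(encode_for_console(u'你好', 'gbk')))     # bytes for a GBK console
print(repr(encode_for_console(u'你好', 'utf-8')))   # bytes for a UTF-8 terminal
```

The same unicode object yields different bytes for different consoles, which is what a plain byte string cannot do.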

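The u here means the string that follows is stored as a unicode object. Python uses the utf-8 encoding declared on the first line to recognize the characters '你好' in the source and converts them into a unicode object. Checking the data type with type() shows the difference: a plain '你好' literal is a str, while u'你好' is a unicode object.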

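Besides the u prefix, unicode objects can also be produced with the unicode class and with the encode and decode methods of strings. The following test code shows this (expected output on a GBK console is given in the comments):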

# -*- coding: cp936 -*-
print type("你好")
# output: <type 'str'>

print type(u"你好")
# output: <type 'unicode'>

test='你好'
print "test:",test
# output: test: 你好

test0=unicode('你好', 'gbk')
print "test0:",test0
# output: test0: 你好

test1=unicode('你好', 'gb2312')
print "test1:",test1
# output: test1: 你好

test9=unicode('你好', 'gbk').encode('utf8')
print "test9:",test9
# output: test9: 浣犲ソ

test10=unicode('你好', 'gbk').encode('gbk')
print "test10:",test10
# output: test10: 你好

test2=u'你好'.encode('utf8')
print "test2:",test2
# output: test2: 浣犲ソ

test3=u'你好'.encode('utf8').decode('gbk')
print "test3:",test3
# output: test3: 浣犲ソ

test4=u'你好'.encode('utf8').decode('utf8')
print "test4:",test4
# output: test4: 你好

test6="你好".decode('gbk')
print "test6:",test6
# output: test6: 你好

test7="你好".decode('gbk').encode('utf8')
print "test7:",test7
# output: test7: 浣犲ソ

test8="你好".decode('gbk').encode('gbk')
print "test8:",test8
# output: test8: 你好

testA='\xc4\xe3\xba\xc3'
print "testA:",testA
# output: testA: 你好

testB='\xe4\xbd\xa0\xe5\xa5\xbd'
print "testB:",testB
# output: testB: 浣犲ソ

testC='\xe4\xbd\xa0\xe5\xa5\xbd'
print "testC:",testC
# output: testC: 浣犲ソ
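The error in the title of this post fits the same picture: 0xc4 is the first byte of '你好' in GBK (the testA bytes), and decoding those bytes as UTF-8 fails at position 0. A minimal sketch, with illustrative names:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

gbk_bytes = b'\xc4\xe3\xba\xc3'     # u'你好' encoded as GBK (same bytes as testA)

try:
    gbk_bytes.decode('utf-8')       # wrong codec for these bytes
    error = None
except UnicodeDecodeError as exc:
    error = exc                     # keep a reference for inspection
    print(exc)

print(gbk_bytes.decode('gbk') == u'你好')
```

Decoding with the codec that matches the bytes (gbk here) succeeds, so the usual cure for this error is to find out which encoding the bytes are really in.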

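type('你好') returns <type 'str'>, while type(u'你好') returns <type 'unicode'>. In other words, the u prefix marks the literal as a unicode object, which lives in memory as Unicode. Without the u, the literal is just a byte string in some encoding, and which encoding depends on how Python identifies the source file's encoding, here gbk.
The unicode constructor takes a string argument and an encoding argument and decodes the string into a unicode object. Here, because the source is GBK-encoded, we pass gbk as the encoding so that the bytes are decoded into a unicode object, which then prints correctly on the console.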
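When the encoding of incoming bytes is not known in advance, one common approach is to try a short list of candidates in order. The helper below is a hypothetical sketch (the name `to_unicode` and the candidate list are assumptions), and the order matters because GBK accepts many byte sequences that are really UTF-8.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

def to_unicode(raw, encodings=('utf-8', 'gbk')):
    """Hypothetical helper: decode bytes by trying candidate encodings in order."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue                # try the next candidate
    raise ValueError('none of the candidate encodings matched')

from_gbk = to_unicode(b'\xc4\xe3\xba\xc3')            # GBK bytes
from_utf8 = to_unicode(b'\xe4\xbd\xa0\xe5\xa5\xbd')   # UTF-8 bytes
print(from_gbk == from_utf8)
```

Both byte strings end up as the same unicode object, which can then be printed or re-encoded for whatever console is in use.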

Reference: "Python中文编码问题(字符串前面加'u')" (Python Chinese encoding issues: prefixing strings with 'u')
