Encoding Issues in Python 2 and Python 3

Article link: https://blog.csdn.net/JackLiu16/article/details/78231515

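This article looks at how string encoding is handled differently in Python 2 and Python 3. It covers encodings such as ASCII, Unicode, and UTF-8, as well as str/unicode conversion and garbled text (mojibake) in Python 2. Python 2's default encoding is ASCII, while in Python 3 str is Unicode text and the default encoding is UTF-8. When working with encodings, make sure the same encoding is used for encoding and decoding, otherwise the text comes out garbled.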


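Python has two different kinds of strings: one stores text and the other stores bytes. For text, Python uses Unicode internally, while a byte string is displayed as its raw byte sequence or as ASCII.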

https://blog.csdn.net/yanghuan313/article/details/63262477/

b' '.decode()  ->  u' '

u' '.encode()  ->  b' '
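The two directions above can be checked in a few lines. This is a minimal sketch in Python 3 syntax; the variable names are illustrative and not from the original post, and the expected byte values are the ones shown in the interpreter sessions in this article.

```python
# Text -> bytes uses encode(); bytes -> text uses decode().
text = "中"

utf8_bytes = text.encode("utf-8")  # b'\xe4\xb8\xad'
gbk_bytes = text.encode("gbk")     # b'\xd6\xd0'

# Decoding with the same codec that produced the bytes restores the text.
round_trip_utf8 = utf8_bytes.decode("utf-8")
round_trip_gbk = gbk_bytes.decode("gbk")

print(repr(utf8_bytes), repr(gbk_bytes))
print(round_trip_utf8 == text, round_trip_gbk == text)
```

The same character produces different bytes under different codecs, which is why the codec has to be known before decoding.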

Where Python 2 and Python 3 conflict: should str be treated as unicode or as bytes?

decode-encode

Python 2.7.5 (default, Aug  4 2017, 00:39:18) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s=u'中'
>>> type(s)
<type 'unicode'>
>>> s1='中'
>>> type(s1)
<type 'str'>
>>> s
u'\u4e2d'
>>> s1
'\xe4\xb8\xad'
>>> s.encode('utf-8')
'\xe4\xb8\xad'
>>> s1.decode('utf-8')
u'\u4e2d'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e2d' in position 0: ordinal not in range(128)
>>> s1.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> s.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e2d' in position 0: ordinal not in range(128)
>>> s.encode('gbk')
'\xd6\xd0'
>>> s.encode('utf-16')
'\xff\xfe-N'
>>>
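The two surprising tracebacks in the session above (`s.decode('utf-8')` raising UnicodeEncodeError, and `s1.encode('utf-8')` raising UnicodeDecodeError) come from a hidden conversion: Python 2 first converts the value with the default ASCII codec, and it is that step that fails. The sketch below spells the hidden step out explicitly. It is written in Python 3 syntax, and the two helper names are hypothetical, introduced only to imitate the Python 2 behavior.

```python
def py2_style_unicode_decode(text, encoding):
    """Imitate Python 2 unicode.decode(): implicit ASCII encode, then decode."""
    return text.encode("ascii").decode(encoding)


def py2_style_str_encode(data, encoding):
    """Imitate Python 2 str.encode(): implicit ASCII decode, then encode."""
    return data.decode("ascii").encode(encoding)


errors = []

try:
    py2_style_unicode_decode("中", "utf-8")
except UnicodeEncodeError as exc:
    # Fails in the implicit encode step, before UTF-8 is ever used.
    errors.append(type(exc).__name__)

try:
    py2_style_str_encode(b"\xe4\xb8\xad", "utf-8")
except UnicodeDecodeError as exc:
    # Fails in the implicit decode step, before UTF-8 is ever used.
    errors.append(type(exc).__name__)

print(errors)
```

Pure ASCII input passes through both helpers unchanged, which is why this class of bug often goes unnoticed until the first non-ASCII character shows up.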

Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s='中'
>>> type(s)
<class 'str'>
>>> s=u'中'
>>> type(s)
<class 'str'>
>>> s.encode('utf-8')
b'\xe4\xb8\xad'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
>>> s1=b"中"
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
>>> s1=s.encode('utf-8')
>>> s1
b'\xe4\xb8\xad'
>>> s1.decode('utf-8')
'中'
>>> s1.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'
>>>
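In Python 3 the two types no longer share methods (`str` has only `encode`, `bytes` has only `decode`), so code that may receive either type has to check which one it got. One possible approach, sketched below, is to normalize values at the boundary. `ensure_text` and `ensure_bytes` are hypothetical helper names, and UTF-8 as the default codec is an assumption.

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def ensure_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is str."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


as_text = ensure_text(b"\xe4\xb8\xad")   # bytes in, str out
same_text = ensure_text("中")             # str in, returned unchanged
as_bytes = ensure_bytes("中")             # str in, bytes out

print(as_text, same_text, as_bytes)
```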

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

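In Python 2, str stores a byte string. What we usually call a "string" is actually stored as bytes.

You need to know which encoding was used to produce those bytes in order to decode them with the correct one.

The type that gets converted into a byte string is unicode: it has an encode method, and once encoded, the result is stored as a str.
How the text is finally displayed depends on what is done with the byte string after it is read back from storage: which encoding is used to decode it,
and how many rounds of encoding and decoding it goes through.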
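A short illustration of why the codec must match, written as a sketch in Python 3 syntax. The sample text and variable names are made up for the example; `errors="replace"` is used so that the mismatched decode returns garbled text rather than raising.

```python
text = "中文"
data = text.encode("utf-8")  # the bytes that would be stored or sent

# Same codec on both sides: the text comes back intact.
correct = data.decode("utf-8")

# Different codec: the bytes are reinterpreted and the result is garbled.
garbled = data.decode("gbk", errors="replace")

# A strict ASCII decode cannot handle these bytes at all.
try:
    data.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(correct == text, garbled == text, strict_failed)
```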

Traceback (most recent call last):
  File "test.py", line 28, in <module>
    fp.write("%d:%s\r\n"%(sClassid,sClassName))
UnicodeEncodeError: 'ascii' codec can't encode character u'\uff08' in position 12: ordinal not in range(128)
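The failing `fp.write` call formats a unicode value and writes it to a byte-oriented file, so Python 2 falls back to ASCII and fails on the full-width parenthesis `u'\uff08'`. One common way around this, sketched here under the assumption that the output file should be UTF-8, is to open the file with an explicit encoding and write text to it. `io.open` accepts an `encoding` argument in both Python 2.6+ and Python 3. Only the names `sClassid` and `sClassName` come from the traceback; the sample values and the file path are made up.

```python
import io
import os
import tempfile

# Hypothetical sample values; the class name contains u'\uff08' like the one in the traceback.
sClassid = 12
sClassName = u"类别（一）"

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "classes.txt")

# The file object encodes text to UTF-8 itself, so no implicit ASCII step occurs.
# newline="" keeps the explicit "\r\n" from being translated again on Windows.
with io.open(path, "w", encoding="utf-8", newline="") as fp:
    fp.write(u"%d:%s\r\n" % (sClassid, sClassName))

# Read the raw bytes back to see what was actually stored.
with io.open(path, "rb") as fp:
    raw = fp.read()

print(repr(raw))
```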

#! /usr/bin/python
# -*- coding: utf-8 -*-
import sys
print sys.getdefaultencoding()

Running the program above prints:

ascii

So that is the cause. Add the following at the top of the program:

import sys
reload(sys)  # Python 2 only: restores setdefaultencoding, which is removed from sys at startup
sys.setdefaultencoding('utf-8')

Python 2.7 processes character streams as ASCII by default. When a stream contains anything outside the ASCII range, it raises an exception (ordinal not in range(128)).
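For comparison, the same check can be run under Python 3, where the default encoding is UTF-8 and the `sys.setdefaultencoding` hook does not exist. The snippet below is a small sketch meant to be run with a Python 3 interpreter; the variable names are illustrative.

```python
import sys

# In Python 3 this reports 'utf-8' rather than 'ascii'.
default_encoding = sys.getdefaultencoding()

# The Python 2 workaround has no Python 3 equivalent: the setter is gone.
has_setter = hasattr(sys, "setdefaultencoding")

print(default_encoding, has_setter)
```

Because nothing is converted implicitly in Python 3, mixing `str` and `bytes` fails immediately with a TypeError or AttributeError instead of working for ASCII and breaking later on other characters.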

Python 2's default encoding is ASCII, and internally it automatically performs an implicit ASCII conversion first.
Python 2 tries to be clever by