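Python has two distinct kinds of string: one stores text, the other stores bytes. Text is held internally as Unicode, while a byte string is displayed as its raw byte sequence, with bytes in the ASCII range shown as ASCII characters.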
https://blog.csdn.net/yanghuan313/article/details/63262477/
b'...'  --decode()-->  u'...'
u'...'  --encode()-->  b'...'
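A minimal sketch of the two arrows above, assuming Python 3 (where the text type is str and the u prefix is optional). The variable names are only illustrative.

```python
# Text -> bytes with encode(), bytes -> text with decode().
text = u'中'                       # text: a sequence of Unicode code points
data = text.encode('utf-8')        # bytes: the UTF-8 encoding of that text
roundtrip = data.decode('utf-8')   # back to text again

print(repr(data))
print(roundtrip == text)
```

Encoding and decoding with the same codec is lossless, which is why the round trip compares equal.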
Where Python 2 and Python 3 disagree is whether str means Unicode text or bytes.
Python 2.7.5 (default, Aug 4 2017, 00:39:18)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s=u'中'
>>> type(s)
<type 'unicode'>
>>> s1='中'
>>> type(s1)
<type 'str'>
>>> s
u'\u4e2d'
>>> s1
'\xe4\xb8\xad'
>>> s.encode('utf-8')
'\xe4\xb8\xad'
>>> s1.decode('utf-8')
u'\u4e2d'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e2d' in position 0: ordinal not in range(128)
>>> s1.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> s.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e2d' in position 0: ordinal not in range(128)
>>> s.encode('gbk')
'\xd6\xd0'
>>> s.encode('utf-16')
'\xff\xfe-N'
>>>
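The two surprising tracebacks in the Python 2 session above (an encode error from decode(), and a decode error from encode()) come from a hidden ASCII conversion. The sketch below imitates that behaviour in Python 3; py2_style_decode, py2_style_encode and error_name are hypothetical helpers written for illustration, not part of any library, and the hidden step is assumed to use the ASCII default codec.

```python
def py2_style_decode(text, encoding):
    """Hypothetical helper: mimic Python 2's unicode.decode()."""
    raw = text.encode('ascii')    # hidden step: fails for non-ASCII text
    return raw.decode(encoding)


def py2_style_encode(raw, encoding):
    """Hypothetical helper: mimic Python 2's str.encode()."""
    text = raw.decode('ascii')    # hidden step: fails for non-ASCII bytes
    return text.encode(encoding)


def error_name(func, *args):
    """Return the name of the Unicode error raised by func, or None."""
    try:
        func(*args)
    except UnicodeError as exc:
        return type(exc).__name__
    return None


print(error_name(py2_style_decode, u'中', 'utf-8'))
print(error_name(py2_style_encode, b'\xe4\xb8\xad', 'utf-8'))
```

With pure ASCII input both helpers succeed, which is why this class of bug tends to stay hidden until non-ASCII data shows up.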
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s='中'
>>> type(s)
<class 'str'>
>>> s=u'中'
>>> type(s)
<class 'str'>
>>> s.encode('utf-8')
b'\xe4\xb8\xad'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
>>> s1=b"中"
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
>>> s1=s.encode('utf-8')
>>> s1
b'\xe4\xb8\xad'
>>> s1.decode('utf-8')
'中'
>>> s1.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'
>>>
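Since the two sessions above show that only bytes can be decoded and only text can be encoded, code that must run on both versions often normalizes values at its boundaries. This is one possible sketch; to_text, to_bytes and text_type are hypothetical names, and UTF-8 as the default codec is an assumption.

```python
import sys

# Python 2 calls the text type `unicode`; Python 3 calls it `str`.
# The conditional only evaluates `unicode` on Python 2.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_text(value, encoding='utf-8'):
    """Decode bytes to text; return text unchanged."""
    if isinstance(value, bytes) and not isinstance(value, text_type):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Encode text to bytes; return bytes unchanged."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


print(repr(to_bytes(u'中')))
print(to_text(b'\xe4\xb8\xad') == u'中')
```

Because each helper checks the type first, calling it on a value that is already in the target form is harmless, which avoids the AttributeError seen in the Python 3 session.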
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
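In Python 2, str stores a byte string: what we usually call a "string" is actually held as bytes.
To get the text back, you have to know which encoding produced those bytes, so that you can decode them with the same one.
The type that gets converted into a byte string is unicode; it has an encode method, and the encoded result is stored as a str.
How the data finally displays depends on what is done with the byte string after it is read back: which encoding is used to decode it,
and how many rounds of encoding and decoding it has been through.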
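To illustrate why the decoding codec has to match the encoding codec, here is a small sketch (Python 3 syntax). Using latin-1 as the "wrong" codec is purely an assumption for the demonstration, chosen because it accepts every byte value.

```python
text = u'中'
utf8_bytes = text.encode('utf-8')
gbk_bytes = text.encode('gbk')

# Decoding with a different codec can fail outright...
try:
    gbk_bytes.decode('utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True

# ...or silently produce the wrong characters.
garbled = utf8_bytes.decode('latin-1')

# Undoing the wrong step, then decoding correctly, recovers the text.
recovered = garbled.encode('latin-1').decode('utf-8')

print(failed, garbled != text, recovered == text)
```

Recovery only works here because latin-1 maps every byte to exactly one character; with most other wrong codecs, information is lost after the first bad round.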
Traceback (most recent call last):
  File "test.py", line 28, in <module>
    fp.write("%d:%s\r\n"%(sClassid,sClassName))
UnicodeEncodeError: 'ascii' codec can't encode character u'\uff08' in position 12: ordinal not in range(128)
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
print sys.getdefaultencoding()
Running the program above prints:
ascii
So that explains it. Add the following at the top of the program:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
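Python 2.7 falls back on ASCII whenever it has to convert between str and unicode implicitly; if the data contains anything outside the ASCII range, the conversion raises an exception (ordinal not in range(128)).
Python 2's default encoding is ASCII, and, trying to be clever, it automatically performs this implicit ASCII encode or decode step first.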
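An alternative to changing the process-wide default (sys.setdefaultencoding does not exist in Python 3) is to name the encoding explicitly where the text leaves the program. The sketch below is one way to do that with io.open, which accepts an encoding argument on both Python 2 and Python 3. The sample values, the file name classes.txt and the variable names are hypothetical stand-ins for the sClassid and sClassName in the traceback.

```python
import io
import os
import tempfile

class_id = 7                         # hypothetical sample values
class_name = u'Math\uff08A\uff09'    # contains full-width parentheses

# Build the line as text, not bytes.
line = u"%d:%s\r\n" % (class_id, class_name)

path = os.path.join(tempfile.mkdtemp(), 'classes.txt')

# The file object encodes to UTF-8 on write; newline='' keeps the
# explicit \r\n untouched on every platform.
with io.open(path, 'w', encoding='utf-8', newline='') as fp:
    fp.write(line)

# Read the raw bytes back to see what was stored.
with io.open(path, 'rb') as fp:
    written = fp.read()

print(repr(written))
```

With the codec stated at the file boundary, no implicit ASCII conversion is involved, so the full-width characters are written without an error.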