Encoding Issues in Python 2 and Python 3

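This article looks at how Python 2 and Python 3 differ in handling string encodings, covering ASCII, Unicode, and UTF-8, the conversion between str and unicode in Python 2, and garbled output (mojibake). Python 2's default encoding is ASCII, while in Python 3 str is Unicode text and the default encoding is UTF-8. It also reminds readers to encode and decode with the same encoding to avoid garbled text.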

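Python has two different kinds of strings: one stores text and the other stores bytes. Text is stored internally as Unicode, while a byte string holds a raw byte sequence, which is displayed as ASCII characters where possible and as escape codes otherwise.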

https://blog.csdn.net/yanghuan313/article/details/63262477/

 

b' '  --decode()-->  u' '

u' '  --encode()-->  b' '
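The two arrows above can be sketched in Python 3, where text is str and raw data is bytes. This is only a minimal illustration; the variable names are made up for the example.

```python
# Python 3: text (str) <-> bytes round trip.
text = '\u4e2d'                    # the character 中, written as an escape
data = text.encode('utf-8')        # encode(): text -> bytes
restored = data.decode('utf-8')    # decode(): bytes -> text

print(type(text).__name__, type(data).__name__)  # str bytes
print(data)                                      # b'\xe4\xb8\xad'
print(restored == text)                          # True
```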

Where Python 2 and Python 3 conflict: should str be treated as unicode or as bytes?

decode-encode

Python 2.7.5 (default, Aug  4 2017, 00:39:18) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s=u'中'
>>> type(s)
<type 'unicode'>
>>> s1='中'
>>> type(s1)
<type 'str'>
>>> s
u'\u4e2d'
>>> s1
'\xe4\xb8\xad'
>>> s.encode('utf-8')
'\xe4\xb8\xad'
>>> s1.decode('utf-8')
u'\u4e2d'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e2d' in position 0: ordinal not in range(128)
>>> s1.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> s.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e2d' in position 0: ordinal not in range(128)
>>> s.encode('gbk')
'\xd6\xd0'
>>> s.encode('utf-16')
'\xff\xfe-N'
>>> 
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s='中'
>>> type(s)
<class 'str'>
>>> s=u'中'
>>> type(s)
<class 'str'>
>>> s.encode('utf-8')
b'\xe4\xb8\xad'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
>>> s1=b"中"
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
>>> s1=s.encode('utf-8')
>>> s1
b'\xe4\xb8\xad'
>>> s1.decode('utf-8')
'中'
>>> s1.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'
>>>
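Since str means bytes in Python 2 and text in Python 3, code that has to run on both often normalizes values at its boundaries. The following is a rough sketch of that idea, not an established API: the helper names to_text and to_bytes and the UTF-8 default are assumptions made for this example.

```python
import sys

PY2 = sys.version_info[0] == 2
# "unicode" only exists on Python 2; the conditional expression never
# evaluates that name on Python 3.
text_type = unicode if PY2 else str  # noqa: F821


def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it if it is a byte string."""
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)


def to_bytes(value, encoding='utf-8'):
    """Return value as bytes, encoding it if it is text."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


print(to_text(b'\xe4\xb8\xad') == u'\u4e2d')   # True
print(to_bytes(u'\u4e2d') == b'\xe4\xb8\xad')  # True
```

Values that are already in the requested form are returned unchanged, so the helpers never trigger the implicit ASCII conversion seen in the Python 2 session above.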

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

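In Python 2, str stores a byte string. What we usually call a "string" is actually stored as bytes.

To decode those bytes correctly, you have to know which encoding produced them.

The type that gets converted into a raw byte string is unicode. It has an encode method, and the encoded result is stored as a str.
How the text is finally displayed depends on what happens to the bytes after they are read back: which encoding is used to decode them,
and how many rounds of encoding and decoding they have been through.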
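A small Python 3 sketch of why the decoding codec must match the encoding codec. The choice of latin-1 as the "wrong" codec is only for illustration, because it accepts every byte and therefore produces garbled text instead of an error.

```python
text = '\u4e2d'                     # the character 中
utf8_bytes = text.encode('utf-8')   # b'\xe4\xb8\xad'
gbk_bytes = text.encode('gbk')      # b'\xd6\xd0'

# Decoding with the codec that produced the bytes restores the text.
print(utf8_bytes.decode('utf-8') == text)   # True
print(gbk_bytes.decode('gbk') == text)      # True

# Decoding with a different codec can silently yield garbled text ...
garbled = utf8_bytes.decode('latin-1')
print(ascii(garbled))                       # '\xe4\xb8\xad'

# ... or fail outright when the bytes are invalid for that codec.
try:
    gbk_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    print('cannot decode:', exc.reason)
```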



Traceback (most recent call last):
  File "test.py", line 28, in <module>
    fp.write("%d:%s\r\n"%(sClassid,sClassName))
UnicodeEncodeError: 'ascii' codec can't encode character u'\uff08' in position 12: ordinal not in range(128)

#! /usr/bin/python
# -*- coding: utf-8 -*-
import sys
print sys.getdefaultencoding()

Running the program above prints:

ascii

 

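So that is the cause. Adding the following at the top of the program changes the default encoding to UTF-8. Note that this works only on Python 2 and is widely discouraged, because it changes implicit conversions for the whole process; explicit encode() and decode() calls are more predictable.

import sys
reload(sys)
sys.setdefaultencoding('utf-8')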
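For comparison, the same check can be run on Python 3, where the workaround above is neither needed nor available. This is a quick sketch for Python 3 only.

```python
import sys

default = sys.getdefaultencoding()
print(default)                              # utf-8 on Python 3

# Python 3 removed sys.setdefaultencoding entirely.
print(hasattr(sys, 'setdefaultencoding'))   # False on Python 3
```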

 

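Python 2.7 uses ASCII whenever it converts implicitly between str and unicode. When the data falls outside the ASCII range, it raises an exception (ordinal not in range(128)).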
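One common way to avoid the fp.write failure shown earlier, without touching the default encoding, is to open the file with an explicit encoding and write text to it. The sketch below uses io.open, which behaves the same on Python 2.7 and Python 3. The values of sClassid and sClassName and the file name are hypothetical, since the original data is not shown in the post.

```python
import io
import os
import tempfile

# Hypothetical sample values; the original data is not shown in the post.
sClassid = 7
sClassName = u'\u6570\u5b66\uff08\u4e00\uff09'   # contains U+FF08, a fullwidth "("

path = os.path.join(tempfile.mkdtemp(), 'classes.txt')

# The file object encodes text to UTF-8 itself, so no implicit ASCII step occurs.
# newline='' keeps the explicit "\r\n" from being translated again on Windows.
with io.open(path, 'w', encoding='utf-8', newline='') as fp:
    fp.write(u"%d:%s\r\n" % (sClassid, sClassName))

# Read the raw bytes back to confirm what was written.
with io.open(path, 'rb') as fp:
    raw = fp.read()
print(raw.decode('utf-8') == u"7:\u6570\u5b66\uff08\u4e00\uff09\r\n")   # True
```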



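Python 2's default encoding is ASCII, and when str and unicode are mixed it silently runs an ASCII encode or decode first. That is why calling decode() on a unicode object raises UnicodeEncodeError, and calling encode() on a str raises UnicodeDecodeError.
Python 2 is trying to be too clever.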
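The hidden ASCII step can be approximated in Python 3 to see where the two surprising errors in the Python 2 session come from. This is only an emulation under the assumption that the default encoding is ASCII; the function names are invented for the example and are not part of any library.

```python
def py2_style_decode(text, encoding):
    """Emulate Python 2's unicode.decode(): ASCII-encode first, then decode."""
    return text.encode('ascii').decode(encoding)


def py2_style_encode(data, encoding):
    """Emulate Python 2's str.encode(): ASCII-decode first, then encode."""
    return data.decode('ascii').encode(encoding)


# Pure ASCII input passes through the hidden step unnoticed.
print(py2_style_decode('abc', 'utf-8'))     # abc

# Non-ASCII input fails in the hidden step, not in the requested one.
for func, arg in ((py2_style_decode, '\u4e2d'),
                  (py2_style_encode, b'\xe4\xb8\xad')):
    try:
        func(arg, 'utf-8')
    except UnicodeError as exc:
        print(func.__name__, '->', type(exc).__name__)
```

The loop reports UnicodeEncodeError for the decode call and UnicodeDecodeError for the encode call, matching the tracebacks in the Python 2 session.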
