Official documentation: https://docs.python.org/2/howto/unicode.html
Python's default encoding
Python’s default encoding is the ‘ascii’ encoding. The rules for converting a Unicode string into the ASCII encoding are simple; for each code point:
- If the code point is < 128, each byte is the same as the value of the code point.
- If the code point is 128 or greater, the Unicode string can’t be represented in this encoding. (Python raises a UnicodeEncodeError exception in this case.)
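The two rules above can be tried directly. This is a minimal sketch; the helper name to_ascii is made up for illustration, and returning None on failure is just a convenient way to show the outcome without a traceback.

```python
def to_ascii(text):
    """Encode text as ASCII; return None if any code point is >= 128."""
    try:
        return text.encode('ascii')
    except UnicodeEncodeError:
        # Raised when a code point cannot be represented in ASCII.
        return None

plain = to_ascii(u'abc')           # every code point is < 128
accented = to_ascii(u'caf\u00e9')  # U+00E9 is >= 128, so encoding fails

print(repr(plain))
print(repr(accented))
```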
How to use Unicode characters
In Python source code, Unicode literals are written as strings prefixed with the ‘u’ or ‘U’ character: u'abcdefghijk'. Specific code points can be written using the \u escape sequence, which is followed by four hex digits giving the code point. The \U escape sequence is similar, but expects 8 hex digits, not 4.
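As a small illustration of the two escape forms, the sketch below writes the same code point (U+0394, chosen arbitrarily) both ways. The variable names are illustrative only.

```python
delta_short = u'\u0394'      # \u is followed by exactly four hex digits
delta_long = u'\U00000394'   # \U is followed by exactly eight hex digits

# Both literals denote the same single-character string.
print(delta_short == delta_long)
print(len(delta_short))
print(hex(ord(delta_short)))
```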
How to prevent garbled text in Python: declare the encoding of the source file
Python supports writing Unicode literals in any encoding, but you have to declare the encoding being used. This is done by including a special comment, such as # -*- coding: latin-1 -*-, as either the first or second line of the source file.
The syntax is inspired by Emacs’s notation for specifying variables local to a file. Emacs supports many different variables, but Python only supports ‘coding’. The -*- symbols indicate to Emacs that the comment is special; they have no significance to Python but are a convention. Python looks for coding: name or coding=name in the comment.
If you don’t include such a comment, the default encoding used will be ASCII.
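To make the lookup rule concrete, here is a rough sketch of how such a declaration could be detected. It is not the interpreter's actual implementation: the regular expression is modeled on the one given in PEP 263, the function name find_declared_encoding is hypothetical, and the 'ascii' fallback mirrors the Python 2 behaviour described above.

```python
import re

# Pattern modeled on PEP 263; simplified here for illustration.
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')

def find_declared_encoding(source_text, default='ascii'):
    """Return the encoding named in the first two lines, else the default."""
    for line in source_text.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return default

emacs_style = "#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nx = 1\n"
plain_style = "# coding=utf-8\nx = 1\n"
no_comment = "x = 1\n"

print(find_declared_encoding(emacs_style))
print(find_declared_encoding(plain_style))
print(find_declared_encoding(no_comment))
```

The -*- markers play no part in the match, which is why the bare coding=utf-8 form is recognised as well.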
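The decode method of strings
The decode method of a Python byte string interprets the bytes using the specified encoding and returns the corresponding Unicode string.
For example: content.decode('utf-8')
Changing Python's default encoding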
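A short sketch of decode in action, assuming the bytes really are UTF-8. The six bytes below are the UTF-8 form of the two characters U+4E2D and U+6587; the variable names are illustrative only.

```python
raw_bytes = b'\xe4\xb8\xad\xe6\x96\x87'  # six UTF-8 bytes, two characters
text = raw_bytes.decode('utf-8')         # bytes -> Unicode string

print(len(raw_bytes))
print(len(text))

# Encoding with the same codec gives back the original bytes.
round_trip = text.encode('utf-8')
print(round_trip == raw_bytes)
```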
#!/usr/bin/env python
#encoding: utf-8
import sys  # binds the sys module; this is not the first time sys is loaded
reload(sys)  # reload sys
sys.setdefaultencoding('utf8')  # call setdefaultencoding
The code above runs correctly, but the following code fails with an AttributeError:
#!/usr/bin/env python
#encoding: utf-8
import sys
sys.setdefaultencoding('utf8')
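You must reload the sys module before calling setdefaultencoding. The import statement here is not the first import of sys; it may well be the second or third, so it only binds a reference to the module that is already loaded. Only reload actually loads the module again.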
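Why is a reload needed, when a plain import does not make the function callable? Because setdefaultencoding is deleted from sys after it has been called during interpreter startup, so it is already gone by the time your import binds the module. Reloading sys makes setdefaultencoding available again, and only then can the code change the interpreter's current default encoding.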
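In the Lib folder of the Python installation directory there is a file named site.py, in which you can trace the call chain main() --> setencoding() --> sys.setdefaultencoding(encoding). Because site.py is loaded automatically every time the Python interpreter starts, main() runs on every startup, and setdefaultencoding has already been deleted by the time your own code runs.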
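The behaviour described above can be checked from a running interpreter. This is a minimal sketch, assuming a normal startup in which site.py has been loaded; the variable names are illustrative only. On Python 3 the function does not exist at all, so has_setter is False there too.

```python
import sys

# A second import only binds the module object that is already loaded,
# so it cannot bring back an attribute that was deleted at startup.
import sys as sys_again
same_object = sys_again is sys

# After startup, setdefaultencoding is no longer an attribute of sys.
has_setter = hasattr(sys, 'setdefaultencoding')

# The current default can still be read.
current_default = sys.getdefaultencoding()

print(same_object)
print(has_setter)
print(current_default)
```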