Learning Python: Encodings

Latest recommended article published on 2016-10-01 15:45:05

sentimental_dog

Category: Machine Learning

Article link: https://blog.csdn.net/sentimental_dog/article/details/52614295


Official documentation: https://docs.python.org/2/howto/unicode.html

Python's default encoding

Python’s default encoding is the ‘ascii’ encoding. The rules for converting a Unicode string into the ASCII encoding are simple; for each code point:

If the code point is < 128, each byte is the same as the value of the code point.
If the code point is 128 or greater, the Unicode string can’t be represented in this encoding. (Python raises a UnicodeEncodeError exception in this case.)
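The two rules above can be checked with a short sketch. The variable names are illustrative only, and the snippet is written to behave the same way on Python 2 and Python 3:

```python
# Code points below 128 map to a single byte with the same value.
text_ok = u'abc'
encoded = text_ok.encode('ascii')
byte_values = list(bytearray(encoded))
code_points = [ord(ch) for ch in text_ok]

# A code point of 128 or greater (here U+00E9) cannot be encoded as ASCII.
try:
    u'abcd\xe9'.encode('ascii')
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

print(byte_values, code_points, encode_failed)
```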

UTF-8 is one of the most commonly used encodings. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that 8-bit numbers are used in the encoding. (There’s also a UTF-16 encoding, but it’s less frequently used than UTF-8.)
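As a small illustration of UTF-8 being built from 8-bit units, the sketch below encodes three sample characters (chosen here only for the example) and counts how many bytes each one needs:

```python
# 'a' (U+0061), e-acute (U+00E9) and the euro sign (U+20AC).
samples = [u'a', u'\xe9', u'\u20ac']

# Each character becomes a sequence of one or more 8-bit bytes.
utf8_bytes = [s.encode('utf-8') for s in samples]
utf8_lengths = [len(b) for b in utf8_bytes]

# Decoding with the same encoding restores the original text.
round_trip = [b.decode('utf-8') for b in utf8_bytes]

print(utf8_lengths)
```

ASCII characters stay one byte long, while characters with larger code points take more bytes.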

How to use Unicode characters

In Python source code, Unicode literals are written as strings prefixed with the ‘u’ or ‘U’ character: u'abcdefghijk'. Specific code points can be written using the \u escape sequence, which is followed by four hex digits giving the code point. The \U escape sequence is similar, but expects 8 hex digits, not 4.
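A minimal sketch of the two escape forms, using U+00E9 as an arbitrary sample code point (the u prefix is also accepted by Python 3.3 and later):

```python
# Four hex digits with \u, eight hex digits with \U: same code point.
short_form = u'\u00e9'
long_form = u'\U000000e9'

same_character = short_form == long_form
code_point = ord(short_form)

print(same_character, hex(code_point))
```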

How to avoid garbled text in Python: declare the encoding of the source file

Python supports writing Unicode literals in any encoding, but you have to declare the encoding being used. This is done by including a special comment as either the first or second line of the source file:

 
#!/usr/bin/env python
# -*- coding: latin-1 -*-

u = u'abcdé'
print ord(u[-1])  # Python 2 print statement; prints 233, the code point of é

The syntax is inspired by Emacs’s notation for specifying variables local to a file. Emacs supports many different variables, but Python only supports ‘coding’. The -*- symbols indicate to Emacs that the comment is special; they have no significance to Python but are a convention. Python looks for coding: name or coding=name in the comment.

If you don’t include such a comment, the default encoding used will be ASCII.
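To make the matching rule concrete, here is a rough sketch that scans the first two lines of a file for a coding declaration. The regular expression is the one given in PEP 263. The helper name declared_encoding is hypothetical, and the real tokenizer applies extra rules (for example BOM handling) that this sketch leaves out:

```python
import re

# Pattern from PEP 263: a comment containing "coding:" or "coding=".
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def declared_encoding(source_lines):
    """Return the declared encoding from the first two lines, or None."""
    for line in source_lines[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


emacs_style = declared_encoding(['#!/usr/bin/env python',
                                 '# -*- coding: latin-1 -*-'])
plain_style = declared_encoding(['#encoding: utf-8'])
no_declaration = declared_encoding(['import sys'])
too_late = declared_encoding(['import os', 'import sys', '# coding: utf-8'])

print(emacs_style, plain_style, no_declaration, too_late)
```

Both the Emacs-style comment and the shorter "#encoding: utf-8" form used later in this post match the same pattern.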

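The decode method of strings

In Python 2, the decode method of a byte string (str) interprets its bytes using the specified encoding and returns a unicode object.

For example: content.decode('utf-8')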
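A short sketch of what decode does, assuming content holds UTF-8 encoded bytes (the sample bytes below are made up for the example and spell the two characters U+4E2D U+6587):

```python
# Six UTF-8 bytes that encode two Chinese characters.
content = b'\xe4\xb8\xad\xe6\x96\x87'

text = content.decode('utf-8')

# Decoding with the wrong codec fails instead of silently producing garbage.
try:
    content.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(len(content), len(text), ascii_failed)
```

The byte string has length 6 while the decoded text has length 2, which shows that decode turns bytes into characters rather than converting one byte encoding into another.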

Changing Python's default encoding

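#!/usr/bin/env python
# encoding: utf-8
import sys   # binds the name sys; this is not the first time the sys module is loaded
reload(sys)  # reload sys so that setdefaultencoding is available again
sys.setdefaultencoding('utf8')  # call setdefaultencoding (Python 2 only)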

The code above runs correctly, but the following code fails with an AttributeError:

#!/usr/bin/env python    
#encoding: utf-8  
import sys     
sys.setdefaultencoding('utf8')

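You must reload the sys module before calling setdefaultencoding. The import statement here is not the first import of sys; by this point the module may already have been imported two or three times. The statement only binds a reference to the module that is already loaded, and only reload actually loads it again.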

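So why is a reload needed, and why can't the function be called right after a plain import? Because setdefaultencoding is deleted from sys during interpreter startup, after the system has called it. By the time your code imports sys, the function is already gone. Reloading sys makes setdefaultencoding available again, and only then can your code change the interpreter's current default string encoding.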

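In the Lib folder of the Python installation directory there is a file named site.py. In it you can find the call chain main() --> setencoding() --> sys.setdefaultencoding(encoding). Because site.py is loaded automatically every time the Python interpreter starts, main() runs on every startup, so setdefaultencoding has already been deleted by the time your own code runs.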
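The delete-then-reload behaviour can be reproduced without touching sys. The sketch below is only an analogy: it writes a throwaway module (the name demo_site_mod and its function set_default are hypothetical) to a temporary directory, deletes the function the way site.py deletes setdefaultencoding, and shows that a second import does not bring it back while a reload does.

```python
import os
import sys
import tempfile

try:
    from importlib import reload  # Python 3.4+
except ImportError:
    pass  # Python 2: reload() is a builtin

# Create a tiny module on disk that defines a single function.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "demo_site_mod.py"), "w") as handle:
    handle.write("def set_default():\n    return 'called'\n")
sys.path.insert(0, module_dir)

import demo_site_mod              # first import: the module body runs
del demo_site_mod.set_default     # mimic site.py removing the function

# A second import only returns the cached module; nothing is re-executed.
import demo_site_mod as second_reference
missing_after_import = not hasattr(second_reference, "set_default")

# reload() re-executes the module body, so the function is defined again.
reload(demo_site_mod)
restored_after_reload = hasattr(demo_site_mod, "set_default")

print(missing_after_import, restored_after_reload)
```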
