python学习之 编码

官方文档 :https://docs.python.org/2/howto/unicode.html

python 默认编码 

Python’s default encoding is the ‘ascii’ encoding. The rules for converting a Unicode string into the ASCII encoding are simple; for each code point:

  1. If the code point is < 128, each byte is the same as the value of the code point.
  2. If the code point is 128 or greater, the Unicode string can’t be represented in this encoding. (Python raises a UnicodeEncodeError exception in this case.)
UTF-8 is one of the most commonly used encodings. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that 8-bit numbers are used in the encoding. (There’s also a UTF-16 encoding, but it’s less frequently used than UTF-8.)


如何使用unicode 字符

In Python source code, Unicode literals are written as strings prefixed with the ‘u’ or ‘U’ character:u'abcdefghijk'. Specific code points can be written using the \u escape sequence, which is followed by four hex digits giving the code point. The \U escape sequence is similar, but expects 8 hex digits, not 4.


如何防止python出现乱码:改变python文件默认编码格式

Python supports writing Unicode literals in any encoding, but you have to declare the encoding being used. This is done by including a special comment as either the first or second line of the source file:

#!/usr/bin/env python
# -*- coding: latin-1 -*-

u = u'abcdé'
print ord(u[-1])

The syntax is inspired by Emacs’s notation for specifying variables local to a file. Emacs supports many different variables, but Python only supports ‘coding’. The -*- symbols indicate to Emacs that the comment is special; they have no significance to Python but are a convention. Python looks for coding: nameor coding=name in the comment.

If you don’t include such a comment, the default encoding used will be ASCII.


字符串的decode 方法

python字符串的decode方法能把制定的文档转化为需要的编码

比如content.decode('utf-8')


关于更改python的默认编码

#!/usr/bin/env python    
#encoding: utf-8  
import sys   #引用sys模块进来,并不是进行sys的第一次加载  
reload(sys)  #重新加载sys  
sys.setdefaultencoding('utf8')  ##调用setdefaultencoding函数

可以正确的执行,可是下面的代码会出错

#!/usr/bin/env python    
#encoding: utf-8  
import sys     
sys.setdefaultencoding('utf8')

要在调用setdefaultencoding时必须要先reload一次sys模块,因为这里的import语句其实并不是sys的第一次导入语句,也就是说这里其实可能是第二、三次进行sys模块的import,这里只是一个对sys的引用,只能reload才能进行重新加载。

那么为什么要重新加载,而直接引用过来则不能调用该函数呢?因为setdefaultencoding函数在被系统调用后被删除了,所以通过import引用进来时其实已经没有了,所以必须reload一次sys模块,这样setdefaultencoding才会为可用,才能在代码里修改解释器当前的字符编码。

在python安装目录的Lib文件夹下,有一个叫site.py的文件,在里面可以找到main() --> setencoding()-->sys.setdefaultencoding(encoding),因为这个site.py每次启动python解释器时会自动加载,所以main函数每次都会被执行,setdefaultencoding函数一出来就已经被删除了。



  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值