Python Tips 31 [unicode and bytes]

Latest recommended article published on 2020-12-10 06:25:02

weixin_33739541

Tags: python

I. String types in Python 3

bytearray([source[, encoding[, errors]]])

Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256.

bytes([source[, encoding[, errors]]])

Return a new "bytes" object, which is an immutable sequence of integers in the range 0 <= x < 256. bytes is an immutable version of bytearray.

str([object[, encoding[, errors]]])

Return a string version of an object. In Python 3, str is a Unicode string by default.

The basestring type from Python 2.x no longer exists in Python 3.
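To make the three types concrete, here is a minimal sketch (Python 3 only). The variable names are illustrative and do not come from the original post.

```python
import builtins

text = "abc"                  # str: immutable sequence of Unicode code points
raw = bytes("abc", "ascii")   # bytes: immutable sequence of ints in range(256)
buf = bytearray(raw)          # bytearray: mutable counterpart of bytes

buf[0] = ord("x")             # allowed, because bytearray is mutable

try:
    raw[0] = ord("x")         # bytes does not support item assignment
    bytes_is_immutable = False
except TypeError:
    bytes_is_immutable = True

first_byte = raw[0]           # indexing bytes yields an int (97 for "a")

# basestring was a Python 2 builtin; Python 3 does not define it
has_basestring = hasattr(builtins, "basestring")
```

After this runs, buf holds bytearray(b'xbc'), raw is unchanged, and has_basestring is False.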

II. Examples

# -*- coding: gbk -*-

def TestisStrOrUnicdeOrString():
    bs = b'Hello'
    ustr = 'abc'
    print(isinstance(bs, str))           # False
    print(isinstance(bs, bytes))         # True
    print(isinstance(ustr, str))         # True
    print(isinstance(ustr, bytes))       # False
    print(isinstance(bs, (bytes, str)))  # True

def TestChinese():
    us = '中国'
    bs = b'AAA'
    bs2 = bytes('中国', 'gbk')

    print(us + ':' + str(type(us)))  # 中国:<class 'str'>
    print(bs)                        # b'AAA'
    print(bs2)                       # b'\xd6\xd0\xb9\xfa'
    print(':' + str(type(bs2)))      # :<class 'bytes'>
    print(bs2.decode('gbk'))         # 中国

    # TypeError: Can't convert 'bytes' object to str implicitly
    # (the exact wording varies between Python 3 versions)
    # newstr = us + bs2

    print('us == bs2' + ':' + str(us == bs2))  # us == bs2:False

    s3 = 'AAA中国'
    print(s3)  # AAA中国

    s4 = bytes('AAA中国', 'gbk')
    print(s4)  # b'AAA\xd6\xd0\xb9\xfa'

def TestPrint():
    print('AAA' + '中国')  # AAA中国
    # print(b'AAA' + b'中国')  # SyntaxError: bytes can only contain ASCII literal characters.
    # print('AAA' + bytes('中国', 'gbk'))  # TypeError: Can't convert 'bytes' object to str implicitly

def TestCodecs():
    import codecs

    look = codecs.lookup('gbk')

    a = bytes('北京', 'gbk')

    print(len(a), a, type(a))  # 4 b'\xb1\xb1\xbe\xa9' <class 'bytes'>

    # decode() returns a tuple: (decoded str, number of bytes consumed)
    b = look.decode(a)
    print(b[1], b[0], type(b[0]))  # 4 北京 <class 'str'>


if __name__ == '__main__':
    TestisStrOrUnicdeOrString()
    TestChinese()
    TestPrint()
    TestCodecs()
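As a small complement to TestCodecs above, the following sketch shows that the CodecInfo object returned by codecs.lookup offers both directions, and that its encode result matches str.encode. The variable names are illustrative only.

```python
import codecs

info = codecs.lookup("gbk")               # CodecInfo for the GBK codec

# Both calls return a tuple: (result, amount of input consumed)
encoded, consumed = info.encode("北京")    # consumed counts characters
decoded, used = info.decode(encoded)      # used counts bytes

# The codec object and the str method produce the same bytes
same_as_method = encoded == "北京".encode("gbk")
```

Here encoded is b'\xb1\xb1\xbe\xa9', consumed is 2 (characters), and used is 4 (bytes). In everyday code str.encode and bytes.decode are usually enough; codecs.lookup is mainly useful when the codec object itself is needed.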

III. Summary

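1) Python 3 assumes that source code, i.e. the .py file, is encoded as UTF-8. In Python 2 the default encoding of a .py file is ASCII. A declaration such as # -*- coding: windows-1252 -*- changes the encoding used to read the file. If a .py file that contains Chinese string literals is saved as GBK (as some editors on Chinese Windows do), it must declare # -*- coding: gbk -*-. The default UTF-8 is sufficient for Chinese only when the file really is saved as UTF-8; the declaration has to match the encoding the file is stored in.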

2) In Python 3, str is Unicode by default; use str.encode to convert it to bytes.

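3) In Python 3, print does not decode bytes. Printing a bytes object shows its literal form, for example b'\xd6\xd0\xb9\xfa', rather than the characters it encodes, so decode the bytes with the correct encoding before printing.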

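4) str and bytes cannot be concatenated, and they cannot be meaningfully compared: + raises TypeError, == always returns False, and ordering comparisons such as < raise TypeError.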

5) The codecs module can still be used to convert between str and bytes.

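6) A bytes literal may contain only ASCII characters, so non-ASCII bytes have to be produced by encoding, for example bytes('中国','gbk'), or written with escapes such as b'\xd6\xd0\xb9\xfa'.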

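7) At first it seemed that GBK decoding only worked on a Chinese system or with a Chinese language pack installed. In fact the gbk codec ships with Python and does not depend on the operating system language; what depends on the system is whether the console or terminal can display the decoded Chinese characters.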
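Points 2, 3, 4 and 6 above can be checked with a short sketch. The helper raises_type_error and all variable names are made up for this illustration; they are not part of the original post or of any library.

```python
us = "中国"
bs = us.encode("gbk")                 # point 2: str -> bytes
bs_literal = b"\xd6\xd0\xb9\xfa"      # point 6: the same bytes written with escapes
shown = repr(bs)                      # point 3: the form print(bs) displays
back = bs.decode("gbk")               # bytes -> str with the matching codec


def raises_type_error(func):
    """Hypothetical helper: report whether calling func raises TypeError."""
    try:
        func()
    except TypeError:
        return True
    return False


# point 4: mixing str and bytes
concat_fails = raises_type_error(lambda: us + bs)   # + raises TypeError
order_fails = raises_type_error(lambda: us < bs)    # < raises TypeError
equal = (us == bs)                                  # == is simply False

# Decoding with the wrong codec fails instead of guessing
try:
    bs.decode("utf-8")
    wrong_codec_fails = False
except UnicodeDecodeError:
    wrong_codec_fails = True

# errors="replace" substitutes U+FFFD for bytes that cannot be decoded
lossy = bs.decode("utf-8", errors="replace")
```

The wrong-codec case is a reminder that the encoding used for decoding must be the one that produced the bytes; errors="replace" avoids the exception but loses information.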

For character and encoding conversion in Python 2.6, see: http://www.cnblogs.com/itech/archive/2011/03/27/1996883.html

The end!
