Handling Chinese characters in Python

weixin_39838829

Published 2020-11-24 10:48:04

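Summary (generated automatically by CSDN): This article explains how to handle files that contain Chinese characters correctly in Python. Through examples it shows how to declare UTF-8 encoding at the top of the file, and how str and unicode strings are converted, displayed, and stored. It particularly emphasizes how non-ASCII characters are handled and why the encoding matters.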

1. Using Chinese characters in a .py file

The file unicode.py contains the following:

# -*- coding:utf-8 -*-

str_ch = '我们women'
uni_ch = u'我们women'

print "type:", type(str_ch), "content:", str_ch, repr(str_ch)
print "type:", type(uni_ch), "content:", uni_ch, repr(uni_ch)

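Python 2 assumes a source file is ASCII, so the declaration "# -*- coding: utf-8 -*-" must appear on the first line of the file (PEP 263 also accepts the second line). Without it, running the script raises the following exception: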

SyntaxError: Non-ASCII character '\xe6' in file unicode.py on line 3, but no encoding declared;
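To see that the declaration really controls how the source bytes are read, here is a small Python 3 sketch. It relies on the built-in compile(), which honors a PEP 263 coding line when given bytes. The names body, utf8_source and latin1_source are illustrative only. Python 3 assumes UTF-8 when no declaration is present, so it would not raise the Python 2 error above; declaring a different encoding shows the effect instead.

```python
# The same bytes for "s = '我们'", once under a UTF-8 declaration
# and once under a Latin-1 declaration (illustrative sources).
body = "s = '我们'\n".encode("utf-8")
utf8_source = b"# -*- coding: utf-8 -*-\n" + body
latin1_source = b"# -*- coding: latin-1 -*-\n" + body

utf8_ns = {}
exec(compile(utf8_source, "unicode.py", "exec"), utf8_ns)

latin1_ns = {}
exec(compile(latin1_source, "unicode.py", "exec"), latin1_ns)

# UTF-8: the six bytes are decoded into the two characters 我 and 们.
print(ascii(utf8_ns["s"]))    # '\u6211\u4eec'
# Latin-1: each byte is read as a separate character (mojibake).
print(ascii(latin1_ns["s"]))  # '\xe6\x88\x91\xe4\xbb\xac'
```

The bytes on disk are identical in both cases. Only the declared encoding changes, and it alone decides which characters the interpreter sees.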

With the encoding declared, the output is:

type: <type 'str'> content: 我们women '\xe6\x88\x91\xe4\xbb\xacwomen'
type: <type 'unicode'> content: 我们women u'\u6211\u4eecwomen'
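The same two kinds of string exist in Python 3 under different names. bytes plays the role of Python 2's str, and str plays the role of Python 2's unicode. The following Python 3 sketch reproduces the example above. It uses ascii() instead of printing the text directly, so the output stays ASCII-only whatever the terminal encoding is.

```python
# Python 3: bytes ~ Python 2 str, str ~ Python 2 unicode.
str_ch = "我们women".encode("utf-8")   # like '我们women' in a UTF-8 Python 2 file
uni_ch = "我们women"                   # like u'我们women'

print("type:", type(str_ch), "content:", repr(str_ch))
# type: <class 'bytes'> content: b'\xe6\x88\x91\xe4\xbb\xacwomen'
print("type:", type(uni_ch), "content:", ascii(uni_ch))
# type: <class 'str'> content: '\u6211\u4eecwomen'

# 7 characters, but 11 bytes once encoded (3 bytes per Chinese character).
print(len(uni_ch), len(str_ch))        # 7 11
```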

Running "od -t c unicode.py" shows the bytes of the file as they are stored on disk:

0000000 # - * - c o d i n g : u t f
0000020 - 8 - * - \n \n s t r _ c h =
0000040 ' 346 210 221 344 273 254 w o m e n ' \n u
0000060 n i _ c h = u ' 346 210 221 344 273 254
0000100 w o m e n ' \n \n p r i n t " t
0000120 y p e : " , t y p e ( s t r _
0000140 c h ) , " c o n t e n t : " ,
0000160 s t r _ c h , r e p r ( s
0000200 t r _ c h ) \n p r i n t " t y
0000220 p e : " , t y p e ( u n i _ c
0000240 h ) , " c o n t e n t : " ,
0000260 u n i _ c h , r e p r ( u n
0000300 i _ c h ) \n

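Note: 346 is octal. Octal 346 = hex e6 = decimal 230, which is the same byte '\xe6' named in the SyntaxError above. The six octal values 346 210 221 344 273 254 are the UTF-8 encoding of 我们.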
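The octal numbers in the dump are simply the UTF-8 bytes of 我们. This Python 3 sketch prints each byte in octal, the way od shows it, and in hex, the way repr() shows it. The variable names are illustrative.

```python
data = "我们".encode("utf-8")   # the six bytes stored on disk

octal = [format(b, "03o") for b in data]   # od -t c style
hexa = [format(b, "02x") for b in data]    # repr() style (\xe6, \x88, ...)

print(" ".join(octal))   # 346 210 221 344 273 254
print(" ".join(hexa))    # e6 88 91 e4 bb ac
print(0o346 == 0xe6 == 230)   # True
```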

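This shows that the Chinese characters are stored on disk as UTF-8. When the script runs, the Python interpreter reads the source into memory and uses the declared encoding to interpret the non-ASCII bytes. The u'' literal is decoded into code points, while the plain '' literal keeps the raw UTF-8 bytes, as the repr() output above shows.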
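Reading source bytes is ordinary decoding. This Python 3 sketch decodes one line of unicode.py (the bytes are built in memory for illustration) with the correct codec and with ASCII. ASCII is the codec Python 2 falls back to when no declaration is present. The ASCII attempt fails on byte 0xe6, the same byte reported in the SyntaxError.

```python
# Illustrative bytes of one line of unicode.py as read from disk.
raw = "str_ch = '我们women'\n".encode("utf-8")

# Decoding with the declared encoding recovers the characters.
text = raw.decode("utf-8")

# Decoding with ASCII fails at the first non-ASCII byte.
position = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    position = err.start
    print(position, hex(raw[position]))   # 10 0xe6
```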

2. The str and unicode string types in Python 2

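Unicode identifies each character by a code point, which is an integer. Code points range from 0 to 0x10FFFF, so they do not always fit in 16 bits. A "narrow" Python 2 build stores characters above 0xFFFF as two 16-bit units. A unicode string is therefore a sequence of code points.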
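A quick Python 3 check of these ranges. ord() gives a character's code point, and chr() refuses anything above 0x10FFFF. In Python 3 a character above 0xFFFF is a single element of the string; a narrow Python 2 build would report length 2 for the same character.

```python
text = "我们"
points = [ord(ch) for ch in text]
print([hex(p) for p in points])        # ['0x6211', '0x4eec']

# A code point above 0xFFFF (U+1F600) is still one character in Python 3.
emoji = "\U0001F600"
print(hex(ord(emoji)), len(emoji))     # 0x1f600 1

# The largest valid code point is 0x10FFFF; chr() rejects anything larger.
try:
    chr(0x110000)
except ValueError:
    print("0x110000 is not a valid code point")
```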

A unicode literal can be written in any of these forms:

uni_str = u"我们"
uni_str = u"\xac"        # 2 hex digits
uni_str = u"\u1234"      # 4 hex digits
uni_str = u"\U00008000"  # 8 hex digits
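The three escape widths are just different spellings of the same code point. Python 3 string literals accept the same escapes as Python 2 u'' literals, so this sketch checks the equivalences directly.

```python
# Three escape styles for code point U+00AC.
a = "\xac"          # 2 hex digits
b = "\u00ac"        # 4 hex digits
c = "\U000000ac"    # 8 hex digits
print(a == b == c, hex(ord(a)))    # True 0xac

print("\U00008000" == "\u8000")    # True

# Escapes and literal characters produce the same code points.
print("\u6211\u4eec" == "我们")      # True
```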

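A str is a sequence of 8-bit bytes, each with a value from 0 to 255. A one-byte str can be written as:

s = '0'
s = '\x30'   # hex escape
s = '\060'   # octal escape
s = chr(48)
# ord(s) is 48 in every case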
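In Python 3 these byte strings are bytes literals. One difference applies here: chr(48) now returns the text string '0', so bytes([48]) is the closest equivalent of the last form. This sketch checks that the forms are equal and that every element must fit in 8 bits.

```python
# Python 3 bytes literals mirror Python 2 str literals.
forms = [b"0", b"\x30", b"\060", bytes([48])]
print(all(f == b"0" for f in forms))   # True
print([f[0] for f in forms])           # [48, 48, 48, 48]
print(ord(b"0"))                       # 48

# Each element must fit in 8 bits (0-255).
try:
    bytes([256])
except ValueError:
    print("256 does not fit in a byte")
```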

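Encoding converts a unicode string into a sequence of bytes (each 0-255). Decoding is the reverse: it turns bytes back into a unicode string.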
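A round trip in Python 3 shows both directions. It also shows that the byte sequence depends on which codec is chosen. GBK, a common Chinese encoding available in the standard library, is used here only as a comparison.

```python
text = "我们women"

encoded = text.encode("utf-8")      # unicode -> bytes
decoded = encoded.decode("utf-8")   # bytes -> unicode

print(len(text), len(encoded))      # 7 11
print(decoded == text)              # True

# Another encoding yields different bytes for the same text:
# GBK uses 2 bytes per Chinese character instead of 3.
print(len(text.encode("gbk")))      # 9
```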

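Python 2 uses ASCII as its default codec for both encoding and decoding (sys.getdefaultencoding() returns 'ascii'). As a result, any conversion that relies on the default raises UnicodeEncodeError or UnicodeDecodeError as soon as it meets a value of 128 or above.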
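Python 3 no longer converts implicitly between bytes and text, and its default encoding is UTF-8. The 128 boundary can still be seen by asking for the ASCII codec explicitly, as in this sketch. Values up to 127 pass; 128 fails in both directions.

```python
print("women".encode("ascii"))      # b'women'
print(chr(127).encode("ascii"))     # b'\x7f'

encode_reason = decode_reason = None

# 128 is the first value outside ASCII, in either direction.
try:
    chr(128).encode("ascii")
except UnicodeEncodeError as err:
    encode_reason = err.reason
    print("encode failed:", encode_reason)   # encode failed: ordinal not in range(128)

try:
    bytes([128]).decode("ascii")
except UnicodeDecodeError as err:
    decode_reason = err.reason
    print("decode failed:", decode_reason)   # decode failed: ordinal not in range(128)
```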
