Unicode in Python: Unicode ranges and how to generate all Unicode characters in Python

Latest recommended article published 2023-02-06 06:30:00

weixin_39942033


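Unicode ranges and the languages they represent

Unicode is a universal character set. Its code space runs from U+0000 to U+10FFFF (1,114,112 code points); the Basic Multilingual Plane, U+0000 to U+FFFF, holds 65,536 of them. When a computer stores or transmits text, including every character outside the ASCII table, it stores the Unicode code points in some encoding such as UTF-8 or UTF-16. Unifying character sets took many people a great deal of effort, and incompatibilities between encodings still cause problems today, but for everyday code a grasp of the basics is enough.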
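To make the difference between a code point and its encoded bytes concrete, here is a minimal sketch using only Python 3 built-ins. The sample character '在' is the one used in this article's own test call; the variable names are illustrative assumptions.

```python
import sys

# Highest code point Python 3 can represent (the top of the Unicode code space).
max_code_point = sys.maxunicode

ch = '在'
code_point = ord(ch)  # integer code point of the character

# The same code point stored under two different encodings.
utf8_bytes = ch.encode('utf-8')
utf16_bytes = ch.encode('utf-16-be')

print(hex(max_code_point), hex(code_point), utf8_bytes, utf16_bytes)
```

The character is one code point, but it occupies three bytes in UTF-8 and two in UTF-16, which is why the encoding has to be chosen explicitly when writing to a file.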

For the Unicode ranges that correspond to each language, see:

http://www.cnblogs.com/chenwenbiao/archive/2011/08/17/2142718.html

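Range for Chinese (ideographs shared with Japanese and Korean): the CJK Unified Ideographs block, U+4E00 to U+9FFF. The test call later in this article uses 0x4e00 to 0x9fbf.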
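As a rough illustration of working with such a range, the sketch below tests whether a character falls in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). The helper name `is_cjk_unified` and the constants are assumptions made for this example, and the check deliberately ignores the CJK extension blocks.

```python
import unicodedata

# Basic CJK Unified Ideographs block only; extension blocks are not covered.
CJK_START, CJK_END = 0x4E00, 0x9FFF

def is_cjk_unified(ch):
    """Return True if the single character ch lies in the basic CJK block."""
    return CJK_START <= ord(ch) <= CJK_END

for ch in ('在', 'a', 'あ'):
    # unicodedata.name() returns the official character name, or the default.
    print(ch, is_cjk_unified(ch), unicodedata.name(ch, '<unnamed>'))
```

Hiragana such as 'あ' is Japanese but sits in a different block, so a range check like this identifies ideographs, not the language of the text.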

Generating all Unicode characters in Python

Python 2 version:

    import traceback

    def print_unicode(start, end):
        # Write one line per code point: index, hex code point, character.
        with open('unicode_set.txt', 'w') as f:
            loc_start = start
            while loc_start <= end:
                try:
                    ustr = hex(loc_start)[2:]
                    od = (4 - len(ustr)) * '0' + ustr  # left-pad with zeros
                    ustr = unichr(loc_start)  # same character as u'\u' + od
                    index = loc_start - start + 1
                    # In Python 2 the file takes a byte str, so encode first.
                    f.write(str(index) + '\t' + '0x' + od + '\t' + ustr.encode('utf-8', 'ignore') + '\n')
                    loc_start = loc_start + 1
                except Exception as e:
                    traceback.print_exc()
                    loc_start += 1
        print(loc_start)

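Python 3 changed how text is handled: str and unicode were merged into a single str type, the unicode type and the unichr built-in were removed, and bytes took the place of Python 2's str. As a result, the code above does not work in Python 3.

The Python 3 version is as follows: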
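The str/bytes split can be seen in a few lines. This is a minimal sketch; `try_write` is a hypothetical helper introduced only for the demonstration, and in-memory streams from `io` stand in for real files.

```python
import io

text = chr(0x5728)           # Python 3: chr() covers what unichr() did in Python 2
data = text.encode('utf-8')  # str -> bytes

def try_write(stream, value):
    """Return True if the write succeeds, False if the stream rejects the type."""
    try:
        stream.write(value)
        return True
    except TypeError:
        return False

results = [
    try_write(io.StringIO(), text),  # text stream, str
    try_write(io.StringIO(), data),  # text stream, bytes
    try_write(io.BytesIO(), data),   # binary stream, bytes
    try_write(io.BytesIO(), text),   # binary stream, str
]
print(results)
```

A text stream accepts only str and a binary stream accepts only bytes, so the file mode and the type passed to `write` have to match.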

    import traceback

    def print_unicode3(start, end):
        # Open in binary mode ('wb'): each line is encoded to bytes before
        # it is written, and a text-mode file would reject bytes.
        with open('unicode_set.txt', 'wb') as f:
            loc_start = start
            while loc_start <= end:
                try:
                    tmpstr = hex(loc_start)[2:]
                    od = (4 - len(tmpstr)) * '0' + tmpstr  # left-pad with zeros
                    ustr = chr(loc_start)
                    index = loc_start - start + 1
                    line = (str(index) + '\t' + '0x' + od + '\t' + ustr + '\r\n').encode('utf-8')
                    f.write(line)
                    loc_start = loc_start + 1
                except Exception as e:
                    traceback.print_exc()
                    loc_start += 1
        print(loc_start)

    def expect_test(expected, actual):
        # Print a message only when the two values differ.
        if expected != actual:
            print('expected', expected, 'actual', actual)

    # Test:
    print_unicode3(0x4e00, 0x9fbf)
    expect_test('在', '\u5728')
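An alternative, offered only as a sketch and not as the author's method, is to open the file in text mode with an explicit encoding and let Python do the encoding. The function names `format_code_points` and `write_code_points` are assumptions for this example; the output format (index, hex code point, character, separated by tabs) mirrors the code above.

```python
def format_code_points(start, end):
    """Yield one 'index<TAB>0xXXXX<TAB>char' string per code point in [start, end]."""
    for index, cp in enumerate(range(start, end + 1), start=1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8, so skip them
        yield '{}\t0x{:04x}\t{}'.format(index, cp, chr(cp))

def write_code_points(path, start, end):
    # Text mode with an explicit encoding; newline='\r\n' makes each '\n'
    # come out as '\r\n', matching the line endings used above.
    with open(path, 'w', encoding='utf-8', newline='\r\n') as f:
        for line in format_code_points(start, end):
            f.write(line + '\n')
```

Calling `write_code_points('unicode_set.txt', 0x4e00, 0x9fbf)` would produce the same kind of file. Skipping surrogates up front replaces the try/except, since they are the only code points that `chr` accepts but UTF-8 refuses to encode.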

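Generated output

[Figure: generated output for the Chinese range]

As the output shows, some characters cannot be displayed, typically because the code point is unassigned or the font in use has no glyph for it.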
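One way to narrow down which code points are unlikely to display is to look at their Unicode general category. The sketch below is an assumption-laden heuristic: the set `NON_DISPLAYABLE` and both function names are made up for this example, and it cannot detect missing glyphs in a particular font.

```python
import unicodedata

# General categories treated here as not displayable:
# Cn = unassigned, Cs = surrogate, Co = private use, Cc = control.
NON_DISPLAYABLE = {'Cn', 'Cs', 'Co', 'Cc'}

def is_probably_displayable(cp):
    """Heuristic: True if the code point's category is outside NON_DISPLAYABLE."""
    return unicodedata.category(chr(cp)) not in NON_DISPLAYABLE

def count_displayable(start, end):
    """Count code points in [start, end] that pass the heuristic."""
    return sum(1 for cp in range(start, end + 1) if is_probably_displayable(cp))

print(count_displayable(0x4e00, 0x9fbf))
```

The result depends on the Unicode version bundled with the Python interpreter (see `unicodedata.unidata_version`), because code points that were unassigned in older versions may be assigned in newer ones.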
