Unicode

Open http://inamidst.com/stuff/unidata/ to browse Unicode code points and the characters they map to.

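Clicking a character takes you to http://www.fileformat.info, which shows detailed information about that character, including its Unicode Data, its Encodings, and how to write it in HTML, C, C++, Java, and Python source code.
For example, here is the information for the dollar sign: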
Unicode Data
Name DOLLAR SIGN
Block Basic Latin
Category Symbol, Currency [Sc]
Combine 0
BIDI European Number Terminator [ET]
Mirror N
Index entries milreis; DOLLAR SIGN; escudo
Comments milreis, escudo; glyph may have one or two vertical bars; other currency symbol characters: U+20A0-U+20B8
See Also currency sign U+00A4; heavy dollar sign U+1F4B2
Version Unicode 1.1.0 (June, 1993)
Encodings
HTML Entity (decimal) &#36;
HTML Entity (hex) &#x24;
How to type in Microsoft Windows Alt +0024; Alt 036; Alt 36
UTF-8 (hex) 0x24 (24)
UTF-8 (binary) 00100100
UTF-16 (hex) 0x0024 (0024)
UTF-16 (decimal) 36
UTF-32 (hex) 0x00000024 (0024)
UTF-32 (decimal) 36
C/C++/Java source code "\u0024"
Python source code u"\u0024"
Java Data
string.toUpperCase() $
string.toLowerCase() $
Character.UnicodeBlock BASIC_LATIN
Character.charCount() 1
Character.getDirectionality() DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR [5]
Character.getNumericValue() -1
Character.getType() 26
Character.isDefined() Yes
Character.isDigit() No
Character.isIdentifierIgnorable() No
Character.isISOControl() No
Character.isJavaIdentifierPart() Yes
Character.isJavaIdentifierStart() Yes
Character.isLetter() No
Character.isLetterOrDigit() No
Character.isLowerCase() No
Character.isMirrored() No
Character.isSpaceChar() No
Character.isSupplementaryCodePoint() No
Character.isTitleCase() No
Character.isUnicodeIdentifierPart() No
Character.isUnicodeIdentifierStart() No
Character.isUpperCase() No
Character.isValidCodePoint() Yes
Character.isWhitespace() No
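
Several of the fields listed above can be reproduced locally. The following is a minimal Python 3 sketch, not the method the site itself uses: it assumes the standard-library unicodedata module, and the dictionary names properties and encodings are chosen here purely for illustration.

```python
import unicodedata

ch = "\u0024"  # DOLLAR SIGN

# Properties taken from the Unicode Character Database bundled with Python.
properties = {
    "name": unicodedata.name(ch),
    "category": unicodedata.category(ch),
    "combining": unicodedata.combining(ch),
    "bidirectional": unicodedata.bidirectional(ch),
    "mirrored": unicodedata.mirrored(ch),
}

# Encoded forms. The "-be" codecs are used so that no byte order mark
# is prepended to the UTF-16 and UTF-32 output.
encodings = {
    "html_decimal": "&#{};".format(ord(ch)),
    "html_hex": "&#x{:X};".format(ord(ch)),
    "utf8_hex": ch.encode("utf-8").hex(),
    "utf16_hex": ch.encode("utf-16-be").hex(),
    "utf32_hex": ch.encode("utf-32-be").hex(),
}

for table in (properties, encodings):
    for key, value in table.items():
        print(key, value)
```

The category and bidirectional values come back as the short codes shown in square brackets in the listing (Sc and ET), and mirrored is returned as 0 or 1 rather than N or Y.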

Wikipedia's explanation of "code point":
In character encoding terminology, a code point or code position is any of
the numerical values that make up the code space (or code page). [1]

For example, ASCII comprises 128 code points in the range 0 to 7F (hex),
Extended ASCII comprises 256 code points in the range 0 to FF (hex), and
Unicode comprises 1,114,112 code points in the range 0 to 10FFFF (hex).
The Unicode code space is divided into seventeen planes (the Basic Multilingual
Plane and 16 supplementary planes), each with 65,536 (= 2^16) code points.
Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.
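
The arithmetic in the quoted passage can be checked with a few lines of Python 3. This is only an illustrative sketch: the constant names and the plane_of helper are hypothetical and not part of any library.

```python
PLANE_SIZE = 2 ** 16                        # code points per plane
PLANE_COUNT = 17                            # the BMP plus 16 supplementary planes
CODE_SPACE_SIZE = PLANE_COUNT * PLANE_SIZE  # total number of code points
MAX_CODE_POINT = CODE_SPACE_SIZE - 1        # highest code point value


def plane_of(ch):
    """Return the plane number of a single character (0 is the BMP)."""
    return ord(ch) // PLANE_SIZE


print(CODE_SPACE_SIZE)             # 1114112
print(hex(MAX_CODE_POINT))         # 0x10ffff
print(plane_of("$"))               # 0: U+0024 lies in the Basic Multilingual Plane
print(plane_of("\U0001F4B2"))      # 1: U+1F4B2 (heavy dollar sign) is supplementary
```

Integer division by the plane size is enough here because each plane is a contiguous block of 65,536 code points.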

In Python, you can obtain a character from its Unicode name. For example, the name 'dollar sign'
yields the dollar symbol:
----------------------------------------------------------------------------------------------------------
>>> dollar = u"\N{dollar sign}"
>>> print(dollar)
$

----------------------------------------------------------------------------------------------------------
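
The \N{...} escape only works when the name is known at the time the source is written. When the name arrives at run time, the standard-library unicodedata module offers the same lookup, plus the reverse direction. The sketch below assumes Python 3; safe_lookup is a hypothetical helper added here for illustration.

```python
import unicodedata

# Name -> character, resolved at run time instead of in a string literal.
dollar = unicodedata.lookup("DOLLAR SIGN")
print(dollar)

# Character -> name, and the conventional U+XXXX notation for its code point.
print(unicodedata.name(dollar))
print("U+{:04X}".format(ord(dollar)))


def safe_lookup(name, default=None):
    """Hypothetical helper: return the named character, or default if unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # unicodedata.lookup raises KeyError for names it does not recognise.
        return default


print(safe_lookup("NO SUCH CHARACTER NAME", "?"))
```

Which names are recognised depends on the Unicode version bundled with the interpreter, so very recently added characters may be missing on older Python releases.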
