Unicode

feixingfei

Original article: https://blog.csdn.net/feixingfei/article/details/7012328

Open http://inamidst.com/stuff/unidata/ to browse Unicode code points and their corresponding characters.

Clicking a character takes you to http://www.fileformat.info, which shows detailed information about that character: its Unicode data, its encodings, and how to write it in HTML, C/C++, Java, and Python source code.

For example, here is the information for the dollar sign:

Unicode Data
Name	DOLLAR SIGN
Block	Basic Latin
Category	Symbol, Currency [Sc]
Combine	0
BIDI	European Number Terminator [ET]
Mirror	N
Index entries	milreis, DOLLAR SIGN, escudo
Comments	milreis, escudo; glyph may have one or two vertical bars; other currency symbol characters: U+20A0-U+20B8
See Also	currency sign U+00A4; heavy dollar sign U+1F4B2
Version	Unicode 1.1.0 (June, 1993)
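Most of the Unicode Data rows above come from the Unicode Character Database, which Python ships as the standard `unicodedata` module. The sketch below reads the same properties for any character; `unicode_data` is a hypothetical helper name. The module does not expose the Block, Index entries, Comments, See Also, or Version fields, so they are left out, and the values returned depend on the Unicode version bundled with your Python.

```python
import unicodedata


def unicode_data(ch: str) -> dict:
    """Hypothetical helper: collect the UCD properties that unicodedata exposes."""
    return {
        "Name": unicodedata.name(ch),
        "Category": unicodedata.category(ch),   # e.g. 'Sc' = Symbol, Currency
        "Combine": unicodedata.combining(ch),   # canonical combining class
        "BIDI": unicodedata.bidirectional(ch),  # e.g. 'ET' = European Number Terminator
        "Mirror": "Y" if unicodedata.mirrored(ch) else "N",
    }


for key, value in unicode_data("$").items():
    print(f"{key}\t{value}")
```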

Encodings
HTML Entity (decimal)	&#36;
HTML Entity (hex)	&#x24;
How to type in Microsoft Windows	Alt +0024, Alt 036, or Alt 36
UTF-8 (hex)	0x24 (24)
UTF-8 (binary)	00100100
UTF-16 (hex)	0x0024 (0024)
UTF-16 (decimal)	36
UTF-32 (hex)	0x00000024 (0024)
UTF-32 (decimal)	36
C/C++/Java source code	"\u0024"
Python source code	u"\u0024"
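Every row in the Encodings table is derived from the code point alone (U+0024 = 36 decimal), so it can be recomputed locally. A minimal sketch, assuming big-endian byte order without a BOM for UTF-16/UTF-32 (which matches how the site prints the hex values); `encodings` is a hypothetical helper name, and the HTML rows are the decimal and hexadecimal character references `&#36;` and `&#x24;`.

```python
def encodings(ch: str) -> dict:
    """Hypothetical helper: rebuild fileformat.info-style encoding rows for one character."""
    cp = ord(ch)                 # the code point as an integer
    utf8 = ch.encode("utf-8")
    return {
        "HTML Entity (decimal)": f"&#{cp};",
        "HTML Entity (hex)": f"&#x{cp:X};",
        "UTF-8 (hex)": utf8.hex(),
        "UTF-8 (binary)": " ".join(f"{b:08b}" for b in utf8),
        "UTF-16 (hex)": ch.encode("utf-16-be").hex(),  # big-endian, no BOM
        "UTF-32 (hex)": ch.encode("utf-32-be").hex(),  # big-endian, no BOM
        # Source-code escape: \uXXXX inside the BMP, \UXXXXXXXX above it
        "Python source code": f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}",
    }


for key, value in encodings("$").items():
    print(f"{key}\t{value}")
```

The dollar sign is ASCII, so its UTF-8 form is a single byte; a character such as "€" (U+20AC) shows the multi-byte case (`e282ac`).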

Java Data
string.toUpperCase()	$
string.toLowerCase()	$
Character.UnicodeBlock	BASIC_LATIN
Character.charCount()	1
Character.getDirectionality()	DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR [5]
Character.getNumericValue()	-1
Character.getType()	26
Character.isDefined()	Yes
Character.isDigit()	No
Character.isIdentifierIgnorable()	No
Character.isISOControl()	No
Character.isJavaIdentifierPart()	Yes
Character.isJavaIdentifierStart()	Yes
Character.isLetter()	No
Character.isLetterOrDigit()	No
Character.isLowerCase()	No
Character.isMirrored()	No
Character.isSpaceChar()	No
Character.isSupplementaryCodePoint()	No
Character.isTitleCase()	No
Character.isUnicodeIdentifierPart()	No
Character.isUnicodeIdentifierStart()	No
Character.isUpperCase()	No
Character.isValidCodePoint()	Yes
Character.isWhitespace()	No
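Python's `str` methods answer similar questions to the Java `Character` methods above, though the definitions are not identical, so treat the pairing below as an approximate mapping rather than an exact equivalence. One visible difference: Java accepts `$` in identifiers (`isJavaIdentifierStart()` is Yes), while Python's `str.isidentifier()` rejects it.

```python
import unicodedata

ch = "$"
checks = {
    "upper()": ch.upper(),                                  # cf. toUpperCase()
    "lower()": ch.lower(),                                  # cf. toLowerCase()
    "UTF-16 code units": len(ch.encode("utf-16-le")) // 2,  # cf. Character.charCount()
    "numeric(ch, -1)": unicodedata.numeric(ch, -1),         # cf. getNumericValue(); -1 = no numeric value
    "isdigit()": ch.isdigit(),                              # cf. isDigit()
    "isalpha()": ch.isalpha(),                              # cf. isLetter()
    "isalnum()": ch.isalnum(),                              # cf. isLetterOrDigit()
    "islower()": ch.islower(),                              # cf. isLowerCase()
    "isupper()": ch.isupper(),                              # cf. isUpperCase()
    "isspace()": ch.isspace(),                              # cf. isWhitespace()
    "isidentifier()": ch.isidentifier(),                    # False: '$' is not valid in Python identifiers
}

for name, value in checks.items():
    print(f"{name}\t{value}")
```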

Wikipedia's explanation of a code point:

In character encoding terminology, a code point or code position is any of the numerical values that make up the code space (or code page).[1] For example, ASCII comprises 128 code points in the range 0x00 to 0x7F, Extended ASCII comprises 256 code points in the range 0x00 to 0xFF, and Unicode comprises 1,114,112 code points in the range 0x0 to 0x10FFFF. The Unicode code space is divided into seventeen planes (the Basic Multilingual Plane and 16 supplementary planes), each with 65,536 (= 2¹⁶) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.
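The arithmetic in that quote can be checked directly, and because each plane holds exactly 2¹⁶ code points, a code point's plane is just its value shifted right by 16 bits. A small sketch; `plane` is a hypothetical helper, and the `sys.maxunicode` check assumes Python 3.3 or later, where strings always cover the full code space.

```python
import sys

PLANE_SIZE = 2 ** 16                  # 65,536 code points per plane
NUM_PLANES = 17                       # Basic Multilingual Plane + 16 supplementary planes
CODE_SPACE = NUM_PLANES * PLANE_SIZE  # total number of Unicode code points

assert 0x7F + 1 == 128 and 0xFF + 1 == 256     # ASCII and Extended ASCII sizes
assert CODE_SPACE == 1_114_112 == 0x10FFFF + 1  # Unicode size
assert sys.maxunicode == 0x10FFFF               # highest code point Python accepts


def plane(cp: int) -> int:
    """Hypothetical helper: return the plane number (0 = BMP) of a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is outside the Unicode code space")
    return cp >> 16


print(plane(0x24))     # DOLLAR SIGN: 0 (BMP)
print(plane(0x1F4B2))  # HEAVY DOLLAR SIGN: 1 (a supplementary plane)
```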

In Python you can obtain a character from its Unicode name with the \N{...} escape. For example, the name 'dollar sign' gives the dollar symbol:

----------------------------------------------------------------------------------------------------------

>>> dollar = "\N{dollar sign}"   # Python 3; in Python 2 write u"\N{dollar sign}" and print dollar
>>> print(dollar)
$

----------------------------------------------------------------------------------------------------------
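The \N{...} escape is resolved when the source is compiled. To look a name up at run time, or to go from a character back to its name, the `unicodedata` module provides `lookup()` and `name()`. A short sketch; the lowercase lookup relies on CPython matching names case-insensitively, the same behaviour that lets "\N{dollar sign}" work, and an unknown name raises `KeyError`.

```python
import unicodedata

dollar = unicodedata.lookup("dollar sign")        # run-time equivalent of "\N{dollar sign}"
print(dollar, unicodedata.name(dollar))           # $ DOLLAR SIGN

heavy = unicodedata.lookup("HEAVY DOLLAR SIGN")   # the U+1F4B2 entry from "See Also"
print(f"U+{ord(heavy):04X}")                      # U+1F4B2

try:
    unicodedata.lookup("NO SUCH CHARACTER")       # deliberately invalid name
except KeyError as exc:
    print("unknown name:", exc)
```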
