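CharacterData is an abstract class that declares a number of abstract methods for querying character properties. The concrete implementations live in its subclasses, such as CharacterData00, CharacterData01, and CharacterDataLatin1. Many methods in the Character class have counterparts here, and each CharacterData subclass implements the abstract methods to determine the properties of the characters it covers. The caller does not care which concrete class ends up doing the work. If more supplementary characters are added later, all that is needed is another subclass of the abstract class and a small change to the of() method. That is the benefit of using the strategy pattern.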
package java.lang;

abstract class CharacterData {
    abstract int getProperties(int ch);
    abstract int getType(int ch);
    abstract boolean isWhitespace(int ch);
    abstract boolean isMirrored(int ch);
    abstract boolean isJavaIdentifierStart(int ch);
    abstract boolean isJavaIdentifierPart(int ch);
    abstract boolean isUnicodeIdentifierStart(int ch);
    abstract boolean isUnicodeIdentifierPart(int ch);
    abstract boolean isIdentifierIgnorable(int ch);
    abstract int toLowerCase(int ch);
    abstract int toUpperCase(int ch);
    abstract int toTitleCase(int ch);
    abstract int digit(int ch, int radix);
    abstract int getNumericValue(int ch);
    abstract byte getDirectionality(int ch);

    //need to implement for JSR204
    int toUpperCaseEx(int ch) {
        return toUpperCase(ch);
    }

    char[] toUpperCaseCharArray(int ch) {
        return null;
    }

    boolean isOtherLowercase(int ch) {
        return false;
    }

    boolean isOtherUppercase(int ch) {
        return false;
    }

    boolean isOtherAlphabetic(int ch) {
        return false;
    }

    boolean isIdeographic(int ch) {
        return false;
    }

    // Character <= 0xff (basic latin) is handled by internal fast-path
    // to avoid initializing large tables.
    // Note: performance of this "fast-path" code may be sub-optimal
    // in negative cases for some accessors due to complicated ranges.
    // Should revisit after optimization of table initialization.
    static final CharacterData of(int ch) {
        if (ch >>> 8 == 0) {     // fast-path
            return CharacterDataLatin1.instance;
        } else {
            switch(ch >>> 16) {  //plane 00-16
            case(0):
                return CharacterData00.instance;
            case(1):
                return CharacterData01.instance;
            case(2):
                return CharacterData02.instance;
            case(14):
                return CharacterData0E.instance;
            case(15):   // Private Use
            case(16):   // Private Use
                return CharacterDataPrivateUse.instance;
            default:
                return CharacterDataUndefined.instance;
            }
        }
    }
}
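To make the dispatch in of() easier to follow, here is a small standalone sketch. CharacterData and its subclasses are package-private in java.lang, so they cannot be called from user code. The class PlaneDispatchDemo and the method handlerFor are hypothetical names used only for this illustration: they repeat the same shifts and branches, but return the name of the class that of() would select.

```java
class PlaneDispatchDemo {
    // Hypothetical helper: same branching as CharacterData.of() in the listing
    // above, but it returns the selected class name instead of an instance.
    static String handlerFor(int ch) {
        if (ch >>> 8 == 0) {
            // Code points 0x00..0xFF: the Latin-1 fast path
            return "CharacterDataLatin1";
        }
        // The bits above the low 16 give the Unicode plane number (0..16)
        switch (ch >>> 16) {
            case 0:  return "CharacterData00";
            case 1:  return "CharacterData01";
            case 2:  return "CharacterData02";
            case 14: return "CharacterData0E";
            case 15:
            case 16: return "CharacterDataPrivateUse";
            default: return "CharacterDataUndefined";
        }
    }

    public static void main(String[] args) {
        int[] samples = { '0', 0x4E2D, 0x1F600, 0x20000, 0xE0001, 0x10FFFF };
        for (int ch : samples) {
            System.out.printf("U+%04X -> %s%n", ch, handlerFor(ch));
        }
    }
}
```

Supporting a new plane would mean one more case label here plus one more subclass, which is the extension point described at the start of this section.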
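So how does Java determine these character properties? The properties of each character are packed into a single 32-bit value, that is, 4 bytes (one Java int).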
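As a rough sketch of what packing several properties into one int looks like, the example below stores a 1-bit mirrored flag and a 4-bit directionality code in the top bits of an int and reads them back with masks and shifts. The class name PackedProps, the helper names, and the exact bit positions are assumptions made for illustration. The masks actually used by the JDK are defined in its own sources and are not reproduced here.

```java
class PackedProps {
    // Assumed layout, for illustration only:
    //   bit 31      -> mirrored flag
    //   bits 27..30 -> directionality code (4 bits)
    static final int MIRRORED_MASK = 0x80000000;
    static final int DIRECTIONALITY_MASK = 0x78000000;
    static final int DIRECTIONALITY_SHIFT = 27;

    // Pack the two fields into a single 32-bit value
    static int pack(boolean mirrored, int directionality) {
        int props = 0;
        if (mirrored) {
            props |= MIRRORED_MASK;
        }
        props |= (directionality << DIRECTIONALITY_SHIFT) & DIRECTIONALITY_MASK;
        return props;
    }

    // Read the 1-bit flag back
    static boolean isMirrored(int props) {
        return (props & MIRRORED_MASK) != 0;
    }

    // Read the 4-bit field back; >>> avoids sign extension from bit 31
    static int directionality(int props) {
        return (props & DIRECTIONALITY_MASK) >>> DIRECTIONALITY_SHIFT;
    }

    public static void main(String[] args) {
        int props = pack(true, 3);
        System.out.println(Integer.toHexString(props));
        System.out.println(isMirrored(props));
        System.out.println(directionality(props));
    }
}
```

Storing one int per character and decoding fields on demand keeps the lookup tables compact, at the cost of a mask and a shift on each query.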
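For example:
When we pass in the character '0', whose ASCII code is 48, the lookup goes through static final CharacterData of(int ch). Since 48 >>> 8 == 0, of() takes the fast path and returns CharacterDataLatin1.instance, so the corresponding method of the CharacterDataLatin1 class ends up handling the character.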
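The '0' case can be checked from ordinary code using only the public Character API. The sketch below is illustrative: ZeroCharDemo and takesFastPath are hypothetical names, and takesFastPath simply repeats the fast-path test from of(), because CharacterData itself is not accessible outside java.lang.

```java
class ZeroCharDemo {
    // Hypothetical helper: the same condition of() uses for its Latin-1 fast path
    static boolean takesFastPath(int ch) {
        return ch >>> 8 == 0;
    }

    public static void main(String[] args) {
        int ch = '0';
        System.out.println(ch);                      // prints 48
        System.out.println(takesFastPath(ch));       // prints true
        System.out.println(Character.isDigit(ch));   // prints true
        System.out.println(Character.getType(ch) == Character.DECIMAL_DIGIT_NUMBER); // prints true
        System.out.println(Character.digit(ch, 10)); // prints 0
    }
}
```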
CharacterDataLatin1 source:
package java.lang;

/* The CharacterData class encapsulates the large tables found in
   Java.lang.Character. */
class CharacterDataLatin1 extends CharacterData {
    /* The character properties are currently encoded into 32 bits in the following manner:
        1 bit   mirrored property
        4 bits  directionality property
        9 bits  signed offset used for converting case
        1 bit   if 1, adding the signed offset converts the character to lowercase
        1 bit   if 1, subtracting the signed offset converts the character to uppercase
        1 bit   if 1, this character has a titlecase equivalent (possibly itself)
        3 bits  0  may not be part of an identifier
                1  ignorable control; may continue a Unicode identifier or Java identifier
                2  may continue a Java identifier but not a Unicode identifier (unused)
3 may conti