I stumbled over this (again) today:
class Test {
char ok = '\n';
char okAsWell = '\u000B';
char error = '\u000A';
}
It does not compile:
Invalid character constant in line 4.
The compiler seems to insist that I write '\n' instead. I see no reason for this, yet it's very annoying.
Is there a logical explanation why characters that have a special notation (like \t, \n, \r) must be expressed in that form in Java source?
解决方案
Unicode characters are replaced by their value, so your line is replaced by the compiler with:
char error = '
';
which is not a valid Java statement.
This is dictated by the Language Specification:
A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) of the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters.
This can lead to surprising stuff, for example, this is a valid Java program (it contains hidden unicode characters) - courtesy of Peter Lawrey:
public static void main(String[] args) {
for (char ch = 0; ch < Character.MAX_VALUE; ch++) {
if (Character.isJavaIdentifierPart(ch) && !Character.isJavaIdentifierStart(ch)) {
System.out.printf("%04x %n", (int) ch, "" + ch);
}
}
}