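I have been trying to read "Unicode" user input in a Java application for a small utility snippet. The problem is that it seems to work out of the box on Ubuntu, which I guess uses UTF-8 as its OS-wide encoding, but it does not work on Windows when run from cmd. The code in question is: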
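import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.util.Arrays;

public class SerTest {
    public static void main(String[] args) throws Exception {
        testUnicode();
    }

    public static void testUnicode() throws Exception {
        System.out.println("Default charset: " +
                Charset.defaultCharset().name());
        // Decode stdin as UTF-8, whatever encoding the console actually sends
        BufferedReader in =
                new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
        System.out.printf("Enter 'абвгд эюя': ");
        String line = in.readLine(); // null if stdin reaches end of stream
        String s = "абвгд эюя";
        // getBytes() without an argument encodes with the platform default charset
        byte[] sBytes = s.getBytes();
        System.out.println("strg bytes: " + Arrays.toString(sBytes));
        byte[] lineBytes = line.getBytes();
        System.out.println("line bytes: " + Arrays.toString(lineBytes));
        // Print through a UTF-8 PrintStream that flushes automatically
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.print("--->" + s + "\n");
        out.print("--->" + line + "\n");
    }
}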
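Both `s.getBytes()` and `line.getBytes()` in this snippet use the platform default charset, so the `strg bytes`/`line bytes` dumps depend on `file.encoding` as much as on the input itself. A minimal sketch (the class name `EncodingBytes` is illustrative, not from the post) showing that the same text produces different bytes under UTF-8 and windows-1251, which is why passing an explicit `Charset` makes such dumps comparable between machines:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingBytes {
    // Same text as the prompt in SerTest, written as Unicode escapes so the
    // result does not depend on the encoding javac assumes for this file.
    static final String SAMPLE = "\u0430\u0431\u0432\u0433\u0434 \u044d\u044e\u044f";

    // Encodes with an explicitly named charset instead of the platform
    // default that String.getBytes() would silently use.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        System.out.println("default (" + Charset.defaultCharset().name() + "): "
                + Arrays.toString(SAMPLE.getBytes()));
        System.out.println("UTF-8:        "
                + Arrays.toString(encode(SAMPLE, StandardCharsets.UTF_8)));
        System.out.println("windows-1251: "
                + Arrays.toString(encode(SAMPLE, Charset.forName("windows-1251"))));
    }
}
```

On a UTF-8 default the first line matches the UTF-8 line; on a default that cannot represent Cyrillic, such as windows-1252, `getBytes()` substitutes `?` (byte 63) for each letter.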
Output on Ubuntu (no configuration changes):
me@host> javac SerTest.java && java SerTest
Default charset: UTF-8
Enter 'абвгд эюя': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
line bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
--->абвгд эюя
--->абвгд эюя
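The two arrays agree because the terminal, the `InputStreamReader` and `getBytes()` all use UTF-8 here: each Cyrillic letter takes two bytes (for example `а` is 0xD0 0xB0, printed as `-48, -80`) and the space is `32`, 17 bytes in total. A small check (the class name `DecodeLoggedBytes` is illustrative) that decodes the logged array explicitly as UTF-8:

```java
import java.nio.charset.StandardCharsets;

class DecodeLoggedBytes {
    // The "strg bytes" / "line bytes" array printed on Ubuntu, copied verbatim.
    static final byte[] LOGGED = {-48, -80, -48, -79, -48, -78, -48, -77, -48, -76,
            32, -47, -115, -47, -114, -47, -113};

    // Interprets the logged bytes as UTF-8.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decoded = decodeUtf8(LOGGED);
        // Compare against the prompt text, written as Unicode escapes.
        boolean matches = decoded.equals("\u0430\u0431\u0432\u0433\u0434 \u044d\u044e\u044f");
        System.out.println("decodes to the prompt text: " + matches);
    }
}
```

Running it prints `decodes to the prompt text: true`.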
Output at the Windows cmd prompt (the same regardless of JAVA_TOOL_OPTIONS):
E:\>chcp 65001
Active code page: 65001
E:\>java -Dfile.encoding=utf8 SerTest
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf8
Default charset: UTF-8
Enter 'абвгд эюя': юя': ': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
Exception in thread "main" java.lang.NullPointerException
at SerTest.testUnicode(SerTest.java:26) # byte[] lineBytes = line.getBytes();
at SerTest.main(SerTest.java:15)
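`BufferedReader.readLine()` returns `null` only at end of stream, so the exception at `line.getBytes()` means the JVM saw stdin end instead of receiving the typed text. A plausible cause is the Windows console's long-standing trouble with multi-byte input under code page 65001, though nothing in the post confirms it. Separately, wrapping `System.in` as `"UTF-8"` hard-codes an encoding the console may not use: without `chcp 65001`, cmd sends input in its active code page. The sketch below is one possible approach, not a guaranteed fix: it prefers `System.console()`, whose reader decodes with the console's own encoding, falls back to `System.in` with the default charset (an assumption), and turns a `null` line into an empty `Optional`. `SafeConsoleInput` and its methods are hypothetical names, and whether `System.console()` is `null` inside an IDE or with redirected input varies by JDK version.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Optional;

class SafeConsoleInput {
    // Prefer the Console when one is available: its reader decodes with the
    // console's own encoding instead of a hard-coded "UTF-8". Otherwise fall
    // back to System.in decoded with the platform default (an assumption).
    static BufferedReader openInput() {
        Console console = System.console();
        if (console != null) {
            return new BufferedReader(console.reader());
        }
        return new BufferedReader(new InputStreamReader(System.in, Charset.defaultCharset()));
    }

    // readLine() returns null at end of stream; Optional makes callers handle
    // that case instead of failing later with a NullPointerException.
    static Optional<String> readLine(BufferedReader reader) {
        try {
            return Optional.ofNullable(reader.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedReader in = openInput();
        System.out.print("Enter some text: ");
        System.out.flush();
        Optional<String> line = readLine(in);
        if (line.isPresent()) {
            System.out.println("read " + line.get().length() + " chars");
        } else {
            System.out.println("end of input reached, nothing was read");
        }
    }
}
```

On JDK 17 and later, `Console.charset()` reports the encoding the console reader uses, which can be printed next to `Default charset` when diagnosing.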
Output in the Eclipse console (with JAVA_TOOL_OPTIONS set):
Default charset: UTF-8
Enter 'абвгд эюя': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf8
line bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
--->абвгд эюя
--->абвгд эюя
In the Eclipse console it works only because I added a system-wide environment variable (JAVA_TOOL_OPTIONS), which I would like to avoid if possible.
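`JAVA_TOOL_OPTIONS` is picked up by every JVM that starts, which plausibly includes the JVM Eclipse itself runs on, so it changes more than this one program. A narrower option is a per-launch setting: the Windows run already passes `-Dfile.encoding=utf8` on the `java` command line, and a program can also read its own property for the input encoding. A minimal sketch, assuming a hypothetical property name `sertest.input.encoding` (not a standard JVM property) that falls back to the platform default when unset:

```java
import java.nio.charset.Charset;

class InputCharsetChoice {
    // Hypothetical, program-specific property; passed per launch, e.g.
    //   java -Dsertest.input.encoding=UTF-8 InputCharsetChoice
    // so only this program is affected, unlike a system-wide JAVA_TOOL_OPTIONS.
    static final String PROPERTY = "sertest.input.encoding";

    // Returns the named charset if a value is given, else the platform default.
    static Charset resolve(String propertyValue) {
        if (propertyValue == null || propertyValue.isEmpty()) {
            return Charset.defaultCharset();
        }
        return Charset.forName(propertyValue);
    }

    public static void main(String[] args) {
        Charset cs = resolve(System.getProperty(PROPERTY));
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("decoding System.in as " + cs.name());
    }
}
```

The resolved charset would then be passed to `new InputStreamReader(System.in, cs)` in place of the hard-coded `"UTF-8"`.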
Output in the Eclipse console (after removing JAVA_TOOL_OPTIONS):
Default charset: UTF-8
Enter 'абвгд эюя': абвгд эюя
strg bytes: [-48, -80, -48, -79, -48, -78, -48, -77, -48, -76, 32, -47, -115, -47, -114, -47, -113]
line bytes: [-61, -112, -62, -80, -61, -112, -62, -79, -61, -112, -62, -78, -61, -112, -62, -77, -61, -112, -62, -76, 32, -61, -111, -17, -65, -67, -61, -111, -59, -67, -61, -111, -17, -65, -67]
--->абвгд эюя
--->абвгд �ю�
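The `line bytes` here are not the Cyrillic text at all: `-61, -112` (0xC3 0x90) is the UTF-8 encoding of `Ð`, and `-62, -80` is `°`. That is what results when the UTF-8 bytes of `а` (0xD0 0xB0) are decoded as windows-1252 and the resulting string is re-encoded as UTF-8; the `-17, -65, -67` triples are U+FFFD replacement characters for 0x8D and 0x8F, bytes windows-1252 leaves undefined. Since SerTest itself decodes stdin as UTF-8, the wrong decoding presumably happens on the Eclipse side, but that is an inference. A minimal sketch reproducing the array, assuming windows-1252 as the wrong charset (`MojibakeRepro` is an illustrative name):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MojibakeRepro {
    // The prompt text, written as Unicode escapes.
    static final String SAMPLE = "\u0430\u0431\u0432\u0433\u0434 \u044d\u044e\u044f";

    // Simulates the suspected round trip: UTF-8 bytes are decoded with the
    // wrong charset (assumed windows-1252), then the misread string is
    // encoded as UTF-8 again, as line.getBytes() does with a UTF-8 default.
    static byte[] doubleEncode(String s, Charset wrongDecoder) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, wrongDecoder);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = doubleEncode(SAMPLE, Charset.forName("windows-1252"));
        System.out.println("line bytes: " + Arrays.toString(bytes));
    }
}
```

Running it prints a `line bytes:` array identical to the one above. The two `�` on the last output line also sit exactly where `э` and `я` were, the letters whose second UTF-8 byte (0x8D, 0x8F) windows-1252 does not define.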
So my question is: what exactly is going on here? What code changes are needed to make this snippet work for all kinds of "Unicode" input?
Sorry for the long post, and thanks in advance.
佐助