I am trying to compile a piece of Java code in Windows CMD using Windows-1250 encoding, and I can't seem to get the -encoding option to work right.
The compiler doesn't seem to use the specified encoding unless the source contains characters that are illegal in that encoding, in which case it displays error messages; otherwise it appears to fall back to the active code page anyway.
In particular, I am trying to display a string containing Albanian characters, specifically 'ë'.
The string I need to display is as follows:
Hëllë Wërld
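For reference, the source file is essentially just this (a minimal reconstruction; the println matches line 5 of the compiler error shown further down):

```java
// AlbanianHello.java -- saved on disk in Windows-1250 encoding
public class AlbanianHello {
    public static void main(String[] args) {
        System.out.println("Hëllë Wërld");
    }
}
```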
Here are the commands I am using and the output they produce:
chcp
Output: Active code page: 437
javac -encoding Windows-1250 AlbanianHello.java
java AlbanianHello
Output: Hδllδ Wδrld
As you can see, it still uses the default encoding, which is Cp437, even though I specified the encoding I wish to use.
Now this is what happens when I change the code page to 1250 and then compile without specifying the encoding:
chcp 1250
Output: Active code page: 1250
javac AlbanianHello.java
java AlbanianHello
Output: Hëllë Wërld
Seems to work properly.
Specifying the encoding in this case yields the same results:
chcp 1250
Output: Active code page: 1250
javac -encoding Windows-1250 AlbanianHello.java
java AlbanianHello
Output: Hëllë Wërld
So does it just completely ignore my specified encoding? Not quite. When I try to use an encoding that is not supposed to work with my string, it displays a bunch of error messages:
javac -encoding UTF8 AlbanianHello.java
Output: AlbanianHello.java:5: error: unmappable character for encoding UTF8
System.out.println("H?ll? W?rld");
^
...
3 errors
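Those errors make sense at the byte level: in Windows-1250 each ë is stored as the single byte 0xEB, which is not valid on its own in UTF-8 (0xEB would have to start a three-byte sequence). A small sketch of what a UTF-8 decoder does with those bytes (hypothetical byte array, mirroring how the file starts on disk):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Demo {
    public static void main(String[] args) {
        // "Hël" as stored on disk in Windows-1250: ë is the single byte 0xEB
        byte[] onDisk = {0x48, (byte) 0xEB, 0x6C};
        // A UTF-8 decoder treats 0xEB as the start of a 3-byte sequence;
        // since no valid continuation bytes follow, it substitutes U+FFFD
        String decoded = new String(onDisk, StandardCharsets.UTF_8);
        System.out.println(decoded); // prints "H", a replacement char, then "l"
    }
}
```

This is why javac flags exactly the ë positions as unmappable when told the file is UTF-8.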
My question is:
Why does it ignore the encoding when it should theoretically work, and doesn't ignore it when it shouldn't work?
I would also like to know if there is any difference in the result between these commands:
chcp 1250
javac AlbanianHello.java
And these ones:
chcp 1250
javac -encoding Windows-1250 AlbanianHello.java
Solution
Welcome to the site! The javac encoding option sets how javac will map the bytes in your source file to Unicode characters, since Java uses Unicode internally. The chcp command sets how the Windows console will map bytes of output to glyphs in a font. Java doesn't know or care about chcp, and vice versa. If both match, all is well. If not...
In your first example, Java correctly interprets your Windows-1250 source. Character ë is U+00EB. When that byte (0xEB) is output to a code-page 437 terminal, the displayed result is what byte 0xEB means in cp437, regardless of what you thought you wanted to display. Per the CP437 character table, that is lowercase delta, δ. (Just to highlight the difference, δ is U+03B4 in Unicode.)
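The whole round trip can be reproduced in a few lines, assuming the JDK's "windows-1250" and "Cp437" charsets (both ship with a standard JDK):

```java
import java.nio.charset.Charset;

public class CodePageDemo {
    public static void main(String[] args) {
        // Encode ë (U+00EB) as the program's output stream would
        byte[] bytes = "ë".getBytes(Charset.forName("windows-1250"));
        System.out.printf("byte on the wire: 0x%02X%n", bytes[0]); // 0xEB
        // Reinterpret that byte the way a code page 437 console does
        String shown = new String(bytes, Charset.forName("Cp437"));
        System.out.println("rendered as: " + shown); // δ
    }
}
```

The same byte, two different mappings: Windows-1250 (and 1252) say 0xEB is ë, while CP437 says it is δ.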
For completeness: it turns out to be surprisingly hard to pin down what javac's default encoding is. The docs for Charset say:
The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.
Based on the behaviour you saw, I am guessing javac on your system reads the code page from the console and uses that as the default. Either that, or the default is a code page in which ë = 0xEB (e.g., CP1252 or ISO 8859-1), either of which could plausibly be the default depending on your configuration.
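You can probe what your JVM considers the default with something like the following (note that on recent JDKs, 18 and later, the default charset is UTF-8 regardless of the OS locale):

```java
import java.nio.charset.Charset;

public class DefaultCharsetProbe {
    public static void main(String[] args) {
        // The charset the JVM uses when none is specified explicitly
        System.out.println("Default charset: " + Charset.defaultCharset());
        // The system property it is typically derived from
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```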
Edit: On my machine, the default is CP1252 (Java charset name windows-1252). I have put the code I used on GitHub.