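Changing the Java encoding in CMD: does specifying the encoding in javac produce the same result as changing the active code page in Windows CMD and then compiling directly?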

I am trying to compile a piece of Java code in Windows CMD using Windows-1250 encoding, and I can't seem to get the -encoding option to work right.

The compiler just doesn't seem to use the specified encoding unless there are illegal characters, in which case it just displays the error message. Otherwise it uses the active code page anyway.

In particular, I am trying to display a string containing Albanian characters, specifically 'ë'.

The string I need to display is as follows:

Hëllë Wërld

Here are the commands I am using and the output they produce:

chcp

Output: Active code page: 437

javac -encoding Windows-1250 AlbanianHello.java

java AlbanianHello

Output: Hδllδ Wδrld

As you can see, it still uses the default encoding, which is Cp437, even though I specified the encoding I wish to use.

Now this is what happens when I change the code page to 1250 and then compile without specifying the encoding:

chcp 1250

Output: Active code page: 1250

javac AlbanianHello.java

java AlbanianHello

Output: Hëllë Wërld

Seems to work properly.

Specifying the encoding in this case yields the same results:

chcp 1250

Output: Active code page: 1250

javac -encoding Windows-1250 AlbanianHello.java

java AlbanianHello

Output: Hëllë Wërld

So does it just completely ignore my specified encoding? Not quite. When I use an encoding that is not supposed to work with my string, it displays a bunch of error messages:

javac -encoding UTF8 AlbanianHello.java

Output: AlbanianHello.java:5: error: unmappable character for encoding UTF8

System.out.println("H?ll? W?rld");

^

...

3 errors

My question is:

Why does it ignore the encoding when it should theoretically work, but not ignore it when it shouldn't work?

I would also like to know if there is any difference in the result between these commands:

chcp 1250

javac AlbanianHello.java

And these ones:

chcp 1250

javac -encoding Windows-1250 AlbanianHello.java

Solution

Welcome to the site! The javac encoding option sets how javac will map the bytes in your source file to Unicode characters, since Java uses Unicode internally. The chcp command sets how the Windows console will map bytes of output to glyphs in a font. Java doesn't know or care about chcp, and vice versa. If both match, all is well. If not...
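To make the source-decoding half of this concrete, here is a minimal sketch of what happens to the bytes of the string under two different -encoding values. It is not javac's actual implementation; the class name SourceDecodingSketch and the helper decodeStrict are made up for illustration, and it assumes the windows-1250 charset is available in your JDK. The same bytes decode cleanly as Windows-1250 but are malformed as UTF-8, which is consistent with the errors shown in the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of javac.
class SourceDecodingSketch {
    // "Hëllë" as stored in a Windows-1250 file: H, ë (0xEB), l, l, ë (0xEB).
    static final byte[] SOURCE_BYTES = {0x48, (byte) 0xEB, 0x6C, 0x6C, (byte) 0xEB};

    // Strictly decode the bytes; return null if they are not valid in the charset.
    static String decodeStrict(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null; // the bytes do not form valid text in this charset
        }
    }

    public static void main(String[] args) {
        String as1250 = decodeStrict(SOURCE_BYTES, Charset.forName("windows-1250"));
        String asUtf8 = decodeStrict(SOURCE_BYTES, StandardCharsets.UTF_8);

        // Compare against Unicode escapes so the check does not depend on
        // how this file itself is encoded.
        System.out.println("windows-1250 gives U+00EB: "
                + "H\u00EBll\u00EB".equals(as1250));
        System.out.println("UTF-8 decoding failed: " + (asUtf8 == null));
    }
}
```

In UTF-8, 0xEB announces a three-byte sequence, and the following byte (0x6C, the letter l) is not a valid continuation byte, so strict decoding fails.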

In your first example, Java correctly interprets your Windows-1250 source, so the string contains ë, which is U+00EB. On output, Java encodes that character with its default charset, which evidently produces the single byte 0xEB. When that byte is written to a code-page 437 console, the displayed result is whatever 0xEB means in CP437, regardless of what you intended to display. Per the CP437 character table, that is lowercase delta, δ. (Just to highlight the difference, δ is U+03B4 in Unicode.)
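The output side can be simulated without touching the console at all. The sketch below encodes the string the way Java would when writing it out, then decodes those bytes the way a console on a given code page would render them. The class and method names are hypothetical, the choice of windows-1252 as Java's output charset is an assumption (it matches the default reported further down in this answer), and it assumes your JDK ships the IBM437 charset.

```java
import java.nio.charset.Charset;

// Hypothetical illustration class; it only simulates the console.
class ConsoleMismatchSketch {
    // Encode text with the charset Java writes in, then decode the bytes
    // with the charset the console uses to pick glyphs.
    static String asShownOnConsole(String text, String javaCharset, String consoleCharset) {
        byte[] written = text.getBytes(Charset.forName(javaCharset));
        return new String(written, Charset.forName(consoleCharset));
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of its own encoding.
        String hello = "H\u00EBll\u00EB W\u00EBrld";

        // Console on code page 437: byte 0xEB is rendered as delta (U+03B4).
        String on437 = asShownOnConsole(hello, "windows-1252", "IBM437");
        System.out.println("cp437 shows delta: "
                + on437.equals("H\u03B4ll\u03B4 W\u03B4rld"));

        // Console on code page 1250: byte 0xEB is rendered as e-diaeresis again.
        String on1250 = asShownOnConsole(hello, "windows-1252", "windows-1250");
        System.out.println("cp1250 shows the original: " + on1250.equals(hello));
    }
}
```

Both checks print true under the stated assumptions, which matches the two outputs in the question: δ under chcp 437 and ë under chcp 1250.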

For completeness, it turns out to be less than easy to find out what the default encoding for javac is. The docs for Charset say that:

The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.

Based on the behaviour you saw, I am guessing javac on your system is reading the code page from the console and using that as the default. Either that, or the default is a code page in which ë = 0xEB, such as CP1252 or ISO 8859-1; as far as I know, either of those might be the default depending on your configuration.

Edit: On my machine, the default is CP1252 (Java charset name windows-1252). I have put the code I used on GitHub.
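If you want to check the default on your own machine, something along these lines should do it. This is a minimal sketch and not necessarily the code referred to above; the class name DefaultCharsetSketch is made up for illustration. Note that this reports the default charset of the JVM running the program, which I am assuming is also what javac falls back to when no -encoding is given.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for inspecting the JVM's default charset.
class DefaultCharsetSketch {
    // Canonical name of the charset the JVM picked at startup.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        // The system property the default has traditionally been derived from.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

The result depends on the JDK version as well as the operating system settings: from JDK 18 onward the default charset is UTF-8 unless you override it, so a recent JDK may not reproduce the behaviour described in the question.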
