Java console encoding: the default character encoding of Java console output

How does Java determine the encoding used for System.out?

Given the following class:

import java.io.File;
import java.io.PrintWriter;

public class Foo
{
    public static void main(String[] args) throws Exception
    {
        String s = "xxäñxx";
        System.out.println(s);
        PrintWriter out = new PrintWriter(new File("test.txt"), "UTF-8");
        out.println(s);
        out.close();
    }
}

It is saved as UTF-8 and compiled with javac -encoding UTF-8 Foo.java on a Windows system.

Afterwards, on a Git Bash console (using the UTF-8 charset), I run:

$ java Foo
xxõ±xx

$ java -Dfile.encoding=UTF-8 Foo
xx├ñ├▒xx

$ cat test.txt
xxäñxx

$ java Foo | cat
xxäñxx

$ java -Dfile.encoding=UTF-8 Foo | cat
xxäñxx

What is going on here?

Obviously Java checks whether it is connected to a terminal and changes its encoding in that case. Is there a way to force Java to simply output plain UTF-8?

I tried the same with the cmd console, too. Redirecting STDOUT does not seem to make any difference there. Without the file.encoding parameter it outputs ANSI encoding; with the parameter, it outputs UTF-8.

Solution

I'm assuming that your console still runs under cmd.exe. I doubt your console really expects UTF-8; I expect it actually uses an OEM DOS code page (e.g. 850 or 437).

Java encodes characters into bytes using the default encoding that is set during JVM initialization.
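To check which encoding a particular JVM settled on, a small diagnostic along these lines may help. It is only a sketch: the class name EncodingInfo is made up, and as far as I know the stdout.encoding property only exists on JDK 19 and later (sun.stdout.encoding is likewise not always set), so either may print null. On newer JDKs the default charset and the encoding used by System.out can differ, which is why both are printed.

```java
import java.nio.charset.Charset;

// Prints the encoding-related settings the JVM fixed at startup.
// Assumption: "stdout.encoding" is only defined on JDK 19+, and
// "sun.stdout.encoding" only on some JDK/platform combinations;
// unset properties are printed as "null".
class EncodingInfo {
    static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("file.encoding=").append(System.getProperty("file.encoding")).append('\n');
        sb.append("defaultCharset=").append(Charset.defaultCharset()).append('\n');
        sb.append("stdout.encoding=").append(System.getProperty("stdout.encoding")).append('\n');
        sb.append("sun.stdout.encoding=").append(System.getProperty("sun.stdout.encoding")).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Running it with and without -Dfile.encoding=UTF-8 shows which of these values the flag actually changes on your JDK.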

Reproducing on my PC:

java Foo
Java encodes as windows-1252; the console decodes as IBM850. Result: mojibake.

java -Dfile.encoding=UTF-8 Foo
Java encodes as UTF-8; the console decodes as IBM850. Result: mojibake.

cat test.txt
cat decodes the file as UTF-8 and encodes it as IBM850; the console decodes as IBM850.

java Foo | cat
Java encodes as windows-1252; cat decodes as windows-1252 and encodes as IBM850; the console decodes as IBM850.

java -Dfile.encoding=UTF-8 Foo | cat
Java encodes as UTF-8; cat decodes as UTF-8 and encodes as IBM850; the console decodes as IBM850.
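The two mojibake cases above can be simulated without a console at all: encode the string the way Java does, then decode the bytes the way the console does. This is a sketch that assumes the console code page is IBM850, as on my PC; the class MojibakeDemo and its roundTrip method are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Simulates "Java encodes as X; console decodes as Y".
// Assumption: the console code page is IBM850.
class MojibakeDemo {
    static String roundTrip(String text, Charset javaSide, Charset consoleSide) {
        byte[] bytes = text.getBytes(javaSide);   // what Java writes to stdout
        return new String(bytes, consoleSide);    // what the console shows
    }

    public static void main(String[] args) {
        // "xxäñxx" written with escapes so the source file encoding does not matter
        String s = "xx\u00e4\u00f1xx";
        Charset cp850 = Charset.forName("IBM850");
        System.out.println(roundTrip(s, Charset.forName("windows-1252"), cp850));
        System.out.println(roundTrip(s, StandardCharsets.UTF_8, cp850));
    }
}
```

The first call yields xxõ±xx (the bytes e4 f1 read as IBM850) and the second yields xx├ñ├▒xx (the bytes c3 a4 c3 b1 read as IBM850). Note that printing those results in main goes through the console encoding once more, so the test-style comparison is the more reliable check.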

This implementation of cat must use heuristics to determine whether the character data is UTF-8, then transcode it from either UTF-8 or ANSI (e.g. windows-1252) to the console encoding (e.g. IBM850).
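I haven't looked at cat's source, so the following is only a guess at the kind of heuristic involved: try a strict UTF-8 decode, fall back to an ANSI code page if the bytes are malformed, and re-encode for the console. The class GuessingDecoder is hypothetical, and windows-1252 and IBM850 are assumptions matching my setup.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Guessed heuristic (not cat's actual code): strict UTF-8 first,
// falling back to an assumed ANSI code page (windows-1252).
class GuessingDecoder {
    static final Charset ANSI = Charset.forName("windows-1252");

    static String decode(byte[] data) {
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            return new String(data, ANSI); // not valid UTF-8, so treat it as ANSI
        }
    }

    // Re-encode for a console assumed to use IBM850.
    static byte[] toConsole(byte[] data) {
        return decode(data).getBytes(Charset.forName("IBM850"));
    }

    public static void main(String[] args) {
        byte[] utf8 = {0x78, 0x78, (byte) 0xc3, (byte) 0xa4, (byte) 0xc3, (byte) 0xb1, 0x78, 0x78};
        byte[] ansi = {0x78, 0x78, (byte) 0xe4, (byte) 0xf1, 0x78, 0x78};
        System.out.println(decode(utf8).equals(decode(ansi))); // prints true
    }
}
```

Both byte sequences end up as the same text, and therefore as the same IBM850 bytes (78 78 84 a4 78 78), which is why every piped variant displays correctly.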

This can be confirmed with the following commands:

$ java HexDump utf8.txt
78 78 c3 a4 c3 b1 78 78

$ cat utf8.txt
xxäñxx

$ java HexDump ansi.txt
78 78 e4 f1 78 78

$ cat ansi.txt
xxäñxx

The cat command can make this determination because e4 f1 is not a valid UTF-8 sequence.
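The reason e4 f1 fails can be spelled out with the UTF-8 bit patterns. This is a simplified structure check written for illustration (the class Utf8Structure is hypothetical); it only looks at lead and continuation bytes and ignores overlong forms and the other rules a full validator enforces.

```java
// UTF-8 byte patterns:
//   0xxxxxxx                       single byte
//   110xxxxx 10xxxxxx              two-byte sequence
//   1110xxxx + 2 continuation bytes, 11110xxx + 3 continuation bytes
// Simplified: checks lead/continuation structure only.
class Utf8Structure {
    // Expected length of a sequence starting with this lead byte, or -1.
    static int sequenceLength(int b) {
        if ((b & 0x80) == 0x00) return 1;
        if ((b & 0xE0) == 0xC0) return 2;
        if ((b & 0xF0) == 0xE0) return 3;
        if ((b & 0xF8) == 0xF0) return 4;
        return -1; // a continuation byte or an invalid lead byte
    }

    // Continuation bytes look like 10xxxxxx.
    static boolean isContinuation(int b) {
        return (b & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(0xE4) + " -> expects " + sequenceLength(0xE4) + " bytes");
        System.out.println(Integer.toBinaryString(0xF1) + " continuation? " + isContinuation(0xF1));
    }
}
```

0xE4 (11100100) announces a three-byte sequence, but the next byte 0xF1 (11110001) is not a continuation byte, so the data cannot be UTF-8. In the UTF-8 file, by contrast, 0xC3 announces two bytes and 0xA4 and 0xB1 are valid continuations.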

You can correct the Java output by:

Using the Console type (java.io.Console, obtained from System.console())

Using some shim layer, as you are doing with cat
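A minimal sketch of both options, assuming Java 10+ for the PrintStream(OutputStream, boolean, Charset) constructor; the class Utf8Out is a hypothetical name. When a console is attached, Console writes using the console's own encoding; otherwise stdout is wrapped in a PrintStream with an explicit UTF-8 charset, so the output no longer depends on file.encoding. Note that forcing UTF-8 onto a cmd.exe console only displays correctly if the console is switched to UTF-8 as well (e.g. chcp 65001).

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

class Utf8Out {
    // Wraps any byte stream so that text is always encoded as UTF-8.
    static PrintStream utf8Stream(OutputStream target) {
        return new PrintStream(target, true, StandardCharsets.UTF_8); // autoflush on println
    }

    public static void main(String[] args) {
        String s = "xx\u00e4\u00f1xx";
        // Assumption: on older JDKs System.console() is null when stdout is
        // redirected; newer JDKs may return a Console even then.
        if (System.console() != null) {
            // Console encodes with the console's charset (e.g. the OEM code page).
            System.console().printf("%s%n", s);
        } else {
            // Piped or redirected: write UTF-8 explicitly to the stdout descriptor.
            PrintStream out = utf8Stream(new FileOutputStream(FileDescriptor.out));
            out.println(s);
            out.flush();
        }
    }
}
```

Piping this through cat then always hands cat valid UTF-8, regardless of the -Dfile.encoding setting.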

HexDump is a trivial Java application:

import java.io.*;

class HexDump {
    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            int r;
            while ((r = in.read()) != -1) {
                System.out.format("%02x ", 0xFF & r);
            }
            System.out.println();
        }
    }
}
