Charset compatibility in Java: file name charset problems

When I try to open a file whose name contains accented characters, Java reports that the file cannot be found, apparently because of a charset mismatch.

I work in UTF-8 on a Linux system (/etc/locales sets UTF-8 as well). JBoss runs with -Dfile.encoding=UTF-8 and the environment variable JBOSS_ENCODING="UTF-8".

In a JSP, I get the name of the file:

String fileName = element.getChildText("FileName");

out.println("File to be opened : " + fileName);

Displays :

File to be opened : aaaaaà.txt

But new File(fileName) doesn't work: file.exists() simply returns false.

When I list the directory instead:

    File[] files = dir.listFiles();
    for (int i = 0; i < files.length; i++) {
        fileName = files[i].getName();
        out.println(fileName);
    }

I get : aaaaaà .txt

Why does Java read the file names on disk, and try to open files, as if the names were encoded in ISO-8859-1?
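The garbled listing above is consistent with UTF-8 bytes being decoded as ISO-8859-1. The following is only a minimal sketch of that effect, not a diagnosis of the JBoss setup; the class and method names (`Mojibake`, `misdecode`) are made up for illustration.

```java
import java.nio.charset.Charset;

public class Mojibake {
    private static final Charset UTF_8 = Charset.forName("UTF-8");
    private static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");

    // Encode the text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String misdecode(String text) {
        return new String(text.getBytes(UTF_8), ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misdecode("aaaaa\u00E0.txt");
        // Print code points instead of raw text so the result does not
        // depend on the console encoding.
        for (int i = 0; i < garbled.length(); i++) {
            System.out.printf("U+%04X ", (int) garbled.charAt(i));
        }
        System.out.println();
    }
}
```

The single character U+00E0 (a with grave accent) comes back as two characters, U+00C3 followed by U+00A0 (a non-breaking space), which matches the stray "space" seen in the listed name.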

Is it a JBoss setting? A Java setting? How can I force java.io.File to treat file names as UTF-8?

I've used other tools and they always read the name correctly, as UTF-8.

(Note that I'm always talking about the name of the file, never its content; it could be an empty file.)

Solution

I am trying to track down the problem. Here is what I have so far:

There is Exists.java:

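    import java.io.File;

    public class Exists {
        public static void main(String[] args) {
            // Plain ASCII name.
            new File("aaa").exists();
            // "aaa" followed by U+00E4 (a with diaeresis).
            new File("aaa\u00E4").exists();
            // The UTF-8 bytes of U+00E4 misread as two ISO-8859-1 characters.
            new File("aaa\u00C3\u00A4").exists();
        }
    }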

And there is java -version:

java version "1.6.0_20"

Java(TM) SE Runtime Environment (build 1.6.0_20-b02)

Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

Now to the interesting part:

$ strace -f -o strace.out java Exists && grep 'stat("aaa' strace.out

31942 stat("aaa", 0x41464950) = -1 ENOENT (No such file or directory)

31942 stat("aaa\303\244", 0x41464950) = -1 ENOENT (No such file or directory)

31942 stat("aaa\303\203\302\244", 0x41464950) = -1 ENOENT (No such file or directory)

The nice thing is that strace works at the byte level, not at the character level like Java. So everything is fine in this case: the bytes passed to stat() are the UTF-8 encodings of the three names. I have the environment variable LANG set to en_US.UTF-8, and all of the LC_* variables are unset.
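To check the strace output against what UTF-8 should produce, the bytes can be computed in Java directly. This is a small sketch; `PathBytes` and `straceStyle` are hypothetical names, and the formatting only roughly imitates how strace escapes non-ASCII bytes.

```java
import java.nio.charset.Charset;

public class PathBytes {
    // Render the encoded bytes roughly the way strace does:
    // printable ASCII as-is, every other byte as a backslash plus octal digits.
    static String straceStyle(String name, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(charset)) {
            int v = b & 0xFF;
            if (v >= 0x20 && v < 0x7F) {
                sb.append((char) v);
            } else {
                sb.append('\\').append(Integer.toOctalString(v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");
        System.out.println(straceStyle("aaa", utf8));             // aaa
        System.out.println(straceStyle("aaa\u00E4", utf8));       // aaa\303\244
        System.out.println(straceStyle("aaa\u00C3\u00A4", utf8)); // aaa\303\203\302\244
    }
}
```

The three lines match the arguments of the three stat() calls in the trace, which supports the reading that the JVM encoded the file names as UTF-8 in this run.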

Now narrowing the problem down to a minimal working example:

$ strace -f -o strace.out env - LC_ALL=en_US.UTF-8 /home/roland/bin/java Exists && grep 'stat("aaa' strace.out

31968 stat("aaa", 0x41a75950) = -1 ENOENT (No such file or directory)

31968 stat("aaa\303\244", 0x41a75950) = -1 ENOENT (No such file or directory)

31968 stat("aaa\303\203\302\244", 0x41a75950) = -1 ENOENT (No such file or directory)

That still works. So let's try another encoding:

$ strace -f -o strace.out env - LANG=en_US.ISO-8859-1 /home/roland/bin/java Exists && grep 'stat("aaa' strace.out

32070 stat("aaa", 0x407a3950) = -1 ENOENT (No such file or directory)

32070 stat("aaa?", 0x407a3950) = -1 ENOENT (No such file or directory)

32070 stat("aaa??", 0x407a3950) = -1 ENOENT (No such file or directory)

So this doesn't work. One possible reason might be that I selected a locale that is not in the list printed by locale -a. But this shouldn't be the reason for Java to convert the letters to question marks.
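One plausible explanation for the question marks, stated here as an assumption that the trace alone does not confirm: if the requested locale does not exist, the JVM may fall back to an ASCII-only encoding for file names, and Java replaces every character the target charset cannot represent with `?`. The sketch below only demonstrates that replacement behaviour of `String.getBytes(Charset)`; the class and method names are invented for illustration.

```java
import java.nio.charset.Charset;

public class Unmappable {
    private static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");

    // Encode the name with the given charset, then map each resulting byte
    // to one character so the encoded form can be inspected as a String.
    static String encodedForm(String name, Charset charset) {
        return new String(name.getBytes(charset), ISO_8859_1);
    }

    public static void main(String[] args) {
        Charset ascii = Charset.forName("US-ASCII");
        // Characters outside ASCII are replaced by '?' during encoding.
        System.out.println(encodedForm("aaa\u00E4", ascii));       // aaa?
        System.out.println(encodedForm("aaa\u00C3\u00A4", ascii)); // aaa??
        // A real ISO-8859-1 encoding keeps U+00E4 as the single byte 0xE4.
        int last = "aaa\u00E4".getBytes(ISO_8859_1)[3] & 0xFF;
        System.out.println(Integer.toHexString(last));             // e4
    }
}
```

If ISO-8859-1 had really been in effect, strace would be expected to show the byte \344 for the second name instead of a question mark, so the output above looks more like an ASCII fallback than like a working ISO-8859-1 locale.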

As soon as LANG points to a non-existing locale, the setting of the sun.jnu.encoding property doesn't have any effect anymore. So I'm out of ideas now.
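To see which encodings the JVM actually picked up in each of these runs, a small diagnostic can print the relevant settings. This is a sketch under assumptions: `sun.jnu.encoding` is an internal property of Sun/Oracle/OpenJDK-style JVMs and may be null elsewhere, and `EncodingInfo` and `describe` are hypothetical names.

```java
import java.nio.charset.Charset;

public class EncodingInfo {
    // Collect the encoding-related settings the JVM sees at startup.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("file.encoding    = ").append(System.getProperty("file.encoding")).append('\n');
        // Internal property; assumed to reflect the encoding used for file names.
        sb.append("sun.jnu.encoding = ").append(System.getProperty("sun.jnu.encoding")).append('\n');
        sb.append("defaultCharset   = ").append(Charset.defaultCharset().name()).append('\n');
        sb.append("LANG             = ").append(System.getenv("LANG")).append('\n');
        sb.append("LC_ALL           = ").append(System.getenv("LC_ALL")).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running it with the same `env - LANG=...` and `env - LC_ALL=...` prefixes used for Exists above should show whether the reported encodings change when LANG names a locale that is missing from `locale -a`.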
