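The problem:
When I looked up this file and printed its path, the Chinese characters in the path came out garbled. I then tried encoding the path into a byte array with the charset named by the JDK's file.encoding property and decoding that byte array with the same charset, but the output was still garbled. (The source code analysed later explains why.)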
String path = PoiUtil.class.getClassLoader().getResource("template/中国.txt").getPath();
System.out.println(path);
//try to re-decode the path with the system encoding (utf-8); this still does not help
String encode = System.getProperties().getProperty("file.encoding");
System.out.println(encode);
path = new String(path.getBytes(encode),encode);
System.out.println(path);
Output:
/D:/project01/tsms-parent/tsms-web/target/classes/template/%e4%b8%ad%e5%9b%bd.txt
UTF-8
/D:/project01/tsms-parent/tsms-web/target/classes/template/%e4%b8%ad%e5%9b%bd.txt
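The second and third lines of output are identical for a simple reason: encoding a string to bytes and decoding those bytes with the same charset gives back the original string (for any text the charset can represent). The path is not mis-decoded text at all; it is a percent-encoded string made of plain ASCII characters. A minimal sketch of this, using a shortened version of the path above; the class name `RoundTripDemo` is only for illustration:

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Encode the string to bytes and decode the bytes with the same charset.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String path = "/template/%e4%b8%ad%e5%9b%bd.txt";
        // The string contains only ASCII characters, so the round trip is a no-op.
        System.out.println(roundTrip(path).equals(path)); // prints "true"
    }
}
```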
Solution:
String path = PoiUtil.class.getClassLoader().getResource("template/中国.txt").getPath();
path = URLDecoder.decode(path,"utf-8");
Output:
/D:/project01/tsms-parent/tsms-web/target/classes/template/中国.txt
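This works because the URL returned by ClassLoader's getResource method has its path percent-encoded: characters such as Chinese characters and spaces are converted to their UTF-8 bytes and written as %xy escapes. What looks like garbled text is therefore an encoded path, and URLDecoder's decode method turns it back into the original path with its Chinese characters and spaces.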
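One caveat is worth knowing: URLDecoder implements form decoding, so besides the %xy escapes it also turns a literal '+' into a space, which would corrupt a file name that contains '+'. A common alternative is to go through java.net.URI (for a URL, via url.toURI()), whose getPath() decodes only the %xy escapes. The sketch below compares the two; the sample path and the names `PathDecodeDemo`, `viaUrlDecoder` and `viaUri` are made up for illustration:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

class PathDecodeDemo {
    // Form-style decoding: %xy escapes are decoded AND '+' becomes a space.
    static String viaUrlDecoder(String rawPath) {
        try {
            return URLDecoder.decode(rawPath, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the Java platform
            throw new IllegalStateException(e);
        }
    }

    // URI-style decoding: only %xy escapes are decoded, '+' is left alone.
    static String viaUri(String rawPath) {
        return URI.create("file:" + rawPath).getPath();
    }

    public static void main(String[] args) {
        // Hypothetical encoded path: two Chinese characters, an encoded space and a literal '+'
        String raw = "/tmp/%e4%b8%ad%e5%9b%bd%20a+b.txt";
        System.out.println(viaUrlDecoder(raw)); // file name ends in "a b.txt": the '+' is lost
        System.out.println(viaUri(raw));        // file name ends in "a+b.txt": the '+' survives
    }
}
```

For a path without '+' both approaches give the same result, so the URLDecoder solution above is fine for the file in this post.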
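Source code walkthrough:
Below is the source of URLDecoder.decode(path,"utf-8"); its main job here is to handle sequences such as %e4%b8%ad%e5%9b%bd, which is what the Chinese characters were turned into.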
/**
* Decodes a {@code application/x-www-form-urlencoded} string using a specific
* encoding scheme.
* The supplied encoding is used to determine
* what characters are represented by any consecutive sequences of the
* form "<i>{@code %xy}</i>".
* <p>
* <em><strong>Note:</strong> The <a href=
* "http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars">
* World Wide Web Consortium Recommendation</a> states that
* UTF-8 should be used. Not doing so may introduce
* incompatibilities.</em>
*
* @param s the {@code String} to decode
* @param enc The name of a supported
* <a href="../lang/package-summary.html#charenc">character
* encoding</a>.
* @return the newly decoded {@code String}
* @exception UnsupportedEncodingException
* If character encoding needs to be consulted, but
* named character encoding is not supported
* @see URLEncoder#encode(java.lang.String, java.lang.String)
* @since 1.4
*/
public static String decode(String s, String enc)
throws UnsupportedEncodingException{
boolean needToChange = false;
int numChars = s.length();
StringBuffer sb = new StringBuffer(numChars > 500 ? numChars / 2 : numChars);
int i = 0;
if (enc.length() == 0) {
throw new UnsupportedEncodingException ("URLDecoder: empty string enc parameter");
}
char c;
byte[] bytes = null;
while (i < numChars) {
c = s.charAt(i);
switch (c) {
case '+':
sb.append(' ');
i++;
needToChange = true;
break;
case '%':
/*
* Starting with this instance of %, process all
* consecutive substrings of the form %xy. Each
* substring %xy will yield a byte. Convert all
* consecutive bytes obtained this way to whatever
* character(s) they represent in the provided
* encoding.
*/
try {
// (numChars-i)/3 is an upper bound for the number
// of remaining bytes
if (bytes == null)
bytes = new byte[(numChars-i)/3];
int pos = 0;
while ( ((i+2) < numChars) &&
(c=='%')) {
//parse the two characters at positions i+1 and i+2 as a hexadecimal integer
int v = Integer.parseInt(s.substring(i+1,i+3),16);
if (v < 0)
throw new IllegalArgumentException("URLDecoder: Illegal hex characters in escape (%) pattern - negative value");
bytes[pos++] = (byte) v;
i+= 3;
if (i < numChars)
c = s.charAt(i);
}
// A trailing, incomplete byte encoding such as
// "%x" will cause an exception to be thrown
if ((i < numChars) && (c=='%'))
throw new IllegalArgumentException(
"URLDecoder: Incomplete trailing escape (%) pattern");
//decode the bytes collected from the hex escapes using the given encoding (utf-8 here)
sb.append(new String(bytes, 0, pos, enc));
} catch (NumberFormatException e) {
throw new IllegalArgumentException(
"URLDecoder: Illegal hex characters in escape (%) pattern - "
+ e.getMessage());
}
needToChange = true;
break;
default:
sb.append(c);
i++;
break;
}
}
return (needToChange? sb.toString() : s);
}
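Three behaviours visible in the source above can be checked directly: '+' is turned into a space, a run of %xy escapes is decoded as one byte sequence, and a truncated or non-hexadecimal escape raises IllegalArgumentException. A small sketch; `DecodeBehaviourDemo` and its helper names are hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeBehaviourDemo {
    // Thin wrapper that hides the checked exception; UTF-8 is always supported.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "utf-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true when decode refuses the input with IllegalArgumentException.
    static boolean rejects(String s) {
        try {
            decode(s);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("a+b"));    // prints "a b"  (the '+' branch)
        System.out.println(decode("a%2Bb"));  // prints "a+b"  (0x2B is '+')
        System.out.println(rejects("%e4%b")); // prints "true" (incomplete trailing escape)
        System.out.println(rejects("%zz"));   // prints "true" (illegal hex characters)
    }
}
```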
Breaking the source down:
1. Extract the hexadecimal values marked by % and collect them into a byte array
String str1 = "%e4%b8%ad";
int i = 0;
int j = 0;
byte[] bb = new byte[3];
while ((i+2< str1.length()) && (str1.charAt(i) == '%')) {
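//take the two hex digits that follow the % sign
String hex = str1.substring(i+1, i+ 3);
//parse the hex digits into an int, e.g. "e4" -> 228
int i1 = Integer.parseInt(hex, 16);
//narrow the int to a byte and store it in the byte array
bb[j] = (byte) i1;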
j++;
i+=3;
}
//the byte array now holds 3 bytes; decode it as utf-8 into a string
String s11 = new String(bb, "utf-8");
System.out.println(s11);
Output: 中
2. Convert a byte array into a string
byte[] ss = new byte []{(byte) 228,(byte) 184,(byte) 173};
String s = new String(ss, "utf-8");
System.out.println(s);
Output: 中
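The two steps work on the same three bytes: 228, 184 and 173 are 0xe4, 0xb8 and 0xad in hexadecimal. Going in the opposite direction makes the link visible. The sketch below percent-encodes every byte of a string's UTF-8 form; note that, unlike URLEncoder, it escapes all bytes including ASCII ones, and the names `PercentEncodeDemo` and `percentEncode` are illustrative only:

```java
import java.nio.charset.StandardCharsets;

class PercentEncodeDemo {
    // Write each UTF-8 byte of s as a %xy escape with lowercase hex digits.
    static String percentEncode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // b & 0xff maps the signed byte (e.g. -28) back to 0..255 (e.g. 228)
            sb.append('%').append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u4e2d is the first Chinese character of the file name used in this post
        System.out.println(percentEncode("\u4e2d")); // prints "%e4%b8%ad"
    }
}
```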