Fixing garbled characters in the URL returned by this.getClass().getResource(), with a source code walkthrough

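The problem:

When I fetched this file and printed its path, the Chinese file name came out garbled as a run of %xy sequences. I then took the charset named by the JDK's file.encoding property, converted the path to a byte array with it, and decoded that byte array with the same charset. The result was still garbled. (The source code analysis further down explains why.)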

    String path = PoiUtil.class.getClassLoader().getResource("template/中国.txt").getPath();
    System.out.println(path);

    // Try decoding with the system encoding (UTF-8 here); this does not help
    String encode = System.getProperty("file.encoding");
    System.out.println(encode);
    path = new String(path.getBytes(encode), encode);
    System.out.println(path);

Output:

/D:/project01/tsms-parent/tsms-web/target/classes/template/%e4%b8%ad%e5%9b%bd.txt
UTF-8
/D:/project01/tsms-parent/tsms-web/target/classes/template/%e4%b8%ad%e5%9b%bd.txt
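
Why the last line is unchanged can be checked in isolation. The percent-escaped path consists only of ASCII characters, so encoding it to bytes and decoding those bytes with the same charset simply reproduces it. A minimal sketch, where RoundTripDemo is a hypothetical class name and the path is copied from the output above:

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Encode to bytes and decode again with the same charset,
    // which is what the failed attempt does.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String path = "/D:/project01/tsms-parent/tsms-web/target/classes/template/%e4%b8%ad%e5%9b%bd.txt";
        // The escapes are plain ASCII text, so nothing changes.
        System.out.println(path.equals(roundTrip(path))); // true
    }
}
```

The %xy sequences are text, not mis-decoded bytes, so no choice of charset in this round trip can turn them back into Chinese characters.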

Solution:

String path = PoiUtil.class.getClassLoader().getResource("template/中国.txt").getPath();
path = URLDecoder.decode(path, "utf-8");
System.out.println(path);

Output:

/D:/project01/tsms-parent/tsms-web/target/classes/template/中国.txt

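This happens because ClassLoader's getResource method percent-encodes the path using UTF-8. When the path contains Chinese characters or spaces, those characters are replaced by %xy escapes, which is what shows up as "garbled" text. Calling URLDecoder's decode method on the path reverses this and gives back the original path with its Chinese characters and spaces.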
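
One caveat: URLDecoder implements form decoding (application/x-www-form-urlencoded), so it also turns a literal '+' into a space. If a directory or file name may contain '+', decoding through java.net.URI is an alternative. The following is only a sketch under assumptions: PathDecodeDemo and the sample path are made up for illustration, and the raw path is assumed to look like what URL.getPath() returns.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

class PathDecodeDemo {
    // Form-style decoding: %xy escapes are decoded and '+' becomes a space.
    static String viaUrlDecoder(String rawPath) throws UnsupportedEncodingException {
        return URLDecoder.decode(rawPath, "UTF-8");
    }

    // URI-style decoding: %xy escapes are decoded as UTF-8, '+' is left as it is.
    static String viaUri(String rawPath) throws URISyntaxException {
        return new URI("file:" + rawPath).getPath();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical path containing a space (%20), a literal '+', and escaped Chinese characters
        String raw = "/D:/my%20dir/a+b/%e4%b8%ad%e5%9b%bd.txt";
        System.out.println(viaUrlDecoder(raw)); // "a+b" turns into "a b"
        System.out.println(viaUri(raw));        // "a+b" is preserved
    }
}
```

With a real URL object, new File(url.toURI()) or Paths.get(url.toURI()) applies the same URI-based decoding without string handling.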

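Source code analysis:

Below is the source of URLDecoder.decode(path, "utf-8"). The part that matters here is how it handles the %e4%b8%ad%e5%9b%bd segment that was produced when the Chinese characters were escaped.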

/**
* Decodes a {@code application/x-www-form-urlencoded} string using a specific
* encoding scheme.
* The supplied encoding is used to determine
* what characters are represented by any consecutive sequences of the
* form "<i>{@code %xy}</i>".
* <p>
* <em><strong>Note:</strong> The <a href=
* "http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars">
* World Wide Web Consortium Recommendation</a> states that
* UTF-8 should be used. Not doing so may introduce
* incompatibilities.</em>
*
* @param s the {@code String} to decode
* @param enc   The name of a supported
*    <a href="../lang/package-summary.html#charenc">character
*    encoding</a>.
* @return the newly decoded {@code String}
* @exception  UnsupportedEncodingException
*             If character encoding needs to be consulted, but
*             named character encoding is not supported
* @see URLEncoder#encode(java.lang.String, java.lang.String)
* @since 1.4
*/
public static String decode(String s, String enc)
    throws UnsupportedEncodingException{

    boolean needToChange = false;
    int numChars = s.length();
    StringBuffer sb = new StringBuffer(numChars > 500 ? numChars / 2 : numChars);
    int i = 0;

    if (enc.length() == 0) {
        throw new UnsupportedEncodingException ("URLDecoder: empty string enc parameter");
    }

    char c;
    byte[] bytes = null;
    while (i < numChars) {
        c = s.charAt(i);
        switch (c) {
        case '+':
            sb.append(' ');
            i++;
            needToChange = true;
            break;
        case '%':
            /*
             * Starting with this instance of %, process all
             * consecutive substrings of the form %xy. Each
             * substring %xy will yield a byte. Convert all
             * consecutive  bytes obtained this way to whatever
             * character(s) they represent in the provided
             * encoding.
             */

            try {

                // (numChars-i)/3 is an upper bound for the number
                // of remaining bytes
                if (bytes == null)
                    bytes = new byte[(numChars-i)/3];
                int pos = 0;

                while ( ((i+2) < numChars) &&
                        (c=='%')) {
                    // Parse the two characters at i+1 and i+2 as a hexadecimal integer
                    int v = Integer.parseInt(s.substring(i+1,i+3),16);
                    if (v < 0)
                        throw new IllegalArgumentException("URLDecoder: Illegal hex characters in escape (%) pattern - negative value");
                    bytes[pos++] = (byte) v;
                    i+= 3;
                    if (i < numChars)
                        c = s.charAt(i);
                }

                // A trailing, incomplete byte encoding such as
                // "%x" will cause an exception to be thrown

                if ((i < numChars) && (c=='%'))
                    throw new IllegalArgumentException(
                     "URLDecoder: Incomplete trailing escape (%) pattern");
                // Decode the bytes collected from the hex escapes using enc (UTF-8 here)
                sb.append(new String(bytes, 0, pos, enc));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                "URLDecoder: Illegal hex characters in escape (%) pattern - "
                + e.getMessage());
            }
            needToChange = true;
            break;
        default:
            sb.append(c);
            i++;
            break;
        }
    }

    return (needToChange? sb.toString() : s);
}
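
The branches of the loop above can be exercised directly. The sketch below is illustrative only: DecodeEdgeCases and tryDecode are hypothetical names, and the inputs are made up to hit the '+' branch, a run of consecutive %xy escapes, and the two error paths.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeEdgeCases {
    // Returns the decoded string, or "IAE" if decode rejects the input
    // with an IllegalArgumentException.
    static String tryDecode(String s) throws UnsupportedEncodingException {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (IllegalArgumentException e) {
            return "IAE";
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tryDecode("a+b"));       // '+' branch: becomes a space
        System.out.println(tryDecode("%e4%b8%ad")); // three escapes decoded together as one character
        System.out.println(tryDecode("abc%e"));     // incomplete trailing escape
        System.out.println(tryDecode("abc%zz"));    // non-hex digits after '%'
    }
}
```

The first two calls return "a b" and "中"; the last two are rejected, matching the two IllegalArgumentException paths in the source.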

Breaking the source down:

1. Parse the hexadecimal digits after each % into integer values

        String str1 = "%e4%b8%ad";
        int i = 0;
        int j = 0;
        byte[] bb = new byte[3];
        while ((i + 2 < str1.length()) && (str1.charAt(i) == '%')) {
            // Take the two hex digits that follow the % sign
            String hex = str1.substring(i + 1, i + 3);
            // Parse the hex digits into an int
            int i1 = Integer.parseInt(hex, 16);
            // Narrow the int to a byte and store it in the byte array
            bb[j] = (byte) i1;
            j++;
            i += 3;
        }
        // The array now holds 3 bytes; decode them as UTF-8 into a string
        String s11 = new String(bb, "utf-8");
        System.out.println(s11);

Output:

中

2. Convert the byte array to a string

byte[] ss = new byte []{(byte) 228,(byte) 184,(byte) 173};
String s = new String(ss, "utf-8");
System.out.println(s);

Output:

中
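
Running the two steps in reverse shows where the escapes come from. This is a rough sketch, not how getResource or URLEncoder is implemented: PercentEscapeDemo and escapeAll are hypothetical names, and unlike a real encoder it escapes every byte, including ASCII ones.

```java
import java.nio.charset.StandardCharsets;

class PercentEscapeDemo {
    // Percent-escape every UTF-8 byte of s, using lowercase hex digits
    // as in the path printed earlier.
    static String escapeAll(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // b & 0xff maps the signed byte to its unsigned value 0..255
            sb.append('%').append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The argument is the two characters of the file name, written as
        // Unicode escapes so the source file encoding does not matter
        System.out.println(escapeAll("\u4e2d\u56fd")); // %e4%b8%ad%e5%9b%bd
    }
}
```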
