Fixing "garbled" paths in URLs returned by this.class.getResource(), with a source-code walkthrough

Author: 独行客-编码爱好者

Category: Java fundamentals (java 基础知识)

Article link: https://blog.csdn.net/donkeyboy001/article/details/119547874

The problem:

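When I loaded this file and printed its path, the Chinese part of the file name came out garbled. I then tried converting the path to a byte array with the JDK's file.encoding charset and decoding that byte array back with the same charset, but the result was still garbled. (The cause can be traced in the source code: the path is not mis-decoded text but percent-encoded URL text, and a round trip through the same charset leaves it unchanged.)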

String path = PoiUtil.class.getClassLoader().getResource("template/中国.txt").getPath();
System.out.println(path);

// Attempt: re-decode using the system encoding (UTF-8 here). This does not help.
String encode = System.getProperties().getProperty("file.encoding");
System.out.println(encode);
path = new String(path.getBytes(encode), encode);
System.out.println(path);

Output:

/D:/project01/tsms-parent/tsms-web/target/classes/template/%e4%b8%ad%e5%9b%bd.txt
UTF-8
/D:/project01/tsms-parent/tsms-web/target/classes/template/%e4%b8%ad%e5%9b%bd.txt
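The third line is identical to the first because path.getBytes(encode) followed by new String(..., encode) with the same charset is a round trip: for an ordinary string it gives back an equal string. The percent-encoded path is plain ASCII, so there is nothing for the charset to "fix". A minimal check of that claim, using the printed path (the class and method names are illustrative only):

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Re-creates the attempted fix: encode to bytes, then decode with the same charset.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The percent-encoded path printed above; it is already plain ASCII.
        String path = "/D:/project01/tsms-parent/tsms-web/target/classes/template/%e4%b8%ad%e5%9b%bd.txt";
        System.out.println(roundTrip(path).equals(path)); // prints "true"
    }
}
```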

Solution:

String path = PoiUtil.class.getClassLoader().getResource("template/中国.txt").getPath();
path = URLDecoder.decode(path,"utf-8");

Output:

/D:/project01/tsms-parent/tsms-web/target/classes/template/中国.txt
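An alternative that the original post does not use: convert the URL to a java.net.URI and let java.io.File (or Paths.get(URI)) do the decoding. URI decodes %xx escapes as UTF-8 but, unlike URLDecoder, leaves a literal '+' alone, so a file named a+b.txt keeps its name. The sketch below assumes a Unix-style file: URL so it runs anywhere; with a real resource you would pass the URL returned by getResource(...). The class and method names are hypothetical.

```java
import java.io.File;
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLDecoder;

public class ToUriDemo {
    // Decode via URI: %xx escapes are decoded as UTF-8, '+' is left untouched.
    static String viaUri(URL url) throws URISyntaxException {
        return new File(url.toURI()).getPath();
    }

    // Decode via URLDecoder (the approach used above): '+' becomes a space.
    static String viaUrlDecoder(URL url) throws UnsupportedEncodingException {
        return URLDecoder.decode(url.getPath(), "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical resource URL; normally this comes from getResource(...).
        URL url = URI.create("file:/tmp/template/a+b%e4%b8%ad%e5%9b%bd.txt").toURL();
        System.out.println(viaUri(url));        // on Unix: /tmp/template/a+b中国.txt
        System.out.println(viaUrlDecoder(url)); // /tmp/template/a b中国.txt
    }
}
```

On Windows, File.getPath() would use backslashes, so the first line would look different there.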

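The cause: the URL returned by ClassLoader.getResource carries its path in percent-encoded (URL-encoded) form, with non-ASCII characters encoded as UTF-8 bytes. When the path contains Chinese characters or spaces, each such character is converted into %xx escape sequences (中 becomes %e4%b8%ad, a space becomes %20), which looks like garbled text. URLDecoder.decode reverses this and restores the original path with its Chinese characters and spaces.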
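To make the encoding step concrete, here is a simplified sketch of how a path like the one in the output above can be produced: take the UTF-8 bytes and write every byte outside printable ASCII as %xx. This is a hypothetical helper for illustration, not the JDK's internal code; the real escaping rules cover more characters than this.

```java
import java.nio.charset.StandardCharsets;

public class PercentEncodeSketch {
    // Hypothetical helper: printable ASCII other than '%' is kept as-is;
    // every other byte of the UTF-8 encoding becomes %xx in lowercase hex.
    static String encodePath(String path) {
        StringBuilder sb = new StringBuilder();
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // view the signed byte as an unsigned value 0..255
            if (v > 0x20 && v < 0x7F && v != '%') {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02x", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodePath("template/中国.txt")); // template/%e4%b8%ad%e5%9b%bd.txt
        System.out.println(encodePath("my docs/a.txt"));     // my%20docs/a.txt
    }
}
```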

Source code walkthrough:

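Below is the source of java.net.URLDecoder.decode(String s, String enc), the method called as URLDecoder.decode(path, "utf-8") above. The part that matters here is the '%' branch, which handles runs such as %e4%b8%ad%e5%9b%bd produced from the Chinese characters.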

/**
* Decodes a {@code application/x-www-form-urlencoded} string using a specific
* encoding scheme.
* The supplied encoding is used to determine
* what characters are represented by any consecutive sequences of the
* form "<i>{@code %xy}</i>".
* <p>
* <em><strong>Note:</strong> The <a href=
* "http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars">
* World Wide Web Consortium Recommendation</a> states that
* UTF-8 should be used. Not doing so may introduce
* incompatibilities.</em>
*
* @param s the {@code String} to decode
* @param enc   The name of a supported
*    <a href="../lang/package-summary.html#charenc">character
*    encoding</a>.
* @return the newly decoded {@code String}
* @exception  UnsupportedEncodingException
*             If character encoding needs to be consulted, but
*             named character encoding is not supported
* @see URLEncoder#encode(java.lang.String, java.lang.String)
* @since 1.4
*/
public static String decode(String s, String enc)
    throws UnsupportedEncodingException{

    boolean needToChange = false;
    int numChars = s.length();
    StringBuffer sb = new StringBuffer(numChars > 500 ? numChars / 2 : numChars);
    int i = 0;

    if (enc.length() == 0) {
        throw new UnsupportedEncodingException ("URLDecoder: empty string enc parameter");
    }

    char c;
    byte[] bytes = null;
    while (i < numChars) {
        c = s.charAt(i);
        switch (c) {
        case '+':
            sb.append(' ');
            i++;
            needToChange = true;
            break;
        case '%':
            /*
             * Starting with this instance of %, process all
             * consecutive substrings of the form %xy. Each
             * substring %xy will yield a byte. Convert all
             * consecutive  bytes obtained this way to whatever
             * character(s) they represent in the provided
             * encoding.
             */

            try {

                // (numChars-i)/3 is an upper bound for the number
                // of remaining bytes
                if (bytes == null)
                    bytes = new byte[(numChars-i)/3];
                int pos = 0;

                while ( ((i+2) < numChars) &&
                        (c=='%')) {
                    // Parse the two characters at s[i+1] and s[i+2] as a hexadecimal integer
                    int v = Integer.parseInt(s.substring(i+1,i+3),16);
                    if (v < 0)
                        throw new IllegalArgumentException("URLDecoder: Illegal hex characters in escape (%) pattern - negative value");
                    bytes[pos++] = (byte) v;
                    i+= 3;
                    if (i < numChars)
                        c = s.charAt(i);
                }

                // A trailing, incomplete byte encoding such as
                // "%x" will cause an exception to be thrown

                if ((i < numChars) && (c=='%'))
                    throw new IllegalArgumentException(
                     "URLDecoder: Incomplete trailing escape (%) pattern");
                // Decode the collected bytes with the given encoding (UTF-8 here)
                sb.append(new String(bytes, 0, pos, enc));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                "URLDecoder: Illegal hex characters in escape (%) pattern - "
                + e.getMessage());
            }
            needToChange = true;
            break;
        default:
            sb.append(c);
            i++;
            break;
        }
    }

    return (needToChange? sb.toString() : s);
}
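To see the branches of this source in action ('+' handling, a '%' run, and the two error paths), here is a small driver that calls the real URLDecoder.decode; the class and helper names are illustrative. One practical consequence of the '+' branch: a file name containing a literal '+' would have it turned into a space by this decoding approach.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeBranchesDemo {
    // Returns the decoded string, or "error: <message>" if decode rejects the input.
    static String tryDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (IllegalArgumentException | UnsupportedEncodingException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDecode("a+b.txt"));       // case '+': a b.txt
        System.out.println(tryDecode("%e4%b8%ad.txt")); // case '%': 中.txt
        System.out.println(tryDecode("%e4%b8%a"));      // incomplete trailing escape: error
        System.out.println(tryDecode("%zz"));           // NumberFormatException path: error
    }
}
```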

Breaking the source down:

1. Convert the hexadecimal number after each % into a decimal integer and store it as a byte

String str1 = "%e4%b8%ad";
int i = 0;
int j = 0;
byte[] bb = new byte[3];
while ((i + 2 < str1.length()) && (str1.charAt(i) == '%')) {
    // Take the two hex digits after the %
    String hex = str1.substring(i + 1, i + 3);
    // Convert the hexadecimal number to a decimal int
    int i1 = Integer.parseInt(hex, 16);
    // Narrow the int to a byte and store it in the byte array
    bb[j] = (byte) i1;
    j++;
    i += 3;
}
// The byte array now holds 3 bytes; decode them as UTF-8 into a string
String s11 = new String(bb, "utf-8");
System.out.println(s11);

Output:

中
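A detail worth noticing in the loop above: (byte) i1 stores values such as 228 even though Java's byte is signed (-128 to 127). The cast keeps the low 8 bits, so 228 is stored as -28, which is the same bit pattern 0xE4; new String(..., "utf-8") only looks at bit patterns, so decoding still works. A short check (names are illustrative):

```java
public class SignedByteDemo {
    // Narrow an int in 0..255 to a byte (keeps the low 8 bits).
    static byte toByte(int unsigned) {
        return (byte) unsigned;
    }

    // Recover the unsigned value 0..255 from a byte.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = toByte(0xE4);                                   // 0xE4 == 228
        System.out.println(b);                                   // -28
        System.out.println(toUnsigned(b));                       // 228
        System.out.println(Integer.toHexString(toUnsigned(b)));  // e4
    }
}
```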

2. Convert the byte array to a string

byte[] ss = new byte []{(byte) 228,(byte) 184,(byte) 173};
String s = new String(ss, "utf-8");
System.out.println(s);

Output:

中
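Why these three bytes are 中: a UTF-8 lead byte of the form 1110xxxx starts a 3-byte sequence; the lead byte contributes 4 payload bits and each continuation byte (10xxxxxx) contributes 6. For 0xE4 0xB8 0xAD (228, 184, 173) that gives (0x4 << 12) | (0x38 << 6) | 0x2D = 0x4E2D, i.e. U+4E2D 中; the same arithmetic turns e5 9b bd into U+56FD 国. A hand-rolled sketch for illustration only (it handles just 3-byte sequences and skips validation; real code should use new String(bytes, StandardCharsets.UTF_8)):

```java
public class Utf8ThreeByteDemo {
    // Decodes one 3-byte UTF-8 sequence (lead byte 1110xxxx) into a code point.
    // No validation: assumes the input really is a well-formed 3-byte sequence.
    static int decode3(byte b0, byte b1, byte b2) {
        int lead  = b0 & 0x0F; // low 4 bits of 1110xxxx
        int cont1 = b1 & 0x3F; // low 6 bits of 10xxxxxx
        int cont2 = b2 & 0x3F; // low 6 bits of 10xxxxxx
        return (lead << 12) | (cont1 << 6) | cont2;
    }

    public static void main(String[] args) {
        int cp = decode3((byte) 228, (byte) 184, (byte) 173);  // 0xE4 0xB8 0xAD
        System.out.println(Integer.toHexString(cp));            // 4e2d
        System.out.println(new String(Character.toChars(cp)));  // 中
    }
}
```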
