Decoding on the Java Server Side

Author: hongxingxiaonan

Article link: https://blog.csdn.net/hongxingxiaonan/article/details/49978611

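Garbled text (mojibake) arises when the encoding and decoding steps do not match. To interact correctly with clients, we need to understand when, where, and how the server encodes and decodes content. Following the order in which a request is processed, let us go through how a Java server decodes HTTP content, step by step.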

1. Decoding the URI

In Tomcat, the URI is decoded by the convertURI method of org.apache.catalina.connector.CoyoteAdapter.

    protected void convertURI(MessageBytes uri, org.apache.catalina.connector.Request request) throws Exception {
        ByteChunk bc = uri.getByteChunk();
        int length = bc.getLength();
        CharChunk cc = uri.getCharChunk();
        cc.allocate(length, -1);
        String enc = this.connector.getURIEncoding();
        if(enc != null) {
            B2CConverter bbuf = request.getURIConverter();

            try {
                if(bbuf == null) {
                    bbuf = new B2CConverter(enc);
                    request.setURIConverter(bbuf);
                }
            } catch (IOException var11) {
                log.error("Invalid URI encoding; using HTTP default");
                this.connector.setURIEncoding((String)null);
            }

            if(bbuf != null) {
                try {
                    bbuf.convert(bc, cc);
                    uri.setChars(cc.getBuffer(), cc.getStart(), cc.getLength());
                    return;
                } catch (IOException var12) {
                    log.error("Invalid URI character encoding; trying ascii");
                    cc.recycle();
                }
            }
        }

        byte[] var13 = bc.getBuffer();
        char[] cbuf = cc.getBuffer();
        int start = bc.getStart();

        for(int i = 0; i < length; ++i) {
            cbuf[i] = (char)(var13[i + start] & 255);
        }

        uri.setChars(cbuf, 0, length);
    }

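The statement String enc = this.connector.getURIEncoding() fetches the encoding configured on the connector. If one is configured, the convert method of B2CConverter turns the byte array into a char array. If none is configured, or the conversion fails, the default decoding is used, which treats the bytes as ISO-8859-1. The URI encoding is set with the URIEncoding attribute of the Connector element in the server.xml configuration file.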
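
To make the fallback concrete, here is a minimal sketch that reproduces the byte-widening loop at the end of convertURI and compares it with an explicit UTF-8 decode. The class and method names (UriFallbackDemo, widenBytes) are illustrative assumptions and are not part of Tomcat; the input is assumed to be the raw bytes of a UTF-8 path segment.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: mimics the fallback loop in convertURI, not Tomcat code.
class UriFallbackDemo {

    // Widen each byte to a char, like cbuf[i] = (char)(bytes[i] & 255).
    static String widenBytes(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (bytes[i] & 255);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // Raw bytes of the two-character text U+4E2D U+6587 as a UTF-8 client sends them.
        byte[] raw = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        String fallback = widenBytes(raw);
        String latin1 = new String(raw, StandardCharsets.ISO_8859_1);
        String utf8 = new String(raw, StandardCharsets.UTF_8);

        // The widening loop is equivalent to an ISO-8859-1 decode.
        System.out.println(fallback.equals(latin1));
        // Six garbled chars come out where two characters went in.
        System.out.println(fallback.length() + " chars instead of " + utf8.length());
    }
}
```

If the client encoded the path as UTF-8 and no matching URIEncoding is configured, every byte becomes its own character, which is exactly the garbling described at the start of the article.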

2. Decoding the query parameters

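Parameters are decoded the first time they are requested, that is, on the first call to getParameter, getParameterMap, getParameterNames, or getParameterValues of HttpServletRequest. The HttpServletRequest implementation that Tomcat hands to a servlet is Catalina's org.apache.catalina.connector.RequestFacade. This facade holds an internal org.apache.catalina.connector.Request object, and parameter lookups are delegated directly to it. Here is the getParameter method of that Request: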

    public String getParameter(String name) {
        if(!this.parametersParsed) {
            this.parseParameters();
        }

        return this.coyoteRequest.getParameters().getParameter(name);
    }

If the parameters have not been parsed yet, parseParameters parses all of them, as the following code shows:

    protected void parseParameters() {
        this.parametersParsed = true;
        Parameters parameters = this.coyoteRequest.getParameters();
        String enc = this.getCharacterEncoding();
        boolean useBodyEncodingForURI = this.connector.getUseBodyEncodingForURI();
        if(enc != null) {
            parameters.setEncoding(enc);
            if(useBodyEncodingForURI) {
                parameters.setQueryStringEncoding(enc);
            }
        } else {
            parameters.setEncoding("ISO-8859-1");
            if(useBodyEncodingForURI) {
                parameters.setQueryStringEncoding("ISO-8859-1");
            }
        }

        parameters.handleQueryParameters();
        if(!this.usingInputStream && !this.usingReader) {
            if(this.getMethod().equalsIgnoreCase("POST")) {
                String contentType = this.getContentType();
                if(contentType == null) {
                    contentType = "";
                }

                int semicolon = contentType.indexOf(59);
                if(semicolon >= 0) {
                    contentType = contentType.substring(0, semicolon).trim();
                } else {
                    contentType = contentType.trim();
                }

                if("application/x-www-form-urlencoded".equals(contentType)) {
                    int len = this.getContentLength();
                    if(len > 0) {
                        int formData = this.connector.getMaxPostSize();
                        if(formData > 0 && len > formData) {
                            if(this.context.getLogger().isDebugEnabled()) {
                                this.context.getLogger().debug(sm.getString("coyoteRequest.postTooLarge"));
                            }

                            return;
                        }

                        Object e = null;
                        byte[] e1;
                        if(len < CACHED_POST_LEN) {
                            if(this.postData == null) {
                                this.postData = new byte[CACHED_POST_LEN];
                            }

                            e1 = this.postData;
                        } else {
                            e1 = new byte[len];
                        }

                        try {
                            if(this.readPostBody(e1, len) != len) {
                                return;
                            }
                        } catch (IOException var11) {
                            if(this.context.getLogger().isDebugEnabled()) {
                                this.context.getLogger().debug(sm.getString("coyoteRequest.parseParameters"), var11);
                            }

                            return;
                        }

                        parameters.processParameters(e1, 0, len);
                    } else if("chunked".equalsIgnoreCase(this.coyoteRequest.getHeader("transfer-encoding"))) {
                        Object formData1 = null;

                        byte[] formData2;
                        try {
                            formData2 = this.readChunkedPostBody();
                        } catch (IOException var10) {
                            if(this.context.getLogger().isDebugEnabled()) {
                                this.context.getLogger().debug(sm.getString("coyoteRequest.parseParameters"), var10);
                            }

                            return;
                        }

                        if(formData2 != null) {
                            parameters.processParameters(formData2, 0, formData2.length);
                        }
                    }

                }
            }
        }
    }

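Let us walk through the parsing in detail. First, the character encoding is obtained. The rule is: if it has been set explicitly (through the Request API), that value is used; otherwise an attempt is made to read the charset from the Content-Type header; if neither yields a value, null is returned. See the getCharacterEncoding method of the Coyote Request: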

    public String getCharacterEncoding() {
        if(this.charEncoding != null) {
            return this.charEncoding;
        } else {
            this.charEncoding = ContentType.getCharsetFromContentType(this.getContentType());
            return this.charEncoding;
        }
    }
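
The second branch of the rule reads the charset parameter from the Content-Type header. The sketch below shows one simplified way to do that lookup. It is a hypothetical stand-in (CharsetFromContentType and charsetOf are made-up names), not the implementation behind ContentType.getCharsetFromContentType.

```java
import java.util.Locale;

// Hypothetical helper: a simplified charset lookup, not Tomcat's implementation.
class CharsetFromContentType {

    // Returns the charset parameter of a Content-Type value, or null if absent.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the remaining parts are parameters.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8"));
        System.out.println(charsetOf("application/x-www-form-urlencoded"));
    }
}
```

Note that browsers commonly submit forms without a charset parameter, in which case the lookup yields null and the ISO-8859-1 default discussed next applies.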

Next, if charEncoding is not null, it is set as the encoding of Parameters; otherwise the encoding of Parameters is set to the default, ISO-8859-1. If useBodyEncodingForURI is declared as true, the queryStringEncoding of Parameters is set to the same value as encoding. Like URIEncoding, useBodyEncodingForURI is set on the Connector:

    <Connector URIEncoding="UTF-8" useBodyEncodingForURI="true"/>

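The selection rule from parseParameters can be condensed into a few lines. The following is a hypothetical model (EncodingSelection and its fields are invented for illustration); it only mirrors the branches shown in the code above, where the query string encoding is left untouched unless useBodyEncodingForURI is true.

```java
// Hypothetical model of the selection rule in parseParameters; not Tomcat code.
class EncodingSelection {
    final String bodyEncoding;   // used for parameters in the request body
    final String queryEncoding;  // used for the query string; null means "not set"

    private EncodingSelection(String bodyEncoding, String queryEncoding) {
        this.bodyEncoding = bodyEncoding;
        this.queryEncoding = queryEncoding;
    }

    // requestEncoding: result of getCharacterEncoding(), may be null.
    // configuredQueryEncoding: whatever was set on Parameters before, may be null.
    static EncodingSelection choose(String requestEncoding,
                                    String configuredQueryEncoding,
                                    boolean useBodyEncodingForURI) {
        String body = (requestEncoding != null) ? requestEncoding : "ISO-8859-1";
        // Only overwrite the query string encoding when the flag is set.
        String query = useBodyEncodingForURI ? body : configuredQueryEncoding;
        return new EncodingSelection(body, query);
    }

    public static void main(String[] args) {
        EncodingSelection s = choose("UTF-8", null, true);
        System.out.println(s.bodyEncoding + " / " + s.queryEncoding);
    }
}
```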

Next, Parameters is used to process the query string in the request line:

    public void handleQueryParameters() {
        if(!this.didQueryParameters) {
            this.didQueryParameters = true;
            if(this.queryMB != null && !this.queryMB.isNull()) {
                if(debug > 0) {
                    this.log("Decoding query " + this.decodedQuery + " " + this.queryStringEncoding);
                }

                try {
                    this.decodedQuery.duplicate(this.queryMB);
                } catch (IOException var2) {
                    var2.printStackTrace();
                }

                this.processParameters(this.decodedQuery, this.queryStringEncoding);
            }
        }
    }

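As the method above shows, the query string in the request line is decoded using queryStringEncoding. If queryStringEncoding has not been set, ISO-8859-1 is used. The actual decoding takes place in the urlDecode method of Parameters.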

    private String urlDecode(ByteChunk bc, String enc) throws IOException {
        if(this.urlDec == null) {
            this.urlDec = new UDecoder();
        }

        this.urlDec.convert(bc);
        String result = null;
        if(enc != null) {
            bc.setEncoding(enc);
            result = bc.toString();
        } else {
            CharChunk cc = this.tmpNameC;
            int length = bc.getLength();
            cc.allocate(length, -1);
            byte[] bbuf = bc.getBuffer();
            char[] cbuf = cc.getBuffer();
            int start = bc.getStart();

            for(int i = 0; i < length; ++i) {
                cbuf[i] = (char)(bbuf[i + start] & 255);
            }

            cc.setChars(cbuf, 0, length);
            result = cc.toString();
            cc.recycle();
        }

        return result;
    }
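
The two steps in urlDecode, percent-decoding followed by charset decoding, can be tried in isolation. The sketch below is an illustrative reimplementation under simplifying assumptions (well-formed %XX escapes, ASCII elsewhere); QueryDecodeDemo and its methods are not Tomcat names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch of the two steps in urlDecode; names are not Tomcat's.
class QueryDecodeDemo {

    // Step 1: turn %XX escapes and '+' back into raw bytes.
    // Assumes well-formed input; a malformed escape throws NumberFormatException.
    static byte[] percentDecode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length() + 1) {
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else if (c == '+') {
                out.write(' ');
            } else {
                out.write(c); // assumed to be plain ASCII
            }
        }
        return out.toByteArray();
    }

    // Step 2: interpret the bytes in the chosen charset.
    static String decode(String encoded, Charset charset) {
        return new String(percentDecode(encoded), charset);
    }

    public static void main(String[] args) {
        String value = "%E4%B8%AD"; // UTF-8 bytes of U+4E2D, percent-encoded
        System.out.println(decode(value, StandardCharsets.UTF_8).length());      // one character
        System.out.println(decode(value, StandardCharsets.ISO_8859_1).length()); // three characters
    }
}
```

The same three bytes yield one character under UTF-8 and three unrelated characters under ISO-8859-1, which is why the choice of queryStringEncoding decides whether the parameter arrives intact.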

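The parsed parameters are stored in Parameters. At this point only the query parameters in the request line have been processed. For GET, handling the request line is enough, but with POST the parameters may also appear in the request body, so further parsing is needed. The code therefore checks whether the Content-Type is "application/x-www-form-urlencoded"; only a POST of that type has its body parsed for parameters. Once the body has been read, the following method of Parameters parses it: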

    public void processParameters(byte[] bytes, int start, int len) {
        this.processParameters(bytes, start, len, this.encoding);
    }

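As you can see, parameters in the request body are decoded with encoding, and again ISO-8859-1 is used if none has been set. After parseParameters has run, all request parameters are held in Parameters. They come from both the request line and the request body, and values with the same name are stored together as an array. A value can then be looked up by name through the getParameter method of Parameters: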

    public String getParameter(String name) {
        String[] values = this.getParameterValues(name);
        return values != null?(values.length == 0?"":values[0]):null;
    }

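Although this method is only two lines long, it reveals an important rule about fetching parameters with getParameter on the Request. We usually pass one value per key; to pass an array, the same key is sent more than once. For example, with the query string a=1&a=2, getParameterValues on the server returns an array whose first and second elements are 1 and 2, while getParameter returns only the first value.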
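
The first-value rule is easy to reproduce outside a container. The sketch below is a minimal model, assuming an already-decoded query string; MultiValueParams is a made-up class whose getParameter copies the two-line logic shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal model of multi-valued parameters; not the Tomcat Parameters class.
class MultiValueParams {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Splits an already-decoded query string such as "a=1&a=2".
    static MultiValueParams parse(String query) {
        MultiValueParams params = new MultiValueParams();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments, including an empty query
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    // All values for a name, in order of appearance, or null if absent.
    String[] getParameterValues(String name) {
        List<String> list = values.get(name);
        return list == null ? null : list.toArray(new String[0]);
    }

    // Same rule as above: first value, "" for an empty array, null if absent.
    String getParameter(String name) {
        String[] v = getParameterValues(name);
        return v != null ? (v.length == 0 ? "" : v[0]) : null;
    }

    public static void main(String[] args) {
        MultiValueParams p = parse("a=1&a=2");
        System.out.println(p.getParameterValues("a").length); // both values are kept
        System.out.println(p.getParameter("a"));              // only the first is returned
    }
}
```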

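The analysis above shows that the encoding used for parameter parsing can be specified. The usual practice is to set the request character encoding in a Filter (it applies to the request body and, when useBodyEncodingForURI is enabled, to the query string as well), for example: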

    <filter>
        <filter-name>characterEncodingFilter</filter-name>
        <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
        <init-param>
            <param-name>forceEncoding</param-name>
            <param-value>true</param-value>
        </init-param>
    </filter>

Combined with the URIEncoding and useBodyEncodingForURI connector settings, this gives full control over how the server decodes the URI and the parameters.
