Tomcat Encoding and Decoding Explained

Articles I wrote a long time ago about fixing garbled text (mojibake) in Tomcat:
Tomcat encoding settings
Fixing garbled text in Servlets

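Back then I only knew how to search for a fix and knew nothing about the internals, so the next time the problem appeared I still could not solve it.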


Tomcat request structure


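org.apache.coyote.Request is the low-level implementation behind the Request object that the application layer receives; it is inconvenient to use directly.
org.apache.catalina.connector.Request wraps org.apache.coyote.Request and implements the HttpServletRequest interface, so it is usable in practice. However, it also contains many Catalina-specific methods that should not be exposed to the application layer, to avoid compatibility problems with other container implementations.

org.apache.catalina.connector.RequestFacade implements the HttpServletRequest interface and holds an org.apache.catalina.connector.Request object. It delegates every HttpServletRequest call to that object, which hides Catalina's internal methods and lets users focus on the standard servlet methods.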
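The delegation described above can be sketched in a few lines. This is a minimal illustration of the facade idea only, not Tomcat's code: SimpleRequest, InternalRequest, SimpleRequestFacade and the recycle() method are hypothetical stand-ins for HttpServletRequest, org.apache.catalina.connector.Request and RequestFacade.

```java
// Hypothetical stand-in for HttpServletRequest: the only API applications should see.
interface SimpleRequest {
    String getCharacterEncoding();
    void setCharacterEncoding(String enc);
}

// Hypothetical stand-in for org.apache.catalina.connector.Request:
// implements the public interface but also carries container-internal methods.
final class InternalRequest implements SimpleRequest {
    private String encoding;

    @Override public String getCharacterEncoding() { return encoding; }
    @Override public void setCharacterEncoding(String enc) { this.encoding = enc; }

    // Container-internal method that application code should not be able to reach.
    void recycle() { encoding = null; }
}

// Hypothetical stand-in for RequestFacade: exposes only the interface methods
// and forwards each call to the wrapped internal request.
final class SimpleRequestFacade implements SimpleRequest {
    private final InternalRequest request;

    SimpleRequestFacade(InternalRequest request) { this.request = request; }

    @Override public String getCharacterEncoding() { return request.getCharacterEncoding(); }
    @Override public void setCharacterEncoding(String enc) { request.setCharacterEncoding(enc); }
}

final class FacadeDemo {
    public static void main(String[] args) {
        InternalRequest internal = new InternalRequest();
        SimpleRequest exposed = new SimpleRequestFacade(internal);
        exposed.setCharacterEncoding("UTF-8");                   // forwarded to the internal object
        System.out.println(internal.getCharacterEncoding());     // UTF-8
        System.out.println(exposed instanceof InternalRequest);  // false: internals stay hidden
    }
}
```

Because the application only ever holds the facade, a cast to the internal type fails, so container-specific methods cannot leak into application code.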

Verification:
In a Servlet, call request.setCharacterEncoding("UTF-8"); and inspect the call stack.
The HttpServletRequest we actually receive is a RequestFacade.


tomcat-6.0

Attribute | Description
URIEncoding | This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, ISO-8859-1 will be used.
useBodyEncodingForURI | This specifies if the encoding specified in contentType should be used for URI query parameters, instead of using the URIEncoding. This setting is present for compatibility with Tomcat 4.1.x, where the encoding specified in the contentType, or explicitly set using Request.setCharacterEncoding method was also used for the parameters from the URL. The default value is false.

tomcat-7.0

Attribute | Description
URIEncoding | This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, ISO-8859-1 will be used.
useBodyEncodingForURI | This specifies if the encoding specified in contentType should be used for URI query parameters, instead of using the URIEncoding. This setting is present for compatibility with Tomcat 4.1.x, where the encoding specified in the contentType, or explicitly set using Request.setCharacterEncoding method was also used for the parameters from the URL. The default value is false. Notes: 1) This setting is applied only to the query string of a request. Unlike URIEncoding it does not affect the path portion of a request URI. 2) If request character encoding is not known (is not provided by a browser and is not set by SetCharacterEncodingFilter or a similar filter using Request.setCharacterEncoding method), the default encoding is always “ISO-8859-1”. The URIEncoding setting has no effect on this default.

tomcat-8.0

Attribute | Description
URIEncoding | This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, UTF-8 will be used unless the org.apache.catalina.STRICT_SERVLET_COMPLIANCE system property is set to true in which case ISO-8859-1 will be used.
useBodyEncodingForURI | This specifies if the encoding specified in contentType should be used for URI query parameters, instead of using the URIEncoding. This setting is present for compatibility with Tomcat 4.1.x, where the encoding specified in the contentType, or explicitly set using Request.setCharacterEncoding method was also used for the parameters from the URL. The default value is false. Notes: 1) This setting is applied only to the query string of a request. Unlike URIEncoding it does not affect the path portion of a request URI. 2) If request character encoding is not known (is not provided by a browser and is not set by SetCharacterEncodingFilter or a similar filter using Request.setCharacterEncoding method), the default encoding is always “ISO-8859-1”. The URIEncoding setting has no effect on this default.

tomcat-9.0

Attribute | Description
URIEncoding | This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, UTF-8 will be used unless the org.apache.catalina.STRICT_SERVLET_COMPLIANCE system property is set to true in which case ISO-8859-1 will be used.
useBodyEncodingForURI | This specifies if the encoding specified in contentType should be used for URI query parameters, instead of using the URIEncoding. This setting is present for compatibility with Tomcat 4.1.x, where the encoding specified in the contentType, or explicitly set using Request.setCharacterEncoding method was also used for the parameters from the URL. The default value is false. Notes: 1) This setting is applied only to the query string of a request. Unlike URIEncoding it does not affect the path portion of a request URI. 2) If request character encoding is not known (is not provided by a browser and is not set by SetCharacterEncodingFilter or a similar filter using Request.setCharacterEncoding method), the default encoding is always “ISO-8859-1”. The URIEncoding setting has no effect on this default.

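As the documentation excerpts above show, from Tomcat 8 onward URIEncoding defaults to UTF-8 (unless org.apache.catalina.STRICT_SERVLET_COMPLIANCE is set to true, in which case ISO-8859-1 is used).

The default encoding for the query string is the value specified by URIEncoding.

useBodyEncodingForURI defaults to false.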


https://zh.wikipedia.org/wiki/%E7%99%BE%E5%88%86%E5%8F%B7%E7%BC%96%E7%A0%81
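To make the link between percent-encoding and URIEncoding concrete, here is a small sketch of the two steps the Tomcat documentation describes: first the %xx sequences become raw bytes, then those bytes are decoded with a character set. The class PercentEncodingDemo and its methods are hypothetical helpers written for illustration; Tomcat uses its own decoder, and this sketch does no validation of malformed input.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PercentEncodingDemo {

    // Percent-encode every byte of the text in the given charset.
    // (Real encoders leave unreserved ASCII characters unescaped; this sketch escapes everything.)
    static String encode(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Step 1: turn %xx sequences back into raw bytes.
    // Step 2: decode those bytes with the chosen charset (the role URIEncoding plays).
    static String decode(String encoded, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                bytes.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                bytes.write(c); // assumes unescaped characters are plain ASCII
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String encoded = encode("\u4e2d", StandardCharsets.UTF_8);   // the character U+4E2D
        System.out.println(encoded);                                 // %E4%B8%AD
        // Same bytes, two charsets: UTF-8 restores one character, ISO-8859-1 yields three.
        System.out.println(decode(encoded, StandardCharsets.UTF_8).length());       // 1
        System.out.println(decode(encoded, StandardCharsets.ISO_8859_1).length());  // 3
    }
}
```

The bytes on the wire are identical in both cases; only the charset chosen in step 2 decides whether the parameter comes out intact or garbled.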



Tomcat parses request parameters in org.apache.catalina.connector.Request#parseParameters:

protected void parseParameters() {

        // Parameters are parsed only once, the first time they are requested
        parametersParsed = true;

        // Wrapper around the request parameters
        Parameters parameters = coyoteRequest.getParameters();
        boolean success = false;
        try {
            // Set this every time in case limit has been changed via JMX
            parameters.setLimit(getConnector().getMaxParameterCount());

            // getCharacterEncoding() may have been overridden to search for hidden form field containing request encoding
            // HTTP request header --> Content-Type
            // This value can also be set with setCharacterEncoding, which takes precedence over the request Content-Type
            String enc = getCharacterEncoding();

            // Whether useBodyEncodingForURI is configured
            // Defaults to false; configured on the Connector in server.xml
            boolean useBodyEncodingForURI = connector.getUseBodyEncodingForURI();
            if (enc != null) {
                parameters.setEncoding(enc);
                if (useBodyEncodingForURI) {
                    // Use the body encoding for the query string as well
                    // If useBodyEncodingForURI is false, the query string keeps URIEncoding (UTF-8 by default since Tomcat 8)
                    parameters.setQueryStringEncoding(enc);
                }
            } else {
                // The default encoding is ISO-8859-1
                parameters.setEncoding
                    (org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
                if (useBodyEncodingForURI) {
                    // Decode the query string with the default encoding
                    parameters.setQueryStringEncoding
                        (org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
                }
            }

            // The Parameters object handles the request parameters
            // Here this means the query string specifically
            parameters.handleQueryParameters();
            // ... (remainder of the method omitted)
        }
}
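The branching in the excerpt above boils down to a small decision table. The following sketch restates it as a pure function so the four cases are easy to check. EncodingDecision and resolve are hypothetical names introduced here, and the uriEncoding argument stands for whatever the Connector's URIEncoding is set to; this is a reading of the excerpt, not Tomcat's API.

```java
final class EncodingDecision {

    // Fallback used in the excerpt when the request carries no encoding.
    static final String DEFAULT_BODY_ENCODING = "ISO-8859-1";

    /**
     * Returns a two-element array: {bodyEncoding, queryStringEncoding}.
     *
     * @param requestEncoding       result of getCharacterEncoding(), may be null
     * @param uriEncoding           the Connector's URIEncoding value
     * @param useBodyEncodingForURI the Connector's useBodyEncodingForURI flag
     */
    static String[] resolve(String requestEncoding, String uriEncoding, boolean useBodyEncodingForURI) {
        // Body: explicit or Content-Type encoding if known, otherwise ISO-8859-1.
        String body = (requestEncoding != null) ? requestEncoding : DEFAULT_BODY_ENCODING;
        // Query string: follows the body encoding only when the flag is on,
        // otherwise it stays at URIEncoding.
        String query = useBodyEncodingForURI ? body : uriEncoding;
        return new String[] { body, query };
    }

    public static void main(String[] args) {
        String[] r = resolve(null, "UTF-8", false);
        System.out.println(r[0] + " / " + r[1]); // ISO-8859-1 / UTF-8
    }
}
```

Note the third case in particular: with useBodyEncodingForURI enabled and no request encoding known, the query string falls back to ISO-8859-1 even though URIEncoding is UTF-8.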

For a POST request that carries its parameters in the request body (the Content-Type must be application/x-www-form-urlencoded), the corresponding code is:

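// Call site in Request#parseParameters
parameters.processParameters(formData, 0, len);

// In org.apache.tomcat.util.http.Parameters
public void processParameters(byte bytes[], int start, int len) {
    processParameters(bytes, start, len, getCharset(encoding));
}
/*
 * Here encoding defaults to ISO-8859-1, so either call
 * request.setCharacterEncoding("UTF-8") before reading any parameter,
 * or convert on the server side with
 * new String(name.getBytes("ISO-8859-1"), "<encoding of the submitted data>").
 */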

Test

@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    String name = request.getParameter("name");
    name = new String(name.getBytes("ISO-8859-1"), "UTF-8");
    System.out.println(name);
}
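Why the getBytes("ISO-8859-1") trick in the test works: ISO-8859-1 maps each of the 256 byte values to exactly one character, so decoding with it loses nothing, and re-encoding returns the original bytes. The sketch below reproduces this outside a container; MojibakeDemo, misdecode and recover are hypothetical names, and the sample text is assumed to arrive as UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {

    // Simulates a container decoding UTF-8 request bytes with the ISO-8859-1 default.
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the original bytes, then decode them as UTF-8.
    static String recover(String garbled) {
        byte[] wire = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";            // two Chinese characters
        String garbled = misdecode(original);
        System.out.println(garbled.length());         // 6: one char per UTF-8 byte
        System.out.println(recover(garbled).equals(original)); // true
    }
}
```

The repair only works when the wrong decoding was ISO-8859-1; calling request.setCharacterEncoding before the first getParameter call avoids the round trip entirely.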



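Summary:

For GET requests: from Tomcat 8 onward, URIEncoding defaults to UTF-8, so a query string that the client percent-encoded as UTF-8 is decoded correctly without any encoding configuration.
For POST requests: decoding depends on the charset in Content-Type or on request.setCharacterEncoding("UTF-8"); called before the first parameter is read. If neither is specified, the body is decoded as ISO-8859-1 and non-ASCII parameters will be garbled.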
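The precedence for the POST body can be sketched as follows: an encoding set explicitly wins, then the charset parameter of Content-Type, then the ISO-8859-1 default. ContentTypeCharset, charsetOf and bodyEncoding are hypothetical helpers for illustration; the header parsing here is deliberately simple and is not how Tomcat parses media types.

```java
import java.util.Locale;

final class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type value, or returns null if absent.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Explicit encoding (setCharacterEncoding) > Content-Type charset > ISO-8859-1.
    static String bodyEncoding(String explicitEncoding, String contentType) {
        if (explicitEncoding != null) {
            return explicitEncoding;
        }
        String fromHeader = charsetOf(contentType);
        return (fromHeader != null) ? fromHeader : "ISO-8859-1";
    }

    public static void main(String[] args) {
        System.out.println(bodyEncoding(null, "application/x-www-form-urlencoded; charset=UTF-8")); // UTF-8
        System.out.println(bodyEncoding(null, "application/x-www-form-urlencoded"));                // ISO-8859-1
    }
}
```

Many browsers submit forms without a charset parameter, which is why the second case is the one that usually produces garbled POST parameters.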

