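A long time ago I wrote these articles about fixing garbled characters in Tomcat:
Tomcat encoding settings
Solving garbled characters in Servlets
Back then I only knew how to search for fixes and knew nothing about the internals, so the next time I ran into the problem I still couldn't solve it.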
Tomcat request structure
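Among these classes, org.apache.coyote.Request is the low-level implementation behind the Request object that the application layer receives. It is not convenient to use directly.

org.apache.catalina.connector.Request wraps org.apache.coyote.Request and implements the HttpServletRequest interface, so it is already usable in practice. However, it also contains many Catalina-specific methods that should not be exposed to applications, to avoid compatibility problems with other container implementations.

org.apache.catalina.connector.RequestFacade implements HttpServletRequest and holds an org.apache.catalina.connector.Request object, delegating every HttpServletRequest call to it. This hides Catalina's internal methods and lets users work only with the standard Servlet methods.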
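The delegation described above is the classic facade pattern. Below is a minimal sketch of the idea; the three types are hypothetical stand-ins (SimpleServletRequest for HttpServletRequest, InternalRequest for the Catalina Request, RequestFacadeSketch for RequestFacade), not Tomcat classes, and only one method is modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for HttpServletRequest (only one method modeled)
interface SimpleServletRequest {
    String getParameter(String name);
}

// Hypothetical stand-in for org.apache.catalina.connector.Request:
// implements the public interface but also carries container-only methods
class InternalRequest implements SimpleServletRequest {
    private final Map<String, String> params = new HashMap<>();

    @Override
    public String getParameter(String name) {
        return params.get(name);
    }

    // Container-internal method that applications should not be able to reach
    public void putParameter(String name, String value) {
        params.put(name, value);
    }
}

// Hypothetical stand-in for RequestFacade: exposes only the interface
// and forwards every call to the wrapped internal request
final class RequestFacadeSketch implements SimpleServletRequest {
    private final InternalRequest request;

    RequestFacadeSketch(InternalRequest request) {
        this.request = request;
    }

    @Override
    public String getParameter(String name) {
        return request.getParameter(name);
    }
}
```

The container keeps the InternalRequest for itself and hands the application only the facade, typed as the interface, so putParameter is not reachable through the object the application holds.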
Verification:
Call request.setCharacterEncoding("UTF-8"); in a Servlet.
The call stack is shown in the figure below:
In other words, the HttpServletRequest we receive is actually a RequestFacade.
The relevant HTTP Connector attributes, as documented for each Tomcat version:

tomcat-6.0
Attribute | Description |
---|---|
URIEncoding | This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, ISO-8859-1 will be used. |
useBodyEncodingForURI | This specifies if the encoding specified in contentType should be used for URI query parameters, instead of using the URIEncoding. This setting is present for compatibility with Tomcat 4.1.x, where the encoding specified in the contentType, or explicitly set using Request.setCharacterEncoding method was also used for the parameters from the URL. The default value is false. |
tomcat-7.0
Attribute | Description |
---|---|
URIEncoding | This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, ISO-8859-1 will be used. |
useBodyEncodingForURI | This specifies if the encoding specified in contentType should be used for URI query parameters, instead of using the URIEncoding. This setting is present for compatibility with Tomcat 4.1.x, where the encoding specified in the contentType, or explicitly set using Request.setCharacterEncoding method was also used for the parameters from the URL. The default value is false. Notes: 1) This setting is applied only to the query string of a request. Unlike URIEncoding it does not affect the path portion of a request URI. 2) If request character encoding is not known (is not provided by a browser and is not set by SetCharacterEncodingFilter or a similar filter using Request.setCharacterEncoding method), the default encoding is always “ISO-8859-1”. The URIEncoding setting has no effect on this default. |
tomcat-8.0
Attribute | Description |
---|---|
URIEncoding | This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, UTF-8 will be used unless the org.apache.catalina.STRICT_SERVLET_COMPLIANCE system property is set to true in which case ISO-8859-1 will be used. |
useBodyEncodingForURI | This specifies if the encoding specified in contentType should be used for URI query parameters, instead of using the URIEncoding. This setting is present for compatibility with Tomcat 4.1.x, where the encoding specified in the contentType, or explicitly set using Request.setCharacterEncoding method was also used for the parameters from the URL. The default value is false. Notes: 1) This setting is applied only to the query string of a request. Unlike URIEncoding it does not affect the path portion of a request URI. 2) If request character encoding is not known (is not provided by a browser and is not set by SetCharacterEncodingFilter or a similar filter using Request.setCharacterEncoding method), the default encoding is always “ISO-8859-1”. The URIEncoding setting has no effect on this default. |
tomcat-9.0
Attribute | Description |
---|---|
URIEncoding | This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, UTF-8 will be used unless the org.apache.catalina.STRICT_SERVLET_COMPLIANCE system property is set to true in which case ISO-8859-1 will be used. |
useBodyEncodingForURI | This specifies if the encoding specified in contentType should be used for URI query parameters, instead of using the URIEncoding. This setting is present for compatibility with Tomcat 4.1.x, where the encoding specified in the contentType, or explicitly set using Request.setCharacterEncoding method was also used for the parameters from the URL. The default value is false. Notes: 1) This setting is applied only to the query string of a request. Unlike URIEncoding it does not affect the path portion of a request URI. 2) If request character encoding is not known (is not provided by a browser and is not set by SetCharacterEncodingFilter or a similar filter using Request.setCharacterEncoding method), the default encoding is always “ISO-8859-1”. The URIEncoding setting has no effect on this default. |
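As the tables show, URIEncoding defaults to ISO-8859-1 in Tomcat 6 and 7, and to UTF-8 from Tomcat 8 on (unless org.apache.catalina.STRICT_SERVLET_COMPLIANCE is set to true).
By default, the query string is decoded with the encoding given by URIEncoding.
useBodyEncodingForURI defaults to false, so the body encoding is not applied to the query string unless you enable it.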
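To see why the URIEncoding default matters, here is a small sketch that decodes the same percent-encoded query value with two charsets. It uses java.net.URLDecoder purely for illustration (an assumption: this is not Tomcat's internal decoder); the class name QueryDecodingDemo is hypothetical, and the value is the UTF-8 encoding of the two Chinese characters U+4E2D U+6587.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo: the same %xx bytes decoded with different charsets,
// mimicking URIEncoding="UTF-8" (Tomcat 8+ default) vs ISO-8859-1 (Tomcat 6/7 default)
class QueryDecodingDemo {
    // UTF-8 bytes of "\u4e2d\u6587", percent-encoded as a browser would send them
    static final String ENCODED = "%E4%B8%AD%E6%96%87";

    static String decode(String encoded, String charset) throws UnsupportedEncodingException {
        // URLDecoder turns %xx runs into bytes, then builds a String with the given charset
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Two characters: the original text
        System.out.println(decode(ENCODED, "UTF-8"));
        // Six characters, one per byte: mojibake
        System.out.println(decode(ENCODED, "ISO-8859-1"));
    }
}
```

ISO-8859-1 maps each byte to one character, so the wrong decode yields six characters instead of two, which is exactly the garbled output seen on Tomcat 6/7 without URIEncoding="UTF-8".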
Tomcat processes request parameters in org.apache.catalina.connector.Request#parseParameters:
```java
protected void parseParameters() {
    // Parameters are parsed only once, on the first access
    parametersParsed = true;
    // Wrapper around the request parameters
    Parameters parameters = coyoteRequest.getParameters();
    boolean success = false;
    try {
        // Set this every time in case limit has been changed via JMX
        parameters.setLimit(getConnector().getMaxParameterCount());
        // getCharacterEncoding() may have been overridden to search for hidden form field containing request encoding
        // Source: the charset in the HTTP request header Content-Type.
        // A value set via setCharacterEncoding takes precedence over Content-Type.
        String enc = getCharacterEncoding();
        // Whether useBodyEncodingForURI is enabled
        // Defaults to false; configured on the Connector in server.xml
        boolean useBodyEncodingForURI = connector.getUseBodyEncodingForURI();
        if (enc != null) {
            parameters.setEncoding(enc);
            if (useBodyEncodingForURI) {
                // Decode the query string with the body encoding too.
                // Otherwise it keeps URIEncoding (UTF-8 by default since Tomcat 8)
                parameters.setQueryStringEncoding(enc);
            }
        } else {
            // Fall back to the default encoding, ISO-8859-1
            parameters.setEncoding
                (org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
            if (useBodyEncodingForURI) {
                // The query string also uses the default encoding
                parameters.setQueryStringEncoding
                    (org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
            }
        }
        // Let the Parameters object process the parameters;
        // here this means the query string only
        parameters.handleQueryParameters();
        // ...
    }
}
```
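The branching above boils down to a small decision table. The sketch below models it under the assumption that the query-string encoding otherwise stays at the Connector's URIEncoding (as the documentation tables state); EncodingChoice and choose are hypothetical names, not Tomcat code.

```java
// Hypothetical model of the encoding decision in Request#parseParameters (not Tomcat code)
final class EncodingChoice {
    // ISO-8859-1, matching the DEFAULT_CHARACTER_ENCODING fallback in the excerpt
    static final String DEFAULT_CHARACTER_ENCODING = "ISO-8859-1";

    final String bodyEncoding;        // used for an application/x-www-form-urlencoded body
    final String queryStringEncoding; // used for the query string

    private EncodingChoice(String bodyEncoding, String queryStringEncoding) {
        this.bodyEncoding = bodyEncoding;
        this.queryStringEncoding = queryStringEncoding;
    }

    /**
     * @param requestEncoding       getCharacterEncoding(): setCharacterEncoding value or Content-Type charset, may be null
     * @param useBodyEncodingForURI Connector attribute, false by default
     * @param uriEncoding           Connector URIEncoding, UTF-8 by default since Tomcat 8
     */
    static EncodingChoice choose(String requestEncoding, boolean useBodyEncodingForURI, String uriEncoding) {
        String body = (requestEncoding != null) ? requestEncoding : DEFAULT_CHARACTER_ENCODING;
        // The query string keeps URIEncoding unless useBodyEncodingForURI is true
        String query = useBodyEncodingForURI ? body : uriEncoding;
        return new EncodingChoice(body, query);
    }
}
```

With the Tomcat 8 defaults, choose(null, false, "UTF-8") gives a UTF-8 query string but an ISO-8859-1 body, which is why GET works out of the box while POST still needs an explicit charset.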
For a POST request that carries its parameters in the request body (Content-Type must be application/x-www-form-urlencoded), the body is handled by:
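```java
// In Request#parseParameters, for a form-encoded POST body:
parameters.processParameters(formData, 0, len);

// In org.apache.tomcat.util.http.Parameters:
/* Here `encoding` defaults to ISO-8859-1, so either call
 * request.setCharacterEncoding("UTF-8") to specify the encoding, or convert on the server with
 * new String(name.getBytes("ISO-8859-1"), "<encoding of the submitted data>");
 */
public void processParameters(byte[] bytes, int start, int len) {
    processParameters(bytes, start, len, getCharset(encoding));
}
```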
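The body path can be sketched end to end: pick the charset from the Content-Type header, fall back to ISO-8859-1 when it is absent, then URL-decode each name=value pair. This is a simplified, hypothetical sketch (FormBodySketch, charsetOf and parse are illustrative names; quoted charset values such as charset="UTF-8" are not handled), not Tomcat's implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of decoding an application/x-www-form-urlencoded body
class FormBodySketch {
    // Return the charset parameter of Content-Type, or ISO-8859-1 if there is none
    static String charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    return p.substring(8).trim();
                }
            }
        }
        return "ISO-8859-1";
    }

    static Map<String, String> parse(byte[] body, String contentType) throws UnsupportedEncodingException {
        String charset = charsetOf(contentType);
        // After percent-encoding, a urlencoded body contains only ASCII
        String text = new String(body, StandardCharsets.US_ASCII);
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : text.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // The %xx bytes are interpreted with the chosen charset
            params.put(URLDecoder.decode(key, charset), URLDecoder.decode(value, charset));
        }
        return params;
    }
}
```

Usage: parse(body, "application/x-www-form-urlencoded; charset=UTF-8") restores UTF-8 text, while parse(body, "application/x-www-form-urlencoded") falls back to ISO-8859-1 and returns mojibake for the same bytes.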
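Test:
```java
@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    // No charset was set, so Tomcat decoded the body as ISO-8859-1
    String name = request.getParameter("name");
    if (name != null) {
        // Recover the raw bytes, then decode them as UTF-8 (the charset the page actually sent)
        name = new String(name.getBytes("ISO-8859-1"), "UTF-8");
    }
    System.out.println(name);
}
```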
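The new String(name.getBytes("ISO-8859-1"), "UTF-8") trick works because ISO-8859-1 maps every byte 0x00 to 0xFF to exactly one character, so the wrong decode loses nothing. A minimal sketch of the round trip, assuming the client sent UTF-8 and the server decoded with ISO-8859-1 (Latin1Repair and its methods are hypothetical names):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the ISO-8859-1 round trip used in doPost above
class Latin1Repair {
    // Simulates the default path: UTF-8 bytes from the browser decoded as ISO-8859-1
    static String wronglyDecoded(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Re-encodes as ISO-8859-1 to get the original bytes back, then decodes them as UTF-8
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }
}
```

The repair only works in this direction: if text had instead been wrongly decoded as UTF-8, invalid byte sequences would already have been replaced and the original bytes could not be recovered. Setting the charset up front remains the cleaner fix.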
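Summary:

For GET requests: since Tomcat 8, URIEncoding defaults to UTF-8, so query-string parameters sent as UTF-8 are decoded correctly without configuring any encoding.

For POST requests: decoding depends on the charset in Content-Type or on request.setCharacterEncoding("UTF-8"), which must be called before the first getParameter call because parameters are parsed only once. If neither is specified, the body is decoded as ISO-8859-1 and any non-ASCII characters will be garbled.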
References:
http://www.10tiao.com/html/308/201703/2650076500/1.html