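Everything sent over the network is a stream of binary bytes. So how does the server decode those bytes, and how does it know which character set to use? Let's dig into the Tomcat connector and examine this closely.
Let's start with a piece of code: a very simple form.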
<form action="demo01?name=中国" method="post">
<input type="text" name="name1" value="张三"/>
<input type="submit" value="提交"/>
</form>
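In the controller, we read the parameters directly from HttpServletRequest instead of letting Spring bind them. The mapping uses POST to match the form's method; a GET-only mapping would reject the submission.
@RequestMapping(value = "/demo01", method = RequestMethod.POST)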
public String dologin1(HttpServletRequest request) throws UnsupportedEncodingException {
log.info(request.getCharacterEncoding());
log.info("name:中国" + request.getParameter("name"));
log.info("name1:张三" + request.getParameter("name1"));
return "login";
}
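Running this on Tomcat, the Chinese values in the log come out garbled.
Inspecting the request in Fiddler shows the raw bytes the browser sent.
Let's reproduce the effect with a test: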
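// Requires JUnit 4 and Apache Commons Codec on the classpath
import org.apache.commons.codec.binary.Hex;
import org.junit.Test;
import java.io.UnsupportedEncodingException;

@Test
public void test() throws UnsupportedEncodingException {
// Encode as UTF-8, print the bytes as hex, then decode the same bytes as ISO-8859-1
String str = "中国";
byte[] bytes = str.getBytes("utf-8");
System.out.println(Hex.encodeHex(bytes));
System.out.println(new String(bytes, "iso8859-1"));
String str1 = "张三";
byte[] bytes1 = str1.getBytes("utf-8");
System.out.println(Hex.encodeHex(bytes1));
System.out.println(new String(bytes1, "iso8859-1"));
}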
Output (the second and fourth lines are the UTF-8 bytes decoded as ISO-8859-1):
e4b8ade59bbd
ä¸å›½
e5bca0e4b889
å¼ ä¸‰
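This shows that the Chrome browser I use encodes Chinese as UTF-8 by default, whereas Tomcat decodes with ISO-8859-1 by default. The two charsets map the same bytes to different characters, so the text comes out garbled.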
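Because ISO-8859-1 maps each of the 256 byte values to exactly one character, the garbled string still carries the original bytes intact. A commonly seen workaround, shown here only as a minimal sketch and not as the fix this article ends up using, is to re-encode the garbled text as ISO-8859-1 and decode those bytes again as UTF-8. The class name `Iso8859Recovery` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Iso8859Recovery {

    // Re-encode a string that was wrongly decoded as ISO-8859-1, then decode the
    // recovered bytes as UTF-8. This works only because ISO-8859-1 is a one-byte,
    // 256-value charset, so the wrong decode lost no bytes.
    static String recover(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the mismatch: UTF-8 bytes from the browser decoded as ISO-8859-1
        byte[] sent = "中国".getBytes(StandardCharsets.UTF_8);
        String garbled = new String(sent, StandardCharsets.ISO_8859_1);

        System.out.println(recover(garbled).equals("中国")); // prints true
    }
}
```

The catch is that this must be repeated for every parameter, and applying it to text that was already decoded correctly corrupts it, so fixing the decoding charset at the source is the cleaner option.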
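Since this is an encoding problem, it can certainly be fixed. Checking the Tomcat documentation,
we find that the connector can set the URI encoding through the URIEncoding attribute: "This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, ISO-8859-1 will be used." (This is the Tomcat 7 wording; from Tomcat 8 onward the default is UTF-8 unless strict servlet compliance is enabled.)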
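To see what URIEncoding controls, the sketch below uses the JDK's `java.net.URLDecoder` as a stand-in for Tomcat's own URI decoder: the `%xx` escapes are first turned back into bytes, and the charset argument then decides which characters those bytes become. `URLDecoder` implements form decoding, so unlike a plain URI decoder it also turns `+` into a space; the class name `UriDecodingDemo` is hypothetical, and the escaped value assumes the browser percent-encoded "中国" as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UriDecodingDemo {

    // Percent-decode a query value and interpret the resulting bytes with the given charset.
    static String decode(String rawQueryValue, Charset charset) {
        try {
            return URLDecoder.decode(rawQueryValue, charset.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen for StandardCharsets constants, which every JVM supports
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "中国" percent-encoded as UTF-8 in the query string
        String raw = "%E4%B8%AD%E5%9B%BD";

        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1); // the ISO-8859-1 default
        String asUtf8 = decode(raw, StandardCharsets.UTF_8);        // URIEncoding="utf-8"

        System.out.println(asLatin1.length());     // prints 6: one char per byte
        System.out.println(asUtf8.equals("中国")); // prints true
    }
}
```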
Configure it in server.xml as follows:
<Connector connectionTimeout="20000" port="8080" protocol="HTTP/1.1" URIEncoding="utf-8" redirectPort="8443"/>
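Running Tomcat again, the URI parameter is now decoded correctly.
How, then, is the request body decoded? Looking at the Servlet API source, we find that the body's encoding can be set before the parameters are read. From this we can guess that Tomcat parses body parameters lazily, the first time they are used. That is easy to understand: string parsing costs CPU, so if the parameters are never needed, there is no reason to parse them and pay that cost.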
/**
* Overrides the name of the character encoding used in the body of this
* request. This method must be called prior to reading request parameters
* or reading input using getReader(). Otherwise, it has no effect.
*
* @param env <code>String</code> containing the name of
* the character encoding.
* @throws UnsupportedEncodingException if this
* ServletRequest is still in a state where a
* character encoding may be set, but the specified
* encoding is invalid
*/
public void setCharacterEncoding(String env) throws UnsupportedEncodingException;
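The rule in this Javadoc, that the call only takes effect before parameters are read, follows naturally from lazy parsing. The toy class below is a hypothetical model, not Tomcat's actual `Request` implementation: it keeps the raw form body, parses it only on the first `getParameter` call, and caches the result, so an encoding set afterwards is never consulted. Validation of the charset name is omitted.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a request whose form body is parsed lazily.
public class LazyRequest {
    private final String rawBody;            // e.g. "name1=%E5%BC%A0%E4%B8%89"
    private String encoding = "ISO-8859-1";  // assumed container default
    private Map<String, String> params;      // null until the first getParameter call

    LazyRequest(String rawBody) {
        this.rawBody = rawBody;
    }

    void setCharacterEncoding(String env) {
        // Only matters if the parameters have not been parsed yet
        this.encoding = env;
    }

    String getParameter(String name) {
        if (params == null) {
            params = parse(); // parsed once, with whatever encoding is set right now
        }
        return params.get(name);
    }

    private Map<String, String> parse() {
        Map<String, String> result = new HashMap<>();
        for (String pair : rawBody.split("&")) {
            String[] kv = pair.split("=", 2);
            try {
                String value = kv.length > 1 ? URLDecoder.decode(kv[1], encoding) : "";
                result.put(URLDecoder.decode(kv[0], encoding), value);
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String body = "name1=%E5%BC%A0%E4%B8%89"; // "张三" sent as UTF-8

        LazyRequest early = new LazyRequest(body);
        early.setCharacterEncoding("UTF-8");    // set before reading: takes effect
        System.out.println(early.getParameter("name1").equals("张三")); // prints true

        LazyRequest late = new LazyRequest(body);
        late.getParameter("name1");              // parses with ISO-8859-1 and caches
        late.setCharacterEncoding("UTF-8");      // too late: no effect
        System.out.println(late.getParameter("name1").equals("张三")); // prints false
    }
}
```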
Change the controller to set UTF-8 before reading any parameters:
@RequestMapping(value = "/demo01", method = RequestMethod.POST)
public String dologin(HttpServletRequest request) throws UnsupportedEncodingException {
request.setCharacterEncoding("utf-8");
log.info(request.getCharacterEncoding());
log.info("name:中国" + request.getParameter("name"));
log.info("name1:张三" + request.getParameter("name1"));
return "login";
}
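Running Tomcat again, the encoding problem is completely solved.
Do we really have to set the encoding every time before reading parameters? There must be an easier way: a filter. Spring already provides one, org.springframework.web.filter.CharacterEncodingFilter. Here is its code: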
@Override
protected void doFilterInternal(
HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
throws ServletException, IOException {
if (this.encoding != null && (this.forceEncoding || request.getCharacterEncoding() == null)) {
request.setCharacterEncoding(this.encoding);
if (this.forceEncoding) {
response.setCharacterEncoding(this.encoding);
}
}
filterChain.doFilter(request, response);
}
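As it turns out, it just sets the request encoding (and, with forceEncoding, the response encoding too); there is nothing mysterious about it. But since a ready-made filter exists, why reinvent the wheel?
It is a bit of a pity that I could not trace the problem through the Tomcat source itself. I read part of it without getting to the bottom of it, which only shows that my skills are not there yet and I need to keep improving. Still, sound reasoning is a perfectly good way to solve a problem too.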
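The one line worth reading closely is the `if` condition. Without `forceEncoding`, the filter only fills in an encoding when the request has none (`getCharacterEncoding()` returns null, which is typical for browser form posts that declare no `charset` in `Content-Type`); with it, the configured encoding always wins. The pure-Java sketch below restates that condition as a hypothetical helper, `effectiveEncoding`, so the cases can be checked without a servlet container.

```java
// Hypothetical restatement of the condition in CharacterEncodingFilter.doFilterInternal.
public class EncodingDecision {

    // Returns the request encoding in effect after the filter has run.
    static String effectiveEncoding(String configured, boolean force, String fromRequest) {
        if (configured != null && (force || fromRequest == null)) {
            return configured;  // the filter calls request.setCharacterEncoding(configured)
        }
        return fromRequest;     // the filter leaves the request untouched
    }

    public static void main(String[] args) {
        // Typical browser form post: no charset declared, so the filter fills it in
        System.out.println(effectiveEncoding("UTF-8", false, null));         // prints UTF-8
        // Client declared a charset and force is off: the client's value is kept
        System.out.println(effectiveEncoding("UTF-8", false, "ISO-8859-1")); // prints ISO-8859-1
        // force is on: the configured value overrides the client's
        System.out.println(effectiveEncoding("UTF-8", true, "ISO-8859-1"));  // prints UTF-8
    }
}
```

Because the filter only helps if it runs before anything reads a parameter, it should be registered ahead of any other filter that touches the request parameters.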