Sorting out URL/URI encoding problems on the backend

Latest recommended article published on 2022-11-17 15:12:16

joenqc

Category: tomcat. Tags: tomcat, url, encoding, garbled text

Article link: https://blog.csdn.net/joenqc/article/details/70188599

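Generally speaking, a browser client can URL-encode a URL with any character set it likes, but it has no way to make the backend server decode with that same character set. The only exception is when the backend defines its own convention, for example a custom HTTP request header such as urlEncoding:utf-8 that the server reads and then uses to pick the charset for decoding.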
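To make the point above concrete, here is a minimal sketch showing that the same text produces different percent-encoded output depending on the charset the client picks, and that the encoded string itself carries no hint about which charset was used. The class name `UrlCharsetDemo` and the sample word are illustrative assumptions, and `URLEncoder`/`URLDecoder` stand in for whatever a real browser and server do.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Tomcat.
class UrlCharsetDemo {
    // "caf" followed by U+00E9 (e with acute accent), written as an escape
    // so the source file stays pure ASCII.
    static final String TEXT = "caf\u00e9";

    static String encode(String s, String charset) throws Exception {
        return URLEncoder.encode(s, charset);
    }

    static String decode(String s, String charset) throws Exception {
        return URLDecoder.decode(s, charset);
    }

    public static void main(String[] args) throws Exception {
        String utf8 = encode(TEXT, "UTF-8");        // caf%C3%A9
        String latin1 = encode(TEXT, "ISO-8859-1"); // caf%E9
        System.out.println(utf8);
        System.out.println(latin1);

        // The round trip only works when both sides use the same charset.
        System.out.println(TEXT.equals(decode(utf8, "UTF-8")));      // true
        System.out.println(TEXT.equals(decode(utf8, "ISO-8859-1"))); // false
    }
}
```

Nothing in `caf%C3%A9` says "this is UTF-8", which is why the server has to be configured, or told through some agreed convention, which charset to use.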

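The rest of this post walks through the Tomcat source code that handles the URI. First, a figure showing the difference between a URL and a URI:

[Figure: the difference between a URL and a URI]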

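As far as encoding and decoding are concerned, the key difference is that the URI, in the sense Tomcat uses here (the request URI, i.e. the path part), does not include the query string.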
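The split between the path and the query string can be seen with `java.net.URI`, as in the sketch below. Note that `java.net.URI` follows the RFC terminology, where a URI as a whole does include the query; the "URI" discussed in this post corresponds to the raw path only. The class name `UriPartsDemo` and the sample address are assumptions made for illustration.

```java
import java.net.URI;

// Hypothetical demo class: splits an address into the two parts that
// a server decodes separately.
class UriPartsDemo {
    // The still percent-encoded path, which is what this post calls the URI.
    static String rawPath(String url) {
        return URI.create(url).getRawPath();
    }

    // The still percent-encoded query string (without the leading '?').
    static String rawQuery(String url) {
        return URI.create(url).getRawQuery();
    }

    public static void main(String[] args) {
        String url = "http://localhost:8080/app/%E4%B8%AD%E6%96%87?name=%E6%96%87";
        System.out.println(rawPath(url));  // /app/%E4%B8%AD%E6%96%87
        System.out.println(rawQuery(url)); // name=%E6%96%87
    }
}
```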

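In Tomcat, URI parsing is done in the CoyoteAdapter class of the Connector module. This class is an adapter: it takes the org.apache.coyote.Request req and org.apache.coyote.Response res produced by the Endpoint module, wraps them as org.apache.catalina.connector.Request and org.apache.catalina.connector.Response, and hands them to the Pipeline chain of the Container module. Before the request is handed to the Container module, the URI is parsed to determine which Wrapper of which Context of which Host should handle the request. The code is as follows: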

@Override
public void service(org.apache.coyote.Request req,
                    org.apache.coyote.Response res)
    throws Exception {

    Request request = (Request) req.getNote(ADAPTER_NOTES);
    Response response = (Response) res.getNote(ADAPTER_NOTES);
    ...
    // Parse the URI
    postParseSuccess = postParseRequest(req, request, res, response);
    if (postParseSuccess) {
    ...
    // Calling the container
    connector.getService().getContainer().getPipeline().getFirst().invoke(request, response);
    ...

Inside postParseRequest, the call parsePathParameters(req, request); takes part in the decoding. The excerpt below shows how it chooses the charset:

// Process in bytes (this is default format so this is normally a NO-OP)
req.decodedURI().toBytes();

ByteChunk uriBC = req.decodedURI().getByteChunk();
int semicolon = uriBC.indexOf(';', 0);

// What encoding to use? Some platforms, eg z/os, use a default
// encoding that doesn't give the expected result so be explicit
String enc = connector.getURIEncoding();
if (enc == null) {
    enc = "ISO-8859-1";
}
Charset charset = null;
try {
    charset = B2CConverter.getCharset(enc);
} catch (UnsupportedEncodingException e1) {
    log.warn(sm.getString("coyoteAdapter.parsePathParam",
            enc));
}

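As you can see, Tomcat first reads URIEncoding from the Connector and, if it is not set, falls back to ISO-8859-1. This is the root cause of garbled Chinese characters in the URI. This fallback applies to the version of the source shown here; Tomcat 8 and later document UTF-8 as the default when URIEncoding is not set, unless strict servlet compliance is enabled.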
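The effect of that ISO-8859-1 fallback can be reproduced outside Tomcat. The sketch below is not Tomcat's decoder; it uses `URLDecoder` as a stand-in and assumes the client sent the two Chinese characters U+4E2D U+6587 percent-encoded as UTF-8. The class and method names are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes UTF-8 percent-encoded bytes with the
// wrong charset, then repairs the result.
class MojibakeDemo {
    // U+4E2D U+6587 percent-encoded as UTF-8 (assumed client behavior).
    static final String SENT = "%E4%B8%AD%E6%96%87";

    static String decodeAs(String percentEncoded, String charset) throws Exception {
        return URLDecoder.decode(percentEncoded, charset);
    }

    // Re-interprets a string that was wrongly decoded as ISO-8859-1.
    // This works because ISO-8859-1 maps every byte to exactly one char,
    // so the original bytes can be recovered without loss.
    static String recover(String wronglyDecoded) {
        byte[] original = wronglyDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String garbled = decodeAs(SENT, "ISO-8859-1");
        String correct = decodeAs(SENT, "UTF-8");
        System.out.println(garbled.length()); // 6: one char per byte
        System.out.println(correct.length()); // 2: the two intended characters
        System.out.println(recover(garbled).equals(correct)); // true
    }
}
```

The `recover` step is a workaround sometimes applied in application code; configuring the decoder with the right charset in the first place avoids the need for it.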

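So, to control the charset Tomcat uses to decode the URI, you could modify the Tomcat source and change the default to whatever you want (e.g. utf-8). That is obviously too costly and only makes sense when debugging the source locally. More importantly, the URI and the query string are not parsed in the same place, so a source patch is likely to miss something. The source reads the encoding from the Connector first, which suggests that it is configurable. Looking it up confirms this: the URI encoding can be set on the Connector in Tomcat's server.xml: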

<Connector port="8080" protocol="HTTP/1.1"   
   maxThreads="150" connectionTimeout="20000"   
   redirectPort="8443" URIEncoding="UTF-8"/>

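To sum up, this is how to tell Tomcat on the backend which charset to use when decoding the URI. Encoding and decoding a message is never a one-sided matter, though: both the frontend and the backend are involved, so the only way to avoid garbled text is for both sides to use the same encoding.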
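On the client side, matching a Connector configured with URIEncoding="UTF-8" means percent-encoding path segments as UTF-8. Below is a minimal sketch for a Java client; the helper name `encodePathSegment` and the base address are assumptions for illustration. `URLEncoder` targets form encoding and turns a space into `+`, so the sketch rewrites that to `%20`, which is the form expected in a path.

```java
import java.net.URLEncoder;

// Hypothetical client-side helper, assuming the server decodes the URI as UTF-8.
class ClientEncodingDemo {
    // Percent-encodes one path segment as UTF-8. A literal '+' in the input
    // is already turned into %2B by URLEncoder, so any '+' left in the
    // output stands for a space and can safely be rewritten to %20.
    static String encodePathSegment(String segment) throws Exception {
        return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
    }

    public static void main(String[] args) throws Exception {
        // U+4E2D, a space, U+6587, written as escapes to keep the source ASCII.
        String segment = "\u4e2d \u6587";
        String url = "http://localhost:8080/app/" + encodePathSegment(segment);
        System.out.println(url); // http://localhost:8080/app/%E4%B8%AD%20%E6%96%87
    }
}
```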
