StringEntity Chinese Encoding Problem

林大虫子

Category: JAVA   Tags: encoding

Article link: https://blog.csdn.net/west_609/article/details/83308242

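I recently investigated a Chinese-character encoding problem in our system and ran into a few pitfalls along the way. The root cause turned out to be the charset used to encode an HTTP request body.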

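The problem first showed up like this. We have three systems: A, B, and C. System A sends content that contains Chinese. A tcpdump capture showed that the HTTP content B received was correct. Before forwarding it to C, however, B builds a new HttpRequest, and the content sent from this newly built request was wrong: its Content-Length was smaller than expected.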

The code that builds the new HttpRequest looks like this:

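String body = ...; // request body
// One-argument constructor: no charset is specified here
StringEntity requestEntity = new StringEntity(body);
// Create a new HTTP request (HttpPost, because HttpUriRequest has no setEntity)
HttpPost httpRequest = new HttpPost(...);
httpRequest.setEntity(requestEntity);
...

// Send the HTTP request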

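The problem is the StringEntity constructor used above: new StringEntity(body).
Looking at this one-argument constructor, StringEntity converts the text to bytes using ISO_8859_1 rather than UTF-8. ISO_8859_1 is a single-byte encoding and cannot represent Chinese: each character it cannot map is written as a single '?' byte instead of the three bytes UTF-8 would use, which is why the Content-Length came out smaller.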
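The byte-count difference can be reproduced with the JDK alone, without HttpClient. The following is a minimal sketch; the class name CharsetByteCountDemo, its helper methods, and the sample JSON body are made up for illustration and do not come from the original system.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares how many bytes a body occupies
// in ISO-8859-1 versus UTF-8.
class CharsetByteCountDemo {

    // Byte length when encoded as ISO-8859-1; unmappable characters
    // (such as Chinese) are each replaced by a single '?' byte.
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Byte length when encoded as UTF-8; the two Chinese characters
    // used below take three bytes each.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encodes and decodes with ISO-8859-1 to show what a receiver would see.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Sample body: {"name":"<two Chinese characters>"}, written with
        // Unicode escapes so the source file encoding does not matter.
        String body = "{\"name\":\"\u4e2d\u6587\"}";
        System.out.println("UTF-8 bytes:      " + utf8Length(body));
        System.out.println("ISO-8859-1 bytes: " + latin1Length(body));
        System.out.println("After round trip: " + roundTripLatin1(body));
    }
}
```

For this sample body the UTF-8 length is 17 bytes and the ISO-8859-1 length is 13 bytes, and the Chinese characters come back as question marks, which matches the shorter Content-Length seen in the capture.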
StringEntity.java

    public StringEntity(String string) throws UnsupportedEncodingException {
        this(string, ContentType.DEFAULT_TEXT); // The one-argument constructor passes no content type, so ContentType.DEFAULT_TEXT is used
    }
    
    public StringEntity(String string, ContentType contentType) throws UnsupportedCharsetException {
        Args.notNull(string, "Source string");
        Charset charset = contentType != null ? contentType.getCharset() : null;
        if (charset == null) {
            charset = HTTP.DEF_CONTENT_CHARSET;  
        }

        this.content = string.getBytes(charset);   // Chinese text ends up encoded with the default charset, ISO_8859_1
        if (contentType != null) {
            this.setContentType(contentType.toString());
        }
    }

    // ContentType.java: charsets of the predefined content types
    static {
        APPLICATION_ATOM_XML = create("application/atom+xml", Consts.ISO_8859_1);
        APPLICATION_FORM_URLENCODED = create("application/x-www-form-urlencoded", Consts.ISO_8859_1);
        APPLICATION_JSON = create("application/json", Consts.UTF_8);
       ...
        TEXT_PLAIN = create("text/plain", Consts.ISO_8859_1);
        TEXT_XML = create("text/xml", Consts.ISO_8859_1);
    }

The fix: build the new request with the content type and charset taken from the original HttpRequest:

...
StringEntity requestEntity = new StringEntity(body, ContentType.create(request.getContentType(), request.getCharacterEncoding()));
...
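The fix relies on the original request actually carrying a usable charset. The following is a minimal JDK-only sketch of one way to resolve it from a Content-Type header value, with a caller-supplied fallback when the header has no charset parameter. The class name ForwardCharset, the method resolve, and the choice of UTF-8 as fallback are assumptions made for illustration; they are not part of HttpClient or of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: picks the charset to use when re-encoding a forwarded body.
class ForwardCharset {

    // Returns the charset named in a Content-Type header value such as
    // "application/json; charset=UTF-8", or the fallback if the header is
    // missing, has no charset parameter, or names an unknown charset.
    static Charset resolve(String contentTypeHeader, Charset fallback) {
        if (contentTypeHeader == null) {
            return fallback;
        }
        for (String part : contentTypeHeader.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes: charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(resolve("application/json; charset=UTF-8", StandardCharsets.ISO_8859_1));
        System.out.println(resolve("text/plain", StandardCharsets.UTF_8));
    }
}
```

The resolved charset could then be passed on, for example as new StringEntity(body, ForwardCharset.resolve(request.getContentType(), StandardCharsets.UTF_8)), assuming an HttpClient 4.x version that provides the StringEntity(String, Charset) constructor. That constructor labels the entity as text/plain, so if the original media type has to be preserved, parsing the header value with ContentType.parse may be a better fit.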
