Garbled Chinese in JSON Data Transfer: getBytes() Is the Culprit

简烦

Category: Netty

Article link: https://blog.csdn.net/baidu_35751704/article/details/107230799

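1. Scenario:

A Netty client converts a JSON string into a byte array and sends it to the server. On the server side, once the received data is converted back into a string with hexStr2Str, the JSON string is garbled, so the field values of the deserialized object are garbled as well.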
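The symptom can be reproduced without Netty. The following is a minimal sketch, assuming the sender encodes with GBK and the receiver decodes with UTF-8; the class name MojibakeDemo and the sample JSON are made up for illustration, and the Chinese text is written with Unicode escapes so the source file encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical sample payload; the value is two Chinese characters written as Unicode escapes.
    static final String ORIGINAL = "{\"alarmType\":\"\u4e2d\u6587\"}";

    /** Encodes with one charset and decodes with another, like a mismatched sender and receiver. */
    static String roundTrip(String text, Charset senderCharset, Charset receiverCharset) {
        byte[] wire = text.getBytes(senderCharset);
        return new String(wire, receiverCharset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumes this JDK ships the GBK charset
        String mismatched = roundTrip(ORIGINAL, gbk, StandardCharsets.UTF_8);
        String matched = roundTrip(ORIGINAL, gbk, gbk);
        System.out.println("GBK -> UTF-8 equals original? " + mismatched.equals(ORIGINAL));
        System.out.println("GBK -> GBK   equals original? " + matched.equals(ORIGINAL));
    }
}
```

The ASCII parts of the JSON survive either way, which is why only the Chinese field values look broken.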

2. The culprit

 outData.writeBytes(data.getBytes());

Here data is the JSON string.

3. A look at the getBytes() method

/**
     * Encodes this {@code String} into a sequence of bytes using the
     * platform's default charset, storing the result into a new byte array.
     *
     * <p> The behavior of this method when this string cannot be encoded in
     * the default charset is unspecified.  The {@link
     * java.nio.charset.CharsetEncoder} class should be used when more control
     * over the encoding process is required.
     *
     * @return  The resultant byte array
     *
     * @since      JDK1.1
     */
    public byte[] getBytes() {
        return StringCoding.encode(value, 0, value.length);
    }

Stepping into it leads to StringCoding:

static byte[] encode(char[] ca, int off, int len) {
        String csn = Charset.defaultCharset().name();
        try {
            // use charset name encode() variant which provides caching.
            return encode(csn, ca, off, len);
        } catch (UnsupportedEncodingException x) {
            warnUnsupportedCharset(csn);
        }
        try {
            return encode("ISO-8859-1", ca, off, len);
        } catch (UnsupportedEncodingException x) {
            // If this code is hit during VM initialization, MessageUtils is
            // the only way we will be able to get any kind of error message.
            MessageUtils.err("ISO-8859-1 charset not available: "
                             + x.toString());
            // If we can not find ISO-8859-1 (a required encoding) then things
            // are seriously wrong with the installation.
            System.exit(1);
            return null;
        }
    }

Look at the line String csn = Charset.defaultCharset().name(); which fetches the charset name. Stepping into the Charset class:

/**
     * Returns the default charset of this Java virtual machine.
     *
     * <p> The default charset is determined during virtual-machine startup and
     * typically depends upon the locale and charset of the underlying
     * operating system.
     *
     * @return  A charset object for the default charset
     *
     * @since 1.5
     */
    public static Charset defaultCharset() {
        if (defaultCharset == null) {
            synchronized (Charset.class) {
                String csn = AccessController.doPrivileged(
                    new GetPropertyAction("file.encoding"));
                Charset cs = lookup(csn);
                if (cs != null)
                    defaultCharset = cs;
                else
                    defaultCharset = forName("UTF-8");
            }
        }
        return defaultCharset;
    }

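As you can see, this code reads the file.encoding property and looks up the matching Charset object. If no Charset matches the property value, it falls back to UTF-8. So as long as the value is valid, file.encoding really does determine the so-called default encoding.
In the JDK source quoted here, Java's default encoding is controlled mainly by -Dfile.encoding: if it is not specified, the operating system's default encoding is used; if it is specified but names a charset that does not exist, UTF-8 is used.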
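To see which charset the no-argument getBytes() will pick in a given JVM, something like the following can be run on both the client and the server. It is only a sketch; the class name DefaultCharsetInfo and its helper methods are made up for illustration.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetInfo {
    /** Summarizes the values that decide what the no-argument getBytes() does in this JVM. */
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    /** True when getBytes() yields the same bytes as an explicit call with the default charset. */
    static boolean noArgMatchesDefault(String text) {
        return Arrays.equals(text.getBytes(), text.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        System.out.println(describe());
        System.out.println("getBytes() follows the default charset: " + noArgMatchesDefault("abc"));
    }
}
```

On a JDK 8 style runtime like the one quoted above, launching with java -Dfile.encoding=GBK DefaultCharsetInfo should mimic the Windows client. Newer JDKs may treat this property differently, so take the flag as an assumption to verify on your own version.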

4. Root cause analysis

(1) First, determine the system's charset

My client program runs on Windows.

On Windows, open a command prompt (DOS window) and run: chcp

The table below lists the supported code pages and their country (region) or language:
Code page    Country (region) or language
437          United States
708          Arabic (ASMO 708)
720          Arabic (DOS)
850          Multilingual (Latin I)
852          Central European (DOS) - Slavic (Latin II)
855          Cyrillic (Russian)
857          Turkish
860          Portuguese
861          Icelandic
862          Hebrew (DOS)
863          Canadian French
865          Nordic
866          Russian - Cyrillic (DOS)
869          Modern Greek
874          Thai (Windows)
932          Japanese (Shift-JIS)
936          China - Simplified Chinese (GB2312)
949          Korean
950          Traditional Chinese (Big5)
1200         Unicode
1201         Unicode (Big-Endian)
1250         Central European (Windows)
1251         Cyrillic (Windows)
1252         Western European (Windows)
1253         Greek (Windows)
1254         Turkish (Windows)
1255         Hebrew (Windows)
1256         Arabic (Windows)
1257         Baltic (Windows)
1258         Vietnamese (Windows)
20866        Cyrillic (KOI8-R)
21866        Cyrillic (KOI8-U)
28592        Central European (ISO)
28593        Latin 3 (ISO)
28594        Baltic (ISO)
28595        Cyrillic (ISO)
28596        Arabic (ISO)
28597        Greek (ISO)
28598        Hebrew (ISO-Visual)
38598        Hebrew (ISO-Logical)
50000        User defined
50001        Auto-select
50220        Japanese (JIS)
50221        Japanese (JIS - 1-byte Kana allowed)
50222        Japanese (JIS - 1-byte Kana allowed - SO/SI)
50225        Korean (ISO)
50932        Japanese (auto-select)
50949        Korean (auto-select)
51932        Japanese (EUC)
51949        Korean (EUC)
52936        Simplified Chinese (HZ)
65000        Unicode (UTF-7)
65001        Unicode (UTF-8)

On my machine chcp reports 936, which corresponds to the GBK encoding.

The log output also confirms that the default charset is GBK:

2020-07-09 15:17:45 [Thread-648] INFO  c.q.q.qxzn.serviceImpl.IPCNVRDeviceServiceImpl - sendPDCResultDTO METHOD data = {"alarmInfo":"当前报警类型为：移动侦测, 报警通道个数：0, 报警通道号：","alarmTime":"2020-07-09 15:17:45","alarmType":"移动侦测","deviceIp":"192.168.0.62"}, length = 125
2020-07-09 15:17:45 [Thread-648] INFO  c.q.q.qxzn.serviceImpl.IPCNVRDeviceServiceImpl - 系统的默认编码集为：GBK
2020-07-09 15:17:45 [Thread-648] INFO  c.q.q.qxzn.serviceImpl.IPCNVRDeviceServiceImpl - data.getBytes().length = 154
2020-07-09 15:17:45 [Thread-648] INFO  c.q.q.qxzn.serviceImpl.IPCNVRDeviceServiceImpl - data.getBytes(ISO-8859-1).length = 125
2020-07-09 15:17:45 [Thread-648] INFO  c.q.q.qxzn.serviceImpl.IPCNVRDeviceServiceImpl - data.getBytes(UTF-8).length = 183

The log also prints the byte array length obtained under different charsets~~
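Those three lengths follow a simple rule. By my count the logged message has 125 characters, 29 of which are non-ASCII (Chinese characters and full-width colons), leaving 96 ASCII characters. ISO-8859-1 cannot represent the 29 and replaces each with a single '?' byte (96 + 29 = 125, with the Chinese text lost), GBK uses 2 bytes each (96 + 2 × 29 = 154), and UTF-8 uses 3 bytes each (96 + 3 × 29 = 183). The sketch below checks the same rule on a short sample; EncodedLengths and its methods are names invented for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLengths {
    /** Number of bytes the text occupies once encoded with the given charset. */
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    /** Predicted size when every non-ASCII character costs bytesPerWideChar bytes. */
    static int predictedLength(int asciiChars, int wideChars, int bytesPerWideChar) {
        return asciiChars + wideChars * bytesPerWideChar;
    }

    public static void main(String[] args) {
        // Two Chinese characters (written as Unicode escapes) followed by two ASCII characters.
        String sample = "\u4e2d\u6587ab";
        System.out.println("chars      = " + sample.length());
        System.out.println("ISO-8859-1 = " + byteLength(sample, StandardCharsets.ISO_8859_1));
        System.out.println("GBK        = " + byteLength(sample, Charset.forName("GBK")));
        System.out.println("UTF-8      = " + byteLength(sample, StandardCharsets.UTF_8));
        // The same rule applied to the logged message: 96 ASCII + 29 non-ASCII characters.
        System.out.println("predicted  = " + predictedLength(96, 29, 1) + " / "
                + predictedLength(96, 29, 2) + " / " + predictedLength(96, 29, 3));
    }
}
```

So comparing data.getBytes().length on two machines is a quick way to tell whether they are using the same default charset.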

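(2) One big pitfall

If you debug inside an IDE, what you get is always the encoding configured for the JVM that the IDE launches, not the encoding of the environment you actually deploy to.

So debugging locally alone makes the problem very hard to spot. I only found it by comparing the length of the getBytes() result on the server with the one on my local machine: they were not even the same length, and yet everything worked fine locally~~ utterly baffling~~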

5. Solution

outData.writeBytes(JsonUtils.toJson(data).getBytes("UTF-8"));

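Explicitly specify a charset. This is a habit worth building from now on, otherwise you end up digging a pit and burying yourself in it~~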
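One way to make the habit stick is to route every conversion through a pair of helpers that name the charset on both the sending and the receiving side. This is a rough sketch rather than the original project's code: Utf8Codec is a hypothetical helper, and it uses StandardCharsets.UTF_8 because the getBytes(Charset) overload does not declare the checked UnsupportedEncodingException that getBytes("UTF-8") does.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Codec {
    /** Sender side: always encode JSON text as UTF-8, whatever the platform default is. */
    static byte[] encode(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    /** Receiver side: decode with the same charset the sender used. */
    static String decode(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample payload; the value is two Chinese characters written as Unicode escapes.
        String json = "{\"alarmType\":\"\u4e2d\u6587\"}";
        byte[] payload = encode(json);
        System.out.println("round trip ok: " + decode(payload).equals(json));
    }
}
```

With such a helper the send call would read outData.writeBytes(Utf8Codec.encode(json)), where json is the serialized string, and the server has to decode with UTF-8 as well for the fix to hold.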

Even tough guys shed tears~~
