Transcoding Chinese Characters with java.lang.String: A Summary (Java/Scala)

碣石观海

Category: Java/Scala

Article link: https://blog.csdn.net/weixin_39469127/article/details/95355025

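First, the safest transcoding operation. It names the charset explicitly on both sides, so it depends neither on the platform's default encoding nor on the encoding the text originally arrived in. Note that a String holds UTF-16 code units internally, so the result is equal to the input string; the charset only matters at the point where bytes are produced or consumed: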

/** Ensure the received string is converted to UTF-8:
 *    encode it as UTF-8, then decode the bytes as UTF-8
 */
val strUTF8 = new String(strGBK.getBytes("UTF-8"), "UTF-8")
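
Since a String carries no charset of its own, a GBK-to-UTF-8 conversion in the strict sense happens on byte arrays: decode the bytes with the charset they were written in, then encode the text with the target charset. A minimal Java sketch of that idea, assuming the incoming bytes are known to be GBK and that the JRE ships the GBK charset; the class and method names are illustrative and not from the original post:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GbkToUtf8 {
    // GBK is not one of the StandardCharsets constants, so it is looked up by name.
    static final Charset GBK = Charset.forName("GBK");

    /** Decodes bytes known to be GBK, then re-encodes the text as UTF-8. */
    static byte[] gbkBytesToUtf8Bytes(byte[] gbkBytes) {
        String text = new String(gbkBytes, GBK);       // bytes -> characters
        return text.getBytes(StandardCharsets.UTF_8);  // characters -> bytes
    }

    public static void main(String[] args) {
        // The two characters are written as Unicode escapes so that the
        // encoding of this source file does not affect the result.
        byte[] gbk = "\u4e2d\u6587".getBytes(GBK);
        byte[] utf8 = gbkBytesToUtf8Bytes(gbk);
        System.out.println(Arrays.toString(gbk));
        System.out.println(Arrays.toString(utf8));
    }
}
```

Decoding with the wrong charset at the first step is what produces garbled text; no later re-encoding of the resulting String can be relied on to repair it.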

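1. Encoding and decoding use the following four methods of java.lang.String (a few similar overloads exist and are not covered here):
   | getBytes(charsetName): encodes the string into a byte array using the named charset;
   | getBytes(): encodes the string into a byte array using the platform's default charset;
   | String(bytes, offset, length, charsetName): decodes length bytes of the array, starting at offset, into a string using the named charset;
   | String(bytes, charsetName): decodes the whole byte array into a string using the named charset.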

    /**
     * @param  charsetName
     *         The name of a supported {@linkplain java.nio.charset.Charset
     *         charset}
     *
     * @return  The resultant byte array
     */
    public byte[] getBytes(String charsetName);

    /**
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  offset
     *         The index of the first byte to decode
     *
     * @param  length
     *         The number of bytes to decode
     *
     * @param  charsetName
     *         The name of a supported {@linkplain java.nio.charset.Charset
     *         charset}
     */
    public String(byte bytes[], int offset, int length, String charsetName);

    /**
     * @return  The resultant byte array
     */
    public byte[] getBytes();

    /**
     * @param  bytes
     *         The bytes to be decoded into characters
     *
     * @param  charsetName
     *         The name of a supported {@linkplain java.nio.charset.Charset
     *         charset}
     */
    public String(byte bytes[], String charsetName);
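
Among the similar overloads not covered here, getBytes(Charset) and String(byte[], Charset) take a java.nio.charset.Charset instead of a charset name, so they do not throw the checked UnsupportedEncodingException that the charsetName versions declare. A short Java sketch of these together with the offset/length constructor; the class name StringCodecDemo and the helper decodeSlice are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

public class StringCodecDemo {
    /** Decodes only the bytes in [offset, offset + length) as UTF-8. */
    static String decodeSlice(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two CJK characters; each takes 3 bytes in UTF-8, 6 bytes in total.
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);

        // Decode one character at a time by choosing the byte range.
        System.out.println(decodeSlice(utf8, 0, 3).equals("\u4e2d"));
        System.out.println(decodeSlice(utf8, 3, 3).equals("\u6587"));

        // Decode the whole array.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals("\u4e2d\u6587"));
    }
}
```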

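2. Three alternative ways to call these methods when converting to UTF-8 (pick one):

// 1. Decode the whole byte array
val strUTF8 = new String(strGBK.getBytes("UTF-8"), "UTF-8")
// 2. Or pass the byte range explicitly.
//    If the length is smaller than the encoded array, the trailing characters are garbled;
//    if it is larger, the constructor throws. A guess such as strGBK.length()*3 is only
//    right when every character takes 3 bytes, so use the array's own length.
val utf8Bytes = strGBK.getBytes("UTF-8")
val strUTF8 = new String(utf8Bytes, 0, utf8Bytes.length, "UTF-8")
// 3. (Recommended) name UTF-8 explicitly for both encoding and decoding
val strUTF8 = new String(strGBK.getBytes("UTF-8"), "UTF-8")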
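
To make the length caveat in option 2 concrete, here is a hedged Java sketch. It decodes a prefix of the UTF-8 bytes and shows two failure modes: a length that is too short cuts a multi-byte character, which the decoder replaces with U+FFFD, and a length of length() * 3 overruns the array for ASCII text. The class LengthPitfall and the helper decodePrefix are hypothetical names:

```java
import java.nio.charset.StandardCharsets;

public class LengthPitfall {
    /** Decodes the first `length` bytes of the UTF-8 encoding of `s`. */
    static String decodePrefix(String s, int length) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String cjk = "\u4e2d\u6587"; // 6 bytes in UTF-8

        // 5 of 6 bytes: the second character is incomplete and gets replaced.
        String truncated = decodePrefix(cjk, 5);
        System.out.println(truncated.startsWith("\u4e2d"));
        System.out.println(truncated.contains("\uFFFD"));

        try {
            // "abc" encodes to 3 bytes, but length() * 3 asks for 9.
            decodePrefix("abc", "abc".length() * 3);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("length * 3 overran the byte array");
        }
    }
}
```

Passing the array's own length, or using the constructor without offset and length, avoids both cases.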

3. Complete test code (Scala/Java):

object TranscodeTest extends App {

  testUTF8ToGBK()
  testGBKToUTF8()

  def testUTF8ToGBK(): Unit = {
    println("-------------------[Test UTF8 To GBK]-------------------------")
    // The literal assumes the compiler reads this source file with the right encoding
    val strBytes = "中文".getBytes("UTF-8")
    println("strBytes: " + strBytes.mkString(" "))

    // Decode the UTF-8 bytes into a String
    val strUTF8 = new String(strBytes, "UTF-8")
    println("strUTF8 Bytes: " + strUTF8.getBytes("UTF-8").mkString(" "))

    // Encode as GBK and decode the whole byte array
    val strGBK = new String(strUTF8.getBytes("GBK"), "GBK")
//    // Or pass the GBK byte range explicitly
//    val gbkBytes = strUTF8.getBytes("GBK")
//    val strGBK = new String(gbkBytes, 0, gbkBytes.length, "GBK")
    println("strGBK Bytes: " + strGBK.getBytes("GBK").mkString(" "))

    println("strUTF8: " + strUTF8)
    println("strGBK: " + strGBK)
  }

  def testGBKToUTF8(): Unit = {
    println("-------------------[Test GBK To UTF8]-------------------------")
    val strBytes = "中文".getBytes("GBK")
    println("strBytes: " + strBytes.mkString(" "))

    // Decode the GBK bytes into a String
    val strGBK = new String(strBytes, "GBK")
    println("strGBK Bytes: " + strGBK.getBytes("GBK").mkString(" "))

//    // 1. Decode the whole byte array
//    val strUTF8 = new String(strGBK.getBytes("UTF-8"), "UTF-8")
//    // 2. Or pass the UTF-8 byte range explicitly.
//    //    A length smaller than the encoded array garbles the trailing characters
//    val utf8Bytes = strGBK.getBytes("UTF-8")
//    val strUTF8 = new String(utf8Bytes, 0, utf8Bytes.length, "UTF-8")
    // 3. (Recommended) name UTF-8 explicitly for both encoding and decoding
    val strUTF8 = new String(strGBK.getBytes("UTF-8"), "UTF-8")
    println("strUTF8 Bytes: " + strUTF8.getBytes("UTF-8").mkString(" "))

    println("strGBK: " + strGBK)
    println("strUTF8: " + strUTF8)
  }

}

-------------------[Test UTF8 To GBK]-------------------------
strBytes: -28 -72 -83 -26 -106 -121
strUTF8 Bytes: -28 -72 -83 -26 -106 -121
strGBK Bytes: -42 -48 -50 -60
strUTF8: 中文
strGBK: 中文
-------------------[Test GBK To UTF8]-------------------------
strBytes: -42 -48 -50 -60
strGBK Bytes: -42 -48 -50 -60
strUTF8 Bytes: -28 -72 -83 -26 -106 -121
strGBK: 中文
strUTF8: 中文
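
The negative numbers in this output appear because byte is a signed type in Java and Scala: -28 is the bit pattern 0xE4, and so on. Encoding tables usually list these values in hexadecimal, so a small helper can make the output easier to compare. The following Java sketch assumes the JRE provides the GBK charset; the names HexDump and toHex are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HexDump {
    /** Formats each byte as two uppercase hex digits, separated by spaces. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            // Mask with 0xFF to read the byte as an unsigned value in 0..255.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587";
        System.out.println(toHex(text.getBytes(StandardCharsets.UTF_8))); // E4 B8 AD E6 96 87
        System.out.println(toHex(text.getBytes(Charset.forName("GBK")))); // D6 D0 CE C4
    }
}
```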
