Java's char and Unicode

Latest recommended article published on 2024-07-24 18:18:36

daelly

Tags: java, programming languages

Article link: https://blog.csdn.net/daelly/article/details/129667095

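1. In Java, char is a two-byte (16-bit) unsigned integer type, comparable to an unsigned short in C. Its range is 0 to 65535 (0x0000 to 0xFFFF), as the constants in java.lang.Character show: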


    /**
     * The constant value of this field is the smallest value of type
     * {@code char}, {@code '\u005Cu0000'}.
     *
     * @since   1.0.2
     */
    public static final char MIN_VALUE = '\u0000';

    /**
     * The constant value of this field is the largest value of type
     * {@code char}, {@code '\u005CuFFFF'}.
     *
     * @since   1.0.2
     */
    public static final char MAX_VALUE = '\uFFFF';
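
To make the range concrete, here is a minimal sketch that widens a char to an int and shows the 16-bit wrap-around. The class name CharRangeDemo and its two helper methods are made up for illustration; only Character.BYTES, Character.MIN_VALUE and Character.MAX_VALUE come from the JDK.

```java
public class CharRangeDemo {

    /** Widens a char to int. char is unsigned, so the result is always 0..65535. */
    public static int toUnsigned(char c) {
        return c; // implicit widening, no sign extension
    }

    /** Adds one to a char. The addition happens in int; the cast keeps the low 16 bits. */
    public static char increment(char c) {
        return (char) (c + 1);
    }

    public static void main(String[] args) {
        System.out.println(Character.BYTES);                            // 2
        System.out.println(toUnsigned(Character.MIN_VALUE));            // 0
        System.out.println(toUnsigned(Character.MAX_VALUE));            // 65535
        System.out.println(toUnsigned(increment(Character.MAX_VALUE))); // 0 (wraps around)
        System.out.println((short) Character.MAX_VALUE);                // -1 (same bits, signed)
    }
}
```

The last line is the difference from short: the same 16 bits read as a signed value give -1, while char reads them as 65535.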

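The reason char is two bytes rather than a single byte as in C is mostly historical: Java was designed when Unicode was still a fixed-width 16-bit encoding, so one char could hold any character. For background, see the Zhihu question "Java中关于Char存储中文到底是2个字节还是3个还是4个？" (In Java, does a char holding a Chinese character take 2, 3, or 4 bytes?).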

2. Using non-BMP characters in Java

    /**
     * Assumes JUnit 4 is on the classpath (import org.junit.Test).
     *
     * Output:
     * 龘
     * 知乎-发现更大的世界，😂
     * 30693 20046 45 21457 29616 26356 22823 30340 19990 30028 65292 128514
     * code unit size:13
     * code point size:12
     */
    @Test
    public void testChar1() {
        // A BMP character fits in a single char
        char ch = '龘';
        System.out.println(ch);

        // Uncomment to print every char value (0..65535)
//        for (int i = 0; i <= Character.MAX_VALUE; i++) {
//            char c = (char) i;
//            System.out.print(c);
//            System.out.println("=" + i);
//        }

        // A non-BMP character is stored as two chars (a surrogate pair)
        final String content = "知乎-发现更大的世界，\uD83D\uDE02";
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            System.out.print(c);
        }

        System.out.println();

        // codePointAt takes a char index, not a code point index,
        // so advance by the number of chars each code point occupies
        for (int j = 0; j < content.length(); ) {
            int value = content.codePointAt(j);
            System.out.print(value + " ");
            j += Character.charCount(value);
        }

        System.out.println();

        // The underlying char sequence has 13 code units
        System.out.println("code unit size:" + content.length());
        // but it represents only 12 characters (code points)
        System.out.println("code point size:" + content.codePointCount(0, content.length()));
    }
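
One pitfall when walking a string by code point is that String.codePointAt expects a char index. A loop that counts code points and passes the counter straight to codePointAt only gives the right answer when every non-BMP character sits at the very end of the string. The sketch below contrasts that approach with String.codePoints() (Java 8+); the class and method names are made up for illustration, and the test string is an assumed example with the emoji in the middle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class CodePointIteration {

    /** Naive: uses a code point counter as if it were a char index. */
    public static List<Integer> naive(String s) {
        List<Integer> out = new ArrayList<>();
        int count = s.codePointCount(0, s.length());
        for (int j = 0; j < count; j++) {
            out.add(s.codePointAt(j)); // j is interpreted as a char index here
        }
        return out;
    }

    /** Correct: lets the JDK combine surrogate pairs while iterating. */
    public static List<Integer> correct(String s) {
        return s.codePoints().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 'a', U+1F602 (stored as a surrogate pair), 'b'
        String s = "a\uD83D\uDE02b";
        System.out.println(naive(s));   // [97, 128514, 56834]
        System.out.println(correct(s)); // [97, 128514, 98]
    }
}
```

In the naive result, 56834 (0xDE02) is the low surrogate read on its own, and the trailing 'b' is never reached.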

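Because of non-BMP characters, a Java char is no longer simply a "character". It is a UTF-16 code unit, the smallest storage unit of an encoded character, and one character may need two of them. (Note: this does not cover the internal String implementation in Java 9+; see the Zhihu question "jdk9为何要将String的底层实现由char[]改成了byte[]?" (Why did JDK 9 change the underlying implementation of String from char[] to byte[]?).)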
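
To show how one code point becomes two code units, here is a minimal sketch of the UTF-16 surrogate arithmetic, checked against the JDK's own Character.toChars and Character.toCodePoint. The class SurrogatePairDemo and its encode method are illustrative names, not part of any library.

```java
public class SurrogatePairDemo {

    /** Splits a code point into one char (BMP) or a high/low surrogate pair (non-BMP). */
    public static char[] encode(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a valid code point: " + codePoint);
        }
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            return new char[] { (char) codePoint }; // fits in a single code unit
        }
        int offset = codePoint - 0x10000;                // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));    // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));   // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = encode(0x1F602);
        System.out.println(Integer.toHexString(pair[0])); // d83d
        System.out.println(Integer.toHexString(pair[1])); // de02

        // The JDK helpers agree with the manual arithmetic
        char[] jdk = Character.toChars(0x1F602);
        System.out.println(jdk[0] == pair[0] && jdk[1] == pair[1]); // true
        System.out.println(Character.toCodePoint(pair[0], pair[1])); // 128514
    }
}
```

The two values printed first match the \uD83D\uDE02 escape used in the test string above.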