The difference between offsetByCodePoints and incrementing the index one by one in JAVA

Latest recommended article published on 2022-11-27 22:42:47

MaxIsTaken

Tags: string, java

Article link: https://blog.csdn.net/SagarZhang/article/details/116405369

As we all know, the method

public int offsetByCodePoints(int index, int codePointOffset)

returns the index within the String that is offset from the given index by codePointOffset code points.
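A minimal sketch of what that means in practice (the class name OffsetByCodePointsDemo is only for illustration). The string starts with the same double-struck O used later in this post, which Java stores as two chars, so moving forward by one code point from index 0 lands on index 2 rather than 1. According to the String Javadoc, codePointOffset may also be negative, in which case the method walks backwards.

```java
public class OffsetByCodePointsDemo {
    public static void main(String[] args) {
        // U+1D546 (double-struck O) followed by "AB": 3 code points stored in 4 chars
        String s = "\uD835\uDD46AB";
        System.out.println(s.length());                      // 4 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 3 (code points)
        System.out.println(s.offsetByCodePoints(0, 1));      // 2: the surrogate pair is skipped as one unit
        System.out.println(s.offsetByCodePoints(2, 1));      // 3: 'A' is a single char
        System.out.println(s.offsetByCodePoints(4, -3));     // 0: a negative offset walks backwards
    }
}
```

For plain ASCII or common Chinese text every code point is one char, so offsetByCodePoints(idx, 1) and idx + 1 coincide, which is exactly what the next two snippets show.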

However, whether the text is English or Chinese (which is often thought of as "wide" characters), the code

String str = "abc哈哈哈";
int idx = 0, idx_off = 0;
while(idx < str.length())
{
	System.out.println(String.valueOf(idx_off = str.offsetByCodePoints(idx, 1)) + ": " + str.substring(idx, idx_off));
	idx = idx_off;
}

and the code

String str = "abc哈哈哈";
int idx = 0, idx_off = 0;
while(idx < str.length())
{
	System.out.println(String.valueOf(idx_off = idx + 1)
			+ ": "
			+ str.substring(idx, idx_off));
	idx = idx_off;
}

behave exactly the same; both print

1: a
2: b
3: c
4: 哈
5: 哈
6: 哈

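In both cases the variable idx_off advances in steps of 1.
This is because each of these English letters and Chinese characters (哈 is U+54C8) occupies a single UTF-16 code unit, i.e. one Java char. Characters outside the Basic Multilingual Plane, however, need two code units (a surrogate pair); this includes some symbols and also rarer CJK ideographs such as those in CJK Unified Ideographs Extension B.
One example is the mathematical double-struck capital O, "𝕆" (U+1D546).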
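To see where the two chars of 𝕆 come from, here is a sketch of the standard UTF-16 surrogate-pair arithmetic: subtract 0x10000, put the top 10 bits of the result into a high surrogate starting at 0xD800 and the bottom 10 bits into a low surrogate starting at 0xDC00. The helper toSurrogatePair is hypothetical and assumes its argument is a supplementary code point (at least U+10000); the result is cross-checked against the JDK's Character.toChars.

```java
import java.util.Arrays;

public class SurrogatePairDemo {
    // Hypothetical helper: split a supplementary code point (>= U+10000) into a UTF-16 surrogate pair.
    static char[] toSurrogatePair(int codePoint) {
        int v = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (v >>> 10));  // top 10 bits -> high surrogate
        char low = (char) (0xDC00 + (v & 0x3FF));  // bottom 10 bits -> low surrogate
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int o = 0x1D546; // MATHEMATICAL DOUBLE-STRUCK CAPITAL O
        char[] pair = toSurrogatePair(o);
        System.out.printf("U+%X -> %04X %04X%n", o, (int) pair[0], (int) pair[1]); // U+1D546 -> D835 DD46
        System.out.println(Arrays.equals(pair, Character.toChars(o)));             // true
        System.out.println(Character.charCount(o));      // 2: needs a surrogate pair
        System.out.println(Character.charCount(0x54C8)); // 1: U+54C8 (哈) is in the BMP
    }
}
```

The pair D835 DD46 is exactly the \uD835\uDD46 escape used in the test program that follows.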

package com.company.main4;

public class main4 {
    private static void printOneByOne(String str)//print character by character using offsetByCodePoints
    {
        int idx = 0, idx_off = 0;
        while(str.length() > idx)
        {
            System.out.println(String.valueOf(idx_off = str.offsetByCodePoints(idx,1))
                    +": "
                    +str.substring(idx,idx_off));
            idx = idx_off;
        }
    }

    private static void printOneByOneMI(String str)//print character by character without offsetByCodePoints (index + 1)
    {
        int idx = 0, idx_off = 0;
        while(str.length() > idx)
        {
            System.out.println(String.valueOf(idx_off = idx + 1)
                    +": "
                    +str.substring(idx,idx_off));
            idx = idx_off;
        }

    }
    public static void main(String[] args)
    {
        String tmp_str1 = "\uD835\uDD46ABCDEFG";//\uD835\uDD46 is the surrogate pair for the mathematical double-struck O (U+1D546)
        String tmp_str2 = "abc哈哈哈";

        System.out.println("=============================\nOutput using offsetByCodePoints\n=============================");
        printOneByOne(tmp_str1);
        System.out.println(" ");//print a separator line
        printOneByOne(tmp_str2);

        System.out.println("=============================\nOutput without offsetByCodePoints\n=============================");
        printOneByOneMI(tmp_str1);
        System.out.println(" ");//print a separator line
        printOneByOneMI(tmp_str2);
    }
}

The output of the code above is:

=============================
Output using offsetByCodePoints
=============================
2: 𝕆
3: A
4: B
5: C
6: D
7: E
8: F
9: G
 
1: a
2: b
3: c
4: 哈
5: 哈
6: 哈
=============================
Output without offsetByCodePoints
=============================
1: ?
2: ?
3: A
4: B
5: C
6: D
7: E
8: F
9: G
 
1: a
2: b
3: c
4: 哈
5: 哈
6: 哈

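As you can see, when printing with offsetByCodePoints, the symbol 𝕆 occupies two String indices (chars 0 and 1), so the first index printed is 2.
When the index is simply incremented by 1, substring cuts the surrogate pair in half. Each half is an unpaired surrogate that is not a valid character on its own, so 𝕆 cannot be printed correctly and the console shows ? in place of each half.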
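If the goal is just to walk a string one character at a time, a common alternative (a swapped-in technique, not the method used in this post) is to read the code point at idx with codePointAt and advance by Character.charCount, or on Java 8+ to iterate String.codePoints(). The helper name splitByCodePoint is hypothetical. Like offsetByCodePoints(idx, 1), it treats an unpaired surrogate as a single unit of length 1.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointIteration {
    // Hypothetical helper: one String per code point, never cutting a surrogate pair in half.
    static List<String> splitByCodePoint(String str) {
        List<String> parts = new ArrayList<>();
        int idx = 0;
        while (idx < str.length()) {
            int cp = str.codePointAt(idx);               // full code point starting at idx
            int next = idx + Character.charCount(cp);    // advance 2 for a surrogate pair, else 1
            parts.add(str.substring(idx, next));
            idx = next;
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "\uD835\uDD46ABC";
        for (String part : splitByCodePoint(s)) {
            System.out.println(part);
        }
        // Java 8+: the same traversal through the codePoints() stream
        s.codePoints().forEach(cp -> System.out.println(new String(Character.toChars(cp))));
    }
}
```

Both loops print 𝕆, A, B, C on separate lines (assuming the console can display 𝕆), whereas the index + 1 version would again split the pair.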