Understanding the String API (Query Methods)

爱学习_程序员

Tags: java, string

Article link: https://blog.csdn.net/baidu_38766791/article/details/105874075

char charAt(int index)

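Returns the code unit (a char value) at the given index.
Note that the unit here is a code unit: a Java string is a sequence of char values, and a char is one UTF-16 code unit used to encode Unicode code points.
A code point in the range U+0000 to U+FFFF (excluding the surrogate range U+D800 to U+DFFF), i.e. an ordinary character, is represented by one code unit.
A code point in the range U+10000 to U+10FFFF (a supplementary character) is represented by two code units, called a surrogate pair.
For supplementary characters, charAt can return one half of a surrogate pair instead of a whole character, so index-based access needs an extra check.
For example: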

package test;

public class StringTest{
	public static void main(String[] args) {
		String normal="🃏 is大王";
		System.out.println("the sentence is: "+normal);
		System.out.println("the length of the sentence: "+normal.length());
		System.out.println("the second code point unit is: "+normal.charAt(1));
	}
}

Output:

the sentence is: 🃏 is大王
the length of the sentence: 7
the second code point unit is: ?

Here the second code unit (index 1) is the second code unit of the special character 🃏 (the joker), not the space we might expect!
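To make the one-versus-two code unit rule concrete, here is a minimal sketch that prints the UTF-16 code units of a single code point. The class name SurrogatePairDemo and the helper utf16Hex are hypothetical names introduced for illustration; Character.toChars and Character.charCount are standard JDK methods.

```java
class SurrogatePairDemo {

    /** Returns the UTF-16 code units of one code point as hex, separated by spaces. */
    static String utf16Hex(int codePoint) {
        // toChars yields 1 char for a BMP code point, 2 for a supplementary one
        char[] units = Character.toChars(codePoint);
        StringBuilder sb = new StringBuilder();
        for (char unit : units) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int joker = 0x1F0CF; // 🃏 (127183), a supplementary character
        int da = 0x5927;     // 大 (22823), an ordinary BMP character
        System.out.println(Character.charCount(joker)); // 2
        System.out.println(Character.charCount(da));    // 1
        System.out.println(utf16Hex(joker));            // D83C DCCF
        System.out.println(utf16Hex(da));               // 5927
    }
}
```

The second unit, 0xDCCF, is what charAt(1) returns for a string that starts with 🃏; on its own it is not a printable character, which is why the console shows "?".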

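Solution:
Extract the string's Unicode code points with codePoints() and work with code points instead of code units. A string built from the resulting code-point array with new String(int[], int, int) is identical to the original.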

package test.string;

import java.util.Arrays;
public class StringTest {

	public static void main(String[] args) {
		// TODO Auto-generated method stub
		
		String normal="🃏 is 大王";
		System.out.println("the sentence is: "+normal);
		System.out.println("the length of the sentence: "+normal.length());
		System.out.println("the second code point unit is: "+normal.charAt(1));
		int[] codePoints=normal.codePoints().toArray();
		System.out.println("the Unicode of the string is: "+Arrays.toString(codePoints));
		System.out.println("the corresponding string is: "+new String(codePoints, 0, codePoints.length));
	}
}

Output:

the sentence is: 🃏 is 大王
the length of the sentence: 8
the second code point unit is: ?
the Unicode of the string is: [127183, 32, 105, 115, 32, 22823, 29579]
the corresponding string is: 🃏 is 大王
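When the characters need to be visited one at a time, a common alternative to building the whole code-point array is to step through the string with codePointAt and Character.charCount. The sketch below assumes that approach; the class name CodePointWalk and the method characters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointWalk {

    /** Splits a string into one String per code point, never cutting a surrogate pair. */
    static List<String> characters(String s) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);                      // reads both halves of a pair starting at i
            result.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp);                   // advance by 1 or 2 code units
        }
        return result;
    }

    public static void main(String[] args) {
        // "🃏 is 大王" written with escapes so the source file encoding does not matter
        List<String> chars = characters("\uD83C\uDCCF is \u5927\u738B");
        System.out.println(chars.size());               // 7
        System.out.println(chars.get(1).equals(" "));   // true
    }
}
```

With this loop the second element is the space, which matches the intuitive reading of the string.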

int codePointAt(int index)
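Returns the Unicode code point at the given code-unit index. If the code unit at index is a high surrogate and the next code unit is a low surrogate, the supplementary code point of the pair is returned; otherwise the value of the single code unit at index is returned.
As with charAt, the index counts code units, not characters.

Code: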

package test.string;

import java.util.Arrays;

import test.UnicodeTest; // the author's own helper class; its source is not shown in this article
public class StringTest {

	public static void main(String[] args) {
		// TODO Auto-generated method stub
		
		String normal="🃏 is 大王";
		System.out.println("the sentence is: "+normal);
		System.out.println("the length of the sentence: "+normal.length());
		System.out.println("the second code point unit is: "+normal.charAt(1));
		
		int[] codePoints=normal.codePoints().toArray();
		System.out.println("the Unicode of the string is: "+Arrays.toString(codePoints));
		System.out.println("the corresponding string is: "+new String(codePoints, 0, codePoints.length));
		
		String firstChar=UnicodeTest.changeToUTF16(codePoints[0]);
		System.out.println(UnicodeTest.formatString(firstChar));		
		System.out.println("the second code point is: "+normal.codePointAt(1));
	}
}

Output:

the sentence is: 🃏 is 大王
the length of the sentence: 8
the second code point unit is: ?
the Unicode of the string is: [127183, 32, 105, 115, 32, 22823, 29579]
the corresponding string is: 🃏 is 大王
 11011000 01111100 11011100 11001111
the second code point is: 56527

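The output shows that codePointAt(1) returns 56527 (0xDCCF), which is the second code unit (the low surrogate) of the UTF-16 surrogate pair for 🃏.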
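The difference between pointing codePointAt at the first and at the second half of a pair can be checked directly. The following is a small sketch under that assumption; CodePointAtDemo and describe are hypothetical names, while the Character predicates are standard JDK methods.

```java
class CodePointAtDemo {

    /** Describes what codePointAt(index) returns for the given string. */
    static String describe(String s, int index) {
        int cp = s.codePointAt(index);
        if (Character.isSupplementaryCodePoint(cp)) {
            return "supplementary code point U+" + Integer.toHexString(cp).toUpperCase();
        }
        // cp fits in a char here, so the cast is safe
        if (Character.isSurrogate((char) cp)) {
            return "unpaired surrogate 0x" + Integer.toHexString(cp).toUpperCase();
        }
        return "BMP code point U+" + String.format("%04X", cp);
    }

    public static void main(String[] args) {
        String s = "\uD83C\uDCCF is"; // "🃏 is"
        System.out.println(describe(s, 0)); // supplementary code point U+1F0CF
        System.out.println(describe(s, 1)); // unpaired surrogate 0xDCCF
        System.out.println(describe(s, 2)); // BMP code point U+0020
    }
}
```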
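int offsetByCodePoints(int startIndex, int cpCount)
Returns the code-unit index reached by moving cpCount code points (characters) forward from startIndex. When startIndex is the first code unit of a surrogate pair, moving one code point advances the index by two, which differs from a literal reading of the arguments.
The results below include cases where the returned index equals length(). This is not a bug: the return value is an offset, and length() is the valid position just past the last code point. An exception is thrown only when fewer than cpCount code points remain after startIndex.
Verification code: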

/* Verification code 1: */
package test.string;

public class StringTest {

	public static void main(String[] args) {
		String normal;
		normal = "this";
		for(int j=0;j<normal.length();j++) {
			System.out.println("----------------" + "startIndex = "+j + "------------------");
			for (int i=0; i <= normal.length(); i++) {
				try {
					System.out.println("normal.offsetByCodePoints("+j+", "+i+")  :   "+normal.offsetByCodePoints(j, i));
				} catch (IndexOutOfBoundsException e) {
					// thrown when fewer than i code points remain after index j
					System.out.println("normal.offsetByCodePoints("+j+", "+i+")  :   out of range");
				}
			}
		}
	}
}

/* Verification result 1: */
----------------startIndex = 0------------------
normal.offsetByCodePoints(0, 0)  :   0
normal.offsetByCodePoints(0, 1)  :   1
normal.offsetByCodePoints(0, 2)  :   2
normal.offsetByCodePoints(0, 3)  :   3
normal.offsetByCodePoints(0, 4)  :   4
----------------startIndex = 1------------------
normal.offsetByCodePoints(1, 0)  :   1
normal.offsetByCodePoints(1, 1)  :   2
normal.offsetByCodePoints(1, 2)  :   3
normal.offsetByCodePoints(1, 3)  :   4
normal.offsetByCodePoints(1, 4)  :   out of range
----------------startIndex = 2------------------
normal.offsetByCodePoints(2, 0)  :   2
normal.offsetByCodePoints(2, 1)  :   3
normal.offsetByCodePoints(2, 2)  :   4
normal.offsetByCodePoints(2, 3)  :   out of range
normal.offsetByCodePoints(2, 4)  :   out of range
----------------startIndex = 3------------------
normal.offsetByCodePoints(3, 0)  :   3
normal.offsetByCodePoints(3, 1)  :   4
normal.offsetByCodePoints(3, 2)  :   out of range
normal.offsetByCodePoints(3, 3)  :   out of range
normal.offsetByCodePoints(3, 4)  :   out of range

/* Verification code 2: */
package test.string;

public class StringTest {

	public static void main(String[] args) {
		String normal;
		normal = "🃏 is";
		for(int j=0;j<normal.length();j++) {
			System.out.println("----------------" + "startIndex = "+j + "------------------");
			for (int i=0; i <= normal.length(); i++) {
				try {
					System.out.println("normal.offsetByCodePoints("+j+", "+i+")  :   "+normal.offsetByCodePoints(j, i));
				} catch (IndexOutOfBoundsException e) {
					// thrown when fewer than i code points remain after index j
					System.out.println("normal.offsetByCodePoints("+j+", "+i+")  :   out of range");
				}
			}
		}
	}
}
/* Verification result 2: */
----------------startIndex = 0------------------
normal.offsetByCodePoints(0, 0)  :   0
normal.offsetByCodePoints(0, 1)  :   2
normal.offsetByCodePoints(0, 2)  :   3
normal.offsetByCodePoints(0, 3)  :   4
normal.offsetByCodePoints(0, 4)  :   5
normal.offsetByCodePoints(0, 5)  :   out of range
----------------startIndex = 1------------------
normal.offsetByCodePoints(1, 0)  :   1
normal.offsetByCodePoints(1, 1)  :   2
normal.offsetByCodePoints(1, 2)  :   3
normal.offsetByCodePoints(1, 3)  :   4
normal.offsetByCodePoints(1, 4)  :   5
normal.offsetByCodePoints(1, 5)  :   out of range
----------------startIndex = 2------------------
normal.offsetByCodePoints(2, 0)  :   2
normal.offsetByCodePoints(2, 1)  :   3
normal.offsetByCodePoints(2, 2)  :   4
normal.offsetByCodePoints(2, 3)  :   5
normal.offsetByCodePoints(2, 4)  :   out of range
normal.offsetByCodePoints(2, 5)  :   out of range
----------------startIndex = 3------------------
normal.offsetByCodePoints(3, 0)  :   3
normal.offsetByCodePoints(3, 1)  :   4
normal.offsetByCodePoints(3, 2)  :   5
normal.offsetByCodePoints(3, 3)  :   out of range
normal.offsetByCodePoints(3, 4)  :   out of range
normal.offsetByCodePoints(3, 5)  :   out of range
----------------startIndex = 4------------------
normal.offsetByCodePoints(4, 0)  :   4
normal.offsetByCodePoints(4, 1)  :   5
normal.offsetByCodePoints(4, 2)  :   out of range
normal.offsetByCodePoints(4, 3)  :   out of range
normal.offsetByCodePoints(4, 4)  :   out of range
normal.offsetByCodePoints(4, 5)  :   out of range
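One practical use of offsetByCodePoints is to translate a position counted in characters into a code-unit index that charAt, codePointAt and substring understand. The sketch below assumes that use; NthCodePoint, codePointAtLogicalIndex and firstCodePoints are hypothetical names.

```java
class NthCodePoint {

    /** Returns the n-th code point of s, where n counts code points from 0. */
    static int codePointAtLogicalIndex(String s, int n) {
        int unitIndex = s.offsetByCodePoints(0, n);
        // codePointAt throws if unitIndex == s.length(), i.e. if n equals the code point count
        return s.codePointAt(unitIndex);
    }

    /** Returns the first n code points of s as a string. */
    static String firstCodePoints(String s, int n) {
        // An offset equal to s.length() is a valid end index for substring
        return s.substring(0, s.offsetByCodePoints(0, n));
    }

    public static void main(String[] args) {
        String s = "\uD83C\uDCCF is"; // "🃏 is": 5 code units, 4 code points
        System.out.println(codePointAtLogicalIndex(s, 1));  // 32 (the space)
        System.out.println(firstCodePoints(s, 1).length()); // 2 (the whole surrogate pair)
        System.out.println(firstCodePoints(s, 4).equals(s)); // true
    }
}
```

The last call relies on offsetByCodePoints(0, 4) returning 5, which is exactly the index == length case seen in the results above.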

IntStream codePoints()
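Returns an IntStream of the Unicode code points that make up the string. Each surrogate pair is combined into a single code point.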

/* Verification code: */
package test.string;

import java.util.Arrays;

public class StringTest {

	public static void main(String[] args) {
		
		String normal="🃏 🃏is 大王";
		System.out.println("the sentence is: "+normal);		
		int[] codePoints=normal.codePoints().toArray();
		System.out.println("the Unicode of the string is: "+Arrays.toString(codePoints));
	}
}

/* Verification result: */
the sentence is: 🃏 🃏is 大王
the Unicode of the string is: [127183, 32, 127183, 105, 115, 32, 22823, 29579]
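Because codePoints() returns an IntStream, the usual stream operations can be applied per character. As a rough sketch (CodePointStreamDemo, countSupplementary and keepBmpOnly are hypothetical names), the following counts supplementary characters and rebuilds a string without them:

```java
class CodePointStreamDemo {

    /** Counts the code points that need two UTF-16 code units. */
    static long countSupplementary(String s) {
        return s.codePoints().filter(Character::isSupplementaryCodePoint).count();
    }

    /** Returns a copy of s that keeps only BMP code points. */
    static String keepBmpOnly(String s) {
        return s.codePoints()
                .filter(Character::isBmpCodePoint)
                // appendCodePoint writes 1 or 2 chars as needed
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        String s = "\uD83C\uDCCF \uD83C\uDCCFis \u5927\u738B"; // "🃏 🃏is 大王"
        System.out.println(countSupplementary(s)); // 2
        System.out.println(keepBmpOnly(s));        // " is 大王"
    }
}
```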

boolean isEmpty()
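Returns true only when the length of the string is 0; otherwise returns false.
boolean isBlank()
Returns true when the string is empty or contains only whitespace characters; otherwise returns false. This method is available since Java 11.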

/* Verification code: */
package test.string;


public class StringTest {

	public static void main(String[] args) {

		String empty = "";
		System.out.println(empty+" is empty: "+empty.isEmpty());
		System.out.println(empty+" is blank: "+empty.isBlank());
		String blank = "  ";
		System.out.println(blank+" is empty: "+blank.isEmpty());
		System.out.println(blank+" is blank: "+blank.isBlank());
		String nor=" this is ";
		System.out.println(nor+" is blank: "+nor.isBlank());
		System.out.println(nor+" is empty: "+nor.isEmpty());

	}
}


/* Verification result: */
 is empty: true
 is blank: true
   is empty: false
   is blank: true
 this is  is blank: false
 this is  is empty: false
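isBlank decides what counts as whitespace with Character.isWhitespace, which is broader than what trim() strips (trim only removes characters up to U+0020). The sketch below illustrates that difference and adds a null-safe wrapper; BlankCheckDemo and isNullOrBlank are hypothetical names, and the code assumes Java 11 or later.

```java
class BlankCheckDemo {

    /** Null-safe check that treats null like an empty string (hypothetical helper). */
    static boolean isNullOrBlank(String s) {
        return s == null || s.isBlank();
    }

    public static void main(String[] args) {
        String emSpace = "\u2003"; // EM SPACE, a Unicode space separator
        System.out.println(emSpace.isEmpty());        // false
        System.out.println(emSpace.isBlank());        // true
        System.out.println(emSpace.trim().isEmpty()); // false: trim() leaves U+2003 in place

        String nbsp = "\u00A0"; // NO-BREAK SPACE is not whitespace for Character.isWhitespace
        System.out.println(nbsp.isBlank());           // false

        System.out.println(isNullOrBlank(null));      // true
    }
}
```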

boolean startsWith(String prefix)
boolean endsWith(String suffix)
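startsWith returns true when the string begins with prefix, and endsWith returns true when the string ends with suffix; otherwise they return false.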

/* Verification code: */
package test.string;
public class StringTest {

	public static void main(String[] args) {

		String normal = "🃏 🃏is 大王";
		String prefix="🃏";
		String suffix="大王";
		System.out.println(normal+"  start with "+prefix+" ? "+normal.startsWith(prefix));
		System.out.println(normal+"  end with "+suffix+" ? "+normal.endsWith(suffix));
		System.out.println(normal+"  end with "+prefix+" ? "+normal.endsWith(prefix));
		System.out.println(normal+"  start with "+suffix+" ? "+normal.startsWith(suffix));		

	}
}
/* Verification result: */
🃏 🃏is 大王  start with 🃏 ? true
🃏 🃏is 大王  end with 大王 ? true
🃏 🃏is 大王  end with 🃏 ? false
🃏 🃏is 大王  start with 大王 ? false
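Like the other methods in this article, startsWith and endsWith compare code units, so a prefix that ends in the middle of a surrogate pair still matches. The following sketch shows this and one possible guard; PrefixDemo and startsWithWholeCharacters are hypothetical names, not part of the JDK.

```java
class PrefixDemo {

    /** Like startsWith, but rejects a match that ends between the halves of a surrogate pair. */
    static boolean startsWithWholeCharacters(String s, String prefix) {
        if (!s.startsWith(prefix)) {
            return false;
        }
        int end = prefix.length();
        boolean splitsPair = end > 0 && end < s.length()
                && Character.isHighSurrogate(s.charAt(end - 1))
                && Character.isLowSurrogate(s.charAt(end));
        return !splitsPair;
    }

    public static void main(String[] args) {
        String s = "\uD83C\uDCCF is";    // "🃏 is"
        String halfJoker = "\uD83C";     // only the high surrogate of 🃏
        System.out.println(s.startsWith(halfJoker));                       // true
        System.out.println(startsWithWholeCharacters(s, halfJoker));       // false
        System.out.println(startsWithWholeCharacters(s, "\uD83C\uDCCF"));  // true
        System.out.println(s.startsWith("is", 3));                         // true: offset is in code units
    }
}
```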

int indexOf(String str)
int indexOf(String str, int fromIndex)
int indexOf(int cp)
int indexOf(int cp, int fromIndex)
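Returns the index of the first occurrence of the substring str or the code point cp, searching from the start of the string or from fromIndex (inclusive). Returns -1 when there is no match. The returned index counts code units.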

/* Verification code: */
package test.string;
import java.util.Arrays;
public class StringTest {
	public static void main(String[] args) {
		String normal = "🃏 🃏is 大王";
		System.out.println("the sentence is: " + normal);
		int[] codePoints = normal.codePoints().toArray();
		System.out.println("the Unicode of the string is: " + Arrays.toString(codePoints));
		System.out.println("the first index of 🃏 is: "+normal.indexOf("🃏"));
		System.out.println("the first index of 🃏 at or after index 1 is: "+normal.indexOf("🃏", 1));
		// 32 is the code point of the space character
		System.out.println("the first index of 32 is: "+normal.indexOf(32));
		System.out.println("the first index of 32 at or after index 3 is: "+normal.indexOf(32,3));
		System.out.println("the first index of 50 is:"+normal.indexOf(50));
	}
}

/* Verification result: */
the sentence is: 🃏 🃏is 大王
the Unicode of the string is: [127183, 32, 127183, 105, 115, 32, 22823, 29579]
the first index of 🃏 is: 0
the first index of 🃏 at or after index 1 is: 3
the first index of 32 is: 2
the first index of 32 at or after index 3 is: 7
the first index of 50 is:-1
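A typical pattern built on indexOf(String, int) is to collect every occurrence by restarting the search just past the previous match. This is a minimal sketch of that pattern; IndexOfAll and allIndexesOf are hypothetical names, and the loop reports non-overlapping matches only.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfAll {

    /** Returns the code-unit indexes of all non-overlapping occurrences of target in s. */
    static List<Integer> allIndexesOf(String s, String target) {
        List<Integer> result = new ArrayList<>();
        if (target.isEmpty()) {
            return result; // an empty target would match at every position
        }
        int from = 0;
        while (true) {
            int idx = s.indexOf(target, from);
            if (idx < 0) {
                break;
            }
            result.add(idx);
            from = idx + target.length(); // continue after this match
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "\uD83C\uDCCF \uD83C\uDCCFis \u5927\u738B"; // "🃏 🃏is 大王"
        System.out.println(allIndexesOf(s, "\uD83C\uDCCF")); // [0, 3]
        System.out.println(allIndexesOf(s, " "));            // [2, 7]
        // The int overload also accepts a supplementary code point
        System.out.println(s.indexOf(0x1F0CF, 1));           // 3
    }
}
```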

int lastIndexOf(String str)
int lastIndexOf(String str, int fromIndex)
int lastIndexOf(int cp)
int lastIndexOf(int cp, int fromIndex)
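These work like indexOf, except that lastIndexOf searches backward: it returns the index of the last occurrence, starting from the end of the string or from fromIndex (inclusive).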

/* Verification code: */
package test.string;
import java.util.Arrays;
public class StringTest {
	public static void main(String[] args) {
		String normal = "🃏 🃏is 大王";
		System.out.println("the sentence is: " + normal);
		int[] codePoints = normal.codePoints().toArray();
		System.out.println("the Unicode of the string is: " + Arrays.toString(codePoints));
		System.out.println("the last index of 🃏 is: "+normal.lastIndexOf("🃏"));
		System.out.println("the last index of 🃏 at or before index 2 is: "+normal.lastIndexOf("🃏", 2));
		// 32 is the code point of the space character
		System.out.println("the last index of 32 is: "+normal.lastIndexOf(32));
		System.out.println("the last index of 32 at or before index 3 is: "+normal.lastIndexOf(32,3));
		System.out.println("the last index of 50 is:"+normal.lastIndexOf(50));
	}
}

/* Verification result: */
the sentence is: 🃏 🃏is 大王
the Unicode of the string is: [127183, 32, 127183, 105, 115, 32, 22823, 29579]
the last index of 🃏 is: 3
the last index of 🃏 at or before index 2 is: 0
the last index of 32 is: 7
the last index of 32 at or before index 3 is: 2
the last index of 50 is:-1
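lastIndexOf is often used to take the part of a string after its final separator. The sketch below assumes that use case; LastSegment and afterLast are hypothetical names.

```java
class LastSegment {

    /** Returns the text after the last occurrence of sep, or the whole string if sep is absent. */
    static String afterLast(String s, char sep) {
        int idx = s.lastIndexOf(sep); // the char is widened to an int code point
        return idx < 0 ? s : s.substring(idx + 1);
    }

    public static void main(String[] args) {
        String s = "\uD83C\uDCCF \uD83C\uDCCFis \u5927\u738B"; // "🃏 🃏is 大王"
        System.out.println(afterLast(s, ' '));          // 大王
        System.out.println(afterLast("abc", ' '));      // abc
        System.out.println("a.b.c".lastIndexOf('.', 2)); // 1: search starts at index 2 and moves backward
    }
}
```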

int length()
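Returns the number of UTF-16 code units in the string. An ordinary character takes one code unit and a supplementary character takes two, so for strings that contain supplementary characters the result differs from the number of characters displayed.
int codePointCount(int startIndex, int endIndex)
Returns the number of code points in the code-unit range from startIndex (inclusive) to endIndex (exclusive). When startIndex or endIndex falls in the middle of a surrogate pair, the half that remains inside the range still counts as one code point.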

/* Verification code: */
package test.string;

import java.util.Arrays;

public class StringTest {

	public static void main(String[] args) {
		String normal = "🃏 🃏is 大王";
		System.out.println("the sentence is: " + normal);
		int[] codePoints = normal.codePoints().toArray();
		System.out.println("the Unicode of the string is: " + Arrays.toString(codePoints));		
		System.out.println("normal.codePointCount(1, 10): "+normal.codePointCount(1, 10));
		System.out.println("normal.codePointCount(0, 10): "+normal.codePointCount(0, 10));
	}
}

/* Verification result: */
the sentence is: 🃏 🃏is 大王
the Unicode of the string is: [127183, 32, 127183, 105, 115, 32, 22823, 29579]
normal.codePointCount(1, 10): 8
normal.codePointCount(0, 10): 8
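To get the number of characters a reader would count, codePointCount can be applied to the whole string. A minimal sketch, with CodePointLength and codePointLength as hypothetical names:

```java
class CodePointLength {

    /** Returns the number of code points in s, counting each surrogate pair once. */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "\uD83C\uDCCF \uD83C\uDCCFis \u5927\u738B"; // "🃏 🃏is 大王"
        System.out.println(s.length());              // 10 code units
        System.out.println(codePointLength(s));      // 8 code points
        System.out.println(s.codePoints().count());  // 8, the same count via the stream
    }
}
```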
