Understanding the String API (Query Methods)

  1. char charAt(int index)

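Returns the char (UTF-16 code unit) at the given index.
Note that this is a code unit, not necessarily a whole character: a Java string is a sequence of char values, and the char type is a UTF-16 code unit used to represent Unicode code points.
For code points from U+0000 to U+FFFF, excluding the surrogate range U+D800 to U+DFFF (that is, ordinary characters), a character is represented by one code unit.
For code points from U+10000 to U+10FFFF (supplementary characters), a character is represented by two code units: a high surrogate followed by a low surrogate.
For supplementary characters, charAt can therefore return half of a surrogate pair instead of the character you expect, so the result needs to be checked.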
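One way to do that check is with the surrogate helpers in java.lang.Character. The sketch below is only an illustration: describeAt is a hypothetical helper name, while Character.isHighSurrogate, Character.isLowSurrogate, Character.isSurrogate and String.codePointAt are standard JDK methods.

```java
public class SurrogateCheck {

    // Hypothetical helper: classify the char at index within s
    static String describeAt(String s, int index) {
        char c = s.charAt(index);
        if (Character.isHighSurrogate(c) && index + 1 < s.length()
                && Character.isLowSurrogate(s.charAt(index + 1))) {
            // Valid pair: codePointAt combines both halves into one code point
            return "pair start " + String.format("U+%04X", s.codePointAt(index));
        }
        if (Character.isLowSurrogate(c) && index > 0
                && Character.isHighSurrogate(s.charAt(index - 1))) {
            // Second half of a pair: on its own this char is not a character
            return "pair end";
        }
        if (Character.isSurrogate(c)) {
            return "unpaired surrogate";
        }
        return "BMP char '" + c + "'";
    }

    public static void main(String[] args) {
        String normal = "🃏 is大王";
        for (int i = 0; i < normal.length(); i++) {
            System.out.println(i + ": " + describeAt(normal, i));
        }
    }
}
```

For the string used below, index 0 is reported as the start of the pair for U+1F0CF and index 1 as its second half.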

For example:

package test;

public class StringTest{
	public static void main(String[] args) {
		String normal="🃏 is大王";
		System.out.println("the sentence is: "+normal);
		System.out.println("the length of the sentence: "+normal.length());
		// charAt(1) returns the second UTF-16 code unit, not the second character
		System.out.println("the second code point unit is: "+normal.charAt(1));
	}
}

Output:

the sentence is: 🃏 is大王
the length of the sentence: 7
the second code point unit is: ?

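Here the second code unit is the second half of the surrogate pair of the special character 🃏 (the joker card), not the space we might expect!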

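Solution:
Work with Unicode code points instead of code units: get the string's code points with codePoints(), which treats each supplementary character as a single value, and build a new string from those code points when a string is needed.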

package test.string;

import java.util.Arrays;
public class StringTest {

	public static void main(String[] args) {
		String normal="🃏 is 大王";
		System.out.println("the sentence is: "+normal);
		System.out.println("the length of the sentence: "+normal.length());
		System.out.println("the second code point unit is: "+normal.charAt(1));
		// codePoints() yields one int per character, so 🃏 becomes the single value 127183
		int[] codePoints=normal.codePoints().toArray();
		System.out.println("the Unicode of the string is: "+Arrays.toString(codePoints));
		// Rebuild a String from the code-point array
		System.out.println("the corresponding string is: "+new String(codePoints, 0, codePoints.length));
	}
}

Output:

the sentence is: 🃏 is 大王
the length of the sentence: 8
the second code point unit is: ?
the Unicode of the string is: [127183, 32, 105, 115, 32, 22823, 29579]
the corresponding string is: 🃏 is 大王
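If what you actually want is the second *character* rather than the second code unit, one option is to turn a code-point position into a char index with offsetByCodePoints and read the code point there. A minimal sketch; nthCodePoint is a hypothetical helper name.

```java
public class NthCodePoint {

    // Hypothetical helper: return the n-th (0-based) code point of s as a String
    static String nthCodePoint(String s, int n) {
        // Convert the code-point position n into a char (code unit) index
        int charIndex = s.offsetByCodePoints(0, n);
        int cp = s.codePointAt(charIndex);
        // Character.toChars yields 1 or 2 chars depending on the code point
        return new String(Character.toChars(cp));
    }

    public static void main(String[] args) {
        String normal = "🃏 is 大王";
        System.out.println("charAt(1):       " + normal.charAt(1));                // lone low surrogate
        System.out.println("nthCodePoint(1): [" + nthCodePoint(normal, 1) + "]"); // the space
        System.out.println("nthCodePoint(0): " + nthCodePoint(normal, 0));         // 🃏
    }
}
```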
  1. int codePointAt(int index)
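    Returns the Unicode code point at the given index. If the char at index is a high surrogate and the next char is a low surrogate, the whole supplementary code point is returned; otherwise the char value itself is returned, so an index pointing at the second half of a pair yields just that low surrogate.
    In that respect it behaves like charAt.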

Code:

package test.string;

import java.util.Arrays;

public class StringTest {

	public static void main(String[] args) {
		String normal="🃏 is 大王";
		System.out.println("the sentence is: "+normal);
		System.out.println("the length of the sentence: "+normal.length());
		System.out.println("the second code point unit is: "+normal.charAt(1));

		int[] codePoints=normal.codePoints().toArray();
		System.out.println("the Unicode of the string is: "+Arrays.toString(codePoints));
		System.out.println("the corresponding string is: "+new String(codePoints, 0, codePoints.length));

		// Print the UTF-16 code units of the first code point (🃏) in binary, 8 bits per group
		StringBuilder bits=new StringBuilder();
		for (char c : Character.toChars(codePoints[0])) {
			String b=String.format("%16s", Integer.toBinaryString(c)).replace(' ', '0');
			bits.append(b, 0, 8).append(' ').append(b, 8, 16).append(' ');
		}
		System.out.println(bits.toString().trim());
		// Index 1 is the low surrogate, so codePointAt returns that code unit's value
		System.out.println("the second code point is: "+normal.codePointAt(1));
	}
}

Output:

the sentence is: 🃏 is 大王
the length of the sentence: 8
the second code point unit is: ?
the Unicode of the string is: [127183, 32, 105, 115, 32, 22823, 29579]
the corresponding string is: 🃏 is 大王
11011000 00111100 11011100 11001111
the second code point is: 56527

The output shows that the code unit at index 1 is the second (low-surrogate) code unit of the UTF-16 pair for 🃏: 56527 is 0xDCCF, which matches the last 16 bits printed above.
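The two code units of 🃏 can be derived by hand: subtract 0x10000 from the code point, put the top 10 bits of the remaining 20-bit value into a high surrogate (0xD800 plus those bits) and the bottom 10 bits into a low surrogate (0xDC00 plus those bits). The sketch below does this arithmetic and cross-checks it against the JDK's Character.highSurrogate, Character.lowSurrogate and Character.toCodePoint; toSurrogates is a hypothetical helper name.

```java
public class SurrogateMath {

    // Hypothetical helper: split a supplementary code point into its UTF-16 surrogate pair
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int v = codePoint - 0x10000;              // 20-bit value
        char high = (char) (0xD800 + (v >>> 10)); // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF)); // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        int joker = 0x1F0CF; // 🃏, 127183
        char[] pair = toSurrogates(joker);
        // prints high=0xD83C low=0xDCCF (low=56527)
        System.out.printf("high=0x%04X low=0x%04X (low=%d)%n", (int) pair[0], (int) pair[1], (int) pair[1]);
        // Cross-check with the standard library
        System.out.println(pair[0] == Character.highSurrogate(joker));
        System.out.println(pair[1] == Character.lowSurrogate(joker));
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == joker);
    }
}
```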
3. int offsetByCodePoints(int startIndex, int cpCount)
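    Returns the char index reached by starting at startIndex and moving cpCount code points (not chars) forward; a negative cpCount moves backward. Because the count is in code points, the result can differ from a naive char-based reading: in "🃏 is", offsetByCodePoints(0, 1) is 2, not 1, since the surrogate pair at indices 0 and 1 counts as one code point. An unpaired surrogate inside the range also counts as one code point, so starting at index 1 (the low surrogate) gives offsetByCodePoints(1, 1) = 2 as well.
    A result equal to length() is not a bug: it is the valid index just past the last character, the same convention used by substring's endIndex. The method throws IndexOutOfBoundsException when startIndex is outside 0..length(), or when fewer than cpCount code points remain after startIndex (for a positive cpCount).
Verification code: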

/* Verification code 1: */
package test.string;

public class StringTest {

	public static void main(String[] args) {
		String normal = "this";
		// For every start index, try every cpCount from 0 to length()
		for(int j=0;j<normal.length();j++) {
			System.out.println("----------------" + "startIndex = "+j + "------------------");
			for (int i=0; i <= normal.length(); i++) {
				try {
					System.out.println("normal.offsetByCodePoints("+j+", "+i+")  :   "+normal.offsetByCodePoints(j, i));
				} catch (IndexOutOfBoundsException e) {
					// Fewer than i code points remain after index j
					System.out.println("normal.offsetByCodePoints("+j+", "+i+")  :   out of range");
				}
			}
		}
	}
}

/* Verification output 1: */
----------------startIndex = 0------------------
normal.offsetByCodePoints(0, 0)  :   0
normal.offsetByCodePoints(0, 1)  :   1
normal.offsetByCodePoints(0, 2)  :   2
normal.offsetByCodePoints(0, 3)  :   3
normal.offsetByCodePoints(0, 4)  :   4
----------------startIndex = 1------------------
normal.offsetByCodePoints(1, 0)  :   1
normal.offsetByCodePoints(1, 1)  :   2
normal.offsetByCodePoints(1, 2)  :   3
normal.offsetByCodePoints(1, 3)  :   4
normal.offsetByCodePoints(1, 4)  :   out of range
----------------startIndex = 2------------------
normal.offsetByCodePoints(2, 0)  :   2
normal.offsetByCodePoints(2, 1)  :   3
normal.offsetByCodePoints(2, 2)  :   4
normal.offsetByCodePoints(2, 3)  :   out of range
normal.offsetByCodePoints(2, 4)  :   out of range
----------------startIndex = 3------------------
normal.offsetByCodePoints(3, 0)  :   3
normal.offsetByCodePoints(3, 1)  :   4
normal.offsetByCodePoints(3, 2)  :   out of range
normal.offsetByCodePoints(3, 3)  :   out of range
normal.offsetByCodePoints(3, 4)  :   out of range

/* Verification code 2: */
package test.string;

public class StringTest {

	public static void main(String[] args) {
		// 🃏 occupies two chars, so length() is 5 although only 4 characters are shown
		String normal = "🃏 is";
		for(int j=0;j<normal.length();j++) {
			System.out.println("----------------" + "startIndex = "+j + "------------------");
			for (int i=0; i <= normal.length(); i++) {
				try {
					System.out.println("normal.offsetByCodePoints("+j+", "+i+")  :   "+normal.offsetByCodePoints(j, i));
				} catch (IndexOutOfBoundsException e) {
					System.out.println("normal.offsetByCodePoints("+j+", "+i+")  :   out of range");
				}
			}
		}
	}
}
/* Verification output 2: */
----------------startIndex = 0------------------
normal.offsetByCodePoints(0, 0)  :   0
normal.offsetByCodePoints(0, 1)  :   2
normal.offsetByCodePoints(0, 2)  :   3
normal.offsetByCodePoints(0, 3)  :   4
normal.offsetByCodePoints(0, 4)  :   5
normal.offsetByCodePoints(0, 5)  :   out of range
----------------startIndex = 1------------------
normal.offsetByCodePoints(1, 0)  :   1
normal.offsetByCodePoints(1, 1)  :   2
normal.offsetByCodePoints(1, 2)  :   3
normal.offsetByCodePoints(1, 3)  :   4
normal.offsetByCodePoints(1, 4)  :   5
normal.offsetByCodePoints(1, 5)  :   out of range
----------------startIndex = 2------------------
normal.offsetByCodePoints(2, 0)  :   2
normal.offsetByCodePoints(2, 1)  :   3
normal.offsetByCodePoints(2, 2)  :   4
normal.offsetByCodePoints(2, 3)  :   5
normal.offsetByCodePoints(2, 4)  :   out of range
normal.offsetByCodePoints(2, 5)  :   out of range
----------------startIndex = 3------------------
normal.offsetByCodePoints(3, 0)  :   3
normal.offsetByCodePoints(3, 1)  :   4
normal.offsetByCodePoints(3, 2)  :   5
normal.offsetByCodePoints(3, 3)  :   out of range
normal.offsetByCodePoints(3, 4)  :   out of range
normal.offsetByCodePoints(3, 5)  :   out of range
----------------startIndex = 4------------------
normal.offsetByCodePoints(4, 0)  :   4
normal.offsetByCodePoints(4, 1)  :   5
normal.offsetByCodePoints(4, 2)  :   out of range
normal.offsetByCodePoints(4, 3)  :   out of range
normal.offsetByCodePoints(4, 4)  :   out of range
normal.offsetByCodePoints(4, 5)  :   out of range
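In practice offsetByCodePoints is mostly useful for stepping through a string one character at a time without splitting a surrogate pair, and a negative cpCount steps backward. A small sketch on the same test string "🃏 is"; startIndices is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

public class WalkByCodePoints {

    // Hypothetical helper: collect the char index at which each code point of s starts
    static List<Integer> startIndices(String s) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < s.length(); i = s.offsetByCodePoints(i, 1)) {
            starts.add(i);
        }
        return starts;
    }

    public static void main(String[] args) {
        String normal = "🃏 is";
        for (int i : startIndices(normal)) {
            System.out.printf("index %d -> U+%04X%n", i, normal.codePointAt(i));
        }
        // A negative cpCount steps backward over whole code points
        int end = normal.length();                              // 5
        System.out.println(normal.offsetByCodePoints(end, -1)); // 4
        System.out.println(normal.offsetByCodePoints(end, -4)); // 0
    }
}
```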
  1. IntStream codePoints()
    Returns an IntStream of the Unicode code points of the string. A surrogate pair contributes a single value, while an unpaired surrogate is passed through as its own code unit value.
/* Verification code: */
package test.string;

import java.util.Arrays;

public class StringTest {

	public static void main(String[] args) {
		String normal="🃏 🃏is 大王";
		System.out.println("the sentence is: "+normal);
		// One array element per code point: 8 elements for 10 chars
		int[] codePoints=normal.codePoints().toArray();
		System.out.println("the Unicode of the string is: "+Arrays.toString(codePoints));
	}
}

/* Verification output: */
the sentence is: 🃏 🃏is 大王
the Unicode of the string is: [127183, 32, 127183, 105, 115, 32, 22823, 29579]
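Because codePoints() returns an IntStream, the usual stream operations apply. A short sketch, assuming Java 9 or later, that compares it with chars() (which streams raw UTF-16 code units) and maps each code point back to a one-character string; toCodePointStrings is a hypothetical helper name.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CodePointStream {

    // Hypothetical helper: split a string into one String per code point
    static List<String> toCodePointStrings(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String normal = "🃏 🃏is 大王";
        System.out.println("chars():      " + normal.chars().count());      // 10 code units
        System.out.println("codePoints(): " + normal.codePoints().count()); // 8 code points
        System.out.println(toCodePointStrings(normal));
        // Count characters that need a surrogate pair
        long supplementary = normal.codePoints().filter(Character::isSupplementaryCodePoint).count();
        System.out.println("supplementary: " + supplementary); // 2
    }
}
```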
  1. boolean isEmpty()
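    Returns true only when the string's length is 0; otherwise returns false.
  2. boolean isBlank()
    Returns true when the string is empty or contains only white space (as defined by Character.isWhitespace); otherwise returns false. Available since Java 11.
/* Verification code: */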
package test.string;


public class StringTest {

	public static void main(String[] args) {

		String empty = "";
		System.out.println(empty+" is empty: "+empty.isEmpty());
		System.out.println(empty+" is blank: "+empty.isBlank());
		String blank = "  ";
		System.out.println(blank+" is empty: "+blank.isEmpty());
		System.out.println(blank+" is blank: "+blank.isBlank());
		String nor=" this is ";
		System.out.println(nor+" is blank: "+nor.isBlank());
		System.out.println(nor+" is empty: "+nor.isEmpty());

	}
}


/* Verification output: */
 is empty: true
 is blank: true
   is empty: false
   is blank: true
 this is  is blank: false
 this is  is empty: false
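Because isBlank relies on Character.isWhitespace, it can disagree with the older idiom trim().isEmpty(): trim() only removes characters at or below U+0020, so it keeps the full-width ideographic space U+3000 that often appears in Chinese text, while Character.isWhitespace deliberately excludes the no-break space U+00A0. A sketch comparing the two (Java 11+); trimBlank is a hypothetical helper name.

```java
public class BlankCheck {

    // Hypothetical helper: the pre-Java-11 idiom for "blank"
    static boolean trimBlank(String s) {
        return s.trim().isEmpty();
    }

    public static void main(String[] args) {
        String[] samples = {"", " \t\n", "\u3000", "\u00A0", " a "};
        String[] labels = {"empty", "space/tab/newline", "U+3000", "U+00A0", "\" a \""};
        for (int i = 0; i < samples.length; i++) {
            String s = samples[i];
            System.out.println(labels[i] + ": isEmpty=" + s.isEmpty()
                    + ", isBlank=" + s.isBlank() + ", trimBlank=" + trimBlank(s));
        }
    }
}
```

Here the ideographic space is blank for isBlank but not for trimBlank, and the no-break space is blank for neither.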
  1. boolean startsWith(String prefix)
  2. boolean endsWith(String suffix)
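    startsWith returns true if the string begins with prefix, and endsWith returns true if it ends with suffix; otherwise they return false.
/* Verification code: */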
package test.string;
public class StringTest {

	public static void main(String[] args) {

		String normal = "🃏 🃏is 大王";
		String prefix="🃏";
		String suffix="大王";
		System.out.println(normal+"  start with "+prefix+" ? "+normal.startsWith(prefix));
		System.out.println(normal+"  end with "+suffix+" ? "+normal.endsWith(suffix));
		System.out.println(normal+"  end with "+prefix+" ? "+normal.endsWith(prefix));
		System.out.println(normal+"  start with "+suffix+" ? "+normal.startsWith(suffix));		

	}
}
/* Verification output: */
🃏 🃏is 大王  start with 🃏 ? true
🃏 🃏is 大王  end with 大王 ? true
🃏 🃏is 大王  end with 🃏 ? false
🃏 🃏is 大王  start with 大王 ? false
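startsWith also has the overload startsWith(String prefix, int toffset), which tests for the prefix at char index toffset; like charAt, that offset counts UTF-16 code units. An empty prefix or suffix always matches. The sketch below also converts a code-point position to the char offset first; startsWithAtCodePoint is a hypothetical helper name.

```java
public class PrefixSuffix {

    // Hypothetical helper: does prefix occur starting at the given code-point position?
    static boolean startsWithAtCodePoint(String s, String prefix, int cpIndex) {
        // Convert the code-point position to the char index that startsWith(prefix, toffset) expects
        int charIndex = s.offsetByCodePoints(0, cpIndex);
        return s.startsWith(prefix, charIndex);
    }

    public static void main(String[] args) {
        String normal = "🃏 🃏is 大王";
        System.out.println(normal.startsWith("🃏", 3));             // true: char index 3
        System.out.println(normal.startsWith("🃏", 2));             // false: index 2 is the space
        System.out.println(startsWithAtCodePoint(normal, "🃏", 2)); // true: third code point
        System.out.println(normal.startsWith("") + " " + normal.endsWith("")); // true true
    }
}
```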
  1. int indexOf(String str)
  2. int indexOf(String str, int fromIndex)
  3. int indexOf(int cp)
  4. int indexOf(int cp, int fromIndex)
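    Return the index of the first occurrence of the substring str or the code point cp. The overloads with fromIndex start the search at fromIndex (inclusive). Return -1 if there is no match. The index is counted in UTF-16 code units, the same as charAt.
/* Verification code: */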
package test.string;
import java.util.Arrays;
public class StringTest {
	public static void main(String[] args) {
		String normal = "🃏 🃏is 大王";
		System.out.println("the sentence is: " + normal);
		int[] codePoints = normal.codePoints().toArray();
		System.out.println("the Unicode of the string is: " + Arrays.toString(codePoints));
		System.out.println("the first index of 🃏 is: "+normal.indexOf("🃏"));
		// fromIndex is inclusive: the search starts at index 1
		System.out.println("the first index of 🃏 at or after index 1 is: "+normal.indexOf("🃏", 1));
		// 32 is the code point of the space character
		System.out.println("the first index of 32 is: "+normal.indexOf(32));
		System.out.println("the first index of 32 at or after index 3 is: "+normal.indexOf(32,3));
		// 50 is the character '2', which does not occur
		System.out.println("the first index of 50 is: "+normal.indexOf(50));
	}
}

/* Verification output: */
the sentence is: 🃏 🃏is 大王
the Unicode of the string is: [127183, 32, 127183, 105, 115, 32, 22823, 29579]
the first index of 🃏 is: 0
the first index of 🃏 at or after index 1 is: 3
the first index of 32 is: 2
the first index of 32 at or after index 3 is: 7
the first index of 50 is: -1
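Note that indexOf returns a char index: the space is found at index 2 even though it is element 1 of the code-point array. If a code-point position is wanted, codePointCount(0, index) converts it. A sketch that collects every match with the indexOf(String, int) overload; allCodePointPositions is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

public class FindAll {

    // Hypothetical helper: code-point positions of every occurrence of target in s
    static List<Integer> allCodePointPositions(String s, String target) {
        List<Integer> positions = new ArrayList<>();
        if (target.isEmpty()) {
            return positions; // an empty target would match everywhere
        }
        int from = 0;
        int idx;
        while ((idx = s.indexOf(target, from)) != -1) {
            // Convert the char index into a code-point index
            positions.add(s.codePointCount(0, idx));
            from = idx + target.length();
        }
        return positions;
    }

    public static void main(String[] args) {
        String normal = "🃏 🃏is 大王";
        System.out.println(allCodePointPositions(normal, "🃏")); // [0, 2]
        System.out.println(allCodePointPositions(normal, " "));  // [1, 5]
        // indexOf(int) also accepts a supplementary code point
        System.out.println(normal.indexOf(0x1F0CF));            // 0
    }
}
```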
  1. int lastIndexOf(String str)
  2. int lastIndexOf(String str, int fromIndex)
  3. int lastIndexOf(int cp)
  4. int lastIndexOf(int cp, int fromIndex)
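    These work like indexOf, except that lastIndexOf searches backward from the end of the string. With fromIndex, the result is the largest index at or before fromIndex where a match starts.
/* Verification code: */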
package test.string;
import java.util.Arrays;
public class StringTest {
	public static void main(String[] args) {
		String normal = "🃏 🃏is 大王";
		System.out.println("the sentence is: " + normal);
		int[] codePoints = normal.codePoints().toArray();
		System.out.println("the Unicode of the string is: " + Arrays.toString(codePoints));
		System.out.println("the last index of 🃏 is: "+normal.lastIndexOf("🃏"));
		// Only matches starting at index 2 or earlier are considered
		System.out.println("the last index of 🃏 at or before index 2 is: "+normal.lastIndexOf("🃏", 2));
		System.out.println("the last index of 32 is: "+normal.lastIndexOf(32));
		System.out.println("the last index of 32 at or before index 3 is: "+normal.lastIndexOf(32,3));
		System.out.println("the last index of 50 is: "+normal.lastIndexOf(50));
	}
}

/* Verification output: */
the sentence is: 🃏 🃏is 大王
the Unicode of the string is: [127183, 32, 127183, 105, 115, 32, 22823, 29579]
the last index of 🃏 is: 3
the last index of 🃏 at or before index 2 is: 0
the last index of 32 is: 7
the last index of 32 at or before index 3 is: 2
the last index of 50 is: -1
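With lastIndexOf(str, fromIndex), a match counts if it *starts* at or before fromIndex, even when it extends past it; that is why lastIndexOf("🃏", 2) returns 0 while lastIndexOf("🃏", 3) returns 3. This makes it easy to list matches from right to left by searching again from one before the previous match. A sketch; allFromEnd is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

public class FindAllBackward {

    // Hypothetical helper: char indices of all occurrences of target, from right to left
    static List<Integer> allFromEnd(String s, String target) {
        List<Integer> result = new ArrayList<>();
        if (target.isEmpty()) {
            return result; // an empty target would match at every index
        }
        // Each search restarts just before the previous match; fromIndex -1 yields -1
        for (int idx = s.lastIndexOf(target); idx != -1; idx = s.lastIndexOf(target, idx - 1)) {
            result.add(idx);
        }
        return result;
    }

    public static void main(String[] args) {
        String normal = "🃏 🃏is 大王";
        System.out.println(normal.lastIndexOf("🃏", 3)); // 3: a match may start exactly at fromIndex
        System.out.println(allFromEnd(normal, "🃏"));    // [3, 0]
        System.out.println(allFromEnd(normal, " "));     // [7, 2]
    }
}
```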
  1. int length()
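    Returns the number of UTF-16 code units in the string. An ordinary character takes one code unit and a supplementary character takes two, so for strings containing supplementary characters the result differs from the number of characters displayed.
  2. int codePointCount(int startIndex, int endIndex)
    Returns the number of code points from startIndex (inclusive) to endIndex (exclusive). An unpaired surrogate inside the range counts as one code point, so when startIndex or endIndex falls in the middle of a surrogate pair, that half still counts as one character.
/* Verification code: */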
package test.string;

import java.util.Arrays;

public class StringTest {

	public static void main(String[] args) {
		String normal = "🃏 🃏is 大王";
		System.out.println("the sentence is: " + normal);
		int[] codePoints = normal.codePoints().toArray();
		System.out.println("the Unicode of the string is: " + Arrays.toString(codePoints));
		// Index 1 is the low surrogate of the first 🃏; on its own it counts as one code point
		System.out.println("normal.codePointCount(1, 10): "+normal.codePointCount(1, 10));
		System.out.println("normal.codePointCount(0, 10): "+normal.codePointCount(0, 10));
	}
}

/* Verification output: */
the sentence is: 🃏 🃏is 大王
the Unicode of the string is: [127183, 32, 127183, 105, 115, 32, 22823, 29579]
normal.codePointCount(1, 10): 8
normal.codePointCount(0, 10): 8
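A practical consequence of length() counting code units is that substring(0, n) can cut a surrogate pair in half. One option, sketched here under the assumption that "character" means code point (combining sequences and multi-code-point emoji are not handled), is to convert the limit with offsetByCodePoints before calling substring; truncate is a hypothetical helper name.

```java
public class SafeTruncate {

    // Hypothetical helper: keep at most maxCodePoints code points of s
    static String truncate(String s, int maxCodePoints) {
        if (maxCodePoints < 0) {
            throw new IllegalArgumentException("maxCodePoints must be >= 0");
        }
        if (s.codePointCount(0, s.length()) <= maxCodePoints) {
            return s; // already short enough
        }
        // Convert the code-point limit into a char index that never splits a pair
        return s.substring(0, s.offsetByCodePoints(0, maxCodePoints));
    }

    public static void main(String[] args) {
        String normal = "🃏 🃏is 大王";
        System.out.println(normal.length());                           // 10
        System.out.println(normal.codePointCount(0, normal.length())); // 8
        String naive = normal.substring(0, 4);
        // The naive cut ends with the high surrogate of the second 🃏
        System.out.println(Character.isHighSurrogate(naive.charAt(3))); // true
        System.out.println(truncate(normal, 4));                        // 🃏 🃏i
    }
}
```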