char charAt(int index)
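Returns the code unit (a char) at the given index.
Note that this is a code unit, not a character: a Java string is a sequence of char values, and the char type is a UTF-16 code unit used to encode Unicode code points.
Code points in the range U+0000 to U+FFFF (excluding the surrogate range U+D800 to U+DFFF), i.e. ordinary characters, are represented by a single code unit.
Code points in the range U+10000 to U+10FFFF, i.e. supplementary characters, are represented by two code units.
For supplementary characters, charAt returns only one half of a surrogate pair, so the result has to be checked.
For example: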
package test;

public class StringTest {
    public static void main(String[] args) {
        // 🃏 (U+1F0CF) is a supplementary character and occupies two code units
        String normal = "🃏 is大王";
        System.out.println("the sentence is: " + normal);
        System.out.println("the length of the sentence: " + normal.length());
        // Index 1 is the second half of the surrogate pair, not the space
        System.out.println("the second code point unit is: " + normal.charAt(1));
    }
}
Output:
the sentence is: 🃏 is大王
the length of the sentence: 7
the second code point unit is: ?
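Here the code unit at index 1 is the second code unit (the low surrogate) of the special character 🃏 (the joker, 大王), not the space we might have expected! A lone surrogate is not a printable character, which is why it shows up as ?.
Solution:
Extract the Unicode code points of the string and work with those. A new string built from the code points reproduces the original text correctly.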
package test.string;

import java.util.Arrays;

public class StringTest {
    public static void main(String[] args) {
        String normal = "🃏 is 大王";
        System.out.println("the sentence is: " + normal);
        System.out.println("the length of the sentence: " + normal.length());
        System.out.println("the second code point unit is: " + normal.charAt(1));
        // Collect the Unicode code points; a surrogate pair yields a single value
        int[] codePoints = normal.codePoints().toArray();
        System.out.println("the Unicode of the string is: " + Arrays.toString(codePoints));
        // Rebuild the string from its code points
        System.out.println("the corresponding string is: " + new String(codePoints, 0, codePoints.length));
    }
}
Output:
the sentence is: 🃏 is 大王
the length of the sentence: 8
the second code point unit is: ?
the Unicode of the string is: [127183, 32, 105, 115, 32, 22823, 29579]
the corresponding string is: 🃏 is 大王
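One way to do the check that charAt needs is to test for surrogates explicitly. The following is only a minimal sketch: the class name CharAtDemo and the helpers codePointStartingAt and nthCharacter are made up for illustration, while the Character and String calls are standard JDK methods. The joker is written as the escape \uD83C\uDCCF so that the file compiles regardless of the source encoding.

```java
class CharAtDemo {
    // Returns the code point that starts at index, or -1 when index points at
    // the second half of a surrogate pair (hypothetical helper).
    static int codePointStartingAt(String s, int index) {
        char c = s.charAt(index);
        if (Character.isLowSurrogate(c) && index > 0
                && Character.isHighSurrogate(s.charAt(index - 1))) {
            return -1; // middle of a pair: not the start of a character
        }
        return s.codePointAt(index);
    }

    // Returns the n-th character (0-based, counted in code points) as a String
    // (hypothetical helper).
    static String nthCharacter(String s, int n) {
        int start = s.offsetByCodePoints(0, n);
        return new String(Character.toChars(s.codePointAt(start)));
    }

    public static void main(String[] args) {
        String normal = "\uD83C\uDCCF is \u5927\u738B"; // "🃏 is 大王"
        System.out.println(codePointStartingAt(normal, 0));      // 127183
        System.out.println(codePointStartingAt(normal, 1));      // -1
        System.out.println("[" + nthCharacter(normal, 1) + "]"); // [ ]
    }
}
```

With this, the "second character" is the space, which is what a reader of the sentence would expect.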
int codePointAt(int index)
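Returns the Unicode code point that starts at the given index. If the code unit at index is a high surrogate and the next code unit is a low surrogate, the result is the supplementary code point that the pair encodes; otherwise the result is simply the value of the code unit at index.
As with charAt, the index counts code units, not characters.
The code: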
package test.string;

import java.util.Arrays;

// UnicodeTest is a helper class from the author's own test package (not shown here)
import test.UnicodeTest;

public class StringTest {
    public static void main(String[] args) {
        String normal = "🃏 is 大王";
        System.out.println("the sentence is: " + normal);
        System.out.println("the length of the sentence: " + normal.length());
        System.out.println("the second code point unit is: " + normal.charAt(1));
        int[] codePoints = normal.codePoints().toArray();
        System.out.println("the Unicode of the string is: " + Arrays.toString(codePoints));
        System.out.println("the corresponding string is: " + new String(codePoints, 0, codePoints.length));
        // Print the UTF-16 encoding of the first code point as binary digits
        String firstChar = UnicodeTest.changeToUTF16(codePoints[0]);
        System.out.println(UnicodeTest.formatString(firstChar));
        // Index 1 is a lone low surrogate, so codePointAt returns its raw value
        System.out.println("the second code point is: " + normal.codePointAt(1));
    }
}
Output:
the sentence is: 🃏 is 大王
the length of the sentence: 8
the second code point unit is: ?
the Unicode of the string is: [127183, 32, 105, 115, 32, 22823, 29579]
the corresponding string is: 🃏 is 大王
11011000 01111100 11011100 11001111
the second code point is: 56527
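The result shows that the value at index 1 is the second code unit (the low surrogate, 0xDCCF = 56527) of the UTF-16 surrogate pair that encodes 🃏. Note that the standard UTF-16 encoding of U+1F0CF is D83C DCCF, i.e. 11011000 00111100 11011100 11001111, so the second byte printed by the helper above (01111100) does not match it.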
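The surrogate pair can also be inspected with the JDK alone. The sketch below assumes nothing beyond java.lang; SurrogateDemo and toBinary16 are illustrative names, not part of any library.

```java
class SurrogateDemo {
    // Formats a 16-bit code unit as two groups of eight binary digits
    // (hypothetical helper).
    static String toBinary16(char unit) {
        // OR-ing with 0x10000 forces 17 digits; dropping the first keeps leading zeros
        String bits = Integer.toBinaryString(0x10000 | unit).substring(1);
        return bits.substring(0, 8) + " " + bits.substring(8);
    }

    public static void main(String[] args) {
        int joker = 0x1F0CF; // 127183, the code point of 🃏
        char high = Character.highSurrogate(joker);
        char low = Character.lowSurrogate(joker);
        // prints 11011000 00111100 11011100 11001111
        System.out.println(toBinary16(high) + " " + toBinary16(low));

        String s = new String(Character.toChars(joker)) + " is";
        System.out.println(s.codePointAt(0)); // 127183: the whole pair
        System.out.println(s.codePointAt(1)); // 56527: the lone low surrogate
    }
}
```

So codePointAt only returns the full supplementary code point when the index points at the first code unit of the pair.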
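int offsetByCodePoints(int startIndex, int cpCount)
Returns the index that lies cpCount code points away from startIndex. cpCount is a number of code points, while startIndex and the result are code-unit indices, so when startIndex is the first code unit of a surrogate pair the result differs from a literal reading: moving one code point forward advances the index by two.
The results below include cases where the returned index equals length(). This is not a bug: the returned value is an offset that may point just past the last code unit, much like the exclusive end index accepted by substring.
Verification code: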
/* Verification code 1: */
package test.string;

public class StringTest {
    public static void main(String[] args) {
        String normal = "this";
        // Try every start index j with every code point count i
        for (int j = 0; j < normal.length(); j++) {
            System.out.println("----------------" + "startIndex = " + j + "------------------");
            for (int i = 0; i <= normal.length(); i++) {
                try {
                    System.out.println("normal.offsetByCodePoints(" + j + ", " + i + ") : " + normal.offsetByCodePoints(j, i));
                } catch (IndexOutOfBoundsException e) {
                    // Fewer than i code points remain after index j
                    System.out.println("normal.offsetByCodePoints(" + j + ", " + i + ") : out of range");
                }
            }
        }
    }
}
/* Verification output 1: */
----------------startIndex = 0------------------
normal.offsetByCodePoints(0, 0) : 0
normal.offsetByCodePoints(0, 1) : 1
normal.offsetByCodePoints(0, 2) : 2
normal.offsetByCodePoints(0, 3) : 3
normal.offsetByCodePoints(0, 4) : 4
----------------startIndex = 1------------------
normal.offsetByCodePoints(1, 0) : 1
normal.offsetByCodePoints(1, 1) : 2
normal.offsetByCodePoints(1, 2) : 3
normal.offsetByCodePoints(1, 3) : 4
normal.offsetByCodePoints(1, 4) : out of range
----------------startIndex = 2------------------
normal.offsetByCodePoints(2, 0) : 2
normal.offsetByCodePoints(2, 1) : 3
normal.offsetByCodePoints(2, 2) : 4
normal.offsetByCodePoints(2, 3) : out of range
normal.offsetByCodePoints(2, 4) : out of range
----------------startIndex = 3------------------
normal.offsetByCodePoints(3, 0) : 3
normal.offsetByCodePoints(3, 1) : 4
normal.offsetByCodePoints(3, 2) : out of range
normal.offsetByCodePoints(3, 3) : out of range
normal.offsetByCodePoints(3, 4) : out of range
/* Verification code 2: */
package test.string;

public class StringTest {
    public static void main(String[] args) {
        // 🃏 occupies indices 0 and 1, so the string has 5 code units but 4 code points
        String normal = "🃏 is";
        for (int j = 0; j < normal.length(); j++) {
            System.out.println("----------------" + "startIndex = " + j + "------------------");
            for (int i = 0; i <= normal.length(); i++) {
                try {
                    System.out.println("normal.offsetByCodePoints(" + j + ", " + i + ") : " + normal.offsetByCodePoints(j, i));
                } catch (IndexOutOfBoundsException e) {
                    System.out.println("normal.offsetByCodePoints(" + j + ", " + i + ") : out of range");
                }
            }
        }
    }
}
/* Verification output 2: */
----------------startIndex = 0------------------
normal.offsetByCodePoints(0, 0) : 0
normal.offsetByCodePoints(0, 1) : 2
normal.offsetByCodePoints(0, 2) : 3
normal.offsetByCodePoints(0, 3) : 4
normal.offsetByCodePoints(0, 4) : 5
normal.offsetByCodePoints(0, 5) : out of range
----------------startIndex = 1------------------
normal.offsetByCodePoints(1, 0) : 1
normal.offsetByCodePoints(1, 1) : 2
normal.offsetByCodePoints(1, 2) : 3
normal.offsetByCodePoints(1, 3) : 4
normal.offsetByCodePoints(1, 4) : 5
normal.offsetByCodePoints(1, 5) : out of range
----------------startIndex = 2------------------
normal.offsetByCodePoints(2, 0) : 2
normal.offsetByCodePoints(2, 1) : 3
normal.offsetByCodePoints(2, 2) : 4
normal.offsetByCodePoints(2, 3) : 5
normal.offsetByCodePoints(2, 4) : out of range
normal.offsetByCodePoints(2, 5) : out of range
----------------startIndex = 3------------------
normal.offsetByCodePoints(3, 0) : 3
normal.offsetByCodePoints(3, 1) : 4
normal.offsetByCodePoints(3, 2) : 5
normal.offsetByCodePoints(3, 3) : out of range
normal.offsetByCodePoints(3, 4) : out of range
normal.offsetByCodePoints(3, 5) : out of range
----------------startIndex = 4------------------
normal.offsetByCodePoints(4, 0) : 4
normal.offsetByCodePoints(4, 1) : 5
normal.offsetByCodePoints(4, 2) : out of range
normal.offsetByCodePoints(4, 3) : out of range
normal.offsetByCodePoints(4, 4) : out of range
normal.offsetByCodePoints(4, 5) : out of range
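The fact that offsetByCodePoints can return length() is what makes it convenient as an end index. The sketch below illustrates this under that reading; OffsetDemo and firstCodePoints are hypothetical names chosen for the example.

```java
class OffsetDemo {
    // Returns the first n code points of s (hypothetical helper). The offset may
    // equal s.length(), which is a valid exclusive end index for substring.
    static String firstCodePoints(String s, int n) {
        return s.substring(0, s.offsetByCodePoints(0, n));
    }

    public static void main(String[] args) {
        String normal = "\uD83C\uDCCF is"; // "🃏 is"
        System.out.println(firstCodePoints(normal, 1).length());       // 2
        System.out.println(firstCodePoints(normal, 4).equals(normal)); // true

        // Walk the string one code point at a time; visits indices 0, 2, 3, 4
        for (int i = 0; i < normal.length(); i = normal.offsetByCodePoints(i, 1)) {
            System.out.println(i + " -> " + normal.codePointAt(i));
        }
    }
}
```

The loop stops exactly when the offset reaches length(), which is the index == length case seen in the verification output.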
IntStream codePoints()
Returns an IntStream of the Unicode code points that make up the string. A surrogate pair is delivered as a single code point.
/* Verification code: */
package test.string;

import java.util.Arrays;

public class StringTest {
    public static void main(String[] args) {
        String normal = "🃏 🃏is 大王";
        System.out.println("the sentence is: " + normal);
        // Each 🃏 appears once in the array, as 127183
        int[] codePoints = normal.codePoints().toArray();
        System.out.println("the Unicode of the string is: " + Arrays.toString(codePoints));
    }
}
/* Verification output: */
the sentence is: 🃏 🃏is 大王
the Unicode of the string is: [127183, 32, 127183, 105, 115, 32, 22823, 29579]
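Because codePoints() returns a stream, the usual stream operations apply per character rather than per code unit. A small sketch of that idea follows; CodePointsDemo and its two helpers are illustrative names only.

```java
class CodePointsDemo {
    // Counts the supplementary characters (code points above U+FFFF) in s.
    static long countSupplementary(String s) {
        return s.codePoints().filter(Character::isSupplementaryCodePoint).count();
    }

    // Rebuilds s without its supplementary characters.
    static String removeSupplementary(String s) {
        return s.codePoints()
                .filter(cp -> !Character.isSupplementaryCodePoint(cp))
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        String normal = "\uD83C\uDCCF \uD83C\uDCCFis \u5927\u738B"; // "🃏 🃏is 大王"
        System.out.println(countSupplementary(normal));              // 2
        System.out.println("[" + removeSupplementary(normal) + "]"); // [ is 大王]
    }
}
```

Collecting with appendCodePoint is an alternative to the new String(int[], int, int) constructor used earlier.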
boolean isEmpty()
Returns true only when the length of the string is 0; otherwise returns false.
boolean isBlank()
Returns true when the string is empty or contains only white space; otherwise returns false. Available since Java 11.
/* Verification code: */
package test.string;

public class StringTest {
    public static void main(String[] args) {
        String empty = "";
        System.out.println(empty + " is empty: " + empty.isEmpty());
        System.out.println(empty + " is blank: " + empty.isBlank());
        // A single space: not empty, but blank
        String blank = " ";
        System.out.println(blank + " is empty: " + blank.isEmpty());
        System.out.println(blank + " is blank: " + blank.isBlank());
        // Surrounding spaces do not make a string blank
        String nor = " this is ";
        System.out.println(nor + " is blank: " + nor.isBlank());
        System.out.println(nor + " is empty: " + nor.isEmpty());
    }
}
/* Verification output: */
is empty: true
is blank: true
is empty: false
is blank: true
this is is blank: false
this is is empty: false
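What counts as white space for isBlank is decided by Character.isWhitespace, which is slightly narrower than one might guess. The sketch below probes a few cases on Java 11 or later; BlankDemo and isNullOrBlank are hypothetical names.

```java
class BlankDemo {
    // Null-safe variant (hypothetical helper): true for null, empty,
    // or white-space-only input.
    static boolean isNullOrBlank(String s) {
        return s == null || s.isBlank();
    }

    public static void main(String[] args) {
        System.out.println("\t\n".isBlank());    // true: tab and newline are white space
        System.out.println("\u3000".isBlank());  // true: ideographic space
        System.out.println("\u00A0".isBlank());  // false: the no-break space is excluded
        System.out.println(isNullOrBlank(null)); // true
    }
}
```

Calling isBlank or isEmpty on a null reference throws NullPointerException, hence the explicit null check in the helper.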
boolean startsWith(String prefix)
boolean endsWith(String suffix)
startsWith returns true when the string begins with prefix, and endsWith returns true when the string ends with suffix; otherwise they return false.
/* Verification code: */
package test.string;

public class StringTest {
    public static void main(String[] args) {
        String normal = "🃏 🃏is 大王";
        String prefix = "🃏";
        String suffix = "大王";
        System.out.println(normal + " start with " + prefix + " ? " + normal.startsWith(prefix));
        System.out.println(normal + " end with " + suffix + " ? " + normal.endsWith(suffix));
        // Swap the roles: neither of these matches
        System.out.println(normal + " end with " + prefix + " ? " + normal.endsWith(prefix));
        System.out.println(normal + " start with " + suffix + " ? " + normal.startsWith(suffix));
    }
}
/* Verification output: */
🃏 🃏is 大王 start with 🃏 ? true
🃏 🃏is 大王 end with 大王 ? true
🃏 🃏is 大王 end with 🃏 ? false
🃏 🃏is 大王 start with 大王 ? false
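Both methods compare code units, so a prefix that ends in half of a surrogate pair still matches. The sketch below shows this, together with the startsWith(String, int) overload; PrefixDemo and startsWithWholeCharacter are names invented for the example.

```java
class PrefixDemo {
    // Like startsWith, but rejects a match that ends between the two halves
    // of a surrogate pair (hypothetical helper).
    static boolean startsWithWholeCharacter(String s, String prefix) {
        if (!s.startsWith(prefix)) {
            return false;
        }
        int end = prefix.length();
        return !(end > 0 && end < s.length()
                && Character.isHighSurrogate(s.charAt(end - 1))
                && Character.isLowSurrogate(s.charAt(end)));
    }

    public static void main(String[] args) {
        String joker = "\uD83C\uDCCF"; // "🃏"
        String normal = joker + " " + joker + "is \u5927\u738B";
        System.out.println(normal.startsWith(joker, 3));                  // true: second joker
        System.out.println(normal.startsWith(joker, 1));                  // false
        System.out.println(normal.startsWith("\uD83C"));                  // true: lone high surrogate
        System.out.println(startsWithWholeCharacter(normal, "\uD83C"));   // false
        System.out.println(startsWithWholeCharacter(normal, joker));      // true
    }
}
```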
int indexOf(String str)
int indexOf(String str, int fromIndex)
int indexOf(int cp)
int indexOf(int cp, int fromIndex)
Returns the index of the first occurrence of the substring str or the code point cp, searching either from the beginning of the string or from fromIndex (inclusive). Returns -1 if there is no such occurrence.
/* Verification code: */
package test.string;

import java.util.Arrays;

public class StringTest {
    public static void main(String[] args) {
        String normal = "🃏 🃏is 大王";
        System.out.println("the sentence is: " + normal);
        int[] codePoints = normal.codePoints().toArray();
        System.out.println("the Unicode of the string is: " + Arrays.toString(codePoints));
        System.out.println("the first index of 🃏 is: " + normal.indexOf("🃏"));
        System.out.println("the first index of 🃏 at or after index 1 is: " + normal.indexOf("🃏", 1));
        // 32 is the code point of the space character
        System.out.println("the first index of 32 is: " + normal.indexOf(32));
        System.out.println("the first index of 32 at or after index 3 is: " + normal.indexOf(32, 3));
        // 50 is the code point of '2', which does not occur in the string
        System.out.println("the first index of 50 is:" + normal.indexOf(50));
    }
}
/* Verification output: */
the sentence is: 🃏 🃏is 大王
the Unicode of the string is: [127183, 32, 127183, 105, 115, 32, 22823, 29579]
the first index of 🃏 is: 0
the first index of 🃏 at or after index 1 is: 3
the first index of 32 is: 2
the first index of 32 at or after index 3 is: 7
the first index of 50 is:-1
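The indices returned by indexOf are code-unit indices, and the int overload also accepts supplementary code points. A minimal sketch of both points follows; IndexOfDemo and allIndexesOf are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfDemo {
    // Collects the code-unit index of every occurrence of code point cp in s
    // (hypothetical helper).
    static List<Integer> allIndexesOf(String s, int cp) {
        List<Integer> result = new ArrayList<>();
        int step = Character.charCount(cp); // 2 for supplementary code points, else 1
        for (int i = s.indexOf(cp); i >= 0; i = s.indexOf(cp, i + step)) {
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String normal = "\uD83C\uDCCF \uD83C\uDCCFis \u5927\u738B"; // "🃏 🃏is 大王"
        System.out.println(allIndexesOf(normal, 0x1F0CF)); // [0, 3]
        System.out.println(allIndexesOf(normal, ' '));     // [2, 7]
        // Convert a code-unit index into a position counted in code points
        int unitIndex = normal.indexOf("is");
        System.out.println(unitIndex);                            // 5
        System.out.println(normal.codePointCount(0, unitIndex));  // 3
    }
}
```

The last two lines show why the index of "is" is 5 even though only three characters precede it.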
int lastIndexOf(String str)
int lastIndexOf(String str, int fromIndex)
int lastIndexOf(int cp)
int lastIndexOf(int cp, int fromIndex)
Works like indexOf, except that lastIndexOf searches backward, from the end of the string or from fromIndex (inclusive) toward the beginning, and returns the index of the last occurrence found.
/* Verification code: */
package test.string;

import java.util.Arrays;

public class StringTest {
    public static void main(String[] args) {
        String normal = "🃏 🃏is 大王";
        System.out.println("the sentence is: " + normal);
        int[] codePoints = normal.codePoints().toArray();
        System.out.println("the Unicode of the string is: " + Arrays.toString(codePoints));
        System.out.println("the last index of 🃏 is: " + normal.lastIndexOf("🃏"));
        // The second 🃏 starts at index 3, so searching backward from 2 finds the first one
        System.out.println("the last index of 🃏 at or before index 2 is: " + normal.lastIndexOf("🃏", 2));
        System.out.println("the last index of 32 is: " + normal.lastIndexOf(32));
        System.out.println("the last index of 32 at or before index 3 is: " + normal.lastIndexOf(32, 3));
        System.out.println("the last index of 50 is:" + normal.lastIndexOf(50));
    }
}
/* Verification output: */
the sentence is: 🃏 🃏is 大王
the Unicode of the string is: [127183, 32, 127183, 105, 115, 32, 22823, 29579]
the last index of 🃏 is: 3
the last index of 🃏 at or before index 2 is: 0
the last index of 32 is: 7
the last index of 32 at or before index 3 is: 2
the last index of 50 is:-1
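A typical use of lastIndexOf is to cut off everything up to the last occurrence of a character. The sketch below does this in a way that also works for supplementary code points; LastIndexDemo and afterLast are illustrative names.

```java
class LastIndexDemo {
    // Returns the part of s after the last occurrence of code point cp,
    // or s itself when cp does not occur (hypothetical helper).
    static String afterLast(String s, int cp) {
        int i = s.lastIndexOf(cp);
        return i < 0 ? s : s.substring(i + Character.charCount(cp));
    }

    public static void main(String[] args) {
        String joker = "\uD83C\uDCCF"; // "🃏"
        String normal = joker + " " + joker + "is \u5927\u738B";
        System.out.println(afterLast(normal, 0x1F0CF)); // is 大王
        System.out.println(afterLast(normal, ' '));     // 大王
        // fromIndex is inclusive: a match may start exactly at fromIndex
        System.out.println(normal.lastIndexOf(joker, 3)); // 3
        System.out.println(normal.lastIndexOf(joker, 2)); // 0
    }
}
```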
int length()
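Returns the number of UTF-16 code units in the string. An ordinary character takes one code unit and a supplementary character takes two, so for strings containing supplementary characters the result differs from the number of characters displayed.
int codePointCount(int startIndex, int endIndex)
Returns the number of code points between startIndex (inclusive) and endIndex (exclusive). When startIndex or endIndex falls in the middle of a surrogate pair, the lone surrogate left inside the range is counted as one code point.
/* Verification code: */
package test.string;

import java.util.Arrays;

public class StringTest {
    public static void main(String[] args) {
        String normal = "🃏 🃏is 大王";
        System.out.println("the sentence is: " + normal);
        int[] codePoints = normal.codePoints().toArray();
        System.out.println("the Unicode of the string is: " + Arrays.toString(codePoints));
        // Starting at index 1 leaves a lone low surrogate, which still counts as one
        System.out.println("normal.codePointCount(1, 10): " + normal.codePointCount(1, 10));
        System.out.println("normal.codePointCount(0, 10): " + normal.codePointCount(0, 10));
    }
}
/* Verification output: */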
the sentence is: 🃏 🃏is 大王
the Unicode of the string is: [127183, 32, 127183, 105, 115, 32, 22823, 29579]
normal.codePointCount(1, 10): 8
normal.codePointCount(0, 10): 8
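To count what a reader would see as characters, codePointCount can be applied to the whole string. This is a small sketch of that idea; LengthDemo and characterCount are hypothetical names, and code points are only an approximation of displayed characters when combining marks are involved.

```java
class LengthDemo {
    // Number of code points in s, as opposed to s.length(), which counts
    // UTF-16 code units (hypothetical helper).
    static int characterCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String normal = "\uD83C\uDCCF \uD83C\uDCCFis \u5927\u738B"; // "🃏 🃏is 大王"
        System.out.println(normal.length());             // 10 code units
        System.out.println(characterCount(normal));      // 8 code points
        System.out.println(normal.codePoints().count()); // 8, the same count via the stream
    }
}
```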