I. Characters and Strings
A String wraps a character array internally, so String and char[] can be converted into each other.
1. Character array to string [new String()]
public class StringTest {
    public static void main(String[] args) {
        CharToString();
    }

    /*
     * I. Characters and strings
     * 1. Character array to string [new String()]
     */
    private static void CharToString() {
        char[] value = {'a', 'b', 'c', 'd'};
        String str = new String(value);
        System.out.println(str); // abcd
        str = new String(value, 1, 2); // start at offset 1 and take 2 characters to build the String
        System.out.println(str); // bc
    }
}
Source of new String(char[])
/**
 * Allocates a new {@code String} so that it represents the sequence of
 * characters currently contained in the character array argument. The
 * contents of the character array are copied; subsequent modification of
 * the character array does not affect the newly created string.
 *
 * @param  value
 *         The initial value of the string
 */
public String(char value[]) { // Arrays.copyOf(array, length) copies the contents into a new array
    this.value = Arrays.copyOf(value, value.length);
}
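The copy made by Arrays.copyOf is what isolates the new String from later changes to the source array. Below is a minimal sketch of that behavior; the class and method names are illustrative and not part of the JDK.

```java
public class DefensiveCopyDemo {
    // Builds a String from an array, then mutates the array afterwards.
    static String buildThenMutate() {
        char[] value = {'a', 'b', 'c', 'd'};
        String str = new String(value);
        value[0] = 'X'; // changes only the caller's array, not the String's copy
        return str;
    }

    // toCharArray() also returns a copy, so edits to it leave the String alone.
    static String mutateCharArrayCopy() {
        String str = "abcd";
        char[] copy = str.toCharArray();
        copy[0] = 'X';
        return str;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate());     // abcd
        System.out.println(mutateCharArrayCopy()); // abcd
    }
}
```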
new String(char[], start index, number of characters to copy)
public String(char value[], int offset, int count) { // start index, number of characters to copy
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) { // a negative count is invalid: throw an index-out-of-bounds exception
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= value.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    // (-1>>>1 is Integer.MAX_VALUE, so the check below subtracts instead of adding to avoid overflow.)
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset + count); // copy the requested range
}
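The bounds checks above can be exercised from the outside. The sketch below uses a hypothetical helper, tryBuild, that returns the text "rejected" when the constructor throws. It catches IndexOutOfBoundsException, the superclass of StringIndexOutOfBoundsException, so it does not depend on which subclass a particular JDK version throws.

```java
public class OffsetCountDemo {
    // Hypothetical helper: returns the built String, or "rejected" if the range is invalid.
    static String tryBuild(char[] value, int offset, int count) {
        try {
            return new String(value, offset, count);
        } catch (IndexOutOfBoundsException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        char[] value = {'a', 'b', 'c', 'd'};
        System.out.println(tryBuild(value, 1, 2));                 // bc
        System.out.println(tryBuild(value, 4, 0).isEmpty());       // true: an empty range at the end is allowed
        System.out.println(tryBuild(value, 3, 2));                 // rejected: runs past the end
        System.out.println(tryBuild(value, -1, 2));                // rejected: negative offset
        System.out.println(tryBuild(value, 1, Integer.MAX_VALUE)); // rejected: the check does not overflow
    }
}
```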
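Source of new String("string")
public String(String original) {
    this.value = original.value; // shares the original's array; no copy is needed because it is never modified
    this.hash = original.hash;
}
...
/**
 * A String stores its characters in a char array.
 */
private final char value[]; // final fixes the array reference; the contents stay unchanged because the array is private and no method modifies it
/**
 * Caches the hash code of the string; defaults to 0.
 */
private int hash; // computed from the characters by hashCode() on first use; it is not a memory address
private static final long serialVersionUID = -6849794470754667710L; // serialization version ID
...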
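These are all member fields of the String class.
A string is essentially a wrapper around an array. The final modifier only prevents the value reference from being reassigned; the contents cannot change because the array is private and no String method modifies it. (The listings here match the JDK 8 implementation; from JDK 9 on, the field is a byte[] accompanied by an encoding flag.)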
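A short sketch helps separate two ideas that are easy to mix up here: final on an array variable fixes the reference, not the elements, and String "modification" methods return a new object. The class and method names are illustrative only.

```java
public class FinalArrayDemo {
    // final does not stop element writes; it only forbids reassigning the variable.
    static String mutateFinalArray() {
        final char[] value = {'a', 'b', 'c'};
        value[0] = 'X';          // allowed
        // value = new char[3];  // would not compile: cannot assign a value to a final variable
        return new String(value);
    }

    // String methods never change the receiver; they return a new String.
    static String originalAfterReplace() {
        String s = "abc";
        String t = s.replace('a', 'X');
        return s + "/" + t;
    }

    public static void main(String[] args) {
        System.out.println(mutateFinalArray());     // Xbc
        System.out.println(originalAfterReplace()); // abc/Xbc
    }
}
```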
2. String to character array [toCharArray & charAt]
private static void StringToChar() {
    String str = "hello";
    for (int i = 0; i < str.length(); i++) {
        System.out.print(str.charAt(i));
    }
    System.out.println();
    char[] chars = str.toCharArray(); // copy the string's characters into a new char array
    System.out.println(chars);
}
Result:
hello
hello
Source of charAt
/**
 * Returns the {@code char} value at the
 * specified index. An index ranges from {@code 0} to
 * {@code length() - 1}. The first {@code char} value of the sequence
 * is at index {@code 0}, the next at index {@code 1},
 * and so on, as for array indexing.
 *
 * <p>If the {@code char} value specified by the index is a
 * <a href="Character.html#unicode">surrogate</a>, the surrogate
 * value is returned.
 *
 * @param      index   the index of the {@code char} value.
 * @return     the {@code char} value at the specified index of this string.
 *             The first {@code char} value is at index {@code 0}.
 * @exception  IndexOutOfBoundsException  if the {@code index}
 *             argument is negative or not less than the length of this
 *             string.
 */
public char charAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index];
}
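The Javadoc's remark about surrogates matters for characters outside the Basic Multilingual Plane: charAt returns one half of the pair, while codePointAt returns the whole code point. A small sketch, with an illustrative class name and a test string chosen for this example:

```java
public class SurrogateDemo {
    // 'a' followed by U+1F600, which needs two char values (a surrogate pair).
    static final String TEXT = "a\uD83D\uDE00";

    public static void main(String[] args) {
        System.out.println(TEXT.length());                             // 3 char values
        System.out.println(TEXT.codePointCount(0, TEXT.length()));     // 2 code points
        System.out.println(Character.isHighSurrogate(TEXT.charAt(1))); // true: charAt returns half of the pair
        System.out.println(Integer.toHexString(TEXT.codePointAt(1)));  // 1f600
    }
}
```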
Question: why can System.out.println(chars) print a character array directly?
Hold Ctrl and left-click println in the IDE to open its source:
/**
 * Prints an array of characters and then terminate the line.  This method
 * behaves as though it invokes <code>{@link #print(char[])}</code> and
 * then <code>{@link #println()}</code>.
 *
 * @param x  an array of chars to print.
 */
public void println(char x[]) {
    synchronized (this) {
        print(x);
        newLine();
    }
}
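PrintStream has a dedicated println(char[]) overload, so a char[] is printed as text, just like a double or a String, without calling Arrays.toString().
Arrays of every other type fall through to println(Object), which prints the array's type and hash code (something starting with [I@ for an int[]) unless you call Arrays.toString() first: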
/**
 * Prints an Object and then terminate the line.  This method calls
 * at first String.valueOf(x) to get the printed object's string value,
 * then behaves as
 * though it invokes <code>{@link #print(String)}</code> and then
 * <code>{@link #println()}</code>.
 *
 * @param x  The <code>Object</code> to be printed.
 */
public void println(Object x) {
    String s = String.valueOf(x); // "null" if x is null, otherwise x.toString()
    synchronized (this) {
        print(s);
        newLine();
    }
}
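Overload resolution decides which of these paths an array takes. The sketch below, with illustrative names, shows that a char[] only prints as text when the char[] overload is selected; concatenating it with a String goes through Object.toString() instead.

```java
import java.util.Arrays;

public class PrintArrayDemo {
    // Selects String.valueOf(char[]), which builds a String from the characters.
    static String viaValueOf(char[] chars) {
        return String.valueOf(chars);
    }

    // String concatenation treats the array as an Object, so toString() is used.
    static String viaConcat(char[] chars) {
        return "" + chars;
    }

    // Any other array type needs Arrays.toString to show its elements.
    static String ints(int[] numbers) {
        return Arrays.toString(numbers);
    }

    public static void main(String[] args) {
        char[] chars = {'h', 'i'};
        System.out.println(viaValueOf(chars));               // hi
        System.out.println(viaConcat(chars));                // starts with [C@, followed by a hash code
        System.out.println(ints(new int[] {1, 2, 3}));       // [1, 2, 3]
        System.out.println(String.valueOf(new int[] {1, 2})); // starts with [I@, followed by a hash code
    }
}
```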
II. Bytes and Strings
1. Bytes to string [new String()]
private static void ByteToString() {
    byte[] bytes = {97, 98, 99, 100};
    String str = new String(bytes);
    System.out.println(str);
    str = new String(bytes, 1, 2);
    System.out.println(str);
    str = new String(bytes, 1); // deprecated: the second argument is hibyte, not an offset
    System.out.println(str);
}
Result:
abcd
bc
šŢţŤ
Source of new String(byte[])
/**
 * Constructs a new {@code String} by decoding the specified array of bytes
 * using the platform's default charset.  The length of the new {@code
 * String} is a function of the charset, and hence may not be equal to the
 * length of the byte array.
 *
 * <p> The behavior of this constructor when the given bytes are not valid
 * in the default charset is unspecified.  The {@link
 * java.nio.charset.CharsetDecoder} class should be used when more control
 * over the decoding process is required.
 *
 * @param  bytes
 *         The bytes to be decoded into characters
 *
 * @since  JDK1.1
 */
public String(byte bytes[]) {
    this(bytes, 0, bytes.length);
}
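This constructor delegates to an overload in the same class, passing (the byte array, start index 0, the array length).
Click through this(...) to see that overload:
public String(byte bytes[], int offset, int length) {
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(bytes, offset, length);
}
Stepping into decode shows that it belongs to the StringCoding class, whose other methods handle decoding for the various charsets (for example GBK and UTF-8):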
static char[] decode(byte[] ba, int off, int len) {
    String csn = Charset.defaultCharset().name();
    try {
        // use charset name decode() variant which provides caching.
        return decode(csn, ba, off, len);
    } catch (UnsupportedEncodingException x) {
        warnUnsupportedCharset(csn);
    }
    try {
        return decode("ISO-8859-1", ba, off, len);
    } catch (UnsupportedEncodingException x) {
        // If this code is hit during VM initialization, MessageUtils is
        // the only way we will be able to get any kind of error message.
        MessageUtils.err("ISO-8859-1 charset not available: "
                         + x.toString());
        // If we can not find ISO-8859-1 (a required encoding) then things
        // are seriously wrong with the installation.
        System.exit(1);
        return null;
    }
}
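Because the no-charset constructor depends on the platform default, the same bytes can decode differently on different machines. A minimal sketch of passing the charset explicitly through the String(byte[], Charset) constructor; the class and method names are illustrative, and the bytes are the UTF-8 encoding of the two Chinese characters U+5B66 U+4E60.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // UTF-8 bytes of the two characters U+5B66 U+4E60.
    static final byte[] UTF8_BYTES = {-27, -83, -90, -28, -71, -96};

    static String decodeUtf8() {
        return new String(UTF8_BYTES, StandardCharsets.UTF_8);
    }

    // Decoding with the wrong charset yields one (wrong) character per byte.
    static String decodeLatin1() {
        return new String(UTF8_BYTES, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(decodeUtf8().length());                  // 2
        System.out.println(decodeUtf8().equals("\u5B66\u4E60"));   // true
        System.out.println(decodeLatin1().length());                // 6
    }
}
```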
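Methods explicitly marked @Deprecated are best avoided; in practice they can produce unexpected results.
//@deprecated This method does not properly convert bytes into
@Deprecated
public String(byte ascii[], int hibyte) {
    this(ascii, hibyte, 0, ascii.length);
}
This constructor carries the @Deprecated annotation, so it should not be used. In the example above, new String(bytes, 1) treats 1 as the high byte of every character, which yields U+0161 to U+0164 (šŢţŤ) instead of abcd.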
2. String to byte array [getBytes]
private static void StringToByte() {
    String str = "abcdef";
    byte[] bytes = str.getBytes();
    System.out.println(Arrays.toString(bytes));
    str = "学习";
    try {
        bytes = str.getBytes("utf-8"); // each of these Chinese characters takes 3 bytes in UTF-8
        System.out.println(Arrays.toString(bytes));
        bytes = str.getBytes("gbk"); // each Chinese character takes 2 bytes in GBK
        System.out.println(Arrays.toString(bytes));
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
}
Result:
[97, 98, 99, 100, 101, 102] (one byte per letter)
[-27, -83, -90, -28, -71, -96]
[-47, -89, -49, -80]
Source of StringCoding.encode, which str.getBytes() calls
static byte[] encode(char[] ca, int off, int len) {
    String csn = Charset.defaultCharset().name();
    try {
        // use charset name encode() variant which provides caching.
        return encode(csn, ca, off, len);
    } catch (UnsupportedEncodingException x) {
        warnUnsupportedCharset(csn);
    }
    try {
        return encode("ISO-8859-1", ca, off, len);
    } catch (UnsupportedEncodingException x) {
        // If this code is hit during VM initialization, MessageUtils is
        // the only way we will be able to get any kind of error message.
        MessageUtils.err("ISO-8859-1 charset not available: "
                         + x.toString());
        // If we can not find ISO-8859-1 (a required encoding) then things
        // are seriously wrong with the installation.
        System.exit(1);
        return null;
    }
}
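Converting a string to a byte array calls encode, the counterpart of the decode method used when converting bytes to a string.
Specifying the charset
Source of str.getBytes(charsetName)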
/**
 * Encodes this {@code String} into a sequence of bytes using the named
 * charset, storing the result into a new byte array.
 *
 * <p> The behavior of this method when this string cannot be encoded in
 * the given charset is unspecified.  The {@link
 * java.nio.charset.CharsetEncoder} class should be used when more control
 * over the encoding process is required.
 *
 * @param  charsetName
 *         The name of a supported {@linkplain java.nio.charset.Charset
 *         charset}
 *
 * @return  The resultant byte array
 *
 * @throws  UnsupportedEncodingException
 *          If the named charset is not supported
 *
 * @since  JDK1.1
 */
public byte[] getBytes(String charsetName)
        throws UnsupportedEncodingException {
    if (charsetName == null) throw new NullPointerException();
    return StringCoding.encode(charsetName, value, 0, value.length);
}
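The name-based overload forces a try/catch for UnsupportedEncodingException. As a hedged alternative, the getBytes(Charset) overload combined with the StandardCharsets constants avoids the checked exception for charsets every JVM must support. The sketch below uses illustrative names; GBK is left out because it is not one of those guaranteed constants.

```java
import java.nio.charset.StandardCharsets;

public class EncodeDemo {
    // The two Chinese characters U+5B66 U+4E60, written as escapes to stay encoding-neutral.
    static final String TEXT = "\u5B66\u4E60";

    static int utf8Length() {
        return TEXT.getBytes(StandardCharsets.UTF_8).length;
    }

    static int utf16Length() {
        return TEXT.getBytes(StandardCharsets.UTF_16BE).length; // big-endian, no byte-order mark
    }

    // Encoding and decoding with the same charset restores the original text.
    static boolean roundTrips() {
        byte[] bytes = TEXT.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(TEXT);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length());  // 6
        System.out.println(utf16Length()); // 4
        System.out.println(roundTrips());  // true
    }
}
```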
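III. Summary
So when should you use byte[], and when char[]?
- byte[] handles a String one byte at a time. It suits scenarios such as network transmission and data storage, and is the better fit for operating on binary data.
- char[] handles a String one character at a time. It is the better fit for operating on text, especially text that contains Chinese.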
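The difference shows up as soon as text is cut into pieces. In the sketch below (illustrative names, UTF-8 assumed as the encoding), taking the first char keeps a whole character, while taking the first four bytes splits the second character and leaves a replacement character behind after decoding.

```java
import java.nio.charset.StandardCharsets;

public class ByteVsCharDemo {
    // The two Chinese characters U+5B66 U+4E60.
    static final String TEXT = "\u5B66\u4E60";

    // Cutting by characters keeps each character intact.
    static String firstCharByChars() {
        return new String(TEXT.toCharArray(), 0, 1);
    }

    // Cutting by bytes can split a multi-byte character in the middle.
    static String firstFourBytesDecoded() {
        byte[] bytes = TEXT.getBytes(StandardCharsets.UTF_8); // 6 bytes, 3 per character
        return new String(bytes, 0, 4, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(firstCharByChars().equals("\u5B66"));        // true
        System.out.println(firstFourBytesDecoded().contains("\uFFFD")); // true: the broken tail became U+FFFD
    }
}
```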
IV. Common String Operations
1. String comparison [equals, compareTo]
private static void learn_equals() {
    String str = "hello";
    String str1 = new String("Hello");
    System.out.println(str == str1); // false: two different objects
    System.out.println(str.equals(str1)); // false: 'h' and 'H' differ
    System.out.println(str.equalsIgnoreCase(str1)); // true
}
Source analysis of equals()
/**
 * Compares this string to the specified object.  The result is {@code
 * true} if and only if the argument is not {@code null} and is a {@code
 * String} object that represents the same sequence of characters as this
 * object.
 *
 * @param  anObject
 *         The object to compare this {@code String} against
 *
 * @return  {@code true} if the given object represents a {@code String}
 *          equivalent to this string, {@code false} otherwise
 *
 * @see  #compareTo(String)
 * @see  #equalsIgnoreCase(String)
 */
public boolean equals(Object anObject) {
    if (this == anObject) { // same reference: both point to the same object, so they are equal
        return true;
    }
    if (anObject instanceof String) { // is the argument a String?
        String anotherString = (String)anObject; // downcast
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value; // local references to the two arrays (no copy is made), compared element by element
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true; // the loop finished: every character matches
        }
    }
    return false; // the argument is not a String, or the lengths differ
}
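The first branch of equals is a reference check and the rest is a content check, which is why == and equals can disagree. A small sketch with illustrative names:

```java
public class EqualsDemo {
    // Reference comparison, isolated in a helper so the intent is explicit.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "hello";
        String constructed = new String("hello");

        System.out.println(sameReference(literal, constructed));          // false: two objects
        System.out.println(literal.equals(constructed));                  // true: same characters
        System.out.println(sameReference(literal, constructed.intern())); // true: intern() returns the pooled literal
        System.out.println(literal.equals(null));                         // false: null is never equal
        System.out.println(literal.equals((Object) literal.toCharArray())); // false: a char[] is not a String
    }
}
```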
Source analysis of equalsIgnoreCase()
public boolean equalsIgnoreCase(String anotherString) {
    return (this == anotherString) ? true // ternary operator
            : (anotherString != null) // rule out null and differing lengths first
            && (anotherString.value.length == value.length)
            && regionMatches(true, 0, anotherString, 0, value.length);
} // returns the boolean result of comparing the two strings
Now look at how regionMatches performs the comparison:
public boolean regionMatches(boolean ignoreCase, int toffset,
        String other, int ooffset, int len) {
    char ta[] = value; // this object's array
    int to = toffset;
    char pa[] = other.value; // the argument's array
    int po = ooffset;
    // Note: toffset, ooffset, or len might be near -1>>>1.
    if ((ooffset < 0) || (toffset < 0) // reject invalid ranges
            || (toffset > (long)value.length - len)
            || (ooffset > (long)other.value.length - len)) {
        return false;
    }
    while (len-- > 0) { // compare the two regions character by character
        char c1 = ta[to++];
        char c2 = pa[po++];
        if (c1 == c2) {
            continue; // equal: move on to the next pair
        }
        if (ignoreCase) { // the mismatch may be only a difference in case
            // If characters don't match but case may be ignored,
            // try converting both characters to uppercase.
            // If the results match, then the comparison scan should
            // continue.
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            if (u1 == u2) { // compare after converting both to uppercase
                continue;
            }
            // Unfortunately, conversion to uppercase does not work properly
            // for the Georgian alphabet, which has strange rules about case
            // conversion.  So we need to make one last check before
            // exiting.
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue; // the extra lowercase check covers the case described in the comment above
            }
        }
        return false; // this pair differs even when case is ignored
    }
    return true; // every pair matched
}
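regionMatches is public, so it can also be called directly to compare parts of two strings. The sketch below (illustrative class name and sample strings) shows the case-insensitive and case-sensitive overloads, and that an out-of-range region returns false instead of throwing.

```java
public class RegionMatchesDemo {
    static final String TEXT = "Hello World";

    public static void main(String[] args) {
        // Compare TEXT[6..11) with "WORLD"[0..5), ignoring case.
        System.out.println(TEXT.regionMatches(true, 6, "WORLD", 0, 5));  // true
        // The four-argument overload is case-sensitive.
        System.out.println(TEXT.regionMatches(6, "WORLD", 0, 5));        // false
        // A region that runs past the end is reported as a mismatch, not an exception.
        System.out.println("abc".regionMatches(true, 2, "bcd", 1, 5));   // false
        // equalsIgnoreCase handles null without throwing.
        System.out.println("HELLO".equalsIgnoreCase("hello"));           // true
        System.out.println("HELLO".equalsIgnoreCase(null));              // false
    }
}
```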
private static void learn_compareTo() {
    String str = "AB";
    String str1 = "ABc";
    String str2 = "ABC";
    System.out.println(str.compareTo(str1)); // -1 (length difference: 2 - 3)
    System.out.println(str.compareTo(str2)); // -1 (length difference: 2 - 3)
}
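Source of compareTo(str1)
public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length; // lengths of the two strings
    int lim = Math.min(len1, len2);
    char v1[] = value; // local references to the two arrays; no copy is made
    char v2[] = anotherString.value;

    int k = 0;
    while (k < lim) {
        char c1 = v1[k]; // compare element by element; at the first mismatch return the difference of the two char values
        char c2 = v2[k]; // if they are equal, continue; once the loop ends, return the length difference
        if (c1 != c2) {
            return c1 - c2;
        }
        k++;
    } // so the magnitude of the return value says nothing reliable about individual characters
    return len1 - len2; // it may be a difference in length rather than in character values
}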
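Since the result can be either a character difference or a length difference, callers normally look only at its sign. A short sketch under that assumption; the class and helper names are illustrative.

```java
import java.util.Arrays;

public class CompareToDemo {
    // Reduces compareTo's result to -1, 0, or 1, the only part that is safe to rely on.
    static int sign(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    // Arrays.sort on Strings uses compareTo, so uppercase letters sort before lowercase ones.
    static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println("AB".compareTo("ABc")); // -1: length difference
        System.out.println("a".compareTo("c"));    // -2: character difference
        System.out.println("B".compareTo("a"));    // -31: 'B' (66) minus 'a' (97)
        System.out.println(sign("B", "a"));        // -1
        System.out.println("B".compareToIgnoreCase("a") > 0);         // true
        System.out.println(Arrays.toString(sorted("b", "a", "B")));   // [B, a, b]
    }
}
```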
2. String search [contains, indexOf]
private static void learn_contains() {
    String str = "abcdefgh";
    System.out.println(str.contains(str)); // true
    System.out.println(str.contains("cdef")); // true
    System.out.println(str.indexOf("c")); // 2
    System.out.println(str.indexOf("abd")); // -1
}
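indexOf also has an overload that takes a starting index, which makes it possible to walk through every occurrence. The helper allIndexesOf below is a hypothetical utility written for this example, not a JDK method.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfDemo {
    // Hypothetical helper: collects the start index of every occurrence of target in text.
    static List<Integer> allIndexesOf(String text, String target) {
        List<Integer> result = new ArrayList<>();
        if (target.isEmpty()) {
            return result; // an empty target matches everywhere; skip it to avoid looping forever
        }
        int from = 0;
        while (true) {
            int index = text.indexOf(target, from); // search starting at position 'from'
            if (index < 0) {
                return result;
            }
            result.add(index);
            from = index + 1; // continue just after the start of the last match
        }
    }

    public static void main(String[] args) {
        System.out.println(allIndexesOf("abcabcabc", "bc")); // [1, 4, 7]
        System.out.println("abcabcabc".lastIndexOf("bc"));   // 7
        System.out.println(allIndexesOf("abc", "x"));        // []
    }
}
```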
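V. StringBuffer && StringBuilder
First, recall the characteristics of the String class:
Every string literal is a String object, and a String cannot change once it has been created. Code that appears to change its content only points the reference at a different object.
String operations are usually simple, but because a String cannot be modified, Java provides the StringBuffer and StringBuilder classes to make string modification convenient.
StringBuffer and StringBuilder share most of their functionality, so we mainly cover StringBuffer here.
String uses "+" for concatenation; with StringBuffer that operation becomes the append() method:
public synchronized StringBuffer append(<any data type> b)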
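Interview question: explain the differences between String, StringBuffer, and StringBuilder:
- String is immutable; the content of a StringBuilder or StringBuffer can be modified.
- StringBuilder and StringBuffer offer mostly the same functionality.
- StringBuffer synchronizes its methods and is thread-safe; StringBuilder is not synchronized and is not thread-safe.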
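The points above can be sketched in code. The example uses StringBuilder for single-threaded appending in a loop and StringBuffer when two threads append to the same object; the class name, method names, and thread setup are assumptions made for illustration.

```java
public class BuilderDemo {
    // Single-threaded case: StringBuilder avoids creating a new String on every "+".
    static String joinNumbers(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i); // append is overloaded for int, char, String, and more
        }
        return sb.toString();
    }

    // Shared case: StringBuffer's synchronized append keeps every character.
    static int concurrentAppend(int perThread) {
        StringBuffer buffer = new StringBuffer();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                buffer.append('x');
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return buffer.length();
    }

    public static void main(String[] args) {
        System.out.println(joinNumbers(4));         // 0,1,2,3
        System.out.println(concurrentAppend(1000)); // 2000
    }
}
```

Replacing the StringBuffer with a StringBuilder in concurrentAppend would remove that guarantee: characters could be lost or an exception thrown, depending on timing.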
These are my personal notes; corrections are welcome!