【Java】java.lang.String Source Code: Translation and Analysis (Part 1)

陶洲川

The String class represents character strings. All string literals in Java programs, such as "abc", are implemented as instances of this class.

Strings are constant: their values cannot be changed after they are created. String buffers support mutable strings.
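
A minimal sketch of what "immutable" means in practice (the class name ImmutabilityDemo is just for illustration): a String method such as toUpperCase returns a new object and leaves the original alone, while a StringBuffer, the mutable type the text refers to, changes in place.

```java
import java.util.Locale;

public class ImmutabilityDemo {
    public static void main(String[] args) {
        String s = "abc";
        // toUpperCase() returns a new String; s itself is not modified
        String t = s.toUpperCase(Locale.ROOT);
        System.out.println(s); // abc
        System.out.println(t); // ABC

        // A StringBuffer is mutable: append() changes the buffer in place
        // and returns the same object
        StringBuffer sb = new StringBuffer("abc");
        StringBuffer returned = sb.append("def");
        System.out.println(sb);             // abcdef
        System.out.println(returned == sb); // true
    }
}
```

StringBuilder offers the same API without synchronization and is usually preferred for single-threaded code.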

Because String objects are immutable, they can be shared. For example:

String str="abc";

is equivalent to:

char data[] = {'a','b','c'};
String str = new String(data);
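
"Equivalent" here means the two strings hold the same characters, not that they are the same object. A small sketch (SharingDemo is a hypothetical name) that separates content equality from identity, shows that identical literals share one interned instance, and shows that new String(char[]) copies its argument.

```java
public class SharingDemo {
    public static void main(String[] args) {
        String literal = "abc";
        char[] data = {'a', 'b', 'c'};
        String constructed = new String(data);

        // Same character sequence...
        System.out.println(literal.equals(constructed));     // true
        // ...but new String(...) always creates a distinct object
        System.out.println(literal == constructed);          // false
        // Identical literals refer to the same shared (interned) instance
        System.out.println(literal == "abc");                // true
        // intern() returns the shared instance for this character sequence
        System.out.println(constructed.intern() == literal); // true

        // The constructor copied the array, so later writes to data are not visible
        data[0] = 'x';
        System.out.println(constructed);                     // abc
    }
}
```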

Here are some more examples of how strings can be used:

     System.out.println("abc");
     String cde = "cde";
     System.out.println("abc" + cde);
     String c = "abc".substring(2,3);
     String d = cde.substring(1, 2);
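
The snippet above is not a complete program. Wrapped in a main method (the class name StringExamples is illustrative), it runs as-is; note that the second argument of substring is an exclusive end index, so each call above extracts a single character.

```java
public class StringExamples {
    public static void main(String[] args) {
        System.out.println("abc");         // abc
        String cde = "cde";
        System.out.println("abc" + cde);   // abccde
        // substring(beginIndex, endIndex): endIndex is exclusive
        String c = "abc".substring(2, 3);  // index 2 only
        String d = cde.substring(1, 2);    // index 1 only
        System.out.println(c);             // c
        System.out.println(d);             // d
    }
}
```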

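The String class includes methods for examining individual characters of the sequence, for comparing strings, for searching strings, for extracting substrings, and for creating a copy of a string with all characters translated to uppercase or to lowercase. Case mapping is based on the Unicode Standard version specified by the java.lang.Character class.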
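
One call from each of the method groups just listed, as a sketch (StringMethodsDemo is a hypothetical name). Passing Locale.ROOT to toUpperCase is a choice made here to avoid locale-specific rules; the last line shows that Unicode case mapping can change a string's length.

```java
import java.util.Locale;

public class StringMethodsDemo {
    public static void main(String[] args) {
        String s = "Hello, World";

        // Examining individual characters
        System.out.println(s.charAt(7));                       // W
        System.out.println(s.length());                        // 12

        // Comparing strings
        System.out.println("apple".compareTo("banana") < 0);   // true
        System.out.println("hello".equalsIgnoreCase("HELLO")); // true

        // Searching
        System.out.println(s.indexOf("World"));                // 7
        System.out.println(s.contains("lo, W"));               // true

        // Extracting substrings
        System.out.println(s.substring(0, 5));                 // Hello

        // Case mapping
        System.out.println(s.toUpperCase(Locale.ROOT));        // HELLO, WORLD
        // German sharp s (U+00DF) upper-cases to the two letters "SS"
        System.out.println("stra\u00DFe".toUpperCase(Locale.ROOT)); // STRASSE
    }
}
```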

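The Java language provides special support for the string concatenation operator (+) and for the conversion of other objects to strings. String concatenation is implemented through the StringBuilder (or StringBuffer) class and its append method.
String conversions are implemented through the toString method, which is defined by Object and inherited by all classes in Java. For additional information on string concatenation and conversion, see Gosling, Joy, and Steele, The Java Language Specification.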
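
A sketch of both mechanisms together, using a hypothetical Point class that overrides toString. The StringBuilder line approximates what javac for JDK 8 emits for the + expression; from JDK 9 onward javac compiles + through invokedynamic instead (JEP 280), but the resulting string is the same.

```java
public class ConcatDemo {
    // Hypothetical class used only for this illustration
    static final class Point {
        final int x, y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // String conversion of a Point goes through this override
        @Override
        public String toString() {
            return "(" + x + ", " + y + ")";
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);

        // The + operator converts p to a String via toString()
        String viaPlus = "p = " + p;

        // Roughly what a JDK 8 compiler generates for the line above
        String viaBuilder = new StringBuilder().append("p = ").append(p).toString();

        System.out.println(viaPlus);                    // p = (1, 2)
        System.out.println(viaPlus.equals(viaBuilder)); // true

        // A null reference is converted to the string "null"
        Object nothing = null;
        System.out.println("value: " + nothing);        // value: null
    }
}
```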

A String represents a string in the UTF-16 format, in which supplementary characters are represented by surrogate pairs.

Index values refer to char code units, so a supplementary character uses two positions in a String.

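The String class provides methods for dealing with Unicode code points (i.e., characters), in addition to those for dealing with Unicode code units (i.e., char values).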
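
To make the code unit vs. code point distinction concrete, here is a sketch with one supplementary character, U+1F600, written as its surrogate pair \uD83D\uDE00 (SurrogateDemo is an illustrative name).

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so UTF-16 stores it as a surrogate pair
        String s = "a\uD83D\uDE00b";

        System.out.println(s.length());                      // 4 (char code units)
        System.out.println(s.codePointCount(0, s.length())); // 3 (code points)

        // Index 1 holds only the high surrogate, not the whole character
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        // codePointAt combines the pair that starts at index 1
        System.out.println(Integer.toHexString(s.codePointAt(1)));  // 1f600

        // Iterating by code point visits each character exactly once
        s.codePoints().forEach(cp -> System.out.println(Integer.toHexString(cp)));
    }
}
```

The final loop prints 61, 1f600 and 62 on separate lines, whereas iterating with charAt would visit four positions.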

package java.lang;

import java.io.ObjectStreamField;
import java.util.Arrays;

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    /** Use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;

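    /**
     * Class String is special cased within the Serialization Stream Protocol.
     *
     * A String instance is written into an ObjectOutputStream according to
     * <a href="{@docRoot}/../platform/serialization/spec/output.html">
     * Object Serialization Specification, Section 6.2, "Stream Elements"</a>
     */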
    private static final ObjectStreamField[] serialPersistentFields =
        new ObjectStreamField[0];

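    /**
     * Initializes a newly created String object so that it represents
     * an empty character sequence. Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */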
    public String() {
        this.value = "".value;
    }

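    /**
     * Initializes a newly created String object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of original is needed, use of this constructor is
     * unnecessary since Strings are immutable.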
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

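    /**
     * Allocates a new String so that it represents the sequence of
     * characters currently contained in the character array argument. The
     * contents of the character array are copied; subsequent modification
     * of the character array does not affect the newly created string.
     *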
     * @param  value
     *         The initial value of the string
     */
    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

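    /**
     * Allocates a new String that contains characters from a subarray
     * of the character array argument. The offset argument is the
     * index of the first character of the subarray and the count
     * argument specifies the length of the subarray. The contents of the
     * subarray are copied; subsequent modification of the character array
     * does not affect the newly created string.
     *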
     * @param  value
     *         Array that is the source of characters
     *
     * @param  offset
     *         The initial offset
     *
     * @param  count
     *         The length
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code count} arguments index
     *          characters outside the bounds of the {@code value} array
     */
    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

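    /**
     * Allocates a new String that contains characters from a subarray
     * of the Unicode code point array argument. The offset argument
     * is the index of the first code point of the subarray and the
     * count argument specifies the length of the subarray. The contents
     * of the subarray are converted to chars; subsequent modification of
     * the int array does not affect the newly created string.
     *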
     * @param  codePoints
     *         Array that is the source of Unicode code points
     *
     * @param  offset
     *         The initial offset
     *
     * @param  count
     *         The length
     *
     * @throws  IllegalArgumentException
     *          If any invalid Unicode code point is found in {@code
     *          codePoints}
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code count} arguments index
     *          characters outside the bounds of the {@code codePoints} array
     *
     * @since  1.5
     */
    public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= codePoints.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;
            else if (Character.isValidCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[]
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j] = (char)c;
            else
                Character.toSurrogates(c, v, j++);
        }

        this.value = v;
    }

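    /**
     * Allocates a new String constructed from a subarray of an array
     * of 8-bit integer values.
     *
     * The offset argument is the index of the first byte of the
     * subarray, and the count argument specifies the length of the
     * subarray.
     *
     * Each byte in the subarray is converted to a char as specified
     * in the String(byte[], int) constructor.
     *
     * @deprecated This method does not properly convert bytes into
     * characters. As of JDK 1.1, the preferred way to do this is via the
     * String constructors that take a Charset, a charset name, or that
     * use the platform's default charset.
     *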
     * @param  ascii
     *         The bytes to be converted to characters
     *
     * @param  hibyte
     *         The top 8 bits of each 16-bit Unicode code unit
     *
     * @param  offset
     *         The initial offset
     * @param  count
     *         The length
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} or {@code count} argument is invalid
     *
     * @see  #String(byte[], int)
     * @see  #String(byte[], int, int, java.lang.String)
     * @see  #String(byte[], int, int, java.nio.charset.Charset)
     * @see  #String(byte[], int, int)
     * @see  #String(byte[], java.lang.String)
     * @see  #String(byte[], java.nio.charset.Charset)
     * @see  #String(byte[])
     */
    @Deprecated
    public String(byte ascii[], int hibyte, int offset, int count) {
        checkBounds(ascii, offset, count);
        char value[] = new char[count];

        if (hibyte == 0) {
            for (int i = count; i-- > 0;) {
                value[i] = (char)(ascii[i + offset] & 0xff);
            }
        } else {
            hibyte <<= 8;
            for (int i = count; i-- > 0;) {
                value[i] = (char)(hibyte | (ascii[i + offset] & 0xff));
            }
        }
        this.value = value;
    }
    // ... (to be continued)
}
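
The String(int[] codePoints, int offset, int count) constructor above first counts how many chars it needs (one per BMP code point, two per supplementary one) and only then allocates, so the array never has to be resized. A standalone sketch of that two-pass idea follows; the helper name encode is hypothetical, the bounds checks are omitted, and because Character.toSurrogates is package-private in java.lang the sketch substitutes the public Character.toChars(int, char[], int), which writes the same surrogate pair.

```java
public class CodePointEncoder {
    // Hypothetical helper mirroring the constructor's two passes (no bounds checks)
    static char[] encode(int[] codePoints, int offset, int count) {
        final int end = offset + count;

        // Pass 1: each code point needs one char; supplementary ones need a second
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c)) {
                continue;
            } else if (Character.isValidCodePoint(c)) {
                n++;
            } else {
                throw new IllegalArgumentException(Integer.toString(c));
            }
        }

        // Pass 2: allocate exactly n chars and fill them in
        char[] v = new char[n];
        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c)) {
                v[j] = (char) c;
            } else {
                // Writes the high surrogate at j and the low surrogate at j + 1
                Character.toChars(c, v, j++);
            }
        }
        return v;
    }

    public static void main(String[] args) {
        int[] cps = {0x61, 0x1F600, 0x62};
        char[] chars = encode(cps, 0, cps.length);
        System.out.println(chars.length); // 4
        // Compare with the JDK constructor
        System.out.println(new String(chars).equals(new String(cps, 0, cps.length))); // true
    }
}
```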

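The bounds-check method checkBounds, which the deprecated constructor above calls first, verifies that the range starting at offset with the given length lies inside the byte[] before a String is built from that slice: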

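// Test class
public class StringTest {
    public static void main(String[] args) {

        byte[] byteArray = new byte[]{'a', 'b', 'c'};

        // In range: offset 1 with length 1 selects only 'b', so nothing is thrown
        checkBounds(byteArray, 1, 1);
        System.out.println("offset=1, length=1: within bounds");

        // Out of range: offset 1 + length 3 = 4 exceeds byteArray.length (3)
        try {
            checkBounds(byteArray, 1, 3);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("offset=1, length=3: " + e.getMessage());
        }
    }

    // Source code from java.lang.String
    private static void checkBounds(byte[] bytes, int offset, int length) {
        // A negative length is never valid
        if (length < 0)
            throw new StringIndexOutOfBoundsException(length);
        // A negative start offset is never valid
        if (offset < 0)
            throw new StringIndexOutOfBoundsException(offset);
        // Written as offset > bytes.length - length instead of
        // offset + length > bytes.length so the sum cannot overflow
        if (offset > bytes.length - length)
            throw new StringIndexOutOfBoundsException(offset + length);
    }
}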
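
The deprecated String(byte ascii[], int hibyte, int offset, int count) constructor builds each char as (hibyte << 8) | (b & 0xff) after the bounds check. A sketch (HibyteDemo is an illustrative name) contrasting it with the Charset-based constructor that the deprecation note recommends; it assumes plain ASCII input, where US-ASCII decoding and hibyte = 0 give the same result.

```java
import java.nio.charset.StandardCharsets;

public class HibyteDemo {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        byte[] ascii = {'a', 'b', 'c'};

        // Deprecated: with hibyte = 0 each byte maps straight to U+0000..U+00FF
        String zeroHigh = new String(ascii, 0, 0, ascii.length);
        System.out.println(zeroHigh); // abc

        // With hibyte = 0x04 every char lands in the U+04xx block
        String shifted = new String(ascii, 0x04, 0, ascii.length);
        System.out.println(Integer.toHexString(shifted.charAt(0))); // 461

        // Preferred since JDK 1.1: decode with an explicit charset
        String decoded = new String(ascii, 0, ascii.length, StandardCharsets.US_ASCII);
        System.out.println(decoded.equals(zeroHigh)); // true

        // An out-of-range slice is rejected by the bounds check
        try {
            new String(ascii, 0, 2, 2); // offset 2 + count 2 > length 3
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```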