Studying the String Source Code

Latest recommended article published on 2019-05-03 14:31:46

我叫什么名

Category: java    Tags: source code

Article link: https://blog.csdn.net/u012536192/article/details/53573465

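Lately I have come across a number of detailed write-ups on String in blogs and forums, and realized that my own understanding stopped at "it is final". That is embarrassingly shallow, so some serious self-reflection is in order. I am taking the time to organize my notes so that things stick, and reading the source turns up some surprising details along the way. Let's get started.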

Interfaces implemented by the String class

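First of all, the String class is declared final, which means it cannot be subclassed. Its contents are also immutable, because the characters live in a private final char array that no public method modifies. As a result, repeated reassignment and concatenation actually produce new String objects.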
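To make the point about new objects concrete, here is a minimal sketch. The class name ImmutabilityDemo and its method are made up for illustration; only the String behavior itself comes from the JDK.

```java
final class ImmutabilityDemo {
    // Returns true when concatenation produced a new object and left the old one unchanged.
    static boolean concatCreatesNewObject() {
        String original = "abc";
        String alias = original;     // second reference to the same String object
        original = original + "d";   // builds a new String; the "abc" object is untouched
        return alias != original
                && alias.equals("abc")
                && original.equals("abcd");
    }

    public static void main(String[] args) {
        System.out.println(concatCreatesNewObject());
    }
}
```

Because every concatenation allocates a new object, loops that build up text are usually better served by StringBuilder.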

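java.io.Serializable
Implements the serialization interface: a String object can be serialized into a byte sequence, and that byte sequence can be stored in full and later used to rebuild the original object (in effect a deep copy).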
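As a rough illustration of that round trip, the sketch below writes a String to an in-memory byte stream and reads it back. SerializationDemo and roundTrip are hypothetical names; the stream classes are the standard java.io ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

final class SerializationDemo {
    // Serializes the string to bytes and rebuilds an equal string from those bytes.
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(s);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (String) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello").equals("hello"));
    }
}
```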

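Comparable
String implements Comparable<String>, so two Strings can be compared. The class implements int compareTo(String anotherString), which compares two String objects lexicographically by char value.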

public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;
    int lim = Math.min(len1, len2); // only compare up to the length of the shorter string
    char v1[] = value;
    char v2[] = anotherString.value;

    int k = 0;
    while (k < lim) { // compare character by character
        char c1 = v1[k];
        char c2 = v2[k];
        if (c1 != c2) {
            return c1 - c2; // the first mismatch decides the result
        }
        k++;
    }
    return len1 - len2; // all shared positions are equal, so compare the lengths
}
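The two return statements above can be observed directly. The following sketch uses a hypothetical CompareToDemo class; the numeric results follow from the char values ('p' is 112, 'r' is 114) and from the length difference.

```java
import java.util.Arrays;

final class CompareToDemo {
    // Returns a sorted copy; Arrays.sort on String[] relies on String.compareTo.
    static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // First mismatch at index 2: 'p' - 'r' = 112 - 114
        System.out.println("apple".compareTo("apricot")); // -2
        // No mismatch within the shorter string: 3 - 6
        System.out.println("abc".compareTo("abcdef"));    // -3
        System.out.println("abc".compareTo("abc"));       // 0
        // Uppercase letters have smaller char values than lowercase ones.
        System.out.println(Arrays.toString(sorted("pear", "apple", "Apple")));
    }
}
```

Callers should normally rely only on the sign of the result, since the exact value is an artifact of this implementation.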

CharSequence
When I first saw this interface I did not know what it was for; taken literally, it means a sequence of characters. It is worth noting that String, StringBuilder and StringBuffer all implement it, and in the JDK version shown here all three are backed by a char array.
1. int length();
  Implemented in String as:
  public int length() {
      return value.length; // length of the value[] array
  }
2. char charAt(int index);
  Implemented in String; returns the character at the given zero-based index:
  public char charAt(int index) {
      if ((index < 0) || (index >= value.length)) {
          throw new StringIndexOutOfBoundsException(index);
      }
      return value[index];
  }
3. CharSequence subSequence(int start, int end);
  Implemented in String as:
  public CharSequence subSequence(int beginIndex, int endIndex) {
      return this.substring(beginIndex, endIndex);
  }
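One practical use of the interface is writing a method once and accepting any of the three classes. The sketch below is only an illustration; CharSequenceDemo and count are hypothetical names, and the method touches nothing beyond length() and charAt().

```java
final class CharSequenceDemo {
    // Counts occurrences of target using only the CharSequence API.
    static int count(CharSequence text, char target) {
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == target) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("banana", 'a'));                      // 3
        System.out.println(count(new StringBuilder("banana"), 'n'));   // 2
        System.out.println(count(new StringBuffer("banana"), 'b'));    // 1
        System.out.println(count("banana".subSequence(1, 4), 'a'));    // "ana" -> 2
    }
}
```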

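String constructors

Before looking at the constructors, take a look at the member variables of the String class.
/** The value is used for character storage. */
private final char value[];
// A String is essentially a char array. Because value is final, the reference can never point to another array once it is assigned, and since the class never exposes or modifies the array, the contents stay fixed as well.

/** Cache the hash code for the string */
private int hash; // Default to 0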

1 No-argument constructor. When nothing is passed in, the String has length 0, i.e. it is the empty string "".

    public String() {
        this.value = new char[0];
    }

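2 Create a new String from an existing String object. The new object shares the original's char array and cached hash, which is safe because the array is never modified.

    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }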
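A quick check of the first two constructors, written as a sketch with a hypothetical class name (BasicConstructorDemo): the no-argument constructor yields a string of length 0, and the copy constructor yields a distinct object that is equal to the original.

```java
final class BasicConstructorDemo {
    // The no-argument constructor produces the empty string.
    static boolean emptyHasLengthZero() {
        String empty = new String();
        return empty.length() == 0 && empty.equals("");
    }

    // new String(original) is a different object, but equals() and hashCode() agree.
    static boolean copyIsEqualButDistinct() {
        String original = "hello";
        String copy = new String(original);
        return copy != original
                && copy.equals(original)
                && copy.hashCode() == original.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(emptyHasLengthZero());
        System.out.println(copyIsEqualButDistinct());
    }
}
```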

3 Create from a char array; the contents are copied into a new char array

    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }
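The copy made by Arrays.copyOf is what keeps the String independent of the caller's array. A small sketch (DefensiveCopyDemo is a made-up name) shows that later changes to the source array do not reach the String:

```java
final class DefensiveCopyDemo {
    // Builds a String from an array, then mutates the array and returns the String.
    static String buildThenMutateSource() {
        char[] chars = {'a', 'b', 'c'};
        String s = new String(chars); // copies the three chars
        chars[0] = 'z';               // only the caller's array changes
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutateSource()); // abc
    }
}
```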

4 Copy part of a char array: take count characters starting at array index offset

    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }
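The bounds checks can be exercised as follows. This is a sketch with hypothetical helper names (SubArrayDemo, slice, rejects); it catches IndexOutOfBoundsException, the parent type of the StringIndexOutOfBoundsException thrown in the source above.

```java
final class SubArrayDemo {
    // Thin wrapper around the (char[], offset, count) constructor.
    static String slice(char[] source, int offset, int count) {
        return new String(source, offset, count);
    }

    // Returns true when the constructor rejects the requested range.
    static boolean rejects(char[] source, int offset, int count) {
        try {
            new String(source, offset, count);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        char[] hello = "hello".toCharArray();
        System.out.println(slice(hello, 1, 3));    // ell
        System.out.println(rejects(hello, 3, 5));  // true: 3 + 5 runs past the end
        System.out.println(rejects(hello, -1, 2)); // true: negative offset
        System.out.println(rejects(hello, 5, 0));  // false: empty range at the end is allowed
    }
}
```

Note that the check is written as offset > value.length - count rather than offset + count > value.length, which avoids int overflow when either argument is very large.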

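5 From an int[] array
You may wonder why an int[] is used: is it meant to convert ints into a string? The source shows that it actually converts Unicode code points into the corresponding characters.
For example:
int[] ary = {65, 66, 67, 68, 97, 97};
String str = new String(ary, 2, 3);
This starts at index 2 of the array, takes 3 elements, maps each one to its char, and builds a new string from them.
In the example above the elements are 67, 68 and 97, whose chars are 'C', 'D' and 'a', so str is "CDa".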

    public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;
            else if (Character.isValidCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[]
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j] = (char)c;
            else
                Character.toSurrogates(c, v, j++);
        }

        this.value = v;
    }

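The source above mentions Character.isBmpCodePoint(c) and Character.isValidCodePoint(c). What do these two mean?
This is where Unicode encoding comes in; for background see http://blog.csdn.net/mazhimazh/article/details/17708001
In short, valid Unicode code points lie between U+0000 and U+10FFFF; anything beyond that range cannot represent a Unicode character. A char is 16 bits wide, so a code point between U+0000 and U+FFFF fits in a single char, while a code point outside that range does not. Supplementary characters are those with code points from U+10000 to U+10FFFF, that is, the characters the original 16-bit Unicode design could not represent; in a char[] each of them takes two chars, a surrogate pair. Every Unicode character therefore belongs either to the BMP (Basic Multilingual Plane) or to the supplementary characters.
That is why the constructor uses Character.isBmpCodePoint(c) and Character.isValidCodePoint(c): the first pass reserves one extra char for every valid supplementary code point and rejects invalid values with IllegalArgumentException, and the second pass writes either a single char or a surrogate pair via Character.toSurrogates.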
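The effect of the two passes shows up in the resulting length. The sketch below is illustrative only: CodePointDemo and fromCodePoints are hypothetical names, and U+1F600 is simply one example of a supplementary code point.

```java
final class CodePointDemo {
    // Convenience wrapper around the (int[], offset, count) constructor.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    // Returns true when the constructor rejects the given value as an invalid code point.
    static boolean rejects(int codePoint) {
        try {
            fromCodePoints(codePoint);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fromCodePoints(67, 68, 97));               // CDa
        String mixed = fromCodePoints('A', 0x1F600, 'B');
        System.out.println(mixed.length());                           // 4 chars
        System.out.println(mixed.codePointCount(0, mixed.length()));  // 3 code points
        System.out.println(Character.isBmpCodePoint(0x1F600));        // false
        System.out.println(Character.isHighSurrogate(mixed.charAt(1)));
        System.out.println(rejects(0x110000));                        // beyond U+10FFFF
    }
}
```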
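6 public String(byte ascii[], int hibyte, int offset, int count) is deprecated. It does not convert bytes into characters properly: it performs no charset decoding and simply uses hibyte as the high 8 bits of every char, so the result is only correct when each character's 16-bit code unit really is that high byte followed by one byte of data. Let's look at the source.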

    // hibyte: the top 8 bits of each 16-bit Unicode code unit
    @Deprecated
    public String(byte ascii[], int hibyte, int offset, int count) {
        checkBounds(ascii, offset, count);
        char value[] = new char[count];

        if (hibyte == 0) {
            for (int i = count; i-- > 0;) {
                value[i] = (char)(ascii[i + offset] & 0xff);
                // a byte is 8 bits wide; & 0xff keeps just those 8 bits (no sign extension) before the cast to char
            }
        } else {
            hibyte <<= 8;
            for (int i = count; i-- > 0;) {
                value[i] = (char)(hibyte | (ascii[i + offset] & 0xff));
                // the result combines hibyte as the high 8 bits with the byte as the low 8 bits
            }
        }
        this.value = value;
    }
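The bit arithmetic is easier to follow in isolation. Rather than calling the deprecated constructor, the sketch below reproduces its per-byte expression in a hypothetical helper (HiByteDemo.combine) and shows what goes wrong when the & 0xff mask is left out.

```java
final class HiByteDemo {
    // Same expression as in the constructor: hibyte becomes bits 8..15, the byte bits 0..7.
    static char combine(int hibyte, byte low) {
        return (char) ((hibyte << 8) | (low & 0xff));
    }

    // For comparison only: casting without the mask sign-extends negative bytes.
    static char withoutMask(byte low) {
        return (char) low;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(combine(0x4E, (byte) 0x2D)));  // 4e2d
        System.out.println(Integer.toHexString(combine(0, (byte) 0xE9)));     // e9
        System.out.println(Integer.toHexString(withoutMask((byte) 0xE9)));    // ffe9
    }
}
```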

7 Decode the bytes using the charset named by String charsetName

    public String(byte bytes[], int offset, int length, String charsetName)
            throws UnsupportedEncodingException {
        if (charsetName == null)
            throw new NullPointerException("charsetName");
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(charsetName, bytes, offset, length);
    }
    // An equivalent constructor for the whole array:
    public String(byte bytes[], String charsetName)
            throws UnsupportedEncodingException {
        this(bytes, 0, bytes.length, charsetName);
    }

8 Similar to 7, but decodes the bytes with a Charset charset object

    public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBounds(bytes, offset, length);
        this.value =  StringCoding.decode(charset, bytes, offset, length);
    }
    // An equivalent constructor for the whole array:
    public String(byte bytes[], Charset charset) {
        this(bytes, 0, bytes.length, charset);
    }
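A short sketch of constructors 7 and 8 in use. CharsetDecodeDemo and its helpers are hypothetical names, and "no-such-charset" is a deliberately bogus charset name chosen to trigger the checked exception of the String-based variant.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class CharsetDecodeDemo {
    // Two CJK characters, written as escapes so the source file encoding does not matter.
    static final String TEXT = "\u4e2d\u6587";

    static byte[] utf8Bytes() {
        return TEXT.getBytes(StandardCharsets.UTF_8); // 3 bytes per character here
    }

    // Charset overload: no checked exception.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decoding with the wrong charset still succeeds but yields one char per byte.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // charsetName overload: an unknown name surfaces as UnsupportedEncodingException.
    static boolean rejectsUnknownName(byte[] bytes) {
        try {
            new String(bytes, "no-such-charset");
            return false;
        } catch (UnsupportedEncodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = utf8Bytes();
        System.out.println(decodeUtf8(bytes).equals(TEXT));
        System.out.println(decodeLatin1(bytes).length());
        System.out.println(rejectsUnknownName(bytes));
    }
}
```

When the charset is known at compile time, the Charset overload with a StandardCharsets constant avoids both the name lookup and the checked exception.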

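9 There are also constructors that do not name a charset; they decode with the platform's default charset (in StringCoding, ISO-8859-1 is only the fallback used when the default charset is not supported)

    public String(byte bytes[], int offset, int length) {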
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(bytes, offset, length);
    }
    // An equivalent constructor for the whole array:
    public String(byte bytes[]) {
        this(bytes, 0, bytes.length);
    }
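One way to see which charset these constructors use is to compare against an explicit Charset.defaultCharset() decode. This is a sketch with a hypothetical class name (DefaultCharsetDemo); the actual default depends on the JVM and platform, so no specific charset is assumed.

```java
import java.nio.charset.Charset;

final class DefaultCharsetDemo {
    // True when the charset-less constructor agrees with an explicit default-charset decode.
    static boolean matchesDefaultCharset(byte[] bytes) {
        String implicit = new String(bytes);
        String explicit = new String(bytes, Charset.defaultCharset());
        return implicit.equals(explicit);
    }

    public static void main(String[] args) {
        System.out.println(Charset.defaultCharset()); // varies by platform and JVM settings
        System.out.println(matchesDefaultCharset(new byte[] {72, 105}));
    }
}
```

Since the outcome varies between machines, code that exchanges bytes with other systems is generally safer naming the charset explicitly.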

10 StringBuffer is supported; the copy is made while holding the buffer's lock, so the construction is thread-safe

    public String(StringBuffer buffer) {
        synchronized(buffer) {
            this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
        }
    }

11 StringBuilder is supported (without synchronization, since StringBuilder itself is not thread-safe)

    public String(StringBuilder builder) {
        this.value = Arrays.copyOf(builder.getValue(), builder.length());
    }
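Both constructors copy the buffer contents, so the resulting String is a snapshot. A minimal sketch, with BufferSnapshotDemo as a made-up class name:

```java
final class BufferSnapshotDemo {
    // The String keeps the builder's contents as of construction time.
    static String snapshotOfBuilder() {
        StringBuilder builder = new StringBuilder("abc");
        String s = new String(builder);
        builder.append("d"); // later edits do not affect s
        return s;
    }

    // Same idea for StringBuffer; the constructor locks the buffer while copying.
    static String snapshotOfBuffer() {
        StringBuffer buffer = new StringBuffer("xyz");
        String s = new String(buffer);
        buffer.reverse();
        return s;
    }

    public static void main(String[] args) {
        System.out.println(snapshotOfBuilder()); // abc
        System.out.println(snapshotOfBuffer());  // xyz
    }
}
```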
