JDK Study Notes: the AbstractStringBuilder abstract class, StringBuffer, and StringBuilder

人非生而知之者

Article link: https://blog.csdn.net/u013815649/article/details/50417216

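With some free time today, I read an article on codeceo.com (码农网) about StringBuilder and StringBuffer. The original is at http://www.codeceo.com/article/java-stringbuilder-performance.html, and this is the part I am most grateful for:

The initial capacity matters so much that it is worth saying four times.

Internally, StringBuilder holds a char[]; calling append() repeatedly just keeps filling that char[].

With new StringBuilder(), the char[] has a default length of 16. So what happens when you append the 17th character?

The array is grown by roughly doubling it, and the contents are copied over with System.arraycopy!

This costs an array copy each time, and the old char[] is wasted and left for the GC. Consider a string of 129 characters: it passes through 16, 32, 64 and 128, four rounds of copying and discarding, and 496 chars of arrays are allocated in total, which is close to intolerable in high-performance code. (Those figures assume pure doubling. With the value.length * 2 + 2 rule in expandCapacity, appending one char at a time actually gives capacities 16, 34, 70 and 142, i.e. three copies and 262 chars allocated, but the point stands.)

That is why setting a sensible initial capacity matters so much.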
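
To watch the growth happen, here is a minimal sketch that appends one char at a time and records every capacity it observes. StringBuilder.capacity() is a real API; the class and method names (CapacityGrowthDemo, observedCapacities) are made up for illustration, and the exact growth steps are an implementation detail that may differ between JDKs.

```java
import java.util.ArrayList;
import java.util.List;

final class CapacityGrowthDemo {
    // Appends `total` chars one at a time and records each distinct capacity seen.
    static List<Integer> observedCapacities(StringBuilder sb, int total) {
        List<Integer> capacities = new ArrayList<>();
        capacities.add(sb.capacity());
        for (int i = 0; i < total; i++) {
            sb.append('x');
            int current = sb.capacity();
            if (current != capacities.get(capacities.size() - 1)) {
                capacities.add(current); // the backing array was reallocated
            }
        }
        return capacities;
    }

    public static void main(String[] args) {
        // Default capacity: several reallocations on the way to 129 chars.
        System.out.println("default:  " + observedCapacities(new StringBuilder(), 129));
        // Pre-sized: the backing array never has to grow.
        System.out.println("presized: " + observedCapacities(new StringBuilder(129), 129));
    }
}
```

If the final length is roughly known up front, passing it to the constructor avoids every intermediate array.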

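Until now I only knew that string concatenation should use StringBuilder rather than +, which is the shallowest possible understanding. Rather embarrassed, I decided to start working through the JDK source code from today, beginning with StringBuilder!

Both StringBuilder and StringBuffer extend AbstractStringBuilder. AbstractStringBuilder is an abstract class (not an interface) that implements the Appendable and CharSequence interfaces, so let's look at those two interfaces first: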

package java.lang;

public interface CharSequence {
    /**
     * Returns the length of this character sequence.  The length is the number
     * of 16-bit <code>char</code>s in the sequence.
     */
    int length();

     /**
     * Returns the <code>char</code> value at the specified index.  An index ranges from zero
     * to <tt>length() - 1</tt>.  The first <code>char</code> value of the sequence is at
     * index zero, the next at index one, and so on, as for array
     * indexing. </p>
     */
     char charAt(int index);

     /**
     * Returns a new <code>CharSequence</code> that is a subsequence of this sequence.
     * The subsequence starts with the <code>char</code> value at the specified index and
     * ends with the <code>char</code> value at index <tt>end - 1</tt>.  The length
     * (in <code>char</code>s) of the
     * returned sequence is <tt>end - start</tt>, so if <tt>start == end</tt>
     * then an empty sequence is returned. </p>   */
     CharSequence subSequence(int start, int end);

     /**
     * Returns a string containing the characters in this sequence in the same
     * order as this sequence.  The length of the string will be the length of
     * this sequence. </p>
     */
     public String toString();
}

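I kept only some of the comments and left out things like the great authors' names; there are simply too many comments! This is a character-sequence interface: it can return the length of the sequence, the char at a given position, a subsequence, and of course toString().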
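
Because String, StringBuilder and StringBuffer all implement CharSequence, a method written against the interface accepts any of them. A small sketch (CharSequenceDemo and countDigits are hypothetical names used only here):

```java
final class CharSequenceDemo {
    // Counts ASCII digits using only the CharSequence API: length() and charAt().
    static int countDigits(CharSequence seq) {
        int digits = 0;
        for (int i = 0; i < seq.length(); i++) {
            char c = seq.charAt(i);
            if (c >= '0' && c <= '9') {
                digits++;
            }
        }
        return digits;
    }

    public static void main(String[] args) {
        System.out.println(countDigits("a1b22"));                          // a String
        System.out.println(countDigits(new StringBuilder("2015-12-28")));  // a StringBuilder
        // subSequence() returns another CharSequence, so it can be passed straight back in.
        System.out.println(countDigits(new StringBuffer("no digits").subSequence(0, 2)));
    }
}
```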

Next is the Appendable interface:

package java.lang;

import java.io.IOException;

/**
 * An object to which <tt>char</tt> sequences and values can be appended.  The
 * <tt>Appendable</tt> interface must be implemented by any class whose
 * instances are intended to receive formatted output from a {@link
 * java.util.Formatter}.
 */
public interface Appendable {
     /**
     * Appends the specified character sequence to this <tt>Appendable</tt>.
     *
     * <p> Depending on which class implements the character sequence
     * <tt>csq</tt>, the entire sequence may not be appended.  For
     * instance, if <tt>csq</tt> is a {@link java.nio.CharBuffer} then
     * the subsequence to append is defined by the buffer's position and limit.
     *
     * @param  csq
     *         The character sequence to append.  If <tt>csq</tt> is
     *         <tt>null</tt>, then the four characters <tt>"null"</tt> are
     *         appended to this Appendable.
     *
     * @return  A reference to this <tt>Appendable</tt>
     *
     * @throws  IOException
     *          If an I/O error occurs
     */
     Appendable append(CharSequence csq) throws IOException;

     /**
     * Appends a subsequence of the specified character sequence to this
     * <tt>Appendable</tt>.
     *
     * <p> An invocation of this method of the form <tt>out.append(csq, start,
     * end)</tt> when <tt>csq</tt> is not <tt>null</tt>, behaves in
     * exactly the same way as the invocation
     *
     * <pre>
     *     out.append(csq.subSequence(start, end)) </pre>
     *
     * @param  csq
     *         The character sequence from which a subsequence will be
     *         appended.  If <tt>csq</tt> is <tt>null</tt>, then characters
     *         will be appended as if <tt>csq</tt> contained the four
     *         characters <tt>"null"</tt>.
     *
     * @param  start
     *         The index of the first character in the subsequence
     *
     * @param  end
     *         The index of the character following the last character in the
     *         subsequence
     *
     * @return  A reference to this <tt>Appendable</tt>
     *
     * @throws  IndexOutOfBoundsException
     *          If <tt>start</tt> or <tt>end</tt> are negative, <tt>start</tt>
     *          is greater than <tt>end</tt>, or <tt>end</tt> is greater than
     *          <tt>csq.length()</tt>
     *
     * @throws  IOException
     *          If an I/O error occurs
     */
     Appendable append(CharSequence csq, int start, int end) throws IOException;

      /**
     * Appends the specified character to this <tt>Appendable</tt>.
     *
     * @param  c
     *         The character to append
     *
     * @return  A reference to this <tt>Appendable</tt>
     *
     * @throws  IOException
     *          If an I/O error occurs
     */
     Appendable append(char c) throws IOException;
}

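The comments are superbly written; I could not bring myself to delete them. This interface is "An object to which char sequences and values can be appended". Java took the "append" out of "can be appended" and gave it an interface of its own, which strikes me as meticulous and disciplined (forgive my limited powers of expression). Now, at last, we reach AbstractStringBuilder. To understand anything in Java you have to peel it apart layer by layer, since everything extends and implements such a pile of other things!

Let me cough up some blood first... I had meant to complain about this class's description, but that urge faded away somewhere along the road of reading the code. Here are my notes:

It contains these methods: ensureCapacity(int minimumCapacity), ensureCapacityInternal(int minimumCapacity) and expandCapacity(int minimumCapacity). Their purpose is to ensure that the current capacity, i.e. value.length, is at least minimumCapacity. If it is smaller than that argument, the internal array (value) has to be reallocated, which is the job of expandCapacity(int minimumCapacity):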
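
To see what the separate interface buys you, here is a small sketch in which one method writes to any Appendable, whether that is a StringBuilder or a java.io.StringWriter. AppendableDemo, writePair and render are hypothetical names; only the Appendable methods themselves come from the JDK.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

final class AppendableDemo {
    // Writes "key=value;" using only the three Appendable.append overloads.
    static void writePair(Appendable out, String key, int value) throws IOException {
        out.append(key).append('=').append(Integer.toString(value)).append(';');
    }

    // Renders one pair into the given target and returns the target's text.
    static String render(Appendable target) {
        try {
            writePair(target, "count", 3);
        } catch (IOException e) {
            // In-memory targets should not fail, but the interface declares IOException.
            throw new UncheckedIOException(e);
        }
        return target.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(new StringBuilder()));
        System.out.println(render(new StringWriter()));
    }
}
```

Note that the caller has to deal with IOException when going through the interface, even though StringBuilder's own append methods never throw it.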

 void expandCapacity(int minimumCapacity) {
        int newCapacity = value.length * 2 + 2;
        if (newCapacity - minimumCapacity < 0)
            newCapacity = minimumCapacity;
        if (newCapacity < 0) {
            if (minimumCapacity < 0) // overflow
                throw new OutOfMemoryError();
            newCapacity = Integer.MAX_VALUE;
        }
        value = Arrays.copyOf(value, newCapacity);
    }

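It first computes the old length * 2 + 2; if that is still smaller than minimumCapacity, the new capacity is simply set to minimumCapacity. If the result has overflowed to a negative number, it either throws OutOfMemoryError (when minimumCapacity itself is negative, i.e. has overflowed) or clamps the capacity to Integer.MAX_VALUE. It then grows the array with Arrays.copyOf(value, newCapacity). I naively thought I would not have to dig into any other class; allow me to cough up more blood. Here is the copyOf method: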
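
The capacity arithmetic can be tried in isolation. The sketch below copies the computation from the listing into a pure function (ExpandCapacitySketch and newCapacity are names invented here) so the normal, "jump to minimum" and overflow branches can each be exercised without allocating anything.

```java
final class ExpandCapacitySketch {
    // Mirrors the capacity computation of the expandCapacity listing, without the array copy.
    static int newCapacity(int currentLength, int minimumCapacity) {
        int newCapacity = currentLength * 2 + 2;
        // Written as a subtraction so the comparison still works after int overflow.
        if (newCapacity - minimumCapacity < 0) {
            newCapacity = minimumCapacity;
        }
        if (newCapacity < 0) {
            if (minimumCapacity < 0) { // the requested size itself overflowed
                throw new OutOfMemoryError();
            }
            newCapacity = Integer.MAX_VALUE;
        }
        return newCapacity;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(16, 17));   // doubling is enough
        System.out.println(newCapacity(16, 100));  // doubling is too small, use the minimum
        System.out.println(newCapacity(1_500_000_000, 1_500_000_001)); // 2n + 2 overflows
    }
}
```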

public static char[] copyOf(char[] original, int newLength) {
        char[] copy = new char[newLength];
        System.arraycopy(original, 0, copy, 0,
                         Math.min(original.length, newLength));
        return copy;
    }

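This function first creates a char array of length newLength, then calls arraycopy to copy min(original.length, newLength) chars from the original array into the new array copy, and returns it. At last we have found arraycopy, so let's look at how it is implemented: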
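
A quick way to convince yourself of this description is to re-implement the two steps by hand and compare with Arrays.copyOf. This is only an illustrative sketch; CopyOfSketch and resize are hypothetical names.

```java
import java.util.Arrays;

final class CopyOfSketch {
    // Same two steps as the listing: allocate, then copy min(old, new) chars.
    static char[] resize(char[] original, int newLength) {
        char[] copy = new char[newLength];
        System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
        return copy;
    }

    public static void main(String[] args) {
        char[] abc = {'a', 'b', 'c'};
        // Growing pads the tail with '\0'; shrinking truncates.
        System.out.println(Arrays.equals(resize(abc, 5), Arrays.copyOf(abc, 5)));
        System.out.println(new String(resize(abc, 2)));
    }
}
```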

public static native void arraycopy(Object src, int srcPos,Object dest, int destPos,int length);

It turns out to be a native method. What now? No problem, let's search Baidu!

http://www.360doc.com/content/14/0713/19/1073512_394157835.shtml

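That led me to this article. I felt a little let down on realizing that this method has to be implemented in C, but the efficiency speaks for itself. Then it turned out that the C code goes on to call assembly. Forgive me for being utterly dumbfounded! I had hoped to dig in properly...

OK, on to the next set of notes (about append):

AbstractStringBuilder has many append overloads, taking String, StringBuffer, CharSequence and so on, but most of them follow the same idea. For example: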

public AbstractStringBuilder append(String str) {
       if(str == null) str = "null";
       int len = str.length();
       ensureCapacityInternal(count + len);
       str.getChars(0, len, value, count);
       count += len;
       return this;
}

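Almost all of them check for null first, then call ensureCapacityInternal to make sure the array is large enough (growing it when needed), then call getChars to copy str in. Let's look at getChars. (The listing below checks against count, so it appears to be AbstractStringBuilder's own getChars; String.getChars, which append(String) actually calls, follows the same check-then-arraycopy pattern.)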

public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)
    {
        if (srcBegin < 0)
            throw new StringIndexOutOfBoundsException(srcBegin);
        if ((srcEnd < 0) || (srcEnd > count))
            throw new StringIndexOutOfBoundsException(srcEnd);
        if (srcBegin > srcEnd)
            throw new StringIndexOutOfBoundsException("srcBegin > srcEnd");
        System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
    }

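It first checks for out-of-bounds indices and then calls arraycopy, copying srcEnd - srcBegin chars from the value array, starting at srcBegin, into the dst array, starting at dstBegin.

By the way, there is a delete function that is very interesting: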
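
Putting the pieces together, the append steps can be sketched with a toy builder. TinyBuilder is a hypothetical class written for this note, not the JDK implementation; it inlines a simplified growth rule instead of calling ensureCapacityInternal.

```java
import java.util.Arrays;

final class TinyBuilder {
    private char[] value = new char[16]; // same default capacity as StringBuilder
    private int count;                   // number of chars actually in use

    TinyBuilder append(String str) {
        if (str == null) {
            str = "null";                // null is appended as the four chars "null"
        }
        int len = str.length();
        if (count + len > value.length) {
            // Simplified growth: 2n + 2, or the exact minimum if that is larger.
            value = Arrays.copyOf(value, Math.max(value.length * 2 + 2, count + len));
        }
        str.getChars(0, len, value, count); // copy str into value, starting at index count
        count += len;
        return this;
    }

    int capacity() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }
}
```

Usage: new TinyBuilder().append("hello, ").append((String) null).toString() yields "hello, null".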

public AbstractStringBuilder delete(int start, int end) {
        if (start < 0)
            throw new StringIndexOutOfBoundsException(start);
        if (end > count)
            end = count;
        if (start > end)
            throw new StringIndexOutOfBoundsException();
        int len = end - start;
        if (len > 0) {
            System.arraycopy(value, start+len, value, start, count-end);
            count -= len;
        }
        return this;
    }

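It also just calls arraycopy, yet that really does implement delete: the count - end chars that follow the deleted range (note that start + len equals end) are copied left so that they begin at start, and count shrinks by len. Quite ingenious (or am I just slow?). The more I study, the more I realize how little I know.

Another heavily overloaded function in AbstractStringBuilder is insert(). Here is one of the more complex overloads; once you understand it, the others are straightforward: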
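
The same shifting trick can be tried on a plain array. The sketch below (DeleteSketch is a hypothetical name) repeats the logic of the listing on a local char[] and can be compared against StringBuilder.delete.

```java
final class DeleteSketch {
    // Removes the chars in [start, end) by shifting the tail left with a single arraycopy.
    static String delete(String text, int start, int end) {
        char[] value = text.toCharArray();
        int count = value.length;
        if (start < 0) {
            throw new StringIndexOutOfBoundsException(start);
        }
        if (end > count) {
            end = count; // an end past the data is clamped, not rejected
        }
        if (start > end) {
            throw new StringIndexOutOfBoundsException();
        }
        int len = end - start;
        if (len > 0) {
            // Source and destination overlap; arraycopy handles that correctly.
            System.arraycopy(value, end, value, start, count - end);
            count -= len;
        }
        return new String(value, 0, count);
    }

    public static void main(String[] args) {
        System.out.println(delete("abcdefgh", 2, 5));
        System.out.println(new StringBuilder("abcdefgh").delete(2, 5));
    }
}
```

The stale chars left at the end of the array are simply ignored, because only the first count chars are considered part of the content.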

public AbstractStringBuilder insert(int index, char[] str, int offset,
                                        int len)
    {
        if ((index < 0) || (index > length()))
            throw new StringIndexOutOfBoundsException(index);
        if ((offset < 0) || (len < 0) || (offset > str.length - len))
            throw new StringIndexOutOfBoundsException(
                "offset " + offset + ", len " + len + ", str.length "
                + str.length);
        ensureCapacityInternal(count + len);
        System.arraycopy(value, index, value, index + len, count - index);
        System.arraycopy(str, offset, value, index, len);
        count += len;
        return this;
    }

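This inserts len chars of str, starting at offset, into value at position index. As everywhere in the JDK, it checks bounds first and then ensures the array has enough capacity. The first arraycopy shifts the chars of value from index onward back by len positions to open a gap for str; the second arraycopy copies the requested chars of str into that gap.

The source also contains an elegant reverse() function for reversing the string: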
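
The two copies are easier to follow on a small input. This sketch (InsertSketch is a hypothetical name) performs the same steps on a local array that is allocated large enough up front, standing in for ensureCapacityInternal.

```java
import java.util.Arrays;

final class InsertSketch {
    // Inserts str[offset, offset + len) into text at position index.
    static String insert(String text, int index, char[] str, int offset, int len) {
        int count = text.length();
        if (index < 0 || index > count) {
            throw new StringIndexOutOfBoundsException(index);
        }
        if (offset < 0 || len < 0 || offset > str.length - len) {
            throw new StringIndexOutOfBoundsException("offset " + offset + ", len " + len);
        }
        // Stand-in for ensureCapacityInternal(count + len).
        char[] value = Arrays.copyOf(text.toCharArray(), count + len);
        // Step 1: shift the tail right by len to open a gap at index.
        System.arraycopy(value, index, value, index + len, count - index);
        // Step 2: copy the requested chars into the gap.
        System.arraycopy(str, offset, value, index, len);
        count += len;
        return new String(value, 0, count);
    }

    public static void main(String[] args) {
        char[] source = "xxcdyy".toCharArray();
        System.out.println(insert("abef", 2, source, 2, 2));
        System.out.println(new StringBuilder("abef").insert(2, source, 2, 2));
    }
}
```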

public AbstractStringBuilder reverse() {
        boolean hasSurrogate = false;
        int n = count - 1;
        for (int j = (n-1) >> 1; j >= 0; --j) {
            char temp = value[j];
            char temp2 = value[n - j];
            if (!hasSurrogate) {
                hasSurrogate = (temp >= Character.MIN_SURROGATE && temp <= Character.MAX_SURROGATE)
                    || (temp2 >= Character.MIN_SURROGATE && temp2 <= Character.MAX_SURROGATE);
            }
            value[j] = temp2;
            value[n - j] = temp;
        }
        if (hasSurrogate) {
            // Reverse back all valid surrogate pairs
            for (int i = 0; i < count - 1; i++) {
                char c2 = value[i];
                if (Character.isLowSurrogate(c2)) {
                    char c1 = value[i + 1];
                    if (Character.isHighSurrogate(c1)) {
                        value[i++] = c1;
                        value[i] = c2;
                    }
                }
            }
        }
        return this;
    }

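At its core it swaps the first char with the last and works inward, but it also checks whether each char lies between Character.MIN_SURROGATE (\ud800) and Character.MAX_SURROGATE (\udfff). If any such char was seen, it walks the array once more from start to end: when value[i] satisfies Character.isLowSurrogate() and value[i+1] satisfies Character.isHighSurrogate(), the chars at positions i and i+1 are swapped. The plain reversal has turned every high/low surrogate pair into low/high order, so this second pass puts each valid pair back the right way round. You may wonder why this is necessary, given that Java chars already use Unicode and a single char can hold a Chinese character.
A complete Unicode character is called a code point, while a Java char is a code unit. String stores Unicode text as UTF-16, so a character outside the Basic Multilingual Plane (for example, a Chinese character from one of the very large extension sets) needs two chars. This representation is called a surrogate pair: the first char is the high surrogate and the second is the low surrogate. Points to watch:
To test whether a char is in the surrogate range, use Character.isHighSurrogate()/isLowSurrogate(). To get a complete Unicode code point from a high/low surrogate pair, use Character.toCodePoint() or codePointAt().
A code point may need one char or two, so CharSequence.length() does not tell you how many characters (code points) a string contains; use String.codePointCount()/Character.codePointCount() instead.
To locate the Nth character of a string, you cannot use N as the offset directly; you have to walk from the start of the string, using String/Character.offsetByCodePoints().
To move from the current character to the previous one, offset-- is not enough; use String.codePointBefore()/Character.codePointBefore(), or String/Character.offsetByCodePoints().
To move from the current character to the next one, offset++ is not enough either; you have to determine the length of the current code point first, or use String/Character.offsetByCodePoints().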

(The passage above is excerpted from http://www.jb51.net/article/37399.htm)
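
A short experiment makes the surrogate handling concrete. The sample string below holds one supplementary character, U+1F600, written as its high/low surrogate pair; SurrogateDemo and naiveReverse are hypothetical names, while everything called on String, Character and StringBuilder is standard API.

```java
final class SurrogateDemo {
    // 'a', then U+1F600 encoded as a surrogate pair, then 'b': 4 chars, 3 code points.
    static final String SAMPLE = "a\uD83D\uDE00b";

    // Reverses char by char, ignoring surrogate pairs (what the first loop alone would do).
    static String naiveReverse(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0, j = chars.length - 1; i < j; i++, j--) {
            char tmp = chars[i];
            chars[i] = chars[j];
            chars[j] = tmp;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());                             // counts chars
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length()));   // counts code points
        System.out.println(Integer.toHexString(SAMPLE.codePointAt(1)));  // the full code point
        System.out.println(SAMPLE.offsetByCodePoints(0, 2));             // char index of 'b'

        String naive = naiveReverse(SAMPLE);
        String aware = new StringBuilder(SAMPLE).reverse().toString();
        // After the naive reversal the pair is in low/high order, which is not a valid pair.
        System.out.println(Character.isLowSurrogate(naive.charAt(1)));
        // StringBuilder.reverse() restores the pair to high/low order.
        System.out.println(Character.isHighSurrogate(aware.charAt(1)));
    }
}
```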

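I had planned to write up StringBuilder and StringBuffer as well, but AbstractStringBuilder alone wore me out. Hats off to the designers of Java. Off to change the title; I give up!

Well, I found some time today to look at StringBuilder and StringBuffer. Most of their methods simply call super, that is, the methods in AbstractStringBuilder. The difference between them is that StringBuffer, in order to be safe under multithreading, adds Java's built-in synchronized keyword to most of its methods. That is about all there is to it...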
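
The effect of synchronized can be observed with a small experiment: several threads append to one shared StringBuffer, and no append is lost. SharedAppendDemo and concurrentLength are hypothetical names. Swapping in a StringBuilder here would not be safe; the result would be unpredictable, so this sketch only exercises StringBuffer.

```java
final class SharedAppendDemo {
    // Starts `threads` workers that each append `appendsPerThread` chars to one StringBuffer.
    static int concurrentLength(int threads, int appendsPerThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < appendsPerThread; i++) {
                    shared.append('x'); // synchronized in StringBuffer
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.length();
    }

    public static void main(String[] args) {
        // Every append is accounted for: threads * appendsPerThread.
        System.out.println(concurrentLength(4, 10_000));
    }
}
```

When a builder is confined to a single thread, which is the common case, StringBuilder avoids the locking overhead.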