Java Concurrency Study (11): Exploring LongAdder and LongAccumulator

Article link: https://blog.csdn.net/anLA_/article/details/78680080

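Java 8 added five classes to the java.util.concurrent.atomic package: Striped64, LongAdder, LongAccumulator, DoubleAdder, and DoubleAccumulator. Striped64 is the parent class, and the other four are its concrete implementations for long and double.
Let's start with the parent class, Striped64, since the other classes all follow its structure.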

What is Striped64

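Striped64 works somewhat like an AtomicLong: it maintains a volatile base field plus an array of cells. The cell array holds the values that contending threads want to add or subtract, so contending threads are spread across separate cells instead of all competing for the same variable. When concurrency is high, some of the threads are therefore redirected to the cell array. Before looking at the code structure, one prerequisite concept is worth covering.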

False Sharing

While studying the Striped64 code, I ran into a concept that was new to me, false sharing:

@sun.misc.Contended static final class Cell

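This is how the Cell class is declared, with the @sun.misc.Contended annotation added. What is it for?
A CPU does not read and write main memory one variable at a time. Its cache system stores data in units of cache lines. A cache line is a power-of-two number of contiguous bytes, generally 32 to 256 bytes and most commonly 64 bytes, and it is the smallest unit of data transfer between cache and memory. As a result, variables that are used by two different CPUs may end up next to each other in the same cache line.
In other words, a single cache line can hold both the data that cpu1 is updating and the data that cpu2 is updating.
When that happens, the two cores compete for the line even though they never touch the same variable. If cpu1 gains ownership of the line, the cache subsystem invalidates the corresponding cache line on cpu2. When cpu2 then gains ownership and performs its update, cpu1 has to invalidate its own copy. The line bounces back and forth through the L3 cache, which hurts performance badly. If the competing cores sit in different sockets, the traffic also has to cross the socket interconnect, and the problem can be even worse.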

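Before Java 8 (for example in Java 7), this problem was handled by padding with long fields so that the variable gets a cache line to itself, that is, the cache line contains only my variable: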

public class VolatileLongPadding {
        volatile long p0, p1, p2, p3, p4, p5, p6;
        volatile long v = 0L;
        volatile long q0, q1, q2, q3, q4, q5, q6;
}

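So @sun.misc.Contended was added to solve this problem: it lets the variable that the current thread operates on sit in its own cache line. Note that for classes outside the JDK, the annotation only takes effect when the JVM is started with -XX:-RestrictContended.

For details, see: http://ifeve.com/falsesharing/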
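To make the padding idea concrete, here is a minimal sketch in which two threads each update their own padded counter. The class names PaddingDemo and PaddedLong are made up for illustration. The sketch only shows the layout technique and checks the counts; it does not measure anything, and the JVM is free to lay out fields differently, so manual padding is a heuristic rather than a guarantee.

```java
// Hypothetical demo class; the names are illustrative and not from the JDK.
final class PaddingDemo {
    // Manually padded counter: 7 longs on each side of the hot field,
    // in the style of the VolatileLongPadding class above.
    static final class PaddedLong {
        volatile long p0, p1, p2, p3, p4, p5, p6;
        volatile long value = 0L;
        volatile long q0, q1, q2, q3, q4, q5, q6;
    }

    // Each thread increments only its own counter, so there is no logical
    // sharing; the padding only aims to keep the two hot fields on different cache lines.
    static long[] run(int iterations) {
        PaddedLong[] counters = { new PaddedLong(), new PaddedLong() };
        Thread[] threads = new Thread[counters.length];
        for (int t = 0; t < threads.length; t++) {
            PaddedLong mine = counters[t];
            threads[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    mine.value++; // single writer per counter, so no lost updates
                }
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) {
                th.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { counters[0].value, counters[1].value };
    }

    public static void main(String[] args) {
        long[] result = run(1_000_000);
        System.out.println(result[0] + " " + result[1]);
    }
}
```

To see whether padding matters on a particular machine, one could time this against a variant without the p and q fields, ideally with a benchmark harness rather than a hand-written loop.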

Striped64 Code Structure

@SuppressWarnings("serial")
abstract class Striped64 extends Number {
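    /**
     * Annotated with @sun.misc.Contended to avoid false sharing. A Cell is a slot that
     * holds the value to be added when there is contention; it is updated via CAS.
     */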
    @sun.misc.Contended
    static final class Cell {
        volatile long value;

        Cell(long x) {
            value = x;
        }

        // CAS operation
        final boolean cas(long cmp, long val) {
            return UNSAFE.compareAndSwapLong(this, valueOffset, cmp, val);
        }

        // Unsafe mechanics
        private static final sun.misc.Unsafe UNSAFE;
        private static final long valueOffset;
        static {
            try {
                UNSAFE = sun.misc.Unsafe.getUnsafe();
                Class<?> ak = Cell.class;
                valueOffset = UNSAFE.objectFieldOffset(ak.getDeclaredField("value"));
            } catch (Exception e) {
                throw new Error(e);
            }
        }
    }

    /** Number of CPUS, to place bound on table size */
    static final int NCPU = Runtime.getRuntime().availableProcessors();

    // The cells array; when non-null, its size is a power of 2
    transient volatile Cell[] cells;

    // The base value. Without contention, updates go directly to base via CAS.
    transient volatile long base;

    // Spinlock flag (0 = free, 1 = held), acquired via CAS when creating Cells or resizing cells.
    transient volatile int cellsBusy;
    ...

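The main fields in the code above:
- cells: the array of Cells that hold the values added by threads that ran into contention
- base: the base value. With little or no contention, threads operate directly on base and no Cell is involved, which is the same principle as AtomicLong
- cellsBusy: a spinlock flag that indicates whether some thread is currently creating a Cell or initializing or resizing cells. If the lock is held, a thread does not wait for it; it moves on, for example by changing its probe and retrying, or by falling back to base.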
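The role of cellsBusy can be sketched as a try-lock built on a single CAS. The sketch below is an assumption-laden simplification: it uses AtomicInteger instead of Unsafe, and the class BusyFlagDemo with its methods tryLock, unlock, and tryGrow are hypothetical names, not part of Striped64.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the cellsBusy idea; not the JDK implementation.
final class BusyFlagDemo {
    private final AtomicInteger busy = new AtomicInteger(0); // 0 = free, 1 = held
    private volatile long[] table = new long[2];

    // Try to take the flag exactly once; never block and never spin.
    boolean tryLock() {
        return busy.get() == 0 && busy.compareAndSet(0, 1);
    }

    void unlock() {
        busy.set(0);
    }

    // Doubles the table if the flag is free; returns false if another caller holds it.
    boolean tryGrow() {
        if (!tryLock()) {
            return false;
        }
        try {
            long[] bigger = new long[table.length << 1];
            System.arraycopy(table, 0, bigger, 0, table.length);
            table = bigger;
            return true;
        } finally {
            unlock(); // always release, mirroring the finally blocks in Striped64
        }
    }

    int size() {
        return table.length;
    }
}
```

The cheap read before the CAS mirrors the `cellsBusy == 0 && casCellsBusy()` pattern in the source: a thread that sees the flag already set skips the CAS entirely and does something else.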

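Striped64 has two main methods, longAccumulate and doubleAccumulate. Their implementations are almost identical, so the analysis here uses longAccumulate as the example.

First the main idea. If the CAS on base succeeds, the update is done and the method returns (concurrency is low). Otherwise the thread goes to cells and uses the slot selected by its probe (concurrency is high), so that the value to be applied is stored in a Cell.

If that slot is empty, the thread creates a new Cell holding its value. If the slot is already occupied, some earlier thread also failed on base and registered here, so the current thread tries to CAS its value into that Cell, and on failure it rehashes its probe or expands the table. The values in the cells are only added together with base when the total is computed.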
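The idea described above can be sketched with a much simpler structure. This is not how Striped64 is written: the sketch assumes a fixed-size AtomicLongArray (no lazy Cell creation, no resizing, no padding) and picks the cell from the thread object's hash code instead of a per-thread probe. The class name SimpleStripedCounter and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical, simplified striped counter; not the JDK implementation.
final class SimpleStripedCounter {
    private final AtomicLong base = new AtomicLong();
    private final AtomicLongArray cells; // fixed size, elements are NOT padded
    private final int mask;

    SimpleStripedCounter(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        this.cells = new AtomicLongArray(size);
        this.mask = size - 1;
    }

    void add(long x) {
        long b = base.get();
        if (base.compareAndSet(b, b + x)) {
            return; // fast path: the CAS on base succeeded
        }
        // Slow path: base was contended, so add into the cell chosen by a per-thread hash.
        int h = Thread.currentThread().hashCode();
        cells.addAndGet(h & mask, x);
    }

    // The total is only assembled here, from base plus every cell.
    long sum() {
        long s = base.get();
        for (int i = 0; i < cells.length(); i++) {
            s += cells.get(i);
        }
        return s;
    }

    // Runs `threads` threads that each add 1 `perThread` times, then returns the total.
    static long runDemo(int threads, int perThread) {
        SimpleStripedCounter counter = new SimpleStripedCounter(4);
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.add(1);
                }
            });
            ts[t].start();
        }
        try {
            for (Thread th : ts) {
                th.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.sum();
    }
}
```

Every update lands either in base or in exactly one cell, so after all threads have finished, sum() returns the exact total no matter how the updates were distributed.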

Now for the implementation:

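    /**
     * Handles updates that involve initializing the table, resizing it, or creating new Cells.
     * @param x the long value to add
     * @param fn the update function (functional style), or null for plain addition
     * @param wasUncontended false if the CAS failed before this call
     */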
    final void longAccumulate(long x, LongBinaryOperator fn, boolean wasUncontended) {
        int h;
        if ((h = getProbe()) == 0) {
            // If the current thread's probe is not initialized yet, initialize it.
            ThreadLocalRandom.current(); // force initialization
            h = getProbe();
            wasUncontended = true;
        }
        boolean collide = false; // True if last slot nonempty
        for (;;) {
            Cell[] as;
            Cell a;
            int n;
            long v;
            if ((as = cells) != null && (n = as.length) > 0) {
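                // The cells table has been initialized.
                if ((a = as[(n - 1) & h]) == null) {
                    // The slot for this thread is empty, so a new Cell can be attached here, provided the cellsBusy lock can be acquired.
                    if (cellsBusy == 0) { // Try to attach new Cell
                        // Create a new Cell and try to put it into the slot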
                        Cell r = new Cell(x); // Optimistically create
                        if (cellsBusy == 0 && casCellsBusy()) {
                            boolean created = false;
                            try { // Recheck under lock
                                Cell[] rs;
                                int m, j;
                                if ((rs = cells) != null && (m = rs.length) > 0 && rs[j = (m - 1) & h] == null) {
                                    rs[j] = r;
                                    created = true;
                                }
                            } finally {
                                cellsBusy = 0;
                            }
                            if (created)
                                break;
                            continue; // Slot is now non-empty
                        }
                    }
                    collide = false;  
                } else if (!wasUncontended) // CAS already known to fail
                    wasUncontended = true; // Continue after rehash
                else if (a.cas(v = a.value, ((fn == null) ? v + x : fn.applyAsLong(v, x))))
                    break;
                else if (n >= NCPU || cells != as)
                    collide = false; // At max size or stale
                else if (!collide)
                    collide = true;
                else if (cellsBusy == 0 && casCellsBusy()) {
                    try {
                        // Resize: double the table.
                        if (cells == as) { // Expand table unless stale
                            Cell[] rs = new Cell[n << 1];
                            for (int i = 0; i < n; ++i)
                                rs[i] = as[i];
                            cells = rs;
                        }
                    } finally {
                        cellsBusy = 0;
                    }
                    collide = false;
                    continue; // Retry with expanded table
                }
                h = advanceProbe(h);
            } else if (cellsBusy == 0 && cells == as && casCellsBusy()) {
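                // This branch initializes the cells table.
                // The direct update of base did not succeed, so we turn to cells.
                // cells is null and cellsBusy is 0: take the lock, create the table, and store x in a new Cell.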
                boolean init = false;
                try { // Initialize table
                    if (cells == as) {
                        Cell[] rs = new Cell[2];             // the initial size of cells is 2
                        rs[h & 1] = new Cell(x);             // create a Cell slot for the x to be added
                        cells = rs;
                        init = true;
                    }
                } finally {
                    // Release the lock
                    cellsBusy = 0;
                }
                if (init)
                    break;
            } else if (casBase(v = base, ((fn == null) ? v + x : fn.applyAsLong(v, x))))

                // cells is not initialized and another thread holds cellsBusy, so try to update base with CAS instead
                break; // Fall back on using base
        }
    }
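One detail in the code above is the slot index `(n - 1) & h`. Because the table size n is always a power of two, masking with n - 1 keeps only the low bits of the probe, which gives the same result as a non-negative modulo. The small check below illustrates this; the class MaskIndexDemo and the probe values are made up for the example.

```java
// Hypothetical helper class to illustrate the index computation; not from the JDK.
final class MaskIndexDemo {
    // Same expression that longAccumulate uses to pick a slot.
    static int slot(int h, int n) {
        return (n - 1) & h;
    }

    // Returns true if masking agrees with floorMod for a handful of probe values.
    static boolean matchesFloorMod(int n) {
        int[] probes = { 0, 1, 7, 8, -1, -17, Integer.MIN_VALUE, Integer.MAX_VALUE };
        for (int h : probes) {
            if (slot(h, n) != Math.floorMod(h, n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesFloorMod(16)); // power of two
        System.out.println(matchesFloorMod(6));  // not a power of two
    }
}
```

This is also why the table is grown by doubling (`n << 1`): the size stays a power of two, so the mask keeps working after a resize.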

doubleAccumulate is basically the same code as longAccumulate. However, the value field in Cell is a long, so the code performs the following conversion to store the bits of the double in a long:

Cell r = new Cell(Double.doubleToRawLongBits(x));
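The same trick can be tried outside of Striped64. The sketch below keeps the raw bits of a double in an AtomicLong and updates them in a CAS loop. It assumes AtomicLong as a stand-in for a Cell, and the class name DoubleBitsDemo is hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a double accumulated through its long bit pattern.
final class DoubleBitsDemo {
    // Holds the raw bits of a double, in the way Cell.value does in doubleAccumulate.
    private final AtomicLong bits = new AtomicLong(Double.doubleToRawLongBits(0.0));

    void add(double x) {
        long oldBits;
        long newBits;
        do {
            oldBits = bits.get();
            // Decode, add as a double, then encode the result back into long bits.
            newBits = Double.doubleToRawLongBits(Double.longBitsToDouble(oldBits) + x);
        } while (!bits.compareAndSet(oldBits, newBits));
    }

    double get() {
        return Double.longBitsToDouble(bits.get());
    }
}
```

The CAS compares bit patterns, not numeric values, which is exactly what is needed here: the update is applied only if no other thread changed the stored bits in between.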

Next, let's look at the two groups of concrete subclasses.

LongAdder and LongAccumulator

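According to Doug Lea's description, LongAdder outperforms AtomicLong under high contention. From the analysis above, the two perform about the same when concurrency is low. Once contention passes a certain level, however, CAS attempts fail more and more often. AtomicLong has no alternative but to retry on the same variable, whereas LongAdder can divert part of the traffic to the cells array.

Now to the main topic, LongAdder. First, its methods: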

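public void add(long x): adds x
public void increment(): adds one
public void decrement(): subtracts one
public long sum(): returns the sum of base and all cells (not an atomic snapshot while updates are in progress)
public void reset(): resets base and every Cell to zero
public long sumThenReset(): returns the sum and resets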
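A short usage sketch of these methods follows. The class LongAdderDemo and its helper countWithThreads are hypothetical names; only the LongAdder calls come from the JDK.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class showing the LongAdder methods listed above.
final class LongAdderDemo {
    // Starts `threads` threads that each call increment() `perThread` times,
    // waits for all of them, and returns the resulting sum.
    static long countWithThreads(LongAdder adder, int threads, int perThread) {
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    adder.increment();
                }
            });
            ts[t].start();
        }
        try {
            for (Thread th : ts) {
                th.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return adder.sum(); // all writers have finished, so the sum is exact here
    }

    public static void main(String[] args) {
        LongAdder adder = new LongAdder();
        System.out.println(countWithThreads(adder, 4, 10_000));
        adder.add(5);
        adder.decrement();
        System.out.println(adder.sumThenReset());
        System.out.println(adder.sum());
    }
}
```

Because sum() is not an atomic snapshot, the sketch reads it only after every thread has been joined.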

The add method is the one worth examining here; the others are easy to understand:

    public void add(long x) {
        Cell[] as; long b, v; int m; Cell a;
        if ((as = cells) != null || !casBase(b = base, b + x)) {
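        // We get here if cells is non-null or the CAS on base failed, which indicates contention.
        // uncontended starts as true and becomes false only if the CAS on the Cell below fails.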
            boolean uncontended = true;
            if (as == null || (m = as.length - 1) < 0 ||
                (a = as[getProbe() & m]) == null ||
                !(uncontended = a.cas(v = a.value, v + x)))
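                // If as is null, or as has length 0, or the slot a is null,
                // or the CAS on the Cell a failed,
                // fall through to the parent-class method to perform the update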
                longAccumulate(x, null, uncontended);
        }
    }

The idea is quite simple. LongAccumulator, in turn, supports arbitrary custom operations on long values, because a function is passed in to its constructor:

    public LongAccumulator(LongBinaryOperator accumulatorFunction,
                           long identity) {
        this.function = accumulatorFunction;
        base = this.identity = identity;
    }

Programmers can define their own binary LongBinaryOperator and accumulate with it. LongAdder performs addition, whereas LongAccumulator performs whatever operation you define.
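As a small illustration, the sketch below uses LongAccumulator to track a maximum and, for comparison, a sum. The class LongAccumulatorDemo and its methods are hypothetical. Since the order in which updates are applied across threads is not guaranteed, the function should be side-effect-free and should not depend on ordering; Long::max and Long::sum both satisfy that.

```java
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical demo class showing two accumulator functions.
final class LongAccumulatorDemo {
    // Running maximum: the identity is Long.MIN_VALUE so any input replaces it.
    static long maxOf(long[] values) {
        LongAccumulator max = new LongAccumulator(Long::max, Long.MIN_VALUE);
        for (long v : values) {
            max.accumulate(v);
        }
        return max.get();
    }

    // With Long::sum and identity 0 the accumulator behaves like a LongAdder.
    static long sumOf(long[] values) {
        LongAccumulator sum = new LongAccumulator(Long::sum, 0L);
        for (long v : values) {
            sum.accumulate(v);
        }
        return sum.get();
    }

    public static void main(String[] args) {
        long[] data = { 3L, 9L, -2L };
        System.out.println(maxOf(data));
        System.out.println(sumOf(data));
    }
}
```

The identity passed to the constructor is also the value that get() returns before any accumulate call, and the value that reset() restores.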

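DoubleAdder and DoubleAccumulator

DoubleAdder and DoubleAccumulator store the double value as long bits and convert back and forth when computing, using these two methods of java.lang.Double: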

// convert long bits to double
public static native double longBitsToDouble(long bits);
// convert double to long bits
public static native long doubleToRawLongBits(double value);

The implementation follows the same approach as the long versions, so it is not repeated here.
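For completeness, here is a brief usage sketch of the two double classes. The class DoubleClassesDemo and its methods are hypothetical; the values are chosen so that they are exactly representable as doubles.

```java
import java.util.concurrent.atomic.DoubleAccumulator;
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical demo class for DoubleAdder and DoubleAccumulator.
final class DoubleClassesDemo {
    // Adds every value to a DoubleAdder and returns the sum.
    static double total(double[] values) {
        DoubleAdder adder = new DoubleAdder();
        for (double v : values) {
            adder.add(v);
        }
        return adder.sum();
    }

    // Tracks the largest value seen, starting from negative infinity.
    static double largest(double[] values) {
        DoubleAccumulator max = new DoubleAccumulator(Math::max, Double.NEGATIVE_INFINITY);
        for (double v : values) {
            max.accumulate(v);
        }
        return max.get();
    }

    public static void main(String[] args) {
        double[] data = { 0.5, 0.25, 3.0 };
        System.out.println(total(data));
        System.out.println(largest(data));
    }
}
```

Floating-point addition is not associative, so with several threads the result of a DoubleAdder can differ slightly depending on how updates were spread across cells; the single-threaded sketch above avoids that question.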

References:
JDK 1.8 source code
http://ifeve.com/falsesharing/
http://ifeve.com/java8-striped64-and-longadder/
http://ifeve.com/atomiclong-and-longadder/
http://blog.csdn.net/zqz_zqz/article/details/70665941
http://budairenqin.iteye.com/blog/2048257