LongAdder Source Code Analysis

God works

Category: study. Tags: java, multithreading

Article link: https://blog.csdn.net/qq_43689029/article/details/113954574

What is LongAdder
AtomicLong vs. LongAdder: accumulation performance under multithreading
How LongAdder works under the hood
LongAdder source code walkthrough

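What is LongAdder

As the name suggests, this is an accumulator whose unit is a long, i.e. 8 bytes. It is also safe to use from multiple threads, so underneath it must rely on either a lock or CAS. Using lock or synchronized directly would make the locking granularity too coarse, so, as you might guess, it is built on CAS. CAS plus long immediately brings to mind another class, AtomicLong. So how are the two related?

One side note: a CAS only updates a field atomically; it is the volatile modifier that lets plain reads of that field (such as get() or sum()) see the latest value. That is why the shared fields in both LongAdder and AtomicLong are declared volatile, which you can confirm by reading the source or by writing a small test yourself. If you are not familiar with CAS, see the prerequisites section of the post on synchronized lock upgrading.

Let's think about it first: what does LongAdder do? It accumulates safely across threads. But AtomicLong can do that too, so why use LongAdder at all? Where exactly does LongAdder beat AtomicLong? That is the main question here.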
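To make the CAS idea concrete, here is a minimal sketch of a compare-and-set retry loop written against the public AtomicLong API. The class name CasLoopSketch and the method addWithCas are made up for illustration; this is not how the JDK implements addAndGet internally, only the general pattern.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration class; not part of the JDK.
class CasLoopSketch {
    // Adds delta using an explicit compare-and-set retry loop.
    static long addWithCas(AtomicLong counter, long delta) {
        for (;;) {
            long current = counter.get();              // volatile read of the current value
            long next = current + delta;
            if (counter.compareAndSet(current, next)) {
                return next;                           // CAS succeeded, value is updated
            }
            // CAS failed: another thread changed the value first, so retry
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong counter = new AtomicLong();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1_000; j++) {
                    addWithCas(counter, 1);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(counter.get()); // 4000
    }
}
```

Every failed compareAndSet means the loop body runs again, which is where the cost comes from when many threads hit the same variable.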

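AtomicLong vs. LongAdder: accumulation performance under multithreading

First, let's see how to accumulate with AtomicLong from multiple threads. The test uses 4 threads, each adding 1 in a loop 50 times, so the final result should be 200.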

// Needs imports: java.util.ArrayList, java.util.List, java.util.concurrent.atomic.AtomicLong
public void addAtomicLong() {
    AtomicLong atomicLong = new AtomicLong(0);
    List<Thread> list = new ArrayList<>(4);
    long start = System.nanoTime();
    // 4 threads, each adding 1 fifty times
    for (int i = 0; i < 4; i++) {
        Thread t = new Thread(() -> {
            for (int j = 0; j < 50; j++) {
                atomicLong.addAndGet(1);
            }
        });
        list.add(t);
        t.start();
    }
    // Wait for all worker threads to finish
    for (Thread t : list) {
        try {
            t.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
    long end = System.nanoTime();
    System.out.println("Total time: " + (end - start) / 1_000_000 + "ms");
    System.out.println("Sum: " + atomicLong.get());
}

Output:

Total time: 270ms
Sum: 200

Now let's see how long LongAdder takes with the same 4 threads, each adding 1 fifty times.

// Needs imports: java.util.ArrayList, java.util.List, java.util.concurrent.atomic.LongAdder
public void addLongAdder() {
    LongAdder longAdder = new LongAdder();
    List<Thread> list = new ArrayList<>(4);
    long start = System.nanoTime();
    // 4 threads, each adding 1 fifty times
    for (int i = 0; i < 4; i++) {
        Thread t = new Thread(() -> {
            for (int j = 0; j < 50; j++) {
                longAdder.add(1);
            }
        });
        list.add(t);
        t.start();
    }
    // Wait for all worker threads to finish
    for (Thread t : list) {
        try {
            t.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
    long end = System.nanoTime();
    System.out.println("Total time: " + (end - start) / 1_000_000 + "ms");
    System.out.println("Sum: " + longAdder.sum());
}

Output:

Total time: 2ms
Sum: 200

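Running this several times always gave an AtomicLong total far larger than LongAdder's. Bear in mind, though, that with only 200 increments in total, timings like these are dominated by one-time start-up costs (class loading, lambda bootstrap, JIT warm-up) rather than by contention, so the gap only becomes meaningful with many threads and many updates. The underlying reason is that AtomicLong updates its single value with CAS in a retry loop and leaves the loop only when the CAS succeeds; under heavy contention those retries are expensive. LongAdder is also built on CAS and also retries in a loop, so why is it so much faster than AtomicLong? To answer that we need to read the LongAdder source.

Note: the AtomicLong source is not analyzed here; it only serves as a starting point. Please read the AtomicLong source yourself for details.

How LongAdder works under the hood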
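For a comparison that is less sensitive to start-up noise, something like the sketch below may be more informative. It is only a rough illustration: the class name ContendedAddSketch, the thread count and the iteration count are all assumptions, and a proper measurement would use a benchmarking harness such as JMH. No timings are quoted here because they depend heavily on the machine.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical benchmark sketch; names and sizes are made up for illustration.
class ContendedAddSketch {
    // Runs op perThread times on each of `threads` threads; returns elapsed nanoseconds.
    static long run(int threads, int perThread, Runnable op) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    op.run();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int threads = 8, perThread = 1_000_000; // assumed sizes, adjust for your machine

        // Warm-up round so class loading and JIT compilation are not part of the timing.
        run(threads, perThread, new AtomicLong()::incrementAndGet);
        run(threads, perThread, new LongAdder()::increment);

        AtomicLong atomicLong = new AtomicLong();
        long atomicNanos = run(threads, perThread, atomicLong::incrementAndGet);
        LongAdder longAdder = new LongAdder();
        long adderNanos = run(threads, perThread, longAdder::increment);

        System.out.println("AtomicLong: " + atomicNanos / 1_000_000 + "ms, sum=" + atomicLong.get());
        System.out.println("LongAdder:  " + adderNanos / 1_000_000 + "ms, sum=" + longAdder.sum());
    }
}
```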

public class LongAdder extends Striped64 implements Serializable

We can see that LongAdder extends the Striped64 class. Striped64 is a component added in JDK 8 to support accumulators, so the real core of LongAdder is Striped64, and the core of Striped64 is the Cell class:

@sun.misc.Contended static final class Cell {
        volatile long value;
        Cell(long x) { value = x; }
        final boolean cas(long cmp, long val) {
            return UNSAFE.compareAndSwapLong(this, valueOffset, cmp, val);
        }

        // Unsafe mechanics
        private static final sun.misc.Unsafe UNSAFE;
        private static final long valueOffset;
        static {
            try {
                UNSAFE = sun.misc.Unsafe.getUnsafe();
                Class<?> ak = Cell.class;
                valueOffset = UNSAFE.objectFieldOffset
                    (ak.getDeclaredField("value"));
            } catch (Exception e) {
                throw new Error(e);
            }
        }
    }

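Internally, Striped64 maintains cells, an array of Cell (plus a base field that is used while there is no contention). Each time a thread adds, it adds to one particular cell in cells, and the final result is obtained by summing the values of all the cells together with base. This is one of the reasons LongAdder is faster than AtomicLong: AtomicLong is effectively operating on a single cell, whereas LongAdder spreads the updates over several. While one thread is adding to cells[0], nothing stops another thread from adding to cells[1].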
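The striping idea can be sketched in a few lines. This is a deliberately simplified illustration, not the Striped64 implementation: StripedCounterSketch is a made-up class, it uses a fixed-size AtomicLongArray instead of a growable Cell array, and it picks a slot from the thread id rather than from a per-thread probe value. Adjacent slots of the array may also still share a cache line.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical simplified striped counter; the real Striped64 is considerably more involved.
class StripedCounterSketch {
    private final AtomicLongArray stripes;
    private final int mask;

    StripedCounterSketch(int stripeCount) {
        if (Integer.bitCount(stripeCount) != 1) {
            throw new IllegalArgumentException("stripeCount must be a power of two");
        }
        stripes = new AtomicLongArray(stripeCount);
        mask = stripeCount - 1;
    }

    // Each thread adds to the slot chosen by its id, so threads tend to hit different slots.
    void add(long x) {
        int h = Long.hashCode(Thread.currentThread().getId());
        stripes.addAndGet(h & mask, x);
    }

    // The total is the sum over all slots.
    long sum() {
        long s = 0;
        for (int i = 0; i < stripes.length(); i++) {
            s += stripes.get(i);
        }
        return s;
    }

    // Runs `threads` threads that each add 1 perThread times and returns the total.
    static long demo(int threads, int perThread) {
        StripedCounterSketch counter = new StripedCounterSketch(8);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.add(1);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1_000)); // 4000
    }
}
```

As with LongAdder.sum(), the total is only exact once the writers have finished; a sum taken while updates are in flight is a moving estimate rather than an atomic snapshot.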
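The cells array was described as one of the reasons it is fast, so what is the other?
You may have noticed the @sun.misc.Contended annotation. It pads each Cell so that it occupies a cache line of its own. A cache line is typically 64 bytes, and the CPU moves data between caches and memory in units of cache lines. A CPU has three levels of cache, L1, L2 and L3, structured as in the figure below (figure taken from Ma Shibing's slides):
[Figure: memory hierarchy]
When one CPU core modifies data in a cache line in its L1 cache, the copy of that cache line held by any other core is invalidated. If two Cells shared a cache line, every update by one core to its Cell would force the other core to reload the line before touching its own, unrelated Cell, which wastes a lot of time (this is known as false sharing). Giving each Cell its own cache line avoids that, so this is really trading space for time.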
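Since @sun.misc.Contended is not normally available to application code, padding is sometimes approximated by hand. The sketch below shows one common trick, surrounding the hot field with unused long fields through a small class hierarchy. All names here are hypothetical, and whether the JVM actually lays the fields out in this order is an assumption that depends on the JVM version, so treat it as an illustration of the idea rather than a guarantee.

```java
// Hypothetical manual-padding sketch; field layout is ultimately up to the JVM.
class PaddingSketch {
    static class LeftPad { long p1, p2, p3, p4, p5, p6, p7; }          // 56 bytes before the value
    static class ValueHolder extends LeftPad { volatile long value; }  // the hot field
    static class PaddedLong extends ValueHolder { long q1, q2, q3, q4, q5, q6, q7; } // 56 bytes after

    // Two threads, each incrementing only its own counter; returns both final values.
    static long[] run(int iterations) {
        PaddedLong a = new PaddedLong();
        PaddedLong b = new PaddedLong();
        // value++ is safe here only because each counter has exactly one writer thread
        Thread ta = new Thread(() -> { for (int i = 0; i < iterations; i++) a.value++; });
        Thread tb = new Thread(() -> { for (int i = 0; i < iterations; i++) b.value++; });
        ta.start();
        tb.start();
        try {
            ta.join();
            tb.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { a.value, b.value };
    }

    public static void main(String[] args) {
        long[] result = run(1_000_000);
        System.out.println(result[0] + " " + result[1]); // 1000000 1000000
    }
}
```

The padding fields are never read; they only exist to push neighbouring hot fields onto different cache lines, which is the space-for-time trade described above.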

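Note: my explanation of cache lines here is not very thorough. I recommend watching a lesson by Ma Shibing that covers it very well; the Bilibili BV number is BV1Bp4y1W7Qb, third video.

LongAdder source code walkthrough

Let's look at this diagram first:
[Figure: LongAdder]
The method we focus on is add(long x):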

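public void add(long x) {
        Cell[] as; long b, v; int m; Cell a;
        // (as = cells) != null: the cells array has already been created by some thread
        // !casBase(b = base, b + x): first try to CAS the base value; this fails if there is contention
        if ((as = cells) != null || !casBase(b = base, b + x)) {
            // uncontended: assume there is no contention on the Cell until a CAS fails
            boolean uncontended = true;
            // as == null: cells has not been created, so we are here because the CAS on base failed
            // (m = as.length - 1) < 0: the cells array is empty
            // (a = as[getProbe() & m]) == null: getProbe() returns the current thread's probe (hash) value,
            //     which is masked to pick a slot; null means the Cell in that slot has not been created yet
            // !(uncontended = a.cas(v = a.value, v + x)): try one CAS on that Cell; failure means
            //     another thread is competing for the same Cell
            if (as == null || (m = as.length - 1) < 0 ||
                (a = as[getProbe() & m]) == null ||
                !(uncontended = a.cas(v = a.value, v + x)))
                // core method
                longAccumulate(x, null, uncontended);
        }
    }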
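The slot selection getProbe() & m works because the cells length is always a power of two, so masking with length - 1 keeps exactly the low bits of the probe. A small sketch of that arithmetic, using a made-up helper class ProbeMaskSketch and arbitrary sample probe values:

```java
// Hypothetical helper illustrating power-of-two index masking.
class ProbeMaskSketch {
    // For a power-of-two length, probe & (length - 1) always lands in [0, length).
    static int slot(int probe, int length) {
        return probe & (length - 1);
    }

    public static void main(String[] args) {
        int[] probes = { 1, 7, 8, -5, 0x9E3779B9 }; // arbitrary sample values
        for (int length = 2; length <= 8; length <<= 1) {
            for (int probe : probes) {
                int s = slot(probe, length);
                // For power-of-two lengths the mask agrees with a non-negative modulo.
                boolean sameAsMod = s == Math.floorMod(probe, length);
                System.out.println("probe=" + probe + " length=" + length
                        + " slot=" + s + " sameAsMod=" + sameAsMod);
            }
        }
    }
}
```

This is also why the table is grown by doubling in longAccumulate: the length stays a power of two, so the same mask trick keeps working after expansion.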

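Next we step into longAccumulate(x, null, uncontended). We end up in this method whenever there is contention, or when cells (or the Cell this thread maps to) has not been created yet: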

final void longAccumulate(long x, LongBinaryOperator fn,
                              boolean wasUncontended) {
        int h;
        if ((h = getProbe()) == 0) {
            ThreadLocalRandom.current(); // force initialization
            h = getProbe();
            wasUncontended = true;
        }
        boolean collide = false;                // True if last slot nonempty
        for (;;) {
            Cell[] as; Cell a; int n; long v;
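            // Entering this if means the cells array has been created and is not empty
            if ((as = cells) != null && (n = as.length) > 0) {
                // Entering this if means the Cell this thread maps to does not exist yet
                if ((a = as[(n - 1) & h]) == null) {
                    // The lock (cellsBusy) is not held
                    if (cellsBusy == 0) {       // Try to attach new Cell
                        // Create a new Cell up front, holding the value to add
                        Cell r = new Cell(x);   // Optimistically create
                        // The lock is still free and this thread acquires it
                        if (cellsBusy == 0 && casCellsBusy()) {
                            // Assume the Cell has not been installed yet
                            boolean created = false;
                            try {               // Recheck under lock
                                Cell[] rs; int m, j;
                                // Under the lock, recheck that no other thread has installed a Cell in this slot
                                if ((rs = cells) != null &&
                                    (m = rs.length) > 0 &&
                                    rs[j = (m - 1) & h] == null) {
                                    // Install the Cell this thread created earlier
                                    rs[j] = r;
                                    // The Cell has now been installed
                                    created = true;
                                }
                            } finally {
                                cellsBusy = 0;
                            }
                            // If created is true, the value is already stored in the new Cell, so exit the loop
                            // If false, another thread filled the slot before we got the lock, so go around the loop again
                            if (created)
                                break;
                            continue;           // Slot is now non-empty
                        }
                    }
                    collide = false;
                }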
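                // The CAS in add() already failed: skip the CAS this round, then rehash below and retry
                else if (!wasUncontended)       // CAS already known to fail
                    wasUncontended = true;      // Continue after rehash
                // The CAS on the Cell succeeded: leave the loop
                else if (a.cas(v = a.value, ((fn == null) ? v + x :
                                             fn.applyAsLong(v, x))))
                    break;
                // cells has reached at least the number of available CPUs, or cells has been replaced by another thread
                else if (n >= NCPU || cells != as)
                    collide = false;            // At max size or stale
                // First collision: set the flag and rehash once more before resorting to expansion
                else if (!collide)
                    collide = true;
                // Acquire the lock in order to expand the table
                else if (cellsBusy == 0 && casCellsBusy()) {
                    try {
                        // cells has not been replaced in the meantime
                        if (cells == as) {      // Expand table unless stale
                            // Create rs with twice the length of cells
                            Cell[] rs = new Cell[n << 1];
                            // Copy the existing Cell references into rs
                            for (int i = 0; i < n; ++i)
                                rs[i] = as[i];
                            // Publish rs as the new cells
                            cells = rs;
                        }
                    } finally {
                        // Release the lock
                        cellsBusy = 0;
                    }
                    collide = false;
                    continue;                   // Retry with expanded table
                }
                // Give h a new value, i.e. rehash
                h = advanceProbe(h);
            }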
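            // Reaching here means cells has not been initialized; go ahead only if the lock is free,
            // no other thread has initialized cells in the meantime, and this thread acquires the lock
            else if (cellsBusy == 0 && cells == as && casCellsBusy()) {
                // Not initialized yet
                boolean init = false;
                try {                           // Initialize table
                    // cells has still not been changed by another thread
                    if (cells == as) {
                        // Create the Cell array rs
                        Cell[] rs = new Cell[2];
                        // Create a Cell in rs[0] or rs[1] holding the value to add
                        rs[h & 1] = new Cell(x);
                        // Publish rs as cells
                        cells = rs;
                        // Initialized
                        init = true;
                    }
                } finally {
                    // Release the lock
                    cellsBusy = 0;
                }
                // If this thread did the initialization, its value is already stored in cells, so exit the loop
                if (init)
                    break;
            }
            // Reaching here means cells is not initialized and the lock could not be taken
            // (another thread is initializing), so fall back to a CAS on base
            else if (casBase(v = base, ((fn == null) ? v + x :
                                        fn.applyAsLong(v, x))))
                break;                          // Fall back on using base
        }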
    }
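The cellsBusy field in the code above acts as a tiny spin lock: a thread takes it with a CAS from 0 to 1, rechecks the shared state once it holds it, and always resets it to 0 in a finally block. The sketch below reproduces just that pattern with public APIs. BusyFlagSketch, tryInit and demo are made-up names, and AtomicInteger stands in for the Unsafe-based casCellsBusy().

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the "CAS flag + recheck under lock + release in finally" pattern.
class BusyFlagSketch {
    private final AtomicInteger busy = new AtomicInteger(); // 0 = free, 1 = held
    private volatile long[] table;                          // created lazily, only under the flag

    // Returns true only for the call that actually created the table.
    boolean tryInit() {
        long[] seen = table;
        if (seen == null && busy.get() == 0 && busy.compareAndSet(0, 1)) {
            boolean init = false;
            try {
                if (table == seen) {      // recheck under the lock: still not created
                    table = new long[2];
                    init = true;
                }
            } finally {
                busy.set(0);              // always release the flag
            }
            return init;
        }
        return false;                     // table already exists or another thread holds the flag
    }

    int tableLength() {
        long[] t = table;
        return t == null ? 0 : t.length;
    }

    // Lets several threads race to initialize and returns how many of them succeeded.
    static int demo(int threads) {
        BusyFlagSketch sketch = new BusyFlagSketch();
        AtomicInteger creators = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                if (sketch.tryInit()) {
                    creators.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return creators.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(8)); // 1
    }
}
```

A thread that loses the race does not block; in longAccumulate it simply loops again or falls back to a CAS on base, which is why the flag can be this lightweight.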