Netty Source Code Performance Analysis: MpscChunkedArrayQueue & MpscUnboundedArrayQueue & MpscArrayQueue & MpscLinkedAtomicQueue

柳擎

Last modified 2023-01-18 10:30:21

First published 2023-01-15 13:28:21

Article link: https://blog.csdn.net/quyixiao/article/details/128647122

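Preface:

I wrote this post because, while reading the Netty source code, I noticed that Netty does not use the queues that ship with the JDK. Instead it uses the queues from the third-party jctools package. So how do the jctools queues used by Netty differ from the traditional JDK queues? I previously wrote a post on the traditional queues, "ArrayBlockingQueue & LinkedBlockingQueue & DelayQueue & SynchronousQueue & PriorityBlockingQueue source analysis". Today I analyze the queues used in the Netty source code, mainly MpscChunkedArrayQueue, MpscUnboundedArrayQueue and MpscUnboundedAtomicArrayQueue.

MpscChunkedArrayQueue Source Analysis

Before studying the source of MpscChunkedArrayQueue, let's look at an example and use it as the basis for the analysis.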

public static void main(String[] args) {
    MpscChunkedArrayQueue queue = new MpscChunkedArrayQueue(4);
    queue.offer(0);
    System.out.println(queue.poll());
}

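This creates a queue, produces one element into it, takes one element back out, and prints it. The example is simple, but the internals are not, so let's first sort out how MpscChunkedArrayQueue relates to the other classes.
[Figure: class hierarchy of MpscChunkedArrayQueue]
Next, the constructor of MpscChunkedArrayQueue.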

public static final int MAX_POW2 = 1 << 30;

public MpscChunkedArrayQueue(int maxCapacity) {
    super(max(2, min(1024, roundToPowerOfTwo(maxCapacity / 8))), maxCapacity);
}      

public static int roundToPowerOfTwo(final int value) {
	// Throws IllegalArgumentException if value > 2^30 or value < 0
    if (value > MAX_POW2) {
        throw new IllegalArgumentException("There is no larger power of 2 int for value:" + value + " since it exceeds 2^31.");
    }
    if (value < 0) {
        throw new IllegalArgumentException("Given value:" + value + ". Expecting value >= 0.");
    }
	// Integer.numberOfLeadingZeros(int i) returns the number of consecutive zero bits,
	// counted from the most significant bit, in the 32-bit binary representation of i.
	// For example:
	// int x = 1;
	// int z = 5;
	// System.out.println(Integer.toBinaryString(x) + " -> " + Integer.numberOfLeadingZeros(x));
	// System.out.println(Integer.toBinaryString(z) + " -> " + Integer.numberOfLeadingZeros(z));
	// 1 -> 31
	// 101 -> 29
    final int nextPow2 = 1 << (32 - Integer.numberOfLeadingZeros(value - 1));
    return nextPow2;
}

So what is the purpose of the line int nextPow2 = 1 << (32 - Integer.numberOfLeadingZeros(value - 1));?

public static void main(String[] args) {
    for (int i = 0; i < 100; i++) {
        System.out.println("i = " + i + ", pow = " + Pow2.roundToPowerOfTwo(i));
    }
}

Output:

i = 0, pow = 1
i = 1, pow = 1
i = 2, pow = 2
i = 3, pow = 4
i = 4, pow = 4
i = 5, pow = 8
i = 6, pow = 8
i = 7, pow = 8
i = 8, pow = 8
i = 9, pow = 16
i = 10, pow = 16
...
i = 31, pow = 32
i = 32, pow = 32
i = 33, pow = 64
...
i = 63, pow = 64
i = 64, pow = 64
i = 65, pow = 128
...
i = 99, pow = 128

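The example should make the purpose of roundToPowerOfTwo() clear: for any integer n with 2^(j-1) < n <= 2^j it returns 2^j, so the result is always a power of two (the inputs 0 and 1 both give 1). For instance, 2^5 < 33 <= 2^6, so the result is 2^6 = 64. With that, the expression max(2, min(1024, roundToPowerOfTwo(maxCapacity / 8))) is easy to read. For maxCapacity = 4, roundToPowerOfTwo(maxCapacity / 8) is roundToPowerOfTwo(0), which is 1; min(1024, 1) = 1 and max(2, 1) = 2, so super(initialCapacity, maxCapacity) is called with 2 and 4. Why is it designed this way? Honestly, I don't know; what the expression does is make the initial chunk one eighth of the maximum capacity, clamped to the range [2, 1024]. Next we enter the constructor of the parent class MpscChunkedArrayQueueColdProducerFields, knowing that initialCapacity is 2 when maxCapacity is 4.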
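The clamping rule described above can be checked in isolation. The following is a minimal sketch, not JCTools code: the class name InitialChunkSizeSketch and the method initialChunkSize() are made up for illustration, and roundToPowerOfTwo() is re-implemented from the listing above so that the example needs only the JDK.

```java
final class InitialChunkSizeSketch {
    static final int MAX_POW2 = 1 << 30;

    // Same rounding rule as the roundToPowerOfTwo() listing above.
    static int roundToPowerOfTwo(int value) {
        if (value > MAX_POW2 || value < 0) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        return 1 << (32 - Integer.numberOfLeadingZeros(value - 1));
    }

    // Hypothetical helper: mirrors the first argument passed to super(...)
    // in the MpscChunkedArrayQueue(int maxCapacity) constructor.
    static int initialChunkSize(int maxCapacity) {
        return Math.max(2, Math.min(1024, roundToPowerOfTwo(maxCapacity / 8)));
    }

    public static void main(String[] args) {
        for (int maxCapacity : new int[] {4, 16, 64, 1000, 1 << 20}) {
            System.out.println(maxCapacity + " -> " + initialChunkSize(maxCapacity));
        }
    }
}
```

For maxCapacity = 4 the helper yields 2, matching the walkthrough, and for very large capacities the result is capped at 1024.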

MpscChunkedArrayQueueColdProducerFields(int initialCapacity, int maxCapacity) {
    super(initialCapacity);
    RangeUtil.checkGreaterThanOrEqual(maxCapacity, 4, "maxCapacity");
    RangeUtil.checkLessThan(roundToPowerOfTwo(initialCapacity), roundToPowerOfTwo(maxCapacity),
            "initialCapacity");
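    // maxQueueCapacity = roundToPowerOfTwo(maxCapacity) * 2, computed with a left shift.
    // The factor of 2 matches the producer index, which advances by 2 per element.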
    maxQueueCapacity = ((long) Pow2.roundToPowerOfTwo(maxCapacity)) << 1;
}

Next, the super() call of MpscChunkedArrayQueueColdProducerFields.

public BaseMpscLinkedArrayQueue(final int initialCapacity) {
    RangeUtil.checkGreaterThanOrEqual(initialCapacity, 2, "initialCapacity");

    int p2capacity = Pow2.roundToPowerOfTwo(initialCapacity);
    // leave lower bit of mask clear
    // the lowest bit of the mask is always 0
    long mask = (p2capacity - 1) << 1;
    // need extra element to point at next array
    E[] buffer = allocateRefArray(p2capacity + 1);
    // the producer and the consumer initially share the same buffer
    producerBuffer = buffer;
    producerMask = mask;
    consumerBuffer = buffer;
    consumerMask = mask;
    // sets producerLimit, a field of BaseMpscLinkedArrayQueueColdProducerFields
    soProducerLimit(mask); // we know it's all empty to start with
}

public static <E> E[] allocateRefArray(int capacity) {
    return (E[]) new Object[capacity];
}

public static final Unsafe UNSAFE;

abstract class BaseMpscLinkedArrayQueueColdProducerFields<E> extends BaseMpscLinkedArrayQueuePad3<E> {
    private final static long P_LIMIT_OFFSET = fieldOffset(BaseMpscLinkedArrayQueueColdProducerFields.class, "producerLimit");

    private volatile long producerLimit;
    protected long producerMask;
    protected E[] producerBuffer;

    final long lvProducerLimit() {
        return producerLimit;
    }

    final boolean casProducerLimit(long expect, long newValue) {
        return UNSAFE.compareAndSwapLong(this, P_LIMIT_OFFSET, expect, newValue);
    }

    final void soProducerLimit(long newValue) {
    	// ordered (lazy) store, not a CAS
        UNSAFE.putOrderedLong(this, P_LIMIT_OFFSET, newValue);
    }
}

Note the line long mask = (p2capacity - 1) << 1; above. What does it do? Here is an example that prints the mask for every capacity from 0 to 32.
[Figure: mask values for capacities 0 to 32]
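A small program along the following lines prints the masks. It is a sketch under assumptions: the class name MaskSketch is invented, roundToPowerOfTwo() is copied from the earlier listing without its range checks, and the loop starts at 2 because the BaseMpscLinkedArrayQueue constructor shown above rejects a smaller initialCapacity.

```java
final class MaskSketch {
    // Rounding rule from the earlier listing (range checks omitted for brevity).
    static int roundToPowerOfTwo(int value) {
        return 1 << (32 - Integer.numberOfLeadingZeros(value - 1));
    }

    // Same formula as in the BaseMpscLinkedArrayQueue constructor.
    static long mask(int initialCapacity) {
        int p2capacity = roundToPowerOfTwo(initialCapacity);
        return (p2capacity - 1) << 1;
    }

    public static void main(String[] args) {
        for (int capacity = 2; capacity <= 32; capacity++) {
            long mask = mask(capacity);
            int arrayLength = roundToPowerOfTwo(capacity) + 1;
            System.out.println("capacity = " + capacity
                    + ", array length = " + arrayLength
                    + ", mask = " + mask
                    + ", binary = " + Long.toBinaryString(mask));
        }
    }
}
```

Every printed binary string ends in 0, because the mask is produced by a left shift of one bit.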

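All the masks share one property: the lowest bit is 0. That alone does not tell us much, so first consider producerLimit. Its initial value is mask, and mask equals (array length - 2) * 2. From the line E[] buffer = allocateRefArray(p2capacity + 1); we know that the array length is the initial capacity (rounded up to a power of two) plus 1. For example, with initialCapacity = 4 the array length is 5 and mask = (5 - 2) * 2 = 6. Why is producerLimit set to mask? MpscChunkedArrayQueue uses arrays as its backing store, so how does it grow? The first idea that comes to mind is to allocate a new array and move the old data into it, but MpscChunkedArrayQueue takes a different approach that avoids copying the array.
An example:
Suppose the initial capacity of the queue is 4, so the initial buffer array has 4 + 1 slots. Why the extra slot? Because the last slot stores the pointer to the next buffer. Suppose the queue holds 8 elements; the arrays then look like this:
[Figure: linked buffer arrays holding 8 elements]
As shown, every buffer array has the same fixed size (earlier versions supported both fixed and variable chunk sizes), namely the size given by initialCapacity. The last slot of each array holds a pointer to the next array. When the consumer reads a slot and finds JUMP, it must continue reading from the next buffer array.
[Figure: the consumer follows JUMP to the next buffer array]
Before studying the source, I recommend writing an example like the one below and stepping through it in a debugger.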

public static void main(String[] args) {
    MpscChunkedArrayQueue queue = new MpscChunkedArrayQueue(4,16);
    queue.offer(0);
    queue.offer(1);
    queue.offer(2);
    queue.offer(3);
    queue.offer(4);
    queue.poll();
    queue.poll();
    queue.poll();
    queue.poll();
    queue.offer(5);
    queue.offer(6);
    queue.offer(7);
    queue.offer(8);
}

Now let's study the offer() method.

public boolean offer(final E e) {
    if (null == e) {
        throw new NullPointerException();
    }
    long mask;
    E[] buffer;
    long pIndex;
    while (true) {
        long producerLimit = lvProducerLimit();
        pIndex = lvProducerIndex();
        // lower bit is indicative of resize, if we see it we spin until it's cleared
        if ((pIndex & 1) == 1) {
            continue;
        }
        // pIndex is even (lower bit is 0) -> actual index is (pIndex >> 1)
        // mask/buffer may get changed by resizing -> only use for array access after successful CAS.

        mask = this.producerMask;
        buffer = this.producerBuffer;
        // a successful CAS ties the ordering, lv(pIndex) - [mask/buffer] -> cas(pIndex)

        // assumption behind this optimization is that queue is almost always empty or near empty
        // the producer index has reached the limit: take the slow path so the producer does not overrun the consumer
        if (producerLimit <= pIndex) {
            int result = offerSlowPath(mask, pIndex, producerLimit);
            switch (result) {
                case CONTINUE_TO_P_INDEX_CAS:
                    break;
                case RETRY:
                    continue;
                case QUEUE_FULL:
                    return false;
                case QUEUE_RESIZE:
                    resize(mask, buffer, pIndex, e, null);
                    return true;
            }
        }
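        // CAS the producer index. On success, leave the loop: this producer now owns the slot.
        // At this point the index is updated but the element is not yet stored in the array.
        // On failure another producer thread claimed the slot first, so start over.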
        if (casProducerIndex(pIndex, pIndex + 2)) {
            break;
        }
    }
    // INDEX visible before ELEMENT
    // compute the byte offset of this index within the array
    final long offset = modifiedCalcCircularRefElementOffset(pIndex, mask);
    System.out.println(" offer =" + (offset-16)/4 + ", e =" +  e);
    // store the element at that position
    soRefElement(buffer, offset, e); // release element e
    // enqueued successfully
    return true;
}

Start with these two lines.

long producerLimit = lvProducerLimit();
pIndex = lvProducerIndex();

These two lines are just getter calls; there is nothing special to see yet.

 # BaseMpscLinkedArrayQueueColdProducerFields
final long lvProducerLimit() {
	// producerLimit is declared volatile,
    // so a plain field read is already a volatile load and no UNSAFE.getLongVolatile() call is needed
    return producerLimit;
}

 # BaseMpscLinkedArrayQueueProducerFields
public final long lvProducerIndex() {
	// producerIndex is declared volatile
    return producerIndex;
}

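The figure below should make this easy to understand.
[Figure]
Both producerLimit and producerIndex are read as volatile fields and updated with CAS, which is what keeps them safe under many concurrent producers. Now look at the following code. What did the author intend here?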

while(true){
	...
	// lower bit is indicative of resize, if we see it we spin until it's cleared
	if ((pIndex & 1) == 1) {
	    continue;
	}
	... 
}

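Judging from the code, a producer keeps spinning while (pIndex & 1) == 1 and only proceeds once the low bit is clear. From the earlier analysis, pIndex normally advances in steps of 2, so when does it become odd? See the branch else if (casProducerIndex(pIndex, pIndex + 1)) in offerSlowPath() below. If casProducerIndex(pIndex, pIndex + 1) succeeds, the current thread has won the right to resize the array, much like acquiring a lock. Any other thread that attempts the same CAS at that moment is bound to fail, so it has to wait in the while loop until the resize is finished before its offer() can proceed. Unlike Doug Lea's blocking queues, which put threads to sleep and wake them up, the code here simply busy-waits in the while loop. As soon as the resize completes, the other threads continue immediately. This wastes some CPU, but it avoids thread switches and wake-ups, so it is as fast as it gets. This may well be the core philosophy of Netty: be fast first, and worry about memory and CPU cost later.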

private int offerSlowPath(long mask, long pIndex, long producerLimit) {
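	// read the consumer index
    final long cIndex = lvConsumerIndex();
    // in MpscChunkedArrayQueue, getCurrentBufferCapacity() returns the mask,
    // i.e. the buffer capacity in index units: (array length - 2) * 2

    long bufferCapacity = getCurrentBufferCapacity(mask);
    // consumer index + buffer capacity > producer index means some elements
    // of the current array have already been consumed, so no resize is needed:
    // the consumed slots are reused and producerLimit becomes cIndex + bufferCapacity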
    if (cIndex + bufferCapacity > pIndex) {
        if (!casProducerLimit(producerLimit, cIndex + bufferCapacity)) {
            // retry from top
            // the CAS failed, spin and retry
            return RETRY;
        } else {
            // continue to pIndex CAS
            // go on to the CAS on the producer index
            return CONTINUE_TO_P_INDEX_CAS;
        }
    }
    // full and cannot grow
    // decide from the producer and consumer indexes whether the queue is full; an unbounded queue is never full
    else if (availableInQueue(pIndex, cIndex) <= 0) {
        // offer should return false;
        return QUEUE_FULL;
    }
    // grab index for resize -> set lower bit
    // CAS producerIndex to pIndex + 1; an odd value means a resize is in progress
    else if (casProducerIndex(pIndex, pIndex + 1)) {
        // trigger a resize
        return QUEUE_RESIZE;
    } else {
        // failed resize attempt, retry from top
        // Two threads may race on casProducerIndex(pIndex, pIndex + 1).
        // The loser has to wait for the winner to finish the resize,
        // so it goes back to the outer while() loop and spins on
        // if ((pIndex & 1) == 1) {
        //        continue;
        //    }
        return RETRY;
    }
}

protected long getCurrentBufferCapacity(long mask) {
    return mask;
}

 # BaseMpscLinkedArrayQueueProducerFields
final boolean casProducerIndex(long expect, long newValue) {
	// CAS update
    return UNSAFE.compareAndSwapLong(this, P_INDEX_OFFSET, expect, newValue);
}

 # BaseMpscLinkedArrayQueueConsumerFields
public final long lvConsumerIndex() {
	// volatile load of consumerIndex:
    // the field is read directly here, which relies on it being declared volatile,
    // so the read observes the most recently published value
    return consumerIndex;
}
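The role of the low bit of producerIndex, as described before the offerSlowPath() listing, can be sketched with a plain AtomicLong. This is an illustration under assumptions, not the JCTools implementation: the class ResizeFlagSketch and its method names are hypothetical, and AtomicLong stands in for the Unsafe-based field access.

```java
import java.util.concurrent.atomic.AtomicLong;

final class ResizeFlagSketch {
    final AtomicLong producerIndex = new AtomicLong();

    // Fast path: claim one slot by moving the index from an even value to the next even value.
    boolean tryClaim(long seen) {
        return (seen & 1) == 0 && producerIndex.compareAndSet(seen, seen + 2);
    }

    // Slow path: set the low bit; only one thread can win this CAS for a given value.
    boolean tryBeginResize(long seen) {
        return (seen & 1) == 0 && producerIndex.compareAndSet(seen, seen + 1);
    }

    // The winner publishes the next even index, which clears the flag.
    void endResize(long seenBeforeResize) {
        producerIndex.lazySet(seenBeforeResize + 2);
    }

    boolean isResizing() {
        return (producerIndex.get() & 1) == 1;
    }

    public static void main(String[] args) {
        ResizeFlagSketch q = new ResizeFlagSketch();
        System.out.println("claim at 0: " + q.tryClaim(0));
        System.out.println("begin resize at 2: " + q.tryBeginResize(2));
        System.out.println("resizing: " + q.isResizing());
        System.out.println("second resize attempt at 2: " + q.tryBeginResize(2));
        q.endResize(2);
        System.out.println("index after resize: " + q.producerIndex.get());
    }
}
```

While the index is odd, both tryClaim() and tryBeginResize() fail for every other thread, which corresponds to the spin on (pIndex & 1) == 1 in offer().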

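This enqueue method looks simple, but it packs in a lot of low-level knowledge and optimization tricks. Consider a few questions:

Why is producerLimit needed? Couldn't producerIndex simply be compared with consumerIndex?
Many methods are annotated with LoadLoad or StoreStore. What do these mean?
What about the Unsafe methods putOrderedObject() and getLongVolatile()?

Take the second question: what do LoadLoad and StoreStore mean?

LoadLoad can be thought of as a read barrier: loads before the barrier complete before loads after it, so the reader does not act on stale values. StoreStore can be thought of as a write barrier: stores before the barrier become visible before stores after it. If one thread publishes a value behind a StoreStore barrier, another thread only needs a LoadLoad barrier to read the published data, which is why the two are usually used together.

The last question: what about the Unsafe methods putOrderedObject() and getLongVolatile()?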

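Unsafe actually has five groups of similar methods:

putOrderedXxx(): uses a StoreStore barrier. The new value is written out, but it is not guaranteed to be visible to other threads immediately; it is a lazy update.
putXxxVolatile(): uses a StoreLoad barrier. The new value is written out and made visible to other threads before the writer continues.
putXxx(): uses no barrier; a plain write of the field at the given offset of the object.
getXxxVolatile(): uses a LoadLoad barrier; reads the latest value.
getXxx(): uses no barrier; a plain read of the field at the given offset of the object.

In terms of performance, putOrderedXxx() used well is somewhat faster than putXxxVolatile(), but used badly it can cause concurrency bugs. Use it with care, and test for thread safety if you do use it.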
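Application code normally does not call Unsafe directly. As a rough illustration of the same three kinds of access, the sketch below uses the public AtomicLong API: to my knowledge lazySet() is the public counterpart of an ordered store (on JDK 8 it is implemented with Unsafe.putOrderedLong()), get() is a volatile load, and compareAndSet() is a CAS. The class name OrderedStoreSketch and the values are made up for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

final class OrderedStoreSketch {
    // Returns the final value of the simulated producerLimit.
    static long run() {
        AtomicLong producerLimit = new AtomicLong();

        // Ordered (lazy) store, similar in spirit to soProducerLimit(mask).
        producerLimit.lazySet(6);

        // Volatile load, similar in spirit to lvProducerLimit().
        long seen = producerLimit.get();

        // CAS, similar in spirit to casProducerLimit(expect, newValue).
        boolean first = producerLimit.compareAndSet(seen, 14);
        // A second CAS with the stale expected value fails and changes nothing.
        boolean second = producerLimit.compareAndSet(seen, 20);

        System.out.println("first CAS: " + first + ", second CAS: " + second);
        return producerLimit.get();
    }

    public static void main(String[] args) {
        System.out.println("final limit = " + run());
    }
}
```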

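OK, that covers the basics. If it still doesn't make sense, never mind; skip ahead for now. We will look at the dequeue method next and then mentally simulate enqueue and dequeue together.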

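Now consider the line if (cIndex + bufferCapacity > pIndex) {. In what situation is this check needed? Take one scenario: offer 3 elements to the queue, poll() 2 of them, and set a breakpoint inside offer() for the call offer(5).
[Figure: debugger view at the breakpoint]
Look at the breakpoint result.

Why? See the figure below.

[Figure]
That is the reason for the line if (!casProducerLimit(producerLimit, cIndex + bufferCapacity)) {, which uses CAS to change producerLimit to cIndex + bufferCapacity. Why that value? Continuing the example, suppose cIndex = 8, bufferCapacity = 6 and pIndex = 10. producerLimit is updated to cIndex + bufferCapacity = 8 + 6 = 14. The queue currently holds one element, and (14 - 10) / 2 = 2, so the array can take 2 more elements. The CAS in casProducerLimit(producerLimit, cIndex + bufferCapacity) can also fail: if two threads call offer() and reach this line at the same time, one succeeds and one fails. The failing thread restarts the outer loop, while the successful thread goes on to the pIndex CAS and then stores its element in the array.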
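The arithmetic of this step can be written down as a tiny sketch. The class ProducerLimitSketch and both method names are hypothetical; the formulas are the ones used in the paragraph above, with indexes counted in steps of 2.

```java
final class ProducerLimitSketch {
    // New limit after the consumer has freed slots: cIndex + bufferCapacity.
    static long newLimit(long cIndex, long bufferCapacity) {
        return cIndex + bufferCapacity;
    }

    // Number of elements that can still be offered before the limit is hit again.
    static long freeSlots(long producerLimit, long pIndex) {
        return (producerLimit - pIndex) / 2;
    }

    public static void main(String[] args) {
        long limit = newLimit(8, 6);
        System.out.println("new producerLimit = " + limit);
        System.out.println("free slots = " + freeSlots(limit, 10));
    }
}
```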

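Let's go through the four possible return values of offerSlowPath().

RETRY: continue the outer loop.
CONTINUE_TO_P_INDEX_CAS: leave the switch and proceed to the producerIndex CAS; if it succeeds, the element is stored in the array.
QUEUE_FULL: the queue is full. For example, if MpscChunkedArrayQueue is created with maxCapacity = 16 and nothing is consumed, then after 16 elements have been added the next offer gets QUEUE_FULL and no more elements can be added. As for the structure at that point: with initialCapacity = 4 each array has length 5, but one slot holds JUMP and one holds the pointer to the next array, so an array really holds 3 elements. 16 elements therefore occupy 6 arrays, the last of which holds a single element.
QUEUE_RESIZE: casProducerIndex(pIndex, pIndex + 1) succeeded, so producerIndex is now odd, (producerIndex & 1) == 1. The current thread has won the race and can resize the array.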

To recap the growth mechanism: MpscChunkedArrayQueue stores its elements in arrays, but when it needs to grow it does not allocate a larger array and copy the old data over. With an initial capacity of 4, each buffer has 4 + 1 slots, and the last slot stores the pointer to the next buffer.

[Figure: linked buffer arrays holding 8 elements]

Now let's see how the resize is implemented.

private void resize(long oldMask, E[] oldBuffer, long pIndex, E e, Supplier s) {
    assert (e != null && s == null) || (e == null || s != null);
    // compute the length of the new array:
    // MpscChunkedArrayQueue allocates the next buffer with the same length as the old one
    int newBufferLength = getNextBufferSize(oldBuffer);
    final E[] newBuffer;
    try {
    	// allocate the new array
        newBuffer = allocateRefArray(newBufferLength);
    } catch (OutOfMemoryError oom) {
        assert lvProducerIndex() == pIndex + 1;
        soProducerIndex(pIndex);
        throw oom;
    }
	
	// the producer buffer now points to the new array
    producerBuffer = newBuffer;
    // recompute the producer mask: (array length - 2) << 1
    // BaseMpscLinkedArrayQueue initializes the mask as
    // int p2capacity = Pow2.roundToPowerOfTwo(initialCapacity);
    // long mask = (p2capacity - 1) << 1;
    // E[] buffer = allocateRefArray(p2capacity + 1);
    // With initialCapacity = 4:
    // mask = 6
    // buffer length = 4 + 1 = 5
    // newBufferLength equals the old array length = 5
    // newMask = (5 - 2) << 1 = 3 << 1 = 6
    // If the new array has the same length, the new mask equals the old mask.
    // For MpscChunkedArrayQueue the buffer length does not change, so neither does the mask.
    final int newMask = (newBufferLength - 2) << 1;
    producerMask = newMask;

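	// offset of pIndex in the old array; this is where JUMP will be stored
    final long offsetInOld = modifiedCalcCircularRefElementOffset(pIndex, oldMask);
    // offset of pIndex in the new array; this is where the offered element is stored.
    // Note the pattern: when the old and new arrays have the same length, the slot of JUMP
    // in the old array equals the slot of the element in the new array. This matters for poll().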
    final long offsetInNew = modifiedCalcCircularRefElementOffset(pIndex, newMask);
	
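	// ordered stores, not CAS: put the new element into the new array
    soRefElement(newBuffer, offsetInNew, e == null ? s.get() : e);// element in new array
    // link the old array to the new one: the pointer to the new array goes into slot length - 1 of the old array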
    soRefElement(oldBuffer, nextArrayOffset(oldMask), newBuffer);// buffer linked


    // ASSERT code
    final long cIndex = lvConsumerIndex();
    // maxQueueCapacity = ((long) Pow2.roundToPowerOfTwo(maxCapacity)) << 1
    // Note that maxQueueCapacity is roundToPowerOfTwo(maxCapacity) * 2,
    // so availableInQueue / 2 is the number of elements that can still be offered
    final long availableInQueue = availableInQueue(pIndex, cIndex);
    RangeUtil.checkPositive(availableInQueue, "availableInQueue");

    // Invalidate racing CASs
    // update the limit
    // We never set the limit beyond the bounds of a buffer
    // Why is min(newMask, availableInQueue) used as the increment over pIndex?
    // Same example as before: array length 5, mask 6, maxCapacity 16, so maxQueueCapacity = 32.
    // Case 1: the queue holds 3 elements, nothing has been consumed, and offer() is called again.
    // The array must grow: pIndex = 6, availableInQueue = 32 - (6 - 0) = 26,
    // newMask = 6, so producerLimit = 6 + 6 = 12 rather than 6 + 26 = 32.
    // producerLimit = 12 means that, besides the element being added, the new array
    // can take two more: after this element pIndex = 8, after the next one
    // pIndex = 10, after the one after that pIndex = 12. On the following offer()
    // the check [if (producerLimit <= pIndex) {] sees 12 <= 12
    // and a resize is triggered again.
    // Case 2: the queue holds 15 elements, nothing has been consumed,
    // and the 16th element is being offered.
    // Now newMask = 6, pIndex = 30 and availableInQueue = 32 - 30 = 2,
    // so producerLimit is set to 32. After this element is added pIndex = 32,
    // and any further offer() gets QUEUE_FULL: the queue is full.
    soProducerLimit(pIndex + Math.min(newMask, availableInQueue));

    // make resize visible to the other producers
    // pIndex = pIndex + 2
    // update pIndex
    soProducerIndex(pIndex + 2);

    // INDEX visible before ELEMENT, consistent with consumer expectation

    // make resize visible to consumer
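    // Store the sentinel JUMP at offsetInOld in the old array to tell the consumer to move on.
    // When the consumer finds JUMP it knows the array has grown, follows the pointer
    // to the next array, and finds the element to consume
    // at the same slot of that array.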
    soRefElement(oldBuffer, offsetInOld, JUMP);
}

protected long availableInQueue(long pIndex, long cIndex) {
    return maxQueueCapacity - (pIndex - cIndex);
}
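The producerLimit computed at the end of resize() can be reproduced with plain arithmetic. The sketch below is an assumption-laden illustration: the class ResizeLimitSketch is invented, and it takes maxQueueCapacity = 32, which is what the constructor shown earlier gives for MpscChunkedArrayQueue(4, 16), that is roundToPowerOfTwo(16) << 1.

```java
final class ResizeLimitSketch {
    // Same formula as availableInQueue() above, with maxQueueCapacity passed in.
    static long availableInQueue(long maxQueueCapacity, long pIndex, long cIndex) {
        return maxQueueCapacity - (pIndex - cIndex);
    }

    // Mirrors soProducerLimit(pIndex + Math.min(newMask, availableInQueue)).
    static long limitAfterResize(long pIndex, long cIndex, long newMask, long maxQueueCapacity) {
        return pIndex + Math.min(newMask, availableInQueue(maxQueueCapacity, pIndex, cIndex));
    }

    public static void main(String[] args) {
        // First resize: 3 elements in the queue, nothing consumed.
        System.out.println("limit after first resize = " + limitAfterResize(6, 0, 6, 32));
        // Last resize: 15 elements in the queue, nothing consumed.
        System.out.println("limit after last resize = " + limitAfterResize(30, 0, 6, 32));
    }
}
```

In the first case the limit is bounded by the chunk (newMask), in the second by the remaining capacity of the whole queue.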

public static final long REF_ARRAY_BASE;
public static final int REF_ELEMENT_SHIFT;

static {
    final int scale = UnsafeAccess.UNSAFE.arrayIndexScale(Object[].class);
    if (4 == scale) {
        REF_ELEMENT_SHIFT = 2;
    } else if (8 == scale) {
        REF_ELEMENT_SHIFT = 3;
    } else {
        throw new IllegalStateException("Unknown pointer size: " + scale);
    }
    REF_ARRAY_BASE = UnsafeAccess.UNSAFE.arrayBaseOffset(Object[].class);
}

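Two important methods above, modifiedCalcCircularRefElementOffset() and nextArrayOffset(), can be puzzling. Here is their implementation. At first sight it is baffling. What you need to know is that REF_ARRAY_BASE is 16 by default and, with 4-byte references, REF_ELEMENT_SHIFT is 2.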

public static long modifiedCalcCircularRefElementOffset(long index, long mask) {
    return REF_ARRAY_BASE + ((index & mask) << (REF_ELEMENT_SHIFT - 1));
}

public static long nextArrayOffset(long mask) {
 	return modifiedCalcCircularRefElementOffset(mask + 2, Long.MAX_VALUE);
}

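Let's first see where the pointer to the next array is stored, and then where the JUMP object goes. Start with the offset of the next-array pointer. The binary representation of Long.MAX_VALUE is 111111111111111111111111111111111111111111111111111111111111111, so masking with it leaves the index unchanged: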

private static long nextArrayOffset(long mask) {
    return modifiedCalcCircularRefElementOffset(mask + 2, Long.MAX_VALUE);
}

public static long modifiedCalcCircularRefElementOffset(long index, long mask) {
    return REF_ARRAY_BASE + ((index & mask) << (REF_ELEMENT_SHIFT - 1));
}

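Taken together, the two calls amount to REF_ARRAY_BASE + ((mask + 2) << 1), where mask is computed as follows.
int p2capacity = Pow2.roundToPowerOfTwo(initialCapacity);
long mask = (p2capacity - 1) << 1; so the byte offset of the next-array pointer is REF_ARRAY_BASE + ((((p2capacity - 1) << 1) + 2) << 1). REF_ARRAY_BASE is the offset of the first element within a reference array, so relative to REF_ARRAY_BASE the pointer sits at (((p2capacity - 1) << 1) + 2) << 1 = (2 * p2capacity - 2 + 2) << 1 = 4 * p2capacity. Each array element occupies 4 bytes, so the next-array pointer always lives at array index p2capacity. Since the array length is p2capacity + 1, as the line [E[] buffer = allocateRefArray(p2capacity + 1)] in the BaseMpscLinkedArrayQueue constructor shows, the next-array pointer is always at index length - 1 of the old array. Now for the slot of the JUMP object; let's look at an example.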
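The derivation can be checked without JCTools on the classpath. In the sketch below the two constants are assumptions that match the values quoted in this post (array base 16, 4-byte references); the real values come from Unsafe at run time. The class NextPointerSlotSketch and the helper slotOf() are hypothetical.

```java
final class NextPointerSlotSketch {
    static final long REF_ARRAY_BASE = 16;   // assumed, see lead-in
    static final int REF_ELEMENT_SHIFT = 2;  // assumed: 4-byte references

    static long modifiedCalcCircularRefElementOffset(long index, long mask) {
        return REF_ARRAY_BASE + ((index & mask) << (REF_ELEMENT_SHIFT - 1));
    }

    static long nextArrayOffset(long mask) {
        return modifiedCalcCircularRefElementOffset(mask + 2, Long.MAX_VALUE);
    }

    // Converts a byte offset back into an array index.
    static long slotOf(long byteOffset) {
        return (byteOffset - REF_ARRAY_BASE) >> REF_ELEMENT_SHIFT;
    }

    public static void main(String[] args) {
        for (int p2capacity = 2; p2capacity <= 64; p2capacity <<= 1) {
            long mask = (p2capacity - 1L) << 1;
            System.out.println("p2capacity = " + p2capacity
                    + ", array length = " + (p2capacity + 1)
                    + ", next-pointer slot = " + slotOf(nextArrayOffset(mask)));
        }
    }
}
```

For every power-of-two capacity the next-pointer slot equals p2capacity, which is the last index of an array of length p2capacity + 1.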

import org.jctools.util.Pow2;

import static org.jctools.util.UnsafeRefArrayAccess.REF_ARRAY_BASE;
import static org.jctools.util.UnsafeRefArrayAccess.REF_ELEMENT_SHIFT;

public class LinkedArrayQueueUtilTest {


    public static void main(String[] args) {
        int initialCapacity = 4;
        int p2capacity = Pow2.roundToPowerOfTwo(initialCapacity);
        System.out.println("REF_ARRAY_BASE="+REF_ARRAY_BASE);
        System.out.println("Long.MAX_VALUE in binary: " + Long.toBinaryString(Long.MAX_VALUE));
        System.out.println("array length = " + (p2capacity + 1) + ", max index = " + p2capacity);
        long mask = (p2capacity - 1) << 1;
        long arrayIndex = ((((p2capacity - 1) << 1) + 2) << 1) >> 2;

        System.out.println("slot of the next-array pointer: " + arrayIndex + ", p2capacity = " + p2capacity);
        System.out.println("mask = " + mask + ", mask in binary: " + Long.toBinaryString(mask));
        System.out.println("mask + 2 in binary: " + Long.toBinaryString(mask + 2));
        int j = 0 ;
        for (int i = 0; i < 100; i = i + 2) {
            long offset = (modifiedCalcCircularRefElementOffset(i, mask) - REF_ARRAY_BASE) / 4;
            if (i % mask == 0 && i > 0) {
                System.out.println("i= " + i + " jump =" + offset);
                System.out.println("i= " + i + " arrayOffset =" + (nextArrayOffset(mask) - REF_ARRAY_BASE) / 4);
                System.out.println("================================");
                System.out.println("i= " + i + " offset =" + offset + " value = " + (j ++));
            }else{
                System.out.println("i= " + i + " offset =" + offset + " value = " + ((j ++)));
            }
        }
    }
    
    private static long nextArrayOffset(long mask) {
        // Long.MAX_VALUE in binary: 111111111111111111111111111111111111111111111111111111111111111
        return modifiedCalcCircularRefElementOffset(mask + 2, Long.MAX_VALUE);
    }
    public static long modifiedCalcCircularRefElementOffset(long index, long mask) {
        return REF_ARRAY_BASE + ((index & mask) << (REF_ELEMENT_SHIFT - 1));
    }
}

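This example assumes an array of length 5 and, with nothing consumed, simulates inserting 50 elements (producer index 0 to 98 in steps of 2) to observe how modifiedCalcCircularRefElementOffset() behaves.
[Figure: program output]
Notice that with an array of length 5 the pointer to the next array is always stored at index 4, and over four consecutive resizes the JUMP slot is 3, 2, 1, 0 in turn. There is another pattern. When value 3 is inserted, its slot in the first array would be index 3, but the array has reached its capacity. Each array really holds at most 5 - 2 = 3 elements, because two slots are taken by JUMP and the next-array pointer. The first array already holds 0, 1 and 2 when 3 arrives, so a resize is needed, and afterwards 3 is stored at index 3 of the second array. Likewise, when 6 is inserted the second array already holds 3 elements, so it must grow: JUMP is stored at index 2 of the second array and 6 at index 2 of the third array. In other words, on a resize the slot where the current element would have gone in the previous array receives JUMP, and the element itself moves to the same slot of the next array. Remember this property; it is needed when analyzing poll(). You can verify the other cases yourself. This is the observable behavior of modifiedCalcCircularRefElementOffset(). At the time of writing I could not explain the mathematics behind it, only the behavior. If you know why the mask is designed this way, leave me a comment and I will add it to the post.

2023-01-18

I have since spent some time on modifiedCalcCircularRefElementOffset() and finally understand what it means mathematically.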

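import org.jctools.util.Pow2;

import static org.jctools.util.UnsafeRefArrayAccess.REF_ARRAY_BASE;
import static org.jctools.util.UnsafeRefArrayAccess.REF_ELEMENT_SHIFT;

public class LinkedArrayQueueUtilTest2 {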

    public static void main(String[] args) {
        int initialCapacity = 8;
        int p2capacity = Pow2.roundToPowerOfTwo(initialCapacity);
        System.out.println("REF_ARRAY_BASE=" + REF_ARRAY_BASE);
        System.out.println("array length = " + (p2capacity + 1) + ", max index = " + p2capacity);
        long mask = (p2capacity - 1) << 1;
        System.out.println(mask);
        System.out.println(Long.toBinaryString(mask));
        for (long i = 0; i < 100; i = i + 2) {
            BinaryStr binaryStr =  getPreIndex(i, mask);
            System.out.println("index = " + i +(i <10?"  ":" ")+ binaryStr.getIndexStr());
            System.out.println("mask  = " + mask +" "+ binaryStr.getMaskStr());
            System.out.println("=====(index & mask)===" + (i & mask));
            long offset = (modifiedCalcCircularRefElementOffset(i, mask) - REF_ARRAY_BASE) / 4;
            System.out.println(offset);
        }
    }
    
    public static BinaryStr getPreIndex(long index,long mask){
        String indexStr = Long.toBinaryString(index);
        String maskStr = Long.toBinaryString(mask);
        int cha = indexStr.length() - maskStr.length();
        if(cha !=0){
            StringBuilder sb = new StringBuilder();
            int x = cha > 0 ? cha : -cha;
            for(int i = 0 ;i <x ;i ++){
                sb.append("0");
            }
            if(cha > 0 ){
                maskStr = sb.toString() + maskStr;
            }else{
                indexStr = sb.toString() + indexStr;
            }
        }
        return new BinaryStr(indexStr ,maskStr);
    }
    
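    // Minimal holder for the two zero-padded binary strings.
    // Assumed shape: the original BinaryStr class is not included in the listing.
    static final class BinaryStr {
        private final String indexStr;
        private final String maskStr;

        BinaryStr(String indexStr, String maskStr) {
            this.indexStr = indexStr;
            this.maskStr = maskStr;
        }

        String getIndexStr() { return indexStr; }

        String getMaskStr() { return maskStr; }
    }

    // With REF_ELEMENT_SHIFT = 2 this is REF_ARRAY_BASE + ((index & mask) << 1), i.e. REF_ARRAY_BASE + (index & mask) * 2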
    public static long modifiedCalcCircularRefElementOffset(long index, long mask) {
        return REF_ARRAY_BASE + ((index & mask) << (REF_ELEMENT_SHIFT - 1));
    }
}

[Figure: program output]
Take pIndex = 26. In binary, 26 is 11010, while mask = 14 is always 01110. So (index & mask) in modifiedCalcCircularRefElementOffset() is always an even number between 0 and 14, because pIndex always advances in steps of 2. The values 0 to 14 correspond exactly to array indexes 0 to 7: the array indexes are [0,1,2,3,4,5,6,7] and the matching values of pIndex & mask are [0,2,4,6,8,10,12,14]. In addition, int scale = UnsafeAccess.UNSAFE.arrayIndexScale(Object[].class);
yields scale = 4, meaning each element of an Object array occupies 4 bytes. And because pIndex advances by 2, we effectively have (pIndex & mask) = index * 2 = index << 1.
Therefore the offset of an element in the array is offset = REF_ARRAY_BASE + ((pIndex & mask) << (REF_ELEMENT_SHIFT - 1)) = base + ((pIndex & mask) << 1) = base + (index << 2) = base + index * 4, from which it follows that
(pIndex & mask) / 2 = index (where index is the array index)
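The mapping from producer index to array slot can be tabulated directly. This is a minimal sketch with an invented class name, ElementSlotSketch; it uses only the relation (pIndex & mask) / 2 = index stated above.

```java
final class ElementSlotSketch {
    // Array slot for a doubled index: (index & mask) / 2.
    static int slot(long index, long mask) {
        return (int) ((index & mask) >> 1);
    }

    public static void main(String[] args) {
        long mask = 14; // p2capacity = 8, binary 01110
        for (long pIndex = 0; pIndex <= 30; pIndex += 2) {
            System.out.println("pIndex = " + pIndex
                    + ", pIndex & mask = " + (pIndex & mask)
                    + ", slot = " + slot(pIndex, mask));
        }
    }
}
```

With mask = 14 the slots cycle through 0 to 7 and then wrap around, so pIndex = 26 lands in slot 5.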

Next, look at the array structure of the queue after 13 elements have been inserted in a row.

public static void main(String[] args) {
    MpscChunkedArrayQueue queue = new MpscChunkedArrayQueue(4,16);
    queue.offer("e0");
    queue.offer("e1");
    queue.offer("e2");
    queue.offer("e3");
    queue.offer("e4");
    queue.offer("e5");
    queue.offer("e6");
    queue.offer("e7");
    queue.offer("e8");
    queue.offer("e9");
    queue.offer("e10");
    queue.offer("e11");
    queue.offer("e12");
    queue.poll();
    queue.poll();
    queue.poll();
    queue.poll();
}

Queue array structure

[Figure: array structure of the queue]

Now for the implementation of poll().

public E poll() {
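	// the array that stores the elements
    final E[] buffer = consumerBuffer;
    // read consumerIndex; note this is lp (plain load), not lv (volatile load)
    final long cIndex = lpConsumerIndex();
    final long mask = consumerMask;
	// compute the byte offset within the array
    final long offset = modifiedCalcCircularRefElementOffset(cIndex, mask);
    // load the element: the producer published it with an ordered store, and this volatile load observes it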
    Object e = lvRefElement(buffer, offset);
    if (e == null) {
        long pIndex = lvProducerIndex();
        // isEmpty?
        // the queue holds no elements at all, return immediately
        if ((cIndex - pIndex) / 2 == 0) {
            return null;
        }
        // poll() == null iff queue is empty, null element is not strong enough indicator, so we must
        // spin until element is visible.
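        // I see two ways for execution to reach this point.
        // Case 1: cIndex = 0 and nothing has been consumed yet.
        // Step 1, producer: casProducerIndex(pIndex, pIndex + 2) sets producerIndex to 2.
        // Step 2, consumer:
        // offset = modifiedCalcCircularRefElementOffset(cIndex, mask)
        // e = lvRefElement(buffer, offset), and e == null
        // long pIndex = lvProducerIndex();
        // (cIndex - pIndex) / 2 is (0 - 2) / 2 = -1, so the consumer enters the loop below and spins
        // until the producer's soRefElement(buffer, offset, e) has executed.
        // Then e != null and the loop exits. The spin is needed because the producer's
        // casProducerIndex(pIndex, pIndex + 2) and soRefElement(buffer, offset, e) are not atomic as a pair,
        // so the consumer compensates by waiting. From a design standpoint
        // this also serves performance.

        // Case 2 (my original conjecture):
        // two consumer threads both reach final long offset = modifiedCalcCircularRefElementOffset(cIndex, mask);
        // thread 1 executes
        // Object e = lvRefElement(buffer, offset);
        // soRefElement(buffer, offset, null); which clears the slot at offset,
        // and only then thread 2 executes
        // Object e = lvRefElement(buffer, offset); and sees e == null
        // while the queue still holds other elements, (cIndex - pIndex) / 2 != 0,
        // so thread 2 spins in the loop below until the slot at offset is filled again.
        // Note that an Mpsc queue supports only a single consumer thread, so this
        // interleaving cannot happen when the queue is used as intended.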
        do {
            e = lvRefElement(buffer, offset);
        }
        while (e == null);
    }
	// if the element is JUMP, look for the element in the next array
     if (e == JUMP) {
        final E[] nextBuffer = nextBuffer(buffer, mask);
        return newBufferPoll(nextBuffer, cIndex);
    }

    System.out.println(" poll =" + (offset-16)/4 + ", e =" +  e);
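	// remove the consumed element from the array:
	// set the slot to null; the code shown here uses so (ordered store)
    soRefElement(buffer, offset, null); // release element null
    // release cIndex
    // cIndex = cIndex + 2
    // publish the new consumerIndex with an ordered store (StoreStore barrier)
    soConsumerIndex(cIndex + 2); 
    // return the dequeued element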
    return (E) e;
}

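Regarding the spin on e = lvRefElement(buffer, offset); above, you may wonder how a slot whose element has been removed can be filled again. Consider an example.
[Figure: example sequence of offer() and poll() calls]

Again the initial array length is 5, each array holds 3 elements, and the queue can hold at most 16 elements. In the example above, three offer() calls add e0, e1 and e2 to the queue at array indexes 0, 1 and 2. Two poll() calls then consume e0 and e1, freeing indexes 0 and 1. Next, e4 is added. Its computed index is 3 and the array holds only 1 element, so e4 is added successfully and the queue contains [{index=2,e=e2},{index=3,e=e4}]. When e5 is added, the array holds only 2 elements, so no resize is needed; the computed index of e5 is 0 and the array becomes [{index=2,e=e2},{index=3,e=e4},{index=0,e=e5}]. Notice that after e0 at index 0 was consumed and cleared, index 0 can store another element as long as the array has not reached the resize threshold.

Now e6 is added. The array already holds 3 elements, so a resize is needed. A new array, array2, is created. The computed index of e6 is 1, so JUMP goes into index 1 of the old array, the pointer to array2 goes into index 4, and e6 goes into index 1 of array2. The two arrays now look like this:
array1 [{index=2,e=e2},{index=3,e=e4},{index=0,e=e5},{index=1,e=JUMP},{index=4,e=&array2}], array2 [{index=1,e=e6}]
If e7 is then added, array2 becomes [{index=1,e=e6},{index=2,e=e7}]. What happens if queue.poll() is now called 4 times?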

[Figure: array state during the four poll() calls]
The first three poll() calls are straightforward, so I only analyze the fourth. On the fourth call, the index computed from cIndex is 1, so the value is read from index 1. At this point array1 is
[{index=2,e=null},{index=3,e=null},{index=0,e=null},{index=1,e=JUMP},{index=4,e=&array2}] and array2 is [{index=1,e=e6},{index=2,e=e7}]. The value at index 1 is JUMP, which triggers the following code:
if (e == JUMP) {
final E[] nextBuffer = nextBuffer(buffer, mask);
return newBufferPoll(nextBuffer, cIndex);
}
This code looks for the element in array2. It first obtains the reference to array2 from array1[array1.length - 1] and then takes array2[index], which is the element to consume. array2 is [{index=1,e=e6},{index=2,e=e7}] and array2[1] = e6, exactly the element we want. After the code runs, array2 is [{index=1,e=null},{index=2,e=e7}]. array1 no longer holds any usable element and can be garbage collected. With this walkthrough, nextBuffer() and newBufferPoll() should be easy to understand.

private E[] nextBuffer(final E[] buffer, final long mask) {
    final long offset = nextArrayOffset(mask);
    final E[] nextBuffer = (E[]) lvRefElement(buffer, offset);
    consumerBuffer = nextBuffer;
    consumerMask = (length(nextBuffer) - 2) << 1;
    soRefElement(buffer, offset, BUFFER_CONSUMED);
    return nextBuffer;
}


private E newBufferPoll(E[] nextBuffer, long cIndex) {
    final long offset = modifiedCalcCircularRefElementOffset(cIndex, consumerMask);
    final E n = lvRefElement(nextBuffer, offset);
    if (n == null) {
        throw new IllegalStateException("new buffer must have at least one element");
    }

    System.out.println(" consumer moved to the next buffer ");
    System.out.println(" poll Index =" + (offset-16)/4 + ", e =" +  n);
    soRefElement(nextBuffer, offset, null);
    soConsumerIndex(cIndex + 2);
    return n;
}
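To tie offer(), resize() and poll() together, here is a deliberately simplified single-threaded model of the chunk-linking scheme. It is a sketch under stated assumptions, not the JCTools implementation: the class ChunkedQueueSketch is invented, there is no CAS or memory ordering, the queue is unbounded (no maxCapacity check), and array slots are addressed by index instead of byte offset.

```java
final class ChunkedQueueSketch {
    private static final Object JUMP = new Object();

    private final int chunkCapacity;   // assumed to be a power of two >= 2
    private final long mask;           // (chunkCapacity - 1) << 1, low bit always 0
    private Object[] producerBuffer;
    private Object[] consumerBuffer;
    private long pIndex;               // advances by 2 per offered element
    private long cIndex;               // advances by 2 per polled element
    private long producerLimit;

    ChunkedQueueSketch(int chunkCapacity) {
        this.chunkCapacity = chunkCapacity;
        this.mask = (chunkCapacity - 1L) << 1;
        this.producerBuffer = new Object[chunkCapacity + 1]; // +1 slot for the next-chunk pointer
        this.consumerBuffer = producerBuffer;
        this.producerLimit = mask;
    }

    private int slot(long index) {
        return (int) ((index & mask) >> 1);
    }

    void offer(Object e) {
        if (producerLimit <= pIndex) {
            if (cIndex + mask > pIndex) {
                // The consumer has freed slots in the current chunk: reuse them.
                producerLimit = cIndex + mask;
            } else {
                // Grow: the element goes into the new chunk, JUMP into the same slot of the old one.
                Object[] next = new Object[chunkCapacity + 1];
                next[slot(pIndex)] = e;
                producerBuffer[chunkCapacity] = next;   // link old chunk to new chunk
                producerBuffer[slot(pIndex)] = JUMP;
                producerBuffer = next;
                producerLimit = pIndex + mask;
                pIndex += 2;
                return;
            }
        }
        producerBuffer[slot(pIndex)] = e;
        pIndex += 2;
    }

    Object poll() {
        int s = slot(cIndex);
        Object e = consumerBuffer[s];
        if (e == null) {
            return null; // empty (single-threaded, so no element can be in flight)
        }
        if (e == JUMP) {
            // Follow the pointer in the last slot and read the same slot of the next chunk.
            consumerBuffer = (Object[]) consumerBuffer[chunkCapacity];
            e = consumerBuffer[s];
        }
        consumerBuffer[s] = null;
        cIndex += 2;
        return e;
    }

    public static void main(String[] args) {
        ChunkedQueueSketch queue = new ChunkedQueueSketch(4);
        for (int i = 0; i < 8; i++) {
            queue.offer("e" + i);
        }
        for (Object e = queue.poll(); e != null; e = queue.poll()) {
            System.out.println(e);
        }
    }
}
```

The model keeps the two properties highlighted in the walkthrough: JUMP and the displaced element share the same slot in consecutive chunks, and consumed slots are reused before a new chunk is allocated.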

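Having come this far, I think the ingenuity of MpscChunkedArrayQueue speaks for itself. It relies heavily on CAS operations, and although it wastes some memory, it pushes performance to the extreme: it is not even willing to pay for copying elements when the array grows. To my mind the best-designed piece is modifiedCalcCircularRefElementOffset(), which places elements into free array slots without collisions. When I first wrote this I did not understand why it was designed this way (see the 2023-01-18 update above for the explanation I arrived at later). The source analysis of MpscChunkedArrayQueue is nearly complete, but have you noticed the code below and wondered what it is for?
[Figure: padding field declarations]
So many variables are declared, yet none of them is ever used.
MpscChunkedArrayQueue has further performance optimizations, as follows.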

CacheLine Padding

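In LinkedBlockingQueue, head and last are adjacent; in ArrayBlockingQueue, takeIndex and putIndex are adjacent. The CPU loads data into its cache one cache line at a time, so a thread that never modified last can still have its whole cache line invalidated because a dequeue modified head, and the line has to be loaded again. This effect is known as false sharing.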

// fields from several classes are merged here for easier reading
long p01, p02, p03, p04, p05, p06, p07;
long p10, p11, p12, p13, p14, p15, p16, p17;
protected long producerIndex;
long p01, p02, p03, p04, p05, p06, p07;
long p10, p11, p12, p13, p14, p15, p16, p17;
protected long maxQueueCapacity;
protected long producerMask;
protected E[] producerBuffer;
protected volatile long producerLimit;
protected boolean isFixedChunkSize = false;
long p0, p1, p2, p3, p4, p5, p6, p7;
long p10, p11, p12, p13, p14, p15, p16, p17;
protected long consumerMask;
protected E[] consumerBuffer;
protected long consumerIndex;

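As the listing shows, the producer index and the consumer index are separated by groups of unused long fields. Each group in the listing has 15 or 16 long fields, i.e. 120 to 128 bytes, which is more than one cache line: a cache line is typically 64 bytes on mainstream CPUs. On Linux the cache line size can be checked like this: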

cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
[Figure: command output]
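The padding idiom itself is easy to reproduce. The sketch below is an illustration with invented class names, not the JCTools hierarchy. It spreads the padding and the two hot fields over a chain of superclasses, presumably for the same reason JCTools does: the JVM may reorder fields within one class, whereas fields of a superclass are generally laid out before those of its subclasses.

```java
// Hot fields separated by unused long fields so they do not share a cache line.
final class PaddedIndexesSketch extends ConsumerIndexSketchField {
    void advanceProducer() {
        producerIndex = producerIndex + 2; // single-threaded sketch, no CAS
    }

    void advanceConsumer() {
        consumerIndex = consumerIndex + 2;
    }

    long size() {
        return (producerIndex - consumerIndex) / 2;
    }

    public static void main(String[] args) {
        PaddedIndexesSketch indexes = new PaddedIndexesSketch();
        indexes.advanceProducer();
        indexes.advanceProducer();
        indexes.advanceConsumer();
        System.out.println("size = " + indexes.size());
    }
}

abstract class PadBeforeSketch {
    long p01, p02, p03, p04, p05, p06, p07;
    long p10, p11, p12, p13, p14, p15, p16, p17;
}

abstract class ProducerIndexSketchField extends PadBeforeSketch {
    protected volatile long producerIndex;
}

abstract class PadBetweenSketch extends ProducerIndexSketchField {
    long p20, p21, p22, p23, p24, p25, p26, p27;
    long p30, p31, p32, p33, p34, p35, p36, p37;
}

abstract class ConsumerIndexSketchField extends PadBetweenSketch {
    protected volatile long consumerIndex;
}
```

The padding fields are never read or written, which is exactly why they look useless in the source.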

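Fewer locks: CAS plus spinning

Locks cause thread switches, which consume resources. MpscChunkedArrayQueue therefore uses no locks and spins instead, much like the BusySpinWaitStrategy of Disruptor. Spinning is a good fit when the system is busy and waits are short, but it also drives CPU usage up, so it is advisable to pin these threads to specific CPUs.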
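The CAS-plus-spin pattern can be demonstrated with several producer threads claiming slots from a shared index. This is a hedged sketch, not JCTools code: the class SpinCasSketch and its methods are hypothetical, AtomicLong replaces the Unsafe-based index, and only the claiming step is modelled.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class SpinCasSketch {
    // Spin until this thread's CAS wins; the returned index belongs to the caller alone.
    static long claim(AtomicLong producerIndex) {
        while (true) {
            long seen = producerIndex.get();
            if (producerIndex.compareAndSet(seen, seen + 2)) {
                return seen;
            }
        }
    }

    // Returns the number of distinct indexes claimed by all threads together.
    static int run(int threads, int claimsPerThread) {
        AtomicLong producerIndex = new AtomicLong();
        Set<Long> claimed = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < claimsPerThread; i++) {
                    claimed.add(claim(producerIndex));
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
        }
        return claimed.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct slots claimed: " + run(4, 1000));
    }
}
```

No index is ever handed out twice, and no thread blocks: a losing thread just retries.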

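MpscUnboundedArrayQueue Source Analysis

[Figure: class hierarchy of MpscUnboundedArrayQueue]
Structurally, the only difference is that MpscChunkedArrayQueue has one more class in its hierarchy than MpscUnboundedArrayQueue: MpscChunkedArrayQueueColdProducerFields.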

abstract class MpscChunkedArrayQueueColdProducerFields<E> extends BaseMpscLinkedArrayQueue<E> {
    protected final long maxQueueCapacity;

    MpscChunkedArrayQueueColdProducerFields(int initialCapacity, int maxCapacity) {
        super(initialCapacity);
        RangeUtil.checkGreaterThanOrEqual(maxCapacity, 4, "maxCapacity");
        RangeUtil.checkLessThan(roundToPowerOfTwo(initialCapacity), roundToPowerOfTwo(maxCapacity),
                "initialCapacity");
        maxQueueCapacity = ((long) Pow2.roundToPowerOfTwo(maxCapacity)) << 1;
    }
}

The only significant line MpscChunkedArrayQueueColdProducerFields adds is maxQueueCapacity = ((long) Pow2.roundToPowerOfTwo(maxCapacity)) << 1;. In other words, maxQueueCapacity is just maxCapacity rounded up to a power of two and then multiplied by 2.
The source of MpscUnboundedArrayQueue is as follows.

public class MpscUnboundedArrayQueue<E> extends BaseMpscLinkedArrayQueue<E> {
    byte b000, b001, b002, b003, b004, b005, b006, b007;//  8b
    byte b010, b011, b012, b013, b014, b015, b016, b017;// 16b
    byte b020, b021, b022, b023, b024, b025, b026, b027;// 24b
    byte b030, b031, b032, b033, b034, b035, b036, b037;// 32b
    byte b040, b041, b042, b043, b044, b045, b046, b047;// 40b
    byte b050, b051, b052, b053, b054, b055, b056, b057;// 48b
    byte b060, b061, b062, b063, b064, b065, b066, b067;// 56b
    byte b070, b071, b072, b073, b074, b075, b076, b077;// 64b
    byte b100, b101, b102, b103, b104, b105, b106, b107;// 72b
    byte b110, b111, b112, b113, b114, b115, b116, b117;// 80b
    byte b120, b121, b122, b123, b124, b125, b126, b127;// 88b
    byte b130, b131, b132, b133, b134, b135, b136, b137;// 96b
    byte b140, b141, b142, b143, b144, b145, b146, b147;//104b
    byte b150, b151, b152, b153, b154, b155, b156, b157;//112b
    byte b160, b161, b162, b163, b164, b165, b166, b167;//120b
    byte b170, b171, b172, b173, b174, b175, b176, b177;//128b

    public MpscUnboundedArrayQueue(int chunkSize) {
        super(chunkSize);
    }


    @Override
    protected long availableInQueue(long pIndex, long cIndex) {
        return Integer.MAX_VALUE;
    }

    @Override
    public int capacity() {
        return MessagePassingQueue.UNBOUNDED_CAPACITY;
    }

    @Override
    public int drain(Consumer<E> c) {
        return drain(c, 4096);
    }

    @Override
    public int fill(Supplier<E> s) {
        return MessagePassingQueueUtil.fillUnbounded(this, s);
    }

    @Override
    protected int getNextBufferSize(E[] buffer) {
        return length(buffer);
    }

    @Override
    protected long getCurrentBufferCapacity(long mask) {
        return mask;
    }
}

The one method worth noting in MpscUnboundedArrayQueue is availableInQueue(). Why? Because it is used in offerSlowPath().

private int offerSlowPath(long mask, long pIndex, long producerLimit) {
    final long cIndex = lvConsumerIndex();
    long bufferCapacity = getCurrentBufferCapacity(mask);

    if (cIndex + bufferCapacity > pIndex) {
        if (!casProducerLimit(producerLimit, cIndex + bufferCapacity)) {
            return RETRY;
        } else {
            return CONTINUE_TO_P_INDEX_CAS;
        }
    }

    else if (availableInQueue(pIndex, cIndex) <= 0) {
        return QUEUE_FULL;
    }

    else if (casProducerIndex(pIndex, pIndex + 1)) {
        return QUEUE_RESIZE;
    } else {
        return RETRY;
    }
}

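Since availableInQueue() in MpscUnboundedArrayQueue always returns Integer.MAX_VALUE, offerSlowPath() never returns QUEUE_FULL for this queue, so offer() always succeeds in adding the element. The drain() and fill() methods will be analyzed when we reach the places where Netty actually uses them.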

MpscArrayQueue source code analysis

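First, the class structure of MpscArrayQueue.
[Figure: class hierarchy of MpscArrayQueue]
The class structure of MpscChunkedArrayQueue differs a lot from that of MpscArrayQueue, so I will not compare them here. Once you understand the MpscChunkedArrayQueue source, however, MpscArrayQueue is easy to follow. Start with the MpscArrayQueue constructor.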

public MpscArrayQueue(final int capacity) {
    super(capacity);
}

MpscArrayQueueL3Pad(int capacity) {
    super(capacity);
}

MpscArrayQueueConsumerIndexField(int capacity) {
    super(capacity);
}

MpscArrayQueueL2Pad(int capacity) {
    super(capacity);
}

MpscArrayQueueProducerLimitField(int capacity) {
    super(capacity);
    this.producerLimit = capacity;
}



MpscArrayQueueMidPad(int capacity) {
    super(capacity);
}


MpscArrayQueueProducerIndexField(int capacity) {
    super(capacity);
}

MpscArrayQueueL1Pad(int capacity) {
    super(capacity);
}

ConcurrentCircularArrayQueue(int capacity) {
    int actualCapacity = Pow2.roundToPowerOfTwo(capacity);
    mask = actualCapacity - 1;
    buffer = allocateRefArray(actualCapacity);
}

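Of all these nested constructor calls, the one that matters is the ConcurrentCircularArrayQueue constructor at the bottom. After roundToPowerOfTwo(), the capacity is always a power of two, the same way HashMap sizes its table. allocateRefArray() then simply creates (E[]) new Object[capacity] as the buffer. For all that code, the net effect is the creation of a single object array.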
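The effect of that constructor can be reproduced in a few lines. This is a sketch under my own class name (CircularBufferSetup is hypothetical); the rounding formula is the usual numberOfLeadingZeros trick, which I am assuming matches Pow2.roundToPowerOfTwo().

```java
final class CircularBufferSetup {
    static final int MAX_POW2 = 1 << 30;

    // Round up to the next power of two; a value that already is one is returned unchanged.
    static int roundToPowerOfTwo(int value) {
        if (value < 0 || value > MAX_POW2) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        return 1 << (32 - Integer.numberOfLeadingZeros(value - 1));
    }

    final Object[] buffer;
    final long mask;

    CircularBufferSetup(int capacity) {
        int actualCapacity = roundToPowerOfTwo(capacity);
        mask = actualCapacity - 1;            // e.g. capacity 8 -> mask 0b111
        buffer = new Object[actualCapacity];  // stands in for allocateRefArray()
    }
}
```

Because the length is a power of two, index & mask can replace the more expensive index % length when mapping an ever-growing index onto a slot.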

Next, the offer() method of MpscArrayQueue.

public boolean offer(final E e) {
    if (null == e) {
        throw new NullPointerException();
    }

    // use a cached view on consumer index (potentially updated in loop)
    final long mask = this.mask;
    long producerLimit = lvProducerLimit();
    long pIndex;
    do {
        pIndex = lvProducerIndex();
        
        if (pIndex >= producerLimit) {
            final long cIndex = lvConsumerIndex();
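            // Recompute the limit from the consumer index: producers may advance
            // up to cIndex + capacity (mask + 1) before the circular array is full
            producerLimit = cIndex + mask + 1;
            // the circular array is full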
            if (pIndex >= producerLimit) {
                return false; // FULL :(
            } else {
                // update producer limit to the next index that we must recheck the consumer index
                // this is racy, but the race is benign; store the recomputed producerLimit
                soProducerLimit(producerLimit);
            }
        }
    }
    // The do-while loop handles contention: if another producer claimed pIndex first, the CAS fails and we retry
    while (!casProducerIndex(pIndex, pIndex + 1));
    /*
     * NOTE: the new producer index value is made visible BEFORE the element in the array. If we relied on
     * the index visibility to poll() we would need to handle the case where the element is not visible.
     */

    // Won CAS, move on to storing
    final long offset = calcCircularRefElementOffset(pIndex, mask);
    soRefElement(buffer, offset, e);
    return true; // AWESOME :)
}


public static long calcCircularRefElementOffset(long index, long mask) {
    return REF_ARRAY_BASE + ((index & mask) << REF_ELEMENT_SHIFT);
}

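calcCircularRefElementOffset() computes the position in the circular array. (index & mask) is the slot index: with an array length of 8, mask = 7, so for index = 8 we get 8 & 7 = 0, i.e. 1000 & 0111 = 0000.
For index = 9 the slot is 9 & 7 = 1001 & 0111 = 0001 = 1. The value returned by calcCircularRefElementOffset() is REF_ARRAY_BASE + ((index & mask) << REF_ELEMENT_SHIFT) = 16 + (slot << 2), which equals 16 + slot * 4 (a left shift by 2 multiplies by 4, not by 2). The circular array of MpscArrayQueue is therefore laid out as follows.
[Figure: layout of the MpscArrayQueue circular array]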
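The arithmetic above is easy to check in code. In this sketch the constants 16 and 2 are the values used in the text; on a real JVM they come from Unsafe and depend on the object layout, so treat them as assumptions. The class name CircularOffsetSketch and the slot() helper are mine.

```java
final class CircularOffsetSketch {
    static final long REF_ARRAY_BASE = 16;   // assumed array header size in bytes
    static final int REF_ELEMENT_SHIFT = 2;  // assumed 4-byte references: shift by 2 == multiply by 4

    // Slot in the circular array for an ever-increasing index.
    static int slot(long index, long mask) {
        return (int) (index & mask);
    }

    // Byte offset of that slot from the start of the array object.
    static long calcCircularRefElementOffset(long index, long mask) {
        return REF_ARRAY_BASE + ((index & mask) << REF_ELEMENT_SHIFT);
    }
}
```

With mask = 7, index 8 wraps to slot 0 (offset 16) and index 9 to slot 1 (offset 20).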
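Some readers may wonder: producerLimit is a long, so what happens when the indices grow past Long.MAX_VALUE? Will the queue stop accepting elements?

In practice there is no need to worry. A counter incremented a billion times per second needs roughly 292 years (2^63 / 10^9 seconds) to reach Long.MAX_VALUE. Even then the arithmetic wraps around instead of failing: Long.MAX_VALUE + 2 is still greater than Long.MAX_VALUE + 1 after wrapping, and index & mask keeps cycling through the slots, the same principle as the circular array itself. Next, the poll() method.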

public E poll() {
    final long cIndex = lpConsumerIndex();
    final long offset = calcCircularRefElementOffset(cIndex, mask);
    // Copy field to avoid re-reading after volatile load
    final E[] buffer = this.buffer;

    // If we can't see the next available element we can't poll
    E e = lvRefElement(buffer, offset);
    if (null == e) {
        /*
         * NOTE: Queue may not actually be empty in the case of a producer (P1) being interrupted after
         * winning the CAS on offer but before storing the element in the queue. Other producers may go on
         * to fill up the queue after this element.
         */
        if (cIndex != lvProducerIndex()) {
            do {
                e = lvRefElement(buffer, offset);
            }
            while (e == null);
        } else {
            return null;
        }
    }
    spRefElement(buffer, offset, null);
    soConsumerIndex(cIndex + 1);
    return e;
}

Dequeuing likewise has to account for the fact that the following two producer steps are not atomic as a pair.

casProducerIndex(pIndex, pIndex + 1)

final long offset = calcCircularRefElementOffset(pIndex, mask);
soRefElement(buffer, offset, e);

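If the producer's CAS from pIndex to pIndex + 1 has succeeded but soRefElement(buffer, offset, e) has not run yet, the slot at offset is still null when the consumer inspects it. The consumer then executes this block:
do {
    e = lvRefElement(buffer, offset);
}
while (e == null);
It spins until the producer has stored the element. Once soRefElement(buffer, offset, e) completes, the slot at offset is no longer null; the consumer sees this, leaves the loop, and takes the element the producer just stored. So MpscArrayQueue, as used in Netty, also spins in a plain while loop with no wait/notify handshake, another sign of how far it pushes performance.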
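The interplay between offer() and poll() can be reproduced with JDK atomics only. The following is a simplified sketch, not the JCTools implementation: TinyMpscArrayQueue and sumThroughQueue are hypothetical names, there is no cached producerLimit and no padding, and AtomicReferenceArray stands in for the Unsafe accessors.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

final class TinyMpscArrayQueue<E> {
    private final AtomicReferenceArray<E> buffer;
    private final int mask;
    private final AtomicLong producerIndex = new AtomicLong();
    private final AtomicLong consumerIndex = new AtomicLong();

    TinyMpscArrayQueue(int capacityPow2) {
        if (Integer.bitCount(capacityPow2) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        buffer = new AtomicReferenceArray<>(capacityPow2);
        mask = capacityPow2 - 1;
    }

    boolean offer(E e) {
        if (e == null) throw new NullPointerException();
        long pIndex;
        do {
            pIndex = producerIndex.get();
            if (pIndex - consumerIndex.get() > mask) return false;   // full
        } while (!producerIndex.compareAndSet(pIndex, pIndex + 1));
        // The new index is visible BEFORE the element: this is the window poll() must handle.
        buffer.lazySet((int) (pIndex & mask), e);
        return true;
    }

    E poll() {   // must only be called by the single consumer thread
        long cIndex = consumerIndex.get();
        int slot = (int) (cIndex & mask);
        E e = buffer.get(slot);
        if (e == null) {
            if (cIndex == producerIndex.get()) return null;          // really empty
            do {                                                      // slot claimed but not stored yet
                e = buffer.get(slot);
            } while (e == null);
        }
        buffer.lazySet(slot, null);
        consumerIndex.lazySet(cIndex + 1);
        return e;
    }

    // Demo helper: producers offer 1..perProducer each; the calling thread consumes and sums.
    static long sumThroughQueue(int producers, int perProducer) {
        TinyMpscArrayQueue<Integer> q = new TinyMpscArrayQueue<>(8);
        for (int i = 0; i < producers; i++) {
            new Thread(() -> {
                for (int v = 1; v <= perProducer; v++) {
                    while (!q.offer(v)) Thread.yield();               // queue full: retry
                }
            }).start();
        }
        long sum = 0;
        for (int received = 0; received < producers * perProducer; ) {
            Integer v = q.poll();
            if (v != null) { sum += v; received++; } else Thread.yield();
        }
        return sum;
    }
}
```

The inner do-while in poll() is the spin described above: it only runs when the producer index has already moved past cIndex, so an element is guaranteed to arrive.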

MpscLinkedAtomicQueue source code analysis

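First, the class structure of MpscLinkedAtomicQueue.

[Figure: class hierarchy of MpscLinkedAtomicQueue]
To understand MpscLinkedAtomicQueue, let us again start with an example.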

public static void main(String[] args) {
	MpscLinkedAtomicQueue<String> queue = new MpscLinkedAtomicQueue<>();
	queue.offer("e0");
	queue.offer("e1");
	queue.offer("e2");
	queue.offer("e3");
	queue.offer("e4");
	System.out.println(queue.poll());
	System.out.println(queue.poll());

	queue.offer("e5");
	queue.offer("e6");
	queue.offer("e7");
	
	queue.poll();
	queue.poll();
	queue.poll();
	queue.poll();
}

As before, we analyze MpscLinkedAtomicQueue starting from its constructor, then offer() and poll(). The name already tells us that it is a linked-list structure. First, the constructor.

public MpscLinkedAtomicQueue() {
    // Create a LinkedQueueAtomicNode
    LinkedQueueAtomicNode<E> node = newNode();
    // The consumer points at node
    spConsumerNode(node);
    // The producer points at node
    xchgProducerNode(node);
}


protected final LinkedQueueAtomicNode<E> newNode() {
    return new LinkedQueueAtomicNode<E>();
}

 abstract class BaseLinkedAtomicQueueConsumerNodeRef<E> extends BaseLinkedAtomicQueuePad1<E> {

    private static final AtomicReferenceFieldUpdater<BaseLinkedAtomicQueueConsumerNodeRef, LinkedQueueAtomicNode> C_NODE_UPDATER = AtomicReferenceFieldUpdater.newUpdater(BaseLinkedAtomicQueueConsumerNodeRef.class, LinkedQueueAtomicNode.class, "consumerNode");

    private volatile LinkedQueueAtomicNode<E> consumerNode;

    final void spConsumerNode(LinkedQueueAtomicNode<E> newValue) {
        C_NODE_UPDATER.lazySet(this, newValue);
    }

    @SuppressWarnings("unchecked")
    final LinkedQueueAtomicNode<E> lvConsumerNode() {
        return consumerNode;
    }

    final LinkedQueueAtomicNode<E> lpConsumerNode() {
        return consumerNode;
    }
}

abstract class BaseLinkedAtomicQueueProducerNodeRef<E> extends BaseLinkedAtomicQueuePad0<E> {

    private static final AtomicReferenceFieldUpdater<BaseLinkedAtomicQueueProducerNodeRef, LinkedQueueAtomicNode> P_NODE_UPDATER = AtomicReferenceFieldUpdater.newUpdater(BaseLinkedAtomicQueueProducerNodeRef.class, LinkedQueueAtomicNode.class, "producerNode");

    private volatile LinkedQueueAtomicNode<E> producerNode;

    final void spProducerNode(LinkedQueueAtomicNode<E> newValue) {
        P_NODE_UPDATER.lazySet(this, newValue);
    }

    final void soProducerNode(LinkedQueueAtomicNode<E> newValue) {
        P_NODE_UPDATER.lazySet(this, newValue);
    }

    final LinkedQueueAtomicNode<E> lvProducerNode() {
        return producerNode;
    }

    final boolean casProducerNode(LinkedQueueAtomicNode<E> expect, LinkedQueueAtomicNode<E> newValue) {
        return P_NODE_UPDATER.compareAndSet(this, expect, newValue);
    }

    final LinkedQueueAtomicNode<E> lpProducerNode() {
        return producerNode;
    }

    protected final LinkedQueueAtomicNode<E> xchgProducerNode(LinkedQueueAtomicNode<E> newValue) {
        return P_NODE_UPDATER.getAndSet(this, newValue);
    }
}

Both the producer and the consumer point at the newly created node. What does a node look like?

public final class LinkedQueueAtomicNode<E> extends AtomicReference<LinkedQueueAtomicNode<E>> {
    /**
     *
     */
    private static final long serialVersionUID = 2404266111789071508L;
    private E value;

    LinkedQueueAtomicNode() {
    }

    LinkedQueueAtomicNode(E val) {
        spValue(val);
    }

    /**
     * Gets the current value and nulls out the reference to it from this node.
     *
     * @return value
     */
    public E getAndNullValue() {
        E temp = lpValue();
        spValue(null);
        return temp;
    }

    public E lpValue() {
        return value;
    }

    public void spValue(E newValue) {
        value = newValue;
    }

    public void soNext(LinkedQueueAtomicNode<E> n) {
        lazySet(n);
    }

    public void spNext(LinkedQueueAtomicNode<E> n) {
        lazySet(n);
    }

    public LinkedQueueAtomicNode<E> lvNext() {
        return get();
    }
}


public class AtomicReference<V> implements java.io.Serializable {
    private static final long serialVersionUID = -1848883965231344442L;

    private static final Unsafe unsafe = Unsafe.getUnsafe();
    private static final long valueOffset;

    static {
        try {
            valueOffset = unsafe.objectFieldOffset
                (AtomicReference.class.getDeclaredField("value"));
        } catch (Exception ex) { throw new Error(ex); }
    }

    private volatile V value;


    public AtomicReference(V initialValue) {
        value = initialValue;
    }


    public AtomicReference() {
    }

    public final V get() {
        return value;
    }


    ...
}

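Look at the structure of LinkedQueueAtomicNode: notice anything odd? LinkedQueueAtomicNode has its own private value field, and its superclass AtomicReference also declares a private volatile V value field, so a LinkedQueueAtomicNode effectively has two value fields. What does each one store? As the later code shows, LinkedQueueAtomicNode's own value holds the element added to the queue, while the value inherited from AtomicReference holds the reference to the next node in the list. Why use AtomicReference to store the next reference?

[Definition of AtomicReference]

AtomicReference performs atomic operations on an object reference: it provides a reference variable whose reads and writes are atomic. Atomic here means that when several threads try to change the same AtomicReference (for example with compare-and-swap), it never ends up in an inconsistent state. Next, the offer() method.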

public boolean offer(final E e) {
    if (null == e) {
        throw new NullPointerException();
    }
    final LinkedQueueAtomicNode<E> nextNode = newNode(e);
    // getAndSet: atomically store the new value and return the old one.
    // producerNode now refers to nextNode
    final LinkedQueueAtomicNode<E> prevProducerNode = xchgProducerNode(nextNode);
    // Should a producer thread get interrupted here the chain WILL be broken until that thread is resumed
    // and completes the store in prev.next. This is a "bubble".
    // Store nextNode in prevProducerNode's AtomicReference.value, i.e. its next pointer
    prevProducerNode.soNext(nextNode);
    return true;
}

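The figure below simulates the insertion of e1 and e2; the node whose value is null is the head node.
[Figure: inserting e1 and e2 into the linked queue]
When the MpscLinkedAtomicQueue is created, producer and consumer both point at the head node. After e1 is inserted, the producer's producerNode points at the new e1 node while the consumer still points at the head node. Inserting e2 works the same way.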

queue.offer("e0");
queue.offer("e1");
queue.offer("e2");
queue.offer("e3");
queue.offer("e4");
System.out.println(queue.poll());

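Here is the structure of the queue at the moment queue.poll() is called.

[Figure: queue structure when poll() is called]
Now the implementation of poll().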

public E poll() {
    final LinkedQueueAtomicNode<E> currConsumerNode = lpConsumerNode();
    LinkedQueueAtomicNode<E> nextNode = currConsumerNode.lvNext();
    if (nextNode != null) {
        return getSingleConsumerNodeValue(currConsumerNode, nextNode);
    } else if (currConsumerNode != lvProducerNode()) {
        nextNode = spinWaitForNextNode(currConsumerNode);
        // got the next node...
        return getSingleConsumerNodeValue(currConsumerNode, nextNode);
    }
    return null;
}

Consuming an element falls into two cases: in the first the next node is not null, in the second it is null. In the first case the following code is called.

protected E getSingleConsumerNodeValue(LinkedQueueAtomicNode<E> currConsumerNode, LinkedQueueAtomicNode<E> nextNode) {
    // we have to null out the value because we are going to hang on to the node
    // Read LinkedQueueAtomicNode.value, then set that field
    // to null
    final E nextValue = nextNode.getAndNullValue();
    // Fix up the next ref of currConsumerNode to prevent promoted nodes from keeping new ones alive.
    // We use a reference to self instead of null because null is already a meaningful value (the next of
    // producer node is null).
    // Point the node's next (AtomicReference.value) at the node itself.
    // The self-link cuts currConsumerNode off from the rest of the list,
    // so the JVM can collect it, i.e. the previous head node gets reclaimed
    currConsumerNode.soNext(currConsumerNode);
    // nextNode becomes the new head node
    spConsumerNode(nextNode);
    // currConsumerNode is now no longer referenced and can be collected
    return nextValue;
}

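The figure below shows what the call to getSingleConsumerNodeValue() does.
[Figure: effect of getSingleConsumerNodeValue()]
For the second case, start with a figure.

[Figure: producerNode already points at the new node, but the head node's next is still null]
What does this figure mean? Initially consumerNode and producerNode both point at the head node. Then a producer executes these two lines of offer():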

final LinkedQueueAtomicNode<E> nextNode = newNode(e);
final LinkedQueueAtomicNode<E> prevProducerNode = xchgProducerNode(nextNode);

but has not yet executed this line:

prevProducerNode.soNext(nextNode);

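The situation is then the one in the figure above: producerNode points at the new node e2 while consumerNode still points at the head node. If poll() is called at exactly this moment and executes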
final LinkedQueueAtomicNode<E> currConsumerNode = lpConsumerNode();
LinkedQueueAtomicNode<E> nextNode = currConsumerNode.lvNext();

we get the second case: the next of consumerNode is null, yet consumerNode is not equal to producerNode. This case is handled by spinWaitForNextNode().

LinkedQueueAtomicNode<E> spinWaitForNextNode(LinkedQueueAtomicNode<E> currNode) {
    LinkedQueueAtomicNode<E> nextNode;
    while ((nextNode = currNode.lvNext()) == null) {
        // spin, we are no longer wait free
    }
    return nextNode;
}

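spinWaitForNextNode() is simple too: a while loop that spins until the producer has executed prevProducerNode.soNext(nextNode);. After that, the next pointer of consumerNode refers to the newly created node e2, as the figure below shows.
[Figure: the head node's next now points at e2]
poll() then calls getSingleConsumerNodeValue() to process node e2.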
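Both cases of poll() fit in a short, JDK-only sketch. As before this is a simplification under hypothetical names (TinyMpscLinkedQueue, Node, sumThroughQueue), without padding or field updaters; the consumerNode field is plain, so the queue must be safely published to the consumer thread.

```java
import java.util.concurrent.atomic.AtomicReference;

final class TinyMpscLinkedQueue<E> {
    // The inherited AtomicReference value is the "next" pointer; `value` is the queued element.
    static final class Node<E> extends AtomicReference<Node<E>> {
        E value;
        Node(E value) { this.value = value; }
    }

    private final AtomicReference<Node<E>> producerNode;
    private Node<E> consumerNode;   // touched only by the single consumer

    TinyMpscLinkedQueue() {
        Node<E> head = new Node<>(null);   // head node with a null value
        consumerNode = head;
        producerNode = new AtomicReference<>(head);
    }

    void offer(E e) {
        if (e == null) throw new NullPointerException();
        Node<E> next = new Node<>(e);
        Node<E> prev = producerNode.getAndSet(next);   // atomic exchange (XCHG)
        // Until the next line runs, the chain is broken at prev: the "bubble".
        prev.lazySet(next);
    }

    E poll() {
        Node<E> curr = consumerNode;
        Node<E> next = curr.get();
        if (next == null) {
            if (curr == producerNode.get()) return null;       // really empty
            while ((next = curr.get()) == null) { /* spin until the producer links the node */ }
        }
        E value = next.value;
        next.value = null;      // next becomes the new value-less head
        curr.lazySet(curr);     // self-link the old head so it does not keep newer nodes alive
        consumerNode = next;
        return value;
    }

    // Demo helper: producers offer 1..perProducer each; the calling thread consumes and sums.
    static long sumThroughQueue(int producers, int perProducer) {
        TinyMpscLinkedQueue<Integer> q = new TinyMpscLinkedQueue<>();
        for (int i = 0; i < producers; i++) {
            new Thread(() -> {
                for (int v = 1; v <= perProducer; v++) q.offer(v);
            }).start();
        }
        long sum = 0;
        for (int received = 0; received < producers * perProducer; ) {
            Integer v = q.poll();
            if (v != null) { sum += v; received++; } else Thread.yield();
        }
        return sum;
    }
}
```

Note that offer() contains no loop at all: a single getAndSet is enough, which is one plausible reason the linked variant scales well with many producers.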

CAS operations and helpers such as AtomicReferenceFieldUpdater.newUpdater() are not covered in depth here; a quick search online will explain them.

MpscGrowableAtomicArrayQueue source code analysis

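As the name suggests, Growable means it can grow and Atomic means atomic. So how does it differ from MpscChunkedArrayQueue? The constructors are nearly identical, so we skip them and go straight to the source of offer().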

public boolean offer(final E e) {
    if (null == e) {
        throw new NullPointerException();
    }
    long mask;
    AtomicReferenceArray<E> buffer;
    long pIndex;
    while (true) {
        long producerLimit = lvProducerLimit();
        pIndex = lvProducerIndex();
        // lower bit is indicative of resize, if we see it we spin until it's cleared
        if ((pIndex & 1) == 1) {
            continue;
        }
        // pIndex is even (lower bit is 0) -> actual index is (pIndex >> 1)
        // mask/buffer may get changed by resizing -> only use for array access after successful CAS.
        mask = this.producerMask;
        buffer = this.producerBuffer;
        // a successful CAS ties the ordering, lv(pIndex) - [mask/buffer] -> cas(pIndex)
        // assumption behind this optimization is that queue is almost always empty or near empty
        if (producerLimit <= pIndex) {
            int result = offerSlowPath(mask, pIndex, producerLimit);
            switch (result) {
                case CONTINUE_TO_P_INDEX_CAS:
                    break;
                case RETRY:
                    continue;
                case QUEUE_FULL:
                    return false;
                case QUEUE_RESIZE:
                    resize(mask, buffer, pIndex, e, null);
                    return true;
            }
        }
        if (casProducerIndex(pIndex, pIndex + 2)) {
            break;
        }
    }
    // INDEX visible before ELEMENT
    final int offset = modifiedCalcCircularRefElementOffset(pIndex, mask);
    // release element e
    soRefElement(buffer, offset, e);
    return true;
}

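The difference lies in the buffer declaration above: MpscChunkedArrayQueue uses a plain array, whereas MpscGrowableAtomicArrayQueue uses AtomicReferenceArray<E> buffer; (operations on its elements are atomic). This matches the name MpscGrowableAtomicArrayQueue: it is backed by an atomic array. Now look at modifiedCalcCircularRefElementOffset().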

static int modifiedCalcCircularRefElementOffset(long index, long mask) {
    return (int) (index & mask) >> 1;
}
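To see what this one-liner yields, here is a small sketch. The maskFor() helper is my own assumption, derived from newMask = (newBufferLength - 2) << 1 in resize() together with a buffer length of capacity + 1; the class name AtomicOffsetSketch is hypothetical.

```java
final class AtomicOffsetSketch {
    // Same expression as in the listing: the cast applies to (index & mask), then the result is halved.
    static int modifiedCalcCircularRefElementOffset(long index, long mask) {
        return (int) (index & mask) >> 1;
    }

    // Assumed mask for a chunk with `capacityPow2` element slots: (capacity - 1) << 1.
    static long maskFor(int capacityPow2) {
        return (capacityPow2 - 1) << 1;
    }
}
```

Because the producer index advances by 2 per element (the low bit is the resize flag), indices 0, 2, 4, 6 map to array slots 0, 1, 2, 3 for a mask of 6, and index 8 wraps back to slot 0.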

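This shows that in MpscGrowableAtomicArrayQueue the offset is really the element's index in the queue's array, not a byte offset from the array's memory address, whereas modifiedCalcCircularRefElementOffset() in MpscChunkedArrayQueue computes the byte offset of the element relative to the array's address. One thing we have not seen yet is what "growable" means. If the array can grow, its size must change, so look at the resize() method.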

private void resize(long oldMask, E[] oldBuffer, long pIndex, E e, Supplier<E> s) {
    assert (e != null && s == null) || (e == null || s != null);
    int newBufferLength = getNextBufferSize(oldBuffer);
    final E[] newBuffer;
    try {
        newBuffer = allocateRefArray(newBufferLength);
    } catch (OutOfMemoryError oom) {
        assert lvProducerIndex() == pIndex + 1;
        soProducerIndex(pIndex);
        throw oom;
    }

    producerBuffer = newBuffer;
    final int newMask = (newBufferLength - 2) << 1;
    producerMask = newMask;

    final long offsetInOld = modifiedCalcCircularRefElementOffset(pIndex, oldMask);
    final long offsetInNew = modifiedCalcCircularRefElementOffset(pIndex, newMask);

    soRefElement(newBuffer, offsetInNew, e == null ? s.get() : e);// element in new array
    soRefElement(oldBuffer, nextArrayOffset(oldMask), newBuffer);// buffer linked


    //System.out.println("array resized");
    //System.out.println(" offer Index=" + (offsetInNew-16)/4 + ", e =" +  e);

    // ASSERT code
    final long cIndex = lvConsumerIndex();
    final long availableInQueue = availableInQueue(pIndex, cIndex);
    RangeUtil.checkPositive(availableInQueue, "availableInQueue");

    // Invalidate racing CASs
    // We never set the limit beyond the bounds of a buffer
    soProducerLimit(pIndex + Math.min(newMask, availableInQueue));

    // make resize visible to the other producers
    soProducerIndex(pIndex + 2);

    // INDEX visible before ELEMENT, consistent with consumer expectation

    // make resize visible to consumer
    soRefElement(oldBuffer, offsetInOld, JUMP);
}

In the resize method, the biggest difference from MpscChunkedArrayQueue is getNextBufferSize().
Here is getNextBufferSize() in MpscGrowableAtomicArrayQueue.

protected int getNextBufferSize(AtomicReferenceArray<E> buffer) {
    final long maxSize = maxQueueCapacity / 2;
    RangeUtil.checkLessThanOrEqual(length(buffer), maxSize, "buffer.length");
    final int newSize = 2 * (length(buffer) - 1);
    return newSize + 1;
}
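The sizing rule in getNextBufferSize() can be traced with a tiny sketch. GrowthSketch and its methods are hypothetical names, and the maxQueueCapacity check is left out, so this only illustrates the arithmetic.

```java
final class GrowthSketch {
    // Same arithmetic as the listing: double the element slots, keep one extra slot for the next-buffer link.
    static int nextBufferLength(int currentLength) {
        return 2 * (currentLength - 1) + 1;
    }

    // Array lengths seen after each of `resizes` consecutive resizes, starting from `initialLength`.
    static int[] lengthsAfter(int initialLength, int resizes) {
        int[] lengths = new int[resizes + 1];
        lengths[0] = initialLength;
        for (int i = 1; i <= resizes; i++) {
            lengths[i] = nextBufferLength(lengths[i - 1]);
        }
        return lengths;
    }
}
```

Starting from a length of 5, three resizes give lengths 5, 9, 17 and 33.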

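The code gives newArrayLength = (oldArrayLength - 1) * 2 + 1. As analyzed earlier for MpscChunkedArrayQueue, if we pass initialCapacity = 4 the actual array length is 5 and it holds 3 real elements, because 2 slots are reserved: one for JUMP and one for the reference to the next array. In MpscGrowableAtomicArrayQueue, with initialCapacity = 4 the initial array length is also 5, so the first resize creates a new array of length (5 - 1) * 2 + 1 = 9. To confirm the analysis, insert 16 elements into the queue and set a breakpoint in poll().
[Figure: debugger view of the queue after inserting 16 elements]
That is the final structure of the elements in the queue.

I trust the MpscGrowableAtomicArrayQueue source is now clear. These fundamentals matter: they help a great deal in understanding the Netty source in depth.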

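Summary:

There are also comparisons online between the queues used in Netty and the JDK's own queues; if you are interested, see the blog post "Netty中Queue的实现" (Queue implementations in Netty).

Here is a performance test example found online.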

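import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

import org.jctools.queues.MpscArrayQueue;
import org.jctools.queues.MpscChunkedArrayQueue;
import org.jctools.queues.atomic.MpscLinkedAtomicQueue;

public class TestQueue {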
    private static int PRD_THREAD_NUM;
    private static int C_THREAD_NUM = 1;

    private static int N = 1 << 20;
    private static ExecutorService executor;

    public static void main(String[] args) throws Exception {
        System.out.println("Producer\tConsumer\tcapacity \t LinkedBlockingQueue \t ArrayBlockingQueue \t MpscLinkedAtomicQueue \t MpscChunkedArrayQueue \t MpscArrayQueue");

        for (int j = 1; j < 8; j++) {
            PRD_THREAD_NUM = (int) Math.pow(2, j);
            executor = Executors.newFixedThreadPool(PRD_THREAD_NUM * 2);

            for (int i = 9; i < 12; i++) {
                int length = 1 << i;
                System.out.print(PRD_THREAD_NUM + "\t\t");
                System.out.print(C_THREAD_NUM + "\t\t");
                System.out.print(length + "\t\t");
                System.out.print(doTest2(new LinkedBlockingQueue<Integer>(length), N) + "/s\t\t");
                System.out.print(doTest2(new ArrayBlockingQueue<Integer>(length), N) + "/s\t\t");


                System.out.print(doTest2(new MpscLinkedAtomicQueue<Integer>(), N) + "/s\t\t");
                System.out.print(doTest2(new MpscChunkedArrayQueue<Integer>(length), N) + "/s\t\t");
                System.out.print(doTest2(new MpscArrayQueue<Integer>(length), N) + "/s");
                System.out.println();
            }

            executor.shutdown();
        }
    }

    private static class Producer implements Runnable {
        int n;
        Queue<Integer> q;

        public Producer(int initN, Queue<Integer> initQ) {
            n = initN;
            q = initQ;
        }

        public void run() {
            while (n > 0) {
                if (q.offer(n)) {
                    n--;
                }
            }
        }
    }

    private static class Consumer implements Callable<Long> {
        int n;
        Queue<Integer> q;

        public Consumer(int initN, Queue<Integer> initQ) {
            n = initN;
            q = initQ;
        }

        public Long call() {
            long sum = 0;
            Integer e = null;
            while (n > 0) {
                if ((e = q.poll()) != null) {
                    sum += e;
                    n--;
                }

            }
            return sum;
        }
    }

    private static long doTest2(final Queue<Integer> q, final int n)
            throws Exception {
        CompletionService<Long> completionServ = new ExecutorCompletionService<>(executor);

        long t = System.nanoTime();
        for (int i = 0; i < PRD_THREAD_NUM; i++) {
            executor.submit(new Producer(n / PRD_THREAD_NUM, q));
        }
        for (int i = 0; i < C_THREAD_NUM; i++) {
            completionServ.submit(new Consumer(n / C_THREAD_NUM, q));
        }

        for (int i = 0; i < 1; i++) {
            completionServ.take().get();
        }

        t = System.nanoTime() - t;
        return (long) (1000000000.0 * N / t); // Throughput, items/sec
    }
}

The results, tidied up, are as follows.

    Producer    Consumer    capacity     LinkedBlockingQueue     ArrayBlockingQueue      MpscLinkedAtomicQueue   MpscChunkedArrayQueue   MpscArrayQueue
        2       1           512             1419013 /s              2092977/s                   10754689/s              6945393/s        8308789/s
        2       1           1024            1865775 /s              2915530/s                   9365189/s               9729218/s        9186625/s
        2       1           2048            1997851 /s              4528850/s                   8341806/s               7941808/s        8104603/s
        4       1           512             1901023 /s              2263058/s                   13050792/s              5653715/s        3949259/s
        4       1           1024            1549132 /s              2569067/s                   13211153/s              5815541/s        5841197/s
        4       1           2048            2095500 /s              3098407/s                   13631351/s              6085800/s        5868342/s
        8       1           512             1424075 /s              1108101/s                   13793927/s              2989326/s        2384934/s
        8       1           1024            1406966 /s              1804808/s                   14840094/s              3535979/s        2303590/s
        8       1           2048            1944212 /s              1993756/s                   8565539/s               3710341/s        2976402/s
        16      1           512             1244381 /s              835881/s                    2307061/s               1645206/s        1412146/s
        16      1           1024            1037022 /s              875003/s                    13424806/s              1491496/s        1360235/s
        16      1           2048            1928913 /s              907676/s                    14094695/s              1374265/s        1533324/s
        32      1           512             495076  /s              359086/s                    14327902/s              1006259/s        850638/s
        32      1           1024            664415  /s              457947/s                    14186988/s              930900/s         688771/s
        32      1           2048            1647541 /s              604856/s                    13788375/s              846512/s         709221/s
        64      1           512             511937  /s              198511/s                    14171304/s              507598/s         513178/s
        64      1           1024            789706  /s              300840/s                    13914407/s              480519/s         441480/s
        64      1           2048            542532  /s              219969/s                    11922536/s              509954/s         602099/s
        128     1           512             116516  /s              87450/s                     6666914/s               265239/s         243085/s
        128     1           1024            314286  /s              120221/s                    13985255/s              253715/s         227099/s
        128     1           2048            1251260 /s              131878/s                    13773935/s              221466/s         401445/s

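From the results above:

The Mpsc* queues perform best overall; MpscLinkedAtomicQueue in particular has the highest throughput in all but one row and stays around 13 to 14 million/s in most rows;
With few producers, array-based queues do better than linked ones (compare ArrayBlockingQueue with LinkedBlockingQueue), presumably because an array is allocated contiguously in memory, so loads make good use of cache lines and need fewer reads, whereas linked nodes are scattered in memory and random reads are costly;
With many producers, linked queues do better than array-based ones. LinkedBlockingQueue uses separate locks for enqueue and dequeue, so its lock contention should be lower than ArrayBlockingQueue's; MpscLinkedAtomicQueue has no capacity limit and only needs an atomic exchange (XCHG) to relink nodes, which is extremely efficient;
Compared with MpscArrayQueue, MpscChunkedArrayQueue adds the ability to grow dynamically;

The JDK queue sources were analyzed in detail in my earlier posts, so I will not repeat that here. In short, Netty goes to great lengths for performance.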

Source code:

https://gitee.com/quyixiao/JCTools.git
