A Brief Analysis of How Disruptor Works

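Disruptor is a high-throughput asynchronous processing framework; LMAX claims it can process 6 million orders per second with it. Let's analyze Disruptor starting from a simple example. The code below is based on version 2.10.4; the API may differ in other versions, so treat the code as reference only.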

private static void testDisruptor() {
	RingBuffer<MyEvent> ringBuffer = new RingBuffer<MyEvent>(
			new EventFactory<MyEvent>() {
				@Override
				public MyEvent newInstance() {
					return new MyEvent();
				}
			}, new SingleThreadedClaimStrategy(RING_SIZE),
			new BlockingWaitStrategy());

	// SequenceBarrier
	SequenceBarrier barrier1 = ringBuffer.newBarrier();

	// Register an EventProcessor
	BatchEventProcessor<MyEvent> processor1 = new BatchEventProcessor<MyEvent>(
			ringBuffer, barrier1, new EventHandler<MyEvent>() {
				@Override
				public void onEvent(MyEvent event, long sequence,
						boolean endOfBatch) throws Exception {
					System.out.println(event.s + "A:" + event.i
							+ ":Thread.id-"
							+ Thread.currentThread().getId());
				}
			});

	// Gate the producer on processor1's sequence so it cannot overwrite unconsumed slots
	ringBuffer.setGatingSequences(Util.getSequencesFor(processor1));
	publishEvent(ringBuffer);
	EXECUTOR.execute(processor1);
}

private static void publishEvent(RingBuffer<MyEvent> ringBuffer) {
	final EventPublisher<MyEvent> eventPublisher = new EventPublisher<MyEvent>(
			ringBuffer);
	EXECUTOR_0.execute(new Runnable() {
		
		@Override
		public void run() {

			for (long i = 0; i < Long.MAX_VALUE; i++) {
				final long number = i;
				eventPublisher.publishEvent(new EventTranslator<MyEvent>() {

					@Override
					public void translateTo(MyEvent event, long sequence) {
						event.s = "a" + number;
						event.i = "b" + number;
					}
				});
			}
		}
	});
}
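
The example above refers to MyEvent, RING_SIZE, EXECUTOR and EXECUTOR_0 without showing them. The following is only a guess at what they might look like, inferred from how they are used: the field types come from the assignments in translateTo, while the ring size of 1024, the executor types and the holder class name ExampleSupportSketch are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical supporting definitions; none of these appear in the original post.
final class ExampleSupportSketch {
    // Assumed value. It has to be a power of two, because the RingBuffer
    // constructor rejects any other size.
    static final int RING_SIZE = 1024;

    // Assumed executors: one thread for the consumer, one for the producer.
    // Worker threads are only created once a task is submitted.
    static final ExecutorService EXECUTOR = Executors.newSingleThreadExecutor();
    static final ExecutorService EXECUTOR_0 = Executors.newSingleThreadExecutor();

    // The event object that is pre-allocated in every slot and reused.
    static final class MyEvent {
        String s; // the translator assigns "a" + number
        String i; // the translator assigns "b" + number
    }

    public static void main(String[] args) {
        MyEvent event = new MyEvent();
        event.s = "a" + 0;
        event.i = "b" + 0;
        System.out.println(event.s + "A:" + event.i + " ring size " + RING_SIZE);
        EXECUTOR.shutdown();
        EXECUTOR_0.shutdown();
    }
}
```
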
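Let's look at RingBuffer first. RingBuffer is the core of the Disruptor framework, and its clever design is the key to Disruptor's performance. RingBuffer is a circular queue backed by an array. Compared with LinkedBlockingQueue, a thread-safe queue backed by a linked list, RingBuffer never removes its element objects, which is GC-friendly, and array access is faster than walking a linked list. It also has advantages over ArrayBlockingQueue, the thread-safe queue that is likewise backed by an array. Here is the RingBuffer code; for brevity only the key parts are shown: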

public final class RingBuffer<T> extends Sequencer
{
    ...
}

public class Sequencer
{
    /** Set to -1 as sequence starting point */
    public static final long INITIAL_CURSOR_VALUE = -1L;

    private final Sequence cursor = new Sequence(Sequencer.INITIAL_CURSOR_VALUE);
    private Sequence[] gatingSequences;

    private final ClaimStrategy claimStrategy;
    private final WaitStrategy waitStrategy;

    /**
     * Construct a Sequencer with the selected strategies.
     *
     * @param claimStrategy for those claiming sequences.
     * @param waitStrategy for those waiting on sequences.
     */
    public Sequencer(final ClaimStrategy claimStrategy, final WaitStrategy waitStrategy)
    {
        this.claimStrategy = claimStrategy;
        this.waitStrategy = waitStrategy;
    }


    /**
     * Claim the next event in sequence for publishing.
     *
     * @return the claimed sequence value
     */
    public long next()
    {
        if (null == gatingSequences)
        {
            throw new NullPointerException("gatingSequences must be set before claiming sequences");
        }

        return claimStrategy.incrementAndGet(gatingSequences);
    }
   
      ...

    private void publish(final long sequence, final int batchSize)
    {
        claimStrategy.serialisePublishing(sequence, cursor, batchSize);
        waitStrategy.signalAllWhenBlocking();
    }

}
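A producer thread publishes to the RingBuffer in two steps: it calls next to claim the next writable position, and, once the event in that slot has been filled in, it calls publish to make the position visible to consumers. next delegates to the incrementAndGet method of the ClaimStrategy interface. With a single-threaded producer the implementation is SingleThreadedClaimStrategy; with multiple producer threads it is based on AbstractMultithreadedClaimStrategy. Here is the single-threaded strategy, SingleThreadedClaimStrategy: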

public final class SingleThreadedClaimStrategy
    implements ClaimStrategy
{
    private final int bufferSize;
    private final PaddedLong minGatingSequence = new PaddedLong(Sequencer.INITIAL_CURSOR_VALUE);
    private final PaddedLong claimSequence = new PaddedLong(Sequencer.INITIAL_CURSOR_VALUE);

    ...

    @Override
    public long incrementAndGet(final Sequence[] dependentSequences)
    {
        long nextSequence = claimSequence.get() + 1L;
        claimSequence.set(nextSequence);
        waitForFreeSlotAt(nextSequence, dependentSequences);

        return nextSequence;
    }

    @Override
    public long incrementAndGet(final int delta, final Sequence[] dependentSequences)
    {
        long nextSequence = claimSequence.get() + delta;
        claimSequence.set(nextSequence);
        waitForFreeSlotAt(nextSequence, dependentSequences);

        return nextSequence;
    }


    @Override
    public void serialisePublishing(final long sequence, final Sequence cursor, final int batchSize)
    {
        cursor.set(sequence);
    }
    
    ...

    private void waitForFreeSlotAt(final long sequence, final Sequence[] dependentSequences)
    {
        final long wrapPoint = sequence - bufferSize;
        if (wrapPoint > minGatingSequence.get())
        {
            long minSequence;
            while (wrapPoint > (minSequence = getMinimumSequence(dependentSequences)))
            {
                LockSupport.parkNanos(1L);
            }

            minGatingSequence.set(minSequence);
        }
    }
}
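incrementAndGet does just two things. First, it adds 1 to the write position (the claimSequence variable) to get the position to write next; because this is the single-threaded strategy, no locking is needed. Second, it checks whether that position is writable. What does that mean? Because RingBuffer is circular, once data has been published to a position, that position can be overwritten only after every consumer has consumed it. If the producer publishes too quickly, it may well lap the ring and find that the slot ahead has not been consumed yet, in which case the producer thread has to wait (here by looping on LockSupport.parkNanos(1L)) until the consumers free the slot.

That was a lot of code, so let's sum up what RingBuffer gains over ArrayBlockingQueue. In 学学JUC(一)-- BlockingQueue I analyzed ArrayBlockingQueue, which has three pieces of mutable state: the head index (takeIndex), the tail index (putIndex), and the element count (count). Under concurrency all three are potential points of contention. RingBuffer has only one, the write position (claimSequence): there is no count variable, and the read pointers are maintained by the consumers (covered later). With the single-writer strategy even the write position is uncontended. Under high concurrency RingBuffer should therefore outperform ArrayBlockingQueue, and the same advantages apply against LinkedBlockingQueue.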
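
The wrap-point check can be tried out in isolation. The sketch below is not Disruptor code: WrapPointSketch and its methods are made-up names, there is a single consumer, and instead of parking the thread as waitForFreeSlotAt does, tryClaim simply returns -1 when the producer would have to wait.

```java
// Illustrative single-producer claim logic; names are hypothetical.
final class WrapPointSketch {
    private final int bufferSize;
    private long claimSequence = -1L;    // last position claimed by the producer
    private long consumerSequence = -1L; // last position consumed (the gating sequence)

    WrapPointSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Claims the next position, or returns -1 if claiming it would overwrite
    // a slot the consumer has not read yet.
    long tryClaim() {
        long nextSequence = claimSequence + 1L;
        // The position one full lap behind nextSequence shares its slot.
        long wrapPoint = nextSequence - bufferSize;
        if (wrapPoint > consumerSequence) {
            return -1L; // the real strategy would wait here instead
        }
        claimSequence = nextSequence;
        return nextSequence;
    }

    // Records that the consumer has finished with everything up to sequence.
    void consumeUpTo(long sequence) {
        consumerSequence = sequence;
    }

    public static void main(String[] args) {
        WrapPointSketch ring = new WrapPointSketch(4);
        for (int i = 0; i < 4; i++) {
            System.out.println("claimed " + ring.tryClaim());
        }
        System.out.println("ring full, claim returns " + ring.tryClaim());
        ring.consumeUpTo(0L);
        System.out.println("after consuming 0, claimed " + ring.tryClaim());
    }
}
```

With a buffer of 4, positions 0 to 3 can be claimed immediately; position 4 maps to the same slot as position 0, so it becomes claimable only after the consumer has moved past 0.
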
In SingleThreadedClaimStrategy the claimSequence variable is of type PaddedLong. Let's see what that is:

public final class PaddedLong extends MutableLong
{
    public volatile long p1, p2, p3, p4, p5, p6 = 7L;

    ...
}

public class MutableLong
{
    private long value = 0L;

    ...
}

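This looks odd: PaddedLong declares six long fields, p1 to p6, that appear to serve no purpose. Why declare fields that are never used? After some searching, the reason became clear: it has to do with the CPU cache. The CPU cache works in units of cache lines, typically 64 bytes each. When the CPU loads data it fills a whole cache line, so variables at adjacent addresses are loaded together. The catch is that when several variables share one cache line, a write to any one of them by one core invalidates the whole line in the other cores' caches, so threads working on the other variables must reload the line even though their own data did not change. This phenomenon is known as false sharing, and the experts cover it in detail in the article False Sharing. The six long fields exist to pad the cache line so that the value field effectively has a line to itself, which eliminates false sharing. (Hats off to the authors' pursuit of performance; it seems that writing high-quality code also requires understanding computer architecture.)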
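
To get a feel for the effect, here is a rough experiment, not a proper benchmark. Two threads each increment their own counter, once with plain objects and once with padded ones. All names are made up for this sketch, the padding mimics PaddedLong rather than reusing it, and the JVM is free to lay out or reorder fields, so the padded version is not guaranteed to sit on its own cache line. Timings depend heavily on hardware and JVM, so no expected numbers are given.

```java
// Rough false-sharing experiment; illustrative names, not Disruptor classes.
final class FalseSharingSketch {
    interface Counter {
        void increment();
        long get();
    }

    // Two of these allocated back to back are likely to share a cache line.
    static final class PlainCounter implements Counter {
        private volatile long value;
        public void increment() { value++; } // safe here: one writer per counter
        public long get() { return value; }
    }

    // Same counter with six unused longs that only take up space.
    static final class PaddedCounter implements Counter {
        private volatile long value;
        public volatile long p1, p2, p3, p4, p5, p6 = 7L;
        public void increment() { value++; }
        public long get() { return value; }
    }

    private static void spin(Counter counter, long iterations) {
        for (long i = 0; i < iterations; i++) {
            counter.increment();
        }
    }

    // Runs one thread per counter and returns the elapsed wall time in ms.
    static long runMillis(Counter a, Counter b, long iterations) {
        Thread t1 = new Thread(() -> spin(a, iterations));
        Thread t2 = new Thread(() -> spin(b, iterations));
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        long n = 50_000_000L;
        System.out.println("plain:  " + runMillis(new PlainCounter(), new PlainCounter(), n) + " ms");
        System.out.println("padded: " + runMillis(new PaddedCounter(), new PaddedCounter(), n) + " ms");
    }
}
```
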

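In the multi-threaded strategy, claimSequence is of type Sequence, which is also padded against false sharing. A multi-writer strategy has to be thread-safe, so Sequence uses Unsafe to perform CAS operations when modifying the value.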

public abstract class AbstractMultithreadedClaimStrategy implements ClaimStrategy
{
    private final int bufferSize;
    private final Sequence claimSequence = new Sequence(Sequencer.INITIAL_CURSOR_VALUE);
    ...
}

public class Sequence
{
    private static final Unsafe unsafe;
    private static final long valueOffset;

    static
    {
        unsafe = Util.getUnsafe();
        final int base = unsafe.arrayBaseOffset(long[].class);
        final int scale = unsafe.arrayIndexScale(long[].class);
        valueOffset = base + (scale * 7);
    }

    private final long[] paddedValue = new long[15];

    public Sequence()
    {
        setOrdered(-1);
    }

    public Sequence(final long initialValue)
    {
        setOrdered(initialValue);
    }

    public long get()
    {
        return unsafe.getLongVolatile(paddedValue, valueOffset);
    }

    public void set(final long value)
    {
        unsafe.putOrderedLong(paddedValue, valueOffset, value);
    }

    private void setOrdered(final long value)
    {
        unsafe.putOrderedLong(paddedValue, valueOffset, value);
    }

    public boolean compareAndSet(final long expectedValue, final long newValue)
    {
        return unsafe.compareAndSwapLong(paddedValue, valueOffset, expectedValue, newValue);
    }

    public String toString()
    {
        return Long.toString(get());
    }
    
    public long incrementAndGet()
    {
        return addAndGet(1L);
    }

    public long addAndGet(final long increment)
    {
        long currentValue;
        long newValue;

        do
        {
            currentValue = get();
            newValue = currentValue + increment;
        }
        while (!compareAndSet(currentValue, newValue));

        return newValue;
    }
}
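
The retry loop in addAndGet is the standard CAS pattern. The sketch below reproduces it with java.util.concurrent.atomic.AtomicLong in place of Unsafe, purely for illustration: it has no cache-line padding, and CasClaimSketch and its methods are hypothetical names. It shows several threads claiming positions without a lock and without any position being handed out twice.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Lock-free claiming of positions with a CAS retry loop; illustrative only.
final class CasClaimSketch {
    private final AtomicLong claimSequence = new AtomicLong(-1L);

    // Same shape as Sequence.addAndGet(1): read, compute, CAS, retry on failure.
    long claimNext() {
        long current;
        long next;
        do {
            current = claimSequence.get();
            next = current + 1L;
        } while (!claimSequence.compareAndSet(current, next));
        return next;
    }

    long lastClaimed() {
        return claimSequence.get();
    }

    // Starts the given number of threads, each claiming perThread positions,
    // and returns the set of all positions that were handed out.
    static Set<Long> claimConcurrently(CasClaimSketch claimer, int threads, int perThread) {
        Set<Long> claimed = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    claimed.add(claimer.claimNext());
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return claimed;
    }

    public static void main(String[] args) {
        CasClaimSketch claimer = new CasClaimSketch();
        Set<Long> claimed = claimConcurrently(claimer, 4, 1000);
        System.out.println("distinct positions: " + claimed.size());
        System.out.println("last claimed: " + claimer.lastClaimed());
    }
}
```

If two threads had ever received the same position, the set would hold fewer than threads * perThread entries.
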


As mentioned above, RingBuffer's read pointer is maintained by the consumer. Here is the consumer code:

public final class BatchEventProcessor<T>
    implements EventProcessor
{
    private final AtomicBoolean running = new AtomicBoolean(false);
    private ExceptionHandler exceptionHandler = new FatalExceptionHandler();
    private final RingBuffer<T> ringBuffer;
    private final SequenceBarrier sequenceBarrier;
    private final EventHandler<T> eventHandler;
    private final Sequence sequence = new Sequence(Sequencer.INITIAL_CURSOR_VALUE);

    ...

    /**
     * It is ok to have another thread rerun this method after a halt().
     */
    @Override
    public void run()
    {
        if (!running.compareAndSet(false, true))
        {
            throw new IllegalStateException("Thread is already running");
        }
        sequenceBarrier.clearAlert();

        notifyStart();

        T event = null;
        long nextSequence = sequence.get() + 1L;
        while (true)
        {
            try
            {
                final long availableSequence = sequenceBarrier.waitFor(nextSequence);
                while (nextSequence <= availableSequence)
                {
                    event = ringBuffer.get(nextSequence);
                    eventHandler.onEvent(event, nextSequence, nextSequence == availableSequence);
                    nextSequence++;
                }

                sequence.set(nextSequence - 1L);
            }
            catch (final AlertException ex)
            {
               if (!running.get())
               {
                   break;
               }
            }
            catch (final Throwable ex)
            {
                exceptionHandler.handleEventException(ex, nextSequence, event);
                sequence.set(nextSequence);
                nextSequence++;
            }
        }

        notifyShutdown();

        running.set(false);
    }

    ...
}
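sequence is the consumer's read pointer, and the main logic is in the run method. It first computes the next position to read (sequence + 1). As noted earlier, the producer's writes depend on the consumers' read positions; likewise, consumers depend on the producer's current write position. A consumer can read a position only after data has been published there; if it has not been written yet, the consumer thread has to wait. There are several wait strategies: BlockingWaitStrategy (waits on a condition), BusySpinWaitStrategy (busy-spins), SleepingWaitStrategy (sleeps), and YieldingWaitStrategy (busy-spins first, checking the condition 100 times in a while loop, then calls Thread.yield() to offer the CPU to other threads). After consuming data, the consumer updates its read pointer. Because BatchEventProcessor processes in batches, it consumes several positions and then updates the read pointer once. WorkProcessor is also a consumer, but it updates the read pointer after every event. Batch consumption gives higher throughput by comparison, since CAS operations have a cost too.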
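
The batching idea can be shown without any threads. The sketch below is a single-threaded simulation with made-up names (BatchConsumerSketch, drainBatch and so on); it leaves out the wait strategy and the producer-side gating, so it assumes the producer never laps the consumer. The point is that the read pointer is written once per batch rather than once per event.

```java
import java.util.ArrayList;
import java.util.List;

// Single-threaded simulation of batch consumption; illustrative only.
final class BatchConsumerSketch {
    private final String[] entries;
    private final int indexMask;
    private long cursor = -1L;   // last position published by the producer
    private long sequence = -1L; // the consumer's read pointer
    private int sequenceUpdates; // how many times the read pointer was written

    BatchConsumerSketch(int powerOfTwoSize) {
        entries = new String[powerOfTwoSize];
        indexMask = powerOfTwoSize - 1;
    }

    // Writes the value into the next slot, then advances the cursor.
    void publish(String value) {
        long next = cursor + 1L;
        entries[(int) (next & indexMask)] = value;
        cursor = next;
    }

    // Consumes everything published so far and updates the read pointer once.
    List<String> drainBatch() {
        List<String> batch = new ArrayList<>();
        long availableSequence = cursor;
        long nextSequence = sequence + 1L;
        while (nextSequence <= availableSequence) {
            batch.add(entries[(int) (nextSequence & indexMask)]);
            nextSequence++;
        }
        if (!batch.isEmpty()) {
            sequence = nextSequence - 1L;
            sequenceUpdates++;
        }
        return batch;
    }

    long sequence() { return sequence; }

    int sequenceUpdates() { return sequenceUpdates; }

    public static void main(String[] args) {
        BatchConsumerSketch ring = new BatchConsumerSketch(8);
        ring.publish("e0");
        ring.publish("e1");
        ring.publish("e2");
        System.out.println(ring.drainBatch() + " consumed with "
                + ring.sequenceUpdates() + " read-pointer update(s)");
    }
}
```
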

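One more detail in RingBuffer is worth noting: the queue size must be a power of 2. That way, taking a position modulo the queue size can be done with a bitwise AND, which is cheaper than division (it's all in the details):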

public RingBuffer(final EventFactory<T> eventFactory,
                  final ClaimStrategy claimStrategy,
                  final WaitStrategy waitStrategy)
{
    super(claimStrategy, waitStrategy);

    if (Integer.bitCount(claimStrategy.getBufferSize()) != 1)
    {
        throw new IllegalArgumentException("bufferSize must be a power of 2");
    }

    indexMask = claimStrategy.getBufferSize() - 1;
    entries = new Object[claimStrategy.getBufferSize()];

    fill(eventFactory);
}

public T get(final long sequence)
{
    return (T)entries[(int)sequence & indexMask];
}
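
The equivalence between masking and modulo is easy to check by hand. In this small sketch, IndexMaskSketch and indexOf are illustrative names; the power-of-two test is the same Integer.bitCount check the constructor uses, with an extra guard against non-positive sizes.

```java
// Shows that sequence & (size - 1) equals sequence % size for power-of-two sizes.
final class IndexMaskSketch {
    static boolean isPowerOfTwo(int n) {
        return n > 0 && Integer.bitCount(n) == 1;
    }

    // Maps an ever-growing sequence number onto a slot index.
    static int indexOf(long sequence, int bufferSize) {
        if (!isPowerOfTwo(bufferSize)) {
            throw new IllegalArgumentException("bufferSize must be a power of 2");
        }
        return (int) (sequence & (bufferSize - 1));
    }

    public static void main(String[] args) {
        int bufferSize = 8;
        long[] sequences = {0L, 7L, 8L, 9L, 1027L};
        for (long sequence : sequences) {
            System.out.println(sequence + " -> mask " + indexOf(sequence, bufferSize)
                    + ", modulo " + (sequence % bufferSize));
        }
    }
}
```

For a size of 8 the mask is binary 111, so the AND keeps exactly the low three bits, which is the remainder after dividing by 8.
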
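Having gone through the Disruptor code, we can roughly summarize why Disruptor and RingBuffer are fast:

1. RingBuffer is array-based, and array elements are reused rather than removed. Compared with a linked-list queue it creates fewer objects and is GC-friendly, and array access is faster than linked-list access.
2. Fewer points of contention. The write position is contended only under a multi-threaded producer strategy, and read pointers are maintained by the consumer threads. With a single-threaded producer the whole RingBuffer is contention-free, whereas an ordinary queue has several contention points: head, tail, and length.
3. In the normal case there are no locks anywhere on the RingBuffer access path; CAS operations are used instead. The exception is when producers and consumers run at mismatched speeds and one side has to wait.
4. Cache-line padding eliminates false sharing.
5. The array size is forced to be a power of 2, so the modulo is computed with a bit operation.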


