Disruptor Ring Buffer as a Blocking Queue

Author: Wang, Xinglang 

Abstract

For any concurrent multi-threaded system, distributed or otherwise, the inter-thread messaging component is a very important one. In Java, the JDK provides ArrayBlockingQueue, LinkedBlockingQueue, and LinkedTransferQueue. The Disruptor (http://lmax-exchange.github.io/disruptor/) is famous for its high-performance inter-thread messaging, but it does not expose a BlockingQueue interface. This blog introduces a new blocking queue built on the Disruptor's ring buffer, together with benchmark results.

Why require a BlockingQueue interface

The BlockingQueue interface is widely used by existing code, and switching to the Disruptor directly would require large changes, since the Disruptor wants to control the whole thread scheduling itself. Second, the Disruptor only calls back when an event arrives; it gives the application no chance to control behavior when the queue builds up, such as applying proactive throttling. This blog introduces a BlockingQueue implementation on top of the RingBuffer, with one limitation: the queue can only be drained by a single consumer thread, while the producer side may be one thread or many. This is still useful for the Actor pattern, which uses a blocking queue and one thread to drain it. The reason for the limitation is that the consumer-side offset is hard to maintain correctly across multiple consumer threads; multi-threaded consumers should use the Disruptor WorkerPool instead of a JDK Executor.
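To make the drop-in property concrete, here is a minimal sketch of the Actor pattern mentioned above: one consumer thread drains a BlockingQueue. The JDK ArrayBlockingQueue stands in as a placeholder; any BlockingQueue implementation, including a ring-buffer-backed one, could be swapped in without touching the drain loop. Class and method names here are hypothetical, not from the queue's repository.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a single consumer thread drains a BlockingQueue.
// Swapping the queue implementation requires no change to this code.
public class ActorDrainSketch {
    // Takes `count` items from the queue and sums them; blocks on take()
    // whenever the queue is empty, exactly as an actor mailbox would.
    public static int drainAndSum(BlockingQueue<Integer> queue, int count)
            throws InterruptedException {
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += queue.take(); // blocks until an item is available
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= 5; i++) {
                try {
                    queue.put(i);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();
        int sum = drainAndSum(queue, 5);
        producer.join();
        System.out.println("sum = " + sum); // prints "sum = 15"
    }
}
```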

Implementation

The source code is available on GitHub: https://github.com/xinglang/disruptorqueue/tree/master/disruptorqueue

Since this queue supports only one consumer, let's call it SingleConsumerDisruptorQueue. It holds a ring buffer and a sequence for the consumer (consumedSeq), which serves as the gating sequence of the ring buffer. There is also a knownPublishedSeq field that remembers the last known published sequence. Since this is a blocking queue, the wait strategy is BlockingWaitStrategy (the default).

private final RingBuffer<Event<T>> ringBuffer;
private final Sequence consumedSeq;
private final SequenceBarrier barrier;
private long knownPublishedSeq;

public SingleConsumerDisruptorQueue(int bufferSize, boolean singleProducer) {
    if (singleProducer) {
        ringBuffer = RingBuffer.createSingleProducer(new Factory<T>(),
                normalizeBufferSize(bufferSize));
    } else {
        ringBuffer = RingBuffer.createMultiProducer(new Factory<T>(),
                normalizeBufferSize(bufferSize));
    }
    consumedSeq = new Sequence();
    ringBuffer.addGatingSequences(consumedSeq);
    barrier = ringBuffer.newBarrier();
    long cursor = ringBuffer.getCursor();
    consumedSeq.set(cursor);
    knownPublishedSeq = cursor;
}
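The constructor calls normalizeBufferSize, which is not shown above. The Disruptor RingBuffer requires a power-of-two size, so a plausible sketch (an assumption on my part; the helper in the repository may differ) rounds the requested size up to the next power of two:

```java
// Hypothetical sketch of normalizeBufferSize: the Disruptor RingBuffer
// requires a power-of-two size, so round the requested size up to the
// next power of two. The actual helper in the repo may be implemented
// differently.
public class BufferSizeUtil {
    public static int normalizeBufferSize(int requested) {
        if (requested <= 1) {
            return 1;
        }
        int highest = Integer.highestOneBit(requested);
        // Already a power of two? Keep it; otherwise round up.
        return highest == requested ? requested : highest << 1;
    }

    public static void main(String[] args) {
        System.out.println(normalizeBufferSize(1000)); // prints 1024
    }
}
```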

For publishing, just use the ring buffer's publish mechanism. Inside the ring buffer there is an event holder, which acts as a value holder for the item.

@Override
public boolean offer(T e) {
    long seq;
    try {
        // Claim the next slot; fail fast if the ring buffer is full.
        seq = ringBuffer.tryNext();
    } catch (InsufficientCapacityException e1) {
        return false;
    }
    publish(e, seq);
    return true;
}

private void publish(T e, long seq) {
    // Write the value into the pre-allocated holder, then publish the sequence.
    Event<T> holder = ringBuffer.get(seq);
    holder.setValue(e);
    ringBuffer.publish(seq);
}

On the consume side there is an optimization, possible because there is only one consumer thread. Each call to waitFor returns the last known published sequence, which is cached; as long as the consumer sequence is below this cached value, the barrier's waitFor method does not need to be called at all.

@Override
public T take() throws InterruptedException {
    long l = consumedSeq.get() + 1;
    // Only hit the barrier when the cached published sequence is exhausted.
    while (knownPublishedSeq < l) {
        try {
            knownPublishedSeq = barrier.waitFor(l);
        } catch (AlertException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }
    Event<T> eventHolder = ringBuffer.get(l);
    consumedSeq.incrementAndGet();
    return eventHolder.getValue();
}
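The effect of caching knownPublishedSeq can be shown with a small self-contained simulation (names are illustrative; an AtomicLong stands in for the ring buffer cursor, and a counter records how often the expensive barrier wait would run). Consuming a batch of already-published events costs a single wait.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of take()'s batching optimization: after one
// (expensive) wait, the cached published sequence lets a whole batch of
// events be drained with no further coordination.
public class BatchedTakeSketch {
    private final AtomicLong cursor;  // stands in for the ring buffer cursor
    private long consumedSeq = -1;
    private long knownPublishedSeq = -1;
    private int waitForCalls = 0;     // instrumentation: counts expensive waits

    public BatchedTakeSketch(AtomicLong cursor) {
        this.cursor = cursor;
    }

    // Mirrors take(): only fall back to the (expensive) barrier wait when
    // the cached published sequence has been exhausted.
    public long take() {
        long next = consumedSeq + 1;
        while (knownPublishedSeq < next) {
            waitForCalls++;                   // stands in for barrier.waitFor(next)
            knownPublishedSeq = cursor.get(); // producer is assumed to be ahead
        }
        consumedSeq = next;
        return next;
    }

    public int waitForCalls() {
        return waitForCalls;
    }

    public static void main(String[] args) {
        AtomicLong cursor = new AtomicLong(9); // producer already published 0..9
        BatchedTakeSketch consumer = new BatchedTakeSketch(cursor);
        for (int i = 0; i < 10; i++) {
            consumer.take();
        }
        // One wait serviced the whole batch of 10 events.
        System.out.println("waitFor calls: " + consumer.waitForCalls()); // prints 1
    }
}
```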

Performance analysis

First of all, it gets all the benefits of the ring buffer design:

  • Avoids false sharing
  • Pre-allocated ring buffer: no instances are created during publish/consume
  • Fewer context switches: the consumer can drain a batch of events without being interrupted
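On the first bullet, here is a hedged illustration of how the Disruptor family avoids false sharing: hot counters are padded out to a full cache line so that two counters written from different cores never share one. The class below is the classic manual-padding idiom, a sketch rather than the Disruptor's actual Sequence source.

```java
// Illustrative manual cache-line padding, in the style of the Disruptor's
// padded sequences. The 7 longs on each side of `value` (plus the object
// header) keep the hot field on a cache line of its own (typically 64 bytes),
// so writes to a neighbouring counter never invalidate it.
public class PaddedCounter {
    protected long p1, p2, p3, p4, p5, p6, p7;  // left padding
    private volatile long value;
    protected long q1, q2, q3, q4, q5, q6, q7;  // right padding

    public long get() {
        return value;
    }

    public void set(long v) {
        value = v;
    }

    // Single-writer increment, like consumedSeq in the queue above;
    // ++ on a volatile is not atomic across multiple writers.
    public long incrementAndGet() {
        return ++value;
    }

    public static void main(String[] args) {
        PaddedCounter c = new PaddedCounter();
        c.set(41);
        System.out.println(c.incrementAndGet()); // prints 42
    }
}
```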

Below is a benchmark comparing this queue with LinkedBlockingQueue, ArrayBlockingQueue, and LinkedTransferQueue. The benchmark ran on a bare-metal machine with Ubuntu, using 1 consumer thread and 1 to 4 producer threads; each round performs 32M put/take operations. The object being put is a constant string, so there is no GC overhead from object creation.

Single Producer benchmark

 

$ perf stat java -jar disruptortest.jar type=dbq                          
Producers :1, buffer size: 262144, batch:0                                
SingleConsumerDisruptorQueue transfer rate : 19890 per ms, Used 1687ms for 33554432                                                                  
Performance counter stats for 'java -jar disruptortest.jar type=dbq':     
3729.421847 task-clock # 1.998 CPUs utilized   
1,891 context-switches # 0.001 M/sec           
                      76 CPU-migrations # 0.000 M/sec                            
9,357 page-faults # 0.003 M/sec      
9,434,280,791 cycles # 2.530 GHz [83.38%]  
5,489,619,603 stalled-cycles-frontend # 58.19% frontend cycles idle [83.35%] 
2,618,037,087 stalled-cycles-backend # 27.75% backend cycles idle [66.99%] 
10,797,968,145 instructions # 1.14 insns per cycle       
                                      # 0.51 stalled cycles per insn [83.55%]
1,742,973,721 branches # 467.358 M/sec [83.28%]
      10,213,770 branch-misses # 0.59% of all branches [83.12%]
1.866803438 seconds time elapsed   
            
$ perf stat java -jar disruptortest.jar type=abq                                 
Producers :1, buffer size: 262144, batch:0                                      
ArrayBlockingQueue transfer rate : 2694 per ms, Used 12451ms for 33554432    
Performance counter stats for 'java -jar disruptortest.jar type=abq':
22976.952946 task-clock # 1.824 CPUs utilized  
232,766 context-switches # 0.010 M/sec           
80 CPU-migrations # 0.000 M/sec    
68,531 page-faults # 0.003 M/sec     
58,643,663,103 cycles # 2.552 GHz [83.14%] 
51,767,105,241 stalled-cycles-frontend # 88.27% frontend cycles idle [83.32%]
47,084,355,024 stalled-cycles-backend # 80.29% backend cycles idle [66.51%]
   12,035,035,540 instructions # 0.21 insns per cycle        
                                        # 4.30 stalled cycles per insn [83.44%]
 2,016,738,256 branches # 87.772 M/sec [83.56%]
        20,147,764 branch-misses # 1.00% of all branches [83.49%]
12.596555382 seconds time elapsed                                         
$ perf stat java -jar disruptortest.jar type=lbq                                  
Producers :1, buffer size: 262144, batch:0                                        
LinkedBlockingQueue transfer rate : 1132 per ms, Used 29632ms for 33554432          
Performance counter stats for 'java -jar disruptortest.jar type=lbq':             
58707.942294 task-clock # 1.968 CPUs utilized 
82,377 context-switches # 0.001 M/sec         
97 CPU-migrations # 0.000 M/sec   
133,543 page-faults # 0.002 M/sec     
151,825,969,348 cycles # 2.586 GHz [83.27%] 
139,833,905,165 stalled-cycles-frontend # 92.10% frontend cycles idle [83.40%]
131,712,244,095 stalled-cycles-backend # 86.75% backend cycles idle [66.67%]
10,997,843,405 instructions # 0.07 insns per cycle    
                                          # 12.71 stalled cycles per insn [83.26%]
  1,701,879,665 branches # 28.989 M/sec [83.31%]
         23,369,660 branch-misses # 1.37% of all branches [83.35%]
29.830928757 seconds time elapsed                                            
$ perf stat java -jar disruptortest.jar type=tq                                      
Producers :1, buffer size: 262144, batch:0                                       
LinkedTransferQueue transfer rate : 2139 per ms, Used 15685ms for 33554432       
Performance counter stats for 'java -jar disruptortest.jar type=tq':             
107428.492713 task-clock # 6.737 CPUs utilized
10,542 context-switches # 0.000 M/sec         
100 CPU-migrations # 0.000 M/sec    
245,909 page-faults # 0.002 M/sec     
278,182,169,187 cycles # 2.589 GHz [83.33%] 
204,478,913,414 stalled-cycles-frontend # 73.51% frontend cycles idle [83.36%]
164,497,727,638 stalled-cycles-backend # 59.13% backend cycles idle [66.73%]
90,952,113,104 instructions # 0.33 insns per cycle    
                                         # 2.25 stalled cycles per insn [83.37%]
  32,522,385,525 branches # 302.735 M/sec [83.30%]
             57,227,684 branch-misses # 0.18% of all branches [83.28%]
15.947024802 seconds time elapsed                                                      

Multiple Producer benchmark

$ perf stat java -jar disruptortest.jar type=dq producer=4                        
Producers :4, buffer size: 262144, batch:0                                      
SingleConsumerDisruptorQueue transfer rate : 2859 per ms, Used 46941ms for 134217728
Performance counter stats for 'java -jar disruptortest.jar type=dq producer=4':   
                 118905.839793 task-clock # 2.523 CPUs utilized                          
2,172,912 context-switches # 0.018 M/sec            
280 CPU-migrations # 0.000 M/sec    
28,697 page-faults # 0.000 M/sec    
141,597,737,150 cycles # 1.191 GHz [83.18%]
113,618,387,640 stalled-cycles-frontend # 80.24% frontend cycles idle [83.42%]
  96,562,209,060 stalled-cycles-backend # 68.19% backend cycles idle [66.86%] 
55,227,379,587 instructions # 0.39 insns per cycle    
                                         # 2.06 stalled cycles per insn [83.45%]
  9,312,400,407 branches # 78.317 M/sec [83.19%]
         64,375,263 branch-misses # 0.69% of all branches [83.35%]
47.133747893 seconds time elapsed                                          
$ perf stat java -jar disruptortest.jar type=abq producer=4                   
Producers :4, buffer size: 262144, batch:0                                
ArrayBlockingQueue transfer rate : 2047 per ms, Used 65546ms for 134217728
Performance counter stats for 'java -jar disruptortest.jar type=abq producer=4':
79345.046656 task-clock # 1.208 CPUs utilized
3,003,905 context-switches # 0.038 M/sec             
 594 CPU-migrations # 0.000 M/sec      
77,227 page-faults # 0.001 M/sec     
102,931,605,765 cycles # 1.297 GHz [83.10%]  
78,913,722,891 stalled-cycles-frontend # 76.67% frontend cycles idle [83.46%]
65,701,179,927 stalled-cycles-backend # 63.83% backend cycles idle [66.99%]
52,891,419,177 instructions # 0.51 insns per cycle     
                                        # 1.49 stalled cycles per insn [83.41%]
  9,307,141,741 branches # 117.300 M/sec [83.21%]
        79,855,221 branch-misses # 0.86% of all branches [83.23%]
65.694123910 seconds time elapsed                                            
$ perf stat java -jar disruptortest.jar type=lbq producer=4                     
Producers :4, buffer size: 262144, batch:0                                  
LinkedBlockingQueue transfer rate : 2795 per ms, Used 48014ms for 134217728     
Performance counter stats for 'java -jar disruptortest.jar type=lbq producer=4':
110080.375452 task-clock # 2.284 CPUs utilized  
3,644,802 context-switches # 0.033 M/sec            
597 CPU-migrations # 0.000 M/sec    
136,440 page-faults # 0.001 M/sec     
185,250,018,068 cycles # 1.683 GHz [83.46%] 
144,448,559,949 stalled-cycles-frontend # 77.97% frontend cycles idle [83.62%]
118,250,468,418 stalled-cycles-backend # 63.83% backend cycles idle [66.28%]
73,113,563,433 instructions # 0.39 insns per cycle    
                                         # 1.98 stalled cycles per insn [83.21%]
  12,028,209,235 branches # 109.268 M/sec [83.25%]
        129,234,077 branch-misses # 1.07% of all branches [83.40%]
48.189813503 seconds time elapsed                                        
$ perf stat java -jar disruptortest.jar type=tq producer=4                 
Producers :4, buffer size: 262144, batch:0                                 
LinkedTransferQueue transfer rate : 1438 per ms, Used 93273ms for 134217728
Performance counter stats for 'java -jar disruptortest.jar type=tq producer=4':
761878.416668 task-clock # 8.122 CPUs utilized
71,371 context-switches # 0.000 M/sec       
203 CPU-migrations # 0.000 M/sec  
670,788 page-faults # 0.001 M/sec   
1,976,200,012,808 cycles # 2.594 GHz [83.33%] 
1,584,264,715,610 stalled-cycles-frontend # 80.17% frontend cycles idle [83.34%]
1,368,861,011,899 stalled-cycles-backend # 69.27% backend cycles idle [66.68%]
487,816,405,509 instructions # 0.25 insns per cycle   
                                           # 3.25 stalled cycles per insn [83.34%]
   169,135,278,863 branches # 221.998 M/sec [83.33%]
          615,658,238 branch-misses # 0.36% of all branches [83.33%]
93.798977802 seconds time elapsed                                                        

Conclusion

Using the Disruptor's RingBuffer to build a blocking queue is feasible. In the single producer/consumer case it is about 7x faster than ArrayBlockingQueue and over 17x faster than LinkedBlockingQueue in this benchmark. In the multiple-producer case it is much faster than ArrayBlockingQueue and the transfer queue; LinkedBlockingQueue achieves similar throughput, but the Disruptor-based queue has fewer context switches and a smaller memory footprint. The only limitation is that it supports a single consumer thread. The benefit of a BlockingQueue implementation on top of the RingBuffer is that it can be a drop-in replacement in existing code and gives the user more control via the BlockingQueue interface, whereas the WorkerPool provided by the Disruptor only lets the user register an event handler for callbacks.
