高效C++无锁队列实现-moodycamel::ConcurrentQueue

 

国外一牛人做的,支持多平台,支持多线程写、多线程读,并可指定读写token,转载过来。

感觉作者也时刻维护着他这个项目,我提了一些问题,每次都会及时得到答复,而且回复得非常认真仔细,非常赞!

 

链接地址(可下载源码):https://github.com/cameron314/concurrentqueue

作者的测试效果统计:http://moodycamel.com/blog/2014/a-fast-general-purpose-lock-free-queue-for-c++#benchmarks

 

 

moodycamel::ConcurrentQueue

 

An industrial-strength lock-free queue for C++.

Note: If all you need is a single-producer, single-consumer queue, I have one of those too.

Features

  • Knock-your-socks-off blazing fast performance.
  • Single-header implementation. Just drop it in your project.
  • Fully thread-safe lock-free queue. Use concurrently from any number of threads.
  • C++11 implementation -- elements are moved (instead of copied) where possible.
  • Templated, obviating the need to deal exclusively with pointers -- memory is managed for you.
  • No artificial limitations on element types or maximum count.
  • Memory can be allocated once up-front, or dynamically as needed.
  • Fully portable (no assembly; all is done through standard C++11 primitives).
  • Supports super-fast bulk operations.
  • Includes a low-overhead blocking version (BlockingConcurrentQueue).
  • Exception safe.

Reasons to use

There are not that many full-fledged lock-free queues for C++. Boost has one, but it's limited to objects with trivial assignment operators and trivial destructors, for example. Intel's TBB queue isn't lock-free, and requires trivial constructors too. There's many academic papers that implement lock-free queues in C++, but usable source code is hard to find, and tests even more so.

This queue not only has less limitations than others (for the most part), but it's also faster. It's been fairly well-tested, and offers advanced features like bulk enqueueing/dequeueing (which, with my new design, is much faster than one element at a time, approaching and even surpassing the speed of a non-concurrent queue even under heavy contention).

In short, there was a lock-free queue shaped hole in the C++ open-source universe, and I set out to fill it with the fastest, most complete, and well-tested design and implementation I could. The result is moodycamel::ConcurrentQueue :-)

Reasons not to use

The fastest synchronization of all is the kind that never takes place. Fundamentally, concurrent data structures require some synchronization, and that takes time. Every effort was made, of course, to minimize the overhead, but if you can avoid sharing data between threads, do so!

Why use concurrent data structures at all, then? Because they're gosh darn convenient! (And, indeed, sometimes sharing data concurrently is unavoidable.)

My queue is not linearizable (see the next section on high-level design). The foundations of its design assume that producers are independent; if this is not the case, and your producers co-ordinate amongst themselves in some fashion, be aware that the elements won't necessarily come out of the queue in the same order they were put in relative to the ordering formed by that co-ordination (but they will still come out in the order they were put in by any individual producer). If this affects your use case, you may be better off with another implementation; either way, it's an important limitation to be aware of.

My queue is also not NUMA aware, and does a lot of memory re-use internally, meaning it probably doesn't scale particularly well on NUMA architectures; however, I don't know of any other lock-free queue that is NUMA aware (except for SALSA, which is very cool, but has no publicly available implementation that I know of).

Finally, the queue is not sequentially consistent; there is a happens-before relationship between when an element is put in the queue and when it comes out, but other things (such as pumping the queue until it's empty) require more thought to get right in all eventualities, because explicit memory ordering may have to be done to get the desired effect. In other words, it can sometimes be difficult to use the queue correctly. This is why it's a good idea to follow the samples where possible. On the other hand, the upside of this lack of sequential consistency is better performance.

High-level design

Elements are stored internally using contiguous blocks instead of linked lists for better performance. The queue is made up of a collection of sub-queues, one for each producer. When a consumer wants to dequeue an element, it checks all the sub-queues until it finds one that's not empty. All of this is largely transparent to the user of the queue, however -- it mostly just worksTM.

One particular consequence of this design, however, (which seems to be non-intuitive) is that if two producers enqueue at the same time, there is no defined ordering between the elements when they're later dequeued. Normally this is fine, because even with a fully linearizable queue there'd be a race between the producer threads and so you couldn't rely on the ordering anyway. However, if for some reason you do extra explicit synchronization between the two producer threads yourself, thus defining a total order between enqueue operations, you might expect that the elements would come out in the same total order, which is a guarantee my queue does not offer. At that point, though, there semantically aren't really two separate producers, but rather one that happens to be spread across multiple threads. In this case, you can still establish a total ordering with my queue by creating a single producer token, and using that from both threads to enqueue (taking care to synchronize access to the token, of course, but there was already extra synchronization involved anyway).

I've written a more detailed overview of the internal design, as well as the full nitty-gritty details of the design, on my blog. Finally, the source itself is available for perusal for those interested in its implementation.

Basic use

The entire queue's implementation is contained in one header, concurrentqueue.h. Simply download and include that to use the queue. The blocking version is in a separate header, blockingconcurrentqueue.h, that depends on the first. The implementation makes use of certain key C++11 features, so it requires a fairly recent compiler (e.g. VS2012+ or g++ 4.8; note that g++ 4.6 has a known bug with std::atomic and is thus not supported). The algorithm implementations themselves are platform independent.

Use it like you would any other templated queue, with the exception that you can use it from many threads at once :-)

Simple example:

  1.  
    #include "concurrentqueue.h"
  2.  
     
  3.  
    moodycamel::ConcurrentQueue< int> q;
  4.  
  • 0
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值