@synchronized, NSLock, pthread, OSSpinLock showdown, done right

Reposted 2016-05-30 13:25:43

http://perpendiculo.us/2009/09/synchronized-nslock-pthread-osspinlock-showdown-done-right/


Somewhere out there on the internet, there’s a “showdown” between @synchronized, NSLock, pthread mutexes, and OSSpinLock. It aims to measure their performance relative to each other, but uses sloppy code to perform the measuring. As a result, while the performance ordering is correct (@synchronized is the slowest, OSSpinLock is the fastest), the relative cost is severely misrepresented. Herein I attempt to rectify that benchmark.

Locking is absolutely required for critical sections. These arise in multithreaded code, and sometimes their performance can have severe consequences in applications. The problem with the aforementioned benchmark is that it did a bunch of extraneous work while it was locking/unlocking. It was doing the same amount of extraneous work, so the relative order was correct (the fastest was still the fastest, the slowest still the slowest, etc), but it didn’t properly show just how much faster the fastest was.

In the benchmark, the author used autorelease pools, allocated objects, and then released them all.  While locking.  This is a pretty reasonable use-case, but by no means the only one.  For most high-performance, multithreaded code, you’ll spend a _bunch_ of time trying to make the critical sections as small and fast as possible.  Large, slow critical sections effectively undo the multithreading speed-up by causing threads to block each other out unnecessarily.  So when you’ve trimmed the critical sections down to the minimum, another sometimes-justified optimization is to optimize the amount of time spent locking/unlocking itself.
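To make that concrete, here is a minimal sketch of what a trimmed-down critical section looks like. Nothing in it comes from the benchmark below; accumulate(), gSharedTotal, and gMutex are made-up names for illustration. The expensive work runs on thread-local data, and the lock is only held long enough to publish the result.

#import <pthread.h>
#import <stddef.h>

static pthread_mutex_t gMutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long long gSharedTotal = 0;   /* shared state guarded by gMutex */

/* Hypothetical worker: the expensive summing runs on private data with no
   lock held; the lock only protects the single shared-state update. */
static void accumulate(const unsigned int *samples, size_t count)
{
    unsigned long long localTotal = 0;
    size_t i;
    for (i = 0; i < count; ++i)
        localTotal += samples[i];

    pthread_mutex_lock(&gMutex);       /* critical section starts here...  */
    gSharedTotal += localTotal;
    pthread_mutex_unlock(&gMutex);     /* ...and ends two lines later      */
}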

Just to make things exciting though, not all locking primitives are created equal.  Two of the four mentioned have special properties that can affect how long they take, and how they operate under pressure.  I’ll get to that towards the end.

First up, here’s my “no-nonsense” microbench code:

#import <Foundation/Foundation.h>
#import <objc/runtime.h>
#import <objc/message.h>
#import <libkern/OSAtomic.h>
#import <pthread.h>

#define ITERATIONS (1024*1024*32)

static unsigned long long disp=0, land=0;   /* unused in this benchmark */

int main()
{
    double then, now;
    unsigned int i, count;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    OSSpinLock spinlock = OS_SPINLOCK_INIT;

    NSAutoreleasePool *pool = [NSAutoreleasePool new];

    /* 1. NSLock via ordinary message sends */
    NSLock *lock = [NSLock new];
    then = CFAbsoluteTimeGetCurrent();
    for(i=0;i<ITERATIONS;++i)
    {
        [lock lock];
        [lock unlock];
    }
    now = CFAbsoluteTimeGetCurrent();
    printf("NSLock: %f sec\n", now-then);

    /* 2. NSLock with the lock/unlock IMPs cached, skipping objc_msgSend dispatch
       (newer compilers may require casting the IMPs before calling them like this) */
    then = CFAbsoluteTimeGetCurrent();
    IMP lockLock = [lock methodForSelector:@selector(lock)];
    IMP unlockLock = [lock methodForSelector:@selector(unlock)];
    for(i=0;i<ITERATIONS;++i)
    {
        lockLock(lock,@selector(lock));
        unlockLock(lock,@selector(unlock));
    }
    now = CFAbsoluteTimeGetCurrent();
    printf("NSLock+IMP Cache: %f sec\n", now-then);

    /* 3. Raw pthread mutex */
    then = CFAbsoluteTimeGetCurrent();
    for(i=0;i<ITERATIONS;++i)
    {
        pthread_mutex_lock(&mutex);
        pthread_mutex_unlock(&mutex);
    }
    now = CFAbsoluteTimeGetCurrent();
    printf("pthread_mutex: %f sec\n", now-then);

    /* 4. OSSpinLock */
    then = CFAbsoluteTimeGetCurrent();
    for(i=0;i<ITERATIONS;++i)
    {
        OSSpinLockLock(&spinlock);
        OSSpinLockUnlock(&spinlock);
    }
    now = CFAbsoluteTimeGetCurrent();
    printf("OSSpinlock: %f sec\n", now-then);

    /* 5. @synchronized on a dummy object */
    id obj = [NSObject new];

    then = CFAbsoluteTimeGetCurrent();
    for(i=0;i<ITERATIONS;++i)
    {
        @synchronized(obj)
        {
        }
    }
    now = CFAbsoluteTimeGetCurrent();
    printf("@synchronized: %f sec\n", now-then);

    [pool release];
    return 0;
}

We do 5 tests: NSLock, NSLock with IMP caching, pthread mutexes, OSSpinLock, and finally @synchronized.  We simply lock and unlock 33554432 times (that’s 1024*1024*32 for those keeping score at home ;), and see how long it takes.  No allocation, no releases, no autorelease pools, nothing.  Just pure lock/unlock goodness.  I ran the test a few times and averaged the results, so overall the numbers below come from something like 100 million lock/unlock cycles each.

  1. NSLock: 3.5175 sec
  2. NSLock+IMP Cache: 3.1165 sec
  3. Mutex: 1.5870 sec
  4. SpinLock: 1.0893 sec
  5. @synchronized: 9.9488 sec
[Chart: Lock Performance]

From the above graph, we can see a couple of things.  First, @synchronized is _Really_ expensive: roughly three times as expensive as even the next-slowest option, plain NSLock.  We’ll get into why that is in a moment.  Otherwise, we see that NSLock and NSLock+IMP Cache are pretty close; these are built on top of pthread mutexes, but we have to pay for the extra ObjC overhead.  Then there’s Mutex (pthread mutexes) and SpinLock, which are also pretty close, but even then SpinLock is about 30% faster than Mutex.  We’ll get into that one too.  So from top to bottom we have almost an order of magnitude difference between the worst and the best.

The nice part is that they all take about the same amount of code: using NSLock takes as many lines as a pthread mutex, and the same goes for a spin lock (see the sketch below).  @synchronized saves a line or two, but with a cost like that it quickly looks unappealing in all but the most trivial of cases.
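For comparison’s sake, here is the same trivial critical section written with each primitive, reusing the lock, mutex, spinlock, obj, and disp variables declared in the benchmark above (a sketch, not part of the measured code):

[lock lock];                     // NSLock
disp++;
[lock unlock];

pthread_mutex_lock(&mutex);      // pthread mutex
disp++;
pthread_mutex_unlock(&mutex);

OSSpinLockLock(&spinlock);       // OSSpinLock
disp++;
OSSpinLockUnlock(&spinlock);

@synchronized(obj)               // @synchronized: the one that saves a line or two
{
    disp++;
}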

So, what makes @synchronized and SpinLock so different from the others?

@synchronized is very heavyweight because it has to set up an exception handler, and it actually ends up taking a few internal locks on its way there.  So instead of a simple cheap lock, you’re paying for a couple of locks/unlocks just to acquire your measly lock.  Those take time.
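To see where that weight comes from, it helps to look at what the construct boils down to. The sketch below is a simplified, hand-written equivalent (not the exact code the compiler emits, and synchronized_by_hand is just an illustrative name): the body gets wrapped in an exception handler, and the lock itself is fetched from a runtime side table.

#import <Foundation/Foundation.h>
#import <objc/objc-sync.h>   // declares objc_sync_enter()/objc_sync_exit()

// Roughly what "@synchronized(obj) { ... }" boils down to (simplified sketch):
static void synchronized_by_hand(id obj)
{
    objc_sync_enter(obj);      // finds and locks a mutex associated with obj in a
                               // runtime side table; that lookup takes the table's
                               // own lock before your lock is even acquired
    @try {
        // ... the critical section body goes here ...
    }
    @finally {
        objc_sync_exit(obj);   // unlocks the per-object mutex
    }
}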

OSSpinLock, on the other hand, doesn’t even enter the kernel: it just keeps reloading the lock, hoping that it’s unlocked.  This is terribly inefficient if locks are held for more than a few nanoseconds, but it saves a costly system call and a couple of context switches.  Pthread mutexes actually use an OSSpinLock first, to keep things running smoothly when there’s no contention; when there is, they fall back to heavier, kernel-level locking.
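To illustrate the “keeps reloading the lock” behaviour, here is a deliberately naive spin lock built on OSAtomicCompareAndSwap32Barrier. This is not Apple’s OSSpinLock implementation (the real one adds refinements such as backoff); naive_spin_lock and naive_spin_unlock are made-up names, and the sketch only shows the basic busy-wait shape:

#import <libkern/OSAtomic.h>

typedef int32_t naive_spinlock_t;                 // 0 = unlocked, 1 = locked

static void naive_spin_lock(naive_spinlock_t *l)
{
    // Keep retrying the atomic 0 -> 1 swap until it succeeds, burning CPU
    // the whole time instead of asking the kernel to put the thread to sleep.
    while (!OSAtomicCompareAndSwap32Barrier(0, 1, l))
        ;
}

static void naive_spin_unlock(naive_spinlock_t *l)
{
    // Swap 1 -> 0 with a barrier so the critical section's writes are
    // visible to the next thread that grabs the lock.
    OSAtomicCompareAndSwap32Barrier(1, 0, l);
}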

So, if you’ve got hotly-contested locks, OSSpinLock probably isn’t for you (unless your critical sections are _Really_ _Fast_).  Pthread mutexes are a tiny bit more expensive, but they avoid the power-wasting effects of OSSpinLock.

NSLock is a pretty wrapper around pthread mutexes.  It doesn’t provide much else, so there’s not much point in using it over a pthread mutex.

Of course, standard optimization disclaimers apply:  don’t do it until you’re sure you’ve chosen the correct algorithms, have profiled to find hotspots, and have found locking to be one of those hot items.  Otherwise, you’re wasting your time on something that’s likely to provide minimal benefits.

4 Comments »

  1. This is very interesting and useful, thanks for the sample code too!

    Comment by Zachary Howe — 2012.08.10 @ 12:48 pm

  2. Thank you very much. With your article, I finally know the speed of the different sync mechanisms.

    Comment by maple — 2012.11.01 @ 8:05 am

  3. nice article 
