Linux kernel rwlock performance analysis on x86 (part 1)
Author: gfree.wind@
Blog:
Weibo:
QQ tech group: 4367710
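Problem description
I have been optimizing our low-level code for a while now, and product performance has kept improving. A few days ago I changed the lock protecting a cache data structure from a spinlock to an rwlock, since access to it really is read-mostly, and to my surprise performance dropped. The plan is to make this structure lock-free soon anyway, but I still wanted to find out why switching to an rwlock made things slower.
Problem analysis
First, compare what the two locks do. A spinlock lets only one CPU access the shared resource while every other CPU busy-waits. An rwlock lets multiple readers access the resource concurrently but strictly allows only one CPU at a time as a writer. While readers hold the resource, a writer must busy-wait, and vice versa. Judging by this description alone, an rwlock should not perform worse than a spinlock.
But measurements are the strongest evidence. The cache is maintained as a hash table, and when the hash function distributes keys well, lock contention is probably rare. In that case the slowdown may come from the rwlock itself: its lock and unlock operations may cost more CPU than a spinlock's. That reminded me of an article I read some time ago, which said that rwlock performance can be disappointing even for read-mostly workloads.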
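The semantics described above can be illustrated with a small userspace sketch. It is only an analogy: it assumes POSIX `pthread_rwlock_t` and `pthread_spinlock_t` as stand-ins for the kernel's `rwlock_t` and `spinlock_t`, and it probes them from a single thread with non-blocking "try" calls so the outcome is deterministic. The struct and function names are hypothetical.

```c
/* Userspace analogy only: POSIX locks stand in for the kernel's
 * rwlock_t and spinlock_t. Names below are made up for illustration. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct lock_probe {
    int second_reader;     /* result of a second read-lock attempt */
    int writer_vs_readers; /* result of a write-lock attempt while readers hold the lock */
    int second_spin;       /* result of a second spinlock attempt */
};

struct lock_probe probe_lock_semantics(void)
{
    struct lock_probe r;
    pthread_rwlock_t rw;
    pthread_spinlock_t sp;
    int rc;

    rc = pthread_rwlock_init(&rw, NULL);
    assert(rc == 0);
    rc = pthread_spin_init(&sp, PTHREAD_PROCESS_PRIVATE);
    assert(rc == 0);

    /* rwlock: readers share the lock; a writer is refused (EBUSY)
     * while any reader still holds it. */
    pthread_rwlock_rdlock(&rw);
    r.second_reader = pthread_rwlock_tryrdlock(&rw);
    r.writer_vs_readers = pthread_rwlock_trywrlock(&rw);
    if (r.second_reader == 0)
        pthread_rwlock_unlock(&rw); /* release the second read lock */
    pthread_rwlock_unlock(&rw);     /* release the first read lock */

    /* spinlock: one holder at a time, no reader/writer distinction. */
    pthread_spin_lock(&sp);
    r.second_spin = pthread_spin_trylock(&sp);
    pthread_spin_unlock(&sp);

    pthread_rwlock_destroy(&rw);
    pthread_spin_destroy(&sp);
    return r;
}
```

Calling `probe_lock_semantics()` on a typical Linux/glibc system should report that the second reader got in while both the writer and the second spinlock attempt were refused. This says nothing about cost, only about who is allowed in.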
Test platform
Linux distribution:
[root@fgao locktest]#cat /etc/issue
Fedora release 16 (Verne)
Linux kernel version:
[root@fgao locktest]#uname -a
Linux fgao.fc16 3.6.11-4.fc16.i686.PAE #1 SMP Tue Jan 8 21:18:14 UTC 2013 i686 i686 i386 GNU/Linux
CPU: Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz
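Experiment
Comparing spinlock and rwlock cost needs fairly high-resolution timing, so jiffies is out of the question. I recall that the Linux kernel can read a high-resolution timer directly, but I could not quickly find out how, so for now I use getnstimeofday to measure elapsed time. Preemption and interrupts are also disabled during the measurement so that it is not interrupted.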
/* The header names were missing from the original listing; these four
 * cover the calls used below. */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/spinlock.h>
#include <linux/time.h>

static void lock_test(void)
{
#define TEST_TIMES (100000000)
#define ONE_SEC_NS (1000000000)
    DEFINE_RWLOCK(rwlock);
    spinlock_t spinlock;
    struct timespec start;
    struct timespec end;
    struct timespec cost;
    int i;

    /* Keep the measurement from being preempted or interrupted. */
    preempt_disable();
    local_irq_disable();

    spin_lock_init(&spinlock);

    /* Time TEST_TIMES uncontended read_lock/read_unlock pairs. */
    getnstimeofday(&start);
    for (i = 0; i < TEST_TIMES; ++i) {
        read_lock(&rwlock);
        read_unlock(&rwlock);
    }
    getnstimeofday(&end);

    /* cost = end - start, borrowing one second if the ns part underflows. */
    cost.tv_sec = end.tv_sec - start.tv_sec;
    if (end.tv_nsec >= start.tv_nsec) {
        cost.tv_nsec = end.tv_nsec - start.tv_nsec;
    } else {
        --cost.tv_sec;
        cost.tv_nsec = end.tv_nsec + ONE_SEC_NS - start.tv_nsec;
    }

    printk(KERN_INFO "wrlock start: %ld s, %ld ns\n",
           start.tv_sec, start.tv_nsec);
    printk(KERN_INFO "wrlock end: %ld s, %ld ns\n",
           end.tv_sec, end.tv_nsec);
    printk(KERN_INFO "wrlock costs %ld s, %ld ns\n",
           cost.tv_sec, cost.tv_nsec);

    /* Time TEST_TIMES uncontended spin_lock/spin_unlock pairs. */
    getnstimeofday(&start);
    for (i = 0; i < TEST_TIMES; ++i) {
        spin_lock(&spinlock);
        spin_unlock(&spinlock);
    }
    getnstimeofday(&end);

    cost.tv_sec = end.tv_sec - start.tv_sec;
    if (end.tv_nsec >= start.tv_nsec) {
        cost.tv_nsec = end.tv_nsec - start.tv_nsec;
    } else {
        --cost.tv_sec;
        cost.tv_nsec = end.tv_nsec + ONE_SEC_NS - start.tv_nsec;
    }

    printk(KERN_INFO "spinlock start: %ld s, %ld ns\n",
           start.tv_sec, start.tv_nsec);
    printk(KERN_INFO "spinlock end: %ld s, %ld ns\n",
           end.tv_sec, end.tv_nsec);
    /* tv_nsec is a signed long, so print it with %ld. */
    printk(KERN_INFO "spinlock costs %ld s, %ld ns\n",
           cost.tv_sec, cost.tv_nsec);

    local_irq_enable();
    preempt_enable();
}

static int __init lock_test_init(void)
{
    printk(KERN_INFO "Lock test init\n");
    lock_test();
    return 0;
}

static void __exit lock_test_exit(void)
{
    printk(KERN_INFO "Lock test exit\n");
}

module_init(lock_test_init);
module_exit(lock_test_exit);
MODULE_LICENSE("GPL");
Running the test module three times gives the following results:
[ 5171.879338] Lock test init
[ 5173.684102] wrlock start: 1385997458 s, 954574279 ns
[ 5173.684121] wrlock end: 1385997460 s, 763607899 ns
[ 5173.684134] wrlock costs 1 s, 809033620 ns
[ 5175.280126] spinlock start: 1385997460 s, 763679378 ns
[ 5175.280144] spinlock end: 1385997462 s, 363456102 ns
[ 5175.280156] spinlock costs 1 s, 599776724 ns
[ 5175.285342] Lock test exit
[ 5176.206625] Lock test init
[ 5178.026367] wrlock start: 1385997463 s, 292230413 ns
[ 5178.026395] wrlock end: 1385997465 s, 116279824 ns
[ 5178.026434] wrlock costs 1 s, 824049411 ns
[ 5179.642872] spinlock start: 1385997465 s, 116388371 ns
[ 5179.642890] spinlock end: 1385997466 s, 736658120 ns
[ 5179.642901] spinlock costs 1 s, 620269749 ns
[ 5179.648328] Lock test exit
[ 5180.465262] Lock test init
[ 5182.289852] wrlock start: 1385997467 s, 561071471 ns
[ 5182.289870] wrlock end: 1385997469 s, 389978057 ns
[ 5182.289884] wrlock costs 1 s, 828906586 ns
[ 5183.895995] spinlock start: 1385997469 s, 390051074 ns
[ 5183.896013] spinlock end: 1385997470 s, 999972534 ns
[ 5183.896025] spinlock costs 1 s, 609921460 ns
[ 5183.901968] Lock test exit
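Judging from these results, the rwlock costs about 13% more CPU time than the spinlock (roughly 1.81 to 1.83 s versus 1.60 to 1.62 s for 100,000,000 uncontended lock/unlock pairs), which is a worrying gap.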
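As a cross-check on the logged numbers, here is a small userspace sketch that recomputes the elapsed time from a pair of start/end timestamps and the relative overhead of one measurement over another. The function names are hypothetical, and 64-bit arithmetic is used so that no borrow step is needed.

```c
#include <assert.h>

#define NSEC_PER_SEC 1000000000LL

/* Elapsed nanoseconds between two (seconds, nanoseconds) timestamps. */
long long elapsed_ns(long long start_s, long long start_ns,
                     long long end_s, long long end_ns)
{
    return (end_s - start_s) * NSEC_PER_SEC + (end_ns - start_ns);
}

/* Extra cost of measurement a over baseline b, in percent. */
double overhead_percent(long long a_ns, long long b_ns)
{
    assert(b_ns > 0); /* the baseline must be a positive duration */
    return 100.0 * (double)(a_ns - b_ns) / (double)b_ns;
}
```

Feeding in the first run's timestamps reproduces the logged costs of 1,809,033,620 ns for the rwlock and 1,599,776,724 ns for the spinlock. Over the three runs the overhead works out to roughly 13.1%, 12.6%, and 13.6%, or about 18 ns versus 16 ns per lock/unlock pair.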
Conclusion
When contention on the shared resource is low, an rwlock costs more CPU than a spinlock. So an rwlock is not better than a spinlock in every read-mostly case.
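For readers who want to try the same kind of uncontended comparison without building a kernel module, a rough userspace analogue might look like the sketch below. It is an assumption-laden stand-in, not the experiment above: it times POSIX `pthread_rwlock_t` and `pthread_spinlock_t` with `clock_gettime(CLOCK_MONOTONIC)`, cannot disable preemption or interrupts, and all function names are hypothetical.

```c
/* Userspace analogue only; these are glibc locks, not kernel locks. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Time `iterations` uncontended read-lock/unlock pairs on a pthread rwlock. */
long long time_rwlock_read_ns(long iterations)
{
    pthread_rwlock_t rw;
    long long start, end;
    long i;

    assert(iterations > 0);
    pthread_rwlock_init(&rw, NULL);
    start = now_ns();
    for (i = 0; i < iterations; ++i) {
        pthread_rwlock_rdlock(&rw);
        pthread_rwlock_unlock(&rw);
    }
    end = now_ns();
    pthread_rwlock_destroy(&rw);
    return end - start;
}

/* Time `iterations` uncontended lock/unlock pairs on a pthread spinlock. */
long long time_spinlock_ns(long iterations)
{
    pthread_spinlock_t sp;
    long long start, end;
    long i;

    assert(iterations > 0);
    pthread_spin_init(&sp, PTHREAD_PROCESS_PRIVATE);
    start = now_ns();
    for (i = 0; i < iterations; ++i) {
        pthread_spin_lock(&sp);
        pthread_spin_unlock(&sp);
    }
    end = now_ns();
    pthread_spin_destroy(&sp);
    return end - start;
}
```

Calling both functions with the same iteration count and dividing each result by that count gives a per-pair cost. The absolute numbers will differ from the kernel figures, since glibc's locks are a different implementation, so treat any result as an indication of method rather than a confirmation of the conclusion.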
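Open questions
Why does an rwlock cost more than a spinlock?
When is an rwlock a better choice than a spinlock? In other words, how much lock contention is needed before replacing a spinlock with an rwlock pays off?
(To be continued)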