Lock-Free Programming

废言Pro

Original article: https://www.cnblogs.com/linuxbug/p/4840138.html

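Drawbacks of locks

Locking forces a thread to give up its time slice.

Locking means blocking: multiple threads (or processes) queue up for the resource, so the system cannot reach its full performance.

Blocking on a lock cannot be signaled through an fd, which hurts performance further (the ideal server model blocks in exactly one place and waits there for all events).

Some locks force a thread-based design, and threads cannot make full use of the system's memory.

The pthread library can cause starvation in certain situations.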

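The thinking behind lock-free programming

What is the root cause of locking?

Contention for resources.

What are the ways to resolve resource contention?

Partition the resources: split the resource further so that each owner works on its own part and never touches the others.

Partition the functions: plan the use of the resource so that each party handles a different function.

Add redundancy: keep redundant copies of the resource and switch between them.

Check twice: execute without a lock, then check whether the data was modified in the meantime (CAS).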

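Lock-free programming

Using fewer locks

Atomic operations and busy waiting

CAS and the ABA problem

seqlock

Avoiding locks entirely

Lock-free programming in practice

Pairing data with processes one to one

Single-producer, single-consumer processes

Let's go through these topics one at a time.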

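Double-checked locking, strictly speaking, is not lock-free. Whenever the code in a critical section needs to run under the lock only once (typically one-time initialization), and acquiring the lock must be thread-safe, the double-checked locking pattern can reduce lock contention and locking overhead. It is widely used in the Singleton pattern.

Double-checked locking has the following characteristics:

Double-checked locking is the multithreaded version of Singleton.
Double-checked locking still uses a lock to protect the critical section; it does not eliminate the lock.
The problem double-checked locking solves: when several threads try to enter the critical section, it guarantees that the critical section is executed only once.

Taking Singleton as an example, the usual implementation that prevents multiple allocations is: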

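// Implementation 1 (lock() and unlock() stand for any mutex)

class singleton
{
public:
	static singleton* get_instance()
	{
		lock();
		if (instance == 0)
		{
			instance = new singleton;
		}
		unlock();
		return instance;
	}
private:
	static singleton* instance;
};

The problem here: the lock is taken on every call whether or not the instance has been initialized, which adds overhead and leaves no real concurrency.

To improve concurrency, you can first check whether the instance has been allocated and lock only when it has not. You might be tempted to write it like this:

// Implementation 2

class singleton
{
public:
	static singleton* get_instance()
	{
		if (instance == 0)
		{
			lock();
			instance = new singleton;
			unlock();
		}
		return instance;
	}
private:
	static singleton* instance;
};

The problem here: two threads can both pass the check before either takes the lock, so the object is not guaranteed to be initialized only once, and the basic contract of a singleton is broken.

// Implementation 3 - double-checked locking

class singleton
{
public:
	static singleton* get_instance()
	{
		if (instance == 0)
		{
			lock();
			if (instance == 0)
			{
				instance = new singleton;
			}
			unlock();
		}
		return instance;
	}
private:
	static singleton* instance;
};

Strictly speaking, double-checked locking is not lock-free programming, but going from locking on every access to needing no lock in most cases is a big step forward. Note that the unlocked first read of instance is only safe if instance is an atomic variable (or suitable memory barriers are used); otherwise a thread may see a non-null pointer before the object is fully constructed.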
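The pattern can be sketched in C11 with an atomic pointer, so that the unlocked first check is well defined. This is a minimal sketch under stated assumptions: the singleton payload, the init_lock mutex and the value 42 are hypothetical names chosen for illustration, not part of the original article.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct { int value; } singleton;      /* hypothetical payload */

static _Atomic(singleton *) instance = NULL;
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;

singleton *get_instance(void)
{
    /* First check, without the lock. The acquire load pairs with the
     * release store below, so a non-NULL pointer implies a fully
     * initialized object. */
    singleton *p = atomic_load_explicit(&instance, memory_order_acquire);
    if (p == NULL) {
        pthread_mutex_lock(&init_lock);
        /* Second check, under the lock: another thread may have
         * initialized the object while we were waiting. */
        p = atomic_load_explicit(&instance, memory_order_relaxed);
        if (p == NULL) {
            p = malloc(sizeof *p);
            if (p != NULL) {
                p->value = 42;
                atomic_store_explicit(&instance, p, memory_order_release);
            }
        }
        pthread_mutex_unlock(&init_lock);
    }
    return p;
}
```

After the first successful call, every later call returns the same pointer and takes only the single atomic load on the fast path.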

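What is an atomic operation

An atomic operation guarantees that an instruction executes atomically, that is, it cannot be interrupted partway through. Atomic operations are the basic prerequisite for most lock-free programming.

Atomic operations fall into the following categories (as guaranteed on x86)

Reading or writing 1 byte

Reading or writing a 2-byte value aligned on a 16-bit boundary

Reading or writing a 4-byte value aligned on a 32-bit boundary

Reading or writing an 8-byte value aligned on a 64-bit boundary

xchg

How atomic operations work

On x86, the CPU can lock the bus while an instruction executes. The CPU has a LOCK# pin. If an assembly instruction carries the "LOCK" prefix, the assembled machine code makes the CPU pull LOCK# low while the instruction executes and release it when the instruction finishes. This locks the bus, so other CPUs on the same bus temporarily cannot access memory through it, which guarantees the atomicity of the instruction in a multiprocessor environment.

LOCK is an instruction prefix: it states that the instruction that follows executes with the memory bus locked. A bus lock keeps the other cores from accessing memory for a number of clock cycles. It does hurt the performance of the other cores, but it is far lighter than an operating-system-level lock.

LOCK# locks the FSB (Front Side Bus), the bus between the processor and RAM. With it locked, other processors or cores cannot fetch data from RAM. (On newer processors, when the target is held in a cache line, the CPU can lock just that cache line instead of the whole bus.)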

Atomic operations provided by the kernel: the atomic_* family

Declaration and definition:

void atomic_set(atomic_t *v, int i);

atomic_t v = ATOMIC_INIT(0);

Read and write:

int atomic_read(atomic_t *v);

void atomic_add(int i, atomic_t *v);

void atomic_sub(int i, atomic_t *v);

Increment and decrement:

void atomic_inc(atomic_t *v);

void atomic_dec(atomic_t *v);

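Operate and test the result: the *_and_test functions return 1 if v is 0 after the operation and 0 otherwise; atomic_add_negative returns 1 if the result is negative; the *_return functions return the new value.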

int atomic_inc_and_test(atomic_t *v);

int atomic_dec_and_test(atomic_t *v);

int atomic_sub_and_test(int i, atomic_t *v);

int atomic_add_negative(int i, atomic_t *v);

int atomic_add_return(int i, atomic_t *v);

int atomic_sub_return(int i, atomic_t *v);

int atomic_inc_return(atomic_t *v);

int atomic_dec_return(atomic_t *v);

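GCC's built-in __sync_* functions

GCC's built-in __sync_* functions provide atomic arithmetic and logical operations. The __sync_fetch_and_add family has twelve functions covering atomic add, subtract, AND, OR, XOR and NAND. __sync_fetch_and_add, as the name suggests, fetches first and then adds, so it returns the value from before the addition. With count = 4, calling __sync_fetch_and_add(&count, 1) returns 4 and leaves count at 5.
Where there is __sync_fetch_and_add there is naturally also __sync_add_and_fetch, which adds first and then returns the new value. The two relate to each other exactly as i++ relates to ++i.

type can be an integer type 1, 2, 4 or 8 bytes long, namely: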
int8_t / uint8_t
int16_t / uint16_t
int32_t / uint32_t
int64_t / uint64_t

type __sync_fetch_and_add (type *ptr, type value);
type __sync_fetch_and_sub (type *ptr, type value);
type __sync_fetch_and_or (type *ptr, type value);
type __sync_fetch_and_and (type *ptr, type value);
type __sync_fetch_and_xor (type *ptr, type value);
type __sync_fetch_and_nand(type *ptr, type value);

type __sync_add_and_fetch (type *ptr, type value);
type __sync_sub_and_fetch (type *ptr, type value);
type __sync_or_and_fetch (type *ptr, type value);
type __sync_and_and_fetch (type *ptr, type value);
type __sync_xor_and_fetch (type *ptr, type value);
type __sync_nand_and_fetch (type *ptr, type value);
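The difference between the two families is easiest to see side by side. The wrappers below are a small sketch; post_add and pre_add are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Returns the value *before* the addition, like i++. */
int32_t post_add(int32_t *p, int32_t n)
{
    return __sync_fetch_and_add(p, n);
}

/* Returns the value *after* the addition, like ++i. */
int32_t pre_add(int32_t *p, int32_t n)
{
    return __sync_add_and_fetch(p, n);
}
```

Starting from a counter of 4, post_add(&c, 1) returns 4 and leaves 5 in the counter, and a following pre_add(&c, 1) returns 6.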

Code walkthrough 1: updating a global variable with __sync_fetch_and_add

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/time.h>
#include <stdint.h>

int count = 0;

void *test_func(void *arg)
{
	int i=0;
	for(i=0;i<2000000;++i)
	{
		__sync_fetch_and_add(&count,1);
	}
	return NULL;
}

int main(int argc, const char *argv[])
{
	pthread_t id[20];
	int i = 0;

	uint64_t usetime;
	struct timeval start;
	struct timeval end;
	
	gettimeofday(&start,NULL);
	
	for(i=0;i<20;++i)
	{
		pthread_create(&id[i],NULL,test_func,NULL);
	}

	for(i=0;i<20;++i)
	{
		pthread_join(id[i],NULL);
	}
	
	gettimeofday(&end,NULL);

	usetime = (end.tv_sec-start.tv_sec)*1000000+(end.tv_usec-start.tv_usec);
	printf("count = %d, usetime = %lu usecs\n", count, usetime);
	return 0;
}

Code walkthrough 2: updating a global variable under a mutex

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/time.h>
#include <stdint.h>

int count = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void *test_func(void *arg)
{
	int i=0;
	for(i=0;i<2000000;++i)
	{
		pthread_mutex_lock(&mutex);
		++count;
		pthread_mutex_unlock(&mutex);
	}
	return NULL;
}

int main(int argc, const char *argv[])
{
	pthread_t id[20];
	int i = 0;

	uint64_t usetime;
	struct timeval start;
	struct timeval end;
	
	gettimeofday(&start,NULL);
	
	for(i=0;i<20;++i)
	{
		pthread_create(&id[i],NULL,test_func,NULL);
	}

	for(i=0;i<20;++i)
	{
		pthread_join(id[i],NULL);
	}
	
	gettimeofday(&end,NULL);

	usetime = (end.tv_sec-start.tv_sec)*1000000+(end.tv_usec-start.tv_usec);
	printf("count = %d, usetime = %lu usecs\n", count, usetime);
	return 0;
}

Results:

[root@rocket lock-free]#./atom_add_gcc_buildin

count = 40000000, usetime = 756694 usecs

[root@rocket lock-free]# ./atom_add_mutex

count = 40000000, usetime = 3247131 usecs

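As the numbers show, the atomic version is about 4.3 times faster than the mutex version here (3247131 / 756694), and the gap widens as contention increases. In Alexander Sandler's measurements, atomic operations were roughly 6 to 7 times faster than a mutex.

If you are interested, see: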

http://www.alexonlinux.com/multithreaded-simple-data-type-access-and-atomic-variables

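The xchg instruction

xchg(ptr, new) sets the value ptr points to to new and returns the value from before the exchange.

cmpxchg(ptr, old, new) compares the current value with old. If they are equal, it sets the value ptr points to to new; otherwise it leaves it unchanged. Either way it returns the value from before the operation, so the caller checks whether the return value equals old to tell whether it succeeded.

// cmpxchg here is the kernel-style primitive described above.
int fetch_and_add(int* i, int value, int* conflict)
{
	int old_value;
	int new_value;
	int v;
	do 
	{
		old_value = *i;
		new_value = old_value + value;	// add value, not a fixed 1
		v = cmpxchg(i, old_value, new_value);
		(*conflict)++;			// counts attempts, including the successful one
	} while (old_value != v);
	return old_value;			// value before the addition
}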

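Concept

Busy waiting can be regarded as a special kind of lock: the waiter keeps the CPU and polls a condition in a loop instead of sleeping.

Kinds of busy waiting

Peterson's algorithm

The xchg approach

The TSL approach

Spin locks

Peterson's algorithm

Peterson's algorithm is a concurrent programming algorithm that implements mutual exclusion. It lets two threads share a single-use resource without conflict. Gary L. Peterson proposed it in 1981.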

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/time.h>
#include <stdint.h>

int count = 0;
#define N 2
volatile int turn;   
volatile int interested[N] = {0}; 

void enter_region(int process)
{
	int other = 1 - process; // the other process
	interested[process] = true;
	turn = process;
	while (turn == process && interested[other] == true) ; // spin until the other process leaves the critical region
}

void leave_region(int process)
{
	interested[process] = false; 	// leave critical region
}

void *test_func(void *arg)
{
	int process = *((int *)arg);
	printf("thread %d run\n", process);
	int i=0;
	for(i=0;i<2000000;++i)
	{
		enter_region(process);
		//printf("%d enter, count = %d\n", pthread_self(),count);
		++count;
		leave_region(process);
	}
	return NULL;
}

int main(int argc, const char *argv[])
{
	pthread_t id[N];
	int process[N];
	int i = 0;

	uint64_t usetime;
	struct timeval start;
	struct timeval end;
	
	gettimeofday(&start,NULL);
	for(i=0;i<N;++i)
	{
		process[i] = i;
	}	
	
	for(i=0;i<N;++i)
	{
		pthread_create(&id[i],NULL,test_func,&process[i]);
	}

	for(i=0;i<N;++i)
	{
		pthread_join(id[i],NULL);
	}
	
	gettimeofday(&end,NULL);

	usetime = (end.tv_sec-start.tv_sec)*1000000+(end.tv_usec-start.tv_usec);
	printf("count = %d, usetime = %lu usecs\n", count, usetime);
	return 0;
}

Results:

[root@rocket lock-free]#./busywait_peterson

thread 0 run

thread 1 run

count = 3999851, usetime = 263132 usecs

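As you can see, although this is a mutual exclusion algorithm, the measured result is not exact: a small number of increments are lost, which is surprising at first. The likely cause is memory reordering. Peterson's algorithm assumes that each thread's writes to interested and turn become visible before its subsequent reads, but volatile only restrains the compiler. An x86 CPU may still let the read of interested[other] overtake the earlier stores, which sit in the store buffer, so both threads can enter the critical region at once. A full memory barrier between the stores and the loads closes this gap.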
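One way to keep Peterson's algorithm correct on real hardware is to use sequentially consistent C11 atomics for interested and turn, which forbids the store-to-load reordering. The sketch below is an illustration under that assumption; run_peterson, worker and counter are hypothetical names, and the program needs a C11 compiler with pthreads.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int interested[2];
static atomic_int turn;
static long counter;                 /* protected by the Peterson lock */

static void enter_region(int me)
{
    int other = 1 - me;
    atomic_store(&interested[me], 1);   /* seq_cst store */
    atomic_store(&turn, me);            /* seq_cst store: the loads below cannot pass it */
    while (atomic_load(&turn) == me && atomic_load(&interested[other]))
        ;                               /* busy-wait */
}

static void leave_region(int me)
{
    atomic_store(&interested[me], 0);
}

struct worker_arg { int id; long iterations; };

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (long i = 0; i < a->iterations; ++i) {
        enter_region(a->id);
        ++counter;
        leave_region(a->id);
    }
    return NULL;
}

/* Runs two threads, each incrementing the counter `iterations` times,
 * and returns the final counter value. */
long run_peterson(long iterations)
{
    pthread_t t[2];
    struct worker_arg args[2] = { {0, iterations}, {1, iterations} };
    counter = 0;
    for (int i = 0; i < 2; ++i)
        pthread_create(&t[i], NULL, worker, &args[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(t[i], NULL);
    return counter;
}
```

With the ordering enforced, the final count equals twice the per-thread iteration count; the price is a full fence on every entry to the critical region.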

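The xchg approach

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/time.h>
#include <stdint.h>

volatile int in_using = 0;
int count = 0;
#define N 2

// <asm/system.h> and its xchg() are kernel-only, so user space uses the GCC
// builtin __sync_lock_test_and_set, which is an atomic exchange (xchg on x86).
void enter_region()
{
	while (__sync_lock_test_and_set(&in_using, 1)) ;	// spin until the old value was 0
}

void leave_region()
{
	__sync_lock_release(&in_using);	// store 0 with release semantics
}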

void *test_func(void *arg)
{
	int i=0;
	for(i=0;i<2000000;++i)
	{
		enter_region();
		++count;
		leave_region();
	}
	
	return NULL;
}

int main(int argc, const char *argv[])
{
	pthread_t id[20];
	int i = 0;

	uint64_t usetime;
	struct timeval start;
	struct timeval end;
	
	gettimeofday(&start,NULL);
	
	for(i=0;i<N;++i)
	{
		pthread_create(&id[i],NULL,test_func,NULL);
	}

	for(i=0;i<N;++i)
	{
		pthread_join(id[i],NULL);
	}
	
	gettimeofday(&end,NULL);

	usetime = (end.tv_sec-start.tv_sec)*1000000+(end.tv_usec-start.tv_usec);
	printf("count = %d, usetime = %lu usecs\n", count, usetime);
	return 0;
}

Results: this result is exact, as expected. It feels much more dependable than Peterson's algorithm, and the performance is about the same.

[root@rocket lock-free]# ./busywait_xchg

count = 4000000, usetime = 166548 usecs

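The TSL approach (Test and Set Lock)

enter_region:
	tsl register, lock	| copy lock to the register and set lock to 1
	cmp register, #0	| was lock zero?
	jne enter_region	| if not, the lock was already held, so loop again
	ret			| return to the caller; the critical region is entered

leave_region:
	move lock, #0		| set lock to 0
	ret			| return to the caller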
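In portable C11 the TSL instruction corresponds closely to atomic_flag_test_and_set. The following is a minimal sketch of the same enter/leave pair; try_enter_region and lock_flag are hypothetical names added for this illustration.

```c
#include <stdatomic.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

/* One TSL step: atomically sets the flag and reports whether it was
 * clear before. Returns 1 if the lock was acquired, 0 if it was held. */
int try_enter_region(void)
{
    return !atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire);
}

/* Loops on the TSL step until the lock is acquired. */
void enter_region(void)
{
    while (!try_enter_region())
        ;   /* busy-wait */
}

/* Clears the flag, which corresponds to "move lock, #0". */
void leave_region(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}
```

While the region is held, try_enter_region() returns 0; after leave_region() the next attempt succeeds.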

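Linux synchronization mechanisms (1) - thread locks

1 Mutexes

When threads run, we often need several of them to stay synchronized.

A mutex can do this. The main functions involved are

pthread_mutex_init

pthread_mutex_destroy

pthread_mutex_lock

pthread_mutex_unlock

which initialize, destroy, lock and unlock a mutex.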

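1.1 Creating a mutex

A mutex can be created statically or dynamically. The macro PTHREAD_MUTEX_INITIALIZER initializes one statically. This is easy to understand: a mutex is a pthread_mutex_t structure, and the macro is a constant of that structure type, so a static initialization looks like this:

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

A mutex can also be created dynamically with pthread_mutex_init, whose prototype is:

int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr)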

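1.2 Mutex attributes

Mutex attributes are initialized with pthread_mutexattr_init(pthread_mutexattr_t *mattr), after which the other attribute setters can be called.

Scope of the mutex: whether it synchronizes this process with other processes or only the threads within one process. The value is PTHREAD_PROCESS_SHARED or PTHREAD_PROCESS_PRIVATE. The default is the latter, meaning the mutex is used within a single process. Use

int pthread_mutexattr_setpshared(pthread_mutexattr_t *mattr, int pshared)

int pthread_mutexattr_getpshared(pthread_mutexattr_t *mattr, int *pshared)

to set and get the scope.

Type of the mutex: the possible values are:

PTHREAD_MUTEX_TIMED_NP, the default, an ordinary mutex. Once a thread holds the lock, the other threads requesting it form a wait queue and obtain the lock in priority order after it is unlocked. This policy keeps resource allocation fair.
PTHREAD_MUTEX_RECURSIVE_NP, a recursive mutex. The same thread may acquire it several times and must unlock it the same number of times. When a different thread requests it, the threads compete again once the owner has unlocked.
PTHREAD_MUTEX_ERRORCHECK_NP, an error-checking mutex. If the same thread requests a mutex it already holds, EDEADLK is returned; otherwise it behaves like PTHREAD_MUTEX_TIMED_NP. This prevents the simplest form of deadlock when repeated locking is not allowed.
PTHREAD_MUTEX_ADAPTIVE_NP, an adaptive mutex, the simplest type: waiters simply compete again after the unlock.

Use
int pthread_mutexattr_settype(pthread_mutexattr_t *attr, int type)
int pthread_mutexattr_gettype(pthread_mutexattr_t *attr, int *type)

to set or get the mutex type.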

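1.3 Destroying a mutex

Calling pthread_mutex_destroy releases the resources held by the mutex, provided that the mutex is not currently locked.

1.4 Mutex operations

There are three main operations: locking with pthread_mutex_lock(), unlocking with pthread_mutex_unlock(), and trying to lock with pthread_mutex_trylock().

int pthread_mutex_lock(pthread_mutex_t *mutex)
int pthread_mutex_unlock(pthread_mutex_t *mutex)
int pthread_mutex_trylock(pthread_mutex_t *mutex)

pthread_mutex_trylock() has semantics similar to pthread_mutex_lock(), except that when the mutex is already held it returns EBUSY instead of blocking.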
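The attribute and trylock calls above fit together as in the following sketch, which creates an error-checking mutex dynamically. The helper name init_errorcheck_mutex is hypothetical; the sketch uses the portable PTHREAD_MUTEX_ERRORCHECK constant rather than the _NP spelling.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Initializes *m as an error-checking mutex.
 * Returns 0 on success or the pthread error number on failure. */
int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

With such a mutex, a second pthread_mutex_lock() from the owning thread returns EDEADLK instead of deadlocking, and pthread_mutex_trylock() on the held mutex returns EBUSY.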

1.5 Code walkthrough:

Code example 1: basic mutex usage

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int count = 0;

void* consume(void *arg)
{
    while(1)
    {
        pthread_mutex_lock(&mutex);
        printf("************************consume begin lock\n");  
        printf("************************consumed %d\n",count);  
        count++;
        sleep(2);
        printf("************************consume over lock\n"); 
        pthread_mutex_unlock(&mutex); 
        printf("************************I'm out of pthread_mutex\n"); 
        sleep(1);
    }
    
    return NULL;
}

void* produce( void * arg )
{
    while(1)
    {
        pthread_mutex_lock(&mutex );
        printf("product begin lock\n");
        printf("produced %d\n", count);
        printf("product over lock\n");
        pthread_mutex_unlock(&mutex );
        printf("I'm out of pthread_mutex\n");
        sleep(1);
    }
    
    return NULL;
}

int main( void )
{
    pthread_t thread1,thread2;
    pthread_create(&thread1, NULL, &produce, NULL );
    pthread_create(&thread2, NULL, &consume, NULL );
    pthread_join(thread1,NULL);
    pthread_join(thread2,NULL);
    return 0;
}

Results:

[root@rocket lock-free]# g++ -g -o pthread_mutex_lock pthread_mutex_lock.cpp -lpthread
[root@rocket lock-free]# ./pthread_mutex_lock
product begin lock
produced 0
product over lock
I'm out of pthread_mutex
************************consume begin lock
************************consumed 0
/* the consumer waits 2 seconds here, but the producer thread does not run! */
************************consume over lock
************************I'm out of pthread_mutex
product begin lock
produced 1
product over lock
I'm out of pthread_mutex
product begin lock
produced 1
product over lock
I'm out of pthread_mutex
************************consume begin lock
************************consumed 1
************************consume over lock
************************I'm out of pthread_mutex
product begin lock
produced 2
product over lock
I'm out of pthread_mutex
************************consume begin lock
************************consumed 2
************************consume over lock
************************I'm out of pthread_mutex

Code example 2: using pthread_mutex_trylock

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int count = 0;

void* consume(void *arg)
{
    while(1)
    {
        pthread_mutex_lock(&mutex);
        printf("************************consume begin lock\n");  
        printf("************************consumed %d\n",count);  
        count++;
        sleep(2);
        printf("************************consume over lock\n"); 
        pthread_mutex_unlock(&mutex); 
        printf("************************I'm out of pthread_mutex\n"); 
        sleep(1);
    }
    
    return NULL;
}

void* produce( void * arg )
{
    while(1)
    {
        if(pthread_mutex_trylock(&mutex ) == 0)
        {
            printf("product begin lock\n");
            printf("produced %d\n", count );
            printf("product over lock\n");
            pthread_mutex_unlock(&mutex);
            printf("I'm out of pthread_mutex\n");
            sleep(1);
        }
        else
        {
            printf("I have try!But i can`t lock the mutex!\n");
            sleep(1);
        }
    }
    
    return NULL;
}

int main( void )
{
    pthread_t thread1,thread2;
    pthread_create(&thread1, NULL, &produce, NULL );
    pthread_create(&thread2, NULL, &consume, NULL );
    pthread_join(thread1,NULL);
    pthread_join(thread2,NULL);
    return 0;
}

Results:

[root@rocket lock-free]# g++ -g -o pthread_mutex_trylock pthread_mutex_trylock.cpp -lpthread
[root@rocket lock-free]# ./pthread_mutex_trylock
************************consume begin lock
************************consumed 0
/* trylock fails and returns immediately! */
I have try!But i can`t lock the mutex!
************************consume over lock
************************I'm out of pthread_mutex
product begin lock
produced 1
product over lock
I'm out of pthread_mutex
************************consume begin lock
************************consumed 1
I have try!But i can`t lock the mutex!
************************consume over lock
************************I'm out of pthread_mutex
product begin lock
produced 2
product over lock
I'm out of pthread_mutex
************************consume begin lock
************************consumed 2
I have try!But i can`t lock the mutex!
************************consume over lock
************************I'm out of pthread_mutex

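2 Read-write locks

A read-write lock has three states (read-locked, write-locked and unlocked), so it allows more parallelism than a mutex.

2.1 Characteristics

Only one thread at a time can hold a read-write lock in write mode, but many threads can hold it in read mode simultaneously. Accordingly, while the lock is write-locked, every thread that tries to lock it blocks until it is unlocked.

While the lock is read-locked, every thread that tries to lock it in read mode gets access, but a thread that wants to lock it in write mode must block until all the readers have released it.

Typically, when the lock is held in read mode and another thread tries to lock it in write mode, the lock blocks subsequent read-mode requests. This keeps readers from holding the lock indefinitely while the waiting write request starves.

2.2 When to use

Read-write locks suit data structures that are read far more often than they are written. Because read mode is shared and write mode is exclusive, a read-write lock is also called a shared-exclusive lock.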

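2.3 API: initialization and destruction

#include <pthread.h>
int pthread_rwlock_init(pthread_rwlock_t *restrict rwlock, const pthread_rwlockattr_t *restrict attr);
int pthread_rwlock_destroy(pthread_rwlock_t *rwlock);
Both return 0 on success and an error number on failure.

As with a mutex, before freeing the memory occupied by a read-write lock, call pthread_rwlock_destroy to clean it up and release the resources allocated by init.

2.4 Reading and writing

#include <pthread.h>
int pthread_rwlock_rdlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_wrlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_unlock(pthread_rwlock_t *rwlock);

These three functions acquire the read lock, acquire the write lock, and release the lock. The two acquire functions block.

The non-blocking counterparts are:

#include <pthread.h>
int pthread_rwlock_tryrdlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_trywrlock(pthread_rwlock_t *rwlock);

These acquire the lock without blocking: they return 0 if the lock can be acquired and the error EBUSY otherwise.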
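The shared/exclusive behavior of the try functions can be checked from a single thread. The wrappers below are a sketch; try_read and try_write are hypothetical helper names that fold the return codes into 1 (acquired), 0 (busy) or -1 (other error).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Tries to take the read lock without blocking.
 * Returns 1 on success (caller must unlock), 0 if busy, -1 on other errors. */
int try_read(pthread_rwlock_t *rw)
{
    int rc = pthread_rwlock_tryrdlock(rw);
    if (rc == 0)
        return 1;
    return rc == EBUSY ? 0 : -1;
}

/* Tries to take the write lock without blocking.
 * Returns 1 on success (caller must unlock), 0 if busy, -1 on other errors. */
int try_write(pthread_rwlock_t *rw)
{
    int rc = pthread_rwlock_trywrlock(rw);
    if (rc == 0)
        return 1;
    return rc == EBUSY ? 0 : -1;
}
```

On a default-initialized lock, two try_read() calls in a row both succeed because read mode is shared, while try_write() reports busy until every read lock has been released.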

2.5 Code walkthrough

Code example 1: basic read-write lock usage

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <bits/pthreadtypes.h>
 
static pthread_rwlock_t rwlock; // the read-write lock object

int count = 0;

void *thread_function_read(void *arg)
{
    while(1)
    {
        pthread_rwlock_rdlock(&rwlock);
        printf("************************%d, read count %d\n", pthread_self(), count);
        sleep(1);
        pthread_rwlock_unlock(&rwlock);
	usleep(100);
    }
    
    return NULL;
}

void *thread_function_write(void *arg)
{
    while(1)
    {
        pthread_rwlock_wrlock(&rwlock);
        count++;
        printf("************************%d, write count %d\n", pthread_self(), count);
        sleep(5);
        pthread_rwlock_unlock(&rwlock);
	usleep(100);
    }
    return NULL;
}
   
int main(int argc, char *argv[])
{
    pthread_t rpthread1, rpthread2, wpthread;

    pthread_rwlock_init(&rwlock,NULL);

    pthread_create(&rpthread1, NULL, thread_function_read, NULL);
    pthread_create(&rpthread2, NULL, thread_function_read, NULL);
    pthread_create(&wpthread, NULL, thread_function_write, NULL);

    pthread_join(rpthread1, NULL);           
    pthread_join(rpthread2, NULL);           
    pthread_join(wpthread, NULL);           
               
    pthread_rwlock_destroy(&rwlock);          
    exit(EXIT_SUCCESS);
}

Results:

[root@rocket lock-free]# ./pthread_rwlock

/* the two reader threads do not block each other */

************************1442944768,read count 0

************************1432454912,read count 0

/* the writer thread blocks all other threads */

************************1421965056,write count 1

************************1442944768,read count 1

************************1432454912,read count 1

************************1421965056,write count 2

************************1442944768,read count 2

************************1432454912,read count 2

************************1421965056,write count 3

************************1442944768,read count 3

************************1432454912,read count 3

************************1421965056,write count 4

Interestingly, if the usleep(100) calls are removed from thread_function_read and thread_function_write in the code above, the result looks like this:

[root@rocket lock-free]#./pthread_rwlock

************************-1896831232,read count 0

************************-1907321088,read count 0

************************-1896831232,read count 0

************************-1907321088,read count 0

************************-1896831232,read count 0

************************-1907321088,read count 0

The writer never gets the write lock. My original understanding was this: the reader threads start first, so a reader gets the lock first; the writer then blocks on its lock request; once the reader releases the lock, it should be the writer's turn. But that is not what happens! When a reader releases the lock and requests it again, it still gets it, and the writer almost never does.

The manual says: "The pthread_rwlock_rdlock() function applies a read lock to the read-write lock referenced by rwlock. The calling thread acquires the read lock if a writer does not hold the lock and there are no writers blocked on the lock. It is unspecified whether the calling thread acquires the lock when a writer does not hold the lock and there are writers waiting for the lock." In other words, a reader can take the read lock when no writer is waiting for the write lock, but what happens when a writer is waiting is left unspecified.

Fortunately, Linux has the function pthread_rwlockattr_setkind_np.

enum
{
	PTHREAD_RWLOCK_PREFER_READER_NP,
	PTHREAD_RWLOCK_PREFER_WRITER_NP,
	PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP,
	PTHREAD_RWLOCK_DEFAULT_NP = PTHREAD_RWLOCK_PREFER_READER_NP
};

But calling pthread_rwlockattr_setkind_np(&attr, PTHREAD_RWLOCK_PREFER_WRITER_NP); directly

has no effect! Why? There was not even a man page, so I suspected the function was not implemented. I installed the glibc debug symbols with debuginfo-install glibc and stepped into it with gdb. It turned out that pthread_rwlockattr_setkind_np is implemented, and the code is simple: it changes one member of attr. So why?

More googling finally turned up the man page for pthread_rwlockattr_setkind_np. A note at the end made me break out in a sweat:

"Setting the value read-write lock kind to PTHREAD_RWLOCK_PREFER_WRITER_NP, results in the same behavior as setting the value to PTHREAD_RWLOCK_PREFER_READER_NP. As long as a reader thread holds the lock the thread holding a write lock will be starved. Setting the kind value to PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP, allows the writer to run. However, the writer may not be recursive as is implied by the name."

In other words,

PTHREAD_RWLOCK_PREFER_WRITER_NP behaves the same as PTHREAD_RWLOCK_PREFER_READER_NP! The right setting is PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP. And true to its name, a lock configured with PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP must not be taken recursively.

That leads to code example 2: using read-write lock priority

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <bits/pthreadtypes.h>
 
static pthread_rwlock_t rwlock; // the read-write lock object

int count = 0;

void *thread_function_read(void *arg)
{
    while(1)
    {
        pthread_rwlock_rdlock(&rwlock);
        printf("************************%d, read count %d\n", pthread_self(), count);
        sleep(1);
        pthread_rwlock_unlock(&rwlock);
	//usleep(100);
    }
    
    return NULL;
}

void *thread_function_write(void *arg)
{
    while(1)
    {
	pthread_rwlock_wrlock(&rwlock);
        count++;
        printf("************************%d, write count %d\n", pthread_self(), count);
        sleep(1);
        pthread_rwlock_unlock(&rwlock);
	usleep(100);
    }
    return NULL;
}
   
int main(int argc, char *argv[])
{
    pthread_t rpthread1, rpthread2, wpthread;
    
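    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);    // attr must be initialized before it is modified
    pthread_rwlockattr_setkind_np(&attr, PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    pthread_rwlock_init(&rwlock, &attr);
    pthread_rwlockattr_destroy(&attr);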

    pthread_create(&rpthread1, NULL, thread_function_read, NULL);
    pthread_create(&rpthread2, NULL, thread_function_read, NULL);
    pthread_create(&wpthread, NULL, thread_function_write, NULL);

    pthread_join(rpthread1, NULL);           
    pthread_join(rpthread2, NULL);           
    pthread_join(wpthread, NULL);           
               
    pthread_rwlock_destroy(&rwlock);          
    exit(EXIT_SUCCESS);
}

Output:

[root@rocket lock-free]#./pthread_rwlock_withpriority

************************1529054976,read count 0

************************1518565120,read count 0

************************1508075264,write count 1

************************1529054976,read count 1

************************1518565120,read count 1

************************1508075264,write count 2

************************1529054976,read count 2

************************1518565120,read count 2

************************1508075264,write count 3

This way the writer no longer starves.

Code example 3: using pthread_rwlock_tryrdlock

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <bits/pthreadtypes.h>
 
static pthread_rwlock_t rwlock; // the read-write lock object

int count = 0;

void *thread_function_read(void *arg)
{
    int print_count = 0;
    while(1)
    {
        if (pthread_rwlock_tryrdlock(&rwlock) == 0)
	{
		printf("************************%d, read count %d\n", pthread_self(), count);
		sleep(1);
		pthread_rwlock_unlock(&rwlock);
		usleep(100);
	}
        else
	{
		print_count++;
		if (print_count % 10 == 0)
		{
			printf("I have try!But i can`t lock the rdlock!\n");
			print_count = 0;
		}
			
		usleep(100);
	}
    }
    
    return NULL;
}

void *thread_function_write(void *arg)
{
    while(1)
    {
        pthread_rwlock_wrlock(&rwlock);
        count++;
        printf("************************%d, write count %d\n", pthread_self(), count);
        sleep(5);
        pthread_rwlock_unlock(&rwlock);
	usleep(100);
    }
    return NULL;
}
   
int main(int argc, char *argv[])
{
    pthread_t rpthread1, rpthread2, wpthread;

    pthread_rwlock_init(&rwlock,NULL);

    pthread_create(&rpthread1, NULL, thread_function_read, NULL);
    pthread_create(&rpthread2, NULL, thread_function_read, NULL);
    pthread_create(&wpthread, NULL, thread_function_write, NULL);

    pthread_join(rpthread1, NULL);           
    pthread_join(rpthread2, NULL);           
    pthread_join(wpthread, NULL);           
               
    pthread_rwlock_destroy(&rwlock);          
    exit(EXIT_SUCCESS);
}

Results:

************************1819674368, read count 0
************************1809184512, read count 0
************************1798694656, write count 1
/* trylock fails and returns immediately! */
I have try!But i can`t lock the rdlock!
************************1819674368, read count 1
************************1809184512, read count 1
************************1798694656, write count 2
I have try!But i can`t lock the rdlock!

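3 Spin locks

A spin lock is a low-level synchronization mechanism for SMP architectures.
When thread A wants a spin lock that another thread holds, thread A spins in a loop, checking whether the lock has become available. Keep the following in mind:

Spinning does not give up the CPU, so the thread holding a spin lock should release it as soon as possible; otherwise the threads waiting for it keep spinning and waste CPU time.

A thread holding a spin lock should release it before sleeping so that other threads can acquire it. (In kernel code, sleeping while holding a spin lock can hang the whole system.)

The Pthreads API for spin locks consists mainly of:

int pthread_spin_destroy(pthread_spinlock_t *);

int pthread_spin_init(pthread_spinlock_t *, int);

int pthread_spin_lock(pthread_spinlock_t *);

int pthread_spin_trylock(pthread_spinlock_t *);

int pthread_spin_unlock(pthread_spinlock_t *);

3.1 Initializing a spin lock

pthread_spin_init allocates the resources a spin lock needs and initializes it to the unlocked state. Its second argument, pshared, takes these values:

PTHREAD_PROCESS_SHARED: the spin lock can be shared by threads in multiple processes.

PTHREAD_PROCESS_PRIVATE: only threads in the same process as the thread that initialized the spin lock can use it.

3.2 Acquiring a spin lock

pthread_spin_lock acquires (locks) the given spin lock. If no other thread holds it, the caller gets it; otherwise the function does not return until the lock has been acquired. If the caller already holds the spin lock when it calls the function, the result is undefined.

3.3 Trying to acquire a spin lock

pthread_spin_trylock tries to acquire the given spin lock and returns failure immediately if it cannot.

3.4 Releasing (unlocking) a spin lock

pthread_spin_unlock releases the given spin lock.

3.5 Destroying a spin lock

pthread_spin_destroy destroys the given spin lock and releases all resources associated with it (that is, the resources pthread_spin_init allocated automatically). After this call, using the lock without first reinitializing it with pthread_spin_init is undefined. Calling it while the spin lock is in use, or on an uninitialized spin lock, is also undefined.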
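Put together, the five calls are used roughly as follows. This is a minimal sketch: run_spin_demo, spin_worker and the limit of 16 threads are assumptions made for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

static pthread_spinlock_t spin;
static long spin_counter;          /* protected by spin */

static void *spin_worker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; ++i) {
        pthread_spin_lock(&spin);
        ++spin_counter;            /* keep the critical section tiny */
        pthread_spin_unlock(&spin);
    }
    return NULL;
}

/* Starts nthreads threads (1..16) that each add `iterations` to the
 * counter under the spin lock. Returns the final count, or -1 on error. */
long run_spin_demo(int nthreads, long iterations)
{
    pthread_t t[16];
    if (nthreads < 1 || nthreads > 16)
        return -1;
    if (pthread_spin_init(&spin, PTHREAD_PROCESS_PRIVATE) != 0)
        return -1;
    spin_counter = 0;
    for (int i = 0; i < nthreads; ++i)
        pthread_create(&t[i], NULL, spin_worker, &iterations);
    for (int i = 0; i < nthreads; ++i)
        pthread_join(t[i], NULL);
    pthread_spin_destroy(&spin);
    return spin_counter;
}
```

The critical section is a single increment, which is the kind of very short section a spin lock is meant for.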

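4 Comparison

Lock type	Characteristics	Suitable scenarios
Mutex (mutex)	Causes thread context switches	The first choice in most cases
Read-write lock (rwlock)	Only one writer at a time; multiple simultaneous readers	Read-mostly workloads
Spin lock (spinlock)	No thread context switch; raises CPU utilization; suits short code sections	Short critical sections that are not locked very often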

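CAS

Lock-free algorithms are generally built on atomic read-modify-write primitives. LL and SC (load-linked and store-conditional) are the ideal primitives in lock-free research, but they need support from CPU instructions. x86 does not provide them; some other architectures, such as ARM, PowerPC and MIPS, do offer LL/SC instructions, usually with restrictions. On top of atomic operations the industry therefore adopted the well-known CAS (Compare-And-Swap) operation for lock-free algorithms, and Intel implements it with the cmpxchg instruction family (cmpxchg, cmpxchg8b).

The CAS primitive compares the value at a memory address (one machine word) with an expected value and, if they are equal, replaces the value at that address with a new value. In pseudocode:

bool CAS(T* addr, T expected, T newValue)
{
	if (*addr == expected)
	{
		*addr = newValue;
		return true;
	}
	else
		return false;
}

CAS in practice

do
{
	back up the old data;
	construct the new data from the old data;
} while (!CAS(memory address, backed-up old data, new data))

In other words, if the comparison finds the two equal, the shared data is taken to be unmodified, so the new value is stored and execution continues. If they differ, the shared data has been modified, so the work done so far is discarded and the operation is retried. CAS is thus built on the assumption that the shared data will not be modified, and follows a commit-retry pattern similar to a database. When synchronization conflicts are rare, this assumption yields a substantial performance gain.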

CAS on Linux

cmpxchg first compares the value at the memory address with the value passed in, and performs the exchange only if they are equal.

inline int CAS(unsigned long* mem, unsigned long oldval, unsigned long newval)
{
	__typeof(*mem) ret;
	// Tested on a 64-bit system; on 32-bit use cmpxchgl
	__asm__ volatile ("lock; cmpxchgq %2,%1"
		: "=a"(ret), "=m"(*mem)
		: "r"(newval), "m"(*mem), "0"(oldval));
	return ret == oldval;
}

CAS example (simple use: AtomicInc)

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/time.h>
#include <stdint.h>

int count = 0;

// count is a 4-byte int, so CAS operates on int with the 32-bit cmpxchgl.
// (Casting &count to unsigned long* and using cmpxchgq would touch 8 bytes.)
static inline int CAS(int* mem, int oldval, int newval)
{
	int ret;
	__asm__ __volatile__ ("lock; cmpxchgl %2,%1"
						: "=a"(ret), "+m"(*mem)
						: "r"(newval), "0"(oldval)
						: "memory", "cc");
	return ret==oldval;
}

void AtomicInc(int* addr)
{
	int oldval;
	int newval;
	do
	{
		oldval = *addr;
		newval = oldval+1;
	} while(!CAS(addr, oldval, newval));
}

void *test_func(void *arg)
{
	int i=0;
	int confict = 0;
	for(i=0;i<2000000;++i)
	{
		AtomicInc(&count);
	}
	return NULL;
}

int main(int argc, const char *argv[])
{
	pthread_t id[20];
	int i = 0;

	uint64_t usetime;
	struct timeval start;
	struct timeval end;
	
	gettimeofday(&start,NULL);
	
	for(i=0;i<20;++i)
	{
		pthread_create(&id[i],NULL,test_func,NULL);
	}

	for(i=0;i<20;++i)
	{
		pthread_join(id[i],NULL);
	}
	
	gettimeofday(&end,NULL);

	usetime = (end.tv_sec-start.tv_sec)*1000000+(end.tv_usec-start.tv_usec);
	printf("count = %d, usetime = %lu usecs\n", count, usetime);
	return 0;
}

CAS example (a more complex use)

struct Node
{
	Node(int d) : next(NULL), data(d) {}
	Node* next;
	int data;
};
Node* head = NULL;

void push(int t)
{
	Node* node = new Node(t);
	do
	{
		node->next = head;
	} while (!CAS(&head, node->next, node));
}

bool pop(int&t )
{
	Node* current = head;
	while(current)
	{
		if (CAS(&head, current, current->next)) // vulnerable to the ABA problem
		{
			t = current->data;
			return true;
		}
		current = head;
	}
	return false;
}

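The ABA problem

An ordinary CAS decides whether to modify a variable by checking whether its current value equals the old value. If they are equal, it assumes no other thread has modified the variable and goes ahead.
But "equal" does not really mean "unmodified". Another thread may have changed the value from A to B and then back from B to A. This is the ABA problem.
In many cases the ABA problem does not affect your business logic and can be ignored. When it cannot be ignored, the usual fix is to associate the variable with a version number that only ever increases. The compare step then compares the version number as well as the value.
Java's AtomicStampedReference class does exactly this.

Original article: http://blog.csdn.net/penngrove/article/details/44175387

I recently came across the cmpxchg code in the Linux kernel and could not make sense of the implementation. Only after reading up on inline assembly and the Intel developer manuals did I gradually understand it, so I am writing it down for developers as puzzled as I was. The principle behind cmpxchg as an atomic operation is well known:

cmpxchg(void* ptr, int old, int new): if the value at ptr equals old, new is written to the memory at ptr; otherwise the value at ptr is returned. The whole operation is atomic. On Intel platforms it is implemented with lock cmpxchg. My understanding is that lock locks the memory bus, so another thread that tries to access the memory at ptr is blocked.

Now let's look at the cmpxchg implementation in the Linux kernel (found online; I could not find the header on my own machine, and it is said to be in include/asm-i386/cmpxchg.h):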
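The version-number idea can be sketched in C11 by packing a 32-bit value and a 32-bit version into one 64-bit word, so that a single CAS compares both. The names pack, tagged_store and tagged_cas are hypothetical, and the 32/32 split is an assumption made for this illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef _Atomic uint64_t tagged_t;

/* Packs a 32-bit value (low half) and a 32-bit version (high half). */
uint64_t pack(uint32_t value, uint32_t version)
{
    return ((uint64_t)version << 32) | value;
}

/* Unconditionally stores new_value and bumps the version. */
void tagged_store(tagged_t *t, uint32_t new_value)
{
    uint64_t old = atomic_load(t);
    /* On failure, old is refreshed with the current contents. */
    while (!atomic_compare_exchange_weak(
               t, &old, pack(new_value, (uint32_t)(old >> 32) + 1)))
        ;
}

/* CAS that succeeds only if both the value and the version still match
 * the snapshot taken earlier. Returns 1 on success, 0 on failure. */
int tagged_cas(tagged_t *t, uint64_t snapshot, uint32_t new_value)
{
    return atomic_compare_exchange_strong(
        t, &snapshot, pack(new_value, (uint32_t)(snapshot >> 32) + 1));
}
```

If the value goes from 1 to 2 and back to 1 after a snapshot was taken, the value half matches again but the version half has advanced by two, so tagged_cas with the stale snapshot fails where a plain CAS on the value would have succeeded.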

/* TODO: You should use modern GCC atomic instruction builtins instead of this. */
#include <stdint.h>
#define cmpxchg( ptr, _old, _new ) ({                      \
  volatile uint32_t *__ptr = (volatile uint32_t *)(ptr);   \
  uint32_t __ret;                                          \
  asm volatile( "lock; cmpxchgl %2,%1"                     \
    : "=a" (__ret), "+m" (*__ptr)                          \
    : "r" (_new), "0" (_old)                               \
    : "memory");                                           \
  __ret;                                                   \
})

The key is understanding the inline assembly. The format of inline assembly in C is

asm ( assembler template
    : output operands                   (optional)
    : input operands                    (optional)
    : clobbered registers list          (optional)
    );

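The output operands and input operands specify the arguments. They are listed left to right, separated by ',', and numbered from 0. In the cmpxchg assembly, (__ret) is operand 0, (*__ptr) is 1, (_new) is 2 and (_old) is 3. A "%2" in the assembly therefore refers to _new and "%1" to (*__ptr).

"=a" says that the result goes into __ret and must use the eax register, so the final step that writes the result is mov eax, ret (eax ==> __ret). "r" (_new) says that the value of _new is read into a general-purpose register.

Note the "0" (_old) in cmpxchg, which is what confused me. It says that (_old) uses the same register or memory as operand 0, that is, (_old) is stored in the same place as operand 0. In cmpxchg this means _old and __ret share a register, and since __ret uses eax, so does _old.

With that understood, look at cmpxchgl again. The Intel manual says:

0F B1 /r    CMPXCHG r/m32, r32    MR    Valid    Valid*    Compare EAX with r/m32. If equal, ZF is set and r32 is loaded into r/m32. Else, clear ZF and load r/m32 into EAX.

In plain words:

Compare eax with the destination operand (the first operand). If they are equal, the ZF flag is set and the source operand (the second operand) is written to the destination operand. Otherwise ZF is cleared and the value of the destination operand is written back to eax.

Applied to cmpxchg, this reads:

Compare _old with (*__ptr). If they are equal, the ZF flag is set and _new is written to (*__ptr). Otherwise ZF is cleared and the value of (*__ptr) is written back to eax, which is returned as __ret.

Clearly this matches our understanding of cmpxchg.

Also: the Intel manual says that lock makes the CPU use memory exclusively.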
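As the TODO in the macro suggests, the GCC builtins cover the same ground without inline assembly. The sketch below shows two spellings; cmpxchg_u32 and cas_u32 are hypothetical wrapper names for this illustration.

```c
#include <stdint.h>

/* Same contract as the cmpxchg macro: returns the value *ptr held
 * before the call; the swap happened if that equals old_val. */
uint32_t cmpxchg_u32(volatile uint32_t *ptr, uint32_t old_val, uint32_t new_val)
{
    return __sync_val_compare_and_swap(ptr, old_val, new_val);
}

/* Boolean form using the newer __atomic builtin.
 * Returns 1 if *ptr was equal to expected and has been set to desired. */
int cas_u32(volatile uint32_t *ptr, uint32_t expected, uint32_t desired)
{
    return __atomic_compare_exchange_n(ptr, &expected, desired,
                                       0 /* strong */,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

On x86 the compiler typically emits lock cmpxchg for both, and the same source also builds on other architectures.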

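Lock-free programming (5) - RCU (Read-Copy-Update)

RCU (Read-Copy Update)

RCU stands for read-copy-update, and the name describes how it works. Readers access an RCU-protected shared data structure without taking any lock. A writer first makes a copy, modifies the copy, and then, at a suitable moment, repoints the pointer to the original data at the new, modified data. That moment is when every CPU that references the data has finished operating on the shared data.

Memory management in the Linux kernel makes heavy use of RCU. One simple way to know when the old copy may be reclaimed is to give each memory object an atomic counter that records how many accesses are currently in progress, and to reclaim the memory only when no other process is accessing the object (the counter is 0). The kernel's own RCU instead waits for a grace period, after which every CPU has left its read-side critical section.

As this flow shows, RCU resembles an optimized read-write lock that solves synchronization between reads and writes. It suits read-mostly workloads; when writes are frequent, the cost of copying and modifying becomes significant too. (Synchronization between writers still needs some other mechanism.)

Code walkthrough: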
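The read, copy and update steps can be sketched in user space with a C11 atomic pointer. This is only an illustration of the publish step under stated assumptions: struct config, config_read and config_update are hypothetical names, a single writer is assumed, and deciding when the old version is safe to free (the hard part of RCU) is left to the caller.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct config { char name[16]; };            /* hypothetical shared record */

static _Atomic(struct config *) current;     /* starts out NULL */

/* Reader: a single acquire load, no lock. */
const struct config *config_read(void)
{
    return atomic_load_explicit(&current, memory_order_acquire);
}

/* Writer (single writer assumed): copy the current version, modify the
 * copy, then publish it. The previous version is handed back through
 * *old_out; the caller may free it only once no reader can still be
 * using it. Returns 0 on success, -1 if allocation fails. */
int config_update(const char *name, struct config **old_out)
{
    struct config *old = atomic_load_explicit(&current, memory_order_relaxed);
    struct config *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return -1;
    if (old != NULL)
        *fresh = *old;                        /* copy */
    else
        memset(fresh, 0, sizeof *fresh);
    strncpy(fresh->name, name, sizeof fresh->name - 1);   /* update the copy */
    fresh->name[sizeof fresh->name - 1] = '\0';
    /* Publish: readers see either the old or the new version, never a mix. */
    *old_out = atomic_exchange_explicit(&current, fresh, memory_order_acq_rel);
    return 0;
}
```

A reader that obtained the old pointer before the exchange keeps seeing a complete, unchanged record, which is why the old version must outlive all such readers.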

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int currentidx = 0;
char* str[2] = {0};

void* consume(void *arg)
{
    sleep(1);
    while(1)
    {
        printf("************************consumed %s, index %d, self %d\n",str[currentidx], currentidx, pthread_self());
        sleep(1); 
    }
    
    return NULL;
}

void* produce( void * arg )
{
    const char* s_str1 = "hello";
    const char* s_str2 = "world";
	
    while(1)
    {
	printf("product begin\n");
		
	// read copy
        int other = 1 - currentidx;
	str[other] = (char*)malloc(6);
	if (other == 0)
	{
		strncpy(str[other], s_str1, 6);
	}
	else
	{
		strncpy(str[other], s_str2, 6);
	}
		
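	// update: switch the index (a single aligned int store)
	currentidx = other;
	// delete the old version; note that this demo does not wait for the
	// reader, which may still be using the old string at this point
	free(str[1-currentidx]);
        sleep(5);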
    }
    
    return NULL;
}

int main( void )
{
    pthread_t thread1,thread2;
    pthread_create(&thread1, NULL, &produce, NULL );
    pthread_create(&thread2, NULL, &consume, NULL );
    pthread_join(thread1,NULL);
    pthread_join(thread2,NULL);
    return 0;
}

Results:

[root@rocket lock-free]# ./lockfree_rcu
product begin
************************consumed world, index 1, self 1395513088
product begin
************************consumed hello, index 0, self 1395513088
product begin
************************consumed world, index 1, self 1395513088

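Lock-free programming (6) - seqlock (sequence lock)

seqlock (sequence lock)

A seqlock is for situations where reads and writes can be told apart, reads are frequent, writes are rare, and writes take priority over reads.
The idea is to represent a sequence with an incrementing integer. A writer does sequence++ on entering the critical section and sequence++ again on leaving it. A writer also has to take a lock (a mutex, for example) that is used only for writer-writer exclusion, so that at most one write is in progress at any time.
When sequence is odd, a write is in progress, and a reader that wants to enter the critical section must wait until sequence becomes even. On entering the critical section a reader records the current sequence; on leaving it compares the recorded value with the current one. If they differ, a write happened while the reader was inside the critical section, what it read is invalid, and it must go back and retry.

Writers of a seqlock must exclude each other. But a seqlock is meant for read-mostly use, where write conflicts are unlikely, so the writer-writer exclusion costs almost nothing.
Readers and writers need no mutual exclusion. A seqlock is meant for cases where writes take priority over reads: a writer almost never blocks (unless the unlikely writer-writer conflict occurs) and only has the extra work of sequence++. A reader does not block either; it only needs to retry when it detects a conflict with a write.

A typical use of seqlock is updating the clock. The system takes a timer interrupt every 1 millisecond, and the interrupt handler updates the clock (a write). User programs call system calls such as gettimeofday to get the current time (a read). Here a seqlock prevents a flood of gettimeofday calls from blocking the interrupt handler, which is what would happen with a read-write lock instead of a seqlock. The interrupt handler always has priority, and if a gettimeofday call conflicts with it, it does no harm for the user program to wait a little longer.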

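The seqlock implementation is very simple (smp_wmb and smp_rmb are the kernel's write and read memory barriers):
When a writer enters the critical section:
void write_seqlock(seqlock_t *sl)
{
    spin_lock(&sl->lock); // take the writer-writer lock
    ++sl->sequence; // sequence++
    smp_wmb(); // make the odd sequence visible before the data is written
}
When a writer leaves the critical section:
void write_sequnlock(seqlock_t *sl)
{
    smp_wmb(); // make the data visible before sequence becomes even
    sl->sequence++; // sequence++ again
    spin_unlock(&sl->lock); // release the writer-writer lock
}

When a reader enters the critical section:
unsigned read_seqbegin(const seqlock_t *sl)
{
    unsigned ret;
    repeat:
        ret = sl->sequence; // read sequence
        smp_rmb(); // read sequence before reading the data
        if (unlikely(ret & 1)) { // spin while sequence is odd
            goto repeat;
        }
    return ret;
}
When a reader tries to leave the critical section:
int read_seqretry(const seqlock_t *sl, unsigned start)
{
    smp_rmb(); // finish reading the data before re-reading sequence
    return (sl->sequence != start); // did sequence change since entering the critical section?
}
A read is usually performed like this:
do {
    seq = read_seqbegin(&seq_lock); // enter the critical section
    do_something();
} while (read_seqretry(&seq_lock, seq)); // try to leave the critical section; retry on conflict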
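The same scheme can be tried in user space with C11 atomics and a pthread mutex for writer-writer exclusion. This is a simplified sketch, not the kernel implementation: struct seq_pair and its functions are hypothetical names, the payload is two ints with the invariant b == 2 * a, and every atomic access uses the default sequentially consistent ordering to keep the reasoning simple.

```c
#include <pthread.h>
#include <stdatomic.h>

struct seq_pair {
    atomic_uint sequence;          /* odd while a write is in progress */
    pthread_mutex_t write_lock;    /* writer-writer exclusion only */
    atomic_int a, b;               /* payload; invariant: b == 2 * a */
};

void seq_pair_init(struct seq_pair *s)
{
    atomic_init(&s->sequence, 0);
    pthread_mutex_init(&s->write_lock, NULL);
    atomic_init(&s->a, 0);
    atomic_init(&s->b, 0);
}

void seq_pair_write(struct seq_pair *s, int v)
{
    pthread_mutex_lock(&s->write_lock);
    atomic_fetch_add(&s->sequence, 1);     /* sequence becomes odd */
    atomic_store(&s->a, v);
    atomic_store(&s->b, 2 * v);
    atomic_fetch_add(&s->sequence, 1);     /* sequence is even again */
    pthread_mutex_unlock(&s->write_lock);
}

/* Readers never block the writer: they retry until they have read the
 * payload under an even sequence that did not change meanwhile. */
void seq_pair_read(struct seq_pair *s, int *a, int *b)
{
    unsigned start;
    do {
        start = atomic_load(&s->sequence);
        *a = atomic_load(&s->a);
        *b = atomic_load(&s->b);
    } while ((start & 1u) || atomic_load(&s->sequence) != start);
}
```

Each completed write advances the sequence by two, and a reader that returns always observes a pair that satisfies the invariant.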