Multi-thread下对int进行原子操作

原子操作与自旋锁

最新推荐文章于 2025-07-30 09:34:24 发布

翻译最新推荐文章于 2025-07-30 09:34:24 发布 · 890 阅读

Linux 同时被 2 个专栏收录

124 篇文章

订阅专栏

多进程与多线程

7 篇文章

订阅专栏

本文探讨了在多线程环境下对整型变量进行保护的方法，对比了互斥锁、信号量与自旋锁的使用场景，并介绍了如何利用处理器特性实现原子变量。此外，还提供了一个使用GCC内置原子操作函数的示例程序。

原文链接

这是一系列文章，作者做了详细的讨论。

http://www.alexonlinux.com/do-you-need-mutex-to-protect-int
http://www.alexonlinux.com/pthread-spinlocks
http://www.alexonlinux.com/multithreaded-simple-data-type-access-and-atomic-variables

学习总结

作者首先提出了一个问题：是否需要用mutex对int进行protect，对一个int使用mutex或semaphore进行保护开销太大了，然后得出结论是使用spinlock，即自旋锁（busy-wait），比较合适。但是在获取自旋锁时要记得不能进行其他较为耗时的操作。

Intel x86 and x86_64 processor architectures (as well as vast majority of other modern CPU architectures) has instructions that allow one to lock FSB, while doing some memory access. FSB stands for Front Serial Bus. This is the bus that processor use to communicate with RAM. I.e. locking FSB will prevent from any other processor (core), and process running on that processor, from accessing RAM. And this is exactly what we need to implement atomic variables.
Atomic variables being widely used in kernel, but from some reason no-one bothered to implement them for user-mode folks. Until gcc 4.1.2.

作者提到，Intel x86/x86_64处理器架构有只允许单独锁住FSB的指令。FSB即为Front Serial Bus，处理器使用该总线与RAM交互。某个处理器（记为A）锁住FSB之后，其他处理器，以及在A上运行的进程都不能访问RAM，因此可以使用这一点实现atomic variable。
atomic variable在linux kernel中被广泛使用，但是直到gcc 4.1.2才可以在用户模式下使用。

测试程序

#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <sched.h>
#include <linux/unistd.h>
#include <sys/syscall.h>
#include <errno.h>

#define INC_TO 1000000 // one million...

int global_int = 0;

pid_t gettid( void )
{
    return syscall( __NR_gettid );
}

void *thread_routine( void *arg )
{
    int i;
    int proc_num = (int)(long)arg;
    cpu_set_t set;

    CPU_ZERO( &set );
    CPU_SET( proc_num, &set );

    if (sched_setaffinity( gettid(), sizeof( cpu_set_t ), &set ))
    {
        perror( "sched_setaffinity" );
        return NULL;
    }

    for (i = 0; i < INC_TO; i++)
    {
        // global_int++; // 不使用原子操作，结果是不对的，不过作者指出加上-O2能得到正确结果
        __sync_fetch_and_add( &global_int, 1 );
    }

    return NULL;
}

int main()
{
    int procs = 0;
    int i;
    pthread_t *thrs;

    // Getting number of CPUs
    procs = (int)sysconf( _SC_NPROCESSORS_ONLN );
    if (procs < 0)
    {
        perror( "sysconf" );
        return -1;
    }

    thrs = malloc( sizeof( pthread_t ) * procs );
    if (thrs == NULL)
    {
        perror( "malloc" );
        return -1;
    }

    printf( "Starting %d threads...\n", procs );

    for (i = 0; i < procs; i++)
    {
        if (pthread_create( &thrs[i], NULL, thread_routine,
            (void *)(long)i ))
        {
            perror( "pthread_create" );
            procs = i;
            break;
        }
    }

    for (i = 0; i < procs; i++)
        pthread_join( thrs[i], NULL );

    free( thrs );

    printf( "After doing all the math, global_int value is: %d\n",
        global_int );
    printf( "Expected value is: %d\n", INC_TO * procs );

    return 0;
}

编译

$ gcc -pthread yourfilename

程序首先获取处理器个数N，然后创建N个线程分别在N个处理器上运行（原来还可以这么完>_<）。这四个线程都对global_int进行加1操作，正确结果应该是1000000×N。如果不进行一些同步控制（mutex、semaphore、spinlock）或使用上面提到的built-in原子操作函数，那么结果是随机的。尽管加上-O2也能得到正确结果，但是还是使用原子操作比较合理。

一些细节

for (i = 0; i < procs; i++)
{
  if (pthread_create( &thrs[i], NULL, thread_routine,
    (void *)(long)i ))
  // ...
}

注意pthread_create传递的最后一个参数是否会出现“竞争”。这里的代码不会出现。因为i是以值传递过去的。