pthread_cond_wait(): Usage, Execution Flow, and Some Common Questions

路北

Original: http://hi.baidu.com/susdisk/blog/item/48ca2d8fc88b5ef3503d925f.html
Multithreaded programming on Linux sooner or later calls for condition variables, and with them the pthread_cond_wait() function. How this function actually executes, however, is not easy to grasp.
    pthread_cond_wait() works as follows (using the EXAMPLE from the man page):
       Consider two shared variables x and y, protected by the mutex mut, and a condition variable cond that is to be signaled whenever x becomes greater than y.

              int x,y;
              pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
              pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

Waiting until x is greater than y is performed as follows:

              pthread_mutex_lock(&mut);
              while (x <= y) {
                      pthread_cond_wait(&cond, &mut);
              }
              /* operate on x and y */
              pthread_mutex_unlock(&mut);

Modifications on x and y that may cause x to become greater than y should signal the condition if needed:

              pthread_mutex_lock(&mut);
              /* modify x and y */
              if (x > y) pthread_cond_broadcast(&cond);
              pthread_mutex_unlock(&mut);

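In this example two threads work with x and y. The first thread suspends itself while x <= y and continues only once x > y; the second thread may modify x and y, and wakes the first thread when x > y. So we first initialize an ordinary mutex mut and a condition variable cond. The first thread then runs the waiting loop shown above, and the second thread runs the modifying code, here with pthread_cond_signal() in place of pthread_cond_broadcast():

              pthread_mutex_lock(&mut);
              /* modify x and y */
              if (x > y) pthread_cond_signal(&cond);
              pthread_mutex_unlock(&mut);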
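    The execution itself is quite simple. When the first thread reaches pthread_cond_wait(&cond, &mut) with x <= y, the function atomically puts the thread on the wait queue of cond and unlocks mut (the condition variable itself is never "locked"). The first thread is now suspended and uses no CPU cycles.
    The second thread, which was blocked because the first thread held mut, can now acquire mut and modify x and y. After the modification, an if statement checks whether x > y; if so, pthread_cond_signal() wakes the first thread, and the next statement releases mut. The first thread then resumes inside pthread_cond_wait(): it must first re-acquire mut, and only once it holds the lock does the call return and the loop condition get tested again (why a while loop is used, that is, why the condition is re-checked after wakeup, is analysed below). If the condition holds, the thread goes on to do its work and finally releases mut.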
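The flow just described can be put into a small, self-contained program fragment. This is only a sketch under assumptions: the thread functions waiter() and modifier(), the driver run_xy_demo(), and the value assigned to x are hypothetical names and choices made for illustration; only x, y, mut and cond come from the man page example.

```c
#include <assert.h>
#include <pthread.h>

static int x = 0, y = 0;
static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* Waiter: sleeps inside pthread_cond_wait() until x > y holds. */
static void *waiter(void *arg) {
    int *seen = arg;
    pthread_mutex_lock(&mut);
    while (x <= y) {
        /* Atomically releases mut and blocks; mut is locked again on return. */
        pthread_cond_wait(&cond, &mut);
    }
    assert(x > y);              /* predicate holds and we own mut here */
    *seen = x - y;              /* "operate on x and y" */
    pthread_mutex_unlock(&mut);
    return NULL;
}

/* Modifier: changes x under the mutex and signals if x > y became true. */
static void *modifier(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mut);
    x = y + 3;                  /* arbitrary change chosen for the demo */
    if (x > y)
        pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mut);
    return NULL;
}

/* Hypothetical driver: runs both threads, returns x - y as seen by the waiter. */
int run_xy_demo(void) {
    pthread_t tw, tm;
    int seen = 0;
    x = 0;
    y = 0;
    pthread_create(&tw, NULL, waiter, &seen);
    pthread_create(&tm, NULL, modifier, NULL);
    pthread_join(tw, NULL);
    pthread_join(tm, NULL);
    return seen;
}
```

The result does not depend on which thread runs first: if modifier() gets the mutex before waiter(), the waiter finds x > y already true and never blocks.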

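Why re-check the condition after being woken, that is, why test it in a while loop? Because of spurious and multiple wakeups (the "thundering herd" effect). It is tempting to think that a woken thread can assume the condition holds, but that is not so. If several threads are waiting on the same condition and only one of them can handle the event, each must test the condition again after waking, so that only a thread that finds it still true goes on to process it while the others go back to waiting. On this point, here is a quoted passage:

Quoting the POSIX RATIONALE: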

Condition Wait Semantics

It is important to note that when pthread_cond_wait() and pthread_cond_timedwait() return without error, the associated predicate may still be false. Similarly, when pthread_cond_timedwait() returns with the timeout error, the associated predicate may be true due to an unavoidable race between the expiration of the timeout and the predicate state change.

The application needs to recheck the predicate on any return because it cannot be sure there is another thread waiting on the thread to handle the signal, and if there is not then the signal is lost. The burden is on the application to check the predicate.

Some implementations, particularly on a multi-processor, may sometimes cause multiple threads to wake up when the condition variable is signaled simultaneously on different processors.

In general, whenever a condition wait returns, the thread has to re-evaluate the predicate associated with the condition wait to determine whether it can safely proceed, should wait again, or should declare a timeout. A return from the wait does not imply that the associated predicate is either true or false.

It is thus recommended that a condition wait be enclosed in the equivalent of a "while loop" that checks the predicate.

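From the rationale quoted above we can conclude:
1. On a multiprocessor, pthread_cond_signal() may wake more than one thread at once. When only one thread may handle a given task, the other woken threads have to go back to waiting, and that is exactly what the while loop is for. The specification only requires pthread_cond_signal() to wake at least one of the threads blocked in pthread_cond_wait(); some implementations, for simplicity, wake several threads even on a single processor.
2. In some applications, such as a thread pool, pthread_cond_broadcast() wakes all threads, but usually only some of them are needed to run tasks, so the rest have to keep waiting. A while loop is therefore strongly recommended here.

Put simply: pthread_cond_signal() may also wake several threads. The mutex already guarantees that only one thread at a time is inside the critical section; the while loop guarantees that a woken thread proceeds only if the condition still holds when it gets the mutex back.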
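The thread pool case can be illustrated with a short sketch in which every worker is woken by pthread_cond_broadcast() but each task is still handled exactly once. All names here (pool_worker(), run_pool_demo(), tasks, shutdown_flag, handled, NWORKERS) are assumptions made for the example, not part of any real pool implementation.

```c
#include <pthread.h>

#define NWORKERS 4

static pthread_mutex_t pool_mut = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_cond = PTHREAD_COND_INITIALIZER;
static int tasks = 0;           /* pending tasks; predicate is tasks > 0 */
static int shutdown_flag = 0;   /* tells idle workers to exit */
static int handled = 0;         /* tasks processed so far */

static void *pool_worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&pool_mut);
    for (;;) {
        /* Every wakeup re-tests the predicate: another worker may
         * already have taken the task that caused the broadcast. */
        while (tasks == 0 && !shutdown_flag)
            pthread_cond_wait(&pool_cond, &pool_mut);
        if (tasks == 0)
            break;              /* shutting down and nothing left to do */
        tasks--;
        handled++;
    }
    pthread_mutex_unlock(&pool_mut);
    return NULL;
}

/* Hypothetical driver: posts ntasks tasks to NWORKERS workers and
 * returns how many tasks were handled in total. */
int run_pool_demo(int ntasks) {
    pthread_t th[NWORKERS];
    int i;

    tasks = 0;
    shutdown_flag = 0;
    handled = 0;
    for (i = 0; i < NWORKERS; i++)
        pthread_create(&th[i], NULL, pool_worker, NULL);

    pthread_mutex_lock(&pool_mut);
    tasks = ntasks;
    shutdown_flag = 1;
    pthread_cond_broadcast(&pool_cond);   /* wakes every waiting worker */
    pthread_mutex_unlock(&pool_mut);

    for (i = 0; i < NWORKERS; i++)
        pthread_join(th[i], NULL);
    return handled;
}
```

With an if in place of the inner while, a worker woken after the tasks were already taken would decrement tasks below zero; the loop sends it back to waiting (or out of the pool) instead.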

///

pthread_cond_wait()

/************ How to use pthread_cond_wait() **********/

pthread_mutex_lock(&qlock);

pthread_cond_wait(&qready, &qlock);

pthread_mutex_unlock(&qlock);

/*****************************************************/

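The mutex passed to pthread_cond_wait protects the condition. The caller passes it locked to the function, which then atomically places the calling thread on the list of threads waiting for the condition and unlocks the mutex. This closes the window between the time that the condition is checked and the time that the thread goes to sleep waiting for the condition to change, so that the thread doesn't miss a change in the condition. When pthread_cond_wait returns, the mutex is again locked.

The above is quoted from APUE. In other words, the mutex argument of pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex) protects the condition. We call pthread_cond_wait() because the condition does not hold and we are about to block; if the condition changed in the interval between the check and actually going to sleep, we would miss that change, because the thread is not yet on the wait queue. That is why the mutex must be locked with pthread_mutex_lock() before calling pthread_cond_wait(). pthread_cond_wait() puts the thread on the wait queue and unlocks the mutex as one atomic step, which lets other threads acquire the lock, access the shared resource, and wake the blocked thread at the right moment. When pthread_cond_wait() returns, it has automatically locked the mutex again.

The actual lock and unlock sequence of the code above is: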

/************ How to use pthread_cond_wait() **********/

pthread_mutex_lock(&qlock); /* lock */

pthread_cond_wait(&qready, &qlock); /* unlock + block (atomically) --> woken up --> lock again --> return */

pthread_mutex_unlock(&qlock); /* unlock */

/*****************************************************/
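The lock sequence above only shows the bare calls. In practice the wait is tied to a piece of shared state, and it is that state, checked under the mutex, that keeps a wakeup from being lost. The following is a minimal sketch: qlock and qready are the names used above, while the counter pending and the functions post_event() and take_event() are hypothetical.

```c
#include <pthread.h>

static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qready = PTHREAD_COND_INITIALIZER;
static int pending = 0;         /* events posted but not yet taken */

/* Records one event under the mutex, then wakes one waiter. */
void post_event(void) {
    pthread_mutex_lock(&qlock);
    pending++;
    pthread_cond_signal(&qready);
    pthread_mutex_unlock(&qlock);
}

/* Blocks until an event is available, consumes it, and returns
 * the number of events still pending afterwards. */
int take_event(void) {
    int left;
    pthread_mutex_lock(&qlock);
    /* The check and the decision to sleep happen under qlock, so
     * post_event() cannot slip in between them. */
    while (pending == 0)
        pthread_cond_wait(&qready, &qlock);
    pending--;
    left = pending;
    pthread_mutex_unlock(&qlock);
    return left;
}
```

A signal sent while no thread is waiting is not remembered by the condition variable, but the event is not lost either: it is recorded in pending, so a later take_event() returns at once without blocking.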

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdlib.h>
#include <pthread.h>

#define NUM_WORKERS 5

static void *testThreadPool(void *arg);

static pthread_mutex_t clifd_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t clifd_cond = PTHREAD_COND_INITIALIZER;
static int a = -1;   /* descriptor waiting for a worker; -1 means "none" */

/* Print the failing call together with the errno text, then stop. */
static void die(const char *what) {
    perror(what);
    exit(EXIT_FAILURE);
}

int main(void) {
    int sock_fd, conn_fd, t, rc;
    int optval = 1;
    socklen_t cli_len;
    struct sockaddr_in cli_addr, serv_addr;
    pthread_t mythread[NUM_WORKERS];

    sock_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (sock_fd < 0)
        die("socket");

    if (setsockopt(sock_fd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval)) < 0)
        die("setsockopt");

    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(4507);
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(sock_fd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
        die("bind");
    if (listen(sock_fd, 100) < 0)
        die("listen");

    /* Start the workers; each one gets its own heap-allocated index. */
    for (t = 0; t < NUM_WORKERS; t++) {
        int *i = malloc(sizeof(int));
        if (i == NULL)
            die("malloc");
        *i = t;
        /* The start routine must have the type void *(*)(void *). */
        rc = pthread_create(&mythread[t], NULL, testThreadPool, i);
        if (rc != 0) {
            fprintf(stderr, "pthread_create: %s\n", strerror(rc));
            exit(EXIT_FAILURE);
        }
    }

    while (1) {
        cli_len = sizeof(cli_addr);   /* accept() overwrites it, so reset it each time */
        conn_fd = accept(sock_fd, (struct sockaddr *) &cli_addr, &cli_len);
        if (conn_fd < 0) {
            perror("accept");
            continue;
        }
        printf("accept a new client, ip:%s\n", inet_ntoa(cli_addr.sin_addr));

        /* Publish the descriptor under the mutex and wake one worker.
         * There is only one slot: a descriptor that no worker has taken
         * yet is overwritten; a real pool would use a queue here. */
        pthread_mutex_lock(&clifd_mutex);
        a = conn_fd;
        pthread_cond_signal(&clifd_cond);
        pthread_mutex_unlock(&clifd_mutex);
    }
    return 0;
}

static void *testThreadPool(void *arg) {
    int *t = arg;
    int fd;

    printf("t is %d\n", *t);
    for (;;) {
        pthread_mutex_lock(&clifd_mutex);
        while (a < 0)                 /* re-check the predicate after every wakeup */
            pthread_cond_wait(&clifd_cond, &clifd_mutex);
        fd = a;
        a = -1;                       /* mark the slot as taken */
        printf("a is %d\n", fd);
        printf("t is %d\n", *t);
        pthread_mutex_unlock(&clifd_mutex);

        sleep(100);                   /* stands in for serving the client */
        close(fd);
    }
    return NULL;
}

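Understanding what pthread_cond_wait() does is very important: it is the core of the POSIX threads signalling system, and also the hardest part to understand.

First, consider this situation: a thread has locked the mutex in order to look at a linked list, and the list happens to be empty. This particular thread cannot do anything; it was designed to remove a node from the list, and there are no nodes. So this is what it does:

While still holding the mutex, the thread calls pthread_cond_wait(&mycond, &mymutex). The pthread_cond_wait() call is fairly complex, so we will step through it one operation at a time.

The first thing pthread_cond_wait() does is simultaneously unlock the mutex (so that other threads can modify the linked list) and wait on the condition mycond (so that pthread_cond_wait() wakes up when it receives a "signal" from another thread). Now that the mutex is unlocked, other threads can access and modify the linked list, possibly adding items. [Unlocking and blocking are required to be one atomic operation.]

At this point the pthread_cond_wait() call has not yet returned. Unlocking the mutex happens immediately, but waiting on the condition mycond is normally a blocking operation, meaning that the thread sleeps and consumes no CPU cycles until it wakes. This is exactly what we want. The thread sleeps until the condition occurs, with no busy polling that would waste CPU time. From the thread's point of view, it is simply waiting for the pthread_cond_wait() call to return.

Continuing, suppose another thread (call it "thread 2") locks mymutex and adds an item to the linked list. Immediately after unlocking the mutex, thread 2 calls pthread_cond_broadcast(&mycond). This causes all threads waiting on the condition variable mycond to wake up at once, which means that the first thread (still inside its pthread_cond_wait() call) now wakes.

Now look at what happens to the first thread. You might think that after thread 2 calls pthread_cond_broadcast(&mycond), thread 1's pthread_cond_wait() returns immediately. Not so! pthread_cond_wait() still performs one last operation: re-locking mymutex. Once pthread_cond_wait() holds the mutex, it returns and lets thread 1 continue. At that point the thread can immediately check the list for the changes it is interested in.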

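Stop and review!
That process is quite complex, so let us review it. The first thread first calls:
pthread_mutex_lock(&mymutex);
It then checks the list. Finding nothing of interest, it calls:
pthread_cond_wait(&mycond, &mymutex);

The pthread_cond_wait() call then performs several operations before returning:

pthread_mutex_unlock(&mymutex);

It unlocks mymutex and goes to sleep, waiting on mycond for a POSIX threads "signal". Once it receives the "signal" (in quotes because we are not talking about traditional UNIX signals, but about signals from pthread_cond_signal() or pthread_cond_broadcast() calls), it wakes up. But pthread_cond_wait() does not return immediately; it has one more thing to do, which is to re-lock mymutex:
pthread_mutex_lock(&mymutex);

pthread_cond_wait() knows that we are looking for changes "behind" mymutex, so it goes ahead and locks the mutex for us, and only then returns.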
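The linked-list scenario reviewed above might look like this in code. It is a sketch under assumptions: mymutex and mycond are the names used in the text, while struct node, list_push(), list_pop(), consumer() and run_list_demo() are hypothetical helpers written for this illustration.

```c
#include <pthread.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

static struct node *list_head = NULL;
static pthread_mutex_t mymutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t mycond = PTHREAD_COND_INITIALIZER;

/* "Thread 2": adds an item, unlocks, then broadcasts on mycond. */
void list_push(int value) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->value = value;
    pthread_mutex_lock(&mymutex);
    n->next = list_head;
    list_head = n;
    pthread_mutex_unlock(&mymutex);
    pthread_cond_broadcast(&mycond);
}

/* "Thread 1": removes a node, sleeping while the list is empty. */
int list_pop(void) {
    struct node *n;
    int value;
    pthread_mutex_lock(&mymutex);
    while (list_head == NULL)
        pthread_cond_wait(&mycond, &mymutex);  /* unlock, sleep, re-lock */
    n = list_head;
    list_head = n->next;
    pthread_mutex_unlock(&mymutex);
    value = n->value;
    free(n);
    return value;
}

static void *consumer(void *arg) {
    *(int *)arg = list_pop();
    return NULL;
}

/* Hypothetical driver: one consumer thread pops what the caller pushes. */
int run_list_demo(void) {
    pthread_t th;
    int got = 0;
    pthread_create(&th, NULL, consumer, &got);
    list_push(42);
    pthread_join(th, NULL);
    return got;
}
```

Broadcasting after the unlock, as the text describes, is safe here because the list is only examined and modified while mymutex is held: the consumer either sees the new node when it checks, or is already on the wait queue when the broadcast arrives.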