How should we think about condition variables "losing signals"?

张飞online

Category: Linux application layer

Article link: https://blog.csdn.net/u013372900/article/details/80822932

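First, I want to correct a common misconception: that condition variables lose signals. Many people reach this conclusion after using a condition variable in a scenario that does not satisfy the preconditions for using one.

What people call a "lost signal" really means that the condition variable was used without meeting its precondition: wait first, then signal.

If you do not meet that precondition, then on Linux the signal must be lost. Note that I say "must", not "may". The reason lies in the underlying implementation. To borrow an analogy from circuits, a condition variable signal on Linux is edge-triggered, not level-triggered. When you send a signal, the implementation checks whether any thread is currently waiting, and if nobody is waiting it simply returns. POSIX specifies the same behavior: pthread_cond_signal has no effect if no threads are currently blocked on the condition variable. So signaling first and waiting afterwards will "lose" the signal.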

I also recommend reading this blog post:

https://blog.csdn.net/absurd/article/details/1402433

Let's start with a small example:

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t m_pthreadMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t m_pthreadCond = PTHREAD_COND_INITIALIZER;
int is_ok = 0; /* unused in this demo: no predicate guards the wait */

void *ThreadFunc(void *arg)
{
	(void)arg;

	pthread_mutex_lock(&m_pthreadMutex);
	while (1)
	{
		/* Wait with no predicate: one line is printed per wake-up. */
		pthread_cond_wait(&m_pthreadCond, &m_pthreadMutex);
		printf("wait-----------\n");
	}
	/* Never reached: the loop above has no exit. */
	pthread_mutex_unlock(&m_pthreadMutex);
	return NULL;
}

int main(void)
{
	pthread_t tid;
	pthread_create(&tid, NULL, ThreadFunc, NULL);

	sleep(4); /* give the waiter time to block in pthread_cond_wait */
	int cnt = 10;
	while (cnt--)
	{
		pthread_mutex_lock(&m_pthreadMutex);
		pthread_cond_signal(&m_pthreadCond);
		printf("signal\n");
		pthread_mutex_unlock(&m_pthreadMutex);
	}
	/* Blocks forever, because ThreadFunc never returns. */
	pthread_join(tid, NULL);
	return 0;
}

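Clearly, the output matches neither what we wanted nor what we expected. The intent was to print one "wait" for every signal sent, but in practice a whole batch of signals goes out and "wait" is typically printed only once (the exact count depends on scheduling).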

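The waiter goes through the following sequence:

1. Acquire the lock.
2. Release the lock (inside pthread_cond_wait). From this point on, other threads can grab the lock.
3. Sleep.
4. On wake-up, try to re-acquire the lock. This does not necessarily succeed right away.
5. Release the lock.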

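With a condition variable, two things can happen:

1. Several signals are sent before the other side has called wait.

2. Several signals arrive together while the waiter has released the lock and is waiting.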

As the saying goes: with the source code in front of you, there are no secrets.

https://github.com/lattera/glibc/blob/895ef79e04a953cac1493863bcae29ad85657ee1/nptl/pthread_cond_wait.c

The logic is basically as follows:

static __always_inline int
__pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
    const struct timespec *abstime)
{
  const int maxspin = 0;
  int err;
  int result = 0;

  LIBC_PROBE (cond_wait, 2, cond, mutex);

  /* Acquire a position (SEQ) in the waiter sequence (WSEQ).  We use an
     atomic operation because signals and broadcasts may update the group
     switch without acquiring the mutex.  We do not need release MO here
     because we do not need to establish any happens-before relation with
     signalers (see __pthread_cond_signal); modification order alone
     establishes a total order of waiters/signals.  We do need acquire MO
     to synchronize with group reinitialization in
     __condvar_quiesce_and_switch_g1.  */
  uint64_t wseq = __condvar_fetch_add_wseq_acquire (cond, 2);
  /* Find our group's index.  We always go into what was G2 when we acquired
     our position.  */
  unsigned int g = wseq & 1;
  uint64_t seq = wseq >> 1;

  /* Increase the waiter reference count.  Relaxed MO is sufficient because
     we only need to synchronize when decrementing the reference count.  */
  unsigned int flags = atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
  int private = __condvar_get_private (flags);

  /* Now that we are registered as a waiter, we can release the mutex.
     Waiting on the condvar must be atomic with releasing the mutex, so if
     the mutex is used to establish a happens-before relation with any
     signaler, the waiter must be visible to the latter; thus, we release the
     mutex after registering as waiter.
     If releasing the mutex fails, we just cancel our registration as a
     waiter and confirm that we have woken up.  */
  err = __pthread_mutex_unlock_usercnt (mutex, 0);
  if (__glibc_unlikely (err != 0))
    {
      __condvar_cancel_waiting (cond, seq, g, private);
      __condvar_confirm_wakeup (cond, private);
      return err;
    }

  /* Now wait until a signal is available in our group or it is closed.
     Acquire MO so that if we observe a value of zero written after group
     switching in __condvar_quiesce_and_switch_g1, we synchronize with that
     store and will see the prior update of __g1_start done while switching
     groups too.  */
  unsigned int signals = atomic_load_acquire (cond->__data.__g_signals + g);

  do
    {
      while (1)
	{
	  /* Spin-wait first.
	     Note that spinning first without checking whether a timeout
	     passed might lead to what looks like a spurious wake-up even
	     though we should return ETIMEDOUT (e.g., if the caller provides
	     an absolute timeout that is clearly in the past).  However,
	     (1) spurious wake-ups are allowed, (2) it seems unlikely that a
	     user will (ab)use pthread_cond_wait as a check for whether a
	     point in time is in the past, and (3) spinning first without
	     having to compare against the current time seems to be the right
	     choice from a performance perspective for most use cases.  */
	  unsigned int spin = maxspin;
	  while (signals == 0 && spin > 0)
	    {
	      /* Check that we are not spinning on a group that's already
		 closed.  */
	      if (seq < (__condvar_load_g1_start_relaxed (cond) >> 1))
		goto done;

	      /* TODO Back off.  */

	      /* Reload signals.  See above for MO.  */
	      signals = atomic_load_acquire (cond->__data.__g_signals + g);
	      spin--;
	    }

	  /* If our group will be closed as indicated by the flag on signals,
	     don't bother grabbing a signal.  */
	  if (signals & 1)
	    goto done;

	  /* If there is an available signal, don't block.  */
	  if (signals != 0)
	    break;

	  /* No signals available after spinning, so prepare to block.
	     We first acquire a group reference and use acquire MO for that so
	     that we synchronize with the dummy read-modify-write in
	     __condvar_quiesce_and_switch_g1 if we read from that.  In turn,
	     in this case this will make us see the closed flag on __g_signals
	     that designates a concurrent attempt to reuse the group's slot.
	     We use acquire MO for the __g_signals check to make the
	     __g1_start check work (see spinning above).
	     Note that the group reference acquisition will not mask the
	     release MO when decrementing the reference count because we use
	     an atomic read-modify-write operation and thus extend the release
	     sequence.  */
	  atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
	  if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0)
	      || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
	    {
	      /* Our group is closed.  Wake up any signalers that might be
		 waiting.  */
	      __condvar_dec_grefs (cond, g, private);
	      goto done;
	    }

	  // Now block.
	  struct _pthread_cleanup_buffer buffer;
	  struct _condvar_cleanup_buffer cbuffer;
	  cbuffer.wseq = wseq;
	  cbuffer.cond = cond;
	  cbuffer.mutex = mutex;
	  cbuffer.private = private;
	  __pthread_cleanup_push (&buffer, __condvar_cleanup_waiting, &cbuffer);

	  if (abstime == NULL)
	    {
	      /* Block without a timeout.  */
	      err = futex_wait_cancelable (
		  cond->__data.__g_signals + g, 0, private);
	    }
	  else
	    {
	      /* Block, but with a timeout.
		 Work around the fact that the kernel rejects negative timeout
		 values despite them being valid.  */
	      if (__glibc_unlikely (abstime->tv_sec < 0))
	        err = ETIMEDOUT;

	      else if ((flags & __PTHREAD_COND_CLOCK_MONOTONIC_MASK) != 0)
		{
		  /* CLOCK_MONOTONIC is requested.  */
		  struct timespec rt;
		  if (__clock_gettime (CLOCK_MONOTONIC, &rt) != 0)
		    __libc_fatal ("clock_gettime does not support "
				  "CLOCK_MONOTONIC");
		  /* Convert the absolute timeout value to a relative
		     timeout.  */
		  rt.tv_sec = abstime->tv_sec - rt.tv_sec;
		  rt.tv_nsec = abstime->tv_nsec - rt.tv_nsec;
		  if (rt.tv_nsec < 0)
		    {
		      rt.tv_nsec += 1000000000;
		      --rt.tv_sec;
		    }
		  /* Did we already time out?  */
		  if (__glibc_unlikely (rt.tv_sec < 0))
		    err = ETIMEDOUT;
		  else
		    err = futex_reltimed_wait_cancelable
			(cond->__data.__g_signals + g, 0, &rt, private);
		}
	      else
		{
		  /* Use CLOCK_REALTIME.  */
		  err = futex_abstimed_wait_cancelable
		      (cond->__data.__g_signals + g, 0, abstime, private);
		}
	    }

	  __pthread_cleanup_pop (&buffer, 0);

	  if (__glibc_unlikely (err == ETIMEDOUT))
	    {
	      __condvar_dec_grefs (cond, g, private);
	      /* If we timed out, we effectively cancel waiting.  Note that
		 we have decremented __g_refs before cancellation, so that a
		 deadlock between waiting for quiescence of our group in
		 __condvar_quiesce_and_switch_g1 and us trying to acquire
		 the lock during cancellation is not possible.  */
	      __condvar_cancel_waiting (cond, seq, g, private);
	      result = ETIMEDOUT;
	      goto done;
	    }
	  else
	    __condvar_dec_grefs (cond, g, private);

	  /* Reload signals.  See above for MO.  */
	  signals = atomic_load_acquire (cond->__data.__g_signals + g);
	}

    }
  /* Try to grab a signal.  Use acquire MO so that we see an up-to-date value
     of __g1_start below (see spinning above for a similar case).  In
     particular, if we steal from a more recent group, we will also see a
     more recent __g1_start below.  */
  while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
						&signals, signals - 2));

  /* We consumed a signal but we could have consumed from a more recent group
     that aliased with ours due to being in the same group slot.  If this
     might be the case our group must be closed as visible through
     __g1_start.  */
  uint64_t g1_start = __condvar_load_g1_start_relaxed (cond);
  if (seq < (g1_start >> 1))
    {
      /* We potentially stole a signal from a more recent group but we do not
	 know which group we really consumed from.
	 We do not care about groups older than current G1 because they are
	 closed; we could have stolen from these, but then we just add a
	 spurious wake-up for the current groups.
	 We will never steal a signal from current G2 that was really intended
	 for G2 because G2 never receives signals (until it becomes G1).  We
	 could have stolen a signal from G2 that was conservatively added by a
	 previous waiter that also thought it stole a signal -- but given that
	 that signal was added unnecessarily, it's not a problem if we steal
	 it.
	 Thus, the remaining case is that we could have stolen from the current
	 G1, where "current" means the __g1_start value we observed.  However,
	 if the current G1 does not have the same slot index as we do, we did
	 not steal from it and do not need to undo that.  This is the reason
	 for putting a bit with G2's index into__g1_start as well.  */
      if (((g1_start & 1) ^ 1) == g)
	{
	  /* We have to conservatively undo our potential mistake of stealing
	     a signal.  We can stop trying to do that when the current G1
	     changes because other spinning waiters will notice this too and
	     __condvar_quiesce_and_switch_g1 has checked that there are no
	     futex waiters anymore before switching G1.
	     Relaxed MO is fine for the __g1_start load because we need to
	     merely be able to observe this fact and not have to observe
	     something else as well.
	     ??? Would it help to spin for a little while to see whether the
	     current G1 gets closed?  This might be worthwhile if the group is
	     small or close to being closed.  */
	  unsigned int s = atomic_load_relaxed (cond->__data.__g_signals + g);
	  while (__condvar_load_g1_start_relaxed (cond) == g1_start)
	    {
	      /* Try to add a signal.  We don't need to acquire the lock
		 because at worst we can cause a spurious wake-up.  If the
		 group is in the process of being closed (LSB is true), this
		 has an effect similar to us adding a signal.  */
	      if (((s & 1) != 0)
		  || atomic_compare_exchange_weak_relaxed
		       (cond->__data.__g_signals + g, &s, s + 2))
		{
		  /* If we added a signal, we also need to add a wake-up on
		     the futex.  We also need to do that if we skipped adding
		     a signal because the group is being closed because
		     while __condvar_quiesce_and_switch_g1 could have closed
		     the group, it might stil be waiting for futex waiters to
		     leave (and one of those waiters might be the one we stole
		     the signal from, which cause it to block using the
		     futex).  */
		  futex_wake (cond->__data.__g_signals + g, 1, private);
		  break;
		}
	      /* TODO Back off.  */
	    }
	}
    }

 done:

  /* Confirm that we have been woken.  We do that before acquiring the mutex
     to allow for execution of pthread_cond_destroy while having acquired the
     mutex.  */
  __condvar_confirm_wakeup (cond, private);

  /* Woken up; now re-acquire the mutex.  If this doesn't fail, return RESULT,
     which is set to ETIMEDOUT if a timeout occured, or zero otherwise.  */
  err = __pthread_mutex_cond_lock (mutex);
  /* XXX Abort on errors that are disallowed by POSIX?  */
  return (err != 0) ? err : result;
}

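As the code shows, the waiter first registers itself (the __condvar_fetch_add_wseq_acquire call) and then releases the mutex with __pthread_mutex_unlock_usercnt. It then checks the group's signal count atomically, nominally spinning first, although maxspin is 0 here and so the spin loop never runs. If no signal is available, it goes to sleep on the futex. After waking up, it consumes a signal with a compare-and-swap and finally re-acquires the mutex via __pthread_mutex_cond_lock.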