RCU Concepts

kernel/common/Documentation/RCU/stallwarn.txt

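RCU (Read-Copy Update)

  • [Read] Readers access RCU-protected shared data directly, without acquiring any lock.
  • [Copy Update] Before modifying the data, a writer first makes a copy and modifies the copy. When the modification is complete, the writer publishes the new version by repointing the pointer from the old data to the modified copy, and readers that start after this point see the new data. The old copy cannot be freed immediately, because pre-existing readers may still hold references to it. The writer therefore registers a callback with the reclaimer (garbage collector), and the callback frees the old data at the appropriate time. That time is when every pre-existing reader has left its read-side critical section, and the wait for that moment is called the grace period.
  • Writers do not contend with readers for any lock. Only when there are multiple writers do they need some kind of lock to synchronize among themselves. If writes are frequent, RCU performance degrades severely, so RCU is suited only to read-mostly workloads.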
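
The copy, update and deferred-reclaim cycle described above can be sketched as a small single-threaded Python model. It assumes a dict stands in for the protected structure and a generation counter stands in for grace-period tracking. RcuModel, read_lock(), read_unlock() and update() are hypothetical names, not kernel APIs. The real kernel detects grace periods from per-CPU quiescent states rather than by counting readers. Multiple writers would also need a lock among themselves, and this sketch omits that.

```python
class RcuModel:
    """Single-threaded toy model of read-copy-update (illustration only, not the kernel API)."""

    def __init__(self, data):
        self.data = data        # the "pointer" that readers follow
        self.gen = 0            # bumped on every update
        self.readers = {}       # generation at entry -> readers still inside
        self.pending = []       # (generation, callback) waiting for a grace period

    def read_lock(self):
        self.readers[self.gen] = self.readers.get(self.gen, 0) + 1
        return self.gen         # token the reader passes back to read_unlock()

    def read_unlock(self, token):
        self.readers[token] -= 1
        if self.readers[token] == 0:
            del self.readers[token]
        self._reclaim()

    def update(self, modify, reclaim):
        old = self.data
        new = dict(old)         # Copy ...
        modify(new)             # ... modify the copy ...
        self.data = new         # ... Update: publish by swapping the pointer.
        self.gen += 1
        # Defer reclamation of the old copy until a grace period has elapsed.
        self.pending.append((self.gen, lambda: reclaim(old)))
        self._reclaim()

    def _reclaim(self):
        # The grace period for an update at generation g ends once no reader
        # that entered before it (generation < g) is still inside.
        oldest = min(self.readers, default=self.gen)
        waiting = []
        for g, cb in self.pending:
            if g <= oldest:
                cb()
            else:
                waiting.append((g, cb))
        self.pending = waiting


freed = []
rcu = RcuModel({"v": 1})
token = rcu.read_lock()
snapshot = rcu.data                               # reader holds the old version
rcu.update(lambda d: d.update(v=2), freed.append)
print(rcu.data, snapshot, freed)                  # {'v': 2} {'v': 1} []
rcu.read_unlock(token)                            # last pre-existing reader leaves
print(freed)                                      # [{'v': 1}]
```

New readers see the updated copy right away. The old copy is handed to the reclaim callback only after the reader that was already inside calls read_unlock().
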

What Causes RCU CPU Stall Warnings?


o	A CPU looping in an RCU read-side critical section.

o	A CPU looping with interrupts disabled.

o	A CPU looping with preemption disabled.  This condition can
	result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
	stalls.

o	A CPU looping with bottom halves disabled.  This condition can
	result in RCU-sched and RCU-bh stalls.

o	For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
	without invoking schedule().  If the looping in the kernel is
	really expected and desirable behavior, you might need to add
	some calls to cond_resched().

o	Booting Linux using a console connection that is too slow to
	keep up with the boot-time console-message rate.  For example,
	a 115Kbaud serial console can be -way- too slow to keep up
	with boot-time message rates, and will frequently result in
	RCU CPU stall warning messages.  Especially if you have added
	debug printk()s.

o	Anything that prevents RCU's grace-period kthreads from running.
	This can result in the "All QSes seen" console-log message.
	This message will include information on when the kthread last
	ran and how often it should be expected to run.  It can also
	result in the "rcu_.*kthread starved for" console-log message,
	which will include additional debugging information.

o	A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
	happen to preempt a low-priority task in the middle of an RCU
	read-side critical section.   This is especially damaging if
	that low-priority task is not permitted to run on any other CPU,
	in which case the next RCU grace period can never complete, which
	will eventually cause the system to run out of memory and hang.
	While the system is in the process of running itself out of
	memory, you might see stall-warning messages.

o	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
	is running at a higher priority than the RCU softirq threads.
	This will prevent RCU callbacks from ever being invoked,
	and in a CONFIG_PREEMPT_RCU kernel will further prevent
	RCU grace periods from ever completing.  Either way, the
	system will eventually run out of memory and hang.  In the
	CONFIG_PREEMPT_RCU case, you might see stall-warning
	messages.

o	A periodic interrupt whose handler takes longer than the time
	interval between successive pairs of interrupts.  This can
	prevent RCU's kthreads and softirq handlers from running.
	Note that certain high-overhead debugging options, for example
	the function_graph tracer, can result in interrupt handlers taking
	considerably longer than normal, which can in turn result in
	RCU CPU stall warnings.

o	Testing a workload on a fast system, tuning the stall-warning
	timeout down to just barely avoid RCU CPU stall warnings, and then
	running the same workload with the same stall-warning timeout on a
	slow system.  Note that thermal throttling and on-demand governors
	can cause a single system to be sometimes fast and sometimes slow!

o	A hardware or software issue shuts off the scheduler-clock
	interrupt on a CPU that is not in dyntick-idle mode.  This
	problem really has happened, and seems to be most likely to
	result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.

o	A bug in the RCU implementation.

o	A hardware failure.  This is quite unlikely, but has occurred
	at least once in real life.  A CPU failed in a running system,
	becoming unresponsive, but not causing an immediate crash.
	This resulted in a series of RCU CPU stall warnings, eventually
	leading to the realization that the CPU had failed.
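  • RCU bugs can often be debugged with CONFIG_RCU_TRACE and RCU event tracing. For information on RCU event tracing, see include/trace/events/rcu.h.
  • CONFIG_RCU_CPU_STALL_TIMEOUT: This kernel configuration parameter defines how long RCU waits from the start of a grace period before issuing an RCU CPU stall warning. This period is normally 21 seconds.
  • RCU_STALL_DELAY_DELTA: The lockdep facility is extremely useful, but it adds some overhead. Under CONFIG_PROVE_RCU, the RCU_STALL_DELAY_DELTA macro therefore allows an extra five seconds before an RCU CPU stall warning message is issued. (This is a cpp macro, not a kernel configuration parameter.)
  • RCU_STALL_RAT_DELAY: The CPU stall detector tries to make the offending CPU print its own warning, because this often gives a better-quality stack trace. However, if the offending CPU does not detect its own stall within the number of jiffies specified by RCU_STALL_RAT_DELAY, then some other CPU will complain. This delay is normally set to two jiffies. (This is a cpp macro, not a kernel configuration parameter.)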
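
The timing parameters above reduce to simple arithmetic. The sketch below estimates two things: the stall deadline in jiffies, and how long a 115200-baud serial console needs to print a given amount of boot output. The second estimate ties back to the slow-console cause listed earlier. It assumes HZ=250 for the example and 8N1 serial framing (10 bits per character). The function names and the 256 KiB log size are hypothetical, and the exact deadline computation inside the kernel may differ between versions.

```python
def stall_timeout_jiffies(hz, timeout_s=21, prove_rcu=False):
    """Jiffies from grace-period start to a stall warning, modeled on the text above.

    Assumes CONFIG_RCU_CPU_STALL_TIMEOUT=timeout_s, plus the extra 5 s that
    RCU_STALL_DELAY_DELTA adds under CONFIG_PROVE_RCU.
    """
    extra_s = 5 if prove_rcu else 0
    return (timeout_s + extra_s) * hz


def console_seconds(log_bytes, baud=115_200, bits_per_char=10):
    """Seconds needed to push log_bytes through a serial console.

    Assumes 8N1 framing: 1 start bit + 8 data bits + 1 stop bit per character.
    """
    return log_bytes * bits_per_char / baud


print(stall_timeout_jiffies(250))                  # 5250
print(stall_timeout_jiffies(250, prove_rcu=True))  # 6500
print(round(console_seconds(256 * 1024), 1))       # 22.8
```

Under these assumptions, printing 256 KiB of boot messages takes longer than the default 21-second timeout. That is one way extra debug printk()s can turn into stall warnings.
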

Analyzing a Stall Warning


INFO: rcu_sched detected stalls on CPUs/tasks:
	2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
	16-...: (0 ticks this GP) idle=81c/0/0 softirq=764/764 fqs=0
	(detected by 32, t=2603 jiffies, g=7075, q=625)
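  • This message indicates that CPU 32 detected that CPUs 2 and 16 were both causing stalls, and that the stall was affecting RCU-sched. This message will normally be followed by a stack dump for each CPU. Note that PREEMPT_RCU builds can be stalled by tasks as well as by CPUs, and tasks will be indicated by PID, for example "P3421". An rcu_preempt_state stall can even be caused by both CPUs and tasks, in which case the offending CPUs and tasks will all be called out in the list.
  • The "(3 GPs behind)" for CPU 2 indicates that this CPU has not interacted with the RCU core for the past three grace periods. In contrast, the "(0 ticks this GP)" for CPU 16 indicates that this CPU has not taken any scheduling-clock interrupts during the current stalled grace period.
  • The "idle=" portion of the message prints the dyntick-idle state. The hex number before the first "/" is the low-order 12 bits of the dynticks counter. This value is even if the CPU is in dyntick-idle mode and odd otherwise. The hex number between the two "/"s is the nesting value. It will be a small non-negative number if the CPU is in the idle loop (as shown above) and a very large positive number otherwise.
  • The "softirq=" portion of the message tracks the number of RCU softirq handlers that the stalled CPU has executed. The number before the "/" is the count since boot at the moment this CPU last noted the beginning of a grace period. That grace period might be the current (stalled) one or an earlier one, for example if the CPU has been in dyntick-idle mode for an extended period. The number after the "/" is the count since boot up to the current time. If this latter number stays constant across repeated stall-warning messages, RCU's softirq handlers may no longer be able to execute on this CPU. This can happen if the stalled CPU is spinning with interrupts disabled, or, in -rt kernels, if a high-priority process is starving RCU's softirq handler.
  • The "fqs=" field shows the number of force-quiescent-state idle/offline detection passes that the grace-period kthread has made across this CPU since this CPU last noted the beginning of a grace period.
  • The "detected by" line gives four values. These are the CPU that detected the stall (CPU 32 in this case), the number of jiffies elapsed since the start of the grace period (2603 in this case), the grace-period sequence number (7075), and an estimate of the total number of RCU callbacks queued across all CPUs (625 in this case).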
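
The field descriptions above can be turned into a small log-triage helper. Below is a minimal Python sketch. It assumes the report layout is exactly as in the sample above; the layout varies between kernel versions, and task entries such as "P3421" are not handled. parse_stall_report and softirq_stuck are hypothetical helper names, not kernel tools. The third "idle=" field is matched but not interpreted, because the text above does not describe it.

```python
import re

# One line per stalled CPU, e.g. "2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0".
CPU_LINE = re.compile(
    r"^\s*(?P<cpu>\d+)-\S*:\s*\((?P<note>[^)]*)\)\s*"
    r"idle=(?P<dynticks>[0-9a-f]+)/(?P<nesting>[0-9a-f]+)/[0-9a-f]+\s+"  # third field ignored
    r"softirq=(?P<sirq_gp>\d+)/(?P<sirq_now>\d+)\s+fqs=(?P<fqs>\d+)"
)
# Trailer, e.g. "(detected by 32, t=2603 jiffies, g=7075, q=625)".
DETECTED = re.compile(
    r"\(detected by (?P<by>\d+), t=(?P<t>\d+) jiffies, g=(?P<g>\d+), q=(?P<q>\d+)\)"
)


def parse_stall_report(text):
    """Return (list of per-CPU dicts, detected-by dict or None) for one stall report."""
    cpus = []
    for line in text.splitlines():
        m = CPU_LINE.match(line)
        if not m:
            continue
        cpus.append({
            "cpu": int(m["cpu"]),
            "note": m["note"],                                # e.g. "3 GPs behind"
            "dyntick_idle": int(m["dynticks"], 16) % 2 == 0,  # even counter => dyntick-idle
            "nesting": int(m["nesting"], 16),
            "softirq_at_gp_start": int(m["sirq_gp"]),
            "softirq_now": int(m["sirq_now"]),
            "fqs": int(m["fqs"]),
        })
    d = DETECTED.search(text)
    detected = {k: int(v) for k, v in d.groupdict().items()} if d else None
    return cpus, detected


def softirq_stuck(earlier, later):
    """CPUs whose softirq total did not advance between two reports of the same stall."""
    before = {c["cpu"]: c["softirq_now"] for c in earlier}
    return sorted(c["cpu"] for c in later if before.get(c["cpu"]) == c["softirq_now"])


SAMPLE = """INFO: rcu_sched detected stalls on CPUs/tasks:
\t2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
\t16-...: (0 ticks this GP) idle=81c/0/0 softirq=764/764 fqs=0
\t(detected by 32, t=2603 jiffies, g=7075, q=625)"""

cpus, detected = parse_stall_report(SAMPLE)
print(detected)   # {'by': 32, 't': 2603, 'g': 7075, 'q': 625}
```

Feeding two successive reports to softirq_stuck() lists the CPUs whose "softirq=" total stayed constant. According to the text above, that is the hint that RCU's softirq handlers may no longer run there.
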
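
In Android, RCU (Read-Copy Update) is a synchronization mechanism for concurrent reads and writes. It is used mainly on multicore systems, where it improves concurrency by letting readers access shared data without mutual exclusion. Android's RCU is the Linux kernel's RCU mechanism, since Android runs on the Linux kernel. It is used mainly to protect read access to shared data, improving performance and reducing lock contention.

The main RCU components and concepts are:

1. `rcu_read_lock()` and `rcu_read_unlock()`: These mark the beginning and end of an RCU read-side critical section. A thread calls `rcu_read_lock()` before reading shared data and `rcu_read_unlock()` after the read completes. This tells RCU that the thread is inside a read-side critical section.
2. `rcu_dereference()`: A macro used to fetch an RCU-protected pointer inside a read-side critical section. It ensures the reader sees the structure the pointer refers to in a fully initialized state. The guarantee that the data is not freed while it is in use comes from the enclosing read-side critical section, not from `rcu_dereference()` itself. Because readers take no lock, this avoids the overhead of explicit locking and unlocking.
3. `synchronize_rcu()`: A blocking function that waits until every RCU read-side critical section that was already in progress when it was called has completed, that is, it waits for a grace period. A writer that has replaced shared data calls `synchronize_rcu()` before freeing the old version, so that no reader can still be using it.
4. `call_rcu()`: A function for deferred freeing of shared data. When a piece of shared data is no longer needed, `call_rcu()` registers a callback. The callback is invoked asynchronously after all pre-existing RCU read-side critical sections have completed, typically to free that data.

With RCU, Android can avoid lock contention in some scenarios, improving concurrency and reducing reliance on explicit locks. However, RCU suits only specific scenarios and data-access patterns, and developers need to judge case by case whether it fits.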
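
The following minimal userspace model in Python makes the blocking semantics of `synchronize_rcu()` concrete. It assumes CPython threads and a single global RCU domain, and it counts readers per generation. The kernel implementation works differently, tracking per-CPU quiescent states. The class BlockingRcuModel and its publish() helper are hypothetical, and the method names only mirror the kernel functions listed above. The demo shows that a writer's `synchronize_rcu()` does not return until a reader that entered earlier calls `rcu_read_unlock()`.

```python
import threading


class BlockingRcuModel:
    """Toy userspace model; method names mirror kernel RCU calls, nothing more."""

    def __init__(self, value):
        self._ptr = value
        self._cond = threading.Condition()
        self._gen = 0          # bumped each time a grace period starts
        self._readers = {}     # generation at entry -> readers still inside

    def rcu_read_lock(self):
        with self._cond:
            gen = self._gen
            self._readers[gen] = self._readers.get(gen, 0) + 1
            return gen         # token to hand back to rcu_read_unlock()

    def rcu_read_unlock(self, gen):
        with self._cond:
            self._readers[gen] -= 1
            if self._readers[gen] == 0:
                del self._readers[gen]
            self._cond.notify_all()

    def rcu_dereference(self):
        return self._ptr       # one attribute read: atomic under CPython's GIL

    def publish(self, new_value):
        """Hypothetical writer helper: swap in the new copy and return the old one.
        Assumes a single writer; multiple writers would need their own lock."""
        old, self._ptr = self._ptr, new_value
        return old

    def synchronize_rcu(self):
        with self._cond:
            self._gen += 1
            target = self._gen
            # Wait only for readers that entered before this grace period began;
            # readers arriving later carry gen >= target and are not waited for.
            self._cond.wait_for(lambda: all(g >= target for g in self._readers))


def demo():
    rcu = BlockingRcuModel({"cfg": 1})
    inside, release, done = threading.Event(), threading.Event(), threading.Event()
    seen = []

    def reader():
        token = rcu.rcu_read_lock()
        seen.append(rcu.rcu_dereference())   # pre-existing reader holds the old copy
        inside.set()
        release.wait()
        rcu.rcu_read_unlock(token)

    def writer():
        rcu.publish({"cfg": 2})
        rcu.synchronize_rcu()                # afterwards, freeing the old copy is safe
        done.set()

    r = threading.Thread(target=reader)
    r.start()
    inside.wait()
    w = threading.Thread(target=writer)
    w.start()
    blocked = not done.wait(0.2)             # writer is stuck behind the old reader
    release.set()
    finished = done.wait(5)
    r.join()
    w.join()
    return blocked, finished, seen, rcu.rcu_dereference()
```

`call_rcu()` differs only in that the writer does not block. It queues the freeing work to run once the same condition holds.
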