kernel/common/Documentation/RCU/stallwarn.txt
RCU(Read-Copy Update)
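- [Read] means that readers can access shared data protected by RCU directly, without acquiring any lock.
- [Copy Update] means that a writer first makes a copy of the data and modifies the copy. When the modification is complete, the writer switches the pointer from the original data to the new copy, then registers a callback with the reclaimer (garbage collector) so that the old data is freed at an appropriate time. The appropriate time is when all pre-existing readers have left their critical sections, and the wait for that moment is called the grace period.
- A writer never competes with readers for a lock. Only when there are multiple writers do they need some lock to synchronize among themselves. If writes are frequent, RCU performance degrades severely, so RCU is suitable only for read-mostly workloads.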
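The read, copy, update, and grace-period steps can be illustrated with a toy, single-threaded model. This is only a sketch of the idea and not how the kernel implements RCU: the class `ToyRCU` and all of its method names are hypothetical, readers are tracked in a plain dictionary instead of per-CPU quiescent states, and "reclaiming" is just a callback invocation.

```python
import copy


class ToyRCU:
    """Toy, single-threaded model of read-copy-update (hypothetical, not kernel code)."""

    def __init__(self, data):
        self._current = data   # the published version that new readers see
        self._readers = {}     # reader id -> version that reader is using
        self._pending = []     # (ids of readers being waited for, callback)
        self._next_id = 0

    def read_lock(self):
        """Enter a read-side critical section; no lock is taken."""
        rid = self._next_id
        self._next_id += 1
        self._readers[rid] = self._current
        return rid

    def dereference(self, rid):
        """Return the version this reader saw when it entered."""
        return self._readers[rid]

    def read_unlock(self, rid):
        """Leave the critical section and run callbacks whose grace period ended."""
        del self._readers[rid]
        pending, self._pending = self._pending, []
        for waiting, callback in pending:
            waiting.discard(rid)
            if waiting:
                self._pending.append((waiting, callback))
            else:
                callback()     # grace period over: reclaim the old version

    def update(self, mutate, reclaim):
        """Copy the current version, modify the copy, publish it, defer reclaim."""
        old = self._current
        new = copy.deepcopy(old)
        mutate(new)
        self._current = new            # later readers see the new version
        waiting = set(self._readers)   # only pre-existing readers matter
        if waiting:
            self._pending.append((waiting, lambda: reclaim(old)))
        else:
            reclaim(old)


rcu = ToyRCU({"value": 1})
reclaimed = []

reader = rcu.read_lock()                           # starts before the update
rcu.update(lambda d: d.update(value=2), reclaimed.append)

old_view = rcu.dereference(reader)["value"]        # pre-existing reader's view
late_reader = rcu.read_lock()
new_view = rcu.dereference(late_reader)["value"]   # later reader's view
reclaimed_during_gp = len(reclaimed)

rcu.read_unlock(reader)                            # last pre-existing reader exits
reclaimed_after_gp = len(reclaimed)
rcu.read_unlock(late_reader)
```

In this model the old version is handed to the reclaim callback only after the reader that existed before the update has left its critical section. The reader that arrived later does not delay it.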
What Causes RCU CPU Stall Warnings?
o A CPU looping in an RCU read-side critical section.
o A CPU looping with interrupts disabled.
o A CPU looping with preemption disabled. This condition can result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh stalls.
o A CPU looping with bottom halves disabled. This condition can result in RCU-sched and RCU-bh stalls.
o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel without invoking schedule(). If the looping in the kernel is really expected and desirable behavior, you might need to add some calls to cond_resched().
o Booting Linux using a console connection that is too slow to keep up with the boot-time console-message rate. For example, a 115Kbaud serial console can be -way- too slow to keep up with boot-time message rates, and will frequently result in RCU CPU stall warning messages, especially if you have added debug printk()s.
o Anything that prevents RCU's grace-period kthreads from running. This can result in the "All QSes seen" console-log message. This message will include information on when the kthread last ran and how often it should be expected to run. It can also result in the "rcu_.*kthread starved for" console-log message, which will include additional debugging information.
o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might happen to preempt a low-priority task in the middle of an RCU read-side critical section. This is especially damaging if that low-priority task is not permitted to run on any other CPU, in which case the next RCU grace period can never complete, which will eventually cause the system to run out of memory and hang. While the system is in the process of running itself out of memory, you might see stall-warning messages.
o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that is running at a higher priority than the RCU softirq threads. This will prevent RCU callbacks from ever being invoked, and in a CONFIG_PREEMPT_RCU kernel will further prevent RCU grace periods from ever completing. Either way, the system will eventually run out of memory and hang. In the CONFIG_PREEMPT_RCU case, you might see stall-warning messages.
o A periodic interrupt whose handler takes longer than the time interval between successive pairs of interrupts. This can prevent RCU's kthreads and softirq handlers from running. Note that certain high-overhead debugging options, for example the function_graph tracer, can result in interrupt handlers taking considerably longer than normal, which can in turn result in RCU CPU stall warnings.
o Testing a workload on a fast system, tuning the stall-warning timeout down to just barely avoid RCU CPU stall warnings, and then running the same workload with the same stall-warning timeout on a slow system. Note that thermal throttling and on-demand governors can cause a single system to be sometimes fast and sometimes slow!
o A hardware or software issue shuts off the scheduler-clock interrupt on a CPU that is not in dyntick-idle mode. This problem really has happened, and seems to be most likely to result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.
o A bug in the RCU implementation.
o A hardware failure. This is quite unlikely, but has occurred at least once in real life. A CPU failed in a running system, becoming unresponsive, but not causing an immediate crash. This resulted in a series of RCU CPU stall warnings, eventually leading to the realization that the CPU had failed.
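- Bugs in RCU can usually be debugged with the help of CONFIG_RCU_TRACE and RCU's event tracing. For information on RCU's event tracing, see include/trace/events/rcu.h.
- CONFIG_RCU_CPU_STALL_TIMEOUT: This kernel configuration parameter defines how long RCU waits from the beginning of a grace period until it issues an RCU CPU stall warning. This period is normally 21 seconds.
- RCU_STALL_DELAY_DELTA: Although the lockdep facility is extremely useful, it does add some overhead. Therefore, under CONFIG_PROVE_RCU, the RCU_STALL_DELAY_DELTA macro allows an extra five seconds before an RCU CPU stall warning message is given. (This is a cpp macro, not a kernel configuration parameter.)
- RCU_STALL_RAT_DELAY: The CPU stall detector tries to make the offending CPU print its own warning, because this usually gives better-quality stack traces. However, if the offending CPU does not detect its own stall within the number of jiffies specified by RCU_STALL_RAT_DELAY, some other CPU will complain. This delay is normally set to two jiffies. (This is a cpp macro, not a kernel configuration parameter.)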
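How the three values combine can be sketched with a little arithmetic. The sketch below only restates the numbers given above. The function name `stall_deadlines` is hypothetical, and the tick rate `hz` is an assumption that depends on the kernel configuration.

```python
# Values restated from the text above (typical values, not guaranteed).
STALL_TIMEOUT_SECONDS = 21    # CONFIG_RCU_CPU_STALL_TIMEOUT
PROVE_RCU_EXTRA_SECONDS = 5   # RCU_STALL_DELAY_DELTA under CONFIG_PROVE_RCU
RCU_STALL_RAT_DELAY = 2       # jiffies


def stall_deadlines(hz, prove_rcu=False):
    """Return (self_report, other_cpu_report) in jiffies after the grace period starts.

    hz is the number of jiffies per second (an assumed, config-dependent value).
    """
    seconds = STALL_TIMEOUT_SECONDS
    if prove_rcu:
        seconds += PROVE_RCU_EXTRA_SECONDS
    self_report = seconds * hz
    # Other CPUs wait a little longer so the offending CPU can report itself.
    return self_report, self_report + RCU_STALL_RAT_DELAY


plain = stall_deadlines(hz=100)
with_lockdep = stall_deadlines(hz=100, prove_rcu=True)
```

With an assumed tick rate of 100 Hz, the offending CPU would be expected to report after 2100 jiffies, or 2600 jiffies with CONFIG_PROVE_RCU, and another CPU two jiffies later.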
Analyzing a Stall Warning
INFO: rcu_sched detected stalls on CPUs/tasks:
2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
16-...: (0 ticks this GP) idle=81c/0/0 softirq=764/764 fqs=0
(detected by 32, t=2603 jiffies, g=7075, q=625)
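- This message indicates that CPU 32 detected that CPUs 2 and 16 were both causing stalls, and that the stall was affecting RCU-sched. It is normally followed by a stack dump for each CPU. Note that PREEMPT_RCU builds can be stalled by tasks as well as by CPUs. Tasks are indicated by PID, for example "P3421". It is even possible for an rcu_preempt_state stall to be caused by both CPUs and tasks, in which case the offending CPUs and tasks are all called out in the list.
- The "(3 GPs behind)" for CPU 2 indicates that this CPU has not interacted with the RCU core for the past three grace periods. In contrast, the "(0 ticks this GP)" for CPU 16 indicates that this CPU has not taken any scheduling-clock interrupts during the current stalled grace period.
- The "idle=" portion of the message prints the dyntick-idle state. The hex number before the first "/" is the low-order 12 bits of the dynticks counter, which has an even value if the CPU is in dyntick-idle mode and an odd value otherwise. The hex number between the two "/"s is the nesting value, which is a small non-negative number if the CPU is in the idle loop (as shown above) and a very large positive number otherwise.
- The "softirq=" portion of the message tracks the number of RCU softirq handlers that the stalled CPU has executed. The number before the "/" is the number that had executed since boot at the time this CPU last noted the beginning of a grace period, which might be the current (stalled) grace period or some earlier one (for example, if the CPU has been in dyntick-idle mode for an extended period). The number after the "/" is the number that have executed since boot until the current time. If this latter number stays constant across repeated stall-warning messages, RCU's softirq handlers may no longer be able to execute on this CPU. This can happen if the stalled CPU is spinning with interrupts disabled or, in -rt kernels, if a high-priority process is starving RCU's softirq handler.
- The "fqs=" shows the number of force-quiescent-state idle/offline detection passes that the grace-period kthread has made across this CPU since the last time this CPU noted the beginning of a grace period.
- The "detected by" line indicates which CPU detected the stall (CPU 32 in this case), how many jiffies have elapsed since the start of the grace period (2603 in this case), the grace-period sequence number (7075), and an estimate of the total number of RCU callbacks queued across all CPUs (625 in this case).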
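The field descriptions above can be turned into a small parser for this message format. The following is a minimal sketch that handles only the exact layout of the sample shown here. Other kernel versions may print different fields, so the regular expressions and the names `parse_cpu_line`, `parse_detect_line`, and `parse_report` are assumptions made for illustration. The third number in "idle=" is not described in the text, so it is matched but not interpreted.

```python
import re

SAMPLE = """\
INFO: rcu_sched detected stalls on CPUs/tasks:
2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
16-...: (0 ticks this GP) idle=81c/0/0 softirq=764/764 fqs=0
(detected by 32, t=2603 jiffies, g=7075, q=625)
"""

# Patterns written against the sample above only (assumed layout).
CPU_LINE = re.compile(
    r"^\s*(?P<cpu>\d+)-\S*:\s+\((?P<note>[^)]*)\)\s+"
    r"idle=(?P<dynticks>[0-9a-f]+)/(?P<nesting>[0-9a-f]+)/\S+\s+"
    r"softirq=(?P<sirq_gp>\d+)/(?P<sirq_now>\d+)\s+fqs=(?P<fqs>\d+)"
)
DETECT_LINE = re.compile(
    r"\(detected by (?P<by>\d+), t=(?P<t>\d+) jiffies, "
    r"g=(?P<g>-?\d+), q=(?P<q>\d+)\)"
)


def parse_cpu_line(line):
    """Decode one per-CPU line, or return None if the line does not match."""
    m = CPU_LINE.match(line)
    if m is None:
        return None
    dynticks = int(m["dynticks"], 16)
    return {
        "cpu": int(m["cpu"]),
        "note": m["note"],
        "dyntick_idle": dynticks % 2 == 0,   # even counter means dyntick-idle
        "nesting": int(m["nesting"], 16),
        "softirq_at_gp_start": int(m["sirq_gp"]),
        "softirq_now": int(m["sirq_now"]),
        "fqs": int(m["fqs"]),
    }


def parse_detect_line(line):
    """Decode the "(detected by ...)" line, or return None."""
    m = DETECT_LINE.search(line)
    if m is None:
        return None
    return {
        "detected_by": int(m["by"]),
        "jiffies": int(m["t"]),
        "gp_seq": int(m["g"]),
        "queued_callbacks": int(m["q"]),
    }


def parse_report(text):
    """Return (list of per-CPU dicts, detect dict or None) for a stall report."""
    cpus, detect = [], None
    for line in text.splitlines():
        cpu = parse_cpu_line(line)
        if cpu is not None:
            cpus.append(cpu)
            continue
        found = parse_detect_line(line)
        if found is not None:
            detect = found
    return cpus, detect


cpus, detect = parse_report(SAMPLE)
```

Comparing `softirq_now` across repeated warnings for the same CPU is one way to apply the "stays constant" check described above. The parser itself only extracts the numbers.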