Documentation_lockup-watchdogs.txt

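If you would like to comment on or update this document, please contact
the maintainer of the original document directly.

If you have difficulty communicating in English, you can also ask the
Chinese version maintainer for help.

If this translation is out of date or contains errors, please contact
the Chinese version maintainer.

Chinese version maintainer: 姚家珺 AriosYao   ks666dejia@163.com

Chinese version translator: 姚家珺 AriosYao   ks666dejia@163.com

Chinese version proofreader: 姚家珺 AriosYao   ks666dejia@163.com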

===============================================================

Softlockup detector and hardlockup detector (aka nmi_watchdog)

===============================================================

The Linux kernel can act as a watchdog to detect both soft and hard
lockups.

A 'softlockup' is defined as a bug that causes the kernel to loop in
kernel mode for more than 20 seconds (see "Implementation" below for
details), without giving other tasks a chance to run. The current
stack trace is displayed upon detection and, by default, the system
will stay locked up. Alternatively, the kernel can be configured to
panic; a sysctl, "kernel.softlockup_panic", a kernel parameter,
"softlockup_panic" (see "Documentation/kernel-parameters.txt" for
details), and a compile option, "BOOTPARAM_SOFTLOCKUP_PANIC", are
provided for this.
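
As an illustration of the sysctl mentioned above, the following sketch
reads the current value of "kernel.softlockup_panic" from procfs. The
helper names (sysctl_path, read_sysctl_int) are hypothetical, and the
mapping from dotted names to paths under /proc/sys assumes names
without dots inside a single component, which holds for the kernel.*
entries discussed here.

```python
from pathlib import Path
from typing import Optional


def sysctl_path(name: str, proc_root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its procfs path (hypothetical helper)."""
    return Path(proc_root, *name.split("."))


def read_sysctl_int(name: str, proc_root: str = "/proc/sys") -> Optional[int]:
    """Return the integer value of a sysctl, or None if absent or unreadable."""
    try:
        return int(sysctl_path(name, proc_root).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


if __name__ == "__main__":
    value = read_sysctl_int("kernel.softlockup_panic")
    if value is None:
        print("kernel.softlockup_panic is not available on this system")
    else:
        print("kernel.softlockup_panic =", value)
```

A non-zero value means the kernel panics when a softlockup is detected;
the entry may be missing if the detector is not built into the running
kernel.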

A 'hardlockup' is defined as a bug that causes the CPU to loop in
kernel mode for more than 10 seconds (see "Implementation" below for
details), without letting other interrupts have a chance to run.
Similarly to the softlockup case, the current stack trace is displayed
upon detection and the system will stay locked up unless the default
behavior is changed, which can be done through a compile time knob,
"BOOTPARAM_HARDLOCKUP_PANIC", and a kernel parameter, "nmi_watchdog"
(see "Documentation/kernel-parameters.txt" for details).

The panic option can be used in combination with panic_timeout (this
timeout is set through the confusingly named "kernel.panic" sysctl),
to cause the system to reboot automatically after a specified amount
of time.
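
The meaning of the "kernel.panic" value can be summarised in a few
lines. The sketch below is only an illustration: the function name
describe_panic_timeout is hypothetical, and the three cases follow the
kernel sysctl documentation for "panic" (zero, positive, negative).

```python
def describe_panic_timeout(value: int) -> str:
    """Describe what a given kernel.panic (panic_timeout) value means."""
    if value == 0:
        # Default: the kernel stays in the panic state and never reboots.
        return "no automatic reboot after a panic"
    if value > 0:
        return f"reboot {value} seconds after a panic"
    # Negative values request a reboot without any delay.
    return "reboot immediately after a panic"


if __name__ == "__main__":
    for timeout in (0, 10, -1):
        print(timeout, "->", describe_panic_timeout(timeout))
```

Combined with a panic-on-lockup setting, a positive value lets an
unattended machine recover from a detected lockup without manual
intervention.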

=== Implementation ===

The soft and hard lockup detectors are built on top of the hrtimer and
perf subsystems, respectively. A direct consequence of this is that,
in principle, they should work in any architecture where these
subsystems are present.

A periodic hrtimer runs to generate interrupts and kick the watchdog
task. An NMI perf event is generated every "watchdog_thresh"
(compile-time initialized to 10 and configurable through sysctl of the
same name) seconds to check for hardlockups. If any CPU in the system
does not receive any hrtimer interrupt during that time the
'hardlockup detector' (the handler for the NMI perf event) will
generate a kernel warning or call panic, depending on the
configuration.
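
The check described above can be modelled as a counter comparison. The
sketch below is a simplified user-space model, not kernel code: the
class and method names are hypothetical, and it assumes the NMI handler
notices a missing hrtimer interrupt by comparing a per-CPU interrupt
counter with the value it saw at the previous NMI.

```python
class HardLockupModel:
    """Toy model of one CPU's hardlockup check (hypothetical names)."""

    def __init__(self) -> None:
        self.hrtimer_interrupts = 0  # incremented by the periodic hrtimer
        self.saved_interrupts = 0    # value seen by the previous NMI check

    def hrtimer_tick(self) -> None:
        """Called on every hrtimer interrupt this CPU manages to service."""
        self.hrtimer_interrupts += 1

    def nmi_check(self) -> bool:
        """Called every watchdog_thresh seconds; True means hardlockup."""
        locked_up = self.hrtimer_interrupts == self.saved_interrupts
        self.saved_interrupts = self.hrtimer_interrupts
        return locked_up


if __name__ == "__main__":
    cpu = HardLockupModel()
    cpu.hrtimer_tick()
    print("after a tick:", cpu.nmi_check())
    print("no tick since last check:", cpu.nmi_check())
```

Because the NMI is delivered even when ordinary interrupts are blocked,
the check can still run on a CPU that no longer services its hrtimer.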

The watchdog task is a high priority kernel thread that updates a
timestamp every time it is scheduled. If that timestamp is not updated
for 2*watchdog_thresh seconds (the softlockup threshold) the
'softlockup detector' (coded inside the hrtimer callback function)
will dump useful debug information to the system log, after which it
will call panic if it was instructed to do so or resume execution of
other kernel code.
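
The timestamp logic above can be sketched in the same style. This is a
simplified model under stated assumptions, not the kernel
implementation: the names are hypothetical, time is passed in
explicitly as seconds, and a lockup is reported once the timestamp is
older than 2*watchdog_thresh.

```python
class SoftLockupModel:
    """Toy model of one CPU's softlockup check (hypothetical names)."""

    def __init__(self, watchdog_thresh: int = 10, now: float = 0.0) -> None:
        self.softlockup_thresh = 2 * watchdog_thresh  # seconds
        self.touch_ts = now  # last time the watchdog task ran

    def watchdog_task_ran(self, now: float) -> None:
        """The high priority watchdog thread was scheduled and ran."""
        self.touch_ts = now

    def hrtimer_callback(self, now: float) -> bool:
        """Return True if the timestamp is stale, i.e. a softlockup."""
        return now - self.touch_ts > self.softlockup_thresh


if __name__ == "__main__":
    cpu = SoftLockupModel(watchdog_thresh=10, now=0.0)
    for t in (4.0, 20.0, 24.0):
        print(f"t={t:>4}: softlockup={cpu.hrtimer_callback(t)}")
```

The hrtimer callback still runs during a softlockup because interrupts
are serviced; only the scheduling of the watchdog thread is starved.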

The period of the hrtimer is 2*watchdog_thresh/5, which means it has
two or three chances to generate an interrupt before the hardlockup
detector kicks in.
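
The "two or three chances" figure follows from the ratio between the
NMI interval and the hrtimer period, which is always 5/2. The sketch
below checks this by brute force in integer milliseconds. The function
names are hypothetical, and the model assumes hrtimer interrupts arrive
exactly one period apart, with the first one landing anywhere within
one period after the previous NMI check.

```python
def hrtimer_period_ms(watchdog_thresh_s: int) -> int:
    """Period of the hrtimer, 2*watchdog_thresh/5, in milliseconds."""
    return 2 * watchdog_thresh_s * 1000 // 5


def ticks_before_nmi_check(watchdog_thresh_s: int, phase_ms: int) -> int:
    """Count hrtimer ticks in one NMI interval.

    phase_ms is the delay from the previous NMI check to the first tick,
    with 0 < phase_ms <= period.
    """
    period = hrtimer_period_ms(watchdog_thresh_s)
    window = watchdog_thresh_s * 1000
    if not 0 < phase_ms <= period:
        raise ValueError("phase_ms must be in (0, period]")
    return (window - phase_ms) // period + 1


def possible_tick_counts(watchdog_thresh_s: int) -> set:
    """All tick counts that can occur, over every 1 ms phase offset."""
    period = hrtimer_period_ms(watchdog_thresh_s)
    return {
        ticks_before_nmi_check(watchdog_thresh_s, phase)
        for phase in range(1, period + 1)
    }


if __name__ == "__main__":
    print("period (ms):", hrtimer_period_ms(10))
    print("possible tick counts:", sorted(possible_tick_counts(10)))
```

With the default watchdog_thresh of 10 the period is 4 seconds, so a
10 second NMI interval contains either two or three hrtimer interrupts
depending on their phase.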

As explained above, a kernel knob is provided that allows
administrators to configure the period of the hrtimer and the perf
event. The right value for a particular environment is a trade-off
between fast response to lockups and detection overhead.
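
To make the trade-off concrete, the sketch below derives the timings
implied by a given watchdog_thresh using only the formulas stated in
this document. The function name and dictionary keys are hypothetical,
and the interrupts-per-minute figure is a rough per-CPU proxy for
overhead, not a measured cost.

```python
def watchdog_timings(watchdog_thresh: int) -> dict:
    """Derive detector timings (seconds) from watchdog_thresh."""
    if watchdog_thresh <= 0:
        raise ValueError("watchdog_thresh must be positive in this model")
    period = 2 * watchdog_thresh / 5
    return {
        "hardlockup_threshold_s": watchdog_thresh,
        "softlockup_threshold_s": 2 * watchdog_thresh,
        "hrtimer_period_s": period,
        "hrtimer_interrupts_per_minute": 60 / period,
    }


if __name__ == "__main__":
    for thresh in (5, 10, 30):
        print(thresh, watchdog_timings(thresh))
```

Halving watchdog_thresh halves the time needed to detect a lockup but
doubles the number of hrtimer interrupts each CPU has to service.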
