nmi_watchdog Functional Testing and Analysis

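This article introduces the function and principle of NMI_Watchdog, covering the detection mechanisms for soft lockups and hard lockups. Tests show how to trigger a softlockup and a hardlockup, the resulting panic logs are analyzed, and the applications and limitations of the mechanism in kernel development and user-space hang detection are discussed.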
  • Created by b178903294, last modified on Sep 23, 2019

 

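Strictly speaking, nmi_watchdog belongs to the category of interrupt-based detection: it is a detection mechanism built on the non-maskable interrupt (NMI), a watchdog that monitors the state of the kernel. For an introduction, see nmi_watchdog.txt: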

[NMI watchdog is available for x86 and x86-64 architectures]

Is your system locking up unpredictably? No keyboard activity, just
a frustrating complete hard lockup? Do you want to help us debugging
such lockups? If all yes then this document is definitely for you.

On many x86/x86-64 type hardware there is a feature that enables
us to generate 'watchdog NMI interrupts'.  (NMI: Non Maskable Interrupt
which get executed even if the system is otherwise locked up hard).
This can be used to debug hard kernel lockups.  By executing periodic
NMI interrupts, the kernel can monitor whether any CPU has locked up,
and print out debugging messages if so.

In order to use the NMI watchdog, you need to have APIC support in your
kernel. For SMP kernels, APIC support gets compiled in automatically. For
UP, enable either CONFIG_X86_UP_APIC (Processor type and features -> Local
APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor type and
features -> IO-APIC support on uniprocessors) in your kernel config.
CONFIG_X86_UP_APIC is for uniprocessor machines without an IO-APIC.
CONFIG_X86_UP_IOAPIC is for uniprocessor with an IO-APIC. [Note: certain
kernel debugging options, such as Kernel Stack Meter or Kernel Tracer,
may implicitly disable the NMI watchdog.]

For x86-64, the needed APIC is always compiled in.

Using local APIC (nmi_watchdog=2) needs the first performance register, so
you can't use it for other purposes (such as high precision performance
profiling.) However, at least oprofile and the perfctr driver disable the
local APIC NMI watchdog automatically.

To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot
parameter.  Eg. the relevant lilo.conf entry:

        append="nmi_watchdog=1"

For SMP machines and UP machines with an IO-APIC use nmi_watchdog=1.
For UP machines without an IO-APIC use nmi_watchdog=2, this only works
for some processor types.  If in doubt, boot with nmi_watchdog=1 and
check the NMI count in /proc/interrupts; if the count is zero then
reboot with nmi_watchdog=2 and check the NMI count.  If it is still
zero then log a problem, you probably have a processor that needs to be
added to the nmi code.
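
The "check the NMI count in /proc/interrupts" step can be scripted. The following is a minimal sketch, assuming the usual layout in which the row labelled `NMI:` carries one counter per CPU followed by a text description; the helper name `nmi_counts` and the `SAMPLE` text are illustrative and not taken from the kernel documentation.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # the trailing description starts here
                counts.append(int(field))
            return counts
    return []  # no NMI row present


# Hypothetical sample used for illustration; on a live system,
# read the text from /proc/interrupts instead.
SAMPLE = """\
           CPU0       CPU1
  0:         36          0   IO-APIC   2-edge      timer
NMI:          0          0   Non-maskable interrupts
LOC:     123456     120001   Local timer interrupts
"""

counts = nmi_counts(SAMPLE)
if sum(counts) == 0:
    print("NMI count is zero: the watchdog is not generating NMIs")
else:
    print("NMI counts per CPU:", counts)
```

A count that stays at zero across two readings taken a few seconds apart is the situation in which the text suggests rebooting with the other nmi_watchdog value.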

A 'lockup' is the following scenario: if any CPU in the system does not
execute the period local timer interrupt for more than 5 seconds, then
the NMI handler generates an oops and kills the process. This
'controlled crash' (and the resulting kernel messages) can be used to
debug the lockup. Thus whenever the lockup happens, wait 5 seconds and
the oops will show up automatically. If the kernel produces no messages
then the system has crashed so hard (eg. hardware-wise) that either it
cannot even accept NMI interrupts, or the crash has made the kernel
unable to print messages.
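
The lockup rule described here can be illustrated with a small user-space simulation. This is only a sketch of the idea, not the kernel's code: the class `CpuWatch`, the parameter `nmi_hz` (NMIs per second) and the method `on_nmi` are hypothetical names, and the only value taken from the text is the 5-second threshold. On every watchdog NMI the handler compares the CPU's local timer interrupt count with the value seen at the previous NMI; if the count has not moved for 5 seconds' worth of NMIs, a lockup is declared.

```python
LOCKUP_SECONDS = 5  # threshold quoted in nmi_watchdog.txt


class CpuWatch:
    """Tracks one CPU's timer-interrupt progress as seen from NMI context."""

    def __init__(self, nmi_hz):
        self.nmi_hz = nmi_hz            # assumed NMI rate, in NMIs per second
        self.last_timer_count = None    # timer count seen at the previous NMI
        self.stalled_ticks = 0          # consecutive NMIs without timer progress

    def on_nmi(self, timer_count):
        """Called once per watchdog NMI; returns True when a lockup is declared."""
        if timer_count != self.last_timer_count:
            # The timer interrupt ran since the last NMI: the CPU is alive.
            self.last_timer_count = timer_count
            self.stalled_ticks = 0
            return False
        self.stalled_ticks += 1
        return self.stalled_ticks >= LOCKUP_SECONDS * self.nmi_hz


watch = CpuWatch(nmi_hz=1)
healthy = [watch.on_nmi(count) for count in (1, 2, 3)]   # timer keeps advancing
stuck = [watch.on_nmi(3) for _ in range(5)]              # timer stops at 3
print(healthy, stuck)
```

The design point is that the check runs from the NMI, which still fires when ordinary interrupts are masked, while the thing being checked (the timer interrupt) is maskable and stops when the CPU is stuck with interrupts disabled.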

Be aware that when using local APIC, the frequency of NMI interrupts
it generates, depends on the system load. The local APIC NMI watchdog,
lacking a better source, uses the "cycles unhalted" event. As you may
guess it doesn't tick when the CPU is in the halted state (which happens
when the system is idle), but if your system locks up on anything but the
"hlt" processor instruction, the watchdog will trigger very soon as the
"cycles unhalted" event will happen every clock tick. If it locks up on
"hlt", then you are out of luck -- the event will not happen at all and the
watchdog won't trigger. This is a shortcoming of the local APIC watchdog
-- unfortunately there is no "clock ticks" event that would work all the
time. The I/O APIC watchdog is driven externally and has no such shortcoming.
But its NMI frequency is much higher, resulting in more significant hit
to the overall system performance.

On x86 nmi_watchdog is disabled by default so you have to enable it with
a boot time parameter.

It's possible to disable the NMI watchdog in run-time by writing "0" to
/proc/sys/kernel/nmi_watchdog. Writing "1" to the same file will re-enable
the NMI watchdog. Notice that you still need to use <
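
The run-time switch through /proc/sys/kernel/nmi_watchdog can be sketched as below. The helper names `set_nmi_watchdog` and `get_nmi_watchdog` are hypothetical; the only facts taken from the text are the file path and the "0"/"1" values. Writing to the real file requires root and a kernel that exposes it, so the demonstration at the end runs against a temporary stand-in file.

```python
import os
import tempfile

NMI_WATCHDOG_PATH = "/proc/sys/kernel/nmi_watchdog"


def set_nmi_watchdog(enabled, path=NMI_WATCHDOG_PATH):
    """Write "1" to enable or "0" to disable the NMI watchdog."""
    with open(path, "w") as f:
        f.write("1" if enabled else "0")


def get_nmi_watchdog(path=NMI_WATCHDOG_PATH):
    """Return True if the file currently reads "1"."""
    with open(path) as f:
        return f.read().strip() == "1"


# Dry run against a temporary file standing in for the procfs entry.
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "nmi_watchdog")
    set_nmi_watchdog(False, path=fake)
    print("enabled after writing 0:", get_nmi_watchdog(path=fake))
    set_nmi_watchdog(True, path=fake)
    print("enabled after writing 1:", get_nmi_watchdog(path=fake))
```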
