AMD-Vi completion-wait loop timed out

Preface

The kernel prints a flood of "AMD-Vi: Completion-Wait loop timed out" messages, accompanied by a soft lockup or an RCU CPU stall, as shown below:

Dec  8 10:02:17  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:17  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:17  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:17  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:18  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:18  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:18  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:18  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:18  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:19  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:19  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:19  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:19  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:19  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:20  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:20  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:20  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:20  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:20  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:21  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:21  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:21  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:21  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:21  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:22  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:22  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:22  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:22  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:22  kernel: AMD-Vi: Completion-Wait loop timed out
Dec  8 10:02:22  kernel: watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [swapper/6:0]
Dec  8 10:02:22  kernel: CPU: 46 PID: 0 Comm: swapper/46 Tainted: G             L    5.10.128 2
Dec  8 10:02:22  kernel: RIP: 0010:_raw_spin_unlock_irqrestore+0x15/0x20
Dec  8 10:02:22  kernel: Call Trace:
Dec  8 10:02:22  kernel: <IRQ>
Dec  8 10:02:22  kernel: amd_iommu_flush_iotlb_all+0x4e/0x60
Dec  8 10:02:22  kernel: iommu_dma_flush_iotlb_all+0x1d/0x20
Dec  8 10:02:22  kernel: iova_domain_flush+0x1e/0x30
Dec  8 10:02:22  kernel: fq_flush_timeout+0x39/0xb0
Dec  8 10:02:22  kernel: ? fq_ring_free+0x110/0x110
Dec  8 10:02:22  kernel: call_timer_fn+0x2e/0x100
Dec  8 10:02:22  kernel: __run_timers.part.0+0x1de/0x260
Dec  8 10:02:22  kernel: ? clockevents_program_event+0x8f/0xe0
Dec  8 10:02:22  kernel: ? tick_program_event+0x41/0x80
Dec  8 10:02:22  kernel: run_timer_softirq+0x2a/0x50
Dec  8 10:02:22  kernel: __do_softirq+0xce/0x281
Dec  8 10:02:22  kernel: asm_call_irq_on_stack+0x12/0x20
Dec  8 10:02:22  kernel: </IRQ>
Dec  8 10:02:22  kernel: do_softirq_own_stack+0x3d/0x50
Dec  8 10:02:22  kernel: irq_exit_rcu+0xc5/0x100
Dec  8 10:02:22  kernel: sysvec_apic_timer_interrupt+0x3d/0x90
Dec  8 10:02:22  kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Dec  8 10:02:22  kernel: RIP: 0010:native_safe_halt+0xe/0x10

Diligent readers will quickly find the following link on Google:

https://support.lenovo.com/us/en/solutions/tt1512-thinksystem-server-with-amd-processor-running-linux-may-hang-or-crash-with-kernel-message-amd-vi-completion-wait-loop-timed-out

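However, it does not explain why the machine hits a soft lockup, nor why the soft lockup keeps occurring on the same CPU.

Source of the timed-out log

COMPLETION_WAIT is a command in the AMD IOMMU architecture. See section 2.4.1, COMPLETION_WAIT, of the spec: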

The COMPLETION_WAIT command allows software to serialize itself with IOMMU command processing. The COMPLETION_WAIT command does not finish until all older commands issued since a prior COMPLETION_WAIT have completely executed.
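The serialization described in the spec excerpt can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions: the command buffer is treated as a strict FIFO executed one command at a time, and every name in it (struct cmd, struct cmd_queue, queue_cmd, iommu_step, CMD_INVALIDATE, CMD_COMPLETION_WAIT) is hypothetical and does not come from the kernel or the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command types for this model only. */
enum cmd_type { CMD_INVALIDATE, CMD_COMPLETION_WAIT };

struct cmd {
    enum cmd_type type;
    uint64_t *store_addr;  /* only used by CMD_COMPLETION_WAIT */
    uint64_t store_data;   /* value written when the wait command finishes */
};

#define QUEUE_LEN 8

struct cmd_queue {
    struct cmd ring[QUEUE_LEN];
    size_t head;  /* next command the model IOMMU will execute */
    size_t tail;  /* next free slot for software */
};

/* Software side: append a command; returns -1 if the ring is full. */
static int queue_cmd(struct cmd_queue *q, struct cmd c)
{
    if (q->tail - q->head == QUEUE_LEN)
        return -1;
    q->ring[q->tail % QUEUE_LEN] = c;
    q->tail++;
    return 0;
}

/*
 * "Hardware" side: execute exactly one queued command, oldest first.
 * Returns 0 if the queue was empty, 1 if a command was executed.
 * A COMPLETION_WAIT stores its data only when it is reached, that is,
 * after every older command in the queue has been executed.
 */
static int iommu_step(struct cmd_queue *q)
{
    struct cmd *c;

    if (q->head == q->tail)
        return 0;

    c = &q->ring[q->head % QUEUE_LEN];
    if (c->type == CMD_COMPLETION_WAIT) {
        assert(c->store_addr != NULL);
        *c->store_addr = c->store_data;
    }
    q->head++;
    return 1;
}
```

If software queues two invalidations followed by one COMPLETION_WAIT in this model, the semaphore keeps its old value after the first two calls to iommu_step() and changes only on the third. That is the property software uses to learn that the older commands are done.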

The command description states how its completion is signaled:

When the command completes, the IOMMU writes cmd.store_data to cmd.store_addr. See the code:

5.10.128

iommu_completion_wait()
---
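    /* Pick a fresh value for the IOMMU to store once the command completes. */
    data = ++iommu->cmd_sem_val;
    build_completion_wait(&cmd, iommu, data);

    /* Queue the COMPLETION_WAIT command. */
    ret = __iommu_queue_command_sync(iommu, &cmd, false);
    if (ret)
        goto out_unlock;

    /* Poll the semaphore until the IOMMU stores data or the wait gives up. */
    ret = wait_on_sem(iommu, data);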
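To make the waiting step concrete, here is a user-space sketch of a bounded polling loop in the spirit of wait_on_sem(). It is a model, not the kernel source: struct fake_iommu, fake_hw_tick(), wait_on_sem_model() and completion_wait_model() are hypothetical names, the LOOP_TIMEOUT value is an assumed placeholder, and the "hardware" is simulated by a countdown rather than a real device. In the kernel each iteration would be a short delay instead of a simulated tick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define LOOP_TIMEOUT 100000  /* assumed iteration budget for this model */

struct fake_iommu {
    volatile uint64_t cmd_sem;  /* memory the "hardware" writes on completion */
    uint64_t cmd_sem_val;       /* last value handed out by software */
    uint64_t pending_data;      /* store_data of the in-flight COMPLETION_WAIT */
    int polls_until_done;       /* negative: the hardware never responds */
};

/* Simulated hardware progress, called once per poll iteration. */
static void fake_hw_tick(struct fake_iommu *iommu)
{
    if (iommu->polls_until_done < 0)
        return;  /* hung IOMMU: the semaphore is never written */
    if (iommu->polls_until_done == 0)
        iommu->cmd_sem = iommu->pending_data;
    else
        iommu->polls_until_done--;
}

/* Poll until the semaphore holds data, or give up after LOOP_TIMEOUT polls. */
static int wait_on_sem_model(struct fake_iommu *iommu, uint64_t data)
{
    int i = 0;

    while (iommu->cmd_sem != data && i < LOOP_TIMEOUT) {
        fake_hw_tick(iommu);  /* stands in for a short busy-wait delay */
        i++;
    }

    if (i == LOOP_TIMEOUT) {
        fprintf(stderr, "Completion-Wait loop timed out\n");
        return -1;
    }
    return 0;
}

/* Mirrors the excerpt: take a fresh value, "queue" the command, then wait. */
static int completion_wait_model(struct fake_iommu *iommu, int polls_until_done)
{
    uint64_t data;

    assert(iommu != NULL);
    data = ++iommu->cmd_sem_val;
    iommu->pending_data = data;
    iommu->polls_until_done = polls_until_done;
    return wait_on_sem_model(iommu, data);
}
```

Calling completion_wait_model(&iommu, 3) on a zero-initialized struct returns 0 after a few polls, while completion_wait_model(&iommu, -1) burns the whole iteration budget and returns -1 after printing the timeout message. In this model the caller spins for the entire budget whenever the device does not answer.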