serial8250: too much work for irq4

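I. Symptoms

Recently, while on call, I ran into a problem: an ECS instance lost connectivity for a while and had most likely hung.

The log was flooded with "serial8250: too much work for irq4" errors, along with RCU stall reports (rcu_stall).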

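II. Analysis

The message "serial8250: too much work for irq4" is printed by serial8250_interrupt() in drivers/tty/serial/8250/8250_core.c.

It fires when a single invocation of the handler has walked the list of UART ports sharing the IRQ more than PASS_LIMIT times because the ports kept reporting work. In other words, the serial port was generating an excessive amount of interrupt work: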

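static irqreturn_t serial8250_interrupt(int irq, void *dev_id)
{
        /* ... declarations, spin_lock(&i->lock), l = i->head ... */
        do {
                /* ... call port->handle_irq() for the port at l ... */

                l = l->next;
                /* PASS_LIMIT is 512 */
                if (l == i->head && pass_counter++ > PASS_LIMIT) {
                        /* If we hit this, we're dead. */
                        printk_ratelimited(KERN_ERR
                                "serial8250: too much work for irq%d\n", irq);
                        break;
                }
        } while (l != end);
        /* ... spin_unlock(&i->lock), return IRQ_RETVAL(handled) ... */
}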
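To make the exit condition concrete, the following user-space sketch models the control flow of the loop above. It is not kernel code: `fake_port`, `fake_handle_irq()` and `simulate_interrupt()` are hypothetical stand-ins for `uart_8250_port` and `port->handle_irq()`, and an array index replaces the kernel's linked list. A port that always reports work keeps resetting `end`, so the loop only stops once the pass counter exceeds PASS_LIMIT, which is exactly where the printk fires.

```c
#include <stdbool.h>

#define PASS_LIMIT 512 /* same value as in 8250_core.c */

/* Hypothetical stand-in for a UART: 'pending' is how many more times
 * fake_handle_irq() will report that it did work. A negative value
 * means "always busy", e.g. a port being flooded with output. */
struct fake_port {
    int pending;
};

/* Returns true if the port had work, like handle_irq() returning 1. */
static bool fake_handle_irq(struct fake_port *p)
{
    if (p->pending == 0)
        return false;
    if (p->pending > 0)
        p->pending--;
    return true;
}

/* Mirrors the loop shape of serial8250_interrupt(): visit the ports that
 * share the IRQ round-robin until one full pass finds no work, or give up
 * after more than PASS_LIMIT passes. -1 plays the role of end == NULL.
 * Returns true if the loop gave up; *passes receives pass_counter. */
static bool simulate_interrupt(struct fake_port *ports, int nports, int *passes)
{
    int head = 0, l = head, end = -1, pass_counter = 0;
    bool gave_up = false;

    do {
        if (fake_handle_irq(&ports[l]))
            end = -1;          /* work done: need another full pass */
        else if (end == -1)
            end = l;           /* first idle port since the last work */

        l = (l + 1) % nports;  /* l = l->next */

        if (l == head && pass_counter++ > PASS_LIMIT) {
            gave_up = true;    /* the old printk was emitted here */
            break;
        }
    } while (l != end);

    *passes = pass_counter;
    return gave_up;
}
```

With one always-busy port, `simulate_interrupt()` gives up with `passes == PASS_LIMIT + 2` (514) because of the post-increment in the check. Ports with a bounded amount of work let the loop exit on its own once a full pass finds nothing to do.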

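Looking at the corresponding interrupt statistics:

IRQ 4 had a very large interrupt count, accumulated within a few hours of boot.

The interrupt count for ttyS0 was 352857.

Moreover, these interrupts were generated within a very short period. ttyS0 is the console that logs are printed to.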
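To repeat this check on a running system, the per-CPU counters for one IRQ can be summed from /proc/interrupts. The sketch below assumes the usual layout of that file (an "N:" label, one count per CPU, then the chip and device names); the function names are hypothetical, and lines longer than the 4096-byte buffer are not handled.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counters on one /proc/interrupts line, e.g.
 * "  4:  <cpu0> <cpu1> ...  IO-APIC  4-edge  ttyS0" (layout assumed).
 * Copies the label before ':' into irq. Returns -1 if there is no label. */
static long long irq_line_total(const char *line, char *irq, size_t irqlen)
{
    const char *p = line;
    while (isspace((unsigned char)*p))
        p++;
    const char *colon = strchr(p, ':');
    if (!colon || (size_t)(colon - p) >= irqlen)
        return -1;             /* header line ("CPU0 CPU1 ...") or too long */
    memcpy(irq, p, (size_t)(colon - p));
    irq[colon - p] = '\0';

    long long total = 0;
    p = colon + 1;
    for (;;) {
        char *endp;
        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break;             /* reached the chip/device name columns */
        total += strtoll(p, &endp, 10);
        p = endp;
    }
    return total;
}

/* Total count for the IRQ labelled `want` (e.g. "4"), or -1 if not found. */
static long long read_irq_total(const char *path, const char *want)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char line[4096], irq[32];
    long long result = -1;
    while (fgets(line, sizeof line, f)) {
        long long t = irq_line_total(line, irq, sizeof irq);
        if (t >= 0 && strcmp(irq, want) == 0) {
            result = t;
            break;
        }
    }
    fclose(f);
    return result;
}
```

Calling `read_irq_total("/proc/interrupts", "4")` twice, a second apart, and subtracting the results gives the interrupt rate, which shows whether the count really grows in a short burst.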

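At this point, the suspicion was that the volume of console output was too large, driving up the interrupt count, and the RCU stall detector flagged the anomaly. The system had not hung yet at that stage. Later a jbd2 hang was triggered, so the disk could no longer read data normally, and that is when the machine became unresponsive.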

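III. Conclusion

We asked the SRE team what they had been doing at that time. They reported that they had connected to the VM with virsh console, transferred a file with sz, and seen a large amount of binary output on the console. This kind of output triggers a huge number of interrupts, so CPU0 was kept busy servicing them. The downloaded file could not be written, the disk got stuck, and the system hung.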

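The following upstream patch removes the "too much work for irq4" message. Note that it only drops the printk: the handler still breaks out of the loop after PASS_LIMIT passes.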

commit 9d7c249a1ef9bf0d5696df14e6bc067004f16979
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Thu Aug 2 13:14:32 2018 +0200

    serial: 8250: drop the printk from serial8250_interrupt()

    If the UART has (legitimate) work to do and we break out of the loop,
    nothing changes: the interrupt is most likely already pending in the
    interrupt controller and we end up in the handler anyway. This printk is
    hardly helping.

    Older kernels also had a comment saying that a bad configuration might
    lead to this but I don't see how that should happen because a wrongly
    configured interrupt number would let the handler leave "early" with
    IRQ_NONE and the spurious detected will handle that (weill since 2.6.11,
    before that we had no spurious detector). In that case, we would never
    loop that often here.

    This loop looks like an optimisation in order to pull the bytes from the
    FIFO which were received while we were already here instead of waiting
    for the interrupt. This might have been a good idea while the CPUs were
    slow and FIFOs small.
    There are other serial driver in tree, like the amba-pl*, which also
    have this kind of a loop but without the printk (and were based on this
    driver).

    Remove the printk which might trigger in otherwise valid situtations.

    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index 8fe3d0ed229e..94f3e1c64490 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -130,12 +130,8 @@ static irqreturn_t serial8250_interrupt(int irq, void *dev_id)

                l = l->next;

-               if (l == i->head && pass_counter++ > PASS_LIMIT) {
-                       /* If we hit this, we're dead. */
-                       printk_ratelimited(KERN_ERR
-                               "serial8250: too much work for irq%d\n", irq);
+               if (l == i->head && pass_counter++ > PASS_LIMIT)
                        break;
-               }
        } while (l != end);

        spin_unlock(&i->lock);