Leap Second Scare

Date: Fri, 2 Jan 2009 18:21:14 -0600
From: Chris Adams

Below follows a summary of the reported crashes. I’m ignoring the
zillions of “mine didn’t crash” reports, or the “you’re a paranoid
conspiracy theorist, it’s random chance” reports.

I have reproduced this and got a stack trace (this is with Fedora 8 and
kernel kernel-2.6.26.6-49.fc8.x86_64):

#0  ktime_get_ts (ts=0xffffffff8158bb30) at include/asm/processor.h:691
#1  0xffffffff8104c09a in ktime_get () at kernel/hrtimer.c:59
#2  0xffffffff8102a39a in hrtick_start_fair (rq=0xffff810009013880,
    p=<value optimized out>) at kernel/sched.c:1064
#3  0xffffffff8102decc in enqueue_task_fair (rq=0xffff810009013880,
    p=0xffff81003fb02d40, wakeup=1) at kernel/sched_fair.c:863
#4  0xffffffff81029a08 in enqueue_task (rq=0xffffffff8158bb30,
    p=0xffff81003b8ac418, wakeup=-994836480) at kernel/sched.c:1550
#5  0xffffffff81029a39 in activate_task (rq=0xffff810009013880,
    p=0xffff81003b8ac418, wakeup=20045) at kernel/sched.c:1614
#6  0xffffffff8102be38 in try_to_wake_up (p=0xffff81003fb02d40,
    state=<value optimized out>, sync=0) at kernel/sched.c:2173
#7  0xffffffff8102be9c in default_wake_function (curr=<value optimized out>,
    mode=998949912, sync=20045, key=0x4c4b40000) at kernel/sched.c:4366
#8  0xffffffff810492ed in autoremove_wake_function (wait=0xffffffff8158bb30,
    mode=998949912, sync=20045, key=0x4c4b40000) at kernel/wait.c:132
#9  0xffffffff810296a2 in __wake_up_common (q=0xffffffff813d3180, mode=1,
    nr_exclusive=1, sync=0, key=0x0) at kernel/sched.c:4387
#10 0xffffffff8102b97b in __wake_up (q=0xffffffff813d3180, mode=1,
    nr_exclusive=1, key=0x0) at kernel/sched.c:4406
#11 0xffffffff8103692f in wake_up_klogd () at kernel/printk.c:1005
#12 0xffffffff81036abb in release_console_sem () at kernel/printk.c:1051
#13 0xffffffff81036fd1 in vprintk (fmt=<value optimized out>,
    args=<value optimized out>) at kernel/printk.c:789
#14 0xffffffff81037081 in printk (
    fmt=0xffffffff8158bb30 "yj$\201????\2008\001\t") at kernel/printk.c:613
#15 0xffffffff8104ec16 in ntp_leap_second (timer=<value optimized out>)
    at kernel/time/ntp.c:143
#16 0xffffffff8104b7a6 in run_hrtimer_pending (cpu_base=0xffff81000900f740)
    at kernel/hrtimer.c:1204
#17 0xffffffff8104b86a in run_hrtimer_softirq (h=<value optimized out>)
    at kernel/hrtimer.c:1355
#18 0xffffffff8103b31f in __do_softirq () at kernel/softirq.c:234
#19 0xffffffff8100d52c in call_softirq () at include/asm/current_64.h:10
#20 0xffffffff8100ed5e in do_softirq () at arch/x86/kernel/irq_64.c:262
#21 0xffffffff8103b280 in irq_exit () at kernel/softirq.c:310
#22 0xffffffff8101b0fe in smp_apic_timer_interrupt (regs=<value optimized out>)
    at arch/x86/kernel/apic_64.c:514
#23 0xffffffff8100cf52 in apic_timer_interrupt ()
    at include/asm/current_64.h:10
#24 0xffff81003b9d5a90 in ?? ()
#25 0x0000000000000000 in ?? ()

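Basically, as I see it, the leap second problem starts in the timer interrupt path, which holds xtime_lock while it handles the leap second.
While holding the lock it calls printk, which has to wake klogd; the wakeup goes through the scheduler, which tries to read the system time and therefore tries to take xtime_lock again. The lock is already held on the same CPU, so the reader spins forever and the machine deadlocks.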
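
The re-entrant path described above can be illustrated with a small user-space sketch. This is not kernel code: `toy_seqlock`, `toy_read_time`, `toy_printk` and `toy_leap_second_buggy` are hypothetical names invented for this illustration, the lock is a bare sequence counter rather than the kernel's seqlock, and the reader gives up after a bounded number of retries so that the program terminates instead of hanging the way the real machine did.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a sequence lock: seq is odd while a writer is active. */
struct toy_seqlock {
    unsigned seq;
};

static struct toy_seqlock toy_xtime_lock;   /* zero-initialised: unlocked */
static long toy_xtime = 1000;               /* arbitrary "current second" */

/*
 * Reader side: retry for as long as a writer is active. A real reader
 * would spin without limit; max_spins exists only so the demo returns.
 */
static bool toy_read_time(long *out, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        unsigned start = toy_xtime_lock.seq;
        if (start & 1)
            continue;                       /* writer active: try again */
        *out = toy_xtime;
        if (toy_xtime_lock.seq == start)
            return true;                    /* consistent snapshot */
    }
    return false;                           /* would have spun forever */
}

/* Stand-in for the logging path, which ends up needing the time. */
static bool toy_printk(const char *msg, unsigned max_spins)
{
    long now;

    if (!toy_read_time(&now, max_spins))
        return false;
    printf("[%ld] %s\n", now, msg);
    return true;
}

/* Stand-in for the leap second handler: it logs while still write-locked. */
bool toy_leap_second_buggy(void)
{
    bool logged;

    assert((toy_xtime_lock.seq & 1) == 0);  /* must start unlocked */
    toy_xtime_lock.seq++;                   /* write lock: seq becomes odd */
    toy_xtime--;                            /* repeat one second */
    logged = toy_printk("inserting leap second", 1000);
    toy_xtime_lock.seq++;                   /* write unlock: seq even again */
    return logged;
}
```

Calling `toy_leap_second_buggy()` returns `false`: the reader never sees an even sequence number because the only thing that could release the lock is the caller that is waiting for the reader to finish.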

I can only reproduce this if the system is busy. If the system is
otherwise idle at the timer interrupt, I guess the scheduler doesn’t try
to get the time. I can run a “find / | xargs cat > /dev/null” in one
window and then trigger the leap second in another, and the system dies
most of the time.

I’m looking at the source for the RHEL 4 kernel 2.6.9-67.0.7.EL (which I
had crash on a system), and the scheduler is different enough that I am
not finding the path to the deadlock right off.

In any case, the quick-n-dirty fix would be to not try to printk while
holding xtime_lock (I think the NTP code is the only thing that does).
However, it would be nice to still get the leap second notification, so
some other fix would be better I guess.
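
One way to keep the notification while avoiding the lock is to only record, under the lock, that something needs reporting and to emit the message after the lock is released. The sketch below shows that pattern in user space. It is an illustration under stated assumptions, not the fix that went into the kernel: `demo_clock`, `demo_read_time`, `demo_log` and `demo_leap_second_deferred` are hypothetical names, and the lock is again a bare sequence counter with a bounded reader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy clock guarded by a sequence counter (odd means write-locked). */
struct demo_clock {
    unsigned seq;
    long     sec;
};

/* Bounded reader: fails instead of spinning forever on a held lock. */
static bool demo_read_time(const struct demo_clock *c, long *out,
                           unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        unsigned start = c->seq;
        if (start & 1)
            continue;                   /* writer active: try again */
        *out = c->sec;
        if (c->seq == start)
            return true;
    }
    return false;
}

/* Logging path that needs a timestamp, like the wakeup path in the trace. */
static bool demo_log(const struct demo_clock *c, const char *msg)
{
    long now;

    if (!demo_read_time(c, &now, 1000))
        return false;
    printf("[%ld] %s\n", now, msg);
    return true;
}

/* Adjust the clock under the lock, report only after unlocking. */
bool demo_leap_second_deferred(struct demo_clock *c)
{
    bool need_report;

    assert((c->seq & 1) == 0);          /* must start unlocked */
    c->seq++;                           /* write lock */
    c->sec--;                           /* repeat one second */
    need_report = true;                 /* remember it, do not log here */
    c->seq++;                           /* write unlock */

    if (need_report)
        return demo_log(c, "leap second inserted");
    return true;
}
```

With a clock initialised as `{ 0, 100 }`, `demo_leap_second_deferred()` returns `true` and leaves `sec` at 99: the message is still produced, but the reader now runs with the lock released.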

Chris Adams cmadams@hiwaay.net
Systems and Network Administrator - HiWAAY Internet Services
I don’t speak for anybody but myself - that’s enough trouble.
