闰秒惊魂

最新推荐文章于 2021-06-07 13:23:31 发布

flying_panda

最新推荐文章于 2021-06-07 13:23:31 发布

阅读量374

点赞数

分类专栏：翻译 java编程源码分析

java编程同时被 3 个专栏收录

5 篇文章 0 订阅

订阅专栏

源码分析

5 篇文章 0 订阅

订阅专栏

翻译

1 篇文章 0 订阅

订阅专栏

Date Fri, 2 Jan 2009 18:21:14 -0600
From Chris Adams <>

Below follows a summary of the reported crashes. I’m ignoring the
zillions of “mine didn’t crash” reports, or the “you’re a paranoid
conspiracy theorist, its random chance” reports.

I have reproduced this and got a stack trace (this is with Fedora 8 and
kernel kernel-2.6.26.6-49.fc8.x86_64):

0 ktime_get_ts (ts=0xffffffff8158bb30) at include/asm/processor.h:691

1 0xffffffff8104c09a in ktime_get () at kernel/hrtimer.c:59

2 0xffffffff8102a39a in hrtick_start_fair (rq=0xffff810009013880,

p=<value optimized out>) at kernel/sched.c:1064

3 0xffffffff8102decc in enqueue_task_fair (rq=0xffff810009013880,

p=0xffff81003fb02d40, wakeup=1) at kernel/sched_fair.c:863

4 0xffffffff81029a08 in enqueue_task (rq=0xffffffff8158bb30,

p=0xffff81003b8ac418, wakeup=-994836480) at kernel/sched.c:1550

5 0xffffffff81029a39 in activate_task (rq=0xffff810009013880,

p=0xffff81003b8ac418, wakeup=20045) at kernel/sched.c:1614

6 0xffffffff8102be38 in try_to_wake_up (p=0xffff81003fb02d40,

state=<value optimized out>, sync=0) at kernel/sched.c:2173

7 0xffffffff8102be9c in default_wake_function (curr=,

mode=998949912, sync=20045, key=0x4c4b40000) at kernel/sched.c:4366

8 0xffffffff810492ed in autoremove_wake_function (wait=0xffffffff8158bb30,

mode=998949912, sync=20045, key=0x4c4b40000) at kernel/wait.c:132

9 0xffffffff810296a2 in __wake_up_common (q=0xffffffff813d3180, mode=1,

nr_exclusive=1, sync=0, key=0x0) at kernel/sched.c:4387

10 0xffffffff8102b97b in __wake_up (q=0xffffffff813d3180, mode=1,

nr_exclusive=1, key=0x0) at kernel/sched.c:4406

11 0xffffffff8103692f in wake_up_klogd () at kernel/printk.c:1005

12 0xffffffff81036abb in release_console_sem () at kernel/printk.c:1051

13 0xffffffff81036fd1 in vprintk (fmt=,

args=<value optimized out>) at kernel/printk.c:789

14 0xffffffff81037081 in printk (

fmt=0xffffffff8158bb30 "yj$\201????\2008\001\t") at kernel/printk.c:613

15 0xffffffff8104ec16 in ntp_leap_second (timer=)

at kernel/time/ntp.c:143

16 0xffffffff8104b7a6 in run_hrtimer_pending (cpu_base=0xffff81000900f740)

at kernel/hrtimer.c:1204

17 0xffffffff8104b86a in run_hrtimer_softirq (h=)

at kernel/hrtimer.c:1355

18 0xffffffff8103b31f in __do_softirq () at kernel/softirq.c:234

19 0xffffffff8100d52c in call_softirq () at include/asm/current_64.h:10

20 0xffffffff8100ed5e in do_softirq () at arch/x86/kernel/irq_64.c:262

21 0xffffffff8103b280 in irq_exit () at kernel/softirq.c:310

22 0xffffffff8101b0fe in smp_apic_timer_interrupt (regs=)

at arch/x86/kernel/apic_64.c:514

23 0xffffffff8100cf52 in apic_timer_interrupt ()

at include/asm/current_64.h:10

24 0xffff81003b9d5a90 in ?? ()

25 0x0000000000000000 in ?? ()

基本上，从我的角度看，闰秒问题是timer中断导致的，timer持有xtime_lock.
在获取时间的时候，要去通知klogd这件事，然后它又去试图获取系统时间，获取的时候又要去获取xtimer_lock，导致死循环

I can only reproduce this if the system is busy. If the system is
otherwise idle at the timer interrupt, I guess the scheduler doesn’t try
to get the time. I can run a “find / | xargs cat > /dev/nul” in one
window and then trigger the leap second in another, and the system dies
most of the time.
I’m looking at the source for the RHEL 4 kernel 2.6.9-67.0.7.EL (which I
had crash on a system), and the scheduler is enough different that I am
not finding the path to the deadlock right off.

In any case, the quick-n-dirty fix would be to not try to printk while
holding xtime_lock (I think the NTP code is the only thing that does).
However, it would be nice to still get the leap second notification, so

some other fix would be better I guess.

Chris Adams cmadams@hiwaay.net
Systems and Network Administrator - HiWAAY Internet Services
I don’t speak for anybody but myself - that’s enough trouble.

flying_panda

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
闰秒惊魂

Date Fri, 2 Jan 2009 18:21:14 -0600 From Chris Adams <> Below follows a summary of the reported crashes. I’m ignoring the zillions of “mine didn’t crash” reports, or the “you’re a paranoid
复制链接

扫一扫

专栏目录