How to fix hung_task_timeout_secs and blocked for more than 120 seconds problem

Author:Skate
Time:2015/03/04

 


Symptom: the system hangs. It still answers ping, but ssh does not respond.

Check the message log:
[1379100.801689] [<ffffffff81536f95>] page_fault+0x25/0x30
[1379100.801693] INFO: task java:710923 blocked for more than 120 seconds.
[1379100.801766] Not tainted 2.6.32-042stab104.1 #1
[1379100.801835] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1379100.801963] java D ffff8800372d7200 0 710923 709954 67084186 0x00000000
[1379100.801968] ffff880e57e71cf0 0000000000000082 ffffea00021a8fc0 ffff880e57e71c68
[1379100.801972] ffffffff81155c60 ffff8800372d7200 ffffea00021a8fc0 ffff88100c409638
[1379100.801976] 00000007fa23bffc ffff880e57e71c78 ffffffff81155cd1 ffff880e57e71ca8
[1379100.801980] Call Trace:
[1379100.801984] [<ffffffff81155c60>] ? __lru_cache_add+0x40/0x90
[1379100.801988] [<ffffffff81155cd1>] ? lru_cache_add_lru+0x21/0x40
[1379100.801992] [<ffffffff81172c9c>] ? handle_pte_fault+0x65c/0x1040
[1379100.801996] [<ffffffff81536705>] rwsem_down_failed_common+0x95/0x1d0
[1379100.802000] [<ffffffff81536896>] rwsem_down_read_failed+0x26/0x30
[1379100.802004] [<ffffffff812a6a34>] call_rwsem_down_read_failed+0x14/0x30
[1379100.802008] [<ffffffff81535d94>] ? down_read+0x24/0x30
[1379100.802011] [<ffffffff8104dffe>] __do_page_fault+0x18e/0x480
[1379100.802015] [<ffffffff8106f0c8>] ? finish_task_switch+0xc8/0x120
[1379100.802019] [<ffffffff81539c2e>] do_page_fault+0x3e/0xa0
[1379100.802022] [<ffffffff81536f95>] page_fault+0x25/0x30


The load average on the host machine reached about 460.
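
To see which tasks were reported, the hung-task lines can be filtered out of the kernel log. The following is a minimal sketch; the function name hung_tasks is made up for illustration, and the pattern assumes the message format shown in the log above.

```shell
#!/bin/sh
# hung_tasks (hypothetical helper): read kernel log lines on stdin and
# print "command pid seconds" for every hung-task report found.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}

# Typical use on a live system (reading the kernel log may require root):
#   dmesg | hung_tasks | sort | uniq -c
# The current load average can be read with:
#   cat /proc/loadavg
```

For the log above this prints one line per report, with the command name, the PID and the timeout that was exceeded.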

By default Linux lets dirty (not yet written) file data occupy up to 40% of the
available memory in the file system cache (the vm.dirty_ratio setting; the
default differs between kernel versions). Once this mark is reached, the kernel
flushes the outstanding data to disk and all following writes become
synchronous. The 120 seconds in the message is the threshold set by
kernel.hung_task_timeout_secs: a task that stays blocked in uninterruptible
sleep (state D) for longer than that is reported. In this case the I/O subsystem
is not fast enough to flush the data within 120 seconds. As the I/O subsystem
responds slowly while more requests keep arriving, system memory fills up, tasks
block, and the error above is logged; affected services, such as HTTP, stop
responding.
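
Before changing anything it helps to look at the current thresholds and at how much dirty data is pending. The sketch below uses two helper functions whose names are invented for illustration. The limit it prints is only a rough upper bound: it assumes the ratio is applied to MemTotal, whereas the kernel applies it to a smaller "dirtyable" portion of memory.

```shell
#!/bin/sh
# dirty_stats (hypothetical helper): print the Dirty and Writeback
# counters from /proc/meminfo-style input on stdin.
dirty_stats() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }'
}

# dirty_limit_kb RATIO (hypothetical helper): print RATIO percent of
# MemTotal in kB. Assumption: this overestimates the real limit, because
# the kernel bases the ratio on dirtyable memory, not on MemTotal.
dirty_limit_kb() {
    awk -v ratio="$1" '$1 == "MemTotal:" { printf "%d\n", $2 * ratio / 100 }'
}

# Typical use on a live system:
#   sysctl vm.dirty_ratio vm.dirty_background_ratio kernel.hung_task_timeout_secs
#   dirty_stats < /proc/meminfo
#   dirty_limit_kb 10 < /proc/meminfo
```

A Dirty value that stays close to the limit while the disks are busy would be consistent with the explanation above.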


Solution:

1. Adjusting the vm.dirty_ratio and vm.dirty_background_ratio parameters can avoid this problem

# sysctl -w vm.dirty_ratio=10
# sysctl -w vm.dirty_background_ratio=5

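The sysctl -w commands above take effect immediately, but the values are lost on reboot.

Permanent change (add the settings to /etc/sysctl.conf):
# vi /etc/sysctl.conf
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

Load the file without rebooting:
# sysctl -p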
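
The change to /etc/sysctl.conf can also be scripted instead of made by hand in vi. This is only a sketch: set_sysctl_conf is a hypothetical helper, not part of sysctl, and it rewrites whatever file it is given, so it is worth trying on a copy first.

```shell
#!/bin/sh
# set_sysctl_conf KEY VALUE FILE (hypothetical helper)
# Remove any existing assignment of KEY in FILE, then append "KEY = VALUE".
# Running it twice with the same arguments leaves a single entry.
set_sysctl_conf() {
    key=$1 value=$2 file=$3
    touch "$file" || return 1
    tmp=$(mktemp) || return 1
    # Keep every line whose name part (text before "=") is not KEY.
    awk -v k="$key" '{
        line = $0
        sub(/^[ \t]+/, "", line)
        split(line, a, /[ \t]*=/)
        if (a[1] != k) print $0
    }' "$file" > "$tmp"
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Typical use (as root), followed by a reload:
#   set_sysctl_conf vm.dirty_background_ratio 5 /etc/sysctl.conf
#   set_sysctl_conf vm.dirty_ratio 10 /etc/sysctl.conf
#   sysctl -p
```

Removing the old line before appending keeps the file free of duplicate keys.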

2. Find the processes that consume the most resources, then optimize them
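
One possible way to find such processes is to list the tasks in uninterruptible sleep (state D, as in the "java D" line of the log) together with the largest CPU and memory consumers. The function name blocked_tasks is made up for illustration, and the --sort option assumes the procps version of ps found on Linux.

```shell
#!/bin/sh
# blocked_tasks (hypothetical helper): read "STATE PID COMMAND" lines on
# stdin and print the PID and command of every task in state D.
# Note: only the first word of the command name is printed.
blocked_tasks() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Typical use on a live system:
#   ps -eo state=,pid=,comm= | blocked_tasks
#   ps -eo pid,comm,%cpu,%mem --sort=-%cpu | head -n 10
#   ps -eo pid,comm,%cpu,%mem --sort=-%mem | head -n 10
```

Tasks that show up in state D repeatedly are the ones waiting on I/O or on a kernel lock, which makes them a reasonable starting point.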


Reference: http://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/


-------end-------

 
