soft lockup - CPU#9 stuck for 105s! [vmmemctl:838]

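【Symptoms】

Staff handling overdue accounts on a production platform suddenly reported that they could not log in to the portal. Checking the database showed that its IP address did not respond to ping and the MySQL port could not be reached with telnet, so we concluded that the database host had gone down. After we contacted the host operations team, they found a black screen on the host console reporting a soft lockup fault that required a reboot.
After the reboot, the logs on the failed MySQL host showed the following errors: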

Mar 19 13:15:50 localhost kernel: BUG: soft lockup - **CPU#9 stuck for 105s!** [**vmmemctl:838**]
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - **CPU#2 stuck for 99s!** [mysqld:754]
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#4 stuck for 99s! [mysqld:8773]
Mar 19 13:15:50 localhost kernel: inet_diag
Mar 19 13:15:50 localhost kernel: Modules linked in: vsock(U) iptable_filter ipt_REJECT ip_tables ip_vs ip6t_REJECT libcrc32c tcp_diag inet_diag vsock(U) ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport vmware_balloon sg vmci(U) nf_conntrack_ipv6 i2c_piix4 nf_defrag_ipv6 i2c_core xt_state shpchp nf_conntrack ext4 ip6table_filter jbd2 ip6_tables ipv6 mbcache ppdev sd_mod parport_pc parport vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#8 stuck for 99s! [ps:29030]
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#12 stuck for 99s! [mrtg:28880]
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#13 stuck for 100s! [mysqld:22619]
Mar 19 13:15:50 localhost kernel: iptable_filter
Mar 19 13:15:50 localhost kernel: Modules linked in: ip_tables iptable_filter ip_vs ip_tables libcrc32c tcp_diag ip_vs inet_diag libcrc32c vsock(U) ipt_REJECT tcp_diag ip6t_REJECT inet_diag nf_conntrack_ipv6 vsock(U) ipt_REJECT nf_defrag_ipv6 ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ip6table_filter ip6_tables ipv6 ppdev parport_pc ppdev parport_pc parport vmware_balloon parport sg vmware_balloon vmci(U) sg vmci(U) i2c_piix4 i2c_core shpchp ext4 i2c_piix4 i2c_core shpchp jbd2 mbcache ext4 sd_mod jbd2 mbcache sd_mod
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#5 stuck for 99s! [mysqld:25691]
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#11 stuck for 100s! [sh:29031]
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod parport vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#7 stuck for 111s! [events/7:74]
Mar 19 13:15:50 localhost kernel: crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: CPU 2 
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables crc_t10dif ip_vs sr_mod libcrc32c cdrom tcp_diag vmxnet3 inet_diag mptspi vsock mptscsih(U) mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: CPU 5 
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: CPU 13 
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [lastunloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: CPU 13 
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: CPU 4 
Mar 19 13:15:50 localhost kernel: Modules linked in: iptable_filter ip_tables ip_vs libcrc32c tcp_diag inet_diag vsock(U) ipt_REJECT ip6t_REJECT ipt_REJECT nf_conntrack_ipv6 ip6t_REJECT nf_defrag_ipv6 nf_conntrack_ipv6 xt_state nf_defrag_ipv6 nf_conntrack xt_state ip6table_filter crc_t10dif crc_t10dif crc_t10dif sr_mod sr_mod cdrom sr_mod cdrom vmxnet3 mptspi cdrom vmxnet3 mptscsih mptspi mptscsih mptbase scsi_transport_spi ip6_tables ipv6 ppdev pata_acpi mptbase vmxnet3 mptspi mptscsih nf_conntrack
Mar 19 13:15:50 localhost kernel: Modules linked in: parport_pc iptable_filter parport ip6table_filter ip_tables ip6_tables ipv6 ppdev parport_pc parport vmware_balloon ip_vs vmware_balloon sg vmci(U) libcrc32c sg vmci(U) i2c_piix4 tcp_diag scsi_transport_spi pata_acpi i2c_piix4 i2c_core inet_diag mptbase scsi_transport_spi pata_acpi ata_generic shpchp i2c_core shpchp vsock(U) ext4 ata_generic ata_piix ata_generic ata_piix dm_mirror dm_region_hash ata_piix dm_mirror dm_region_hash dm_log dm_log dm_mirror dm_mod [last unloaded: nf_defrag_ipv4] ipt_REJECT jbd2 ip6t_REJECT mbcache dm_region_hash
Mar 19 13:15:50 localhost kernel: dm_modCPU 8 
Mar 19 13:15:50 localhost kernel: Modules linked in: dm_log dm_mod nf_conntrack_ipv6 sd_mod ext4 jbd2 mbcache nf_defrag_ipv6 sd_mod [last unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: crc_t10dif xt_state nf_conntrack ip6table_filter iptable_filter crc_t10dif [last unloaded: nf_defrag_ipv4]CPU 12 
Mar 19 13:15:50 localhost kernel: Modules linked in:
Mar 19 13:15:50 localhost kernel: ip_tables ip6_tables ipv6 ppdev parport_pc parport vmware_balloon iptable_filter ip_tables ip_vs ip_vs libcrc32c tcp_diag sg vmci(U) i2c_piix4 i2c_core shpchp ext4 jbd2 sr_mod mbcache sd_mod crc_t10dif sr_mod cdrom sr_mod cdrom vmxnet3 vmxnet3 mptspi cdrom mptspi inet_diagCPU 11 
Mar 19 13:15:50 localhost kernel: 
Mar 19 13:15:50 localhost kernel: pata_acpi ata_generic
Mar 19 13:15:50 localhost kernel: Pid: 29031, comm: sh Not tainted 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
Mar 19 13:15:50 localhost kernel: RIP: 0010:[<ffffffff8114862f>]  [<ffffffff8114862f>] unmap_vmas+0x6df/0xc50
Mar 19 13:15:50 localhost kernel: scsi_transport_spi
Mar 19 13:15:50 localhost kernel: RSP: 0018:ffff8808005a1928  EFLAGS: 00010246
Mar 19 13:15:50 localhost kernel: RAX: ffffea006ee92b78 RBX: ffff8808005a1a58 RCX: ffffea006e945898
Mar 19 13:15:50 localhost kernel: RDX: ffffea0000000000 RSI: ffff8810788cebc0 RDI: 8000001fb0559025
Mar 19 13:15:50 localhost kernel: RBP: ffffffff8100bb8e R08: ffff88099841fc78 R09: 0000000000000000
Mar 19 13:15:50 localhost kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 000000002f797d0d
Mar 19 13:15:50 localhost kernel: ata_piix ipt_REJECT pata_acpi ip6t_REJECT dm_mirror dm_region_hash dm_log dm_mod nf_conntrack_ipv6 ata_generic ata_piix dm_mirror nf_defrag_ipv6
Mar 19 13:15:50 localhost kernel: R13: 000000005e72ff7c R14: 000000002f797d0d R15: 000000005e72ff7c
Mar 19 13:15:50 localhost kernel: dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: dm_mod [last unloaded: nf_defrag_ipv4]
Mar 19 13:15:50 localhost kernel: FS:  0000000000000000(0000) GS:ffff8810788c0000(0000) knlGS:0000000000000000
Mar 19 13:15:50 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 19 13:15:50 localhost kernel: CR2: 00007fdf9e5bd48e CR3: 000000128d3c3000 CR4: 00000000000407e0
Mar 19 13:15:50 localhost kernel: CPU 9 
Mar 19 13:15:50 localhost kernel: Modules linked in: xt_state
Mar 19 13:15:50 localhost kernel: Pid: 28880, comm: mrtg Not tainted 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform
Mar 19 13:15:50 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 19 13:15:50 localhost kernel: nf_conntrack
Mar 19 13:15:50 localhost kernel: Pid: 29030, comm: ps Not tainted 2.6.32-431.el6.x86_64 #1 iptable_filter/440BX Desktop Reference Platform
Mar 19 13:15:50 localhost kernel: ip6table_filter ip_tables ip6_tables VMware, Inc. VMware Virtual Platform ip_vs/440BX Desktop Reference Platform
Mar 19 13:15:50 localhost kernel: RIP: 0033:[<00007f15c85d5683>]  ipv6
Mar 19 13:15:50 localhost kernel: RIP: 0010:[<ffffffff8122e4e0>]  libcrc32c ppdev [<00007f15c85d5683>] 0x7f15c85d5683
Mar 19 13:15:50 localhost kernel: RSP: 002b:00007ffff0f22fe0  EFLAGS: 00000206
Mar 19 13:15:51 localhost kernel: <d> ffff880abfb64100 ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 [last unloaded: nf_defrag_ipv4] dm_log ipt_REJECT
Mar 19 13:15:51 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 19 13:15:51 localhost kernel: CR2: 0000000001d3c5a8 CR3: 0000001f2f4d7000 CR4: 00000000000407e0
Mar 19 13:15:51 localhost kernel: ffffffff00000000 ffff8808005a1a78 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev crc_t10dif
Mar 19 13:15:51 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
……
Mar 19 13:15:53 localhost kernel: Call Trace:
Mar 19 13:15:53 localhost kernel: mbcache tcp_diag
Mar 19 13:15:53 localhost kernel: Pid: 8773, comm: mysqld Not tainted 2.6.32-431.el6.x86_64 #1
Mar 19 13:15:53 localhost kernel: Pid: 754, comm: mysqld Not tainted 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
Mar 19 13:15:53 localhost kernel: sd_mod crc_t10dif sr_mod cdrom vmxnet3 mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic [<ffffffff81185b8d>] ? filp_close+0x5d/0x90
Mar 19 13:15:53 localhost kernel: ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
……Pid: 29035, comm: keepalived Not tainted 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: RIP: 0033:[<00000000010db503>]  ffff8800283d9548 ffff881029af7ee0 [<00000000010db503>] 0x10db503
Mar 19 13:15:54 localhost kernel: RSP: 002b:00007f68d25af000  EFLAGS: 00000206
Mar 19 13:15:54 localhost kernel: RAX: 00007f661cf22d8e RBX: 00007f68d25af030 RCX: 0000000000000004
Mar 19 13:15:54 localhost kernel: RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f661cf22d8c
Mar 19 13:15:54 localhost kernel: RBP: ffffffff8100bb8e R08: 0000000000000017 R09: 00007f66546881d8
Mar 19 13:15:54 localhost kernel: R10: 00007f7bf3f1050f R11: 0000000000000100 R12: 00007f68d25af090
Mar 19 13:15:54 localhost kernel: Pid: 29035, comm: keepalived Not tainted 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
Mar 19 13:15:54 localhost kernel: RIP: 0010:[<ffffffff8112d46a>]  [<ffffffff8112d46a>] get_page_from_freelist+0x2da/0x870
Mar 19 13:15:54 localhost kernel: RSP: 0018:ffff8814ff289b40  EFLAGS: 00000246
Mar 19 13:15:54 localhost kernel: RAX: 0000000000000064 RBX: ffff8814ff289c60 RCX: 0000000000000013
Mar 19 13:15:54 localhost kernel: RDX: ffff881029904a40 RSI: 000000000000001b RDI: 0000000000000246
Mar 19 13:15:54 localhost kernel: RBP: ffffffff8100bb8e R08: 0000000000000064 R09: 000000000005733e
Mar 19 13:15:54 localhost kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff00000001
Mar 19 13:15:54 localhost kernel: R13: 0000000000000058 R14: ffffea002d4a6278 R15: ffffffff8112ba93
Mar 19 13:15:54 localhost kernel: ffffffff81094d20
Mar 19 13:15:54 localhost kernel: FS:  00007f13f656e7c0(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
Mar 19 13:15:54 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 19 13:15:54 localhost kernel: 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: R13: 00007f65de084e18 R14: 0000000000000031 R15: 00007f661cf201d5
Mar 19 13:15:54 localhost kernel: FS:  00007f68d25b2700(0000) GS:ffff880028340000(0000) knlGS:0000000000000000
Mar 19 13:15:54 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 19 13:15:54 localhost kernel: CR2: 0000000000448000 CR3: 0000001e1f1bd000 CR4: 00000000000407e0
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b5ce>] ? prepare_to_wait+0x4e/0x80
Mar 19 13:15:54 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 19 13:15:54 localhost kernel: [<ffffffff81171950>] ? cache_reap+0x0/0x250
Mar 19 13:15:54 localhost kernel: 6a 
Mar 19 13:15:54 localhost kernel: [<ffffffff81142069>] ? refresh_cpu_vm_stats+0x159/0x180
Mar 19 13:15:54 localhost kernel: 3b 46 
Mar 19 13:15:54 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 19 13:15:54 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 19 13:15:54 localhost kernel: Process keepalived (pid: 29035, threadinfo ffff8814ff288000, task ffff881831190aa0)
Mar 19 13:15:54 localhost kernel: Stack:
Mar 19 13:15:54 localhost kernel: 00000001ffafffac [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
Mar 19 13:15:54 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Mar 19 13:15:54 localhost kernel: Process mysqld (pid: 25691, threadinfo ffff880362442000, task ffff881672a2b500)
Mar 19 13:15:54 localhost kernel: 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: ffff881c00000000 ffff882026fb1e50
Mar 19 13:15:54 localhost kernel: [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
Mar 19 13:15:54 localhost kernel: ffff8814ff289b88
Mar 19 13:15:54 localhost kernel: [<ffffffff8109aef6>] ? kthread+0x96/0xa0
Mar 19 13:15:54 localhost kernel: 
Mar 19 13:15:54 localhost kernel: <d> ffff8814ff289e08
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 19 13:15:54 localhost kernel: ffff88062498f025
Mar 19 13:15:54 localhost kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 19 13:15:54 localhost kernel: Code: 04 41 89 06 7d 64 4c 89 ef 57 9d <0f> 1f 44 00 00 48 8b 5d d8 4c  0000000000000000
Mar 19 13:15:54 localhost kernel: <d> 00000040fffffffe35 01 00 00 c7 43 60 00 00 
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b5ce>] ? prepare_to_wait+0x4e/0x80
Mar 19 13:15:54 localhost kernel: 00 00 e8 
Mar 19 13:15:54 localhost kernel: [<ffffffff81142090>] ? vmstat_update+0x0/0x40
Mar 19 13:15:54 localhost kernel: 8b 65 e0 4c 8b 6d  0000000000000000e8 4c 8b 75 f0 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: [<ffffffff8100be2e>] ? reschedule_interrupt+0xe/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff8112ff2f>] ? free_hot_page+0x2f/0x60
Mar 19 13:15:54 localhost kernel: [<ffffffff8112ffc0>] ? __free_pages+0x60/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffffa0023189>] ? vmballoon_pop+0x59/0x90 [vmware_balloon]
Mar 19 13:15:54 localhost kernel: [<ffffffffa00232f0>] ? vmballoon_work+0x0/0x7d8 [vmware_balloon]
Mar 19 13:15:54 localhost kernel: [<ffffffffa0023388>] ? vmballoon_work+0x98/0x7d8 [vmware_balloon]
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b5ce>] ? prepare_to_wait+0x4e/0x80
Mar 19 13:15:54 localhost kernel: [<ffffffffa00232f0>] ? vmballoon_work+0x0/0x7d8 [vmware_balloon]
Mar 19 13:15:54 localhost kernel: [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
Mar 19 13:15:54 localhost kernel: a9  [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Mar 19 13:15:54 localhost kernel: [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
Mar 19 13:15:54 localhost kernel: ffff88000016234868  0000000227d08a803b 00 49 8b 84 24 58 80 00 00 48 3d d0 da fc 81 4c 8d a0 a8 7f 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: ff ff 
Mar 19 13:15:54 localhost kernel: 0f  [<ffffffff8111fa47>] ? unlock_page+0x27/0x30
Mar 19 13:15:54 localhost kernel: 84  [<ffffffff811420a6>] ? vmstat_update+0x16/0x40
Mar 19 13:15:54 localhost kernel: a8  [<ffffffff8109aef6>] ? kthread+0x96/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Mar 19 13:15:54 localhost kernel: 00 83 6d 80 01 
Mar 19 13:15:54 localhost kernel: [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 19 13:15:54 localhost kernel: 86 00 00 00 <4b> 8b 5c fc 08 65 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: 48 8b 04 
Mar 19 13:15:54 localhost kernel: [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
Mar 19 13:15:54 localhost kernel: [<ffffffff8114856f>] ? unmap_vmas+0x61f/0xc50
Mar 19 13:15:54 localhost kernel: 25 d0 e0 00 00 4a 8b 14 30 48 8b 43 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: [<ffffffff8109aef6>] ? kthread+0x96/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b5ce>] ? prepare_to_wait+0x4e/0x80
Mar 19 13:15:54 localhost kernel: [<ffffffff81171950>] ? cache_reap+0x0/0x250
Mar 19 13:15:54 localhost kernel: [<ffffffff8114a499>] ? __do_fault+0x469/0x530
Mar 19 13:15:54 localhost kernel: [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
Mar 19 13:15:54 localhost kernel: [<ffffffff8112f3a3>] ? __alloc_pages_nodemask+0x113/0x8d0
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Mar 19 13:15:54 localhost kernel: [<ffffffff8114e477>] ? exit_mmap+0x87/0x170
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8106f22c>] ? mmput+0x6c/0x120
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 19 13:15:54 localhost kernel: Code: 00 00 49 89 c4  [<ffffffff81190aa4>] ? flush_old_exec+0x484/0x690
Mar 19 13:15:54 localhost kernel: fa 66 0f 1f 44 00 00 44 8b 2e 44 39 6e 08 48 89 f2 44 0f 4e 6e 08 44 89 ee e8 2c ef ff ff 44 29  [<ffffffff811e45c0>] ? load_elf_binary+0x350/0x1ab0
Mar 19 13:15:54 localhost kernel: 2b 4c 89 e7 57 9d <0f> 1f 44 00 00 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 90 55 
Mar 19 13:15:54 localhost kernel: Call Trace:
Mar 19 13:15:54 localhost kernel: [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
Mar 19 13:15:54 localhost kernel: [<ffffffff8114a657>] ? handle_pte_fault+0xf7/0xb00
Mar 19 13:15:54 localhost kernel: [<ffffffff8109aef6>] ? kthread+0x96/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8118d0c5>] ? chrdev_open+0x125/0x230
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff812317b1>] ? selinux_dentry_open+0xe1/0x140
Mar 19 13:15:54 localhost kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff81167a9a>] ? alloc_pages_current+0xaa/0x110
Mar 19 13:15:54 localhost kernel: [<ffffffff81142069>] ? refresh_cpu_vm_stats+0x159/0x180
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b5ce>] ? prepare_to_wait+0x4e/0x80
Mar 19 13:15:54 localhost kernel: [<ffffffff81188dba>] ? do_sync_read+0xfa/0x140
Mar 19 13:15:54 localhost kernel: [<ffffffff81142090>] ? vmstat_update+0x0/0x40
Mar 19 13:15:54 localhost kernel: [<ffffffff811420a6>] ? vmstat_update+0x16/0x40
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff8112cf3e>] ? __get_free_pages+0xe/0x50
Mar 19 13:15:54 localhost kernel: [<ffffffff811a3b9a>] ? dput+0x9a/0x150
Mar 19 13:15:54 localhost kernel: [<ffffffff811e186e>] ? load_misc_binary+0x9e/0x3f0
Mar 19 13:15:54 localhost kernel: [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
Mar 19 13:15:54 localhost kernel: [<ffffffff8106fa46>] ? copy_process+0x126/0x1450
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Mar 19 13:15:54 localhost kernel: [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
Mar 19 13:15:54 localhost kernel: [<ffffffff8109aef6>] ? kthread+0x96/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff812334eb>] ? selinux_file_permission+0xfb/0x150
Mar 19 13:15:54 localhost kernel: [<ffffffff8109ae60>] ? kthread+0x0/0xa0
Mar 19 13:15:54 localhost kernel: [<ffffffff8104a98c>] ? __do_page_fault+0x1ec/0x480
Mar 19 13:15:54 localhost kernel: [<ffffffff81070e11>] ? do_fork+0xa1/0x480
Mar 19 13:15:54 localhost kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff811920b7>] ? search_binary_handler+0x137/0x370
Mar 19 13:15:54 localhost kernel: [<ffffffff8128ed8b>] ? strncpy_from_user+0x5b/0x90
Mar 19 13:15:54 localhost kernel: [<ffffffff811910b6>] ? kernel_read+0x46/0x60
Mar 19 13:15:54 localhost kernel: [<ffffffff810894b7>] ? do_sigaction+0x197/0x1d0
Mar 19 13:15:54 localhost kernel: [<ffffffff81009598>] ? sys_clone+0x28/0x30
Mar 19 13:15:54 localhost kernel: [<ffffffff811e2a77>] ? load_script+0x267/0x2b0
Mar 19 13:15:54 localhost kernel: [<ffffffff8114b829>] ? get_user_pages+0x49/0x50
Mar 19 13:15:54 localhost kernel: [<ffffffff8100b393>] ? stub_clone+0x13/0x20
Mar 19 13:15:54 localhost kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Mar 19 13:15:54 localhost kernel: Code: 06  [<ffffffff811917ac>] ? get_arg_page+0x5c/0x100

Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#9 stuck for 105s! [vmmemctl:838]
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#2 stuck for 99s! [mysqld:754]
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU**#4 stuck for 99s! [mysqld:8773]**
Mar 19 13:15:50 localhost kernel: inet_diag
Mar 19 13:15:50 localhost kernel: BUG: soft lockup - CPU#8 stuck for 99s! [ps:29030]
comm: keepalived Not tainted 2.6.32-431.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform
Mar 19 13:15:59 localhost rsyslogd-2177: imuxsock lost 34 messages from pid 2643 due to rate-limiting
Mar 19 13:16:34 localhost Keepalived_vrrp[2643]: VRRP_Script(check_run) timed out
Mar 19 13:16:34 localhost Keepalived_vrrp[2643]: VRRP_Script(check_run) succeeded
Mar 19 13:16:36 localhost Keepalived_vrrp[2643]: VRRP_Script(check_run) timed out
Mar 19 13:16:36 localhost Keepalived_vrrp[2643]: Process [29331] didn't respond to SIGTERM
Mar 19 13:39:03 localhost kernel: INFO: task mysqld:30584 blocked for more than 120 seconds.
Mar 19 13:39:03 localhost kernel: Not tainted 2.6.32-431.el6.x86_64 #1
Mar 19 13:39:03 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 13:39:03 localhost kernel: mysqld R running task 0 30584 30106 0x00000080

Mar 19 13:39:05 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 13:39:05 localhost kernel: crond D 0000000000000003 0 2871 1762 0x00000080
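
Because the kernel interleaves output from several CPUs, the raw log is hard to scan. A minimal sketch for pulling out just the soft lockup reports is shown here; the function name `summarize_lockups` and the log path in the usage comment are assumptions for illustration, not part of the original incident handling.

```shell
#!/bin/sh
# Summarize "BUG: soft lockup" lines read from stdin.
# Each matching line becomes: cpu=<n> secs=<n> comm=<name> pid=<n>
# Lines that do not contain a soft lockup report are dropped.
summarize_lockups() {
    sed -n 's/.*BUG: soft lockup - CPU#\([0-9]*\) stuck for \([0-9]*\)s! \[\(.*\):\([0-9]*\)].*/cpu=\1 secs=\2 comm=\3 pid=\4/p'
}

# Example usage (the log path is an assumption; adjust for your system):
# summarize_lockups < /var/log/messages | sort -u
```

Run against the log above, this would list one line per stuck CPU, which makes it easier to see that the vmmemctl balloon thread and several mysqld threads were affected at the same moment.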

## 【Analysis】
Material found online explains the following messages:
Mar 19 13:39:03 localhost kernel: INFO: task mysqld:30584 blocked for more than 120 seconds.
Mar 19 13:39:03 localhost kernel: Not tainted 2.6.32-431.el6.x86_64 #1
Mar 19 13:39:03 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 13:39:03 localhost kernel: mysqld R running task 0 30584 30106 0x00000080
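The background task was reported as hung because it stayed blocked past the 120 s timeout. According to this explanation, Linux allows up to 40% of available memory to be used as file system cache; when that data is flushed, synchronization with the IO subsystem causes the 120 s timeout, so the threshold is lowered from 40% to 10% to avoid it. Put simply: Linux normally buffers disk writes in a cache that can grow to roughly 40% of memory (the actual default depends on the kernel version, so check the current value first), and only when this cache is nearly full does the system write its contents synchronously to disk. The kernel reports any task that stays blocked for more than 120 seconds, so if the IO subsystem is slow and the flush cannot finish within that time, this warning appears. It typically happens on systems with a lot of memory.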

This is a known bug. By default Linux uses up to 40% of the available memory for file system caching.
After this mark has been reached, the file system flushes all outstanding data to disk, causing all subsequent IO to become synchronous.
By default there is a time limit of 120 seconds for flushing this data to disk.
In this case the IO subsystem is not fast enough to flush the data within 120 seconds.
This happens especially on systems with a lot of memory.

The problem is solved in later kernels and there is no "fix" from Oracle.
I fixed this by lowering the mark for flushing the cache from 40% to 10% by setting "vm.dirty_ratio=10" in /etc/sysctl.conf.
This setting does not influence overall database performance, since you hopefully use Direct IO and bypass the file system cache completely.
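
To make the effect of the ratio concrete, the following rough sketch estimates how much dirty data may pile up before the synchronous flush starts. It is a simplification: the kernel derives the threshold from dirtyable memory rather than total RAM, and the function name `dirty_limit_mb` and the 128 GiB figure are assumptions chosen only for illustration.

```shell
#!/bin/sh
# Approximate dirty-page limit in MiB for a given memory size and ratio.
#   $1 = memory size in kB (for example the MemTotal value from /proc/meminfo)
#   $2 = ratio in percent (for example the value of vm.dirty_ratio)
# Integer arithmetic only, so results are rounded down.
dirty_limit_mb() {
    echo $(( $1 * $2 / 100 / 1024 ))
}

# Hypothetical host with 128 GiB of RAM (134217728 kB):
dirty_limit_mb 134217728 40
dirty_limit_mb 134217728 10
```

On such a host a 40% limit would allow on the order of 50 GiB of unwritten data, while 10% allows about a quarter of that, which is why a slow IO subsystem is far less likely to need 120 seconds for the flush after the change.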
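Check the current kernel settings:
#sysctl -a | grep dirty
Adjust the kernel parameters:
# Lower the share of memory that dirty cache pages may occupy
  sysctl -w vm.dirty_ratio=10
  sysctl -w vm.dirty_background_ratio=5
  # Switch the IO scheduler to noop, the simplest policy, based on a FIFO queue
  echo noop > /sys/block/sda/queue/scheduler

# Setting the timeout to 0 only disables the hung task warning; it does not remove the underlying stall
/sbin/sysctl -w kernel.hung_task_timeout_secs=0

sysctl -p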
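
After writing to the scheduler file it is worth confirming that the change took effect. On kernels of this generation the file lists all available schedulers and marks the active one with square brackets, for example `noop anticipatory deadline [cfq]`. The helper below is a small sketch that extracts the bracketed name; `active_scheduler` is a hypothetical name, and the device `sda` is taken from the command above.

```shell
#!/bin/sh
# Print the active IO scheduler from the contents of a
# /sys/block/<dev>/queue/scheduler file supplied on stdin.
# The active scheduler is the name enclosed in square brackets.
active_scheduler() {
    sed -n 's/.*\[\(.*\)].*/\1/p'
}

# Example usage on the affected host:
# active_scheduler < /sys/block/sda/queue/scheduler
```

A value written through sysfs applies only to the running system, so it has to be reapplied after a reboot by whatever boot-time mechanism the distribution provides.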

To keep the settings in effect after a reboot, add them to /etc/sysctl.conf:

vi /etc/sysctl.conf

vm.dirty_background_ratio = 5

vm.dirty_ratio = 10
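
Editing the file by hand works, but on several hosts a scripted change is easier to repeat. The sketch below sets a key in a sysctl-style file, replacing an existing entry or appending a new one. The function name `set_sysctl_conf` is hypothetical, `sed -i` assumes GNU sed, and keys or values containing `/` are not handled.

```shell
#!/bin/sh
# Set "key = value" in a sysctl-style config file.
#   $1 = file, $2 = key, $3 = value
# An existing line for the key is rewritten in place; otherwise the
# setting is appended to the end of the file.
set_sysctl_conf() {
    file=$1 key=$2 value=$3
    # Escape dots so the key is matched literally, not as a regex wildcard.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
        sed -i "s/^[[:space:]]*${pattern}[[:space:]]*=.*/${key} = ${value}/" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Example usage as root, followed by a reload of the file:
# set_sysctl_conf /etc/sysctl.conf vm.dirty_background_ratio 5
# set_sysctl_conf /etc/sysctl.conf vm.dirty_ratio 10
# sysctl -p
```

Running the function twice with the same arguments leaves a single entry in the file, so it is safe to include in a provisioning script.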
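【Note】
vm.dirty_background_ratio: when dirty pages in the file system cache reach this percentage of system memory (for example 5%), background writeback threads such as pdflush/flush/kdmflush are triggered and asynchronously write some of the dirty pages to storage.

vm.dirty_ratio: when dirty pages in the file system cache reach this percentage of system memory (for example 10%), the system is forced to start processing dirty pages (there are already many of them, so some must be written to storage to avoid data loss). During this process many application processes may block, because the system has switched to handling file IO.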
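
To see how close a system is to these thresholds, the amount of dirty data can be read from /proc/meminfo. The following is a minimal sketch; the function name `dirty_stats` is an assumption, and it only reports the `Dirty` and `Writeback` counters without comparing them to the configured ratios.

```shell
#!/bin/sh
# Print the Dirty and Writeback counters (in kB) from
# meminfo-formatted text supplied on stdin.
dirty_stats() {
    awk '/^(Dirty|Writeback):/ {printf "%s %s kB\n", $1, $2}'
}

# Example usage:
# dirty_stats < /proc/meminfo
```

Sampling this during heavy write load gives a rough picture of whether background writeback keeps up or dirty pages keep accumulating toward the vm.dirty_ratio limit.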
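A process waiting for IO is often in the D state, that is, TASK_UNINTERRUPTIBLE. A process in this state does not handle signals, so it cannot be killed. A process that stays in the D state for a long time is certainly abnormal.
There are two likely causes:
1) Hardware on the IO path has failed, for example a bad disk (this only rarely leads to a long-lasting D state; usually an error is returned instead)
2) The kernel itself has a problem
Such problems are hard to pinpoint, and once they occur they are usually unrecoverable: the process cannot be killed, and a reboot is typically the only way out.
For this situation the kernel provides a hung task detection mechanism.
Its basic principle: it periodically checks the processes in the D state, and if one has been in the D state longer than the configured time (120 s by default, configurable), it prints the relevant stack information; it can also be configured through a proc parameter to panic directly.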
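
Before the detector fires, processes stuck in the D state can be spotted by hand. The sketch below filters `ps` output for states beginning with D; the function name `filter_d_state` is hypothetical. The two sysctl keys in the usage comment control the timeout and the optional panic behavior described above.

```shell
#!/bin/sh
# Keep only processes in uninterruptible sleep (state starting with D).
# Input on stdin: rows of "PID STAT COMMAND"; output: "PID COMMAND".
filter_d_state() {
    awk '$2 ~ /^D/ {print $1, $3}'
}

# Example usage (the trailing "=" suppresses the ps column headers):
# ps -eo pid=,stat=,comm= | filter_d_state
#
# Inspect the hung task detector settings:
# sysctl kernel.hung_task_timeout_secs kernel.hung_task_panic
```

A process that shows up here briefly is normal; one that appears in every sample over several minutes is the kind the hung task detector reports.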

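About the message: NET: Registered protocol family 36
Material found online explains: Linux is running in a VMware virtualization environment with vmware-tools installed, and upgrading vmware-tools from 10.0.5.-1 to 10.1.0 resolves this problem.
#vmware-toolbox-cmd -v ## check the vmtools version
#cat /etc/protocols // look up protocol number 36, which this file lists as xtp
Note that the number in "NET: Registered protocol family 36" is a socket address family, which is numbered separately from the IP protocol numbers in /etc/protocols, so the xtp entry may not be the protocol meant here.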
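
To check whether a host already runs a tools version at or above the suggested one, the version strings can be compared with `sort -V`. This is a sketch under assumptions: `version_ge` is a hypothetical helper, `sort -V` requires GNU coreutils, and the usage comment assumes that the first field printed by `vmware-toolbox-cmd -v` is the version number.

```shell
#!/bin/sh
# Succeed (exit status 0) if version $1 is greater than or equal to
# version $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (output format of vmware-toolbox-cmd is an assumption):
# installed=$(vmware-toolbox-cmd -v | awk '{print $1}')
# if version_ge "$installed" 10.1.0; then
#     echo "vmware-tools is at least 10.1.0"
# else
#     echo "vmware-tools is older than 10.1.0"
# fi
```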
