Symptom: MySQL hangs and cannot write data, or the whole system crashes.
1. BUG 1 - MySQL hangs
1.1. Symptoms
kernel: ------------[ cut here ]------------
kernel: WARNING: at fs/jbd2/journal.c:507 __jbd2_log_start_commit+0x72/0xf0 [jbd2]()
kernel: jbd: bad log_start_commit: 2768618130 2768618130 49048062 2768618131
kernel: Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip_vs_wlc ip_vs nf_conntrack ipv6 libcrc32c ovmapi(U) xen_netfront pcspkr ext4 mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
kernel: Pid: 30273, comm: mysqld Not tainted 2.6.39-200.32.1.el6uek.x86_64 #1
kernel: Call Trace:
kernel: [] warn_slowpath_common+0x7f/0xc0
kernel: [] warn_slowpath_fmt+0x46/0x50
kernel: [] __jbd2_log_start_commit+0x72/0xf0 [jbd2]
kernel: [] jbd2_log_start_commit+0x30/0x50 [jbd2]
kernel: [] ext4_sync_file+0x162/0x2d0 [ext4]
kernel: [] vfs_fsync_range+0x7f/0xa0
kernel: [] vfs_fsync+0x1c/0x20
kernel: [] do_fsync+0x3a/0x60
kernel: [] sys_fsync+0x10/0x20
kernel: [] system_call_fastpath+0x16/0x1b
kernel: ---[ end trace 75c73bd1e52b1c73 ]---
kernel: jbd2_log_wait_commit: error: j_commit_request=-1526349166, tid=49048062
# grep each /proc/fs/jbd2/*/info
/proc/fs/jbd2/dm-0-8/info:2981802046 transaction, each up to 8192 blocks
/proc/fs/jbd2/dm-1-8/info:5620 transaction, each up to 8192 blocks
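The count for dm-0-8 above (2981802046) is already past 2^31 = 2147483648, the distance at which a stale 32-bit transaction ID can start to compare as newer than the current one. The following is a minimal sketch for spotting such journals, assuming the first line of each info file starts with the transaction count, as in the output above. The function name `check_jbd2_tids`, its optional directory argument and the 2^31 threshold are illustrative choices, not part of the kernel or of the original notes, and a high count is only a rough indicator.

```shell
# Hypothetical helper: list jbd2 journals whose transaction count exceeds 2^31.
# Assumes each info file begins with "<count> transaction..." as shown above.
check_jbd2_tids() {
    jbd2_base=${1:-/proc/fs/jbd2}      # directory to scan (argument is illustrative)
    jbd2_limit=$(( 1 << 31 ))          # 2147483648
    for jbd2_file in "$jbd2_base"/*/info; do
        [ -r "$jbd2_file" ] || continue
        # The first whitespace-separated field of the first line is the count.
        read -r jbd2_count jbd2_rest < "$jbd2_file"
        case "$jbd2_count" in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a plain number
        esac
        if [ "$jbd2_count" -gt "$jbd2_limit" ]; then
            echo "WARN $jbd2_file: $jbd2_count transactions (> 2^31)"
        else
            echo "ok   $jbd2_file: $jbd2_count transactions"
        fi
    done
}
```

Calling `check_jbd2_tids` with no argument scans /proc/fs/jbd2; passing a directory makes it easy to try against copied info files.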
1.2. Bug description
ext4/jbd2: don't wait (forever) for stale tid caused by wraparound (uek3, kernel.org)
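The numbers in the log from section 1.1 can be used to check the wraparound by hand. jbd2 transaction IDs are 32-bit unsigned counters, and the kernel's `tid_gt()` compares two of them by the sign of their 32-bit difference. The sketch below mimics that comparison in shell arithmetic; `tid_diff` is an illustrative helper, not a kernel or system tool.

```shell
# Illustrative helper (not a kernel tool): signed 32-bit difference of two tids,
# mimicking how the kernel's tid_gt() compares transaction IDs.
tid_diff() {
    tid_d=$(( ($1 - $2) & 0xFFFFFFFF ))        # wrap the difference to 32 bits
    if [ "$tid_d" -ge 2147483648 ]; then       # reinterpret as a signed int
        tid_d=$(( tid_d - 4294967296 ))
    fi
    echo "$tid_d"
}

# Values taken from the "bad log_start_commit" line in section 1.1.
tid_diff 49048062 2768618130    # stale tid vs. j_commit_request
tid_diff 2768618130 0           # j_commit_request viewed as a signed int
```

The first call prints 1575397228. The difference is positive, so the stale tid 49048062 looks newer than the commit request, which is consistent with the "wait (forever)" wording in the fix title. The second call prints -1526349166, the same value that jbd2_log_wait_commit reports for j_commit_request.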
1.3. Fixed versions for RHEL/OL 5
2.6.18-371.12.1.el5, or EL/OL 5.11 (2.6.18-398.el5) (1)
[fs] jbd: don't wake kjournald unnecessarily (Denys Vlasenko) [1116027 1081785]
[fs] jbd: don't wait (forever) for stale tid caused by wraparound (Denys Vlasenko) [1116027 1081785]
[fs] ext4: fix waiting and sending of barrier in ext4_sync_file() (Denys Vlasenko) [1116027 1081785]
[fs] jbd2: Add function jbd2_trans_will_send_data_barrier() (Denys Vlasenko) [1116027 1081785]
[fs] jbd2: fix sending of data flush on journal commit (Denys Vlasenko) [1116027 1081785]
[fs] ext4, jbd2: Add barriers for file systems with ext journals (Denys Vlasenko) [1116027 1081785]
[fs] jbd: fix fsync() tid wraparound bug (Denys Vlasenko) [1116027 1081785]
1.4. Fixed versions for RHEL/OL 6
2.6.32-358.18.1.el6, or 6.5 (2.6.32-431.el6) (ol)
[fs] ext4/jbd2: dont wait (forever) for stale tid caused by wraparound (Eric Sandeen) [963557 955807]
[fs] jbd: dont wait (forever) for stale tid caused by wraparound (Eric Sandeen) [963557 955807]
1.5. Fixed versions for OL UEK
None
1.6. Fixed versions for OL UEK 3
3.8.13-16 (1)
2. BUG 2 - System crash
2.1. Symptoms
# ----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/jbd2/commit.c:395
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/system/cpu/cpu0/topology/physical_package_id
CPU 0
Modules linked in: autofs4 i2c_dev i2c_core hidp nfs nfs_acl rfcomm l2cap bluetooth lockd sunrpc ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api ext4 jbd2 crc16 dm_mirror parport_pc lp parport xennet pcspkr xenblk dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 986, comm: jbd2/dm-0-8 Not tainted 2.6.18-274.0.0.0.1.el5xen #1
RIP: e030:[] [] :jbd2:jbd2_journal_commit_transaction+0x50/0x1074
RSP: e02b:ffff8803fb97bd70 EFLAGS: 00010246
RAX: 0000000000000008 RBX: ffff8803fb968800 RCX: ffff8803fb97be80
RDX: 0000000000000008 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff8803fb968800 R08: ffff8803fb968898 R09: 0000000000000000
R10: ffff8803e8a6a488 R11: 0000000000000000 R12: ffff8803fb968824
R13: ffff8803fb9688c0 R14: 0000000000000000 R15: ffff8803fb968890
FS: 00002aaaaaac5330(0000) GS:ffffffff8062e000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process jbd2/dm-0-8 (pid: 986, threadinfo ffff8803fb97a000, task ffff8803ffafa080)
Stack: ffff8803fc2d1000 00000f7c00000000 0000000800000000 00000001ffffffff
ffff880300000001 00000000eb6ca2da 0000000000000000 0000000000000000
0000000000000000 0000000000000000
Call Trace:
[] _spin_lock_irqsave+0x9/0x14
[] lock_timer_base+0x1b/0x3c
[] try_to_del_timer_sync+0x7f/0x88
[] :jbd2:kjournald2+0x9a/0x1ec
[] autoremove_wake_function+0x0/0x2e
[] keventd_create_kthread+0x0/0xc4
[] :jbd2:kjournald2+0x0/0x1ec
[] keventd_create_kthread+0x0/0xc4
[] kthread+0xfe/0x132
[] child_rip+0xa/0x12
[] keventd_create_kthread+0x0/0xc4
[] kthread+0x0/0x132
[] child_rip+0x0/0x12
Code: 0f 0b 68 1b a5 1c 88 c2 8b 01 eb fe 48 83 7d 50 00 74 0c 0f
RIP [] :jbd2:jbd2_journal_commit_transaction+0x50/0x1074
RSP
<0>Kernel panic - not syncing: Fatal exception
2.2. Bug description
Bug 735768 hitting J_ASSERT(journal->j_running_transaction != NULL) in journal_commit_transaction (el)
2.3. Fixed versions for RHEL/OL 5
2.4. Fixed versions for RHEL/OL 6
2.6.32-304.el6, or EL/OL 6.4 (2.6.32-358.el6) (ol, el)
[fs] jbd2: fix fsync() tid wraparound bug (Dave Wysochanski) [735768]
[fs] jbd: fix fsync() tid wraparound bug (Dave Wysochanski) [735768]
[fs] jbd, jbd2: fixed typos (Dave Wysochanski) [735768]
2.5. Fixed versions for OL UEK
2.6.39 (1, 2; 3 not found)
2.6. Fixed versions for OL UEK 3
3.8.13-16 (1, 2, 3)
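3. Solution
Upgrade Oracle Linux 6.x systems to 6.4 or later and install the UEK3 kernel. For OL 5.x, the recommendation is to reinstall with the 6.5 operating system and install the UEK3 kernel.
Upgrade RHEL systems to 6.5 or later, or to 5.11 or later.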
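Whether a host already runs a fixed kernel can be checked by comparing `uname -r` with the fixed versions listed in sections 1 and 2. The following is a rough sketch, assuming GNU `sort -V` is available (older coreutils, such as those on el5, may lack it). `kernel_ge` is a hypothetical helper name, and version-sort ordering only approximates RPM's own version comparison, so borderline results should be checked against the package changelog.

```shell
# Hypothetical helper: succeed if kernel release $1 sorts at or after $2.
# Relies on GNU sort -V, which approximates (but is not) RPM version ordering.
kernel_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

running=$(uname -r)
case "$running" in
    *uek*) fixed=3.8.13-16 ;;             # UEK3 fix, sections 1.6 and 2.6
    *el6*) fixed=2.6.32-358.18.1.el6 ;;   # earliest el6 fix, section 1.4
    *el5*) fixed=2.6.18-371.12.1.el5 ;;   # earliest el5 fix, section 1.3
    *)     fixed= ;;
esac

if [ -z "$fixed" ]; then
    echo "unknown kernel flavour: $running"
elif kernel_ge "$running" "$fixed"; then
    echo "$running is at or above $fixed"
else
    echo "$running is older than $fixed, upgrade recommended"
fi
```

The UEK pattern is tested first so that UEK2 kernels such as 2.6.39-200.32.1.el6uek are measured against the UEK3 fix rather than the stock el6 one.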
4. Upgrade procedure
4.1. Upgrading OL 6.x to 6.5
# Configure the OL 6.5 yum repos
vi /etc/yum.repos.d/.....
yum clean all
yum update -x "Percona-Server-*"
# Note: after the upgrade the time zone reverts to the system default (EST); change it back to CST manually
URL=ftp://aa.bb.cc.dd/.../linux/uek3
# Upgrade xenstoreprovider only if it is already installed
rpm -q xenstoreprovider && rpm -Uvh "$URL/xenstoreprovider-3.0-11.el6.x86_64.rpm"
# Remove the kmod-ovmapi-uek package
rpm -ev kmod-ovmapi-uek
# UEK3 kernel version to install
KV=3.8.13-55.1.5.el6uek
rpm -Uvh \
"$URL/kernel-uek-$KV.x86_64.rpm" \
"$URL/kernel-uek-devel-$KV.x86_64.rpm" \
"$URL/kernel-uek-firmware-$KV.noarch.rpm" \
"$URL/libdtrace-ctf-0.4.1-1.x86_64.rpm"
# Reboot to load the new kernel
reboot
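The manual time zone fix mentioned in the note above could look roughly like the sketch below. It assumes that "CST" means China Standard Time (zone name Asia/Shanghai) and that the system uses the el6-style /etc/sysconfig/clock and /etc/localtime files; neither is stated in the original notes. `set_timezone` and its optional root-prefix argument are hypothetical names for illustration.

```shell
# Illustrative helper: point an el6-style system at the given time zone.
# $1 = zone name (Asia/Shanghai is an assumption for "CST"),
# $2 = optional root prefix, handy for trying it out in a scratch directory.
set_timezone() {
    tz_zone=$1
    tz_root=${2:-}
    tz_src="$tz_root/usr/share/zoneinfo/$tz_zone"
    tz_clock="$tz_root/etc/sysconfig/clock"
    [ -f "$tz_src" ] || { echo "no such zone: $tz_zone" >&2; return 1; }
    # Replace the zone data used by libc.
    cp -f "$tz_src" "$tz_root/etc/localtime" || return 1
    # Record the zone name; otherwise a later update may reset it again.
    if grep -q '^ZONE=' "$tz_clock" 2>/dev/null; then
        sed -i "s|^ZONE=.*|ZONE=\"$tz_zone\"|" "$tz_clock"
    else
        printf 'ZONE="%s"\n' "$tz_zone" >> "$tz_clock"
    fi
}
```

Run as root, `set_timezone Asia/Shanghai` followed by `date` should then show CST instead of EST.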
4.2. Upgrading OL 5.x to 6.5
Back up the data
Reinstall the system
Upgrade the old software versions built for el5
4.3. Upgrading OL 6.4 to the UEK3 kernel
# Configure the 6.5 yum repos
vi /etc/yum.repos.d/.....
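# Update dracut and install elfutils-libs before installing the kernel packages
yum -y update dracut
yum -y install elfutils-libs
URL=ftp://aa.bb.cc.dd/.../linux/uek3
# Upgrade xenstoreprovider only if it is already installed
rpm -q xenstoreprovider && rpm -Uvh "$URL/xenstoreprovider-3.0-11.el6.x86_64.rpm"
# UEK3 kernel version to install
KV=3.8.13-55.1.5.el6uek
rpm -Uvh \
"$URL/kernel-uek-$KV.x86_64.rpm" \
"$URL/kernel-uek-devel-$KV.x86_64.rpm" \
"$URL/kernel-uek-firmware-$KV.noarch.rpm" \
"$URL/libdtrace-ctf-0.4.1-1.x86_64.rpm"
# Reboot to load the new kernel
reboot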