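I. Basic Concepts of Kdump
The Kdump concept appeared around 2005. It is the most reliable kernel dump mechanism to date and has been adopted by the major Linux vendors.
kexec is a fast-boot mechanism that boots a Linux kernel from the context of an already running kernel, without going through the BIOS. The kdump mechanism rests on two components. The first is the kernel-space system call kexec_load, which loads the capture kernel at a specified address as the production kernel starts up. The second is the user-space tool set kexec-tools, which passes the capture kernel's address to the production kernel so that, when the system crashes, the capture kernel can be located and run.
kdump is an advanced kernel crash dump mechanism built on kexec. When the system crashes, kdump uses kexec to boot into a second kernel. This second kernel, usually called the capture kernel, boots with very little memory and captures the dump image. The first kernel reserves part of its memory for the second kernel to boot in. Because kdump boots the capture kernel through kexec and bypasses the BIOS, the first kernel's memory is preserved. This is the essence of kernel crash dumping.
kdump needs two kernels with different purposes: the production kernel and the capture kernel. The production kernel is the one the capture kernel serves. The capture kernel boots when the production kernel crashes and, together with its ramdisk, forms a minimal environment for collecting and dumping the production kernel's memory.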
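As a small illustration of the two-kernel arrangement, the sketch below reports whether a capture kernel is currently loaded. It assumes the kernel exposes this state as 1 or 0 in /sys/kernel/kexec_crash_loaded; the function name and the optional path argument are illustrative choices, not part of any tool.

```shell
# Report whether a capture kernel is currently loaded.
# The optional path argument exists only so the function can be tried
# against a test file instead of the real sysfs entry.
capture_kernel_loaded() {
    state_file="${1:-/sys/kernel/kexec_crash_loaded}"
    if [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]; then
        echo "capture kernel loaded"
    else
        echo "capture kernel not loaded"
        return 1
    fi
}
```

Calling `capture_kernel_loaded` with no argument on the production kernel gives a quick answer before any crash happens.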
II. Installing the Tools
System environment:
Linux ubuntu 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
1. Install crash
~$ sudo apt install crash
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
libdw1
Use 'apt-get autoremove' to remove it.
Suggested packages:
kexec-tools makedumpfile
The following NEW packages will be installed:
crash
0 upgraded, 1 newly installed, 0 to remove and 219 not upgraded.
Need to get 0 B/2,421 kB of archives.
After this operation, 7,623 kB of additional disk space will be used.
Selecting previously unselected package crash.
(Reading database ... 183601 files and directories currently installed.)
Preparing to unpack .../crash_7.0.3-3ubuntu4.5_amd64.deb ...
Unpacking crash (7.0.3-3ubuntu4.5) ...
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
Setting up crash (7.0.3-3ubuntu4.5) ...
2. Install kdump-tools
~$ sudo apt install kdump-tools
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
kexec-tools makedumpfile
The following NEW packages will be installed:
kdump-tools kexec-tools makedumpfile
0 upgraded, 3 newly installed, 0 to remove and 219 not upgraded.
Need to get 0 B/212 kB of archives.
After this operation, 756 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Preconfiguring packages ...
Selecting previously unselected package makedumpfile.
(Reading database ... 183615 files and directories currently installed.)
Preparing to unpack .../makedumpfile_1.5.5-2ubuntu1.6_amd64.deb ...
Unpacking makedumpfile (1.5.5-2ubuntu1.6) ...
Selecting previously unselected package kexec-tools.
Preparing to unpack .../kexec-tools_1%3a2.0.6-0ubuntu2.3_amd64.deb ...
Unpacking kexec-tools (1:2.0.6-0ubuntu2.3) ...
Selecting previously unselected package kdump-tools.
Preparing to unpack .../kdump-tools_1.5.5-2ubuntu1.6_all.deb ...
Unpacking kdump-tools (1.5.5-2ubuntu1.6) ...
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
Processing triggers for ureadahead (0.100.0-16) ...
ureadahead will be reprofiled on next reboot
Setting up makedumpfile (1.5.5-2ubuntu1.6) ...
Setting up kexec-tools (1:2.0.6-0ubuntu2.3) ...
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-3.13.0-24-generic
Found initrd image: /boot/initrd.img-3.13.0-24-generic
Found memtest86+ image: /boot/memtest86+.elf
Found memtest86+ image: /boot/memtest86+.bin
done
Setting up kdump-tools (1.5.5-2ubuntu1.6) ...
kdump-tools stop/waiting
3. Install kexec-tools
~$ sudo apt install kexec-tools
Reading package lists... Done
Building dependency tree
Reading state information... Done
kexec-tools is already the newest version.
kexec-tools set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 219 not upgraded.
4. Install makedumpfile
~$ sudo apt install makedumpfile
Reading package lists... Done
Building dependency tree
Reading state information... Done
makedumpfile is already the newest version.
makedumpfile set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 219 not upgraded.
5. After installation, check the GRUB file /boot/grub/grub.cfg. The kernel boot command line now carries one extra parameter (in two places): crashkernel=384M-:128M
linux /boot/vmlinuz-3.13.0-24-generic root=UUID=273c313e-f524-4b65-b9e3-9412d836485b ro find_preseed=/preseed.cfg auto noprompt priority=critical locale=en_US quiet crashkernel=384M-:128M
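The value 384M-:128M uses the kernel's range form of crashkernel, `<start>-[<end>]:<size>`: on a machine whose RAM is at least 384M (with no upper bound), 128M is reserved. The sketch below mimics that selection rule for illustration only. The function names are hypothetical, RAM is passed in MiB, and the kernel's real parser may apply further constraints that are not modeled here.

```shell
# Convert a size such as 128M, 2G or 512K to MiB (integer division for K).
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *K) echo $(( ${1%K} / 1024 )) ;;
        *)  echo $(( $1 / 1048576 )) ;;   # plain byte count
    esac
}

# Print the reservation (in MiB) that SPEC selects for a machine with
# RAM_MB of memory, or 0 if no range matches.
crashkernel_reserve_mb() {
    spec="${1%%@*}"          # drop an optional @offset suffix
    ram_mb="$2"
    old_ifs="$IFS"; IFS=','
    for entry in $spec; do
        case "$entry" in
            *:*) ;;                                          # range form, handled below
            *)   IFS="$old_ifs"; to_mb "$entry"; return 0 ;; # fixed size, e.g. 128M
        esac
        range="${entry%%:*}"; size="${entry#*:}"
        start="${range%%-*}"; end="${range#*-}"
        start_mb=$(to_mb "$start")
        if [ "$ram_mb" -ge "$start_mb" ]; then
            # An empty end means the range is open-ended.
            if [ -z "$end" ] || [ "$ram_mb" -lt "$(to_mb "$end")" ]; then
                IFS="$old_ifs"
                to_mb "$size"
                return 0
            fi
        fi
    done
    IFS="$old_ifs"
    echo 0
}
```

For example, `crashkernel_reserve_mb '384M-:128M' 3567` prints 128, while a machine with only 256 MiB of RAM would get 0, meaning no reservation.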
6. Edit the kdump configuration file (/etc/default/kdump-tools)
Change USE_KDUMP=0 to USE_KDUMP=1.
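The same change can be made non-interactively. This is a minimal sketch assuming GNU sed (for `-i`) and a line that reads exactly USE_KDUMP=0; the function name and the .bak backup suffix are illustrative.

```shell
# Flip USE_KDUMP from 0 to 1 in the given file
# (default: /etc/default/kdump-tools) and print the resulting setting.
enable_kdump() {
    conf="${1:-/etc/default/kdump-tools}"
    # Keep a backup copy next to the file, then edit it in place.
    cp "$conf" "$conf.bak" &&
    sed -i 's/^USE_KDUMP=0/USE_KDUMP=1/' "$conf" &&
    grep '^USE_KDUMP=' "$conf"
}
```

On the real file the function has to run with root privileges, for example from a root shell.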
7. Check whether the kdump-related services are enabled
~$ sudo service --status-all
……
[ - ] kdump-tools
[ + ] kerneloops
[ ? ] kexec
[ ? ] kexec-load
……
8. Start kdump
~$ sudo /etc/init.d/kdump-tools start
kdump-tools stop/waiting
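III. Verifying the Setup
After completing the steps above, reboot the computer or virtual machine.
1. Check the address range reserved for the crash kernel (the information should appear only after step 6 of Section II has been completed)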
~$ cat /proc/iomem | grep -i crash
2d000000-34ffffff : Crash kernel
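The hexadecimal range can be converted into a start offset and a size with shell arithmetic. The helper below is a sketch; its name and output format are illustrative.

```shell
# Print the start offset and size, both in MiB, of a /proc/iomem range
# such as "2d000000-34ffffff" (the end address is inclusive).
iomem_range_mb() {
    start_hex="${1%%-*}"
    end_hex="${1##*-}"
    start=$(( 0x$start_hex ))
    end=$(( 0x$end_hex ))
    echo "start=$(( start / 1048576 ))MB size=$(( (end - start + 1) / 1048576 ))MB"
}

iomem_range_mb 2d000000-34ffffff   # prints: start=720MB size=128MB
```

The result agrees with the crashkernel=384M-:128M parameter: 128 MB is reserved, starting at the 720 MB mark.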
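2. Check the size of the memory reserved for the crash kernel, in bytes (the information should appear only after step 6 of Section II has been completed)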
~$ cat /sys/kernel/kexec_crash_size
134217728
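3. Check the crashkernel reservation in the kernel log (the information should appear only after step 6 of Section II has been completed)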
~$ sudo dmesg | grep crash
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=273c313e-f524-4b65-b9e3-9412d836485b ro find_preseed=/preseed.cfg auto noprompt priority=critical locale=en_US quiet crashkernel=384M-:128M
[ 0.000000] Reserving 128MB of memory at 720MB for crashkernel (System RAM: 3567MB)
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=273c313e-f524-4b65-b9e3-9412d836485b ro find_preseed=/preseed.cfg auto noprompt priority=critical locale=en_US quiet crashkernel=384M-:128M
4. Check the status of the kdump service
~$ sudo kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x2d000000
current state: ready to kdump
kexec command:
/sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=273c313e-f524-4b65-b9e3-9412d836485b ro find_preseed=/preseed.cfg auto noprompt priority=critical locale=en_US quiet irqpoll maxcpus=1 nousb" --initrd=/boot/initrd.img-3.13.0-24-generic /boot/vmlinuz-3.13.0-24-generic
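For scripted checks, the "current state" line is the one that matters. The sketch below reads `kdump-config show` output on standard input and reports whether it says "ready to kdump"; the function name is hypothetical, and the match relies only on the line format shown above.

```shell
# Read `kdump-config show` output on stdin; print "ready" when the
# "current state" line reports "ready to kdump", otherwise "not ready".
kdump_state() {
    if grep -q 'current state:[[:space:]]*ready to kdump'; then
        echo "ready"
    else
        echo "not ready"
        return 1
    fi
}
```

Typical use would be `sudo kdump-config show | kdump_state`.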
5. Configure kdump
The crashkernel memory size is supposed to be changed through /etc/default/grub.d/kdump-tools.cfg. However, /etc/default/grub.d/ contains no kdump-tools.cfg file, only kexec-tools.cfg, whose content is as follows:
~$ cat /etc/default/grub.d/kexec-tools.cfg
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=384M-:128M"
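Guidance on choosing the crashkernel size can be found online. On my machine with 8 GB of RAM, however, the default 128M was not enough for crashkernel; following Ubuntu's guidance and setting it to 192M worked. If crashkernel turns out to be too small, try increasing it by 128M at a time.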
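Changing the reserved size amounts to editing the value after the colon in kexec-tools.cfg. The sketch below does that with GNU sed; the function name is hypothetical, and it assumes the file holds a single crashkernel=<range>:<size> entry like the one shown above.

```shell
# Replace the size after the colon in "crashkernel=<range>:<size>" with
# NEW_SIZE (e.g. 192M) and print the resulting parameter.
set_crashkernel_size() {
    new_size="$1"
    cfg="${2:-/etc/default/grub.d/kexec-tools.cfg}"
    sed -i "s/\(crashkernel=[^:\" ]*:\)[0-9]*[KMG]/\1${new_size}/" "$cfg" &&
    grep -o 'crashkernel=[^" ]*' "$cfg"
}
```

After running `set_crashkernel_size 192M` as root, regenerate the GRUB configuration with `sudo update-grub` and reboot so that the new reservation takes effect.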