Debugging the Kernel with KVM

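Copyright notice: this is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/leoufung/article/details/48781083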

 

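Although a KVM virtual machine is just a process on the host, you cannot debug it the way you would a UML instance, by attaching gdb directly to that process. KVM and UML are entirely different, and if you try, you will find that you have only attached to the qemu-system-x86 process itself: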

(gdb) bt

#0 0x00007f8dba022ed2 in select () from /lib64/libc.so.6

#1 0x00007f8dbdd2118a in ?? () from /usr/local/bin/qemu-system-x86_64

#2 0x00007f8dbdd1a798 in main () from /usr/local/bin/qemu-system-x86_64

(gdb)

To debug a KVM guest kernel with gdb, you need two options of qemu-system-x86:

-s shorthand for -gdb tcp::1234

-S freeze CPU at startup (use 'c' to start execution)

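The -s option lets gdb connect to qemu remotely for debugging. The -S option freezes the guest CPU at startup, before any guest code has run, and waits for gdb to connect and resume execution. Without -S, the guest boots without waiting: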

[root@localhost kvm]# qemu-system-x86_64 -hda vdisk.img -net none -m 1024 -daemonize -cpu host -smp 2 -vnc :1 -s
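If the guest is started often with slightly different debugging options, the command line above can be assembled by a small helper. The following is only a minimal Python sketch: the function name build_qemu_args and its parameters are hypothetical, and the flags are the ones quoted in this article (-s is shorthand for -gdb tcp::1234, and -S freezes the CPU at startup).

```python
def build_qemu_args(disk, memory_mb=1024, cpus=2, vnc_display=1,
                    gdb_port=1234, wait_for_gdb=False):
    """Return the qemu-system-x86_64 argument list used in this article.

    All parameter names are illustrative assumptions; only the qemu
    flags themselves come from the article.
    """
    args = [
        "qemu-system-x86_64",
        "-hda", disk,
        "-net", "none",
        "-m", str(memory_mb),
        "-daemonize",
        "-cpu", "host",
        "-smp", str(cpus),
        "-vnc", ":%d" % vnc_display,
    ]
    if gdb_port == 1234:
        # -s is shorthand for "-gdb tcp::1234"
        args.append("-s")
    else:
        args += ["-gdb", "tcp::%d" % gdb_port]
    if wait_for_gdb:
        # -S freezes the CPU at startup until gdb issues "c"
        args.append("-S")
    return args


if __name__ == "__main__":
    # Print the command line instead of launching qemu
    print(" ".join(build_qemu_args("vdisk.img")))
```

With the defaults this prints the same command as above. To actually launch the guest, the returned list could be passed to subprocess.run.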

 

Do not use the qemu-kvm command: with it, the guest did not stop at breakpoints. qemu-system-x86_64 does not have this problem.

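gdb can connect through 127.0.0.1:1234 or simply :1234 (when gdb runs on the same machine), or through 192.168.1.1:1234 (when gdb runs on another machine and the KVM host's IP is 192.168.1.1). Assuming gdb is run on the host itself: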

 

[root@localhost kvm]# gdb

GNU gdb Fedora (6.8-37.el5)

Copyright (C) 2008 Free Software Foundation, Inc.

License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law. Type "show copying"

and "show warranty" for details.

This GDB was configured as "x86_64-redhat-linux-gnu".

(gdb) target remote :1234

Remote debugging using :1234

[New Thread 1]

Remote 'g' packet reply is too long: d85f8780ffffffff88f58680ffffffff00000000000000000000000000000000180000000000000020fb7c80ffffffff40318880ffffffff205f8780ffffffff000000000000000063c3dd712e00000072feff00000000004bb52180ffffffffb76ddbb66ddbb66d20748b80ffffffffc09c8b80ffffffff0000000000000000241c2280ffffffff4602000010000000180000001800000018000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007f03000000000000000000000000000000000000000000000000000000000000000000000000e03f00000000000000007b14ae47e17a843f00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000a01f0000

(gdb)

If you see the error above, first run set architecture i386:x86-64:intel so that gdb knows the architecture of the remote system. My KVM guest is x86-64:

[root@localhost ~]# uname -a

Linux localhost.localdomain 2.6.30-gentoo-r8 #55 SMP Thu May 10 20:05:44 CST 2012 x86_64 x86_64 x86_64 GNU/Linux

Set the architecture, then connect again:

(gdb) set architecture i386:x86-64:intel

The target architecture is assumed to be i386:x86-64:intel

(gdb) target remote :1234

Remote debugging using :1234

[New Thread 1]

0xffffffff80221c24 in ?? ()

(gdb)

Load the matching KVM guest kernel image, uncompressed of course (be sure to enable the kernel options [*] Compile the kernel with debug info and [*] Compile the kernel with frame pointers):

(gdb) file /tmp/vmlinux

A program is being debugged already.

Are you sure you want to change the file? (y or n) y

Reading symbols from /tmp/vmlinux...done.

(gdb) bt

#0 native_safe_halt () at /usr/src/linux-2.6.37.2/arch/x86/include/asm/irqflags.h:51

#1 0xffffffff80211e41 in default_idle ()

at /usr/src/linux-2.6.37.2/arch/x86/include/asm/paravirt.h:802

#2 0xffffffff8020ab67 in cpu_idle ()

at /usr/src/linux-2.6.37.2/arch/x86/kernel/process_64.c:149

#3 0xffffffff8061ab0d in rest_init () at /usr/src/linux-2.6.37.2/init/main.c:474

#4 0xffffffff808adcda in start_kernel () at /usr/src/linux-2.6.37.2/init/main.c:701

#5 0xffffffff808ad2a7 in x86_64_start_reservations (

real_mode_data=<value optimized out>)

at /usr/src/linux-2.6.37.2/arch/x86/kernel/head64.c:123

#6 0xffffffff808ad39f in x86_64_start_kernel (

real_mode_data=0x93050 <Address 0x93050 out of bounds>)

at /usr/src/linux-2.6.37.2/arch/x86/kernel/head64.c:94

#7 0x0000000000000000 in ?? ()

(gdb)

Set a breakpoint on __schedule:

(gdb) c

Continuing.

^C

Program received signal SIGINT, Interrupt.

native_safe_halt () at /usr/src/linux-2.6.37.2/arch/x86/include/asm/irqflags.h:51

51    /usr/src/linux-2.6.37.2/arch/x86/include/asm/irqflags.h: No such file or directory.

    in /usr/src/linux-2.6.37.2/arch/x86/include/asm/irqflags.h

(gdb) b __schedule

Breakpoint 1 at 0xffffffff80636792: file /usr/src/linux-2.6.37.2/kernel/sched.c, line 5022.

(gdb) c

Continuing.

[New Thread 4]

[Switching to Thread 4]

 

Breakpoint 1, __schedule () at /usr/src/linux-2.6.37.2/kernel/sched.c:5022

5022    /usr/src/linux-2.6.37.2/kernel/sched.c: No such file or directory.

    in /usr/src/linux-2.6.37.2/kernel/sched.c

(gdb) bt

#0 __schedule () at /usr/src/linux-2.6.37.2/kernel/sched.c:5022

#1 0xffffffff80636f51 in schedule () at /usr/src/linux-2.6.37.2/kernel/sched.c:5084

#2 0xffffffff8020ab88 in cpu_idle ()

at /usr/src/linux-2.6.37.2/arch/x86/kernel/process_64.c:159

#3 0xffffffff80632a4e in start_secondary (unused=<value optimized out>)

at /usr/src/linux-2.6.37.2/arch/x86/kernel/smpboot.c:329

#4 0x0000000000000000 in ?? ()

(gdb)
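The interactive steps so far (set the architecture, connect, load symbols, set a breakpoint) can also be collected into a gdb command file and replayed with gdb -x. The following is a minimal Python sketch that generates such a file. The function name gdb_command_file and the file name kvm.gdb are hypothetical, while the gdb commands are the ones used in the sessions above. The sketch loads the symbols before connecting, which is a deliberate change of order so that gdb has no reason to ask whether to replace the file of a program already being debugged.

```python
def gdb_command_file(vmlinux, remote=":1234", breakpoints=("__schedule",),
                     architecture="i386:x86-64:intel"):
    """Return the text of a gdb command file for debugging a KVM guest.

    Parameter names are illustrative; the commands mirror the
    interactive sessions shown in this article.
    """
    lines = [
        "file %s" % vmlinux,                      # uncompressed kernel with debug info
        "set architecture %s" % architecture,     # avoids the 'g' packet length error
        "target remote %s" % remote,              # qemu gdb stub opened by -s
    ]
    lines += ["break %s" % name for name in breakpoints]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the script; afterwards run: gdb -x kvm.gdb
    with open("kvm.gdb", "w") as script:
        script.write(gdb_command_file("/tmp/vmlinux"))
```

After gdb has processed the file, type c at the prompt to let the guest run until the first breakpoint.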

If quitting gdb with the q command would terminate the KVM guest, run the detach command first and then quit gdb:

(gdb) q

The program is running. Exit anyway? (y or n) n

Not confirmed.

(gdb) detach

Ending remote debugging.

(gdb) q

[root@localhost kvm]#

A serial port is a great help when debugging the kernel. One can be added to a KVM guest like this:

[root@localhost kvm]# qemu-system-x86_64 -hda vdisk.img -net none -m 1024 -daemonize -cpu host -smp 2 -vnc :1 -s -serial file:serial.log

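This creates a file named serial.log in the current directory, and the guest's serial output is redirected into it. For example, once the serial console option (console=ttyS0,115200) is added to the guest's kernel command line, the guest's kernel messages are written to this file: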

[root@localhost kvm]# ls serial.log -lh

-rw-r----- 1 root root 21K May 11 16:56 serial.log

[root@localhost kvm]#
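Because serial.log keeps growing while the guest runs, it can be convenient to read only what has been appended since the last look. The following is a minimal Python sketch of that idea. The function name read_new_output is hypothetical, and the only assumption about the file is that qemu appends to it.

```python
def read_new_output(path, offset=0):
    """Return (text, new_offset) for bytes appended to path after offset.

    A sketch for following the serial log written by
    "-serial file:serial.log"; the caller keeps the returned offset and
    passes it back on the next call.
    """
    with open(path, "rb") as log:
        log.seek(offset)
        data = log.read()
    # Serial output may contain stray bytes, so never fail on decoding
    return data.decode("utf-8", errors="replace"), offset + len(data)


if __name__ == "__main__":
    import os
    if os.path.exists("serial.log"):
        text, position = read_new_output("serial.log")
        print(text, end="")
```

Calling the function again with the returned offset yields only the kernel messages written in the meantime.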

The guest's serial port can also be redirected to a listening TCP port:

[root@localhost kvm]# qemu-system-x86_64 -hda vdisk.img -net none -m 1024 -daemonize -cpu host -smp 2 -vnc :1 -s -serial tcp::1235,server

QEMU waiting for connection on: tcp:0.0.0.0:1235,server

After it starts, qemu-system-x86_64 waits for a connection. On the same machine you can run the following (in another terminal, of course):

[root@localhost ~]# telnet 127.0.0.1 1235

From another machine, run the following instead (as mentioned earlier, the KVM host's IP here is 192.168.1.1):

[root@localhost ~]# telnet 192.168.1.1 1235

After that, all serial output from the KVM guest is printed in the telnet session, and you can also log in to the guest through this serial channel.
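When the serial output only needs to be captured rather than used interactively, a short script can stand in for telnet. The following is a minimal Python sketch using the standard socket module. The function name read_serial and its parameters are hypothetical, and it assumes the guest was started with -serial tcp::1235,server as above.

```python
import socket


def read_serial(host, port, max_bytes=4096, timeout=5.0):
    """Connect to qemu's TCP serial port and return up to max_bytes as text.

    Reading stops when max_bytes have arrived, when qemu closes the
    connection, or when nothing arrives for timeout seconds.
    """
    chunks = []
    received = 0
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while received < max_bytes:
            try:
                data = conn.recv(min(1024, max_bytes - received))
            except socket.timeout:
                break  # the guest went quiet; return what we have
            if not data:
                break  # qemu closed the connection
            chunks.append(data)
            received += len(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__":
    try:
        print(read_serial("127.0.0.1", 1235), end="")
    except OSError as error:
        print("could not read the serial port:", error)
```

Unlike telnet, this sketch is read-only, so it cannot be used to log in to the guest.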

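One more problem I ran into: when connecting to the KVM guest from Windows with VNC Viewer 4, the keyboard stops responding once grub appears. Selecting, editing, or booting a kernel entry all fail, and the only thing left to do is kill qemu-system-x86_64 on the host. This is very inconvenient when upgrading kernels. Fortunately, qemu-system-x86_64 supports specifying the kernel image directly from outside the guest (see qemu-system-x86_64 --help for details):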

[root@localhost kvm]# qemu-system-x86_64 -hda vdisk.img -net none -m 1024 -daemonize -cpu host -smp 2 -vnc :1 -kernel vmlinuz-2.6.18-194.el5 -initrd initrd-2.6.18-194.el5.img

[root@localhost kvm]#

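So, right after installing the initial KVM guest, back up these two files to the host. That way, if a later kernel experiment goes wrong, you can still boot into the guest with this method and repair it (other tools, such as http://libguestfs.org/, might also work, but they are less direct and more cumbersome).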

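Debugging kernel modules in a KVM guest is a bit more involved. First you have to load the module's symbols into gdb yourself, and at the correct address. The module's load address can be found inside the guest with the following command: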

[root@localhost ~]# cat /proc/modules

igb 84012 0 - Live 0xffffffffa0007000

dca 6468 1 igb, Live 0xffffffffa0000000

[root@localhost ~]#

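Only two modules are loaded. Taking igb as an example, run add-symbol-file in gdb on the host. Here /tmp/igb.ko is the guest's igb module file, copied over to the host, and 0xffffffffa0007000 is the address shown in /proc/modules above: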

(gdb) add-symbol-file /tmp/igb.ko 0xffffffffa0007000

add symbol table from file "/tmp/igb.ko" at

    .text_addr = 0xffffffffa0007000

(y or n) y

Reading symbols from /tmp/igb.ko...done.

(gdb) c

Continuing.
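With more than a couple of modules, typing these commands by hand becomes tedious. The following is a minimal Python sketch that turns the text of /proc/modules into add-symbol-file commands. The function name add_symbol_commands and the ko_dir parameter are hypothetical, and the sketch assumes each module file has been copied to the host as ko_dir/NAME.ko, as was done for /tmp/igb.ko above. As in the session above, the address from /proc/modules is used as the .text address.

```python
def add_symbol_commands(proc_modules_text, ko_dir="/tmp"):
    """Build gdb add-symbol-file commands from the text of /proc/modules.

    Each line looks like "igb 84012 0 - Live 0xffffffffa0007000":
    name, size, reference count, dependents, state, load address.
    """
    commands = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # not a module line
        name, address = fields[0], fields[5]
        if int(address, 16) == 0:
            # Newer kernels hide addresses from unprivileged readers
            continue
        commands.append("add-symbol-file %s/%s.ko %s" % (ko_dir, name, address))
    return commands


if __name__ == "__main__":
    sample = ("igb 84012 0 - Live 0xffffffffa0007000\n"
              "dca 6468 1 igb, Live 0xffffffffa0000000\n")
    for command in add_symbol_commands(sample):
        print(command)
```

The printed lines can be pasted into gdb or appended to a gdb command file.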

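Now try setting a breakpoint on the igb_clean_tx_irq function inside the igb module. It triggers immediately (because I am connected over ssh through the igb NIC), so everything seems to work: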

(gdb) b igb_clean_tx_irq

Breakpoint 2 at 0xffffffffa000a5a8

(gdb) c

Continuing.

[New Thread 2]

[Switching to Thread 2]

 

Breakpoint 2, 0xffffffffa000a5a8 in igb_clean_tx_irq ()

(gdb) bt

#0 0xffffffffa000a5a8 in igb_clean_tx_irq ()

#1 0xffffffffa000c19e in igb_msix_tx ()

#2 0xffffffff8027cb92 in handle_IRQ_event (irq=27, action=0xffff88003e18bf40)

at /usr/src/linux-2.6.37.2/kernel/irq/handle.c:371

#3 0xffffffff8027e9f0 in handle_edge_irq (irq=27, desc=0xffff88003e6a85c0)

at /usr/src/linux-2.6.37.2/kernel/irq/chip.c:514

#4 0xffffffff8020de43 in handle_irq (irq=27, regs=<value optimized out>)

at /usr/src/linux-2.6.37.2/include/linux/irq.h:312

#5 0xffffffff8020d6a1 in do_IRQ (regs=0xffff88003f89de18) at /usr/src/linux-2.6.37.2/arch/x86/kernel/irq.c:215

#6 0xffffffff8020c453 in common_interrupt ()

#7 0xffff88003f89de40 in ?? ()

#8 0x0000000000000000 in ?? ()

(gdb) c

Continuing.

Without the corresponding add-symbol-file command, this is what happens:

(gdb) b igb_clean_tx_irq

Function "igb_clean_tx_irq" not defined.

Make breakpoint pending on future shared library load? (y or [n]) n

Even if you answer y, the breakpoint will never be hit afterwards.

 

 

 

 

 

 

 

 

 

 

 

 

 
