ftrace
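ftrace helps developers understand the runtime behavior of the Linux kernel, for debugging faults or analyzing performance.
ftrace started out as a function tracer that could only record the kernel's function call flow. It has since grown into a framework with a plugin model, so developers can add further kinds of tracers.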
Ftrace is maintained by Steven Rostedt, who was at Red Hat when ftrace was merged into the kernel.
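1. Kernel configuration (enabling ftrace)
The relevant .config options are listed below. The CONFIG_HAVE_* entries are capabilities selected by the architecture rather than options you set yourself; the user-visible switches are under "Kernel hacking > Tracers" in menuconfig.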
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
CONFIG_DYNAMIC_FTRACE_WITH_ARGS=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE_MCOUNT_USE_CC=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT=y
CONFIG_HAVE_SAMPLE_FTRACE_DIRECT_MULTI=y
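Before rebuilding, it can help to check which of these options the running kernel already has. The sketch below is a hypothetical helper, check_kconfig (not part of any kernel tool). It assumes the config is available either as a plain file such as /boot/config-$(uname -r), which many distributions install, or as /proc/config.gz, which only exists when CONFIG_IKCONFIG_PROC=y.

```shell
#!/bin/sh
# Sketch (hypothetical helper): report which CONFIG_* options are not "=y"
# in a kernel config file. Accepts a plain .config or a gzip'ed
# /proc/config.gz (that file only exists with CONFIG_IKCONFIG_PROC=y).
# Returns 0 if every option is enabled, 1 otherwise.
check_kconfig() {
    [ $# -ge 1 ] || { echo "usage: check_kconfig FILE [OPTION...]" >&2; return 2; }
    cfg=$1
    shift
    case $cfg in
        *.gz) reader="gzip -dc" ;;   # decompress /proc/config.gz on the fly
        *)    reader="cat" ;;
    esac
    status=0
    for opt in "$@"; do
        # An enabled built-in option is stored as the exact line "CONFIG_FOO=y".
        if ! $reader "$cfg" | grep -qxF "${opt}=y"; then
            echo "missing: $opt"
            status=1
        fi
    done
    return $status
}
```

For example, check_kconfig /boot/config-"$(uname -r)" CONFIG_FTRACE CONFIG_DYNAMIC_FTRACE CONFIG_FTRACE_SYSCALLS prints one "missing:" line per option that is not built in. Only "=y" counts as enabled, so an option set to "=m" would also be reported as missing.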
2. Mount debugfs
mount -t debugfs debugfs /sys/kernel/debug/
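Since Linux 4.1 these files are provided by tracefs. Once debugfs is mounted, tracefs is mounted automatically under /sys/kernel/debug/tracing; it can also be mounted on its own with "mount -t tracefs nodev /sys/kernel/tracing". The following sketch, a hypothetical helper named find_tracefs, looks for whichever of the two locations is populated, assuming that a directory containing a "trace" file is a mounted tracefs.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the first directory that looks like a
# mounted tracefs, i.e. one that contains a "trace" file. Without arguments
# it checks the two standard locations.
find_tracefs() {
    [ $# -gt 0 ] || set -- /sys/kernel/tracing /sys/kernel/debug/tracing
    for d in "$@"; do
        if [ -f "$d/trace" ]; then
            echo "$d"
            return 0
        fi
    done
    echo "no tracefs found; mount debugfs or tracefs first" >&2
    return 1
}
```

For example, TRACEFS=$(find_tracefs) && cd "$TRACEFS" moves into the tracing directory, whichever path the system uses.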
3. View a function's call stack (vfs_read() as an example)
Run all of the following commands in the /sys/kernel/debug/tracing/ directory:
echo 1 > options/func_stack_trace
echo vfs_read > set_ftrace_filter
echo 1 > tracing_on
echo function > current_tracer
echo 0 > tracing_on
cat trace | head -n 20
Output:
/sys/kernel/debug/tracing # cat trace | head -n 20
# tracer: function
#
# entries-in-buffer/entries-written: 418/418 #P:1
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
sh-78 [000] ..... 332.939292: vfs_read <-ksys_read
sh-78 [000] ..... 332.940093: <stack trace>
=> 0xffffffffc0333083
=> vfs_read
=> ksys_read
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
sh-78 [000] ..... 332.940622: vfs_read <-ksys_read
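The steps above leave the function tracer, the filter and func_stack_trace enabled. The sketch below wraps them in a hypothetical shell function, trace_stack, which runs a given command as the workload and resets everything afterwards; TRACEFS is an assumed variable naming the tracing directory. It sets set_ftrace_filter before turning on func_stack_trace and turns func_stack_trace off before clearing the filter, following the kernel's ftrace documentation, which warns that a stack trace for every traced function critically degrades performance.

```shell
#!/bin/sh
# Sketch (hypothetical helper): record a kernel stack trace for every call to
# FUNC while CMD runs, print the trace, then restore the defaults.
# TRACEFS is an assumed variable; it defaults to /sys/kernel/debug/tracing.
trace_stack() {
    [ $# -ge 2 ] || { echo "usage: trace_stack FUNC CMD [ARG...]" >&2; return 2; }
    t=${TRACEFS:-/sys/kernel/debug/tracing}
    func=$1
    shift
    echo 0 > "$t/tracing_on"
    echo "$func" > "$t/set_ftrace_filter"   # restrict the tracer to FUNC first
    echo function > "$t/current_tracer"
    echo 1 > "$t/options/func_stack_trace"  # add a stack trace after each hit
    echo > "$t/trace"                       # clear old entries
    echo 1 > "$t/tracing_on"
    "$@"                                    # workload to observe
    echo 0 > "$t/tracing_on"
    cat "$t/trace"
    # Put tracing back to its boot defaults; disable the option before
    # clearing the filter.
    echo 0 > "$t/options/func_stack_trace"
    echo nop > "$t/current_tracer"
    echo > "$t/set_ftrace_filter"
    echo 1 > "$t/tracing_on"
}
```

For example, trace_stack vfs_read sh -c 'cat /etc/hostname > /dev/null' | head -n 20. The filter selects a function, not a process, so reads made by other processes during that window appear as well.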
4. View the functions a function calls (vfs_read() as an example)
Run all of the following commands in the /sys/kernel/debug/tracing/ directory:
echo function_graph > current_tracer
echo vfs_read > set_graph_function
echo 5 > max_graph_depth
echo 1 > tracing_on
echo 0 > tracing_on
cat trace | head -n 40
Output:
/sys/kernel/debug/tracing # cat trace | head -n 40
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) + 26.397 us | mutex_unlock();
0) | vfs_read() {
0) | rw_verify_area() {
0) | security_file_permission() {
0) | selinux_file_permission() {
0) 2.639 us | __inode_security_revalidate();
0) 1.278 us | avc_policy_seqno();
0) + 11.926 us | }
0) + 15.646 us | }
0) + 19.372 us | }
0) | new_sync_read() {
0) | tty_read() {
0) 1.007 us | tty_paranoia_check();
0) | tty_ldisc_ref_wait() {
0) 1.485 us | ldsem_down_read();
0) 3.461 us | }
0) | n_tty_read() {
0) 1.854 us | mutex_lock_interruptible();
0) 1.532 us | down_read();
0) 3.663 us | add_wait_queue();
0) 2.981 us | copy_from_read_buf();
0) 5.688 us | n_tty_check_unthrottle();
0) 1.014 us | n_tty_kick_worker();
0) 1.039 us | up_read();
0) 2.701 us | remove_wait_queue();
0) 0.978 us | mutex_unlock();
0) + 36.193 us | }
0) | tty_ldisc_deref() {
0) 1.004 us | ldsem_up_read();
0) 2.937 us | }
0) 0.978 us | ktime_get_real_seconds();
0) + 56.611 us | }
0) + 60.801 us | }
0) + 93.459 us | }
0) | vfs_read() {
0) | rw_verify_area() {
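Likewise, the function_graph steps can be wrapped so that set_graph_function and max_graph_depth do not stay set after the experiment. This is a sketch under the same assumptions: trace_graph is a hypothetical name and TRACEFS an assumed variable for the tracing directory. Writing 0 to max_graph_depth removes the depth limit, which is its default.

```shell
#!/bin/sh
# Sketch (hypothetical helper): show the call graph under FUNC, at most 5
# levels deep, while CMD runs, then restore the defaults.
# TRACEFS is an assumed variable; it defaults to /sys/kernel/debug/tracing.
trace_graph() {
    [ $# -ge 2 ] || { echo "usage: trace_graph FUNC CMD [ARG...]" >&2; return 2; }
    t=${TRACEFS:-/sys/kernel/debug/tracing}
    func=$1
    shift
    echo 0 > "$t/tracing_on"
    echo "$func" > "$t/set_graph_function"  # graph only FUNC and its callees
    echo 5 > "$t/max_graph_depth"           # same depth as the manual steps
    echo function_graph > "$t/current_tracer"
    echo > "$t/trace"                       # clear old entries
    echo 1 > "$t/tracing_on"
    "$@"                                    # workload to observe
    echo 0 > "$t/tracing_on"
    cat "$t/trace"
    # Put tracing back to its boot defaults (depth 0 means no limit).
    echo nop > "$t/current_tracer"
    echo 0 > "$t/max_graph_depth"
    echo > "$t/set_graph_function"
    echo 1 > "$t/tracing_on"
}
```

For example, trace_graph vfs_read sh -c 'cat /etc/hostname > /dev/null' | head -n 40.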
tracepoint
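A tracepoint is a hook placed in advance at a chosen point in a kernel function. When execution reaches that point, the hook calls the probe functions attached to it. A tracepoint can have one or more probes, and a probe can implement any behavior, so tracepoints make it possible to observe what happens inside a function. When no probe is attached, the tracepoint is "off" and costs little more than a branch check.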
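1. Kernel configuration (enabling tracepoints)
CONFIG_HAVE_SYSCALL_TRACEPOINTS only means the architecture supports syscall tracepoints; the syscalls events used in step 3 additionally require CONFIG_FTRACE_SYSCALLS=y.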
CONFIG_TRACEPOINTS=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
2. Mount debugfs
mount -t debugfs debugfs /sys/kernel/debug/
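Tracepoints that are defined as trace events appear in the tracing directory: available_events lists them as subsystem:event, one per line, and each event has a directory under events/ containing files such as enable and format. The sketch below uses two hypothetical helpers, list_events and event_format (with TRACEFS again an assumed variable), to find events by name and to print an event's record layout.

```shell
#!/bin/sh
# Sketch (hypothetical helper): list the trace events whose name contains
# PATTERN. available_events holds one "subsystem:event" entry per line.
# TRACEFS is an assumed variable; it defaults to /sys/kernel/debug/tracing.
list_events() {
    [ $# -eq 1 ] || { echo "usage: list_events PATTERN" >&2; return 2; }
    t=${TRACEFS:-/sys/kernel/debug/tracing}
    grep -F -- "$1" "$t/available_events"
}

# Sketch (hypothetical helper): print the record layout (field names and
# types) of one event given as "subsystem:event".
event_format() {
    [ $# -eq 1 ] || { echo "usage: event_format SUBSYS:EVENT" >&2; return 2; }
    t=${TRACEFS:-/sys/kernel/debug/tracing}
    cat "$t/events/${1%%:*}/${1#*:}/format"
}
```

For example, list_events mkdir should include syscalls:sys_enter_mkdir when CONFIG_FTRACE_SYSCALLS is enabled, and event_format syscalls:sys_enter_mkdir shows the pathname and mode fields that appear in the output of step 3.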
3. Use a tracepoint (the mkdir system call as an example)
Run all of the following commands in the /sys/kernel/debug/tracing/ directory:
echo 1 > events/syscalls/sys_enter_mkdir/enable
echo 1 > tracing_on
cd ~
mkdir haha
cd /sys/kernel/debug/tracing/
cat trace
Output:
/sys/kernel/debug/tracing # cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 2/2 #P:1
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
mkdir-84 [000] ...1. 697.463523: sys_mkdir(pathname: 7ffe07d21f4d, mode: 1ff)
mkdir-85 [000] ...1. 707.071031: sys_mkdir(pathname: 7fff72d3af6c, mode: 1ff)
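The manual steps above leave sys_enter_mkdir enabled. A hypothetical wrapper, trace_event, can instead enable one event only for the duration of a command, assuming TRACEFS names the tracing directory. It clears the buffer first, since with the nop tracer old entries would otherwise remain in trace.

```shell
#!/bin/sh
# Sketch (hypothetical helper): enable one trace event ("subsystem:event")
# while CMD runs, disable it again, then print the trace.
# TRACEFS is an assumed variable; it defaults to /sys/kernel/debug/tracing.
trace_event() {
    [ $# -ge 2 ] || { echo "usage: trace_event SUBSYS:EVENT CMD [ARG...]" >&2; return 2; }
    t=${TRACEFS:-/sys/kernel/debug/tracing}
    ev_dir="$t/events/${1%%:*}/${1#*:}"
    shift
    echo > "$t/trace"           # clear old entries
    echo 1 > "$ev_dir/enable"
    echo 1 > "$t/tracing_on"
    "$@"                        # workload to observe
    echo 0 > "$ev_dir/enable"
    cat "$t/trace"
}
```

For example, trace_event syscalls:sys_enter_mkdir mkdir /tmp/haha. As in the output above, pathname is recorded as a user-space address rather than as a string.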