Ftrace 基本用法
Ftrace是内核自带的调试工具,从2.6就开始支持了,可以辅助定位内核问题。
目前可支持的trace类型:
function
function_graph
blk
hwlat
irqsoff
preemptoff
preemptirqsoff
wakeup
wakeup_rt
wakeup_dl
mmiotrace
branch
nop
内核配置
-> Kernel hacking
-> Tracers
Ftrace三个CL
1.设置tracer类型
2.设置tracer参数
3.使能tracer
Ftrace实例
以 跟踪dev_attrs_show函数所需要的所有命令,并且有注释。
进入ftrace工作目录
$ cd /sys/kernel/debug/tracing
查看系统支持的trace类型:
$ cat available_tracers
hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
关闭tracer:
$ echo 0 > tracing_on
测试function 实例
$ echo function > current_tracer
set_ftrace_filter 表示要跟踪的函数,这里使用 dev_attr_show函数:
$ echo dev_attr_show > set_ftrace_filter
开启tracer:
$ echo 1 > tracing_on
查看trace结果:
# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 3/3 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
thermald-961 [000] .... 345.540878: dev_attr_show <-sysfs_kf_seq_show
thermald-961 [000] .... 345.540905: dev_attr_show <-sysfs_kf_seq_show
thermald-961 [000] .... 345.540927: dev_attr_show <-sysfs_kf_seq_show
从 trace 信息我们可以获取很多重要信息:
- 进程信息,TASK-PID
- 进程运行的 CPU
- 执行函数时的系统状态,包括中断,抢占等状态信息
- 执行函数的时间辍
从上图可以看到 function trace 一个函数的方法基本就是三板斧:
- 设置 current_tracer 为 function
- 设置要 trace 的函数
- 打开 trace 开关,开始 trace
- 提取 trace 结果
测试function_graph 实例
$ cat available_tracers
hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
$ echo function_graph > current_tracer
set_graph_function表示要跟踪的函数
$ echo dev_attr_show > set_graph_function
$ echo 1 > tracing_on
查看trace内容:
# cat trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) | dev_attr_show() {
0) | energy_uj_show() {
0) | get_energy_counter [intel_rapl_common]() {
0) | cpus_read_lock() {
0) | _cond_resched() {
0) 0.283 us | rcu_all_qs();
0) 0.880 us | }
0) 1.521 us | }
0) | rapl_read_data_raw [intel_rapl_common]() {
0) | rapl_msr_read_raw [intel_rapl_msr]() {
0) | rdmsrl_safe_on_cpu() {
0) | rdmsr_safe_on_cpu() {
0) 0.271 us | __init_waitqueue_head();
0) | smp_call_function_single_async() {
0) | generic_exec_single() {
0) | __rdmsr_safe_on_cpu() {
0) | complete() {
0) 0.268 us | _raw_spin_lock_irqsave();
0) | __wake_up_locked() {
0) 0.298 us | __wake_up_common();
0) 0.836 us | }
0) 0.281 us | _raw_spin_unlock_irqrestore();
0) 2.476 us | }
0) 3.610 us | }
0) 4.315 us | }
0) 4.941 us | }
0) | wait_for_completion() {
0) | _cond_resched() {
0) 0.270 us | rcu_all_qs();
0) 0.800 us | }
0) 0.271 us | _raw_spin_lock_irq();
0) 1.920 us | }
0) 8.231 us | }
0) 8.824 us | }
0) 9.413 us | }
0) + 10.670 us | }
0) 0.269 us | cpus_read_unlock();
0) + 13.757 us | }
0) + 15.976 us | }
0) + 17.029 us | }
跟踪的是 dev_attr_show
函数,但是 function_graph tracer 会跟踪函数内的调用关系和函数执行时间,可以协助我们确定代码执行流程。比如一个函数内部执行了很多函数指针,不能确定到底执行的是什么函数,可以用 function_graph tracer 跟踪一下。
trace event用法
trace event 就是利用 ftrace 框架,实现低性能损耗,对执行流无影响的一种信息输出机制。相比 printk,trace event:
- 不开启没有性能损耗
- 开启后不影响代码流程
- 不需要重新编译内核即可获取 debug 信息
系统支持的所有 trace event 都位于 /sys/kernel/debug/tracing/events
目录。
使用实例
设置trace event
打开 sched_switch event.
$ cd /sys/kernel/debug/tracing/events/sched/sched_switch
$ echo 1 > enable
查看信息:
# echo 0 > trace
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 221/221 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [003] d... 3408.075314: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=chrome next_pid=4737 next_prio=120
chrome-4737 [003] d... 3408.075465: sched_switch: prev_comm=chrome prev_pid=4737 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
<idle>-0 [003] d... 3408.100181: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=chrome next_pid=4737 next_prio=120
chrome-4737 [003] d... 3408.100333: sched_switch: prev_comm=chrome prev_pid=4737 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
<idle>-0 [003] d... 3408.125190: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=chrome next_pid=4737 next_prio=120
chrome-4737 [003] d... 3408.125339: sched_switch: prev_comm=chrome prev_pid=4737 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
<idle>-0 [003] d... 3408.150474: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=chrome next_pid=4737 next_prio=120
chrome-4737 [003] d... 3408.150680: sched_switch: prev_comm=chrome prev_pid=4737 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
<idle>-0 [003] d... 3408.175599: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=chrome next_pid=4737 next_prio=120
过滤信息:
# cd /sys/kernel/debug/tracing/events/sched/sched_switch# cat formatname: sched_switch...print fmt: "prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d", REC->prev_comm, REC->prev_pid, REC->prev_prio, (REC->prev_state & ((((0x0000 | 0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 | 0x0020 | 0x0040) + 1) << 1) - 1)) ? __print_flags(REC->prev_state & ((((0x0000 | 0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 | 0x0020 | 0x0040) + 1) << 1) - 1), "|", { 0x0001, "S" }, { 0x0002, "D" }, { 0x0004, "T" }, { 0x0008, "t" }, { 0x0010, "X" }, { 0x0020, "Z" }, { 0x0040, "P" }, { 0x0080, "I" }) : "R", REC->prev_state & (((0x0000 | 0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 | 0x0020 | 0x0040) + 1) << 1) ? "+" : "", REC->next_comm, REC->next_pid, REC->next_prio
从上面的 format 信息可以看出 sched_switch
打印的信息格式,基于上面提供的关键字可以实现信息过滤,比如下面的过滤命令可以只显示 chrome 进程的切换信息。
# cd /sys/kernel/debug/tracing/events/sched/sched_switch
# echo "prev_comm == 'chrome' || next_comm == 'chrome'" > filter
查看函数调用栈
trace函数的时候,设置 echo 1 > options/func_stack_trace 即可再trace结果中获取追踪函数的调用栈。
$ cd /sys/kernel/debug/tracing
$ sudo -s
# echo 0 > tracing_on
# echo function > current_tracer
# echo dev_attr_show > set_ftrace_filter
// 设置 func_stack_trace
# echo 1 > options/func_stack_trace
# echo 1 > tracing_on
# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 8/8 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
top-3008 [003] .... 621.507777: dev_attr_show <-sysfs_kf_seq_show
top-3008 [003] .... 621.507784: <stack trace>
=> dev_attr_show
=> sysfs_kf_seq_show
=> kernfs_seq_show
=> seq_read
=> kernfs_fop_read
=> __vfs_read
=> vfs_read
=> ksys_read
=> __x64_sys_read
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
=> 0
=> 0
=> 0
=> 0
=> 0
=> 0
=> 0
=> 0