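Preface
This article is a translation and summary of the kernel document "Kprobe-based Event Tracing".
A kprobe trace event can be set on any function except those with the __kprobes/nokprobe_inline annotation and those marked with NOKPROBE_SYMBOL.
Before use, the kernel option CONFIG_KPROBE_EVENTS=y must be enabled.
Kprobe trace points are added through /sys/kernel/debug/tracing/kprobe_events and then enabled by writing to /sys/kernel/debug/tracing/events/kprobes/<EVENT>/enable.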
Synopsis of kprobe_events
p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS] : Set a probe
r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS] : Set a return probe
p[:[GRP/]EVENT] [MOD:]SYM[+0]%return [FETCHARGS] : Set a return probe
-:[GRP/]EVENT : Clear a probe

GRP : Group name. If omitted, use "kprobes" for it.
EVENT : Event name. If omitted, the event name is generated
        based on SYM+offs or MEMADDR.
MOD : Module name which has given SYM.
SYM[+offs] : Symbol+offset where the probe is inserted.
SYM%return : Return address of the symbol
MEMADDR : Address where the probe is inserted.
MAXACTIVE : Maximum number of instances of the specified function that
            can be probed simultaneously, or 0 for the default value
            as defined in Documentation/trace/kprobes.rst section 1.3.1.

FETCHARGS : Arguments. Each probe can have up to 128 args.
  %REG : Fetch register REG
  @ADDR : Fetch memory at ADDR (ADDR should be in kernel)
  @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
  $stackN : Fetch Nth entry of stack (N >= 0)
  $stack : Fetch stack address.
  $argN : Fetch the Nth function argument. (N >= 1) (*1)
  $retval : Fetch return value. (*2)
  $comm : Fetch current task comm.
  +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address. (*3)(*4)
  \IMM : Store an immediate value to the argument.
  NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
  FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
                  (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
                  (x8/x16/x32/x64), "string", "ustring" and bitfield
                  are supported.

(*1) only for the probe on function entry (offs == 0).
(*2) only for return probe.
(*3) this is useful for fetching a field of data structures.
(*4) "u" means user-space dereference. See User Memory Access.
Types
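Several types are supported for fetch-args. The kprobe tracer accesses memory according to the given type. The prefixes 's' and 'u' mean the type is signed and unsigned respectively. The prefix 'x' implies the type is unsigned. Traced arguments are shown in decimal ('s' and 'u') or hexadecimal ('x'). Without type casting, 'x32' or 'x64' is used depending on the architecture (for example, x86-32 uses x32 and x86-64 uses x64).
These value types can also be arrays. To record array data, add '[N]' to the base type, where N is a fixed number less than 64. For example, 'x16[4]' means an array of x16 (2-byte hex) with 4 elements. Note that arrays can only be applied to memory-type fetchargs, not to registers, stack entries and so on (for example, '$stack1:x8[8]' is wrong, but '+8($stack):x8[8]' is OK).
The string type is a special type that fetches a "null-terminated" string from kernel space. This means the fetch fails and stores NULL if the string container has been paged out. The "ustring" type is the alternative to string for user space. See User Memory Access for more information.
The string array type differs a little from the other types. For the other base types, <base-type>[1] is equal to <base-type> (for example, +0(%di):x32[1] is the same as +0(%di):x32). But string[1] is not equal to string. The string type itself represents a "char array", while the string array type represents a "char * array". So, for example, +0(%di):string[1] is equal to +0(+0(%di)):string.
Bitfield is another special type, which takes 3 parameters: bit-width, bit-offset, and container-size (usually 32). The syntax is:
b<bit-width>@<bit-offset>/<container-size>
The symbol type ('symbol') is an alias of the u32 or u64 type (depending on BITS_PER_LONG) that shows the given pointer in "symbol+offset" style. For $comm, the default type is "string"; any other type is invalid.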
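As a minimal sketch of the type syntax above, the following shell snippet only composes and prints a probe definition; it does not write to tracefs. The helper typed_arg and the event name typed_probe are hypothetical names used for illustration, and the use of %si for the filename pointer assumes the x86-64 calling convention.

```shell
# Hypothetical helper: compose one "NAME=FETCHARG:TYPE" argument.
typed_arg() {
    printf '%s=%s:%s' "$1" "$2" "$3"
}

# Assumption: x86-64, where the 2nd argument of do_sys_open (a user-space
# filename pointer) is passed in %si, so the "ustring" type is used.
path_arg=$(typed_arg path '+0(%si)' ustring)

# Arrays are only valid on memory fetchargs, so $stack is dereferenced first.
bytes_arg=$(typed_arg bytes '+8($stack)' 'x8[8]')

# Assemble the full definition and print it for inspection.
def="p:typed_probe do_sys_open $path_arg $bytes_arg"
echo "$def"
```

The printed line has the same shape as the definitions written to kprobe_events in the usage examples, so it can be reviewed before it is applied.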
User Memory Access
Omitted.
Per-Probe Event Filtering
After a kprobe event is added, the following files appear under tracing/events/kprobes/<EVENT>/:
- enable:
  You can enable/disable the probe by writing 1 or 0 on it.
- format:
  This shows the format of this probe event.
- filter:
  You can write filtering rules of this event.
- id:
  This shows the id of this probe event.
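The filter file can be driven from a small script. The sketch below assumes the myprobe event from the usage examples already exists; the function names and the TRACEFS variable are hypothetical, and the expression 'dfd == 3' is only an illustrative rule on one of that event's fields.

```shell
# Assumption: tracefs is reachable under debugfs; override TRACEFS if not.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# Write a filter expression for one kprobe event (hypothetical helper).
set_probe_filter() {
    printf '%s\n' "$2" > "$TRACEFS/events/kprobes/$1/filter"
}

# Writing 0 to the filter file removes the filter again.
clear_probe_filter() {
    echo 0 > "$TRACEFS/events/kprobes/$1/filter"
}

# Example (needs root and an existing "myprobe" event):
#   set_probe_filter myprobe 'dfd == 3'
#   clear_probe_filter myprobe
```

Keeping the path in one variable makes the same helpers usable on systems where tracefs is mounted somewhere else.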
Profiling
root@VM-0-9-ubuntu:/sys/kernel/debug/tracing# cat kprobe_profile
myprobe 262235 0
myretprobe 217254 0
The number of probe hits and misses can be checked through /sys/kernel/debug/tracing/kprobe_profile. The first column is the event name, the second is the number of probe hits, and the third is the number of probe misses.
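To make the three columns easier to read, the profile can be piped through awk. This is a sketch rather than part of the kernel tooling; the function name summarize_profile is hypothetical, and it assumes the whitespace-separated three-column layout shown above.

```shell
# Label the three kprobe_profile columns: event name, hits, misses.
summarize_profile() {
    awk 'NF >= 3 { printf "%s hits=%d missed=%d\n", $1, $2, $3 }'
}

# Example (needs root):
#   summarize_profile < /sys/kernel/debug/tracing/kprobe_profile
```

A non-zero value in the last column means some probe invocations were not recorded.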
Kernel Boot Parameter
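New kprobe events can be added and enabled at boot time with the "kprobe_event=" kernel parameter. The parameter accepts a semicolon-delimited list of kprobe events whose format is similar to that of kprobe_events. The difference is that the arguments of a probe definition are delimited by commas instead of spaces. For example, the myprobe event on do_sys_open is normally defined as: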
p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)
For the kernel boot parameter, the spaces are simply replaced with commas:
p:myprobe,do_sys_open,dfd=%ax,filename=%dx,flags=%cx,mode=+4($stack)
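The space-to-comma conversion is mechanical, so it can be scripted. The sketch below uses a hypothetical helper named to_boot_param that turns one kprobe_events definition into the matching boot parameter; several events would still have to be joined with semicolons by hand.

```shell
# Convert one kprobe_events definition into "kprobe_event=..." form
# by replacing every space with a comma.
to_boot_param() {
    printf 'kprobe_event=%s\n' "$(printf '%s' "$1" | tr ' ' ',')"
}

# Single quotes keep the shell from expanding $stack.
to_boot_param 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)'
```

How the resulting string must be quoted on the kernel command line depends on the bootloader in use.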
Usage examples
- Add a new kprobe event at function entry
echo 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' > /sys/kernel/debug/tracing/kprobe_events
This sets a kprobe at the top of the do_sys_open() function and records its 1st to 4th arguments as the "myprobe" event.
Note that which register or stack entry holds each function argument depends on the arch-specific ABI. If you are unsure of the ABI, try the probe subcommand of perf-tools (found under tools/perf/). As this example shows, users can choose more familiar names for each argument.
The definition of the newly added kprobe event can be checked with:
cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
name: myprobe
ID: 780
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:unsigned long __probe_ip; offset:12; size:4; signed:0;
field:int __probe_nargs; offset:16; size:4; signed:1;
field:unsigned long dfd; offset:20; size:4; signed:0;
field:unsigned long filename; offset:24; size:4; signed:0;
field:unsigned long flags; offset:28; size:4; signed:0;
field:unsigned long mode; offset:32; size:4; signed:0;
print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->__probe_ip,
REC->dfd, REC->filename, REC->flags, REC->mode
- Add a new kretprobe event at the function return point
echo 'r:myretprobe do_sys_open $retval' >> /sys/kernel/debug/tracing/kprobe_events
This sets a kretprobe on the return point of the do_sys_open() function and records the return value as the "myretprobe" event.
- Enable the newly created kprobe events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable
- View the recorded kprobe events through the trace file
cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
<...>-1447 [001] 1038282.286875: myprobe: (do_sys_open+0x0/0xd6) dfd=3 filename=7fffd1ec4440 flags=8000 mode=0
<...>-1447 [001] 1038282.286878: myretprobe: (sys_openat+0xc/0xe <- do_sys_open) $retval=fffffffffffffffe
<...>-1447 [001] 1038282.286885: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=40413c flags=8000 mode=1b6
<...>-1447 [001] 1038282.286915: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
<...>-1447 [001] 1038282.286969: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=4041c6 flags=98800 mode=10
<...>-1447 [001] 1038282.286976: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
- Clear all probe events
echo > /sys/kernel/debug/tracing/kprobe_events
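The command above removes every probe at once. To remove a single event, the "-:[GRP/]EVENT" form from the synopsis can be appended instead. The sketch below wraps this in a hypothetical helper, remove_probe, and assumes the event lives in the default "kprobes" group; it disables the event first, since an enabled event is expected to be refused as busy.

```shell
# Assumption: tracefs is reachable under debugfs; override TRACEFS if not.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# Disable one kprobe event, then delete only that event.
remove_probe() {
    echo 0 > "$TRACEFS/events/kprobes/$1/enable" &&
        echo "-:$1" >> "$TRACEFS/kprobe_events"
}

# Example (needs root):
#   remove_probe myprobe
```

The append redirection (>>) matters here: a plain > would truncate kprobe_events before the removal line is written.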
References
https://www.kernel.org/doc/html/v4.17/trace/kprobetrace.html