Kprobe-based Event Tracing

Preface

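This article is a translation and summary of the kernel document "Kprobe-based Event Tracing".
Kprobe events can be set on any function except those with the __kprobes/nokprobe_inline annotation and those marked NOKPROBE_SYMBOL.
To use them, the kernel must be built with CONFIG_KPROBE_EVENTS=y.

Add kprobe trace points by writing definitions to /sys/kernel/debug/tracing/kprobe_events, then enable each one by writing 1 to the enable file in its own directory under /sys/kernel/debug/tracing/events/kprobes/.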
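A minimal sketch of these two steps as a shell function. The function name add_and_enable and the TRACEFS variable are hypothetical; TRACEFS defaults to the debugfs path used throughout this article (newer systems often also mount tracefs at /sys/kernel/tracing). It has to run as root.

```shell
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# Hypothetical helper, not part of the kernel: add one kprobe event and enable it.
# $1: event name (must match the name inside the definition, default group "kprobes")
# $2: a kprobe_events definition, e.g. 'p:myprobe do_sys_open'
add_and_enable() {
    # ">>" appends, so probe events that already exist are kept
    printf '%s\n' "$2" >> "$TRACEFS/kprobe_events" || return 1
    # the kernel creates the event directory once the definition is accepted
    echo 1 > "$TRACEFS/events/kprobes/$1/enable"
}
```

Usage: add_and_enable myprobe 'p:myprobe do_sys_open'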

Synopsis of kprobe_events

 p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]  : Set a probe
 r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]  : Set a return probe
 p[:[GRP/]EVENT] [MOD:]SYM[+0]%return [FETCHARGS]      : Set a return probe
 -:[GRP/]EVENT                                         : Clear a probe

GRP            : Group name. If omitted, use "kprobes" for it.
EVENT          : Event name. If omitted, the event name is generated
                 based on SYM+offs or MEMADDR.
MOD            : Module name which has given SYM.
SYM[+offs]     : Symbol+offset where the probe is inserted.
SYM%return     : Return address of the symbol
MEMADDR        : Address where the probe is inserted.
MAXACTIVE      : Maximum number of instances of the specified function that
                 can be probed simultaneously, or 0 for the default value
                 as defined in Documentation/trace/kprobes.rst section 1.3.1.

FETCHARGS      : Arguments. Each probe can have up to 128 args.
 %REG          : Fetch register REG
 @ADDR         : Fetch memory at ADDR (ADDR should be in kernel)
 @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
 $stackN       : Fetch Nth entry of stack (N >= 0)
 $stack        : Fetch stack address.
 $argN         : Fetch the Nth function argument. (N >= 1) (*1)
 $retval       : Fetch return value. (*2)
 $comm         : Fetch current task comm.
 +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address. (*3)(*4)
 \IMM          : Store an immediate value to the argument.
 NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
 FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
                 (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
                 (x8/x16/x32/x64), "string", "ustring" and bitfield
                 are supported.

 (*1) only for the probe on function entry (offs == 0).
 (*2) only for return probe.
 (*3) this is useful for fetching a field of data structures.
 (*4) "u" means user-space dereference. See User Memory Access.
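To make the grammar concrete, here is a rough sketch with two hypothetical shell helpers: split_def breaks a definition into the parts named in the synopsis (filling in the default group "kprobes" from the GRP entry), and deref nests +OFFS(...) dereferences the way the +|-[u]OFFS(FETCHARG) form is used to reach structure fields. Both handle only the common forms shown here; the kernel's own parser is authoritative, and the offsets in the usage line are made up.

```shell
# Hypothetical helper: split a kprobe_events definition into the synopsis parts.
# Handles "p[:[GRP/]EVENT] SYM ..." and "r[MAXACTIVE][:[GRP/]EVENT] SYM ..." only.
split_def() {
    head=${1%% *}            # first word, e.g. "p:mygrp/myprobe" or "r16"
    rest=${1#* }             # everything after the first space
    target=${rest%% *}       # probe location, e.g. "do_sys_open"
    type=${head%%:*}         # "p", "r" or "r<MAXACTIVE>"
    group=kprobes            # default group when GRP is omitted
    event=""                 # empty: the kernel generates a name
    case $head in
        *:*)
            name=${head#*:}
            case $name in
                */*) group=${name%%/*}; event=${name#*/} ;;
                *)   event=$name ;;
            esac ;;
    esac
    printf 'type=%s group=%s event=%s target=%s\n' \
        "$type" "$group" "${event:-<auto>}" "$target"
}

# Hypothetical helper: wrap a base fetcharg in nested +OFFS(...) or -OFFS(...)
# dereferences, innermost first, e.g. to follow ptr->a->b.
deref() {
    fa=$1; shift
    for off in "$@"; do
        case $off in
            -*) fa="${off}(${fa})" ;;
            *)  fa="+${off}(${fa})" ;;
        esac
    done
    printf '%s\n' "$fa"
}
```

For example, deref '%di' 8 0 prints +0(+8(%di)); appending :string would read a string through a pointer stored 8 bytes into the structure that %di points to (a hypothetical layout).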

Types

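Several types are supported for fetch-args. The kprobe tracer accesses memory according to the given type. The prefixes 's' and 'u' mean the type is signed or unsigned respectively; the 'x' prefix implies it is unsigned. Traced arguments are shown in decimal ('s' and 'u') or hexadecimal ('x'). Without a type cast, 'x32' or 'x64' is used depending on the architecture (e.g. x86-32 uses x32, and x86-64 uses x64).

These value types can be arrays. To record array data, add '[N]' to the base type, where N is a fixed number less than 64. For example, 'x16[4]' means an array of x16 (2-byte hex) with 4 elements. Note that arrays can only be applied to memory-type fetchargs, not to registers or stack entries (e.g. '$stack1:x8[8]' is wrong, but '+8($stack):x8[8]' is OK).

The string type is a special type that fetches a null-terminated string from kernel space. This means it will fail and store NULL if the string container has been paged out. The "ustring" type is the alternative for user-space strings; see User Memory Access for more information. The string array type is a bit different from the other types. For other base types, <base-type>[1] is equal to <base-type> (e.g. +0(%di):x32[1] is the same as +0(%di):x32), but string[1] is not equal to string. The string type itself represents a "char array", whereas the string array type represents a "char * array". So, for example, +0(%di):string[1] is equal to +0(+0(%di)):string.

Bitfield is another special type, which takes 3 parameters: bit-width, bit-offset, and container-size (usually 32). The syntax is:

b<bit-width>@<bit-offset>/<container-size>

The symbol type ('symbol') is an alias of the u32 or u64 type (depending on BITS_PER_LONG) that shows the given pointer in "symbol+offset" style. For $comm, the default type is "string"; any other type is invalid.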
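A small sketch of what a bitfield fetcharg extracts, assuming (as the kernel's implementation does) that bit-offset counts from the least significant bit of the container. decode_bitfield is a hypothetical helper that only mimics the arithmetic offline on a value you already have; it does not read kernel memory.

```shell
# Hypothetical helper: apply a "b<bit-width>@<bit-offset>/<container-size>"
# spec to a container value, e.g. decode_bitfield b4@8/32 0x1234 prints 2.
decode_bitfield() {
    spec=$1
    value=$2
    width=${spec#b};    width=${width%%@*}     # text between "b" and "@"
    offset=${spec#*@};  offset=${offset%%/*}   # text between "@" and "/"
    container=${spec#*/}                       # text after "/"
    # the field must fit inside the container
    if [ $(($width + $offset)) -gt "$container" ]; then
        echo "b${width}@${offset}/${container}: field exceeds container" >&2
        return 1
    fi
    # shift the field down to bit 0, then keep only bit-width bits
    echo $(( ($value >> $offset) & ((1 << $width) - 1) ))
}
```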

User Memory Access

Per-Probe Event Filtering

Once a kprobe event is added, a directory named after the event appears under tracing/events/kprobes/, containing the following files:

  • enable:
    You can enable/disable the probe by writing 1 or 0 to it.
  • format:
    This shows the format of this probe event.
  • filter:
    You can write filtering rules for this event.
  • id:
    This shows the id of this probe event.
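For example, a filter can restrict an event to matching records before it is enabled. The helper below is a hypothetical sketch; field names in the expression are the argument names from the probe definition (such as dfd in the myprobe example in the usage section), and writing 0 to the filter file clears the filter again.

```shell
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# Hypothetical helper: set a filter on a kprobe event (default group "kprobes"),
# then enable it so only matching hits are recorded.
# $1: event name, $2: filter expression, e.g. 'dfd == 3'
filter_and_enable() {
    dir="$TRACEFS/events/kprobes/$1"
    # the kernel rejects the write if the expression is invalid
    printf '%s\n' "$2" > "$dir/filter" || return 1
    echo 1 > "$dir/enable"
}
```

Usage: filter_and_enable myprobe 'dfd == 3'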

Event Profiling

root@VM-0-9-ubuntu:/sys/kernel/debug/tracing# cat kprobe_profile 
  myprobe                                               262235               0
  myretprobe                                            217254               0

You can check the number of probe hits and misses in /sys/kernel/debug/tracing/kprobe_profile. The first column is the event name, the second is the number of probe hits, and the third is the number of probe miss-hits.
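Because the file is plain columns, standard tools can post-process it. A hypothetical awk-based sketch that prints only the events with a non-zero miss count (for instance, a return probe whose MAXACTIVE may be too low for the number of concurrent calls):

```shell
# Hypothetical helper: list kprobe events whose probes were missed.
# Input columns: event name, hit count, miss count.
show_missed() {
    awk '$3 > 0 { printf "%s hits=%s misses=%s\n", $1, $2, $3 }' \
        "${1:-/sys/kernel/debug/tracing/kprobe_profile}"
}
```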

Kernel Boot Parameter

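You can add and enable new kprobe events when booting the kernel with the "kprobe_event=" parameter. The parameter accepts a semicolon-delimited list of kprobe events, in a format similar to kprobe_events. The difference is that the probe definition parameters are comma-delimited instead of space-delimited. For example, to add the myprobe event on do_sys_open, take its kprobe_events definition: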

p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)

For the kernel boot parameter, simply replace the spaces with commas:

p:myprobe,do_sys_open,dfd=%ax,filename=%dx,flags=%cx,mode=+4($stack)
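The conversion is mechanical, so it can be scripted. to_boot_param is a hypothetical helper that turns one or more kprobe_events definitions into a single kprobe_event= value, joining definitions with semicolons and replacing spaces with commas. It assumes the definitions contain no commas of their own.

```shell
# Hypothetical helper: build a kprobe_event= boot parameter from definitions.
to_boot_param() {
    out=""
    for def in "$@"; do
        # translate spaces to commas and squeeze runs of commas
        def=$(printf '%s\n' "$def" | tr -s ' ' ',')
        # join with ";" (no leading ";" for the first definition)
        out="${out:+$out;}$def"
    done
    printf 'kprobe_event=%s\n' "$out"
}
```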

Usage Examples

  1. Add a new kprobe event at a function's entry
echo 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' > /sys/kernel/debug/tracing/kprobe_events

This sets a kprobe on the top of the do_sys_open() function, recording the 1st to 4th arguments as the “myprobe” event.

Note: which register or stack entry is assigned to each function argument depends on the architecture-specific ABI; the registers in this example (%ax, %dx, %cx, then the stack) follow the 32-bit x86 kernel convention. If you are unsure of the ABI, try the probe subcommand of perf-tools (found under tools/perf/). As this example shows, users can choose more familiar names for each argument.
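To illustrate how the register mapping changes with the ABI, here is a hedged sketch for x86-64, assuming the standard x86-64 calling convention (first four integer arguments in %di, %si, %dx, %cx) and a kernel new enough to support the ustring type, since filename points to user memory. The :s32 cast on dfd should make AT_FDCWD print as -100 rather than ffffff9c. The function name is hypothetical; on kernels that support it, $arg1 to $arg4 avoid hard-coding registers.

```shell
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# Hypothetical helper. Assumption: x86-64, where
# do_sys_open(dfd, filename, flags, mode) receives its arguments in
# %di, %si, %dx and %cx. filename is a user-space pointer, hence "ustring".
add_open_probe_x86_64() {
    printf '%s\n' \
        'p:myprobe do_sys_open dfd=%di:s32 filename=+0(%si):ustring flags=%dx:x32 mode=%cx:x16' \
        >> "$TRACEFS/kprobe_events"
}
```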

You can inspect the format of the newly added kprobe event with:

cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
name: myprobe
ID: 780
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3; size:1;signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:unsigned long __probe_ip; offset:12;      size:4; signed:0;
        field:int __probe_nargs;        offset:16;      size:4; signed:1;
        field:unsigned long dfd;        offset:20;      size:4; signed:0;
        field:unsigned long filename;   offset:24;      size:4; signed:0;
        field:unsigned long flags;      offset:28;      size:4; signed:0;
        field:unsigned long mode;       offset:32;      size:4; signed:0;


print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->__probe_ip,
REC->dfd, REC->filename, REC->flags, REC->mode
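The format file is also the quickest way to see which fields an event records. event_fields is a hypothetical sed-based sketch that lists the field names after the common_* header; these are the names you would typically reference in the event's filter. Its simple pattern accepts only a plain identifier before the semicolon, so array fields such as char comm[16] are skipped.

```shell
# Hypothetical helper: list the non-common field names of a trace event
# from its format file (path given as $1).
event_fields() {
    # capture the identifier between the type and ";" on each "field:" line
    sed -n 's/.*field:[^;]* \([A-Za-z_][A-Za-z0-9_]*\);.*/\1/p' "$1" |
        grep -v '^common_'
}
```

Usage: event_fields /sys/kernel/debug/tracing/events/kprobes/myprobe/format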

  2. Add a new kretprobe event at the function's return point

echo 'r:myretprobe do_sys_open $retval' >> /sys/kernel/debug/tracing/kprobe_events

This sets a kretprobe on the return point of the do_sys_open() function, recording the return value as the “myretprobe” event.

  3. Enable the new kprobe events

echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable
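Events can also be toggled per group: like other trace event systems, the kprobes group directory has its own enable file that switches every event in it at once. A hypothetical helper:

```shell
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# Hypothetical helper: write 1 (enable, the default) or 0 (disable) to the
# group-level enable file, which switches every event in the "kprobes" group.
kprobes_group_enable() {
    echo "${1:-1}" > "$TRACEFS/events/kprobes/enable"
}
```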

  4. View the recorded kprobe events through the trace file

cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#           TASK-PID    CPU#    TIMESTAMP  FUNCTION
#              | |       |          |         |
           <...>-1447  [001] 1038282.286875: myprobe: (do_sys_open+0x0/0xd6) dfd=3 filename=7fffd1ec4440 flags=8000 mode=0
           <...>-1447  [001] 1038282.286878: myretprobe: (sys_openat+0xc/0xe <- do_sys_open) $retval=fffffffffffffffe
           <...>-1447  [001] 1038282.286885: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=40413c flags=8000 mode=1b6
           <...>-1447  [001] 1038282.286915: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
           <...>-1447  [001] 1038282.286969: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=4041c6 flags=98800 mode=10
           <...>-1447  [001] 1038282.286976: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
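To pull out just the return values, a hypothetical awk sketch that reads trace text on stdin. Note that $retval is shown in hex without a type: fffffffffffffffe is -2 in 64-bit two's complement, i.e. -ENOENT, so the first open in this trace failed. Declaring the probe with $retval:s64 should print such values as signed decimals.

```shell
# Hypothetical helper: print the $retval values recorded by one return probe
# event (default "myretprobe") from trace output read on stdin.
retvals() {
    awk -v ev="${1:-myretprobe}:" '
        index($0, ev) {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^\$retval=/) {
                    sub(/^\$retval=/, "", $i)
                    print $i
                }
        }'
}
```

Usage: retvals < /sys/kernel/debug/tracing/trace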

  5. Clear the probe events

echo > /sys/kernel/debug/tracing/kprobe_events
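Removing probes fails with EBUSY while an event is still enabled, so in practice disable it first. A hedged sketch of both cases, using the -:[GRP/]EVENT form from the synopsis to delete a single event; the function name is hypothetical.

```shell
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"

# Hypothetical helper: remove one kprobe event ($1) or, with no argument, all of them.
remove_kprobe() {
    if [ -n "$1" ]; then
        # disable first, then delete just this event
        echo 0 > "$TRACEFS/events/kprobes/$1/enable" || return 1
        printf '%s\n' "-:$1" >> "$TRACEFS/kprobe_events"
    else
        # disable the whole group if it exists, then truncate to remove every probe
        if [ -e "$TRACEFS/events/kprobes/enable" ]; then
            echo 0 > "$TRACEFS/events/kprobes/enable"
        fi
        : > "$TRACEFS/kprobe_events"
    fi
}
```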

References

https://www.kernel.org/doc/html/v4.17/trace/kprobetrace.html
