SystemTap Usage Tips (Part 2)

zuxi

Article link: https://blog.csdn.net/wangzuxi/article/details/42976577

1. Accessing struct members

Take the following data structures as an example:

root@jusse ~/develop# cat -n cc_stap_test.c
     1  #include <stdio.h>
     2  
     3  typedef struct str {
     4      int    len;
     5      char *data;
     6  } str_t;
     7  
     8  typedef struct policy {
     9      str_t    name;
    10      int     id;
    11  } policy_t;
    12  
    13  int main(int argc, char *argv[])
    14  {
    15      policy_t policy;
    16      policy_t *p = &policy;
    17  
    18      p->id = 111;
    19      p->name.data = "test";
    20      p->name.len = sizeof("test")-1;
    21  
    22      printf("p->id: %d, p->name.data: %s, p->name.len: %d\n", p->id, p->name.data, p->name.len);
    23  
    24      return 0;
    25  }

root@jusse ~/develop# gcc -Wall -g -o cc_stap_test ./cc_stap_test.c

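Suppose we want to place a statement probe at line 22 and read the data field of name inside the local variable p. How should the stp script be written? This question bothered me for two days when I was first learning SystemTap, and I only found the answer after reading the official documentation carefully. The most intuitive way to write it would be: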

probe process("./cc_stap_test").statement("main@./cc_stap_test.c:22")
{
    printf("policy name: %s\n", $p->name.data);
}

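$p is a pointer, so accessing name through the -> arrow is fine. But name is a struct, not a pointer, so shouldn't its members be accessed with a dot, as in C? That is C thinking, and unfortunately it is wrong here: SystemTap already uses the dot as its string concatenation operator. SystemTap therefore treats struct variables and pointers to structs the same way, and -> is used to reach a member in both cases. The correct version of the example above is: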

probe process("./cc_stap_test").statement("main@./cc_stap_test.c:22")
{
    printf("policy name: %s\n", $p->name->data);
}

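2. Printing an entire data structure

When debugging, we sometimes need to dump an entire data structure to see what its members actually hold. To print every member of the variable p from the example above, write: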

root@jusse ~/develop# cat cc_stap_test.stp
probe process("./cc_stap_test").statement("main@./cc_stap_test.c:22")
{
    printf("$p$: %s, $p$$: %s\n", $p$, $p$$);
}

root@jusse ~/develop# stap cc_stap_test.stp -c './cc_stap_test'        
p->id: 111, p->name.data: test, p->name.len: 4
$p$: {.name={...}, .id=111}, $p$$: {.name={.len=4, .data="test"}, .id=111}

root@jusse ~/develop#

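Simply append $ or $$ to the variable. A single $ yields a string with the values of the struct's basic-type members, so $p$ above prints only id and abbreviates name as {...}. $$ works like $ but also expands nested structs, as the output shows.

3. Modifying a function's variables

Using the same example: line 18 of cc_stap_test.c assigns 111 to id. With a probe at line 20 we can use SystemTap to change the value of id: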

root@jusse ~/develop# cat cc_stap_set_var.stp 
probe process("./cc_stap_test").statement("main@./cc_stap_test.c:20")
{
    $p->id = 222;
    printf("$p$: %s, $p$$: %s\n", $p$, $p$$);
}

root@jusse ~/develop# stap -g cc_stap_set_var.stp -c './cc_stap_test'
p->id: 222, p->name.data: test, p->name.len: 4
$p$: {.name={...}, .id=222}, $p$$: {.name={.len=0, .data="test"}, .id=222}

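A plain assignment is all it takes. Just note that stap must be run with -g, because variables can only be modified in guru mode. The .len=0 in the $p$$ output is expected: the probe fires before line 20 has executed, so len has not been assigned yet.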

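4. Optional and try probe points

If you read SystemTap's tapsets, you may run into syntax you don't recognize, such as a question mark or exclamation mark after a probe point. One example is in /usr/local/share/systemtap/tapset/linux/memory.stp, where a probe lists its probe points as kernel.function("vm_mmap") !, kernel.function("do_mmap_pgoff") !, kernel.function("do_mmap") ?, kernel.function("do_mmap2") ?.

This probe has two ! and two ?. In short, ? marks an optional probe point and ! marks what I call a try probe point (the SystemTap documentation describes it as optional and sufficient). An optional probe point is one that is silently ignored, with no error, if it cannot be set. A try probe point means the later probe points are attempted only if this one fails. In this example SystemTap first tries kernel.function("vm_mmap"). If that succeeds, the remaining kernel.function("do_mmap_pgoff") !, kernel.function("do_mmap") ?, kernel.function("do_mmap2") ? are ignored. If it fails, SystemTap goes on to try kernel.function("do_mmap_pgoff"); again, on success the last two probe points are skipped, and only on failure does it fall back to kernel.function("do_mmap") and kernel.function("do_mmap2"), both of which are optional.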
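The resolution rule described above can be modeled in a few lines. This is only a sketch in Python, not SystemTap's real resolver: the function resolve, the MMAP_POINTS list, and the set of "available" kernel functions are all made up for illustration, and the suffix order is the one given in the text.

```python
# Simplified model of how "?" and "!" behave in a comma-separated
# probe point list. Illustration only, not SystemTap's implementation.

def resolve(probe_points, available):
    """Return the names of the probe points that would be set.

    probe_points: list of (function_name, suffix), suffix in "", "?", "!".
    available: set of function names that exist in the (hypothetical) kernel.
    """
    installed = []
    for name, suffix in probe_points:
        if name in available:
            installed.append(name)
            if suffix == "!":
                # "!" is sufficient: once it resolves, later points are skipped.
                break
        elif suffix == "":
            # A plain probe point must resolve, otherwise it is an error.
            raise LookupError("cannot resolve probe point: " + name)
        # "?" and "!" are optional: an unresolved point is silently skipped.
    return installed


# Probe points of the memory.stp example, in the order given in the text.
MMAP_POINTS = [
    ("vm_mmap", "!"),
    ("do_mmap_pgoff", "!"),
    ("do_mmap", "?"),
    ("do_mmap2", "?"),
]

if __name__ == "__main__":
    print(resolve(MMAP_POINTS, {"vm_mmap", "do_mmap"}))        # ['vm_mmap']
    print(resolve(MMAP_POINTS, {"do_mmap_pgoff", "do_mmap"}))  # ['do_mmap_pgoff']
    print(resolve(MMAP_POINTS, {"do_mmap", "do_mmap2"}))       # ['do_mmap', 'do_mmap2']
```

The third call shows the difference between the two suffixes: a ? point that resolves does not stop the search, so both do_mmap and do_mmap2 end up probed.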

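5. Tracing a process's execution flow

When I study new code, the first thing I want to understand is its processing flow. With haproxy and nginx, for example, I first wanted to see how they get from main to sending and receiving data through the kernel; that was my entry point into both projects. Before I knew SystemTap I had to read the code several times over, and places with conditional compilation or function pointers were hard to follow. Setting breakpoints in gdb eventually gave me the overall flow, but it was still not ideal. Only recently did I discover how convenient SystemTap is for this: it also records how long each function takes as it runs, so the whole call flow unfolds in real time. Here is a trace of an nginx worker process: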

root@jusse ~/systemtap# cat trace_nginx.stp           
probe process("/opt/nginx-dso/sbin/nginx").function("*").call
{
    printf("%s -> %s\n", thread_indent(4), ppfunc());
}

probe process("/opt/nginx-dso/sbin/nginx").function("*").return
{
    printf("%s <- %s\n", thread_indent(-4), ppfunc());
}

root@jusse ~/systemtap# stap -x 29774 trace_nginx.stp 
WARNING: function _start return probe is blacklisted: keyword at trace_nginx.stp:6:1
 source: probe process("/opt/nginx-dso/sbin/nginx").function("*").return
         ^
     0 nginx(29774):    -> ngx_time_update
    10 nginx(29774):    <- ngx_time_update
     0 nginx(29774):    -> ngx_event_process_posted
     3 nginx(29774):    <- ngx_event_process_posted
     0 nginx(29774):    -> ngx_event_expire_timers
     3 nginx(29774):    <- ngx_event_expire_timers
     0 nginx(29774):    -> ngx_event_process_posted
     2 nginx(29774):    <- ngx_event_process_posted
     0 nginx(29774):    -> ngx_process_events_and_timers
     5 nginx(29774):        -> ngx_event_find_timer
     8 nginx(29774):        <- ngx_event_find_timer
    11 nginx(29774):        -> ngx_trylock_accept_mutex
    16 nginx(29774):            -> ngx_shmtx_trylock
    19 nginx(29774):            <- ngx_shmtx_trylock
    21 nginx(29774):        <- ngx_trylock_accept_mutex
    25 nginx(29774):        -> ngx_epoll_process_events
500591 nginx(29774):            -> ngx_time_update
500604 nginx(29774):                -> ngx_gmtime
500608 nginx(29774):                <- ngx_gmtime
500617 nginx(29774):                -> ngx_vslprintf
500623 nginx(29774):                    -> ngx_sprintf_num
……
500655 nginx(29774):                    <- ngx_sprintf_num
500657 nginx(29774):                <- ngx_vslprintf
500659 nginx(29774):            <- ngx_sprintf
500664 nginx(29774):            -> ngx_localtime
500689 nginx(29774):            <- ngx_localtime
500695 nginx(29774):            -> ngx_vslprintf
500699 nginx(29774):                -> ngx_sprintf_num
……
500732 nginx(29774):                <- ngx_sprintf_num
500734 nginx(29774):            <- ngx_vslprintf
500736 nginx(29774):        <- ngx_sprintf
500741 nginx(29774):        -> ngx_vslprintf
500745 nginx(29774):            -> ngx_sprintf_num
……
500784 nginx(29774):            <- ngx_sprintf_num
500786 nginx(29774):        <- ngx_vslprintf
500788 nginx(29774):    <- ngx_sprintf
     0 nginx(29774):    -> ngx_vslprintf
     4 nginx(29774):        -> ngx_sprintf_num
……
    48 nginx(29774):        <- ngx_sprintf_num
    50 nginx(29774):    <- ngx_vslprintf
     0 nginx(29774): <- ngx_sprintf
     0 nginx(29774):    -> ngx_vslprintf
     5 nginx(29774):        -> ngx_sprintf_num
……
    25 nginx(29774):        <- ngx_sprintf_num
    27 nginx(29774):    <- ngx_vslprintf
     0 nginx(29774): <- ngx_sprintf
     0 nginx(29774): <- ngx_time_update
     0 nginx(29774): <- ngx_epoll_process_events

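(The ellipses mark repeated output that was removed; the same applies below.) The leading number is the time, in microseconds, at which the function call started or returned, counted from the first call at the outermost indentation level.

This effect comes from the thread_indent function, whose argument is the number of spaces to add. The function has a small problem, though: the padding it prints is not always right. Let's look at how it works.

By default its code is in /usr/local/share/systemtap/tapset/indent.stp. thread_indent calls _generic_indent directly, and _generic_indent calls _generic_indent_depth to obtain the number of alignment spaces, depth. _generic_indent_depth adds the delta argument passed in by thread_indent to the global array _indent_counters. Problems appear when thread_indent(4) and thread_indent(-4) are not symmetric, that is, when they are called a different number of times, which happens easily when analyzing a process that is already running. thread_indent(-4) is called in the function return probe, and when we attach to a running process the trace may start in the middle of a function, so the return probe fires without a matching call probe. As a result _indent_counters[idx] stays below 0, the x returned by _generic_indent_depth is always 0, and the output is effectively never indented, like this: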

root@jusse ~/systemtap# stap -x 29774 trace_nginx.stp
     0 nginx(29774):    -> ngx_vslprintf
     4 nginx(29774):        -> ngx_sprintf_num
……
    48 nginx(29774):        <- ngx_sprintf_num
    50 nginx(29774):    <- ngx_vslprintf
     0 nginx(29774): <- ngx_sprintf
     5 nginx(29774): -> ngx_vslprintf
     0 nginx(29774):    -> ngx_sprintf_num
……
     2 nginx(29774):    <- ngx_sprintf_num
     0 nginx(29774): <- ngx_vslprintf
     2 nginx(29774): <- ngx_sprintf
     3 nginx(29774): <- ngx_time_update
     5 nginx(29774): <- ngx_epoll_process_events
    11 nginx(29774): -> ngx_event_process_posted
    14 nginx(29774): <- ngx_event_process_posted
    19 nginx(29774): -> ngx_event_expire_timers
    22 nginx(29774): <- ngx_event_expire_timers
    25 nginx(29774): -> ngx_event_process_posted
    28 nginx(29774): <- ngx_event_process_posted
    30 nginx(29774): <- ngx_process_events_and_timers
    35 nginx(29774): -> ngx_process_events_and_timers

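The first part of the output is indented correctly; after that the indentation is lost.

The fix is to change _generic_indent_depth so that _indent_counters[idx] is never allowed to drop below 0.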
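To make the effect of that clamp concrete, here is a small Python model of the counter. It is a sketch under assumptions, not the tapset's code: the names indent_depth and trace are made up, and the "add delta before printing on a call, subtract after printing on a return" behavior is inferred from the trace output above rather than copied from indent.stp.

```python
# Simplified model of the indentation counter described above.
# Names and structure are illustrative, not the tapset's actual code.

def indent_depth(counters, idx, delta, clamp):
    """Update counters[idx] by delta and return the indentation width."""
    current = counters.get(idx, 0)
    # Assumed behavior: pre-increment on a call (delta > 0),
    # post-decrement on a return (delta < 0).
    width = current + (delta if delta > 0 else 0)
    current += delta
    if clamp and current < 0:
        # The fix: never let the counter drop below zero.
        current = 0
    counters[idx] = current
    return max(width, 0)


def trace(events, clamp):
    """Replay 'call'/'return' events for one thread and collect the widths."""
    counters = {}
    return [indent_depth(counters, 1, 4 if e == "call" else -4, clamp)
            for e in events]


if __name__ == "__main__":
    # Attaching mid-function: two returns arrive without their calls.
    events = ["return", "return", "call", "call", "return", "return"]
    print(trace(events, clamp=False))  # [0, 0, 0, 0, 0, 0]
    print(trace(events, clamp=True))   # [0, 0, 4, 8, 8, 4]
```

Without the clamp, the two unmatched returns push the counter to -8 and the following calls never climb back above 0, which matches the flat output shown earlier. With the clamp, indentation recovers as soon as the next call arrives.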

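6. Tracing specific processes

To trace a single process, just pass its PID to stap with the -x option (sometimes the stp code also has to filter with target()). But for programs such as haproxy or nginx that run several child processes, tracing the system calls of all of them has to be done in the script. It is fairly simple: get the process name with execname() and filter on it. Here is an example that traces haproxy's system calls: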

root@jusse ~/systemtap# ps aux | grep haproxy          
root       314  0.0  0.0   9388   912 pts/11   S+   09:08   0:00 grep --color haproxy
99       22785  0.0  0.4  11428  4440 ?        Ss   Jan22   0:22 ./haproxy -f ./haproxy.cfg
99       22786  0.0  0.3  10140  3132 ?        Ss   Jan22   0:21 ./haproxy -f ./haproxy.cfg
99       22787  0.0  0.2   9020  2084 ?        Ss   Jan22   0:21 ./haproxy -f ./haproxy.cfg
99       22788  0.0  0.2   9168  2076 ?        Ss   Jan22   0:21 ./haproxy -f ./haproxy.cfg

root@jusse ~/systemtap# cat trace_haproxy_syscall.stp 
probe nd_syscall.*
{
    procname = execname();
    if (procname =~ "haproxy.*") {
        printf("%s[%d]: %s\n", procname, pid(), name);
    }
}

root@jusse ~/systemtap# stap trace_haproxy_syscall.stp 
haproxy[22785]: gettimeofday
haproxy[22785]: gettimeofday
haproxy[22785]: epoll_wait
haproxy[22787]: gettimeofday
haproxy[22787]: gettimeofday
haproxy[22787]: epoll_wait
haproxy[22787]: gettimeofday