ptrace原理与性能对比

strace

starce在Linux上面是一个比较实用的工具,平常可以跟踪线上进程执行了哪些协同调用的工具,并在debug调试的情况下也能很好的使用,这一切的功能也都是基于Linux提供的强大的ptrace系统调用函数。

ptrace原理

根据官方介绍:

			 The ptrace() system call provides a means by which one process
       (the "tracer") may observe and control the execution of another
       process (the "tracee"), and examine and change the tracee's
       memory and registers.  It is primarily used to implement
       breakpoint debugging and system call tracing.

       A tracee first needs to be attached to the tracer.  Attachment
       and subsequent commands are per thread: in a multithreaded
       process, every thread can be individually attached to a
       (potentially different) tracer, or left not attached and thus not
       debugged.  Therefore, "tracee" always means "(one) thread", never
       "a (possibly multithreaded) process".  Ptrace commands are always
       sent to a specific tracee using a call of the form

           ptrace(PTRACE_foo, pid, ...)

       where pid is the thread ID of the corresponding Linux thread.

       (Note that in this page, a "multithreaded process" means a thread
       group consisting of threads created using the clone(2)
       CLONE_THREAD flag.)

       A process can initiate a trace by calling fork(2) and having the
       resulting child do a PTRACE_TRACEME, followed (typically) by an
       execve(2).  Alternatively, one process may commence tracing
       another process using PTRACE_ATTACH or PTRACE_SEIZE.

       While being traced, the tracee will stop each time a signal is
       delivered, even if the signal is being ignored.  (An exception is
       SIGKILL, which has its usual effect.)  The tracer will be
       notified at its next call to waitpid(2) (or one of the related
       "wait" system calls); that call will return a status value
       containing information that indicates the cause of the stop in
       the tracee.  While the tracee is stopped, the tracer can use
       various ptrace requests to inspect and modify the tracee.  The
       tracer then causes the tracee to continue, optionally ignoring
       the delivered signal (or even delivering a different signal
       instead).

       If the PTRACE_O_TRACEEXEC option is not in effect, all successful
       calls to execve(2) by the traced process will cause it to be sent
       a SIGTRAP signal, giving the parent a chance to gain control
       before the new program begins execution.

       When the tracer is finished tracing, it can cause the tracee to
       continue executing in a normal, untraced mode via PTRACE_DETACH.

ptrace主要是基于信号进行通信,并且有一个接受信号的进程与被监控的进程,进程之间通过信号来通信,基于如上的提示,我们简单实现一个监控指定进程的系统调用的程序。

首先在测试环境启动了一个codis-server,并且后续的请求都是基于codis-server进行的观察测试。

ps -ef|grep 6501
root      9491     1  4 Mar02 ?        01:04:32 /data/go/src/github.com/CodisLabs/codis/bin/codis-server *:6501

然后编写如下简单程序。

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sys/user.h>
#include <sys/reg.h>

int main()
{
    // 运行的codis-server的进程pid
    pid_t traced_process=9491;
    long orig_eax;
struct user_regs_struct regs;
    long ins;
    ptrace(PTRACE_ATTACH, traced_process, NULL, NULL);

    int flag = 0;
    long syscallID,returnValue;
    ptrace(PTRACE_SYSCALL,traced_process,NULL,NULL);
    while(1) {
         wait(NULL);
         if(flag==0) //第一次(进入系统调用),获取系统调用的参数
            {
                syscallID=ptrace(PTRACE_PEEKUSER, traced_process, ORIG_RAX*8, NULL);
                printf("Process executed system call ID = %ld ",syscallID);
                flag=1;
            }
          else //第二次(退出系统调用),获取系统调用的返回值
            {
                returnValue=ptrace(PTRACE_PEEKUSER, traced_process, RAX*4, NULL);
                printf("with return value= %ld \n", returnValue);
                flag=0;
            }
            ptrace(PTRACE_SYSCALL,traced_process,NULL,NULL);
    }
    ptrace(PTRACE_DETACH, traced_process,
           NULL, NULL);

    return 0;
}

然后进行编译运行。

gcc ptrace_redis.c -o ptrace_redis && ./ptrace_redis
Process executed system call ID = 232 with return value= 140361692218736
Process executed system call ID = 232 with return value= 140361692218736
Process executed system call ID = 2 with return value= 140361692218736
Process executed system call ID = 0 with return value= 6
Process executed system call ID = 3 with return value= 6
Process executed system call ID = 232 with return value= 140361692218736
Process executed system call ID = 2 with return value= 140361692218736
Process executed system call ID = 0 with return value= 6
Process executed system call ID = 3 with return value= 6
Process executed system call ID = 232 with return value= 140361692218736
Process executed system call ID = 2 with return value= 140361692218736
Process executed system call ID = 0 with return value= 6
Process executed system call ID = 3 with return value= 6

此时我们同样通过strace -p 9491来进行跟踪。

# strace -p 9491
strace: Process 9491 attached
epoll_wait(3, 0x7fa880d47f40, 10128, 100) = -1 ENOSYS (Function not implemented)
epoll_wait(3, 0x7fa880d47f40, 10128, 92) = -1 ENOSYS (Function not implemented)
open("/proc/9491/stat", O_RDONLY)       = -1 ENOSYS (Function not implemented)
read(6, 0x7ffeda4e74c0, 4096)           = -1 ENOSYS (Function not implemented)
close(6)                                = -1 ENOSYS (Function not implemented)
epoll_wait(3, 0x7fa880d47f40, 10128, 100) = -1 ENOSYS (Function not implemented)
open("/proc/9491/stat", O_RDONLY)       = -1 ENOSYS (Function not implemented)
read(6, 0x7ffeda4e74c0, 4096)           = -1 ENOSYS (Function not implemented)
close(6)                                = -1 ENOSYS (Function not implemented)
epoll_wait(3, 0x7fa880d47f40, 10128, 100) = -1 ENOSYS (Function not implemented)
open("/proc/9491/stat", O_RDONLY)       = -1 ENOSYS (Function not implemented)
read(6, 0x7ffeda4e74c0, 4096)           = -1 ENOSYS (Function not implemented)
close(6)                                = -1 ENOSYS (Function not implemented)
epoll_wait(3, 0x7fa880d47f40, 10128, 100) = -1 ENOSYS (Function not implemented)

我们通过查阅系统调用号可知,0为read,2为open,3为close,232为epoll_wait。通过两个程序对比就可以知道strace获取的内容与编写的程序获取的差不多一致。故ptrace的基本原理就是这样,strace同样基于此原理实现。

strace对性能的影响

虽然ptrace使用起来方便快捷,并且功能足够强大,使用该功能对性能有多少影响呢。

codis-server压测

首先进行codis-server的压测,并且在压测过程中使用perf top来监控codis-server进程。

# redis-benchmark -h ****  -p 6501  -n 100000 -c 10 -q
PING_INLINE: 12746.97 requests per second
PING_BULK: 11583.46 requests per second
SET: 11418.13 requests per second
GET: 12140.34 requests per second
INCR: 12886.60 requests per second
LPUSH: 12506.25 requests per second
RPUSH: 13262.60 requests per second
LPOP: 11645.51 requests per second
RPOP: 14249.07 requests per second
SADD: 13054.83 requests per second
HSET: 11622.50 requests per second
SPOP: 14427.93 requests per second
LPUSH (needed to benchmark LRANGE): 13670.54 requests per second
LRANGE_100 (first 100 elements): 9206.41 requests per second
LRANGE_300 (first 300 elements): 4941.20 requests per second
LRANGE_500 (first 450 elements): 3940.11 requests per second
LRANGE_600 (first 600 elements): 3397.66 requests per second
MSET (10 keys): 15583.61 requests per second

执行的perf record-p 9491 -g压测完成之后进行perf report进行数据展示,如下。

+   33.10%     0.02%  codis-server  [kernel.kallsyms]   [k] system_call_fastpath
+   23.11%     0.00%  codis-server  libpthread-2.17.so  [.] 0x00007fa887aa36fd
+   21.85%     0.15%  codis-server  [kernel.kallsyms]   [k] sys_write
+   21.14%     0.45%  codis-server  [kernel.kallsyms]   [k] vfs_write
+   20.22%     0.13%  codis-server  [kernel.kallsyms]   [k] do_sync_write
+   19.96%     0.15%  codis-server  [kernel.kallsyms]   [k] sock_aio_write
+   19.76%     0.22%  codis-server  [kernel.kallsyms]   [k] inet_sendmsg
+   18.47%     0.47%  codis-server  [kernel.kallsyms]   [k] tcp_sendmsg
+   15.33%     0.08%  codis-server  [kernel.kallsyms]   [k] tcp_push
+   15.29%     0.11%  codis-server  [kernel.kallsyms]   [k] __tcp_push_pending_frames
+   15.07%     0.34%  codis-server  [kernel.kallsyms]   [k] tcp_write_xmit
+   13.24%     0.41%  codis-server  [kernel.kallsyms]   [k] tcp_transmit_skb
+   12.04%     0.20%  codis-server  [kernel.kallsyms]   [k] ip_queue_xmit
+   11.67%     0.06%  codis-server  [kernel.kallsyms]   [k] ip_local_out_sk
+   11.57%     0.14%  codis-server  [kernel.kallsyms]   [k] ip_output
+   11.34%     0.51%  codis-server  [kernel.kallsyms]   [k] ip_finish_output
+    9.98%     0.05%  codis-server  [kernel.kallsyms]   [k] dev_queue_xmit
+    9.85%     0.19%  codis-server  [kernel.kallsyms]   [k] __dev_queue_xmit
+    9.45%     0.09%  codis-server  [kernel.kallsyms]   [k] sch_direct_xmit
+    9.42%     8.59%  codis-server  codis-server        [.] addReply
+    9.30%     0.02%  codis-server  [kernel.kallsyms]   [k] do_softirq
+    9.28%     0.00%  codis-server  [kernel.kallsyms]   [k] call_softirq
+    9.23%     0.00%  codis-server  [unknown]           [.] 0x00007fa87e80f140
+    9.21%     0.00%  codis-server  [unknown]           [.] 0x00007fa87e816d60
+    9.19%     2.73%  codis-server  [kernel.kallsyms]   [k] __do_softirq
+    8.87%     0.19%  codis-server  [kernel.kallsyms]   [k] dev_hard_start_xmit
+    8.52%     0.17%  codis-server  [kernel.kallsyms]   [k] start_xmit
+    8.50%     0.00%  codis-server  [kernel.kallsyms]   [k] irq_exit
+    8.10%     0.00%  codis-server  [unknown]           [.] 0x00007fa87e80f280
+    7.59%     6.92%  codis-server  codis-server        [.] _addReplyToBuffer
+    7.55%     0.00%  codis-server  [kernel.kallsyms]   [k] ret_from_intr
+    7.55%     0.00%  codis-server  [kernel.kallsyms]   [k] __irqentry_text_start
+    6.69%     0.00%  codis-server  libc-2.17.so        [.] 0x00007fa8877c5f43

从结果上看都是一些正常的业务处理过程。

codis-server利用strace进行跟踪压测

首先先进程strace将strace attach到codis-server进程上面。

strace -p 9491 -o s.log

然后启动压测程序,并使用perf 同时记录执行strace和codis-server两个进程,然后开始压测。

# redis-benchmark -h  *.*.*.*  -p 6501  -n 100000 -c 10 -q
PING_INLINE: 1281.92 requests per second
PING_BULK: 1272.77 requests per second
SET: 1264.41 requests per second
GET: 1275.93 requests per second
INCR: 1271.47 requests per second
LPUSH: 1264.77 requests per second
RPUSH: 1268.50 requests per second
LPOP: 1261.34 requests per second
RPOP: 1261.59 requests per second
SADD: 1267.09 requests per second
HSET: 1243.95 requests per second
SPOP: 1293.98 requests per second
LPUSH (needed to benchmark LRANGE): 1261.37 requests per second
LRANGE_100 (first 100 elements): 1216.75 requests per second
LRANGE_300 (first 300 elements): 1132.63 requests per second
LRANGE_500 (first 450 elements): 1070.72 requests per second
LRANGE_600 (first 600 elements): 1034.86 requests per second
MSET (10 keys): 1238.67 requests per second

此时对比strace程序的perf report。

+   76.45%     0.09%  strace   [kernel.kallsyms]   [k] system_call_fastpath
+   32.52%     0.47%  strace   libc-2.17.so        [.] ptrace
+   30.25%     0.45%  strace   [kernel.kallsyms]   [k] sys_ptrace
+   28.53%     0.76%  strace   libc-2.17.so        [.] __GI___wait4
+   26.51%     0.00%  strace   [unknown]           [k] 0x2d00313934390070
+   26.51%     0.00%  strace   [unknown]           [k] 0x00007fff18e1c5b9
+   26.37%     0.36%  strace   [kernel.kallsyms]   [k] arch_ptrace
+   25.98%     0.54%  strace   [kernel.kallsyms]   [k] ptrace_request
+   25.57%    25.57%  strace   [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore
+   24.31%     0.64%  strace   [kernel.kallsyms]   [k] sys_wait4
+   23.51%     0.20%  strace   [kernel.kallsyms]   [k] ptrace_resume
+   23.33%     0.15%  strace   [kernel.kallsyms]   [k] wake_up_state
+   23.22%     0.07%  strace   [kernel.kallsyms]   [k] try_to_wake_up
+   22.89%     1.14%  strace   [kernel.kallsyms]   [k] do_wait
+   17.70%     0.21%  strace   libc-2.17.so        [.] __GI___libc_write
+   16.46%     0.39%  strace   [kernel.kallsyms]   [k] sys_write
+   14.60%     0.36%  strace   [kernel.kallsyms]   [k] vfs_write
+   13.79%     0.32%  strace   [kernel.kallsyms]   [k] schedule
+   13.47%     0.41%  strace   [kernel.kallsyms]   [k] __schedule
+   12.71%    12.70%  strace   [kernel.kallsyms]   [k] finish_task_switch
+   12.52%     0.16%  strace   [kernel.kallsyms]   [k] do_sync_write
+   12.21%     0.76%  strace   [kernel.kallsyms]   [k] xfs_file_aio_write
+   11.03%     0.20%  strace   [kernel.kallsyms]   [k] xfs_file_buffered_aio_write
+    6.07%     0.16%  strace   libc-2.17.so        [.] process_vm_readv
+    5.66%     0.26%  strace   [kernel.kallsyms]   [k] generic_file_buffered_write
+    5.41%     0.12%  strace   [kernel.kallsyms]   [k] sys_process_vm_readv
+    5.23%     0.14%  strace   [kernel.kallsyms]   [k] process_vm_rw
+    4.69%     2.97%  strace   [kernel.kallsyms]   [k] wait_consider_task
+    4.59%     0.40%  strace   libc-2.17.so        [.] __vfprintf_chk
+    4.27%     0.41%  strace   [kernel.kallsyms]   [k] process_vm_rw_core.isra.1
+    3.48%     0.14%  strace   [kernel.kallsyms]   [k] xfs_file_aio_write_checks
+    3.30%     0.22%  strace   [kernel.kallsyms]   [k] xfs_vm_write_begin

此时codis-server的perf report。

+   55.42%     1.22%  codis-server  [kernel.kallsyms]   [k] ptrace_notify
+   54.16%     0.00%  codis-server  [kernel.kallsyms]   [k] ptrace_do_notify
+   53.44%     2.89%  codis-server  [kernel.kallsyms]   [k] ptrace_stop
+   51.05%     0.27%  codis-server  [kernel.kallsyms]   [k] tracesys
+   45.11%     0.00%  codis-server  libpthread-2.17.so  [.] 0x00007fa887aa36fd
+   31.81%     0.00%  codis-server  libpthread-2.17.so  [.] 0x00007fa887aa375d
+   31.60%     0.39%  codis-server  [kernel.kallsyms]   [k] do_notify_parent_cldstop
+   30.72%    30.72%  codis-server  [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore
+   29.31%     0.25%  codis-server  [kernel.kallsyms]   [k] int_check_syscall_exit_work
+   29.03%     0.24%  codis-server  [kernel.kallsyms]   [k] syscall_trace_leave
+   28.68%     0.54%  codis-server  [kernel.kallsyms]   [k] syscall_trace_enter
+   28.08%     0.25%  codis-server  [kernel.kallsyms]   [k] tracehook_report_syscall_exit
+   18.79%     0.32%  codis-server  [kernel.kallsyms]   [k] schedule
+   18.46%     0.56%  codis-server  [kernel.kallsyms]   [k] __schedule
+   17.44%    17.44%  codis-server  [kernel.kallsyms]   [k] finish_task_switch
+   16.49%     0.20%  codis-server  [kernel.kallsyms]   [k] sys_write
+   15.60%     0.39%  codis-server  [kernel.kallsyms]   [k] vfs_write
+   14.59%     0.07%  codis-server  [kernel.kallsyms]   [k] do_sync_write
+   14.37%     0.14%  codis-server  [kernel.kallsyms]   [k] sock_aio_write
+   14.19%     0.25%  codis-server  [kernel.kallsyms]   [k] inet_sendmsg
+   13.47%     0.38%  codis-server  [kernel.kallsyms]   [k] tcp_sendmsg
+   11.05%     0.06%  codis-server  [kernel.kallsyms]   [k] tcp_push
+   10.98%     0.07%  codis-server  [kernel.kallsyms]   [k] __tcp_push_pending_frames
+   10.84%     0.26%  codis-server  [kernel.kallsyms]   [k] tcp_write_xmit
+    9.53%     0.33%  codis-server  [kernel.kallsyms]   [k] tcp_transmit_skb
+    8.80%     0.18%  codis-server  [kernel.kallsyms]   [k] ip_queue_xmit
+    8.48%     0.05%  codis-server  [kernel.kallsyms]   [k] ip_local_out_sk
+    8.41%     0.11%  codis-server  [kernel.kallsyms]   [k] ip_output
+    8.25%     0.47%  codis-server  [kernel.kallsyms]   [k] ip_finish_output
+    7.58%     0.04%  codis-server  [kernel.kallsyms]   [k] dev_queue_xmit
+    7.48%     0.19%  codis-server  [kernel.kallsyms]   [k] __dev_queue_xmit
+    7.12%     0.08%  codis-server  [kernel.kallsyms]   [k] sch_direct_xmit
+    6.65%     0.16%  codis-server  [kernel.kallsyms]   [k] dev_hard_start_xmit

从两者的对比中可以看出,被ptrace的codis-server进程做的最多的就行ptrace_notify来进行信号的通信,并且该通信严重影响了codis-server本身对外提供服务的能力。

总结

本文简单的概述了一下ptrace的基本原理,详细的调用的流程可以参考Linux5.16版本,因为ptrace基于信号的通信方式,并且会有信号阻塞的情况,在业务性能敏感的情况一定要慎用该方式去调试监控生产环境的服务。在本文的codis-server的压测结果中,加了strace跟踪的情况下,codis-server的性能下降了大概十倍左右。由于本人才疏学浅,如有错误请批评指正。

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值