IBM utrace ptrace uprobes论文

Ptrace, Utrace, Uprobes: Lightweight, Dynamic Tracing of User Apps

Jim Keniston Ananth Mavinakayanahalli Prasanna Panchamukhi

IBM IBM IBM

jkenisto@us.ibm.com ananth@in.ibm.com prasanna@in.ibm.com

Vara Prasad

IBM

varap@us.ibm.com

Abstract

Details of the design and implementation of uprobes

form the major portion of this paper.

The ptrace system-call API, though useful for many

tools such as gdb and strace, generally proves unsat-

isfactory when tracing multithreaded or multi-process

applications, especially in timing-dependent debugging

scenarios. With the utrace kernel API, a kernel-side

instrumentation module can track interesting events in

traced processes. The uprobes kernel API exploits

and extends utrace to provide kprobes-like, breakpoint-

based probing of user applications.

We describe how utrace, uprobes, and kprobes together

provide an instrumentation facility that overcomes some

limitations of ptrace. For attendees, familiarity with a

tracing API such as ptrace or kprobes will be helpful

but not essential.

1

Introduction

For a long time now, debugging user-space applications

has been dependent on the ptrace system call. Though

ptrace has been very useful and will almost certainly

continue to prove its worth, some of the requirements it

imposes on its clients are considered limiting. One im-

portant limitation is performance, which is influenced

by the high context-switch overheads inherent in the

ptrace approach.

We start by discussing the current situation in the user-

space tracing world. Sections 2 and 3 discuss the var-

ious instrumentation approaches possible and/or avail-

able. Section 4 goes on to discuss the goals that led

to the current uprobes design, while Section 5 details

the implementation. In the later sections, we put forth

some of the challenges, especially with regard to mod-

ifying text and handling of multithreaded applications.

Further, there is a brief discussion on how and where

we envision this infrastructure can be put to use. We fi-

nally conclude with a discussion on where this work is

headed.

2

Ptrace-based Application Tracing

Like many other flavors of UNIX, Linux provides the

ptrace system-call interface for tracing a running pro-

cess. This interface was designed mainly for debugging,

but it has been used for tracing purposes as well. This

section surveys some of the ptrace-based tracing tools

and presents limitations of the ptrace approach for low-

impact tracing.

Ptrace supports the following types of requests:

The utrace patchset [3] mitigates this to a large extent.

Utrace provides in-kernel callbacks for the same sorts

of events reported by ptrace. The utrace patchset re-

implements ptrace as a client of utrace.

Uprobes is another utrace client. Analogous to kprobes

for the Linux R kernel, uprobes provides a simple, easy-

to-use API to dynamically instrument user applications.

• 215 •

• Attach to, or detach from, the process being traced

(the “tracee”).

• Read or write the process’s memory, saved regis-

ters, or user area.

• Continue execution of the process, possibly until a

particular type of event occurs (e.g., a system call

is called or returns).

216 • Ptrace, Utrace, Uprobes: Lightweight, Dynamic Tracing of User Apps

• Ptrace is not a POSIX system call. Its behavior

varies from operating system to operating system,

and has even varied from version to version in

Linux. Vital operational details (“Why do I get two

SIGCHLDs here? Am I supposed to pass the pro-

cess a SIGCONT or no signal at all here?”) are not

documented, and are not easily gleaned from the

kernel source.

Events in the tracee turn into SIGCHLD signals that

are delivered to the tracing process. The associated

siginfo_t specifies the type of event.

2.1

Gdb

gdb is the most widely used application debugger in

Linux, and it runs on other flavors of UNIX as well.

gdb controls the program to be debugged using ptrace

requests. gdb is used mostly as an interactive debugger,

but also provides a batch option through which a series

of gdb commands can be executed, without user inter-

vention, each time a breakpoint is hit. This method of

tracing has significant performance overhead. gdb’s ap-

proach to tracing multithreaded applications is to stop

all threads whenever any thread hits a breakpoint.

2.2

Ltrace

The ltrace command is similar to strace, but it traces

calls and returns from dynamic library functions. It can

also trace system calls, and extern functions in the traced

program itself. ltrace uses ptrace to place breakpoints at

the entry point and return address of each probed func-

tion. ltrace is a useful tool, but it suffers from the per-

formance limitations inherent in ptrace-based tools. It

also appears not to work for multithreaded programs.

2.4

• Overheads associated with accessing the tracee’s

memory and registers are enormous—on the order

of 10x to 100x or more, compared with equivalent

in-kernel access. Ptrace’s PEEK-and-POKE inter-

face provides very low bandwidth and incurs nu-

merous context switches.

Strace

The strace command provides the ability to trace calls

and returns from all the system calls executed by

the traced process. strace exploits ptrace’s PTRACE_

SYSCALL request, which directs ptrace to continue ex-

ecution of the tracee until the next entry or exit from a

system call. strace handles multithreaded applications

well, and it has significantly less performance overhead

than the gdb scripting method; but performance is still

the number-one complaint about strace. Using strace

to trace itself shows that each system call in the tracee

yields several system calls in strace.

2.3

• The amount of perseverence and/or luck you need

to get a working program goes up as you try to

monitor more than one process or more than one

thread.

Ptrace Limitations

If gdb, strace, and ltrace don’t give you the type of in-

formation you’re looking for, you might consider writ-

ing your own ptrace-based tracing tool. But consider the

following ptrace limitations first:

• In order to trace a process, the tracer must become

the tracee’s parent. To attach to an already run-

ning process, then, the tracer must muck with the

tracee’s lineage. Also, if you decide you want to

apply more instrumentation to the same process,

you have to detach the tracer already in place.

3

Kernel-based Tracing

In the early days of Linux, the kernel code base was

manageable and most people working on the kernel

knew their core areas intimately. There was a definite

pushback from the kernel community towards including

any tracing and/or debugging features in the mainline

kernel.

Over time, Linux became more popular and the num-

ber of kernel contributors increased. A need for a flexi-

ble tracing infrastructure was recognized. To that end, a

number of projects sprung up and have achieved varied

degrees of success.

We will look at a few of these projects in this section.

Most of them are based on the kernel-module approach.

3.1

Kernel-module approach

The common thread among the following approaches

is that the instrumentation code needs to run in kernel

2007 Linux Symposium, Volume One • 217

mode. Since we wouldn’t want to burden the kernel at • Event reporting: Utrace clients register callbacks

  times when the instrumentation isn’t in use, such code to be run when the thread encounters specific

    is introduced only when needed, in the form of a kernel events of interest. These include system call en-

    module. try/exit, signals, exec, clone, exit, etc.

3.1.1 • Thread control: Utrace clients can inject signals,

       request that a thread be stopped from running in

      user-space, single-step, block-step, etc.

Kprobes

Kprobes [2] is perhaps the most widely accepted of all

dynamic instrumentation mechanisms currently avail-

able for the Linux kernel. Kprobes traces its roots back

to DProbes (discussed later). In fact, the first version of

kprobes was a patch created by just taking the minimal,

kernel-only portions of the DProbes framework.

Kprobes allows a user to dynamically insert “probes”

into specific locations in the Linux kernel. The user

specifies “handlers” (instrumentation functions) that run

before and/or after the probed instruction is executed.

When a probepoint (which typically is an architecture-

specific breakpoint instruction) is hit, control is passed

to the kprobes infrastructure, which takes care of execut-

ing the user-specified handlers. [2] provides an in-depth

treatment of the kprobes infrastructure (which, inciden-

tally, includes jprobes and function-return probes).

The kernel-module approach was a natural choice for

kprobes: after all, the goal is to access and instrument

the Linux kernel. Given the privilege and safety re-

quirements necessary to access kernel data structures,

the kernel-module approach works very well.

3.1.2

Utrace

• Thread machine state access: While in a callback,

a client can inspect and/or modify the thread’s core

state, including the registers and u-area.

Utrace works by placing tracepoints at strategic points in

kernel code. For traced threads, these tracepoints yield

calls into the registered utrace clients. These callbacks,

though happening in the context of a user process, hap-

pen when the process is executing in kernel mode. In

other words, utrace clients run in the kernel. Ptrace is

one utrace client that lives in the kernel. Other clients—

especially those used for ad hoc instrumentation—may

be implemented as kernel modules.

3.1.3

Uprobes

Uprobes is another client of utrace. As such, uprobes

can be seen as a flexible user-space probing mecha-

nism that comes with all the power, but not all the con-

straints, of ptrace. Just as kprobes creates and manages

probepoints in kernel code, uprobes creates and man-

ages probepoints in user applications. The uprobes in-

frastructure, like ptrace, lives in the kernel.

A relatively new entrant to this instrumentation space A uprobes user writes a kernel module, specifying for

is utrace. This infrastructure is intended to serve as an each desired probepoint the process and virtual address

abstraction layer to write the next generation of user- to be probed and the handler to run when the probepoint

space tracing and debugging applications. is hit.

One of the primary grouses kernel hackers have had A uprobes-based module can also use the utrace and/or

about ptrace is the lack of separation/layering of kprobes APIs. Thus a single instrumentation module

code between architecture-specific and -agnostic parts. can collect and correlate information from the user ap-

  Utrace aims to mitigate this situation. Ptrace is now but plication(s), shared libraries, the kernel/user interfaces,

  one of the clients of utrace. and the kernel itself.

Utrace, at a very basic level, is an infrastructure to mon-

itor individual Linux “threads”—each represented by a

task_struct in the kernel. An “engine” is utrace’s

basic control unit. Typically, each utrace client estab-

lishes an engine for each thread of interest. Utrace pro-

vides three basic facilities on a per-engine basis:

3.1.4

SystemTap

SystemTap [4] provides a mechanism to use the kernel

(and later, user) instrumentation tools, through a simple

218 • Ptrace, Utrace, Uprobes: Lightweight, Dynamic Tracing of User Apps

Tool 

From a user’s script, the stap command produces and

loads a kernel module containing calls to the kprobes

API. Trace data from this module is passed to user space

using efficient mechanisms (such as relay channels), and

the SystemTap post-processing infrastructure takes care

of deciphering and presenting the gathered information

in the format requested by the user, on the stap com-

mand’s standard output.

SystemTap can be viewed as a wrapper that enables easy

use of kernel instrumentation mechanisms, including,

but not limited to, kprobes. Plans are afoot to exploit

utrace, uprobes, the “markers” static tracing infrastruc-

ture, and a performance-monitoring-hardware infras-

tructure (perfmon), as they become part of the mainline

kernel.

3.2

Interpreter-based approaches

Two more instrumentation tools, DProbes and DTrace,

bear mention. Both tools endeavor to provide integrated

tracing support for user applications and kernel code.

DProbes [1] traces its roots to IBM’s OS/2 R operating

system, but has never found a foothold in Linux, ex-

cept as the forebear of kprobes. DProbes instrumenta-

tion is written in the RPN programming language, the

assembly language for a virtual machine whose inter-

preter resides in the kernel. For the i386 architecture,

the DProbes C Compiler (dpcc) translates instrumenta-

tion in a C-like language into RPN. Like kprobes and

uprobes, DProbes allows the insertion of probepoints

anywhere in user or kernel code.

DTrace is in use on Solaris and FreeBSD. Instrumenta-

tion is written in the D programming language, which

provides aggregation and presentation facilities similar

to those of SystemTap. As with DProbes, DTrace instru-

mentation runs on an interpreter in the kernel. DTrace

kernel probepoints are limited to a large but predefined

set. A DTrace probe handler is limited to a single basic

block (no ifs or loops).

Event Counted

ltrace -c 

gdb -batch 

strace -c 

kprobes 

uprobes 

utrace 

awk-like event/action scripting language. A SystemTap

script specifies code points to probe, and for each, the

instrumentation code to run when the probepoint is hit.

The scripting language provides facilities to aggregate,

collate, filter, present, and analyze trace data in a manner

that is intuitive to the user.

function calls

function calls

system calls

function calls

function calls

system calls

Overhead (usec)

per Event

22

265

25

0.25

3.4

0.16

Table 1: Performance of ptrace-based tools (top) vs. ad

hoc kernel modules (bottom)

3.3

Advantages of kernel-based application tracing

• Kernel-based instrumentation is inherently fast.

Table 1 shows comparative performance numbers

of ptrace-based tools vs. kernel modules using

kprobes, uprobes, and utrace. These measurements

were taken on a Pentium R M (1495 MHz) running

the utrace implementation of ptrace.

• Most systemic problems faced by field engineers

involve numerous components. Problems can per-

colate from a user-space application all the way up

to a core kernel component, such as the block layer

or the networking stack. Providing a unified view

of the flow of control and/or data across user space

and kernel is possible only via kernel-based trac-

ing.

• Kernel code runs at the highest privilege and as

such, can access all kernel data structures. In ad-

dition, the kernel has complete freedom and access

to the process address space of all user-space pro-

cesses. By using safe mechanisms provided in ker-

nel to access process address spaces, information

related to the application’s data can be gleaned eas-

ily.

• Ptrace, long the standard method for user-space in-

strumentation, has a number of shortcomings, as

discussed in Section 2.4.

3.4

Drawbacks of kernel-based application tracing

• The kernel runs at a higher privilege level and in

a more restricted environment than applications.

Assumptions and programming constructs that are

valid for user space don’t necessarily hold for the

kernel.

2007 Linux Symposium, Volume One • 219

• Not everybody is a kernel developer. It’s not pru- processes of any type (related or unrelated), and

    dent to expect someone to always write “correct” can probe multithreaded applications of all sorts.

        kernel code. And, while dealing with the kernel, Probes can be established or removed at any stage

        one mistake may be too many. Most application in a process’s lifetime. Multiple instrumentation

        developers and system administrators, understand- modules can probe the same processes (and even

        ably, are hesitant to write kernel modules. the same probepoints) simultaneously.

• The kernel has access to user-space data, but can’t • Probe processes, not executables. In earlier ver-

    easily get at all the information (e.g., symbol table, sions of uprobes, a probepoint referred to a partic-

    debug information) to decode it. This information ular instruction in a specified executable or shared

    must be provided in or communicated to the kernel library. Thus a probepoint affected all processes

    module, or the kernel must rely on post-processing (current and future) running that executable. This

    in user space. made it relatively easy to probe a process right

                  from exec time, but the implications of this im-

                 plementation (e.g., inconsistency between the in-

                memory and on-disk images of a probed page)

               were unacceptable. Like ptrace, uprobes now

              probes per-process, and uses access_process_

             vm() to ensure that a probed process gets its own

            copy of the probed page when a probepoint is in-

           serted.

SystemTap goes a long way in mitigating these draw-

backs. For kernel tracing, SystemTap uses the running

kernel’s debug information to determine source file, line

number, location of variables, and the like. For applica-

tions that adhere to the DWARF format, it wouldn’t be

hard for SystemTap to provide address/symbol resolu-

tion.

4

Uprobes Design Goals

The idea of user-space probes has been around for years,

and in fact there have been a variety of proposed imple-

mentations. For the current, utrace-based implementa-

tion, the design goals were as follows:

• Support the kernel-module approach. Uprobes fol-

lows the kprobes model of dynamic instrumenta-

tion: the uprobes user creates an ad hoc kernel

module that defines the processes to be probed, the

probepoints to establish, and the kernel-mode in-

strumentation handlers to run when probepoints are

hit. The module’s init function establishes the

probes, and its cleanup function removes them.

• Interoperate with kprobes and utrace. An instru-

mentation module can establish probepoints in the

kernel (via kprobes) as well as in user-mode pro-

grams. A uprobe or utrace handler can dynamically

establish or remove kprobes, uprobes, or utrace-

event callbacks.

• Minimize limitations on the types of applications

that can be probed and the way in which they

can be probed. In particular, a uprobes-based in-

strumentation module can probe any number of

• Handlers can sleep. As discussed later in Sec-

tion 5, uprobes learns of each breakpoint hit via

utrace’s signal-callback mechanism. As a result,

the user-specified handler runs in a context where it

can safely sleep. Thus, the handler can access any

part of the probed process’s address space, resident

or not, and can undertake other operations (such as

registering and unregistering probes) that a kprobe

handler cannot. Compared with earlier implemen-

tations of uprobes, this provides more flexibility at

the cost of some additional per-hit overhead.

• Performance. Since a uprobe handler runs in the

context of the probed process, ptrace’s context

switches between the probed process and the in-

strumentation parent on every event are eliminated.

The result is significantly less overhead per probe

hit than in ptrace, though significantly more than in

kprobes.

• Safe from user-mode interference. Both uprobes

and the instrumentation author must account for the

possibility of unexpected tinkering with the probed

process’s memory—e.g., by the probed process it-

self or by a malicious ptrace-based program—and

ensure that while the sabotaged process may crash,

the kernel’s operation is not affected.

• Minimize intrusion on existing Linux code. For

hooks into a process’s events of interest, uprobes

220 • Ptrace, Utrace, Uprobes: Lightweight, Dynamic Tracing of User Apps

uses those provided by utrace. Uprobes creates

per-task and per-process data structures, but main-

tains them independently of the corresponding data

structures in the core kernel and in utrace. As a

result, at this writing, the base uprobes patch, in-

cluding i386 support, includes only a few lines of

patches against the mainline kernel source.

• Portable to multiple architectures. At the time of

this writing, uprobes runs on the i386 architec-

ture. Ports are underway to PowerPC, x86_64, and

zLinux (s390x). Except for the code for single-

stepping out of line, which you need (adapting

from kprobes, typically) if you don’t want to miss

probepoints in multithreaded applications, a typi-

cal uprobes port is on the order of 50 lines.

5

Uprobes Implementation

Overview

Uprobes can be thought of as containing the following

overlapping pieces:

• data structures;

• the register/unregister API;

• utrace callbacks; and

• architecture-specific code.

In this section, we’ll describe each of these pieces

briefly.

5.1

Data structures

Uprobes creates the following internal data structures:

• uprobe_process – one for each probed pro-

cess;

• uprobe_task – one for each thread (task) in a

probed process; and

• uprobe_kimg (Kernel IMaGe) – one for each

probepoint in a probed process. (Multiple uprobes

at the same address map to the same uprobe_

kimg.)

Each uprobe_task and uprobe_kimg is owned

by its uprobe_process. Data structures associated

with return probes (uretprobes) and single-stepping out

of line are described later.

5.2

The register/unregister API

The fundamental API functions are register_

uprobe() and unregister_uprobe(), each of

which takes a pointer to a uprobe object defined in the

instrumentation module. A uprobe object specifies the

pid and virtual address of the probepoint, and the han-

dler function to be run when the probepoint is hit.

The register_uprobe() function finds the

uprobe_process specified by the pid, or creates

the uprobe_process and uprobe_task(s) if

they don’t already exist. register_uprobe() then

creates a uprobe_kimg for the probepoint, queues

it for insertion, requests (via utrace) that the probed

process “quiesce,” sleeps until the insertion takes place,

and then returns. (If there’s already a probepoint at the

specified address, register_uprobe() just adds

the uprobe to that uprobe_kimg and returns.)

Note that since all threads in a process share the same

text, there’s no way to register a uprobe for a particular

thread in a multithreaded process.

Once all threads in the probed process have quiesced,

the last thread to quiesce inserts a breakpoint instruction

at the specified probepoint, rouses the quiesced threads,

and wakes up register_uprobe().

unregister_uprobe() works in reverse: Queue

the uprobe_kimg for removal, quiesce the probed

process, sleep until the probepoint has been removed,

and delete the uprobe_kimg.

If this was the

last uprobe_kimg for the process, unregister_

uprobe() tears down the entire uprobe_process,

along with its uprobe_tasks.

5.3

Utrace callbacks

Aside from registration and unregistration, everything in

uprobes happens as the result of a utrace callback. When

a uprobe_task is created, uprobes calls utrace_

attach() to create an engine for that thread, and lis-

tens for the following events in that task:

• breakpoint signal (SIGTRAP on most architec-

tures): Utrace notifies uprobes when the probed

task hits a breakpoint. Uprobes determines which

probepoint (if any) was hit, runs the associated

probe handler(s), and directs utrace to single-step

the probed instruction.

2007 Linux Symposium, Volume One • 221

• single-step signal (SIGTRAP on most architec-

tures): Utrace notifies uprobes after the probed in-

struction has been single-stepped. Uprobes does

any necessary post-single-step work (discussed in

later sections), and directs utrace to continue exe-

cution of the probed task.

• fork/clone: Utrace notifies uprobes when a probed

task forks a new process or clones a new thread.

When a new thread is cloned for the probed pro-

cess, uprobes adds a new uprobe_task to the

uprobe_process, complete with an appropri-

ately programmed utrace engine. In the case

of fork(), uprobes doesn’t attempt to preserve

probepoints in the child process, since each uprobe

object refers to only one process. Rather, uprobes

iterates through all the probe addresses in the par-

ent and removes the breakpoint instructions in the

child.

• exit: Utrace notifies uprobes when a probed

task exits. Uprobes deletes the corresponding

uprobe_task. If this was the last uprobe_

task for the process, uprobes tears down the en-

tire uprobe_process. (Since the process is ex-

iting, there’s no need to remove breakpoints.)

• exec: Utrace notifies uprobes when a probed task

execs a new program. (Utrace reports exit events

for any other threads in the process.) Since probe-

points in the old program are irrelevant in the new

program, uprobes tears down the entire uprobe_

process. (Again, there’s no need to remove

breakpoints.)

• quiesce: Uprobes listens for the above-listed events

all the time. By contrast, uprobes listens for qui-

esce events only while it’s waiting for the probed

process to quiesce, in preparation for insertion or

removal of a breakpoint instruction. (See Sec-

tion 6.2).

5.4

Architecture-specific code

Most components of the architecture-specific code for

uprobes are simple macros and inline functions. Sup-

port for return probes (uretprobes) typically adds 10–40

lines. By far the majority of the architecture-specific

code relates to “fix-ups” necessitated by the fact that we

single-step a copy of the probed instruction, rather than

single-stepping it in place. See “Tracing multithreaded

applications: SSOL” in the next section.

6

Uprobes Implementation Notes

6.1

Tracing multithreaded applications: SSOL

Like some other tracing and debugging tools, uprobes

implements a probepoint by replacing the first byte(s)

of the instruction at the probepoint with a breakpoint

instruction, after first saving a copy of the original in-

struction. After the breakpoint is hit and the handler has

been run, uprobes needs to execute the original instruc-

tion in the context of the probed process. There are two

commonly accepted ways to do this:

• Single-stepping inline (SSIL): Temporarily replace

the breakpoint instruction with the original instruc-

tion; single-step the instruction; restore the break-

point instruction; and allow the task to continue.

This method is typically used by interactive debug-

gers such as gdb.

• Single-stepping out of line (SSOL): Place a copy of

the original instruction somewhere in the probed

process’s address space; single-step the copy; “fix

up” the task’s state as necessary; and allow the task

to continue. If the effect of the instruction depends

on its address (e.g., a relative branch), the task’s

registers and/or stack must be “fixed up” after the

instruction is executed (e.g., to make the program

counter relative to the address of the original in-

struction, rather than the instruction copy). This

method is used by kprobes for kernel tracing.

The SSIL approach doesn’t work acceptably for mul-

tithreaded programs. In particular, while the break-

point instruction is temporarily removed during single-

stepping, another thread can sail past the probepoint.

We considered the approach of quiescing, or otherwise

blocking, all threads every time one hits a probepoint, so

that we could be sure of no probe misses during SSIL,

but we considered the performance implications unac-

ceptable.

In terms of implementation, SSOL has two important

implications:

• Each uprobes port must provide code to do

architecture-specific post-single-step fix-ups.

Much of this code can be filched from kprobes,

222 • Ptrace, Utrace, Uprobes: Lightweight, Dynamic Tracing of User Apps

but there are additional implications for uprobes. • If the targeted thread is already quiesced, setting

For example, uprobes must be prepared to handle the UTRACE_ACTION_QUIESCE flag causes the

any instruction in the architecture’s instruction set, quiesce callback to run immediately in the context

  not just those used in the kernel; and for some of the [un]register_uprobe() task. This

  architectures, uprobes must be able to handle both breaks the “sleep until the last quiescent thread

  32-bit and 64-bit user applications. wakes me” paradigm.

• The instruction copy to be single-stepped must re- • If a thread hits a breakpoint on the road to quies-

  side somewhere in the probed process’s address cence, its quiesce callback is called before the sig-

    space. Since uprobes can’t know what other nal callback. This turns out to be a bad place to stop

      threads may be doing while a thread is single- for breakpoint insertion or removal. For example,

      stepping, the instruction copy can’t reside in any if we happen to remove the uprobe_kimg for the

        location legitimately used by the program. After probepoint we just hit, the subsequent signal call-

        considering various approaches, we decided to al- back won’t know what to do with the SIGTRAP.

        locate a new VM area in the probed process to hold 

        the instruction copies. We call this the SSOL area. 

To minimize the impact on the system, uprobes allo-

cates the SSOL area only for processes that are actually

probed, and the area is small (typically one page) and

of fixed size. “Instruction slots” in this area are allo-

cated on a per-probepoint basis, so that multiple threads

can single-step in the same slot simultaneously. Slots

are allocated only to probepoints that are actually hit.

If uprobes runs out of free slots, slots are recycled on a

least-recently-used basis.

6.2

Quiescing

Uprobes takes a fairly conservative approach when in-

serting and removing breakpoints: all threads in the

probed process must be “quiescent” before the break-

point is inserted/removed.

Our approach to quiescing the threads started out fairly

simple: For each task in the probed process, the

[un]register_uprobe() function sets the UTRACE_

ACTION_QUIESCE flag in the uprobe_task’s en-

gine, with the result that utrace soon brings the task

to a stopped state and calls uprobes’s quiesce callback

for that task. After setting all tasks on the road to qui-

escence, [un]register_uprobe() goes to sleep.

For the last thread to quiesce, the quiesce callback in-

serts or removes the requested breakpoint, wakes up

[un]register_uprobe(), and rouses all the qui-

esced threads.

It turned out to be more complicated than that. For ex-

ample:

• If [un]register_uprobe() is called from a

uprobe handler, it runs in the context of the probed

task. Again, this breaks the “sleep until the last

quiescent thread wakes me” paradigm.

It turned out to be expedient to establish an “alternate

quiesce point” in uprobes, in addition to the quiesce call-

back. When it finishes handling a probepoint, the up-

robes signal callback checks to see whether the process

is supposed to be quiescing. If so, it does essentially

what the quiesce callback does: if it’s the last thread to

“quiesce,” it processes the pending probe insertion or re-

moval and rouses the other threads; otherwise, it sleeps

in a pseudo-quiesced state until the “last” thread rouses

it. Consequently, if [un]register_uprobe() sees

that a thread is currently processing a probepoint, it

doesn’t try to quiesce it, knowing that it will soon hit

the alternate quiesce point.

6.3

User-space return probes

Uprobes supports a second type of probe: a return probe

fires when a specified function returns. User-space re-

turn probes (uretprobes) are modeled after return probes

in kprobes (kretprobes).

When you register a uretprobe, you specify the process,

the function (i.e., the address of the first instruction), and

a handler to be run when the function returns.

Uprobes sets a probepoint at the entry to the function.

When the function is called, uprobes saves a copy of

the return address (which may be on the stack on in a

register, depending on the architecture) and replaces the

return address with the address of the “uretprobe tram-

poline,” which is simply a breakpoint instruction.

2007 Linux Symposium, Volume One • 223

When the function returns, control passes to the tram-

poline, the breakpoint is hit, and uprobes gains control.

Uprobes runs the user-specified handler, then restores

the original return address and allows the probed func-

tion to return.

7

Applying Uprobes

We envision uprobes being used in the following situa-

tions, for debugging and/or for performance monitoring:

• Tracing timing-sensitive applications.

In uprobes, the return-probes implementation differs

from kprobes in several ways:

• Tracing multithreaded applications.

• Tracing very large and/or complex applications.

• The user doesn’t need to specify how many • Diagnosis of systemic performance problems in-

    “return-probe instance” objects to preallocate. volving multiple layers of software (in kernel and

        Since uprobes runs in a context where it can use user space).

        kmalloc() freely, no preallocation is necessary. 

• Each probed process needs a trampoline in its ad- • Tracing applications in creative ways – e.g., col-

  dress space. We use one of the slots in the SSOL lecting different types of information at different

  area for this purpose. probepoints, or dynamically adjusting which code

                        points are probed.

• As in kprobes, it’s permissible to unregister the re-

turn probe while the probed function is running.

Even after all probes have been removed, uprobes

keeps the uprobe_process and its uprobe_

tasks (and utrace engines) around as long as nec-

essary to catch and process the last hit on the uret-

probe trampoline.

6.4

Registering/unregistering probes in probe han-

dlers

A uprobe or uretprobe handler can call any of the func-

tions in the uprobes API. A handler can even unregister

its own probe. However, when invoked from a handler,

the actual [un]register operations do not take place im-

mediately. Rather, they are queued up and executed af-

ter all handlers for that probepoint have been run and

the probed instruction has been single-stepped. (Specif-

ically, queued [un]registrations are run right after the

previously described “alternate quiesce point.”) If the

registration_callback field is set in the uprobe

object to be acted on, uprobes calls that callback when

the [un]register operation completes.

An instrumentation module that employs such dynamic

[un]registrations needs to keep track of them: since a

module’s uprobe objects typically disappear along with

the module, the module’s cleanup function should not

exit while any such operations are outstanding.

8

Future Work

What’s next for utrace, uprobes, and kprobes? Here are

some possibilities:

• Utrace += SSOL. As discussed in Section 6.1,

single-stepping out of line is crucial for the sup-

port of probepoints in multithreaded processes.

This technology may be migrated from uprobes to

utrace, so that other utrace clients can exploit it.

One such client might be an enhanced ptrace.

• SystemTap += utrace + uprobes. Some System-

Tap users want support for probing in user space.

Some potential utrace and uprobes users might be

more enthusiastic given the safety and ease of use

provided by SystemTap.

• Registering probes from kprobe handlers. Utrace

and uprobe handlers can register and unregister

utrace, uprobes, and kprobes handlers. It would be

nice if kprobes handlers could do the same. Per-

haps the effect of sleep-tolerant kprobe handlers

could be approximated using a kernel thread that

runs deferred handlers. This possibility is under

investigation.

• Tracing Java. Uprobes takes us closer to dynamic

tracing of the Java Virtual Machine and Java appli-

cations.

224 • Ptrace, Utrace, Uprobes: Lightweight, Dynamic Tracing of User Apps

• Task-independent exec hook. Currently, uprobes

can trace an application if it’s already running, or if

it is known which process will spawn it. Allowing

tracing of applications that are yet be started and

are of unknown lineage will help to solve problems

that creep in during application startup.

9

Acknowledgements

The authors wish to thank Roland McGrath for com-

ing up with the excellent utrace infrastructure, and also

for providing valuable suggestions and feedback on up-

robes. Thanks are due to David Wilder and Srikar Dron-

amraju for helping out with testing the uprobes frame-

work. Thanks also to Dave Hansen and Andrew Morton

for their valuable suggestions on the VM-related por-

tions of the uprobes infrastructure.

10

Legal Statements

Copyright c IBM Corporation, 2007.

This work represents the view of the authors and does not

necessarily represent the view of IBM.

IBM, PowerPC, and OS/2 are trademarks or registered trade-

marks of International Business Machines Corporation in the

United States and/or other countries.

Linux is a registered trademark of Linus Torvalds.

Pentium is a registered trademark of Intel Corporation.

Other company, product, and service names may be trade-

marks or service marks of others.

References in this publication to IBM products or services

do not imply that IBM intends to make them available in all

countries in which IBM operates.

This document is provided “AS IS,” with no express or im-

plied warranties. Use the information in this document at

your own risk.

References

[1] Suparna Bhattacharya. Dynamic Probes -

Debugging by Stealth. In Proceedings of

Linux.Conf.Au, 2003.

[2] Mavinakayanahalli et al. Probing the Guts of

Kprobes. In Proceedings of the Ottawa Linux

Symposium, 2006.

[3] Roland McGrath. Utrace home page. http://

people.redhat.com/roland/utrace/.

[4] SystemTap project team. SystemTap.

http://sourceware.org/systemtap/.

Proceedings of the

Linux Symposium

Volume One

June 27th–30th, 2007

Ottawa, Ontario

Canada

Conference Organizers

Andrew J. Hutton, Steamballoon, Inc., Linux Symposium,

Thin Lines Mountaineering

C. Craig Ross, Linux Symposium

Review Committee

Andrew J. Hutton, Steamballoon, Inc., Linux Symposium,

Thin Lines Mountaineering

Dirk Hohndel, Intel

Martin Bligh, Google

Gerrit Huizenga, IBM

Dave Jones, Red Hat, Inc.

C. Craig Ross, Linux Symposium

Proceedings Formatting Team

John W. Lockhart, Red Hat, Inc.

Gurhan Ozen, Red Hat, Inc.

John Feeney, Red Hat, Inc.

Len DiMaggio, Red Hat, Inc.

John Poelstra, Red Hat, Inc.

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值