Decide a Sampling Frequency
Perf is a sampling-based profiler, so it must collect enough samples to produce a reliable result.
The higher the sampling frequency, the less sampling time is needed to collect the same number of samples, and vice versa.
We suggest an odd sampling frequency lower than 99 Hz. Avoiding a round value such as 100 Hz helps keep the sampler from running in lockstep with periodic timer activity.
Currently, we suggest sampling at 61 Hz for 180 seconds.
Note
Perf samples on a best-effort basis, so the total number of samples will be lower than the sampling frequency multiplied by the sampling time.
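The trade-off between frequency and duration is simple arithmetic. The following Python sketch works it out; the function names are hypothetical, and the "nominal" count is the frequency multiplied by the duration, which the note above treats as the ceiling for what perf actually collects.

```python
import math

def nominal_samples(freq_hz: int, duration_s: int) -> int:
    """Nominal sample count: sampling frequency times sampling time."""
    return freq_hz * duration_s

def duration_for(target_samples: int, freq_hz: int) -> int:
    """Shortest whole number of seconds whose nominal count reaches the target."""
    return math.ceil(target_samples / freq_hz)

# Nominal count for the suggested settings (61 Hz for 180 seconds).
print(nominal_samples(61, 180))

# Sampling time needed for the same nominal count at 99 Hz.
print(duration_for(nominal_samples(61, 180), 99))
```

Because sampling is best effort, the real count will be lower than the nominal one, so leave some margin when choosing the duration.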
Profiling the Whole System
Use perf to record system-wide activity for 180 seconds at a 61 Hz sampling rate. Both user- and kernel-level stacks are sampled.
Save the perf output to the RAM disk /tmp to reduce system overhead.
$ perf record -F 61 -a -g -o /tmp/whole_system.data -- sleep 180
$ perf script -i /tmp/whole_system.data > $NFS/whole_system.perf
Profiling One Process
Use perf to record the activity of one process for 180 seconds at a 61 Hz sampling rate. Both user- and kernel-level stacks are sampled.
Save the perf output to the RAM disk /tmp to reduce system overhead.
$ perf record -F 61 -p <PID> -g -o /tmp/process_name.data -- sleep 180
$ perf script -i /tmp/process_name.data > $NFS/process_name.perf
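If the recording is driven from a script, the two variants (whole system and one process) differ only in the target option. The Python sketch below builds the same command lines as above; the helper names and the PID 1234 are hypothetical, and nothing is executed.

```python
from typing import List, Optional

def perf_record_cmd(output: str, freq_hz: int = 61, duration_s: int = 180,
                    pid: Optional[int] = None) -> List[str]:
    """Build the argv for `perf record`; system-wide (-a) when pid is None."""
    target = ["-a"] if pid is None else ["-p", str(pid)]
    return (["perf", "record", "-F", str(freq_hz)] + target +
            ["-g", "-o", output, "--", "sleep", str(duration_s)])

def perf_script_cmd(data_file: str) -> List[str]:
    """Build the argv that prints the recorded samples as text on stdout."""
    return ["perf", "script", "-i", data_file]

# Whole system, then a single process (1234 is a made-up PID).
print(" ".join(perf_record_cmd("/tmp/whole_system.data")))
print(" ".join(perf_record_cmd("/tmp/process_name.data", pid=1234)))
print(" ".join(perf_script_cmd("/tmp/whole_system.data")))
```

On a machine that has perf installed, each list could be passed to subprocess.run, with the stdout of the perf script step redirected to the .perf file.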
Using Perf to Collect Data
Sample on-CPU functions for the specified command, at 61 Hertz.
$ perf record -F 61 <command>
Sample on-CPU functions for the specified PID, at 61 Hertz, until Ctrl-C.
$ perf record -F 61 -p <PID>
Sample on-CPU functions for the specified PID, at 61 Hertz, for 180 seconds.
$ perf record -F 61 -p <PID> -- sleep 180
Sample CPU stack traces (via frame pointers) for the specified PID, at 61 Hertz, for 180 seconds.
$ perf record -F 61 -p <PID> -g -- sleep 180
Sample CPU stack traces for the entire system, at 61 Hertz, for 180 seconds (< Linux 4.11).
$ perf record -F 61 -ag -- sleep 180
Sample CPU stack traces for the entire system, at 61 Hertz, for 180 seconds (>= Linux 4.11).
$ perf record -F 61 -g -- sleep 180
Using Perf to Generate a Report
Show perf.data in a text user interface (TUI).
$ perf report
Show perf.data with a column for sample count.
$ perf report -n
Show perf.data as a text report, with data coalesced and percentages.
Because curses support in busybox is limited, copy the perf.data file from the collection step to a PC and run this command there.
$ perf report --stdio
List all events from perf.data.
$ perf script > out.perf
List all perf.data events, with customized fields (< Linux 4.1; from Linux 4.1 on, the option is spelled -F).
$ perf script -f time,event,trace
Dump raw contents from perf.data as hex (for debugging).
$ perf script -D
Disassemble and annotate instructions with percentages (needs some debuginfo).
$ perf annotate --stdio
Compare Performance Between Two Reports
First, generate two perf files, such as whole_system_0815.perf
and whole_system_0816.perf.
Then compare the two reports using a PC diff tool such as Beyond Compare.
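Where no graphical diff tool is at hand, a plain line diff can serve as a rough substitute. The sketch below uses Python's standard difflib module; the function name and the three-line excerpts are made up for illustration, and real input would be the lines read from the two .perf files.

```python
import difflib
from typing import List

def diff_reports(old: List[str], new: List[str],
                 old_name: str, new_name: str) -> List[str]:
    """Return a unified diff of two reports given as lists of lines."""
    return list(difflib.unified_diff(old, new, fromfile=old_name,
                                     tofile=new_name, lineterm=""))

# Made-up excerpts standing in for the contents of the two reports.
old = ["func_a", "func_b", "func_c"]
new = ["func_a", "func_x", "func_c"]

for line in diff_reports(old, new,
                         "whole_system_0815.perf", "whole_system_0816.perf"):
    print(line)
```

A line diff is only as useful as the input is stable: raw perf script output carries timestamps and PIDs, so summarized text such as perf report --stdio output is likely to diff more cleanly.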
Generating Flamegraph
This section describes how to run FlameGraph on a PC to generate a flamegraph in SVG format.
(Run XXX.pl on the PC, not on the DUT.)
Copy the perf output to the host system and generate a flamegraph.
The procedure is the same if you profile just one process.
$ cd $FLAME_GRAPH_DIR
$ ./stackcollapse-perf.pl whole_system.perf > whole_system.folded
$ ./flamegraph.pl whole_system.folded > whole_system.svg
A batch script is also available to generate the graphs for a whole folder of perf files.
$ python ./perfToFlame.py --prog $FLAME_GRAPH_DIR --input $INPUT_FOLDER --out $OUTPUT_FOLDER
Note
Flamegraph can be downloaded from
https://github.com/brendangregg/FlameGraph. The script can be downloaded from perfToFlame.py.
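To illustrate what batch generation involves, here is a minimal Python sketch of the two-step pipeline for one input file. It is not the actual perfToFlame.py, whose internals are not shown here; the function names and the example paths are hypothetical, and POSIX-style paths on the host are assumed.

```python
import subprocess
from pathlib import PurePosixPath
from typing import List, Tuple

Step = Tuple[List[str], str]  # (argv, file that receives stdout)

def flamegraph_steps(flame_dir: str, perf_file: str, out_dir: str) -> List[Step]:
    """Plan the collapse and render steps for one .perf file."""
    stem = PurePosixPath(perf_file).stem
    folded = str(PurePosixPath(out_dir) / (stem + ".folded"))
    svg = str(PurePosixPath(out_dir) / (stem + ".svg"))
    collapse = str(PurePosixPath(flame_dir) / "stackcollapse-perf.pl")
    render = str(PurePosixPath(flame_dir) / "flamegraph.pl")
    return [([collapse, perf_file], folded), ([render, folded], svg)]

def run_steps(steps: List[Step]) -> None:
    """Run each step, writing its stdout to the planned file."""
    for argv, stdout_path in steps:
        with open(stdout_path, "w") as out:
            subprocess.run(argv, stdout=out, check=True)

# Made-up locations; only the plan is printed, nothing is executed here.
for argv, target in flamegraph_steps("/opt/FlameGraph",
                                     "/data/whole_system.perf", "/out"):
    print(" ".join(argv), ">", target)
```

Looping over every .perf file in an input folder and calling run_steps on each plan would reproduce the manual procedure above for many files.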
Analyzing Flamegraph
Confidence of Collected Data
Because perf samples on a best-effort basis, the total number of collected samples is sometimes too low.
In that case, extend the sampling time to collect more samples; otherwise the result is not reliable.
Based on our experience, the total number of samples should be at least X, where X is given by the formulas below.
- For whole-system profiling, X = max( 0.7 × sampling frequency × sampling time, 1500 ).
- For single-process profiling, X = max( 0.7 × sampling frequency × sampling time × CPU usage of the process, 1500 ).
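The two thresholds can be checked with a few lines of Python. This is a sketch of the formulas above; the function names are hypothetical, and the CPU usage of the process is assumed to be given as a fraction (0.1 for 10%), with 1.0 standing for whole-system profiling.

```python
def min_samples(freq_hz: float, duration_s: float, cpu_usage: float = 1.0) -> float:
    """Minimum sample count X; cpu_usage is a fraction, 1.0 for the whole system."""
    return max(0.7 * freq_hz * duration_s * cpu_usage, 1500)

def enough_samples(collected: int, freq_hz: float, duration_s: float,
                   cpu_usage: float = 1.0) -> bool:
    """True when the collected sample count reaches the threshold."""
    return collected >= min_samples(freq_hz, duration_s, cpu_usage)

# Whole system at 61 Hz for 180 seconds.
print(min_samples(61, 180))

# A process using 10% CPU: the 1500-sample floor applies.
print(min_samples(61, 180, cpu_usage=0.1))

# A made-up run that collected 7000 samples system-wide.
print(enough_samples(7000, 61, 180))
```

For a lightly loaded process the floor of 1500 dominates, which is why a longer sampling time is often needed in single-process profiling.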
Viewing the Flamegraph
In a flamegraph, each block represents a function.
A block may sit on top of another block, which means the lower function calls the upper function.
The bottom-most block is the main function or the entire system.
The top-most blocks are the functions that were actually running when perf took a sample.
The wider a block is, the more CPU time it and its child functions use.
So look for wide, flat blocks at the top: they may contain CPU-hungry logic.
Height is not related to CPU time; it shows the depth of the call stack.
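The same reading can be done numerically on the folded file produced by stackcollapse-perf.pl, where each line is a semicolon-separated stack (root first) followed by a sample count. The Python sketch below uses a hypothetical function name and made-up stacks. It counts, per function, the samples where it was on top of the stack (a flat top in the graph) and the samples where it appeared anywhere in the stack (roughly its width).

```python
from collections import Counter
from typing import Iterable, Tuple

def self_and_total(folded_lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Per function: samples on top of the stack, and samples anywhere in it."""
    self_counts, total_counts = Counter(), Counter()
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        stack, count = line.rsplit(" ", 1)
        frames = stack.split(";")
        n = int(count)
        self_counts[frames[-1]] += n      # last frame was running on the CPU
        for frame in set(frames):         # count a recursive frame once per stack
            total_counts[frame] += n
    return self_counts, total_counts

# Made-up folded stacks in the "frame;frame;... count" format.
sample = [
    "main;parse;read_file 30",
    "main;parse 10",
    "main;render;draw 60",
]
self_counts, total_counts = self_and_total(sample)
print(self_counts.most_common())
print(total_counts["main"], total_counts["parse"])
```

Note that the graph draws one block per call path, while this sketch merges a function across all paths, so the totals match block widths only when a function is reached through a single path.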
Placing the mouse cursor over a block shows information about that block, including:
- Function name
- Number of samples
- Percentage of samples relative to the bottom-most block (i.e. the whole system or the process)
Clicking a block zooms in on that block.
To zoom out, click the reset zoom label in the top-left corner of the graph.
The following is a flamegraph of the whole system running av_main.
The interpretation of this graph would be:
Note
Thread names are not available in whole-system profiling.
Do not use Internet Explorer to view the flamegraph, because some SVG features are not supported.
Other Perf Capabilities
Perf provides other commands that can be explored.
The features below may require additional kernel config flags and compilation flags to work.
Command | Description |
---|---|
annotate | Read perf.data (created by perf record) and display annotated code. |
archive | Create archive with object files with build-ids found in perf.data file. |
bench | General framework for benchmark suites. |
buildid-cache | Manage build-id cache. |
buildid-list | List the buildids in a perf.data file. |
diff | Read perf.data files and display the differential profile. |
evlist | List the event names in a perf.data file. |
inject | Filter to augment the events stream with additional information. |
kmem | Tool to trace/measure kernel memory(slab) properties. |
kvm | Tool to trace/measure kvm guest os. |
list | List all symbolic event types. |
lock | Analyze lock events. |
mem | Profile memory accesses. |
record | Run a command and record its profile into perf.data. |
report | Read perf.data (created by perf record) and display the profile. |
sched | Tool to trace/measure scheduler properties (latencies). |
script | Read perf.data (created by perf record) and display trace output. |
stat | Run a command and gather performance counter statistics. |
test | Runs sanity tests. |
timechart | Tool to visualize total system behavior during a workload. |
top | System profiling tool. |
trace | Strace inspired tool. |