Analyzing CPU Usage with Perf (2) - Using perf and Reading Flamegraphs

Decide a Sampling Frequency

Perf is a sampling-based profiler, so it must collect enough samples to produce a reliable result.
The higher the sampling frequency, the shorter the sampling time needed to collect them, and vice versa.

We suggest choosing an odd sampling frequency lower than 99 hertz.
Currently, we suggest sampling at 61 hertz for 180 seconds.


Perf samples on a best-effort basis, so the total number of samples will be less than “sampling frequency multiplied by sampling time”.
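The trade-off between frequency and duration can be checked with a little arithmetic. The following is a minimal sketch, not part of perf; the function names `nominal_samples` and `duration_for` are hypothetical. It computes the nominal upper bound described above (sampling frequency multiplied by sampling time) and the sampling time needed to reach the same bound at a different frequency.

```python
def nominal_samples(freq_hz: float, duration_s: float) -> float:
    """Nominal upper bound on samples: sampling frequency x sampling time."""
    return freq_hz * duration_s


def duration_for(target_samples: float, freq_hz: float) -> float:
    """Sampling time in seconds needed to nominally reach target_samples."""
    return target_samples / freq_hz


if __name__ == "__main__":
    bound = nominal_samples(61, 180)   # the suggested 61 hertz for 180 seconds
    print("nominal upper bound:", bound)
    # A lower frequency needs a proportionally longer sampling time.
    print("seconds needed at 31 hertz:", round(duration_for(bound, 31), 1))
```

Because sampling is best effort, the real sample count will fall below this bound, so treat the result as a ceiling rather than a prediction.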

Profiling the Whole System

Use perf to record system activity for 180 seconds at a 61 hertz sampling rate. Both user- and kernel-level stacks are sampled.
Save the perf output to the RAM disk /tmp to reduce system overhead.

$ perf record -F 61 -a -g -o /tmp/ -- sleep 180
$ perf script -i /tmp/ > $NFS/whole_system.perf

Profiling One Process

Use perf to record the activity of one process for 180 seconds at a 61 hertz sampling rate. Both user- and kernel-level stacks are sampled.
Save the perf output to the RAM disk /tmp to reduce system overhead.

$ perf record -F 61 -p <PID> -g -o /tmp/ -- sleep 180
$ perf script -i /tmp/ > $NFS/process_name.perf

Using Perf to Collect Data

Sample on-CPU functions for the specified command, at 61 Hertz.
$ perf record -F 61 <command>

Sample on-CPU functions for the specified PID, at 61 Hertz, until Ctrl-C.
$ perf record -F 61 -p <PID>

Sample on-CPU functions for the specified PID, at 61 Hertz, for 180 seconds.
$ perf record -F 61 -p <PID> -- sleep 180

Sample CPU stack traces (via frame pointers) for the specified PID, at 61 Hertz, for 180 seconds.
$ perf record -F 61 -p <PID> -g -- sleep 180

Sample CPU stack traces for the entire system, at 61 Hertz, for 180 seconds (< Linux 4.11).
$ perf record -F 61 -ag -- sleep 180

Sample CPU stack traces for the entire system, at 61 Hertz, for 180 seconds (>= Linux 4.11).
$ perf record -F 61 -g -- sleep 180
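The variants above differ only in a few options, so the same command line can be assembled programmatically. The sketch below is an illustration only: `perf_record_argv` is a hypothetical helper, and it builds the argument list without running perf. It uses only options that appear in the commands above (`-F`, `-p`, `-a`, `-g`, and `-- sleep`), and it assumes that passing `-a` explicitly for system-wide sampling is acceptable on every kernel version.

```python
from typing import List, Optional


def perf_record_argv(freq_hz: int = 61, duration_s: int = 180,
                     pid: Optional[int] = None,
                     stacks: bool = True) -> List[str]:
    """Build a 'perf record' argument list (hypothetical helper, runs nothing)."""
    argv = ["perf", "record", "-F", str(freq_hz)]
    if pid is not None:
        argv += ["-p", str(pid)]      # sample one process
    else:
        argv.append("-a")             # sample the entire system
    if stacks:
        argv.append("-g")             # record stack traces
    argv += ["--", "sleep", str(duration_s)]  # stop after duration_s seconds
    return argv


if __name__ == "__main__":
    print(" ".join(perf_record_argv()))
    print(" ".join(perf_record_argv(pid=1234)))
```

The resulting list could be handed to `subprocess.run` on the target device; that step is left out here because it requires perf to be installed and sufficient privileges.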

Using Perf to Generate Reports

Show in a text user interface (TUI).
$ perf report

Show with a column for sample count.
$ perf report -n

Show as a text report, with data coalesced and percentages.
Copy the perf output from step one to the PC and run this command there, since BusyBox curses support is limited.
$ perf report --stdio

List all events from the recorded perf data.
$ perf script > out.perf

List all events, with customized fields (< Linux 4.1).
$ perf script -f time,event,trace

Dump the raw contents of the recorded perf data as hex (for debugging).
$ perf script -D

Disassemble and annotate instructions with percentages (needs some debuginfo).
$ perf annotate --stdio

Comparing Performance Between Two Reports

First generate two perf files, such as whole_system_0815.perf and whole_system_0816.perf. Then compare the two perf reports on the PC with a diff tool such as Beyond Compare.

Generating Flamegraph

This section runs the flamegraph scripts on the PC to generate a flamegraph in SVG format.
(Please run them on the PC, not on the DUT.)

Copy perf output to the host system and generate a flamegraph.
Same procedure if you profile just one process.

$ ./ whole_system.perf > whole_system.folded
$ ./ whole_system.folded > whole_system.svg

There is a batch script to generate graphs.

$ python ./ --prog $FLAME_GRAPH_DIR --input $INPUT_FOLDER --out $OUTPUT_FOLDER


Flamegraph can be downloaded from
Script can be downloaded from

Analyzing Flamegraph

Confidence of Collected Data

Because perf samples on a best-effort basis, the total number of collected samples is sometimes too low.
In that case, extend the sampling time to collect more samples; otherwise the result is not reliable.

Based on our experience, the total number of samples should be at least X, where X is given by the formulas below.

  • For whole system profiling, X >= max( 0.7 x sampling freq x sampling time, 1500 ).
  • For single process profiling, X >= max( 0.7 x sampling freq x sampling time x CPU usage of the process, 1500 ).
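The two formulas can be written as one function. This is a minimal sketch with hypothetical names (`min_required_samples`, `has_enough_samples`). It assumes that “CPU usage of the process” is expressed as a fraction between 0 and 1, and it treats whole system profiling as the case where that fraction is 1.

```python
def min_required_samples(freq_hz: float, duration_s: float,
                         cpu_usage: float = 1.0) -> float:
    """X = max(0.7 x sampling freq x sampling time x CPU usage, 1500).

    cpu_usage is assumed to be a fraction (0.25 means 25%); leave it at
    1.0 for whole system profiling.
    """
    return max(0.7 * freq_hz * duration_s * cpu_usage, 1500)


def has_enough_samples(total_samples: int, freq_hz: float,
                       duration_s: float, cpu_usage: float = 1.0) -> bool:
    """True if the collected sample count reaches the threshold X."""
    return total_samples >= min_required_samples(freq_hz, duration_s, cpu_usage)


if __name__ == "__main__":
    # Whole system at 61 hertz for 180 seconds.
    print(min_required_samples(61, 180))
    # A process using 10% CPU: the 1500-sample floor applies.
    print(min_required_samples(61, 180, cpu_usage=0.10))
```

For a lightly loaded process, the floor of 1500 samples dominates, which is a hint that the sampling time should be extended.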

Viewing the Flamegraph

In a flamegraph, each block represents a function.
A block may sit on top of another block, which means the function in the lower block calls the function in the upper block.

The bottom-most block is the main function or the entire system.
The top-most blocks are the functions that were actually running when perf took a sample.

The wider a block is, the more CPU time the function and the functions it calls use.
So, look for wide, “flat” blocks at the top: they may contain CPU-hungry logic.
Height is not related to CPU time; it only shows the depth of the call stack.
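The same reading can be done numerically on the folded file. The sketch below is an illustration under stated assumptions, not part of the flamegraph scripts: it assumes each line of the folded file holds one call stack with frames separated by semicolons (caller first), followed by a space and a sample count. The name `summarize_folded` and the sample stacks are hypothetical. The inclusive count corresponds to block width, and the on-top count corresponds to the samples where a function was the top-most block.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_folded(lines: Iterable[str]) -> Tuple[Counter, Counter, int]:
    """Return (inclusive counts, on-top counts, total samples)."""
    inclusive: Counter = Counter()   # block width: function plus its callees
    on_top: Counter = Counter()      # samples where the function was running
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stack, _, count_text = line.rpartition(" ")
        count = int(count_text)
        frames = stack.split(";")
        total += count
        on_top[frames[-1]] += count
        for name in set(frames):     # count recursive functions once per stack
            inclusive[name] += count
    return inclusive, on_top, total


if __name__ == "__main__":
    # Hypothetical folded stacks, for illustration only.
    sample = [
        "main;parse;read_file 30",
        "main;parse 10",
        "main;render;blit 60",
    ]
    inclusive, on_top, total = summarize_folded(sample)
    for name, count in on_top.most_common():
        print(f"{name}: {count} samples on top, {100.0 * count / total:.1f}% of total")
```

Functions with a high on-top count are the wide blocks at the top of the graph, which are the first candidates to inspect.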

Placing the mouse cursor over a block can get information about the block, including:

  • Function name
  • Number of samples
  • Percentage of samples compared to the bottom-most block (i.e. the whole system or the process)

Clicking a block zooms in on that block.
To zoom out, click the Reset Zoom label in the top-left corner of the graph.

The following is a flamegraph of the whole system running av_main.
The interpretation of this graph would be:



Thread names are not available in whole system profiling.
Do not use Internet Explorer to view the flamegraph, because it does not support some SVG features.

Other Perf Capabilities

Perf offers other subcommands that are worth exploring.
The features below may require additional kernel config flags and compilation flags to work.

annotate        Read (created by perf record) and display annotated code.
archive         Create archive with object files with build-ids found in file.
bench           General framework for benchmark suites.
buildid-cache   Manage build-id cache.
buildid-list    List the buildids in a file.
diff            Read files and display the differential profile.
evlist          List the event names in a file.
inject          Filter to augment the events stream with additional information.
kmem            Tool to trace/measure kernel memory (slab) properties.
kvm             Tool to trace/measure kvm guest os.
list            List all symbolic event types.
lock            Analyze lock events.
mem             Profile memory accesses.
record          Run a command and record its profile into
report          Read (created by perf record) and display the profile.
sched           Tool to trace/measure scheduler properties (latencies).
script          Read (created by perf record) and display trace output.
stat            Run a command and gather performance counter statistics.
test            Runs sanity tests.
timechart       Tool to visualize total system behavior during a workload.
top             System profiling tool.
trace           Strace inspired tool.









