There are several facilities to see where the kernel spendsits resources. A simple one is the profiling function, thatstores the current EIP (instruction pointer) at each clock tick.
Boot the kernel with command line option profile=2
(or some other number instead of 2). This will causea file/proc/profile
to be created.The number given after profile=
is the number of positionsEIP is shifted right when profiling. So a large number gives acoarse profile.The counters are reset by writing to/proc/profile
.The utility readprofile
will output statistics for you.It does not sort - you have to invokesort
explicitly.But given a memory map it will translate addresses to kernel symbols.
See kernel/profile.c
and fs/proc/proc_misc.c
and readprofile(1)
.
For example:
# echo > /proc/profile ... # readprofile -m System.map-2.5.59 | sort -nr | head -2 510502 total 0.1534 508548 default_idle 10594.7500
The first column gives the number of timer ticks.The last column gives the number of ticks divided by thesize of the function.
The command readprofile -r
is equivalent toecho > /proc/profile
.
2.10.1 Oprofile
A more advanced mechanism is given by oprofile.
Prepare kernel
Build a kernel (2.5.43 or later) with CONFIG_PROFILING=yand CONFIG_OPROFILE=y. Now the kernel knows about theoprofilefs
virtual filesystem. The utilitiesmentioned below will mount it on/dev/oprofile
.It is a good idea to add idle=poll
to the kernelcommand line; this will make sure time spent in the idle threadis properly accounted for.
Install oprofile
Get the oprofile utility fromhttp://oprofile.sourceforge.net/.Configure with./configure --with-kernel-support
,then make
and (as root)make install
.This yields binaries and a man page:
% ls /usr/local/bin op_dump op_merge op_start op_time opcontrol oprofiled op_help op_session op_stop op_to_source oprof_start oprofpp % ls /usr/local/share/oprofile stl.pat % ls /usr/local/share/doc/oprofile oprofile.html % ls /usr/local/man/man1 op_help.1 op_time.1 opcontrol.1 oprofiled.1 op_merge.1 op_to_source.1 oprofile.1 oprofpp.1 %All man pages are links to the
oprofile
man page.Thus, there are two sources of information:
% man oprofile % mozilla /usr/local/share/doc/oprofile/oprofile.html
Setup oprofile
Make sure you are root and your PATH contains /usr/local/bin
.The first action required is invokingopcontrol --setup ...
.This will create a setup file /root/.oprofile/daemonrc
.You need the big kernel toplevelvmlinux
file (not bzImage
,and not the smaller arch/i386/boot/compressed/vmlinux
).
# opcontrol --setup --vmlinux=/foo/vmlinux # cat /root/.oprofile/daemonrc IGNORE_MYSELF=0 SEPARATE_LIB_SAMPLES=0 SEPARATE_KERNEL_SAMPLES=0 VMLINUX=/foo/vmlinux BUF_SIZE=0 one_enabled=1 #The precise setup call needed depends on your hardware.The above example is for an old Pentium without hardware performancecounters. For a P3, use
opcontrol --setup --vmlinux=/foo/vmlinux --ctr0-event=CPU_CLK_UNHALTED --ctr0-count=100000For a P4, use
opcontrol --setup --vmlinux=/foo/vmlinux --ctr0-event=GLOBAL_POWER_EVENTS --ctr0-unit-mask=1 --ctr0-count=100000For an Athlon or x86-64, use
opcontrol --setup --vmlinux=/foo/vmlinux --ctr0-event=RETIRED_INSNS --ctr0-count=100000There are many other possible setup options. The command
op_help
will list them once you have done the setup :-).(See also the sourceforge site for
Intel P6/PII/PIII,
Intel P4 and
AMD events.)
The general idea is that one specifies a certain type of event(such as CPU_CLK_UNHALTED, a CPU clock cycle), and a count(like the 100000 above). Now once every count occurrencesof this event the value of EIP is recorded.Pick the value of count suitably: with a 400 MHz machine a countof 100000 for CPU_CLK_UNHALTED means 4000 interrupts/second.Too small a count and the machine dies an interrupt death.Too large a count and the profile will be very coarse.
Use of oprofile
First start the oprofile daemon.
# opcontrol --start-daemon Using log file /var/lib/oprofile/oprofiled.log Daemon started. # ps ax | grep oprofile ... /usr/local/bin/oprofiled ...
Next clear out old profiling data.
# opcontrol --reset
Next start measuring, do whatever should be measured,and stop measuring.
# opcontrol --start Profiler running. # do_something # opcontrol --stop Stopping profiling. #
One now has profiling data below /var/lib/oprofile/
.See below for what to do with the data.
When no more profiling is needed, kill the daemon:
# opcontrol --shutdown Stopping profiling. Killing daemon. #
To clean up all data generated by oprofile (after generating anydesired output):
# rm -r /var/lib/oprofile
A partial cleanup is done by the opcontrol --reset
mentioned above.
Output
To get a printout of the data for /bin/foo
, use, e.g.oprofpp -l -i /bin/foo
. The output will be boring,unless/bin/foo
was not stripped after compilation,so that it contains symbol information.
The programs and libraries that were invoked (and hence are suitablearguments foroprofpp -l -i ...
) and the numbers of ticks spentin each are given by the commandop_time
.With the -l
option the output will be split according to symbol.
# op_time | tail -3 3134 9.0531 0.0000 /lib/i686/libc-2.2.4.so 4813 13.9032 0.0000 /usr/bin/find 26212 75.7178 0.0000 /foo/vmlinux # op_time -l | tail -6 c012c7c0 703 2.04509 kmem_cache_alloc /foo/vmlinux c02146c0 786 2.28655 __copy_to_user_ll /foo/vmlinux c0169b70 809 2.35345 ext3_readdir /foo/vmlinux c01476f0 854 2.48436 link_path_walk /foo/vmlinux c016fcd0 1446 4.20655 ext3_find_entry /foo/vmlinux 00000000 4591 13.3556 (no symbol) /usr/bin/find #
To get statistics for the kernel only, use the vmlinux
name specified.
# oprofpp -l -i /foo/vmlinux | tail c012ca30 488 1.86174 kmem_cache_free c010e280 496 1.89226 mask_and_ack_8259A c010a61a 506 1.93041 restore_all c0119220 603 2.30047 do_softirq c0110b30 663 2.52938 delay_tsc c012c7c0 703 2.68198 kmem_cache_alloc c02146c0 786 2.99863 __copy_to_user_ll c0169b70 809 3.08637 ext3_readdir c01476f0 854 3.25805 link_path_walk c016fcd0 1446 5.51656 ext3_find_entry #
One can get disassembly or annotated source with indicationon where the counts occurred.
# op_to_source -a -i /foo/vmlinux ... /* 424 1.618% */ c0169ac0 <ext3_check_dir_entry>: /* 6 0.02289% */ c0169ac0: push %edi /* 56 0.2136% */ c0169ac1: push %esi c0169ac2: push %ebx c0169ac3: mov 0x18(%esp,1),%esi /* 43 0.164% */ c0169ac7: xor %ebx,%ebx c0169ac9: mov 0x14(%esp,1),%edi c0169acd: movzwl 0x4(%esi),%ecx /* 23 0.08775% */ c0169ad1: cmp $0xb,%ecx c0169ad4: jg c0169ae0 <ext3_check_dir_entry+0x20> c0169ad6: mov $0xc032b160,%ebx ...
# op_to_source --source-dir=. --output-dir=/tmp -i /path/binary # diff -u ./source.c /tmp ... int get_line(register FILE *f, int *length) +/* get_line 24 54.55% */ ... + /* 5 11.36% */ c = Getc (f); ...
This profile with annotated source is available only for binariescompiled with-g2
(and not stripped) so that they stillcontain source line numbers.