What to do when sample buffer overflows are reported
OProfile is a system-wide profiler for Linux, capable of profiling all running code at low overhead.
As of OProfile release 0.9.6, when you run 'opcontrol --shutdown' after a profiling run, oprofile will check for lost samples and display a warning message if any samples have been dropped. The message recommends reviewing the oprofiled.log for more information on lost samples and trying to eliminate or minimize them.
But how does one do that? And how does one interpret the information about lost samples in the log?
Oprofiled.log layout
Let's start out with the interpretation of the log information. The oprofiled.log is located in:
<session_dir>/samples
where <session_dir> is /var/lib/oprofile by default.
At the end of a profiling run, a section titled "OProfile Statistics" is written to this log. The beginning (summary) part of this section includes several lines giving information about lost samples or events. For example:
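The exact set of lines varies somewhat by OProfile version; the snippet below is only illustrative (the numbers are made up), but it shows the kinds of lines to look for:

```
Nr. non-backtrace samples: 3672761
Nr. samples lost due to no mapping: 1342
Nr. samples lost due to no mm: 587
Nr. events lost due to buffer overflow: 250000
```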
Following that is a "Statistics for cpu <N>" section for each processor, with more information on lost samples. The oprofile kernel driver uses a two-level sample buffering scheme: one buffer per cpu, plus one main "event" buffer which periodically collects all the individual cpu buffer samples. The userspace oprofile daemon reads the samples from this main event buffer.
Getting back to the summary part of the "OProfile Statistics" section:
- The "Nr. non-backtrace samples" line shows the number of samples actually received.
- Recently discovered (Dec 2011) Bug Alert: The "Nr. non-backtrace samples" statistic up through OProfile 0.9.7 is not accurate. This value is less than the actual number of samples received.
- Looking at the "lost samples" lines, you'll almost always see some samples lost due to "no mapping" or "no mm". This can happen when samples are recorded by the oprofile kernel module for an application, but then that app ends before those samples can be processed by the userspace oprofile daemon. There's not much the user can do about these. Proportionally, the number of such lost samples is usually very low, so these can generally be safely ignored.
- On the other hand, the user often can (and should) take action to eliminate or minimize the number of "events lost due to buffer overflows", especially if that number is high (say, anything over 1%) compared to the total "Nr. non-backtrace samples". The same is true for "Nr. samples lost cpu buffer overflow" in the per-cpu stats sections. OK, so what action(s) should be taken?
Reducing/Eliminating Events lost due to buffer overflow
Note: The "events lost due to buffer overflow" statistic is mis-named; it should be named "samples lost due to buffer overflow". The following text uses the phrasing that the tool provides, but keep this in mind.
Generally speaking, if you're seeing "events lost due to buffer overflow" in the summary part of the "OProfile Statistics" section of the oprofiled.log, the easiest thing to try is lowering your sampling rate (i.e., using a higher 'count' value for the event(s) you're profiling). But since OProfile is a statistical profiler, you need to keep in mind that the fewer the samples over a given period of time, the less accurate your profile data will be. It's difficult to give a rule of thumb on what constitutes a statistically valid profile. But as a rough example, consider that gprof typically collects 100 samples per second. This is fine for the low-hanging fruit, but may not be fine-grained enough for detailed performance investigations. So, for time-based or cycles-based profiles, you should try to do at least an order of magnitude better than gprof.
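As a rough worked example (assuming a 3 GHz clock, as on the test system described below, and a fully busy CPU):

```
3,000,000,000 cycles/sec / count of 30,000,000 =   100 samples/sec per CPU (gprof-like)
3,000,000,000 cycles/sec / count of  3,000,000 = 1,000 samples/sec per CPU (10x gprof)
```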
Here is an example of how to change the 'count' value for the event PM_CYC_GRP1; the count value is the number that follows the colon.
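For instance, to take one sample every 500,000 occurrences of the event (500,000 is just an illustrative value here; use whatever count suits your event and system):

```
opcontrol --event=PM_CYC_GRP1:500000
```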
You can check the new count value by issuing:
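```
opcontrol --status
```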
OProfile provides the ability to tailor the sizes of the key working buffers it allocates.
If, after lowering your sampling rate, you're still getting too many overflows, try adjusting the buffer size along with the buffer-watershed (which should be around 1/4 the value of buffer size). Once you have eliminated the main buffer overflows, then go after CPU buffer overflows (if any) by adjusting the cpu-buffer-size. The default size for the main event buffer is 131072 samples, and the default size for the per-cpu buffers is 8192 samples.
See the 'opcontrol' man page for details on setting these parameters. The man page uses "num samples" to indicate that the value you enter sizes a buffer big enough to hold that many samples. The samples are unsigned longs, so each is 64 bits or 32 bits, depending on the bitness of the kernel (since these buffers are allocated by the kernel).
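As a rough sketch of the memory involved (assuming a 64-bit kernel, so 8 bytes per sample, and the 32-CPU test system described below):

```
--buffer-size=2097152      2M samples x 8 bytes           = 16 MB main event buffer
--cpu-buffer-size=16384   16K samples x 8 bytes x 32 CPUs =  4 MB of per-cpu buffers
```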
Here is a quick example:
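(The values shown here match the 2M / .5M / 16K rows in the tables below; all three sizes are in samples.)

```
opcontrol --buffer-size=2097152 --buffer-watershed=524288 --cpu-buffer-size=16384
```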
Run opcontrol --status again to verify the "non-default" settings.
To restore to the defaults, simply set each of the buffer values to zero (0).
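For example:

```
opcontrol --buffer-size=0 --buffer-watershed=0 --cpu-buffer-size=0
```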
Testing this on a POWER7 system
We did some testing on a POWER7 system running at 3 GHz, with 32 logical CPUs (i.e., 8 cores, with the default of SMT=4). The tables below show the results. You can see that, with a sampling interval of one sample every 25,000 PM_RUN_CYC events, we found no buffer size settings that prevented lost samples. But by adjusting the sampling interval and buffer parameters, we were able to collect profile data with very few (or no) lost samples.
Notes
- With a sampling interval of one sample for every 50,000 run cycles on a 3 GHz system, each CPU can generate up to 3,000,000,000 / 50,000 = 60,000 samples per second. The total possible number of samples per second is therefore 60,000 x num_cpus (in this case 32 CPUs), assuming all CPUs are 100% busy. Under that assumption, the nominal number of samples for this testing scenario is 1,920,000 per second.
- Without oprofile running, the runtime for a single test application is 23 seconds; with a copy of the app running on every CPU, the runtime is 53 seconds (we'll call this the "nominal runtime"). Thus, 32 apps should register 1,920,000 x 53 = 101,760,000 total samples over the collective runtime. This is a theoretical value, because when oprofile is running, the profiling overhead increases the apps' runtime.
- The data in the tables below show that, for this type of system, oprofile is indeed capable of processing 1,920,000 samples per second with appropriate profiling parameters.
Sampling = 25,000 events
All of the profiling parameter combinations tried in this case resulted in too many lost samples; even when the main event buffer no longer overflowed, the per-cpu buffers still lost millions of samples.
| Sampling interval PM_RUN_CYC | --buffer-size | --buffer-watershed | --cpu-buffer-size | Events lost from buffer overflow | Average events lost per-cpu buffer | oprofile overhead (ratio of nominal) | Summary |
|---|---|---|---|---|---|---|---|
| 25,000 | 2M | .5M | 16K | 96 million | 3.5 million | 1.33 | Not good |
| 25,000 | 4M | 1M | 16K | 45 million | 3.5 million | 1.33 | Not good |
| 25,000 | 8M | 2M | 16K | 10 million | 3.5 million | 1.34 | Not good |
| 25,000 | 16M | 4M | 16K | 0 | 3.5 million | 1.33 | Not good |
Sampling = 50,000 events
The last two parameter combinations did the best here; the first (all defaults) lost nearly all the samples.
| Sampling interval PM_RUN_CYC | --buffer-size | --buffer-watershed | --cpu-buffer-size | Events lost from buffer overflow | Average events lost per-cpu buffer | oprofile overhead (ratio of nominal) | Summary |
|---|---|---|---|---|---|---|---|
| 50,000 | default | default | default | 81.5 million | a few on some, several thousand on others | 1.21 | Not good |
| 50,000 | 1M | .25M | 16K | 800,000 | 4 | 1.22 | Not good |
| 50,000 | 2M | .5M | 16K | 0 | 3 | 1.21 | Good |
| 50,000 | 4M | 1M | 16K | 0 | 3 | 1.23 | Good |
In this example, the main buffer size didn't make a difference between 2M and 4M samples, so for this workload the 2M setting would have been fine. For other workloads, you may see more events lost.
Sampling = 100,000 events
This set of profiling parameters uses even fewer resources and should provide as statistically valid a profile as when using a sampling interval of 50,000. Note that the oprofile overhead is also substantially lower.
| Sampling interval PM_RUN_CYC | --buffer-size | --buffer-watershed | --cpu-buffer-size | Events lost from buffer overflow | Average events lost per-cpu buffer | oprofile overhead (ratio of nominal) | Summary |
|---|---|---|---|---|---|---|---|
| 100,000 | 1M | .25M | 16K | 0 | 1.5 | 1.07 | Very good |