Handling oprofile sample buffer overflows

 


What to do when sample buffer overflows are reported

OProfile is a system-wide profiler for Linux, capable of profiling all running code at low overhead.

As of OProfile release 0.9.6, when you run 'opcontrol --shutdown' after a profiling run, OProfile will check for lost samples and display a warning message if any samples have been dropped. The message recommends reviewing the oprofiled.log for more information on the lost samples and trying to eliminate or minimize them.
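For reference, a typical run that ends with this shutdown check looks roughly like the sketch below (run as root; the vmlinux path, event name, count, and workload name are illustrative, so substitute values appropriate for your kernel and CPU):

# Load the oprofile module and point it at the kernel image (or use --no-vmlinux)
$ opcontrol --init
$ opcontrol --vmlinux=/boot/vmlinux-`uname -r`

# Choose the event and count, then start the daemon
$ opcontrol --event PM_RUN_CYC_GRP1:50000
$ opcontrol --start

# Run the workload being profiled (hypothetical program name)
$ ./my_workload

# Stop profiling; the lost-samples warning, if any, is printed here
$ opcontrol --shutdown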

But how does one do that? And how does one interpret the information about lost samples in the log?

 


Oprofiled.log layout

Let's start out with the interpretation of the log information. The oprofiled.log is located in:

<session-dir>/samples

where <session-dir> is /var/lib/oprofile by default:

# cat /var/lib/oprofile/samples/oprofiled.log

At the end of a profiling run, a section titled "OProfile Statistics" is written to this log. The beginning (summary) part of this section includes several lines giving information about lost samples or events. For example:

oprofiled started Tue Jun 22 17:00:46 2010
kernel pointer size: 8

Tue Jun 22 17:01:21 2010

-- OProfile Statistics --
Nr. sample dumps: 4
Nr. non-backtrace samples: 4279
Nr. kernel samples: 3834
Nr. lost samples (no kernel/user): 0
Nr. lost kernel samples: 0
Nr. incomplete code structs: 0
Nr. samples lost due to sample file open failure: 0
Nr. samples lost due to no permanent mapping: 1
Nr. event lost due to buffer overflow: 0
Nr. samples lost due to no mapping: 11
Nr. backtraces skipped due to no file mapping: 0
Nr. samples lost due to no mm: 42

Following that is a "Statistics for cpu" section for each processor, with more information on lost samples. The OProfile kernel driver uses a two-level sample buffering scheme: one buffer per CPU, plus one main "event" buffer that periodically collects the samples from all of the per-CPU buffers. The userspace oprofile daemon reads samples from this main event buffer.

---- Statistics for cpu : 2
Nr. samples lost cpu buffer overflow: 0
Nr. samples received: 0
Nr. backtrace aborted: 0
Nr. samples lost invalid pc: 0

---- Statistics for cpu : 1
Nr. samples lost cpu buffer overflow: 0
Nr. samples received: 0
Nr. backtrace aborted: 0
Nr. samples lost invalid pc: 0

---- Statistics for cpu : 0
Nr. samples lost cpu buffer overflow: 0
Nr. samples received: 67
Nr. backtrace aborted: 0
Nr. samples lost invalid pc: 0
oprofiled stopped Tue Jun 22 17:01:21 2010
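Incidentally, these per-cpu counters (and the buffer settings themselves) can also be read live, while the daemon is running, through oprofilefs, which 'opcontrol --init' normally mounts at /dev/oprofile. This is only a sketch; the exact file names can vary somewhat between kernel versions:

# Current in-kernel buffer settings
$ cat /dev/oprofile/buffer_size /dev/oprofile/buffer_watershed /dev/oprofile/cpu_buffer_size

# Running lost-sample counters: main event buffer, then cpu0's buffer
$ cat /dev/oprofile/stats/event_lost_overflow
$ cat /dev/oprofile/stats/cpu0/sample_lost_overflow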

Getting back to the summary part of the "OProfile Statistics" section:

The "Nr. non-backtrace samples" line shows the number of samples actually received.

  • Bug alert (discovered December 2011): the "Nr. non-backtrace samples" statistic is inaccurate up through OProfile 0.9.7; the value reported is less than the actual number of samples received.
  • Looking at the "lost samples" lines, you'll almost always see some samples lost due to "no mapping" or "no mm". This can happen when samples are recorded by the oprofile kernel module for an application, but the application exits before the userspace oprofile daemon can process those samples. There's not much the user can do about these. Proportionally, the number of such lost samples is usually very low, so they can generally be safely ignored.
  • On the other hand, the user often can (and should) take action to eliminate or minimize the number of "events lost due to buffer overflows", especially if that number is high (say, anything over 1%) compared to the total "Nr. non-backtrace samples"; a quick way to check that ratio straight from the log is sketched after this list. The same is true for "Nr. samples lost cpu buffer overflow" in the per-CPU stats sections. OK, so what action(s) should be taken?
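As mentioned in the last bullet, here is a rough way to check the overflow ratio straight from the log. It assumes the counter-line wording shown in the excerpt above, and remember that "Nr. non-backtrace samples" under-reports through OProfile 0.9.7, so the computed percentage is, if anything, an overestimate:

$ awk -F: '/Nr. non-backtrace samples/ { total = $2 }
           /Nr. event lost due to buffer overflow/ { lost = $2 }
           END { if (total > 0) printf "%d of %d samples lost to overflow (%.2f%%)\n", lost, total, 100 * lost / total }' \
      /var/lib/oprofile/samples/oprofiled.log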

 


Reducing/Eliminating Events lost due to buffer overflow

Note: The "events lost due to buffer overflow" statistic is mis-named; it should be named "samples lost due to buffer overflow". The following text uses the phrasing that the tool provides, but keep this in mind.

Generally speaking, if you're seeing "events lost due to buffer overflow" in the summary part of the "OProfile Statistics" section of the oprofiled.log, the easiest thing to try is lowering your sampling rate (i.e., using a higher 'count' value for the event(s) you're profiling). But since OProfile is a statistical profiler, keep in mind that the fewer the samples over a given period of time, the less accurate your profile data will be. It's difficult to give a rule of thumb on what constitutes a statistically valid profile, but as a rough example, consider that gprof typically collects 100 samples per second. This is fine for the low-hanging fruit, but may not be fine-grained enough for detailed performance investigations. So, for time-based or cycles-based profiles, you should try to do at least an order of magnitude better than gprof.
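As a rough way to connect a 'count' value to a sample rate (a sketch; the 3 GHz clock is the one used in the testing later in this article, and the numbers assume a cycles-type event on a fully busy CPU):

# samples/sec per CPU ≈ clock rate / count; e.g. a count of 50,000 at 3 GHz:
$ echo $(( 3000000000 / 50000 ))
60000

# Or work backwards: the count needed for roughly 1,000 samples/sec per CPU
# (an order of magnitude better than gprof's ~100/sec, per the rule of thumb above)
$ echo $(( 3000000000 / 1000 ))
3000000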

Here is an example of how to change the 'count' value for the event PM_RUN_CYC_GRP1; the count value is the number that follows the colon:

$ opcontrol --event PM_RUN_CYC_GRP1:25000

You can check the new count value by issuing:

$ opcontrol --status
Daemon not running
Event 0: PM_RUN_CYC_GRP1:25000:0:1:1
Separate options: library
vmlinux file: /boot/vmlinux-2.6.32.12-0.7-ppc64
Image filter: none
Call-graph depth: 0

OProfile also provides the ability to tailor the sizes of the key working buffers it allocates:

$ opcontrol 2>&1 | grep buffer
   --buffer-size=num             kernel buffer size in sample units
   --buffer-watershed            kernel buffer watershed in sample units (2.6 only)
   --cpu-buffer-size=num         per-cpu buffer size in units (2.6 only)
   --note-table-size             kernel notes buffer size in notes units (2.4 only)

If, after lowering your sampling rate, you're still getting too many overflows, try adjusting the buffer size along with the buffer watershed (which should be around 1/4 the value of the buffer size). Once you have eliminated the main buffer overflows, go after CPU buffer overflows (if any) by adjusting the cpu-buffer-size. The default size for the main event buffer is 131072 samples, and the default size for the per-cpu buffers is 8192 samples.

See the 'opcontrol' man page for details on setting these parameters. The man page uses "num samples" to indicate that the value you enter sizes a buffer big enough to hold that many samples. The samples are unsigned longs, so each one is 64 bits or 32 bits, depending on the bitness of the kernel (since these buffers are allocated by the kernel).

Here is a quick example:

# Set the event (main) buffer size
$ opcontrol --buffer-size=1000000

# Set the buffer watershed (about 1/4 of the buffer size)
$ opcontrol --buffer-watershed=250000

# Set the cpu buffer size
$ opcontrol --cpu-buffer-size=16000

Run opcontrol --status again to verify the "non-default" settings.

$ opcontrol --status
Daemon not running
Event 0: PM_RUN_CYC_GRP1:1000000:0:1:1
Separate options: library
vmlinux file: /boot/vmlinux-2.6.32.12-0.7-ppc64
Image filter: none
Call-graph depth: 0
Buffer size: 1000000
CPU buffer watershed: 250000
CPU buffer size: 16000
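Those settings pin down a modest amount of kernel memory. As a rough estimate only (assuming one unsigned long, 8 bytes on a 64-bit kernel, per sample slot as described above, and 32 CPUs for the per-cpu buffers):

# Main event buffer: 1,000,000 slots x 8 bytes (about 8 MB)
$ echo $(( 1000000 * 8 ))
8000000

# Per-cpu buffers: 16,000 slots x 8 bytes on each of 32 CPUs (about 4 MB total)
$ echo $(( 16000 * 8 * 32 ))
4096000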

To restore the defaults, simply set each of the buffer values to zero (0):

$ opcontrol --buffer-size=0
$ opcontrol --buffer-watershed=0
$ opcontrol --cpu-buffer-size=0
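Putting the pieces together, a complete tuned run might look like the sketch below. The event, count, and buffer values roughly mirror one of the better-behaved configurations from the testing in the next section; they are illustrative, not a recommendation for every system:

$ opcontrol --init
$ opcontrol --vmlinux=/boot/vmlinux-`uname -r`
$ opcontrol --event PM_RUN_CYC_GRP1:50000
$ opcontrol --buffer-size=2000000
$ opcontrol --buffer-watershed=500000
$ opcontrol --cpu-buffer-size=16000
$ opcontrol --start

$ ./my_workload          # hypothetical workload

$ opcontrol --shutdown
$ grep -iE "lost|overflow" /var/lib/oprofile/samples/oprofiled.log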

 


Testing this on a POWER7 system

We did some testing on a POWER7 system running at 3 GHz, with 32 logical CPUs (i.e., 8 cores, with the default of SMT=4). The tables below show the results. You can see that, with a sampling interval of one sample every 25,000 PM_RUN_CYC events, we found no buffer size settings that prevented lost samples. But by adjusting the sampling interval and buffer parameters, we were able to collect profile data with very few (or no) lost samples.

Notes

  • With a sampling interval of one sample for every 50,000 run cycles on a 3 GHz system, each fully busy CPU generates 3,000,000,000 / 50,000 = 60,000 samples per second. The total possible number of samples per second is therefore 60,000 x num_cpus (in this case 32 CPUs), assuming all CPUs are 100% busy, which works out to a nominal 1,920,000 samples per second for this testing scenario.
  • Without oprofile running, the runtime for a single instance of the test application is 23 seconds, and 53 seconds when an instance runs on every CPU (we'll call this the "nominal runtime"). Thus, 32 instances should register 1,920,000 x 53 = 101,760,000 total samples over the collective runtime (the arithmetic is sketched after this list). This is a theoretical value, because when oprofile is running, the profiling overhead increases the runtime of the applications.
  • The data in the tables show that, for this type of system, oprofile is indeed capable of processing 1,920,000 samples per second with appropriate profiling parameters.
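If you want to plug in your own clock rate, CPU count, and runtime, the arithmetic behind those totals is easy to reproduce with shell arithmetic (a sketch using the figures from this test):

# Aggregate sample rate: 60,000 samples/sec per CPU x 32 CPUs
$ echo $(( 60000 * 32 ))
1920000

# Theoretical total over the 53-second nominal runtime
$ echo $(( 1920000 * 53 ))
101760000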

 


Sampling = 25,000 events

All of the profiling parameters tried at this sampling interval resulted in too many lost samples.

Sampling interval (PM_RUN_CYC) | --buffer-size | --buffer-watershed | --cpu-buffer-size | Events lost from buffer overflow | Average events lost per-cpu buffer | oprofile overhead (ratio of nominal) | Summary
-------------------------------|---------------|--------------------|-------------------|----------------------------------|------------------------------------|--------------------------------------|---------
25,000 | 2M  | .5M | 16K | 96 million | 3.5 million | 1.33 | Not good
25,000 | 4M  | 1M  | 16K | 45 million | 3.5 million | 1.33 | Not good
25,000 | 8M  | 2M  | 16K | 10 million | 3.5 million | 1.34 | Not good
25,000 | 16M | 4M  | 16K | 0          | 3.5 million | 1.33 | Not good

 


Sampling = 50,000 events

The last two did the best here. The first lost nearly all the samples.

Sampling interval (PM_RUN_CYC) | --buffer-size | --buffer-watershed | --cpu-buffer-size | Events lost from buffer overflow | Average events lost per-cpu buffer | oprofile overhead (ratio of nominal) | Summary
-------------------------------|---------------|--------------------|-------------------|----------------------------------|------------------------------------|--------------------------------------|---------
50,000 | default | default | default | 81.5 million | a few on some CPUs, several thousand on others | 1.21 | Not good
50,000 | 1M      | .25M    | 16K     | 800,000      | 4                                              | 1.22 | Not good
50,000 | 2M      | .5M     | 16K     | 0            | 3                                              | 1.21 | Good
50,000 | 4M      | 1M      | 16K     | 0            | 3                                              | 1.23 | Good

In this example, the main buffer size didn't make a difference between 2M and 4M, so for this workload the 2M setting would have been fine. For other workloads, you may see more events lost.

 


Sampling = 100,000 events

This set of profiling parameters uses even fewer resources and should provide a profile that is about as statistically valid as one collected with a sampling interval of 50,000. Note that the oprofile overhead is also substantially lower.

Sampling interval (PM_RUN_CYC) | --buffer-size | --buffer-watershed | --cpu-buffer-size | Events lost from buffer overflow | Average events lost per-cpu buffer | oprofile overhead (ratio of nominal) | Summary
-------------------------------|---------------|--------------------|-------------------|----------------------------------|------------------------------------|--------------------------------------|---------
100,000 | 1M | .25M | 16K | 0 | 1.5 | 1.07 | Very good