AIX disk I/O pacing vs PowerHA

Recently I played with disk I/O pacing (technically, the minpout and maxpout parameters) on AIX 6.1. The investigation was triggered by some high I/O waits in production. AIX 6.1 and 7.1 systems ship with maxpout set to 8193 and minpout set to 4096 by default. However, old deployments and the installation procedures for HACMP <= 5.4.1 still recommend 33/24; just take a look at an old High Availability Cluster Multi-Processing Troubleshooting Guide:

“Although the most efficient high- and low-water marks vary from system to system, an initial high-water mark of 33 and a low-water mark of 24 provides a good starting point. These settings only slightly reduce write times and consistently generate correct fallover behavior from the HACMP software.”

Additionally, IBM describes the whole mechanics of disk I/O pacing in its AIX documentation.

100% write results with maxpout=8193 minpout=4096 (defaults)

orion_20110706_0525_summary.txt:Maximum Large MBPS=213.55 @ Small=0 and Large=6

orion_20110706_0525_summary.txt:Maximum Small IOPS=32323 @ Small=2 and Large=0

100% write results with maxpout=33 minpout=24 (old HACMP)

orion_20110706_0552_summary.txt:Maximum Large MBPS=109.64 @ Small=0 and Large=10

orion_20110706_0552_summary.txt:Maximum Small IOPS=31203 @ Small=5 and Large=0
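To put the two runs side by side, here is a small sketch that computes the relative change between them. The numbers are copied from the Orion summary lines above; the dictionary layout and the function name are just my own illustration.

```python
# Figures copied from the two Orion summary files quoted above.
results = {
    "defaults (8193/4096)": {"large_mbps": 213.55, "small_iops": 32323},
    "old HACMP (33/24)": {"large_mbps": 109.64, "small_iops": 31203},
}


def percent_change(before, after):
    """Relative change from `before` to `after` in percent (negative = slower)."""
    return (after - before) / before * 100.0


base = results["defaults (8193/4096)"]
paced = results["old HACMP (33/24)"]

mbps_change = percent_change(base["large_mbps"], paced["large_mbps"])
iops_change = percent_change(base["small_iops"], paced["small_iops"])

print(f"Large MBPS change: {mbps_change:.1f}%")
print(f"Small IOPS change: {iops_change:.1f}%")
```

Large sequential write throughput drops by roughly half, while small random write IOPS move by only a few percent.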

As you can see, there is a drastic change when it comes to large sequential writes, but don’t change those parameters yet! You really need to read the following sentence from the IBM site: “The maxpout parameter specifies the number of pages that can be scheduled in the I/O state to a file before the threads are suspended.” Clearly these parameters exist to keep I/O-hungry processes from dominating the CPU. If you consider that the smallest page size is 4kB, with 64kB pages sometimes used automatically by Oracle, you get 33*64kB = 2112kB in the best case, after which the thread doing the I/O is suspended! Ouch! So what is the mechanism for taking the CPU away from an active process? It is called an involuntary context switch…
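The arithmetic above can be sketched for both settings and both page sizes. This is only an illustration of the maxpout * page size product; the function name and the labels are hypothetical, and it assumes every pending page has the same size.

```python
def pending_write_ceiling_kb(maxpout, page_size_kb):
    """Data (in kB) that can be queued to one file before the writer is suspended.

    Assumption: all pending pages have the same size.
    """
    return maxpout * page_size_kb


# maxpout values taken from the text: the old HACMP recommendation and the AIX default.
SETTINGS = {"old HACMP": 33, "AIX 6.1/7.1 default": 8193}
PAGE_SIZES_KB = (4, 64)

for label, maxpout in SETTINGS.items():
    for page_kb in PAGE_SIZES_KB:
        ceiling = pending_write_ceiling_kb(maxpout, page_kb)
        print(f"{label}: maxpout={maxpout}, {page_kb} kB pages -> {ceiling} kB")
```

With 4kB pages the old setting allows only 132kB of pending writes per file, which goes a long way towards explaining the large sequential write numbers.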

Why is all of this important with HACMP/PowerHA software? Because it is implemented as ordinary shell scripts and some RSCT processes, most of which do not run in kernel mode at all (at least in everything below PowerHA 7.1)! The only way PowerHA/HACMP can give more CPU horsepower to the processes that need it is to give them a higher priority on the run queue. Just take a look at the PRI column (on AIX a lower PRI value means a more favored process):

root@X: /home/root :# ps -efo "sched,nice,pid,pri,comm"|sort -n +3|head
SCH NI     PID PRI COMMAND
- --  520286  31 hatsd
- --  729314  38 hats_diskhb_nim
- --  938220  38 hats_nim
- --  622658  39 hagsd
- 20  319502  39 clstrmgr
- 20  348330  39 rmcd
- 20  458874  39 gsclvmd
- 20  508082  39 gsclvmd
- 20  528554  39 gsclvmd
root@X: /home/root :#

.. but it appears that even that might not be enough in some situations. As you can see, there is no easy answer to the question of which values to use. You would have to drive your exact system configuration to 99/100% CPU utilization on all POWER cores to see whether you get random failovers, because e.g. some HACMP process might get its heartbeat too late due to CPU scheduling. So perhaps the way forward for 33/24 environments is to settle somewhere between 33/24 and 8193/4096 (like the 513/256 mentioned in the IBM manuals). Another thing is that you should never have a system driven to > 80% CPU usage on all physical cores, but you should know that already :)
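To get a feeling for how differently the three settings behave, here is a deliberately crude toy model of the high/low-water mark mechanism. It assumes the worst case: the disk completes nothing while the writer is running, and a suspended writer wakes up exactly when the pending count has drained to minpout. Real systems complete I/O concurrently, so treat the numbers as a relative comparison only. The function name and the 16384-page burst are hypothetical.

```python
def count_suspensions(total_pages, maxpout, minpout):
    """Count how often a writer is put to sleep while queueing total_pages pages.

    Toy model (assumption, not the AIX implementation): pages are queued one
    at a time, nothing completes while the writer runs, and a suspended
    writer resumes as soon as pending I/O has drained to minpout.
    """
    pending = 0
    suspensions = 0
    for _ in range(total_pages):
        if pending >= maxpout:
            # High-water mark reached: sleep until the low-water mark.
            suspensions += 1
            pending = minpout
        pending += 1
    return suspensions


BURST_PAGES = 16384  # hypothetical write burst to a single file

for maxpout, minpout in ((33, 24), (513, 256), (8193, 4096)):
    stalls = count_suspensions(BURST_PAGES, maxpout, minpout)
    print(f"maxpout={maxpout:5d} minpout={minpout:5d} -> {stalls} suspensions")
```

Under these assumptions the burst stalls 1817 times at 33/24, 62 times at 513/256 and only twice at 8193/4096, which is why a middle value looks like a reasonable compromise to test first.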
