OS Installation Recommendations for Oracle RAC (11.2) on AIX

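It's been a long time since I last set up a box or worked with AIX. I recently had to put this together again for work, so I'm sharing it here for reference in case anyone still needs it ;-)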

References

RAC and Oracle Clusterware Best Practices and Starter Kit (AIX) (Doc ID 811293.1)
RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent) (Doc ID 810394.1)

OS Configuration Considerations: https://www.oracle.com/technetwork/database/clusterware/overview/rac-aix-system-stability-131022.pdf
IBM Technical Brief: 《Managing the Stability and Performance of current Oracle Database versions running AIX on Power Systems including POWER9》

1. AIX Version

1.1. AIX Version

AIX 7.1 TL05 or later (AIX 7.1 TL05 is the recommended minimum AIX version)

1.2. Required AIX Packages

AIX 7.1 required packages (Doc ID 169706.1):
bos.adt.base
bos.adt.lib
bos.adt.libm
bos.perf.libperfstat
bos.perf.perfstat
bos.perf.proctools
xlC.rte.11.1.0.2 or later
gpfs.base 3.3.0.11 or later (Only for RAC systems that will use GPFS cluster filesystems)
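The package list mixes "must be present" filesets with two minimum levels (xlC.rte and, for GPFS only, gpfs.base). A minimal sketch of that check, assuming you have already collected a hypothetical mapping of fileset name to installed level (for example, transcribed from `lslpp -l`); the sketch does not call any AIX command itself.

```python
# Minimum levels from the required-package list above; None means "any level".
REQUIRED_FILESETS = {
    "bos.adt.base": None,
    "bos.adt.lib": None,
    "bos.adt.libm": None,
    "bos.perf.libperfstat": None,
    "bos.perf.perfstat": None,
    "bos.perf.proctools": None,
    "xlC.rte": "11.1.0.2",
}
# Only required for RAC systems that use GPFS cluster file systems.
GPFS_FILESETS = {"gpfs.base": "3.3.0.11"}


def parse_level(level: str) -> tuple:
    """Turn a VRMF level such as '11.1.0.2' into (11, 1, 0, 2) so tuples compare numerically."""
    return tuple(int(part) for part in level.split("."))


def check_filesets(installed: dict, uses_gpfs: bool = False) -> list:
    """Return a list of problems; an empty list means every requirement is met.

    `installed` is a hypothetical {fileset: level} mapping gathered beforehand.
    """
    required = dict(REQUIRED_FILESETS)
    if uses_gpfs:
        required.update(GPFS_FILESETS)
    problems = []
    for fileset, minimum in required.items():
        level = installed.get(fileset)
        if level is None:
            problems.append(f"{fileset}: not installed")
        elif minimum is not None and parse_level(level) < parse_level(minimum):
            problems.append(f"{fileset}: {level} is older than {minimum}")
    return problems


if __name__ == "__main__":
    sample = {name: "7.1.5.0" for name in REQUIRED_FILESETS}
    sample["xlC.rte"] = "11.1.0.1"  # deliberately too old
    for problem in check_filesets(sample):
        print(problem)
```

Comparing levels as integer tuples avoids the string-comparison trap where "11.1.0.10" would sort before "11.1.0.2".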

2. AIX Memory-Related Kernel Parameters

(On AIX 7.x the default values are already the recommended values.)

Parameter            Recommended value
minperm% 3
maxperm% 90
maxclient% 90
strict_maxclient 1
strict_maxperm 0
lru_file_repage 0 ## --> this parameter is deprecated in AIX 7.x
lru_poll_interval 10
minfree 960
maxfree 1088
page_steal_method 1
memory_affinity 1
v_pinshm 0
lgpg_regions 0
lgpg_size 0
maxpin% 90
esid_allocator 1
vmm_klock_mode 2

## How to check:
% vmo -Fa|egrep "minperm%|maxperm%|maxclient%|strict_maxclient|strict_maxperm|lru_file_repage|lru_poll_interval|minfree |maxfree |page_steal_method| memory_affinity|v_pinshm|lgpg_regions|lgpg_size|maxpin%|esid_allocator|vmm_klock_mode"

## How to set:
% vmo -p -o Tunable=Newvalue    (applies now and persists across reboots)
% vmo -r -o Tunable=Newvalue    (takes effect only after the next reboot)
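Eyeballing the `vmo -Fa` output against the table is error-prone. A minimal sketch that compares captured output with the recommended values, assuming each tunable is printed on its own line as `name = value` (lines without `=`, such as section headers, are skipped). lru_file_repage is left out because it is deprecated on AIX 7.x.

```python
# Recommended values from the table above (lru_file_repage omitted: deprecated on AIX 7.x).
RECOMMENDED_VMO = {
    "minperm%": "3",
    "maxperm%": "90",
    "maxclient%": "90",
    "strict_maxclient": "1",
    "strict_maxperm": "0",
    "lru_poll_interval": "10",
    "minfree": "960",
    "maxfree": "1088",
    "page_steal_method": "1",
    "memory_affinity": "1",
    "v_pinshm": "0",
    "lgpg_regions": "0",
    "lgpg_size": "0",
    "maxpin%": "90",
    "esid_allocator": "1",
    "vmm_klock_mode": "2",
}


def parse_vmo(output: str) -> dict:
    """Parse lines of the assumed form '   name = value' into a dict of strings."""
    values = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            values[name.strip()] = value.strip()
    return values


def deviations(output: str) -> dict:
    """Return {tunable: (current, recommended)} for every tunable that differs or is missing."""
    current = parse_vmo(output)
    return {
        name: (current.get(name), wanted)
        for name, wanted in RECOMMENDED_VMO.items()
        if current.get(name) != wanted
    }


if __name__ == "__main__":
    captured = "        minfree = 480\n        maxfree = 1088\n"  # hypothetical capture
    for name, (have, want) in deviations(captured).items():
        print(f"{name}: current={have} recommended={want}")
```

Values are compared as strings on purpose, so nothing is lost if a tunable reports a non-numeric value.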

3. Environment Variables

3.1. Addressing 64KB pages

Page sizes of 64 KB and 16 MB have been shown to benefit Oracle performance by reducing the kernel lookaside processing needed to resolve virtual to physical addresses. Oracle 11.2, 12c, 18c and 19c use 64 KB pages for the SGA by default. The general recommendation for most Oracle databases on AIX is to use the 64 KB page size, not the 16 MB page size, for the SGA.
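The benefit comes from needing far fewer pages, and therefore far fewer virtual-to-physical translations, to cover the same memory. A small arithmetic sketch for a hypothetical 16 GB SGA; it only illustrates the page counts and does not change the recommendation above to prefer 64 KB over 16 MB for the SGA.

```python
KB = 1024
GB = 1024 ** 3


def pages_needed(region_bytes: int, page_size: int) -> int:
    """Number of pages (rounded up) needed to map region_bytes."""
    return -(-region_bytes // page_size)  # ceiling division


if __name__ == "__main__":
    sga = 16 * GB  # hypothetical SGA size
    for label, size in (("4 KB", 4 * KB), ("64 KB", 64 * KB), ("16 MB", 16 * KB * KB)):
        print(f"{label:>5} pages: {pages_needed(sga, size):>9,}")
```

Moving from 4 KB to 64 KB pages cuts the page count by a factor of 16 for the same SGA.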

Use of AIX LDR_CNTRL ENVIRONMENTAL Settings with Oracle (Doc ID 2066837.1)

How to set:
--> Oracle 11.2.0.4 (Single Instance)
Set for EACH database (oracle) user as well as for the Database TNS listener user:
  $ export LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K  --> oracle user and Listener user
  $ export VMM_CNTRL=vmm_fork_policy=COR  --> Listener user

--> Oracle 11g (RAC)
As an Oracle user:
  $ srvctl setenv database -d <DB_NAME> -t "LDR_CNTRL=TEXTPSIZE=64K@DATAPSIZE=64K@STACKPSIZE=64K"
As an Oracle Grid user:
  $ srvctl setenv listener -l LISTENER -t "LDR_CNTRL=TEXTPSIZE=64K@DATAPSIZE=64K@STACKPSIZE=64K"
  
How to check:
$ crsctl stat res ora.LISTENER.lsnr -p |grep USR_ORA_ENV
USR_ORA_ENV=LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K VMM_CNTRL=vmm_fork_policy=COR  --> already set

$ ps -ef|grep tnslsnr|grep -v grep |awk  '{print $2}' | xargs -I {} svmon -P {} |grep BSS    
  3aceba        11 work text data BSS heap          sm   2885     0    0    2885    --> 'sm' means not set, 'm' means set (s = 4 KB pages, m = 64 KB pages)
  1d0e9d        10 clnt text data BSS heap,          s    193     0    -       -
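When many listener processes are involved, it helps to pull the PSize column out of each svmon line automatically. A minimal sketch, assuming the PSize value is the first whitespace-separated token that is a known page-size code (as in the two sample lines above); it follows the rule stated above that 'm' means the 64 KB setting is in effect and 'sm' means it is not.

```python
# Page-size codes as they appear in the svmon PSize column (s = 4 KB, m = 64 KB, L = 16 MB).
PAGE_SIZE_CODES = {"s", "m", "sm", "L"}


def psize_of(svmon_line: str):
    """Return the PSize token of one svmon segment line, or None if none is found."""
    for token in svmon_line.split():
        if token in PAGE_SIZE_CODES:
            return token
    return None


def uses_64k_pages(svmon_line: str) -> bool:
    """True when the segment reports 'm', i.e. LDR_CNTRL 64 KB pages are in effect."""
    return psize_of(svmon_line) == "m"


if __name__ == "__main__":
    line = "  3aceba        11 work text data BSS heap          sm   2885     0    0    2885"
    print(psize_of(line), uses_64k_pages(line))
```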

3.2. About AIXTHREAD_SCOPE

Note: export AIXTHREAD_SCOPE=S, only necessary on AIX5L. A change was introduced in AIX 6.1 which means that the variable does not need to be set.

  • No longer needs to be set on AIX 6.1 and later.

3.3. About GI/Oracle Owner Capabilities

Ensure that the GI and Oracle owner accounts have the CAP_NUMA_ATTACH, CAP_BYPASS_RAC_VMM, and CAP_PROPAGATE capabilities. This is required by the 11gR2 and later installation guides, and it is also required for all pre-11gR2 installations. Example check and set commands for the grid user:

#/usr/bin/lsuser -a capabilities grid
#/usr/bin/chuser capabilities=CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE grid
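Because `chuser capabilities=...` sets the whole list, it is safer to merge the required capabilities with whatever the user already has. A minimal sketch, assuming `lsuser -a capabilities grid` prints a single line of the form `grid capabilities=CAP_A,CAP_B` (and omits or empties the value when none are set); it only builds the command string and does not run it.

```python
REQUIRED_CAPS = frozenset({"CAP_NUMA_ATTACH", "CAP_BYPASS_RAC_VMM", "CAP_PROPAGATE"})


def granted_caps(lsuser_output: str) -> set:
    """Extract the capability set from assumed 'user capabilities=A,B' output."""
    for field in lsuser_output.split():
        if field.startswith("capabilities="):
            return {cap for cap in field.split("=", 1)[1].split(",") if cap}
    return set()


def chuser_command(user: str, granted: set) -> str:
    """Build a chuser command that keeps existing capabilities and adds the required ones."""
    wanted = sorted(set(granted) | REQUIRED_CAPS)
    return f"/usr/bin/chuser capabilities={','.join(wanted)} {user}"


if __name__ == "__main__":
    have = granted_caps("grid capabilities=CAP_NUMA_ATTACH,CAP_PROPAGATE")
    print("missing:", sorted(REQUIRED_CAPS - have))
    print(chuser_command("grid", have))
```

Merging instead of overwriting avoids silently dropping a capability that was granted for some other reason.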

4. CPU Parameters

4.1. SMT

For AIX executing on POWER9-based systems, SMT8 mode typically provides the best performance. SMT1 mode on POWER9-based systems will typically result in significant performance degradation with Oracle databases.

On POWER9 CPUs, SMT8 mode (the default) gives better performance.

How to check: # smtctl | grep SMT

4.2. Virtual processor folding

Virtual processor folding: This is a feature of Power Systems in which unused virtual processors are taken offline until the demand requires that they be activated.
The default is to allow virtual processor folding, and this should not be altered without consulting AIX support. (schedo parameter vpm_fold_policy=2).

For Oracle database environments it is strongly suggested to set schedo parameter vpm_xvcpus to a value of 2 as we have seen AIX incorrectly folding too many processors if the parameter is left at default of 0.

*** Note that this is a critical setting in a RAC environment when using LPARs with processor folding enabled. If this setting is not adjusted, there is a high risk of RAC node evictions under light database workload conditions ***

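In short: processor folding (enabled by default) takes virtual CPUs offline while the server is idle and reactivates them when needed. For Oracle databases it is strongly recommended to set vpm_xvcpus to at least 2 so that too many CPUs are not folded; in RAC, leaving it at the default carries a high risk of node eviction when the load is low.

How to set: do not change vpm_fold_policy, but set vpm_xvcpus to 2 (keeps at least 2 extra virtual processors online):
# schedo -p -o vpm_xvcpus=2
How to check:
# schedo -a|grep vpm_xvcpus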

5. I/O Parameters

5.1. About Striping

With ASM, PP striping and CIO-related settings do not need to be considered.

5.2. ASM-Related Settings (SCSI Reservation)

For clustered ASM (e.g. RAC) configurations, SCSI reservation must be disabled on all ASM hdisk and hdiskpower devices (e.g. reserve_policy=no_reserve).

#MOS 422075.1, SCSI reservation must be disabled, for:
#  1. SSA, FAStT, or non-MPIO-capable disks         -> reserve_lock=no
#  2. SS, EMC, HDS, CLARiiON, or MPIO-capable disks -> reserve_policy=no_reserve

How to check: lsattr -E -l hdiskX | grep reserve_
How to set: chdev -l hdiskX -a reserve_policy=no_reserve -P    (-P records the change in the ODM only; it takes effect after the next reboot)
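With dozens of ASM disks it is easy to miss one. A minimal sketch that reads captured `lsattr -E -l hdiskN` output and emits the matching chdev command only when needed, assuming each attribute line starts with the attribute name followed by its current value. It covers both cases from MOS 422075.1 (reserve_policy=no_reserve, or reserve_lock=no for non-MPIO disks); disk names in the example are hypothetical.

```python
# Target value for each reservation attribute (MOS 422075.1).
TARGETS = {"reserve_policy": "no_reserve", "reserve_lock": "no"}


def reserve_setting(lsattr_output: str):
    """Return (attribute, current_value) for the reservation attribute, or None."""
    for line in lsattr_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in TARGETS:
            return fields[0], fields[1]
    return None


def fix_command(disk: str, lsattr_output: str):
    """Return the chdev command that disables SCSI reservation, or None if nothing to do."""
    setting = reserve_setting(lsattr_output)
    if setting is None:
        return None
    attr, value = setting
    if value == TARGETS[attr]:
        return None
    return f"chdev -l {disk} -a {attr}={TARGETS[attr]} -P"


if __name__ == "__main__":
    captured = "reserve_policy single_path Reserve Policy True"  # hypothetical capture
    print(fix_command("hdisk5", captured))
```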

5.3. About AIO Settings

AIO kernel extensions are loaded at system boot (always loaded), AIO servers stay active as long as there are service requests, and the number of AIO servers is dynamically increased or reduced based on demand of the workload. The aio_server_inactivity parameter defines after how many seconds idle time an AIO server will exit. AIO tunables are now based on logical CPU count, and hence it is usually not necessary to tune minservers, maxservers, and maxreqs as in the past.

  • From AIX 6.1 on, minservers, maxservers and maxreqs no longer need to be set manually.

6. Network Settings

6.1. Network Kernel Parameters

sb_max >= 1MB (1048576), and it must be greater than the largest tcp or udp send or receive space
tcp_sendspace = 262144
tcp_recvspace = 262144
udp_sendspace = db_block_size * db_file_multiblock_read_count + 4K
udp_recvspace = 10 * (udp_sendspace)
tcp_fastlo = 1   # AIX 7+ only (?). This can be extremely useful for improving the performance of local loopback (bequeath) connections, but should be evaluated with application testing.
rfc1323 = 1

// Ephemerals (non-defaults suggested for a large number of connecting hosts or a high degree of parallel query; also to avoid install-time warnings)
// Importance of TCP/UDP Parameters in 11gR2 (Doc ID 1307804.1)
tcp_ephemeral_low = 9000
tcp_ephemeral_high = 65500
udp_ephemeral_low = 9000
udp_ephemeral_high = 65500

How to set:

no -p -o rfc1323=1
no -p -o sb_max=4194304        # (Doc ID 811293.1)
no -p -o udp_sendspace=135168  # (DB_BLOCK_SIZE*DB_FILE_MULTIBLOCK_READ_COUNT)+4 KB, but no lower than 65536
                               # Example: 135168 = 8192 * 16 + 4096, with db_block_size=8192 (8K) and db_file_multiblock_read_count=16
no -p -o udp_recvspace=1351680 # Minimum recommended value is 10x udp_sendspace; the value must be less than sb_max
no -p -o tcp_sendspace=262144
no -p -o tcp_recvspace=262144
no -p -o tcp_ephemeral_low=9000 -o tcp_ephemeral_high=65500
no -p -o udp_ephemeral_low=9000 -o udp_ephemeral_high=65500
no -p -o tcp_fastlo=1          # MOS 2141756.1, known bugs on older AIX levels: AIX 6.1 TL9: IV67463, AIX 7.1 TL3 SP3/SP4: IV66228, AIX 7.1 TL4: IV67443
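The udp_sendspace/udp_recvspace values depend on the database's block size and multiblock read count, so they are worth recomputing per system. A minimal sketch of the rules above (send = block size x multiblock read count + 4 KB with a 65536 floor, recv = 10 x send, recv strictly below sb_max), assuming sb_max has been set to 4194304 as shown.

```python
SB_MAX = 4194304        # sb_max value set above (Doc ID 811293.1)
UDP_SEND_FLOOR = 65536  # "no lower than 65536"


def udp_buffer_sizes(db_block_size: int, multiblock_read_count: int, sb_max: int = SB_MAX) -> tuple:
    """Return (udp_sendspace, udp_recvspace) following the rules in this section."""
    send = max(db_block_size * multiblock_read_count + 4096, UDP_SEND_FLOOR)
    recv = 10 * send
    if recv >= sb_max:
        raise ValueError(f"udp_recvspace {recv} must be less than sb_max {sb_max}; raise sb_max first")
    return send, recv


if __name__ == "__main__":
    send, recv = udp_buffer_sizes(8192, 16)  # db_block_size=8K, db_file_multiblock_read_count=16
    print(f"no -p -o udp_sendspace={send}")
    print(f"no -p -o udp_recvspace={recv}")
```

Raising an error rather than silently capping makes it obvious when sb_max itself has to grow before the UDP buffers can follow.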

6.2. Jumbo frames

Jumbo frames are used to reduce the number of frames to transmit a given volume of network traffic, but they only work if enabled on every hop in the network infrastructure. Jumbo frames help to reduce network and CPU overheads. The use of Jumbo frames between the database RAC nodes in a cluster (RAC interconnect) is strongly recommended.

# How to check
Recommendation for the Real Application Cluster Interconnect and Jumbo Frames (Doc ID 341788.1)

Traceroute: notice that the 9000-byte packet goes through with no error while the 9001-byte packet fails; this is a correct configuration that supports a message of up to 9000 bytes with no fragmentation:

[node01] $ traceroute -F node02-priv 9000
traceroute to node02-priv (10.x.x.2), 30 hops max, 9000 byte packets
1 node02-priv (10.x.x.2) 0.232 ms 0.176 ms 0.160 ms

[node01] $ traceroute -F node02-priv 9001
traceroute to node02-priv (10.x.x.2), 30 hops max, 9001 byte packets
traceroute: sendto: Message too long
1 traceroute: wrote node02-priv 9001 chars, ret=-1

Ping: with ping we have to account for 28 bytes of overhead per packet (20-byte IP header plus 8-byte ICMP header), so 8972 bytes go through with no errors while 8973 fails; this is a correct configuration that supports a message of up to 9000 bytes with no fragmentation:

[node01]$ ping -c 2 -M do -s 8972 node02-priv
PING node02-priv (10.x.x.2) 8972(9000) bytes of data.
8980 bytes from node02-priv (10.x.x.2): icmp_seq=0 ttl=64 time=0.220 ms
8980 bytes from node02-priv (10.x.x.2): icmp_seq=1 ttl=64 time=0.197 ms

[node01]$ ping -c 2 -M do -s 8973 node02-priv
From node02-priv (10.x.x.1) icmp_seq=0 Frag needed and DF set (mtu = 9000)
From node02-priv (10.x.x.1) icmp_seq=0 Frag needed and DF set (mtu = 9000)
--- node02-priv ping statistics ---
0 packets transmitted, 0 received, +2 errors
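The 8972/8973 pair generalizes to any MTU: the largest ping payload that fits without fragmentation is the MTU minus the IPv4 header and the ICMP header. A small sketch, assuming IPv4 with no IP options; the printed commands reuse the ping syntax from the example above and the host name is the same placeholder.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def max_icmp_payload(mtu: int) -> int:
    """Largest ping payload (-s value) that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def probe_sizes(mtu: int) -> tuple:
    """(size that should pass, size that should fail) when the path MTU equals mtu."""
    largest = max_icmp_payload(mtu)
    return largest, largest + 1


if __name__ == "__main__":
    ok, too_big = probe_sizes(9000)
    print(f"ping -c 2 -M do -s {ok} node02-priv       # should succeed")
    print(f"ping -c 2 -M do -s {too_big} node02-priv  # should fail")
```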

7. Other Settings

7.1. Resource Limits

ulimits: use smit chuser, or edit /etc/security/limits to create a stanza for the oracle/grid user, and set -1 (unlimited) for everything except core:
oracle:
data = -1
stack = -1
fsize_hard = -1
cpu_hard = -1
data_hard = -1
stack_hard = -1
fsize = -1
nofiles = -1
cpu = -1
rss = -1

grid:
data = -1
stack = -1
fsize_hard = -1
cpu_hard = -1
data_hard = -1
stack_hard = -1
fsize = -1
nofiles = -1
cpu = -1
rss = -1
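The oracle and grid stanzas are identical apart from the user name, so generating them avoids copy-and-paste drift. A minimal sketch that renders the attribute list above as /etc/security/limits stanzas (tab-indented attribute lines under a `user:` header, with core deliberately left out); it prints the text and does not modify any file.

```python
# Attributes set to -1 (unlimited) above; core is intentionally not included.
UNLIMITED_ATTRS = ("data", "stack", "fsize_hard", "cpu_hard", "data_hard",
                   "stack_hard", "fsize", "nofiles", "cpu", "rss")


def limits_stanza(user: str) -> str:
    """Render one /etc/security/limits stanza with every listed attribute set to -1."""
    lines = [f"{user}:"]
    lines += [f"\t{attr} = -1" for attr in UNLIMITED_ATTRS]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print("\n".join(limits_stanza(user) for user in ("oracle", "grid")))
```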

7.2. maxuproc

Maximum number of processes allowed per user (smit chgsys). Set this value to 16384 (16K).

How to check: lsattr -EHl sys0 | egrep "maxuproc|ncargs"
How to set:
chdev -l sys0 -a maxuproc=16384               # no lower than 16384
chdev -l sys0 -a ncargs=128                   # AIX 6.1 default is 256; must be no lower than 128, so the default is fine

7.3. Timezone

AIX : OLSON TZ result in performance and eviction (Doc ID 2215971.1)

The default time zone format on AIX 6.1 and AIX 7 is Olson rather than POSIX; for performance-sensitive applications it may be changed to the POSIX format.

  • Change the time zone format to POSIX.
How to check: echo $TZ; the output should be BEIST-8.
How to set:
smitty chtz_date
  "Change Time Zone Using User Inputted Values" 
    Standard Time ID(only alphabets) = BEIST
    Standard Time Offset from CUT([+|-]HH:MM:SS)  = -8
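The "-8" in BEIST-8 often confuses people: a POSIX TZ offset is the value added to local time to get UTC, so -8 means 8 hours east of UTC, while Olson names look like Area/Location. A minimal heuristic sketch, assuming a simple `NAME[+|-]HH[:MM]` POSIX string; it does not handle the daylight-saving rule part of a full POSIX TZ value.

```python
import re

# Simplified POSIX TZ prefix: standard-time name, optional sign, hours, optional minutes.
POSIX_TZ = re.compile(r"^([A-Za-z]{3,})([+-]?)(\d{1,2})(?::(\d{2}))?")


def utc_offset_hours(tz: str):
    """Return the UTC offset in hours for a POSIX-style TZ, or None for Olson/unknown values."""
    if "/" in tz:  # Olson names look like Area/Location, e.g. Asia/Shanghai
        return None
    match = POSIX_TZ.match(tz)
    if not match:
        return None
    sign = -1 if match.group(2) == "-" else 1
    hours = int(match.group(3)) + int(match.group(4) or 0) / 60
    # POSIX offsets are "local + offset = UTC", so the sign flips relative to UTC+N notation.
    return -sign * hours


if __name__ == "__main__":
    for value in ("BEIST-8", "Asia/Shanghai"):
        print(value, "->", utc_offset_hours(value))
```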

7.4. About Disk/HBA Parameters

  • Follow the storage array vendor's recommendations/best practices.
#MPIO path failover (Doc ID 560077.1)
Check: lsattr -El fscsi0
Set:   chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P

#16Gb FC (?)
Check: lsattr -El fcs0
Set:   chdev -l fcs0 -a num_cmd_elems=1024 -a max_xfer_size=0x200000 -P

max_xfer_size should be increased from the default 1 MB to 2 MB (0x200000). The default adapter DMA memory size is 16 MB, which increases to 128 MB when a non-default max_xfer_size is used. A larger DMA size can be important for performance with many concurrent large-block I/Os.

num_cmd_elems might need to be increased if fcstat -e reports a persistent nonzero value for No Command Resource Count. Verify possible limits and the recommended storage best practices with your storage vendor before changing num_cmd_elems.
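Whether the counter is "persistently nonzero" only shows up across several snapshots. A minimal sketch, assuming the `fcstat -e fcs0` output contains one or more lines of the form `No Command Resource Count: N` (summed per snapshot) and that you capture a few snapshots over time yourself; the "nonzero in every snapshot" rule is a simple interpretation of the guidance above, not an IBM-defined threshold.

```python
import re

# Assumed line format in fcstat -e output, e.g. "  No Command Resource Count: 12".
COUNTER = re.compile(r"No Command Resource Count:\s*(\d+)")


def command_resource_total(fcstat_output: str) -> int:
    """Sum every 'No Command Resource Count' value found in one fcstat -e snapshot."""
    return sum(int(value) for value in COUNTER.findall(fcstat_output))


def needs_review(snapshot_totals: list) -> bool:
    """True if the counter is nonzero in every snapshot (a 'persistent nonzero' value)."""
    return bool(snapshot_totals) and all(total > 0 for total in snapshot_totals)


if __name__ == "__main__":
    snapshot = "FC SCSI Adapter Driver Information\n  No Command Resource Count: 12\n"  # hypothetical capture
    totals = [command_resource_total(snapshot), 15, 20]
    print("review num_cmd_elems with the storage vendor:", needs_review(totals))
```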