Why Oracle uses HugePages: a good article on using HugePages memory with Oracle on Linux

Linux HugePages

* What you may already know:

- Larger (usually 2M depending on architecture) memory page size

- Processes share page table entries; less kernel page-table management overhead and fewer TLB misses

- Always resident memory

- On Oracle11g, can be enabled only if AMM is disabled (memory_target set to 0)

- MOS notes 361323.1 (HugePages on Linux: What It Is... and What It Is Not...), 744769.1, 748637.1

- Similar to Solaris ISM (intimate shared memory) in almost every way

Now, what you may not know (but may not care about either):

* HugePages memory only shows as resident memory on Red Hat 4, not 5

(Actually, this most likely depends on the Linux kernel version rather than the Red Hat release.) On an RHEL 4 server, when HugePages is used, `top' or `ps' shows that an Oracle process's resident memory is only slightly smaller than its virtual memory. But on RHEL 5, resident memory appears much smaller. This, however, does not change the fact that HugePages memory is guaranteed to be locked in RAM. David Gibson, a HugePages developer, says in a private email: "hugepages are always resident, never swapped. This [RHEL 5 showing non-resident HugePages] must be something changed in the wider MM code".

* vm.nr_hugepages, memlock, and SGA sizing

The process memory lock limit, memlock in /etc/security/limits.conf (in KB), can be set to any number larger than vm.nr_hugepages (in /etc/sysctl.conf) multiplied by Hugepagesize. But it's best to set vm.nr_hugepages only slightly larger than the total SGA of all instances on the box. (Don't forget the ASM instance, if any.) The memlock setting in limits.conf alone doesn't actually set aside memory; it's just a number limiting how much memory can be locked, so it's OK to set it to a very big number. But vm.nr_hugepages actually allocates memory. It must not be too large, or your box will run out of memory or fail to boot.

If, after starting the instance, HugePages turns out not to be used, lower the SGA a lot and try again, then gradually grow the SGA back.
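The sizing rule above can be sketched as shell arithmetic. The SGA sizes and the 2 MB Hugepagesize below are made-up examples, not values from the article:

```shell
# Estimate vm.nr_hugepages from per-instance SGA sizes (example values).
# Hugepagesize assumed 2048 KB (2 MB); check `grep Hugepagesize /proc/meminfo'.
hugepagesize_kb=2048
sga_kb_list="8388608 4194304 1048576"   # e.g. two DB instances plus ASM, in KB

total_kb=0
for kb in $sga_kb_list; do
  total_kb=$((total_kb + kb))
done

# Round up to whole pages; set this (or only slightly more, per the advice
# above) as vm.nr_hugepages in /etc/sysctl.conf.
nr_hugepages=$(( (total_kb + hugepagesize_kb - 1) / hugepagesize_kb ))
echo "vm.nr_hugepages = $nr_hugepages"
```

memlock in limits.conf then just needs to be at least this many pages times Hugepagesize, in KB.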

* Don't waste memory

If HugePages_Free is not much smaller than HugePages_Total in /proc/meminfo, especially after the Oracle instance has been running for a while, you're wasting memory: too much is allocated as HugePages but not used, because only SysV-style shared memory, such as the Oracle SGA, can use HugePages. You can either increase the Oracle SGA to close to HugePages_Total (note its unit is Hugepagesize), or decrease HugePages_Total, preferably by editing /etc/sysctl.conf and running `sysctl -p'. Increasing HugePages dynamically may not work because the system's free memory may already be fragmented, but decreasing it always works.
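The waste check can be scripted. On a live box you would read /proc/meminfo directly; here a captured sample (the same example numbers that appear in the monitoring output later in this article) keeps the logic self-contained:

```shell
# Flag HugePages that are allocated but unused.
meminfo="HugePages_Total:     512
HugePages_Free:      225
HugePages_Rsvd:      192"

total=$(printf '%s\n' "$meminfo" | awk '/^HugePages_Total/ {print $2}')
free=$(printf '%s\n' "$meminfo" | awk '/^HugePages_Free/ {print $2}')
rsvd=$(printf '%s\n' "$meminfo" | awk '/^HugePages_Rsvd/ {print $2}')

# Free-but-not-reserved pages are truly unused: no SysV shared memory
# segment has claimed them, and nothing else can use them.
unused=$((free - rsvd))
echo "unused hugepages: $unused of $total"
```

With these numbers the script reports 33 unused pages, matching the wasted pages discussed below.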

* Checking usage

`cat /proc/meminfo'. Focus on the HugePages_* lines and PageTables. Also run `strace -f -e trace=process sqlplus / as sysdba' and start the instance. Look for SHM_HUGETLB in the 3rd argument to shmget(). Linux shmat() doesn't take this flag, so tracing the listener down to the cloned server process won't work.[note1] Also, Linux's `pmap' doesn't have the -s option that Solaris has to check the page size of individual mappings inside a process's memory space.[note2]

memlock affects the shell's `ulimit -l' setting. Make sure your shell has the desired setting before starting the DB instance.
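A quick pre-startup sanity check along these lines (the nr_hugepages and Hugepagesize values are examples; `ulimit -l' reports KB, or the string "unlimited"):

```shell
# Verify the shell's max-locked-memory limit covers the configured HugePages.
nr_hugepages=6656       # example; read the real value from vm.nr_hugepages
hugepagesize_kb=2048    # example; see Hugepagesize in /proc/meminfo
needed_kb=$((nr_hugepages * hugepagesize_kb))

current=$(ulimit -l)
if [ "$current" = "unlimited" ] || [ "$current" -ge "$needed_kb" ]; then
  echo "memlock OK ($current vs $needed_kb KB needed)"
else
  echo "memlock too low: $current KB < $needed_kb KB; fix limits.conf and log in again"
fi
```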

You can watch how HugePages_Free and HugePages_Rsvd change while you start up or shut down an instance that uses HugePages (adjust the grep pattern as needed):

while true; do
  for i in $(grep ^Huge /proc/meminfo | head -3 | awk '{print $2}'); do
    echo -n "$i "
  done
  echo ""
  sleep 5
done

The output is like the following (numbers are HugePages_Total, HugePages_Free, HugePages_Rsvd):

512 225 192
512 225 192
512 225 192
512 512 0
512 512 0
512 371 338
512 329 296
512 227 194

It indicates that when the instance is started, HugePages memory pages are immediately reserved. This is a fast process because nothing is written to the pages yet (remember, reserved is just a special kind of free). Then, as the pages are written to, they're taken off the reserved list and used. This server has 33 "real" free pages wasted; with more diligence I could have avoided assigning them to HugePages.

Note that older versions of the HugePages code don't show reserved pages. On Red Hat Linux, the change is between RHEL 4 and 5.

* 11g AMM

11g Automatic Memory Management brings the PGA under automatic management. But the PGA can never be allocated from HugePages memory.[note3] I would set memory_target to 0 to disable AMM and configure HugePages as usual. HugePages is a far more appealing feature than AMM; if I have to sacrifice one of the two, I sacrifice AMM. SGA and PGA usage patterns are so different that they should be managed separately anyway. To name one issue with AMM: it requires hundreds if not thousands of file descriptors for *each* server process to open *all* the files under /dev/shm, most likely 4 MB each (the SGA granule size, _ksmg_granule_size).
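That descriptor count is easy to estimate; the memory_target below is an assumed example, and the 4 MB granule is the typical value mentioned above:

```shell
# Rough count of /dev/shm segment files AMM creates for one instance, each of
# which every server process may open: memory_target divided by granule size.
memory_target_mb=8192   # example memory_target
granule_mb=4            # typical granule size (_ksmg_granule_size), per above
shm_files=$((memory_target_mb / granule_mb))
echo "approx descriptors per server process: $shm_files"
# On a running AMM instance, compare with: ls /dev/shm | wc -l
```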

* In 11g, the instance may not use HugePages when started by srvctl, due to Bug 9251136 "INSTANCE WILL NOT USE HUGEPAGE IF STARTED BY SRVCTL". The root cause is that process ulimit settings (as in /etc/security/limits.conf) are not used. To confirm, compare `grep "locked memory" /proc/<pid>/limits' with your setting. You can work around the bug by setting the limit in either $CRS_HOME/bin/ohasd or /etc/init.d/ohasd, but not /etc/profile or /etc/init.d/init.ohasd:

# diff /u01/app/11.2.0/grid/bin/ohasd /u01/app/11.2.0/grid/bin/ohasd.bak
5,7d4
< #20100319 YongHuang: Increase process max locked memory to allow HugePages as workaround for Bug 9251136
< ulimit -l 8590000
<

* Further reading

_____________

[note1] On Solaris, you can run `truss -f -p <pid>' on the listener and connect to the database through Oracle Net. The trace will show e.g.

shmat(1979711503, 0x40280000000, 040000)        = 0x40280000000

where 040000 is SHM_SHARE_MMU according to /usr/include/sys/shm.h.

[note2] At least not until a certain patch is in your kernel. That patch may be in kernel 2.6.29.

[note3] For now at least. See Kevin Closson's blog for more.
