在linux下为oracle开启大页(hugepage)

一、在解释什么情况下需要开启大页和为啥需要开启大页前先了解下Linux下页的相关的知识:
以下的内容是基于32位的系统,4K的内存页大小做出的计算
1)目录表,用来存放页表的位置,共包含1024个目录entry,每个目录entry指向一个页表位置,每个目录entry,4b大小,目录表共4b*1024=4K大小
2)页表,用来存放物理地址页的起始地址,每个页表entry也是4b大小,每个页表共1024个页表entry,因此一个页表的大小也是4K,共1024个页表,因此页表的最大大小是1024*4K=4M大小
3)每个页表entry指向的是4K的物理内存页,因此页表一共可以指向的物理内存大小为:1024(页表数)*1024(每个页表的entry数)*4K(一个页表entry代表的页大小)=4G
4)操作系统将虚拟地址映射为物理地址时,将虚拟地址的31-22这10位用于从目录表中索引到1024个页表中的一个,将虚拟地址的12-21这10位用于从页表中索引到1024个页表entry中的一个。从这个页表entry中获得物理内存页的起始地址,然后将虚拟地址的0-12位用作4KB内存页中的偏移量,那么物理内存页起始地址加上偏移量就是进出所需要访问的物理内存地址。

由于32位操作系统不会出现4M的页表,因为一个进程不能使用到4GB的内存空间,有些空间被保留使用,比如用来做操作系统内核的内存。而且页表entry的创建出现在进程访问到一块内存的时候,而不是一开始就创建。

在32位系统下,一个进程访问1GB的内存,会产生1M(1*1024*1024/4*4/1024/1024)的页表,如果是在64位系统,将会增大到2M。

很容易推算,如果一个SGA设置为60G,有1500个ORACLE用户进程(在linux中每个进程页表独立,都有自己的页表),64位LINUX的系统上,最大的页表占用内存为:60*2*1500/1024=175G,是的,你没看错,是175G!

二、为什么使用大页?什么时候使用大页?

而在Redhat Linux中,内存都是以页的形式划分的,默认情况下每页是4K,这就意味着如果物理内存很大,则映射表的条目将会非常多,会影响CPU的检索效率。因为内存大小是固定的,为了减少映射表的条目,可采取的办法只有增加页的尺寸。这种增大的内存页尺寸在Linux 2.1中,称为Big page;在AS 3/4中,称为Hugepage
在Linux中配置hugepage可以提高oracle的性能,减少oracle sga的页交换,类似于aix中的lagepage。
当你主机的物理内存为64G,设SGA>=32G时,建议开启大页

三、设置大页步骤

1、关闭Oracle Database 11g中的AMM(Automatic Memory Management),即把两个参数MEMORY_TARGET / MEMORY_MAX_TARGET设为0
如果alter system set  MEMORY_MAX_TARGET=0 scope=spfile;重启后发现没有改为0,可以alter system reset memory_max_target; 来设置

2、参考metalink(文档 ID 401749.1)提供的脚本,计算hugepages的大小

#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.
#
# This script is provided by Doc ID 401749.1 from My Oracle Support 
# http://support.oracle.com

# Welcome text
echo "
This script is provided by Doc ID 401749.1 from My Oracle Support 
(http://support.oracle.com) where it is intended to compute values for 
the recommended HugePages/HugeTLB configuration for the current shared 
memory segments. Before proceeding with the execution please note following:
 * For ASM instance, it needs to configure ASMM instead of AMM.
 * The 'pga_aggregate_target' is outside the SGA and 
   you should accommodate this while calculating SGA size.
 * In case you changes the DB SGA size, 
   as the new SGA will not fit in the previous HugePages configuration, 
   it had better disable the whole HugePages, 
   start the DB with new SGA size and run the script again.
And make sure that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup 
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
     # ipcs -m


Press Enter to proceed..."

read

# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`

# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk '{print $2}'`
if [ -z "$HPG_SZ" ];then
    echo "The hugepages may not be supported in the system where the script is being executed."
    exit 1
fi

# Initialize the counter
NUM_PG=0

# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | cut -c44-300 | awk '{print $1}' | grep "[0-9][0-9]*"`
do
    MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
    if [ $MIN_PG -gt 0 ]; then
        NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
    fi
done

RES_BYTES=`echo "$NUM_PG * $HPG_SZ * 1024" | bc -q`

# An SGA less than 100MB does not make sense
# Bail out if that is the case
if [ $RES_BYTES -lt 100000000 ]; then
    echo "***********"
    echo "** ERROR **"
    echo "***********"
    echo "Sorry! There are not enough total of shared memory segments allocated for 
HugePages configuration. HugePages can only be used for shared memory segments 
that you can list by command:

    # ipcs -m

of a size that can match an Oracle Database SGA. Please make sure that:
 * Oracle Database instance is up and running 
 * Oracle Database 11g Automatic Memory Management (AMM) is not configured"
    exit 1
fi

# Finish with results
case $KERN in
    '2.2') echo "Kernel version $KERN is not supported. Exiting." ;;
    '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
           echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
    '2.6') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
esac

# End

3、对hugepages_settings.sh这个脚本授可执行的权限

chmod +x hugepages_settings.sh

4、执行执行hugepages_settings.sh得到建议值

This script is provided by Doc ID 401749.1 from My Oracle Support 
(http://support.oracle.com) where it is intended to compute values for 
the recommended HugePages/HugeTLB configuration for the current shared 
memory segments. Before proceeding with the execution please note following:
 * For ASM instance, it needs to configure ASMM instead of AMM.
 * The 'pga_aggregate_target' is outside the SGA and 
   you should accommodate this while calculating SGA size.
 * In case you changes the DB SGA size, 
   as the new SGA will not fit in the previous HugePages configuration, 
   it had better disable the whole HugePages, 
   start the DB with new SGA size and run the script again.
And make sure that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup 
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
     # ipcs -m


Press Enter to proceed...

Recommended setting: vm.nr_hugepages = 1028
得出大页的大小为1028页(注:一页为2M,这个值不可改,1028*2M=2056M),实际上hugepages与参数sga_max_size有关,比sga_max_size的值稍微大一点点(比SGA_MAX_SIZE最少要多加一页,2M的页不要分配超过sga_max_size太多,会造成内存的浪费)

注意:使用Hugepage内存是共享内存,它会一直keep在内存中的,不会被交换出去,也就是说使用hurgepage的内存不能被其他的进程使用,所以,一定要合理设置这个值,避免造成浪费。对于只使用Oracle的服务器来说,把Hugepage_pool设置成大于SGA大小才能被Oracle使用。


SQL>show parameter sga_max_size
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
sga_max_size                         big integer 2G

5、设置hugepages,在内核参数中添加一行,vi /etc/sysctl.conf

vm.nr_hugepages = 1028

6、修改内核参数立即生效
[root@el5 ~]#  sysctl -p

net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
fs.file-max = 6815744
vm.nr_hugepages = 1028
7、别忘记设定/etc/security/limits.conf文件,以K为单位,必须大于sga_max_size,这里设定为2056000

[root@el5 ~]#  vi /etc/security/limits.conf

oracle          soft    memlock 2056000
oracle          hard    memlock 2056000

8、检查limits是否正确
[root@el5 ~]#  su - oracle
[oracle@el5 ~] ulimit -l
2056000

9、重启数据库---注原来的orale用户的窗口退到root用户,重新su - oracle

sys@OCM> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@mydb ~]$ exit
logout

You have new mail in /var/spool/mail/root
[root@mydb ~]# su -  oracle
[oracle@mydb ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Thu Dec 5 10:56:20 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

sys@OCM> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
sys@OCM> startup
ORACLE instance started.

Total System Global Area 2137886720 bytes
Fixed Size                  2230072 bytes
Variable Size            1409288392 bytes
Database Buffers          603979776 bytes
Redo Buffers              122388480 bytes
Database mounted.
Database opened.
10、查看大页,已被使用

[oracle@el5 ~]$ watch -n1 'cat /proc/meminfo |grep -i HugePage'

Every 1.0s: cat /proc/meminfo |grep -i HugePage                                                   Thu Dec  5 11:09:06 2013

HugePages_Total:  1028
HugePages_Free:    869
HugePages_Rsvd:    842
Hugepagesize:     2048 kB
注:
HugePages_Total:  1028    ---总共1028页
HugePages_Free:    869    ---空闲548页,即当前大页被使用了1028-869=159页,即被用了159*2M=118M,小于sga_target。
HugePages_Rsvd:    842    ---操作系统承诺给Oracle预留842页,即842*2M=1684M(1684+118==SGA_MAX_SIZE)
Hugepagesize:     2048 kB --每页是2M,不可修改


使用了hugepage之后,SGA就默认pin在内存里了,那么就不用lock sga了。接下来我们研究一下参数:pre_page_sga,这个参数默认是false,我把它打开。

sys@OCM> alter system set pre_page_sga=true scope=spfile;

System altered.

sys@OCM> show parameter sga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     FALSE
pre_page_sga                         boolean     TRUE
sga_max_size                         big integer 2G
sga_target                           big integer 1G
HugePages_Total:  1028    ---总共1028页
HugePages_Free:    548    ---空闲548页,即当前大页被使用了1028-548=480页,即被用了480*2M=960M,约等于sga_target,参数pre_page_sga起作用了。
HugePages_Rsvd:    521    ---操作系统承诺给Oracle预留521页,即521*2M=1042M(理解为sga_max_size-sga_target)
Hugepagesize:     2048 kB --每页是2M,不可修改

参考metalink:USE_LARGE_PAGES To Enable HugePages (文档 ID 1392497.1)

For 11.2.0.2 and further, the Oracle Database Server has added a new parameter that helps managing the hugepages for use by the database. 
The initialization parameter that was added is USE_LARGE_PAGES. 

USE_LARGE_PAGES parameter has these possible values: "true" (default), "only", "false".

1. The default value of "true" preserves the current behavior of trying to use hugepages if they are available on the OS. 

In 11.2.0.2 if there are not enough hugepages, only small pages will be used for SGA memory. This may lead to ORA-4030 errors due to the remaining hugepages going unused and more memory being used by the kernel for page tables. 

In 11.2.0.3 the behavior was changed such that Oracle will now allocate what it can of the SGA in hugepages and if it runs out, it will allocate the rest of the SGA using small pages. With this new behavior additional shared memory segments are an expected side effect. Part of the change is to ensure that each shared memory segment making up the SGA only contains sub-areas with an identical alignment requirement - hence the SGA will spread over more separate SHM segments. In this supported mixed page mode the database will exhaust the available hugepages, before switching to regular sized pages.
 
2. Setting it to "false" means do not use hugepages


3. A setting of "only" means do not start up the instance if hugepages cannot be used for the whole memory (to avoid an out-of-memory situation).

补充关于内存申请的OverCommit:


Linux下的OverCommit机制,主要是为了应对可能的异常的大量内存申请对OS本身造成冲击。
Linux有三种OverCommit机制,可以通过:/proc/sys/vm/overcommit_memory来配置,三种配置的具体含义:
0:启发式策略,后果比较严重的Overcommit将不能成功,而轻微的Overcommit将被允许。
1:永远允许Overcommit,这种策略适合那些不能承受内存分配失败的应用,比如某些科学计算应用。
2:永远禁止Overcommit,在这个情况下,系统所能分配的内存不会超过swap+RAM*系数(/proc/sys/vm /overcmmit_ratio,默认50%,你可以调整),如果这么多资源已经用光,那么后面任何尝试申请内存的行为都会返回错误,这通常意味着此时 没法运行任何新程序。

整理自网络






没有更多推荐了,返回首页