Linux HugePages

* What you may already know:

 - Larger memory page size (usually 2 MB, depending on architecture)
 - Processes share page table entries, so there is less kernel page table management overhead and fewer TLB misses
 - Always resident memory
 - On Oracle11g, can be enabled only if AMM is disabled (memory_target set to 0)
 - MOS notes 361323.1 (HugePages on Linux: What It Is... and What It Is Not...), 744769.1, 748637.1
 - Similar to Solaris ISM (intimate shared memory) in almost every way
New in 11.2.0.2 (and up):
 - MOS note 1392497.1 (USE_LARGE_PAGES To Enable HugePages In 11.2)
New in SLES11, RHEL6, OEL6 and UEK2:
 - MOS note 1557478.1 (Disable Transparent HugePages on SLES11, RHEL6, OEL6 and UEK2 Kernels)




* HugePages memory only shows as resident memory on Red Hat 4, not 5 or 6

(It most likely depends on the Linux kernel version, not the Red Hat version.) On a RHEL 4 server, 
when HugePages is used, `top' or `ps' shows that an Oracle process's resident memory is only slightly 
smaller than its virtual memory. But on RHEL 5 or 6, resident memory is very much smaller. This, however, 
does not change the fact that HugePages memory is guaranteed to be locked in RAM. David Gibson, a 
HugePages developer, says in private email "hugepages are always resident, never swapped. This 
[RHEL 5,6 showing non-resident HugePages] must be something changed in the wider MM code".


* vm.nr_hugepages, memlock, and SGA sizing

The process memory lock limit, memlock in /etc/security/limits.conf (in KB), can be set to any number 
larger than vm.nr_hugepages (set in /etc/sysctl.conf) multiplied by Hugepagesize. It's just a 
number the kernel compares against when a process wants to lock a memory segment into RAM, to 
see whether that's allowed. The memlock setting in limits.conf alone doesn't set aside any memory, so 
it's OK to set it to a very big number. But vm.nr_hugepages actually allocates memory. It must 
not be too large, or your box will run out of memory or fail to boot.
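
The memlock arithmetic can be sketched as below. This is only an illustration: min_memlock_kb is a 
hypothetical helper name, and the 3090 pages of 2048 KB are example inputs, not a recommendation.

```shell
#!/bin/sh
# Sketch: smallest memlock value (in KB) that covers the whole HugePages pool.
# min_memlock_kb is a hypothetical helper, not an Oracle or OS tool.
min_memlock_kb() {
    local pages=$1 hugepagesize_kb=$2
    echo $(( pages * hugepagesize_kb ))
}

# Example: 3090 pages of 2048 KB each; prints 6328320
min_memlock_kb 3090 2048
```

On a live system the two inputs would come from /proc/sys/vm/nr_hugepages and the Hugepagesize line 
of /proc/meminfo. Any memlock value at or above the result should do.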

If HugePages turns out not to be used after the Oracle instance starts, lower the SGA a lot and try 
again. Then gradually add SGA back.


* Don't waste memory

If HugePages_Free is not much smaller than HugePages_Total in /proc/meminfo, too much memory is 
allocated as HugePages and sitting unused, because only SysV type shared memory, such as the 
Oracle SGA, can use HugePages. You can either increase the Oracle SGA to close to HugePages_Total 
(note its unit is Hugepagesize), or decrease HugePages_Total, by 
echo newvalue > /proc/sys/vm/nr_hugepages 
or by editing /etc/sysctl.conf and running `sysctl -p'. Increasing HugePages dynamically 
may not work because the system's free memory may already be fragmented, but decreasing it always 
works.
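
Because HugePages_Total is counted in pages while the SGA is usually thought of in MB, a small unit 
conversion helps when matching the two. The sketch below assumes a 2048 KB Hugepagesize by default. 
Both function names are hypothetical, and the page count is only a lower bound for a given SGA size.

```shell
#!/bin/sh
# Sketch: convert between MB and HugePages (hypothetical helper names).
# The second argument is Hugepagesize in KB and defaults to 2048.
hugepages_to_mb() {
    echo $(( $1 * ${2:-2048} / 1024 ))
}

hugepages_for_mb() {
    local mb=$1 hugepagesize_kb=${2:-2048}
    # Round up so a partly filled last page is still counted
    echo $(( (mb * 1024 + hugepagesize_kb - 1) / hugepagesize_kb ))
}

hugepages_to_mb 3090     # prints 6180
hugepages_for_mb 6180    # prints 3090
```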


* Writing to /proc/sys/vm/nr_hugepages

That "file" is writable. If you `echo {the number in /etc/sysctl.conf} > /proc/sys/vm/nr_hugepages', 
it's like running `sysctl -p' except you're only updating vm.nr_hugepages. Repeat the `echo' and 
`cat /proc/sys/vm/nr_hugepages' to see how many HugePages the OS actually created.
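
The repeat-and-check routine could be wrapped in a small loop such as the following sketch. 
grow_hugepages is a hypothetical helper. Its optional file argument exists only so the function can 
be tried against an ordinary file, and writing to the real /proc entry needs root.

```shell
#!/bin/sh
# Sketch: write a target to nr_hugepages repeatedly until the kernel reports it.
# Usage: grow_hugepages TARGET [FILE] [TRIES]   (hypothetical helper)
grow_hugepages() {
    local target=$1 file=${2:-/proc/sys/vm/nr_hugepages} tries=${3:-10}
    local actual=0 i=1
    while [ "$i" -le "$tries" ]; do
        echo "$target" > "$file" || return 2    # write failed, e.g. not root
        actual=$(cat "$file")
        echo "attempt $i: $actual of $target pages"
        [ "$actual" -ge "$target" ] && return 0
        i=$(( i + 1 ))
        sleep 1
    done
    return 1    # still short, most likely because free memory is fragmented
}
```

For example, `grow_hugepages 3090' as root retries up to ten times and returns non-zero if the pool 
never reaches 3090 pages.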


* Checking usage

/proc/meminfo (or /proc/sys/vm/nr_hugepages for just the total) is the file to look at, unless you're 
interested in NUMA node specific values, which are in /sys/devices/system/node/node*/meminfo. Focus on 
HugePages_* and PageTables. Suppose the HugePages_* numbers are like the following:

HugePages_Total:  3090
HugePages_Free:   2358
HugePages_Rsvd:   2341

It means 3090-2358=732 pages are used (actually being used, not just reserved to be used). 2341 pages 
are reserved. So the wastage, i.e. HugePages memory that will never be used, is 2358-2341=17 pages. 
In short, you can say that 
HugePages_Total - HugePages_Free + HugePages_Rsvd 
is or will be used. 

Based on this understanding, you can use the difference between HugePages_Free and HugePages_Rsvd 
to check whether your HugePages are used by Oracle SGA. Since 2358-2341=17 pages or 34 MB is small, 
we're good. Don't assume HugePages is not used just because the numbers are big, because an instance 
just started hasn't really used the memory yet.
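
The same arithmetic can be scripted. The sketch below reads meminfo-style text on standard input and 
prints the derived numbers. hugepage_summary and its output labels are made up for this illustration.

```shell
#!/bin/sh
# Sketch: derive used / reserved / wasted pages from meminfo-style text on stdin.
# hugepage_summary is a hypothetical helper name.
hugepage_summary() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { nfree = $2 }
        /^HugePages_Rsvd:/  { rsvd  = $2 }
        END {
            # committed = pages that are used now or will be used
            printf "used=%d reserved=%d wasted=%d committed=%d\n",
                   total - nfree, rsvd, nfree - rsvd, total - nfree + rsvd
        }'
}

# The sample numbers from the text above
hugepage_summary <<'EOF'
HugePages_Total:  3090
HugePages_Free:   2358
HugePages_Rsvd:   2341
EOF
# prints: used=732 reserved=2341 wasted=17 committed=3073
```

On a live system, run `hugepage_summary < /proc/meminfo'.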

Here's an easy way to understand the HugePages entries in /proc/meminfo.
(Ref: http://www.freelists.org/post/oracle-l/OT-Blog-entry-on-hugepages,20)

UUUUUFFFF <-- Total split into really used (U) and free (F)
UUUUURRR. <-- Total split into really used (U), reserved (R) and really free (.)

If one letter or dot is one HugePage, the above says

HugePages_Total: 9
HugePages_Free:  4
HugePages_Rsvd:  3

and you'll have 4-3=1 page completely wasted.

If you want to check more, run `strace -f -e trace=ipc sqlplus / as sysdba' and start up the instance 
(shmget() belongs to strace's ipc class, not the process class). Look for SHM_HUGETLB in the 3rd 
argument to shmget(). Unfortunately, because Linux shmat() doesn't have the option for this flag, 
tracing the listener to follow (-f option) down to the clone()'d server process won't work.[note1] 
Also, Linux `pmap' doesn't have the -s option that Solaris has to check page size for the individual 
mappings inside a process memory space. Fortunately, you can check /proc/<pid>/smaps for that:[note2]

# cat /proc/<any pid of Oracle instance>/smaps
...
61000000-a7000000 rwxs 00000000 00:0c 1146885                            /SYSV00000000 (deleted)
Size:            1146880 kB
Rss:                   0 kB
Pss:                   0 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:            0 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:     2048 kB  <-- 2MB HugePage size
MMUPageSize:        2048 kB  <-- 2MB HugePage size

(All 0's for most fields are OK for HugePage pages. Make sure Transparent HugePages is disabled on RHEL6, 
or the last two fields may show up "4 kB" in some kernel versions.)

With that knowledge, you can check if a running Oracle instance is using HugePages without checking alert.log: 

$ grep ^KernelPageSize /proc/15824/smaps | grep -v '4 kB$' #15824 is an Oracle process
KernelPageSize:     2048 kB
...

No output means not using HugePages; all pages are of standard 4k size.
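
If you also want to know which mapping is backed by HugePages, a short awk filter over smaps can 
print the address range and size next to the page size. This is a sketch only: huge_mappings is a 
hypothetical helper, and it assumes the smaps layout shown above (a mapping header line followed by 
"Name: value kB" lines).

```shell
#!/bin/sh
# Sketch: list mappings in smaps-style text (stdin) whose KernelPageSize is not 4 kB.
# huge_mappings is a hypothetical helper name.
huge_mappings() {
    awk '
        $1 ~ /^[0-9a-f]+-[0-9a-f]+$/ { range = $1; name = $6 }   # mapping header line
        $1 == "Size:"                { size = $2 }
        $1 == "KernelPageSize:" && $2 != 4 {
            printf "%s %s size=%skB pagesize=%skB\n", range, name, size, $2
        }'
}

# Abbreviated copy of the smaps entry shown above
huge_mappings <<'EOF'
61000000-a7000000 rwxs 00000000 00:0c 1146885                            /SYSV00000000 (deleted)
Size:            1146880 kB
KernelPageSize:     2048 kB
MMUPageSize:        2048 kB
EOF
# prints: 61000000-a7000000 /SYSV00000000 size=1146880kB pagesize=2048kB
```

Against a real process it would be run as `huge_mappings < /proc/15824/smaps'.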

[Update 2015-04]
Beginning with 12c, you can query x$ksmssinfo inside Oracle. It not only matches lines of `ipcs -m' with lines of v$sga, but also shows which SGA component uses what page size, e.g. 4096 or 2097152.


* You can check how the numbers HugePages_Free and HugePages_Rsvd change while you startup or 
shutdown an instance that uses HugePages (adjust grep pattern as needed):

while true; do
 for i in $(grep ^Huge /proc/meminfo | head -3 | awk '{print $2}'); do
  echo -n "$i "
 done
 echo ""
 sleep 5
done

The output is like the following (numbers are HugePages_Total, HugePages_Free, HugePages_Rsvd):

512 225 192
512 225 192
512 225 192
512 512 0   <- Instance down. All HugePages freed. (This is the last moment of database shutdown.)
512 512 0
512 371 338 <- Startup. 338 pages free but reserved (i.e. 371-338=33 pages "real" free), 512-371=141 pages used
512 329 296 <- 512-329=183 pages used, up by 183-141=42, reserved pages down by 42, "real" free unchanged
512 227 194 <- 512-227=285 pages used, up by 285-183=102, reserved down by 102 too, "real" free unchanged

It indicates that when the instance is started, HugePages memory pages are immediately reserved. 
This is a fast process because there's no write to the pages (remember reserved is just a special 
type of free; see http://linux-mm.org/DynamicHugetlbPool). Then when the pages are written to, 
they're taken off of the reserved list and used. This server has 33 "real" free pages wasted. With 
more diligence I would not have assigned them to HugePages.

Note that older versions of the HugePages code don't show reserved pages. On Red Hat Linux, the change 
is between RHEL 4 and 5.


* 11g AMM

11g Automatic Memory Management (AMM) brings the PGA under automatic management. But PGA can never be 
allocated from HugePages memory.[note3] I would make sure memory_target is 0 to disable AMM and configure HugePages 
as usual. (The default value for memory_target is already 0, but DBCA may automatically choose a non-zero 
number for you, if you use it to create a DB. ASM will thus have a non-zero number as well but we 
normally leave ASM to use AMM. If ASM starts to throw ORA-4031, configure it to use HugePages as well. 
See Note 1625886.1 "ORA-4031 Errors On ASM Instance When Huge Pages Are Enabled On The System".) 

HugePages is a far more appealing feature than AMM. If I have to sacrifice one of the two, I sacrifice 
AMM. The usage of SGA and PGA is so different they should be separately managed anyway. To name one issue 
with AMM, it requires hundreds if not thousands of descriptors for *each* server process to open *all* 
the files under /dev/shm, most likely 4 MB each (SGA granule size, _ksmg_granule_size). 
See http://download.oracle.com/docs/cd/B28359_01/install.111/b32002/pre_install.htm#LADBI204


* In 11.2.0.1, an instance started by srvctl will not use HugePages, due to Bug 9251136 "INSTANCE 
WILL NOT USE HUGEPAGE IF STARTED BY SRVCTL". The root cause is that process ulimit settings (as in 
/etc/security/limits.conf) are not used. To confirm, compare 
`grep "locked memory" /proc/<any instance process pid>/limits' with your setting. You can work around 
the bug by setting the limit in either $CRS_HOME/bin/ohasd or /etc/init.d/ohasd, but not /etc/profile 
or /etc/init.d/init.ohasd:

# diff /u01/app/11.2.0/grid/bin/ohasd /u01/app/11.2.0/grid/bin/ohasd.bak
5,7d4
< #20100319 YongHuang: Increase process max locked memory to allow HugePages as workaround for Bug 9251136
< ulimit -l 8590000
<

If modifying $CRS_HOME/bin/ohasd, remember to update it whenever you apply a clusterware patch. If possible, 
modifying /etc/init.d/ohasd is preferred even though it's owned by root, because it's less likely to be 
overwritten. 
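
When comparing the limit of a running process with your setting, keep in mind that /proc/<pid>/limits 
reports "Max locked memory" in bytes, while memlock in limits.conf and `ulimit -l' use KB. A small 
conversion, sketched here with a hypothetical helper named locked_memory_kb, makes the two comparable:

```shell
#!/bin/sh
# Sketch: print the soft "Max locked memory" limit in KB from limits-style text on stdin.
# locked_memory_kb is a hypothetical helper name.
locked_memory_kb() {
    awk '/^Max locked memory/ {
        if ($4 == "unlimited") print "unlimited"
        else printf "%d\n", $4 / 1024      # the file reports bytes
    }'
}

# A 64 KB limit as it would appear in /proc/<pid>/limits; prints 64
locked_memory_kb <<'EOF'
Limit                     Soft Limit           Hard Limit           Units
Max locked memory         65536                65536                bytes
EOF
```

With the workaround above in effect, `locked_memory_kb < /proc/<pid>/limits' for an instance process 
would be expected to print 8590000.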


* Oracle instance still doesn't use HugePages. Now what?

 - Try decreasing SGA a little.
 - Try a simple non-Oracle-specific program:
   http://www.mjmwired.net/kernel/Documentation/vm/hugepage-shm.c
 - Make sure /proc/sys/kernel/shmmax and /proc/sys/kernel/shmall are big enough. They can be set to 
   very big numbers with no ill effect.


* Transparent HugePages

Reference MOS 1557478.1
You should disable Transparent HugePages (THP) on SLES11, RHEL6, OEL6 and UEK2 kernels because of its 
possible high sys CPU usage, its immaturity and its irrelevance to an Oracle server.

The recommended clean way to disable THP is to append transparent_hugepage=never to the kernel line of 
/boot/grub/grub.conf. On RHEL (but not Oracle Linux) servers, if THP is disabled by 
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
(the directory is /sys/kernel/mm/transparent_hugepage on kernels without the Red Hat prefix) 
you may still see a non-zero AnonHugePages, a measure of THP:

$ grep HugePage /proc/meminfo
AnonHugePages:     98304 kB
HugePages_Total:    5140
HugePages_Free:       60
HugePages_Rsvd:       41
HugePages_Surp:        0

This may be due to a RHEL 6 problem. See https://access.redhat.com/solutions/422283. But it could be 
something else (other than tuned or ktune running).
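
To confirm which THP mode is actually in effect, read the `enabled' file; the active value is the 
word in square brackets. The sketch below checks both the mainline location 
(/sys/kernel/mm/transparent_hugepage) and the RHEL 6 one (/sys/kernel/mm/redhat_transparent_hugepage). 
thp_mode is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: extract the active THP setting, e.g. "never" from "always madvise [never]".
# thp_mode is a hypothetical helper name; it reads the file content on stdin.
thp_mode() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Report whichever of the two known locations exists on this kernel
for f in /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/kernel/mm/redhat_transparent_hugepage/enabled; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_mode < "$f")"
    fi
done
```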

To see what processes use THP:

# grep AnonHugePages /proc/*/smaps | grep -v 'AnonHugePages:         0 kB'
/proc/2266/smaps:AnonHugePages:      2048 kB
/proc/2519/smaps:AnonHugePages:      2048 kB
/proc/3199/smaps:AnonHugePages:      2048 kB
/proc/3388/smaps:AnonHugePages:      2048 kB
/proc/4137/smaps:AnonHugePages:      2048 kB
/proc/4674/smaps:AnonHugePages:     40960 kB
/proc/4674/smaps:AnonHugePages:     38912 kB
/proc/4674/smaps:AnonHugePages:      4096 kB
/proc/4674/smaps:AnonHugePages:      2048 kB
/proc/4674/smaps:AnonHugePages:      2048 kB

Most of it is used by our EM agent pid 4674:
# ps -fp 2266,2519,3199,3388,4137,4674
UID        PID  PPID  C STIME TTY          TIME CMD
root      2266     1  0 Sep05 ?        00:00:40 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
root      2519  2373  0 Sep05 ?        00:19:57 lw-container lsass
root      3199     1  0 Sep05 ?        00:06:45 hpasmxld -f /dev/ipmi0
root      3388     1  0 Sep05 ?        00:00:12 cmapeerd -l /var/log/hp-snmp-agents/cma.log
oracle    4137     1  0 Sep05 ?        01:15:40 /u01/app/grid/bin/ohasd.bin reboot
oracle    4674  4366  0 Sep05 ?        03:16:00 /u01/app/oracle/agent/core/12.1.0.2.0/jdk/bin/java -Xmx128M -XX:MaxPermSize=96M -ser
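
A per-process total is easier to read than the raw per-mapping lines. As a sketch, the grep output can 
be piped through a small awk summation. thp_by_pid is a hypothetical helper, and it assumes the 
"/proc/<pid>/smaps:AnonHugePages: N kB" line format shown above.

```shell
#!/bin/sh
# Sketch: sum AnonHugePages per pid from `grep AnonHugePages /proc/*/smaps` output on stdin.
# thp_by_pid is a hypothetical helper name; output is "pid kB", largest first.
thp_by_pid() {
    awk -F'[/: ]+' '$6 > 0 { kb[$3] += $6 }
        END { for (pid in kb) printf "%s %d\n", pid, kb[pid] }' | sort -k2,2nr
}

# A few lines from the listing above
thp_by_pid <<'EOF'
/proc/2266/smaps:AnonHugePages:      2048 kB
/proc/4674/smaps:AnonHugePages:     40960 kB
/proc/4674/smaps:AnonHugePages:     38912 kB
/proc/4674/smaps:AnonHugePages:      4096 kB
/proc/4674/smaps:AnonHugePages:      2048 kB
/proc/4674/smaps:AnonHugePages:      2048 kB
EOF
# prints:
# 4674 88064
# 2266 2048
```

On a live system: `grep AnonHugePages /proc/*/smaps | thp_by_pid'. For the full listing above, the 
per-pid totals add up to 98304 kB, the AnonHugePages value reported in /proc/meminfo.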

To see some stats of THP:

# egrep 'trans|thp' /proc/vmstat
nr_anon_transparent_hugepages 48
thp_fault_alloc 260
thp_fault_fallback 100
thp_collapse_alloc 1
thp_collapse_alloc_failed 0
thp_split 12


* "Using mlock ulimits for SHM_HUGETLB [is] deprecated." in /var/log/messages

According to
Oracle Linux 6 Release Notes (Doc ID 1292376.1)
https://oss.oracle.com/el6/docs/RELEASE-NOTES-GA-en.html
"the application should be configured to use CAP_IPC_LOCK or the process (e.g. Oracle) should be 
added to the hugetlb_shm_group." Specifically, echo oracle's gid (id -g oracle) into /proc/sys/vm/hugetlb_shm_group. 
And also put vm.hugetlb_shm_group=<oracle's gid> in /etc/sysctl.conf.

According to
https://access.redhat.com/solutions/65846
this warning seems to be completely harmless and can be ignored. Kernel 2.6.32-431.el6 or later will suppress the 
warning.


* Further reading

http://linux-mm.org/HugePages
For Transparent HugePages:
http://www.mjmwired.net/kernel/Documentation/vm/transhuge.txt
http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/

_____________
[note1] On Solaris, you can run `truss -f -p <listener PID>' and connect to the database through 
Oracle Net. The trace will show e.g.
shmat(1979711503, 0x40280000000, 040000)        = 0x40280000000
where 040000 is SHM_SHARE_MMU according to /usr/include/sys/shm.h.

[note2] /proc/<pid>/smaps comes with this patch in Linux kernel: http://lkml.org/lkml/2008/10/3/250. The patch 
may be in kernel 2.6.29 (http://www.kernel.org/pub/linux/kernel/v2.6/). Red Hat Enterprise Linux 6 
should be at least at that kernel version.
(Also See: http://www.freelists.org/post/oracle-l/OT-Blog-entry-on-hugepages,20)

[note3] For now at least. See Kevin Closson's blog for more:
http://kevinclosson.wordpress.com/2007/08/23/oracle11g-automatic-memory-management-and-linux-hugepages-support/