Original links:
http://linux.cloudibee.com/2007/09/linux-hugepages/
http://linuxgazette.net/155/krishnakumar.html
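While reading about DPDK, I saw that the official site mentions configuring hugepages. The configuration is simple; see the articles below.
Summary: huge pages are configured to speed up virtual-to-physical address translation. By default a memory access takes two steps: a page-table lookup, then the access to the real physical address. The TLB speeds this up, but TLB misses still occur; using huge pages reduces the number of misses.
I am not sure whether this understanding is correct. I will study it more carefully when I have time.
Excerpts:
Configuring huge pages on Linux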
Hugepages is a mechanism that allows the Linux kernel to utilize
the multiple page size capabilities of modern hardware
architectures. Linux uses pages as the basic unit of memory, where
physical memory is partitioned and accessed using the basic page
unit. The default page size is 4096 bytes on the x86
architecture. Hugepages allow large amounts of memory to be
utilized with reduced overhead. The CPU uses a “Translation Lookaside
Buffer” (TLB) to cache
mappings of virtual memory to actual physical memory addresses. So
utilizing a huge amount of physical memory with the default page
size exhausts the TLB entries and adds processing overhead.
The Linux kernel is able to set aside a portion of physical
memory to be addressed using a larger page size. Since the
page size is larger, there is less overhead managing the pages
with the TLB. In the Linux 2.6 series of kernels, hugepages is
enabled using the CONFIG_HUGETLB_PAGE feature when the kernel is
built. Systems with large amounts of memory can be configured to
utilize the memory more efficiently by setting aside a portion
dedicated for hugepages. The actual size of the page is dependent
on the system architecture.
A typical x86 system has a huge page size of 2048 kB.
The huge page size can be found in /proc/meminfo:
# cat /proc/meminfo | grep Hugepagesize
The number of hugepages can be set through the
/proc/sys/vm/nr_hugepages entry or with the sysctl
command. To view the current setting using the /proc entry:
# cat /proc/sys/vm/nr_hugepages
To view the current setting using the sysctl command:
# sysctl vm.nr_hugepages
To set the number of hugepages using the /proc entry:
# echo 5 > /proc/sys/vm/nr_hugepages
To set the number of hugepages using sysctl:
# sysctl -w vm.nr_hugepages=5
It may be necessary to reboot to be able to allocate the number
of hugepages that is needed. This is because hugepages requires
large areas of contiguous physical memory. Over time, physical
memory may be mapped and allocated to pages, thus the physical
memory can become fragmented. If the hugepages are allocated early
in the boot process, fragmentation is unlikely to have
occurred.
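Because the kernel may end up with fewer huge pages than were requested, it can be worth reading the value back after setting it. The following is a minimal sketch of that check; the helper names read_long_from_file and hugepages_shortfall are hypothetical and not part of any kernel or library API.

```c
#include <stdio.h>

/* Read a single integer from a file such as /proc/sys/vm/nr_hugepages.
 * Returns -1 if the file cannot be opened or does not start with a number. */
long read_long_from_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    long value = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}

/* Number of requested huge pages that were NOT allocated.
 * Returns 0 when the request was fully satisfied. */
long hugepages_shortfall(long requested, long allocated)
{
    return allocated < requested ? requested - allocated : 0;
}
```

A caller could pass "/proc/sys/vm/nr_hugepages" to read_long_from_file and compare the result with the number it wrote; a non-zero shortfall suggests that memory was too fragmented and that boot-time allocation is the safer route.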
It is recommended that the /etc/sysctl.conf file be used
to allocate hugepages at boot time. For example, to allocate 5
hugepages at boot time, add the line below to the sysctl.conf file:
vm.nr_hugepages = 5
Using huge pages in C/C++ programs
Abstract
This article is meant to be a primer to the HugeTLB feature of
the Linux kernel, which enables one to use virtual memory pages of
large sizes. First, we will go through an introduction of large
page support in the kernel, then we will see how to enable large
pages and how to use large pages from the application. Finally, we
will look into the internals of the large page support in the Linux
kernel.
We will be using terms such as "huge pages", "large pages",
"HugeTLB", etc. interchangeably in this article. This article
covers large page support for x86 based architecture, although most
of it is directly applicable to other architectures.
Introduction
From a memory management perspective, the entire physical memory
is divided into "frames" and the virtual memory is divided into
"pages". The memory management unit performs a translation of
virtual memory address to physical memory address. The information
regarding which virtual memory page maps to which physical frame is
kept in a data structure called the "Page Table". Page table
lookups are costly. In order to avoid performance hits due to this
lookup, a fast lookup cache called the Translation Lookaside
Buffer (TLB) is maintained by most architectures. This lookup cache
contains the virtual memory address to physical memory address
mapping. So any virtual memory address which requires translation
to the physical memory address is first compared with the
translation lookaside buffer for a valid mapping. When a valid
address translation is not present in the TLB, it is called a "TLB
miss". If a TLB miss occurs, the memory management unit will have
to refer to the page tables to get the translation. This brings
additional performance costs, hence it is important that we try to
reduce the TLB misses.
On normal configurations of x86 based machines, the page size is
4K, but the hardware offers support for pages which are larger in
size. For example, on x86 32-bit machines (Pentiums and later)
there is support for 2 MB and 4 MB pages. Other architectures such as
IA64 support multiple page sizes. In the past Linux did not support
large pages, but with the advent of HugeTLB feature in the Linux
kernel, applications can now benefit from large pages. By using
large pages, the TLB misses are reduced. This is because when the
page size is large, a single TLB entry can span a larger memory
area. Applications which have heavy memory demands such as database
applications, HPC applications, etc. can potentially benefit from
this.
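The effect of page size on TLB coverage is simple arithmetic: the memory a TLB can map without a miss is the number of entries multiplied by the page size. The sketch below illustrates this; the function name tlb_reach_bytes is hypothetical, and the entry count used in the note after it is an assumed figure for illustration, not a property of any particular CPU.

```c
/* Memory (in bytes) that a TLB with `entries` entries can cover
 * when every entry maps one page of `page_size` bytes. */
unsigned long long tlb_reach_bytes(unsigned int entries,
                                   unsigned long long page_size)
{
    return (unsigned long long)entries * page_size;
}
```

For an assumed 64-entry TLB, tlb_reach_bytes(64, 4096) is 262144 bytes (256 KB), while tlb_reach_bytes(64, 4 * 1024 * 1024) is 268435456 bytes (256 MB), which is why a working set larger than a few hundred kilobytes can benefit from large pages.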
Enabling Large Page Support
Support for large pages can be included into the Linux kernel by
choosing CONFIG_HUGETLB_PAGE and CONFIG_HUGETLBFS during kernel
configuration. On a machine which has HugeTLB enabled in the
kernel, information about the Hugepages can be seen from the
/proc/meminfo. The following is an example taken from an AMD
Sempron laptop running kernel 2.6.20.7 with HugeTLB enabled. The
information about large pages is contained in entries starting with
the string "Huge".
# cat /proc/meminfo | grep Huge
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     4096 kB
We have to tell the kernel the number of large pages that need
to be reserved for use. This is done by echoing the number of large
pages to be reserved to the nr_hugepages entry under /proc/sys/vm. In
the following example, we reserve a maximum of 4 large pages:
# echo 4 > /proc/sys/vm/nr_hugepages
Now the kernel will have allocated the necessary large pages
(depending on the availability of memory). We can look at
/proc/meminfo once again and confirm that the kernel has indeed allocated
the large pages.
# cat /proc/meminfo | grep Huge
HugePages_Total:     4
HugePages_Free:      4
HugePages_Rsvd:      0
Hugepagesize:     4096 kB
We can also reserve HugeTLB pages by passing the "hugepages="
parameter on the kernel command line at boot. We can also use
'sysctl' to set the number of large pages.
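A program can read the same "Huge" entries instead of relying on grep. The following is a minimal sketch that parses them from the text of /proc/meminfo; the struct hugepage_info and the functions find_field, parse_hugepage_info and read_hugepage_info are hypothetical names chosen for this illustration, and the 8 KB buffer size is an assumption about how large the file is.

```c
#include <stdio.h>
#include <string.h>

struct hugepage_info {
    long total;    /* HugePages_Total */
    long free;     /* HugePages_Free */
    long rsvd;     /* HugePages_Rsvd */
    long size_kb;  /* Hugepagesize, in kB */
};

/* Find a line in `text` that starts with `key` and parse the number
 * that follows it. Returns -1 if no such line exists. */
static long find_field(const char *text, const char *key)
{
    const char *p = text;
    size_t klen = strlen(key);
    long value;

    while (p != NULL && *p != '\0') {
        if (strncmp(p, key, klen) == 0 &&
            sscanf(p + klen, "%ld", &value) == 1)
            return value;
        p = strchr(p, '\n');   /* move to the start of the next line */
        if (p != NULL)
            p++;
    }
    return -1;
}

/* Fill `out` from text in the format of /proc/meminfo. */
void parse_hugepage_info(const char *meminfo_text, struct hugepage_info *out)
{
    out->total   = find_field(meminfo_text, "HugePages_Total:");
    out->free    = find_field(meminfo_text, "HugePages_Free:");
    out->rsvd    = find_field(meminfo_text, "HugePages_Rsvd:");
    out->size_kb = find_field(meminfo_text, "Hugepagesize:");
}

/* Read /proc/meminfo and parse it. Returns 0 on success, -1 on error. */
int read_hugepage_info(struct hugepage_info *out)
{
    char buf[8192];   /* assumed to be large enough for the whole file */
    size_t n;
    FILE *fp = fopen("/proc/meminfo", "r");

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    parse_hugepage_info(buf, out);
    return 0;
}
```

Fields that are missing (for example on a kernel built without HugeTLB support) come back as -1, so a caller can tell "zero pages reserved" apart from "feature not present".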
How to Use Large Pages?
An application can make use of large pages in two ways. One is
by using a special shared memory region and the other is by mmapping
files from the hugetlb filesystem. If we want a
private HugeTLB mapping, mmapping files from the hugetlb filesystem
is the recommended technique. In this article we will concentrate on large
page support via shared memory. We will see here how an application
can use an array which is mapped into large pages.
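For a private mapping, the sketch below uses a different technique from the one the article names: an anonymous mmap with the MAP_HUGETLB flag rather than a file on the hugetlb filesystem. That flag was added in Linux 2.6.32, so it is not available on the 2.6.20 kernel used in this article. The function names round_up_to_hugepage and map_private_hugepages are hypothetical, and the caller is assumed to know the huge page size (for example from Hugepagesize in /proc/meminfo).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Round `len` up to a multiple of `hpage_size`, which must be a power
 * of two. munmap() on a huge page mapping needs such a length. */
size_t round_up_to_hugepage(size_t len, size_t hpage_size)
{
    return (len + hpage_size - 1) & ~(hpage_size - 1);
}

/* Try to map `len` bytes of private anonymous memory backed by huge pages.
 * Returns NULL on failure, for example when no huge pages are reserved. */
void *map_private_hugepages(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

A caller would typically round the requested size first, keep the rounded length, and pass the same length to munmap when the region is no longer needed.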
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define MB_1 (1024*1024)
#define MB_8 (8*MB_1)
char *a;
int shmid1;

/* Create an 8 MB shared memory segment backed by large pages and attach it. */
void init_hugetlb_seg()
{
    shmid1 = shmget(2, MB_8, SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W);
    if (shmid1 < 0) {
        perror("shmget");
        exit(1);
    }
    printf("HugeTLB shmid: 0x%x\n", shmid1);
    a = shmat(shmid1, 0, 0);
    if (a == (char *)-1) {
        perror("Shared memory attach failure");
        shmctl(shmid1, IPC_RMID, NULL);
        exit(2);
    }
}

/* Write to every byte of the segment. */
void wr_to_array()
{
    int i;
    for (i = 0; i < MB_8; i++) {
        a[i] = 'A';
    }
}

/* Read the segment back and check that every byte was written. */
void rd_from_array()
{
    int i, count = 0;
    for (i = 0; i < MB_8; i++)
        if (a[i] == 'A') count++;
    if (count == i)
        printf("HugeTLB read success :-)\n");
    else
        printf("HugeTLB read failed :-(\n");
}

int main(int argc, char *argv[])
{
    init_hugetlb_seg();
    printf("HugeTLB memory segment initialized !\n");
    printf("Press any key to write to memory area\n");
    getchar();
    wr_to_array();
    printf("Press any key to rd from memory area\n");
    getchar();
    rd_from_array();
    shmctl(shmid1, IPC_RMID, NULL);
    return 0;
}
The above program is just like any other program which uses
shared memory. First, we initialize the shared memory segment with
an additional flag SHM_HUGETLB for getting large page-based shared
memory. Then we attach the shared memory segment to the program.
Following this, we write to the shared memory area in the function
call 'wr_to_array'. And finally we verify whether the data has been
written properly by reading back the data in the function
'rd_from_array'.
Example program execution - using large pages
Now let us compile the program and run it.
# cc hugetlb-array.c -o hugetlb-array -Wall
# ./hugetlb-array
HugeTLB shmid: 0x40000
HugeTLB memory segment initialized !
Press any key to write to memory area
At this point, if we check the status of the HugeTLB
pages in /proc/meminfo, it will show that 2 pages, i.e. 8 MB of
memory, are reserved. All the large pages will still be shown
as free, as we have not yet started using the memory area.
# cat /proc/meminfo | grep Huge
HugePages_Total:     4
HugePages_Free:      4
HugePages_Rsvd:      2
Hugepagesize:     4096 kB
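The figure of 2 reserved pages follows from dividing the 8 MB segment by the 4096 kB huge page size and rounding up. A minimal sketch of that calculation is given below; the function name hugepages_needed is hypothetical.

```c
/* Number of huge pages needed to back `bytes` of memory, rounding up.
 * `hugepage_kb` is the huge page size in kB, as shown by Hugepagesize
 * in /proc/meminfo, and must be non-zero. */
unsigned long hugepages_needed(unsigned long long bytes,
                               unsigned long hugepage_kb)
{
    unsigned long long hpage_bytes = (unsigned long long)hugepage_kb * 1024ULL;

    return (unsigned long)((bytes + hpage_bytes - 1) / hpage_bytes);
}
```

On a system where Hugepagesize is 2048 kB, the same 8 MB segment would need 4 pages, so the value written to nr_hugepages has to be chosen with the local page size in mind.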
Press a key at the program prompt; the program then writes
to the allocated HugeTLB memory area. Now the memory segment
which was allocated is actually used. This moves the 2 large pages
to the allocated state. We can see this in /proc/meminfo, as
HugePages_Free shows only 2.
# cat /proc/meminfo | grep Huge
HugePages_Total:     4
HugePages_Free:      2
HugePages_Rsvd:      0
Hugepagesize:     4096 kB
The following message will now appear:
Press any key to rd from memory area
Finally when we press a key at the program input, the program
will check whether the data which was written is indeed present in
the HugeTLB area. If everything goes fine we will get a hugetlb
smiley.
HugeTLB read success :-)
Internals of large page support
Inside the Linux kernel, large page support is implemented in
two parts. The first part consists of a global pool of large pages
which are allocated and kept reserved for providing large pages
support to applications. The global pool of large pages is built by
allocating physically contiguous pages (of large page sizes) using
normal kernel memory allocation APIs. The second part consists of the
kernel itself allocating large pages from this pool to applications
that request them.
We will first see the internals of how the large pages are
initialized and how the global pools are filled up. Then we will
see how shared memory can be used by an application to leverage
large pages and how the physical pages actually get allocated by
means of page faults. We will not perform a line-by-line code walk
through; instead we will go through the main parts of the code
relevant to large pages.
Large Page initialization
In the Linux kernel source code (in file mm/hugetlb.c) we have
the function "hugetlb_init" which allocates multiple physically
contiguous pages of normal page size to form clusters of pages
which can be used for large page sizes. The number of pages which
are allocated like this depends on the value of "max_huge_pages"
variable. This number can be passed on as a kernel command line
option by using the 'hugepages' parameter. The large page size
allocated depends on the macro HUGETLB_PAGE_ORDER which in turn
depends on the HPAGE_SHIFT macro. For example, this macro is assigned
the value 22 (when PAE is not enabled) on an x86 based
architecture. This means that the size of a large page will
be 4 MB. Note that the large page size depends on the architecture and
corresponding supported page sizes.
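The relationship between the shift value, the large page size and the allocation order can be written out as follows. This is a user-space sketch that mirrors the arithmetic, not the kernel's own code; BASE_PAGE_SHIFT and the two function names are hypothetical, and the base page shift of 12 assumes the 4 KB x86 page size mentioned earlier.

```c
#define BASE_PAGE_SHIFT 12u   /* assumed: 4 KB base pages, as on x86 */

/* Large page size in bytes for a given shift (2 to the power `hpage_shift`). */
unsigned long huge_page_bytes(unsigned int hpage_shift)
{
    return 1UL << hpage_shift;
}

/* Allocation order: a large page spans 2^order contiguous base pages. */
unsigned int huge_page_order(unsigned int hpage_shift)
{
    return hpage_shift - BASE_PAGE_SHIFT;
}
```

With a shift of 22 this gives 4194304 bytes (4 MB) and an order of 10, i.e. 1024 contiguous 4 KB pages per large page. With PAE enabled the shift on x86 is 21, which gives 2 MB pages and an order of 9.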
The pages allocated as described above are enqueued by the function
'enqueue_huge_page' into the "hugepage_freelists" of the node from
which each page was allocated. Each memory
node (in case of NUMA) has its own hugepage_freelists. When
large pages are allocated dynamically, as in the example (by echoing
the value to proc) or by other dynamic methods, a sequence of events
similar to the static allocation described above
takes place.
In order to use a shared memory area, we will have to create it.
This, as we have seen before, is done by the 'shmget' system call.
This system call will invoke the kernel function 'sys_shmget' which
in turn calls 'newseg'. In 'newseg' a check is made to confirm if
the user has asked for the creation of a HugeTLB shared memory
area. If the user has specified the large page flag SHM_HUGETLB,
then the file operations corresponding to this file structure will
be assigned to 'hugetlbfs_file_operations'. The large pages get
reserved by the function 'hugetlb_reserve_pages', which
increments the reserved pages count, resv_huge_pages, which shows up
as 'HugePages_Rsvd' in /proc/meminfo.
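The accounting behind the HugePages_Total, HugePages_Free and HugePages_Rsvd counters can be pictured with a small toy model: creating the segment only reserves pages, and each reserved page is taken from the free count when it is first touched. This is an illustrative simplification under that assumption and is not the kernel's implementation; the struct and function names are hypothetical.

```c
/* Toy model of the global large page pool counters. */
struct hugepage_pool {
    long total;  /* HugePages_Total */
    long free;   /* HugePages_Free */
    long rsvd;   /* HugePages_Rsvd */
};

/* Reserve `n` pages for a new segment. The pages stay free until they
 * are touched. Returns 0 on success, -1 if too few unreserved free
 * pages remain. */
int pool_reserve(struct hugepage_pool *pool, long n)
{
    if (n < 0 || pool->free - pool->rsvd < n)
        return -1;
    pool->rsvd += n;
    return 0;
}

/* First touch of a reserved page: one free page is handed out and one
 * reservation is consumed. Returns 0 on success, -1 otherwise. */
int pool_fault(struct hugepage_pool *pool)
{
    if (pool->rsvd <= 0 || pool->free <= 0)
        return -1;
    pool->rsvd--;
    pool->free--;
    return 0;
}
```

Starting from a pool of 4 total and 4 free pages, reserving 2 gives the counters 4/4/2, and two faults then give 4/2/0, which matches the /proc/meminfo output seen while running the example program.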
When the system call 'sys_shmat' is made, an address alignment
check and other sanity checks are done by the
'hugetlb_get_unmapped_area' function.
Large page fault and physical page allocation
When a page fault occurs, the "vma" which corresponds to the
address is found. The vma which corresponds to a hugetlb shared
memory location will have 'VM_HUGETLB' set in 'vma->vm_flags',
which is detected by calling 'is_vm_hugetlb_page'. When
a hugetlb vma is found, the 'hugetlb_fault' function is called. This
procedure sets the large page flag in the page directory entry and then
allocates a huge page, based on copy-on-write logic, from the
global pool of large pages initialized previously. The large page
size itself is set in the hardware by setting the _PAGE_PSE flag in
the pgd (the 7th bit, counting from bit 0, in cases without PAE
on x86).
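The flag described above is a single bit in the page directory entry. The sketch below shows the bit arithmetic on a plain 32-bit value in user space; the macro PSE_BIT_MASK and the function names are hypothetical stand-ins for illustration and are not the kernel's definitions.

```c
#include <stdint.h>

/* Bit 7 (counting from bit 0) marks a page directory entry as mapping
 * a large page on x86 without PAE. */
#define PSE_BIT_MASK (UINT32_C(1) << 7)   /* 0x80 */

/* Non-zero if the entry has the large page bit set. */
int entry_maps_large_page(uint32_t pde)
{
    return (pde & PSE_BIT_MASK) != 0;
}

/* Return a copy of the entry with the large page bit set. */
uint32_t entry_set_large_page(uint32_t pde)
{
    return pde | PSE_BIT_MASK;
}
```

When the bit is set, the hardware treats the directory entry itself as the final translation for the whole large page instead of following it to a second-level page table.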
Where to go from here?
Detailed documentation with advanced examples can be found in
the file Documentation/vm/hugetlbpage.txt which comes with Linux
kernel source code.
The HugeTLB feature inside the kernel is not application
transparent, in the sense that we need to explicitly make
modifications (i.e. have to insert code which uses shared memory or
HugeTLB fs) to the application to make use of large pages. For
folks who are interested in application transparent implementations
of large page support, an internet search for "Transparent
superpages" will get you to Web sites containing details of such
implementations.
Links
Improving enterprise database performance on Linux: http://www.linuxsymposium.org/2003/view_abstract.php?talk=55
TLB Wikipedia entry: http://en.wikipedia.org/wiki/Translation_Lookaside_Buffer
HugeTLB kernel documentation from the kernel source online: http://lxr.linux.no/source/Documentation/vm/hugetlbpage.txt
Conclusion
We have seen how the Linux kernel provides applications with the
ability to use large pages. We went through methods to enable and
use large pages. After that we skimmed through the internals of the
HugeTLB implementation inside the kernel.
Acknowledgements
I would like to extend my sincere thanks to Kenneth Chen for
giving me better insights into HugeTLB code, for answering my
questions with patience and for the review of an initial draft of
this article. I would also like to thank Pramode Sir, Badri, Malay,
Shijesta and Chitkala for review and feedback.