Using huge pages on Linux

Original articles:

http://linux.cloudibee.com/2007/09/linux-hugepages/

http://linuxgazette.net/155/krishnakumar.html

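The DPDK website mentions configuring hugepages. The setup is simple; see the articles below.

Summary: huge pages are configured to speed up the mapping from virtual to physical addresses. Without a cached translation, a memory access takes two steps: a page table lookup, then the access to the actual physical address. The TLB caches translations to speed this up, but the TLB can miss; using huge pages reduces the number of TLB misses.

I am not sure whether this understanding is entirely correct. I will look into it more closely when I have time.

Excerpts:

Configuring huge pages on a Linux system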

Hugepages is a mechanism that allows the Linux kernel to use the
multiple page size capabilities of modern hardware architectures.
Linux uses pages as the basic unit of memory: physical memory is
partitioned and accessed in units of one page. The default page
size is 4096 bytes on the x86 architecture. Hugepages allow large
amounts of memory to be used with reduced overhead. The CPU keeps
"Translation Lookaside Buffers" (TLBs), which cache mappings from
virtual memory addresses to physical memory addresses. The TLB has
a limited number of entries, so using a large amount of physical
memory with the default page size exhausts the TLB and adds
processing overhead.

The Linux kernel can set aside a portion of physical memory to be
addressed using a larger page size. Since the page size is larger,
there is less overhead in managing the pages with the TLB. In the
Linux 2.6 series of kernels, hugepages are enabled with the
CONFIG_HUGETLB_PAGE option when the kernel is built. Systems with a
large amount of memory can be configured to use it more efficiently
by setting aside a portion dedicated to hugepages. The actual size
of a huge page depends on the system architecture.

A typical x86 system will have a huge page size of 2048 kB. The
huge page size can be found by looking at /proc/meminfo:

# cat /proc/meminfo | grep Hugepagesize
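A program may want the same value that the grep above prints. The following is a minimal sketch, assuming Linux and the usual "Key: value kB" layout of /proc/meminfo. The helper names meminfo_field and read_meminfo_field are made up for this illustration and are not part of any library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the numeric value of a "Key:   value" line in meminfo-style
 * text, or -1 if no line starts with "key:". */
long meminfo_field(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return strtol(line + klen + 1, NULL, 10); /* skips the padding */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                                   /* step past '\n' */
    }
    return -1;
}

/* Read /proc/meminfo (Linux only) and look up one field.
 * Returns -1 if the file cannot be opened or the key is missing. */
long read_meminfo_field(const char *key)
{
    char buf[8192];   /* assumed to be large enough for /proc/meminfo */
    FILE *f = fopen("/proc/meminfo", "r");
    size_t n;

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_field(buf, key);
}
```

Calling read_meminfo_field("Hugepagesize") should return the huge page size in kB, and read_meminfo_field("HugePages_Total") the size of the pool. Keeping the parsing separate from the file access makes the parser easy to exercise on a fixed string.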

The number of hugepages can be set through the
/proc/sys/vm/nr_hugepages entry or with the sysctl command. To view
the current setting using the /proc entry:

# cat /proc/sys/vm/nr_hugepages

To view the current setting using the sysctl

command:

# sysctl vm.nr_hugepages

To set the number of huge pages using /proc

entry:

# echo 5 > /proc/sys/vm/nr_hugepages

To set the number of hugepages using

sysctl:

# sysctl -w vm.nr_hugepages=5

It may be necessary to reboot to be able to allocate the number

of hugepages that is needed. This is because hugepages requires

large areas of contiguous physical memory. Over time, physical

memory may be mapped and allocated to pages, thus the physical

memory can become fragmented. If the hugepages are allocated early

in the boot process, fragmentation is unlikely to have

occurred.

It is recommended that the /etc/sysctl.conf file be used to
allocate hugepages at boot time. For example, to allocate 5
hugepages at boot time, add the line below to the sysctl.conf file:

vm.nr_hugepages = 5
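The value to put in vm.nr_hugepages follows from how much memory the application needs and from the Hugepagesize reported in /proc/meminfo. As a rough sketch of that arithmetic (the function name huge_pages_needed is an illustrative assumption, not an existing API):

```c
/* Number of huge pages needed to back `bytes` bytes when each huge page
 * is `hugepage_kb` kilobytes (the Hugepagesize value in /proc/meminfo).
 * Rounds up, because a partially used huge page still occupies a whole
 * page. Returns 0 if hugepage_kb is 0. */
unsigned long huge_pages_needed(unsigned long long bytes,
                                unsigned long hugepage_kb)
{
    unsigned long long page_bytes = (unsigned long long)hugepage_kb * 1024ULL;

    if (page_bytes == 0)
        return 0;
    return (unsigned long)((bytes + page_bytes - 1) / page_bytes);
}
```

For example, an 8 MB region needs 4 pages when Hugepagesize is 2048 kB and 2 pages when it is 4096 kB.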

Using huge pages in C/C++ programs

Abstract

This article is meant to be a primer to the HugeTLB feature of

the Linux kernel, which enables one to use virtual memory pages of

large sizes. First, we will go through an introduction of large

page support in the kernel, then we will see how to enable large

pages and how to use large pages from the application. Finally, we

will look into the internals of the large page support in the Linux

kernel.

We will be using terms such as "huge pages", "large pages",

"HugeTLB", etc. interchangeably in this article. This article

covers large page support for x86 based architecture, although most

of it is directly applicable to other architectures.

Introduction

From a memory management perspective, the entire physical memory

is divided into "frames" and the virtual memory is divided into

"pages". The memory management unit performs a translation of

virtual memory address to physical memory address. The information

regarding which virtual memory page maps to which physical frame is

kept in a data structure called the "Page Table". Page table

lookups are costly. In order to avoid performance hits due to this

lookup, a fast lookup cache called the Translation Lookaside

Buffer (TLB) is maintained by most architectures. This lookup cache

contains the virtual memory address to physical memory address

mapping. So any virtual memory address which requires translation

to the physical memory address is first compared with the

translation lookaside buffer for a valid mapping. When a valid

address translation is not present in the TLB, it is called a "TLB

miss". If a TLB miss occurs, the memory management unit will have

to refer to the page tables to get the translation. This brings

additional performance costs, hence it is important that we try to

reduce the TLB misses.

On normal configurations of x86-based machines, the page size is
4 KB, but the hardware also supports larger pages. For example,
32-bit x86 machines (Pentiums and later) support 2 MB and 4 MB
pages. Other architectures such as IA64 support multiple page
sizes. In the past Linux did not support large pages, but with the
advent of the HugeTLB feature in the Linux kernel, applications can
now benefit from them. Using large pages reduces TLB misses,
because when the page size is large, a single TLB entry covers a
larger memory area. Applications with heavy memory demands, such as
databases and HPC applications, can potentially benefit from this.
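To make "a single TLB entry covers a larger memory area" concrete, the amount of memory a full TLB can translate without a miss is simply the entry count times the page size. The sketch below assumes a hypothetical TLB with a fixed number of entries; real TLBs are often split by level and by page size, so treat the numbers as illustrative only.

```c
/* Memory covered by a fully populated TLB ("TLB reach"), in bytes.
 * `entries` is the number of TLB entries (a hypothetical figure here),
 * `page_size` is the size in bytes of the page each entry maps. */
unsigned long long tlb_reach(unsigned long entries,
                             unsigned long long page_size)
{
    return (unsigned long long)entries * page_size;
}
```

With an assumed 64 entries, tlb_reach(64, 4096) is 256 KB, while tlb_reach(64, 4ULL << 20) is 256 MB: the same number of entries covers 1024 times more memory with 4 MB pages.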

Enabling Large Page Support

Support for large pages can be included in the Linux kernel by
choosing CONFIG_HUGETLB_PAGE and CONFIG_HUGETLBFS during kernel
configuration. On a machine which has HugeTLB enabled in the
kernel, information about huge pages can be seen in /proc/meminfo.
The following is an example taken from an AMD Sempron laptop
running kernel 2.6.20.7 with HugeTLB enabled. The information about
large pages is contained in the entries starting with the string
"Huge".

# cat /proc/meminfo | grep Huge
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     4096 kB

We have to tell the kernel the number of large pages that need to
be reserved for use. This is done by echoing the number of large
pages to be reserved into the nr_hugepages entry under
/proc/sys/vm. In the following example, we reserve a maximum of 4
large pages:

# echo 4 > /proc/sys/vm/nr_hugepages

Now the kernel will have allocated the necessary large pages

(depending on the availability of memory). We can once again see

the /proc/meminfo and confirm that the kernel has indeed allocated

the large pages.

# cat /proc/meminfo | grep Huge
HugePages_Total:     4
HugePages_Free:      4
HugePages_Rsvd:      0
Hugepagesize:     4096 kB

We can also reserve HugeTLB pages by giving the "hugepages="
parameter at kernel boot, or use 'sysctl' to set the number of
large pages.

How to Use Large Pages?

An application can make use of large pages in two ways. One is by
using a special shared memory region and the other is by mmapping
files from the hugetlb filesystem. The latter is recommended in
particular when a private HugeTLB mapping is wanted. In this
article we will concentrate on large page support via shared
memory. We will see here how an application can use an array which
is mapped into large pages.

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#define MB_1 (1024*1024)
#define MB_8 (8*MB_1)

char *a;       /* start of the attached HugeTLB segment */
int shmid1;    /* id of the shared memory segment */

/* Create an 8 MB shared memory segment backed by large pages and attach it. */
void init_hugetlb_seg()
{
    shmid1 = shmget(2, MB_8, SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W);
    if (shmid1 < 0) {
        perror("shmget");
        exit(1);
    }
    printf("HugeTLB shmid: 0x%x\n", shmid1);
    a = shmat(shmid1, 0, 0);
    if (a == (char *)-1) {
        perror("Shared memory attach failure");
        shmctl(shmid1, IPC_RMID, NULL);
        exit(2);
    }
}

/* Touch every byte, which faults the large pages in. */
void wr_to_array()
{
    int i;
    for (i = 0; i < MB_8; i++) {
        a[i] = 'A';
    }
}

/* Read every byte back and check that all of them were written. */
void rd_from_array()
{
    int i, count = 0;
    for (i = 0; i < MB_8; i++)
        if (a[i] == 'A') count++;
    if (count == i)
        printf("HugeTLB read success :-)\n");
    else
        printf("HugeTLB read failed :-(\n");
}

int main(int argc, char *argv[])
{
    init_hugetlb_seg();
    printf("HugeTLB memory segment initialized !\n");
    printf("Press any key to write to memory area\n");
    getchar();
    wr_to_array();
    printf("Press any key to rd from memory area\n");
    getchar();
    rd_from_array();
    shmctl(shmid1, IPC_RMID, NULL);
    return 0;
}

The above program is just like any other program which uses

shared memory. First, we initialize the shared memory segment with

an additional flag SHM_HUGETLB for getting large page-based shared

memory. Then we attach the shared memory segment to the program.

Following this, we write to the shared memory area in the function

call 'wr_to_array'. And finally we verify whether the data has been

written properly by reading back the data in the function

'rd_from_array'.
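For comparison, the other approach mentioned earlier, mmapping a file from the hugetlb filesystem, could look roughly like the sketch below. It assumes that hugetlbfs is already mounted somewhere (the path /mnt/huge used in the note after the code is a hypothetical mount point) and that the requested length is a multiple of the huge page size. The function name map_hugetlbfs_file is invented for this example.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `length` bytes of a file that lives on a hugetlbfs mount.
 * Returns the mapped address, or NULL if the file cannot be opened
 * or the mapping fails (for example when no huge pages are free). */
void *map_hugetlbfs_file(const char *path, size_t length)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    void *addr;

    if (fd < 0)
        return NULL;
    /* MAP_SHARED is used here; MAP_PRIVATE would give a private mapping. */
    addr = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return NULL;
    return addr;
}
```

A caller might use map_hugetlbfs_file("/mnt/huge/array", 8 * 1024 * 1024) in place of the shmget/shmat pair, and release the memory with munmap followed by unlink on the file.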

Example program execution - using large pages

Now let us compile the program and run it.

# cc hugetlb-array.c -o hugetlb-array -Wall

# ./hugetlb-array
HugeTLB shmid: 0x40000
HugeTLB memory segment initialized !
Press any key to write to memory area

At this point in time if we check the status of the HugeTLB

pages in the /proc/meminfo, it will show that 2 pages, i.e. 8MB of

memory area are reserved. All the large pages will still be shown

as free, as we have not yet started using the memory area.

# cat /proc/meminfo | grep Huge
HugePages_Total:     4
HugePages_Free:      4
HugePages_Rsvd:      2
Hugepagesize:     4096 kB

Press a key at the program input, which will result in writing

to the allocated HugeTLB memory location. Now the memory segment

which was allocated will be used. This will move the 2 large pages

to allocated state. We can see this in the /proc/meminfo as

HugePages_Free shows only 2.

# cat /proc/meminfo | grep Huge
HugePages_Total:     4
HugePages_Free:      2
HugePages_Rsvd:      0
Hugepagesize:     4096 kB

The following message will now appear:

Press any key to rd from memory area

Finally when we press a key at the program input, the program

will check whether the data which was written is indeed present in

the HugeTLB area. If everything goes fine we will get a hugetlb

smiley.

HugeTLB read success :-)
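The counter values seen during this run (HugePages_Rsvd going to 2 after shmget, then HugePages_Free dropping to 2 once the memory is written) can be mimicked with a toy model. This is only an illustration of the bookkeeping as observed in /proc/meminfo, not the kernel's actual code; the struct and function names are invented.

```c
/* Toy model of the three counters shown in /proc/meminfo. */
struct huge_pool {
    unsigned long total;  /* HugePages_Total */
    unsigned long free;   /* HugePages_Free  */
    unsigned long rsvd;   /* HugePages_Rsvd  */
};

/* Models shmget() with SHM_HUGETLB: pages are promised to the segment
 * but stay on the free list. Fails if too few unpromised pages remain. */
int pool_reserve(struct huge_pool *p, unsigned long n)
{
    if (p->free - p->rsvd < n)   /* free pages not already promised */
        return -1;
    p->rsvd += n;
    return 0;
}

/* Models the first touch of one reserved large page: the page leaves
 * the free list and the reservation is consumed. */
int pool_fault(struct huge_pool *p)
{
    if (p->rsvd == 0 || p->free == 0)
        return -1;
    p->rsvd--;
    p->free--;
    return 0;
}
```

Starting from a pool of {4, 4, 0}, one pool_reserve(&p, 2) followed by two pool_fault(&p) calls walks through the same Total/Free/Rsvd values as the three /proc/meminfo listings above.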

Internals of large page support

Inside the Linux kernel, large page support is implemented in

two parts. The first part consists of a global pool of large pages

which are allocated and kept reserved for providing large pages

support to applications. The global pool of large pages is built by

allocating physically contiguous pages (of large page sizes) using

normal kernel memory allocation APIs. The second part consists of the

kernel itself allocating large pages from this pool to applications

that request them.

We will first see the internals of how the large pages are

initialized and how the global pools are filled up. Then we will

see how shared memory can be used by an application to leverage the

large pages and how the physical pages actually get allocated by

means of page faults. We will not perform a line-by-line code

walk-through; instead we will go through the main parts of the code

relevant to large pages.

Large Page initialization

In the Linux kernel source code (in file mm/hugetlb.c) we have

the function "hugetlb_init" which allocates multiple physically

contiguous pages of normal page size to form clusters of pages

which can be used for large page sizes. The number of pages which

are allocated like this depends on the value of "max_huge_pages"

variable. This number can be passed on as a kernel command line

option by using the 'hugepages' parameter. The large page size

allocated depends on the macro HUGETLB_PAGE_ORDER which in turn

depends on the HPAGE_SHIFT macro. For example, this macro is assigned

the value 22 (when PAE is not enabled) on an x86-based

architecture. This means that the size of large page allocated will

be 4 MB. Note that the large page size depends on the architecture and

corresponding supported page sizes.
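The relation between the shift value and the page size is a power of two, which the following sketch spells out. It assumes a base page shift of 12 (4 KB pages) and that the allocation order is the difference between the two shifts; the function names are illustrative and do not come from the kernel source.

```c
/* Size in bytes of a page whose size is 2^shift. */
unsigned long page_size_from_shift(unsigned int shift)
{
    return 1UL << shift;
}

/* Allocation order of a large page: log2 of the number of base pages
 * that make up one large page. */
unsigned int huge_page_order(unsigned int hpage_shift, unsigned int page_shift)
{
    return hpage_shift - page_shift;
}
```

With a shift of 22, page_size_from_shift gives 4194304 bytes (4 MB) and huge_page_order(22, 12) gives 10, i.e. 1024 contiguous 4 KB pages per large page. A shift of 21 would correspond to 2 MB pages.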

The pages allocated as mentioned previously are enqueued into

"hugepage_freelists" for the respective node, where the page is

allocated from, by the function 'enqueue_huge_page'. Each memory

node (in case of NUMA) will have one hugepage_freelists. When the

large pages are allocated dynamically as in the example (by echoing

the value to proc) or by other dynamic methods, a similar sequence

of events occurs, as explained during the static allocation of

large pages.

In order to use a shared memory area, we will have to create it.

This, as we have seen before, is done by the 'shmget' system call.

This system call will invoke the kernel function 'sys_shmget' which

in turn calls 'newseg'. In 'newseg' a check is made to confirm if

the user has asked for the creation of a HugeTLB shared memory

area. If the user has specified the large page flag SHM_HUGETLB,

then the file operations corresponding to this file structure will

be assigned to 'hugetlbfs_file_operations'. The large pages get

reserved by the function 'hugetlb_reserve_pages' which will

increment the reserved pages count, resv_huge_pages, which shows up

as 'HugePages_Rsvd' in /proc/meminfo.

When the system call 'sys_shmat' is made, address alignment

check and other sanity checks are done by using

'hugetlb_get_unmapped_area' function.

Large page fault and physical page allocation

When a page fault occurs, the "vma" which corresponds to the

address is found. The vma which corresponds to a hugetlb shared

memory location will have 'vma->vm_flags' set as

'VM_HUGETLB', and is detected by calling 'is_vm_hugetlb_page'. When

a hugetlb vma is found the 'hugetlb_fault' function is called. This

procedure sets up large page flag in the page directory entry then

allocates a huge page based on a copy-on-write logic from the

global pool of large pages initialized previously. The large page

size itself is set in the hardware by setting the _PAGE_PSE flag in

the pgd (the 7th bit, counting from bit 0, in cases without PAE

for x86).
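As a small illustration of the flag described above, the sketch below tests and sets bit 7 of a 32-bit directory entry. It is a user-space model under the assumption of a non-PAE x86 entry layout; the macro and function names are made up here and are not the kernel's definitions.

```c
#include <stdint.h>

#define PSE_BIT 7u   /* bit position given in the text above */

/* Return 1 if the entry has the page-size flag set, 0 otherwise. */
int entry_maps_large_page(uint32_t entry)
{
    return (int)((entry >> PSE_BIT) & 1u);
}

/* Return a copy of the entry with the page-size flag set. */
uint32_t entry_set_large(uint32_t entry)
{
    return entry | (UINT32_C(1) << PSE_BIT);
}
```

An entry value of 0x80 has only this flag set, so entry_maps_large_page(0x80) is 1, and entry_set_large(0x01) yields 0x81.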

Where to go from here?

Detailed documentation with advanced examples can be found in

the file Documentation/vm/hugetlbpage.txt which comes with Linux

kernel source code.

The HugeTLB feature inside the kernel is not application

transparent, in the sense that we need to explicitly make

modifications (i.e. have to insert code which uses shared memory or

HugeTLB fs) to the application to make use of large pages. For

folks who are interested in application transparent implementations

of large page support, an internet search for "Transparent

superpages" will get you to Web sites containing details of such

implementations.

Links

Improving enterprise database performance on

Linux: http://www.linuxsymposium.org/2003/view_abstract.php?talk=55

TLB wikipedia

entry: http://en.wikipedia.org/wiki/Translation_Lookaside_Buffer

HugeTLB kernel documentation link from kernel source

online: http://lxr.linux.no/source/Documentation/vm/hugetlbpage.txt

Conclusion

We have seen how the Linux kernel provides applications with the

ability to use large pages. We went through methods to enable and

use large pages. After that we skimmed through the internals of the

HugeTLB implementation inside the kernel.

Acknowledgements

I would like to extend my sincere thanks to Kenneth Chen for

giving me better insights into HugeTLB code, for answering my

questions with patience and for the review of an initial draft of

this article. I would also like to thank Pramode Sir, Badri, Malay,

Shijesta and Chitkala for review and feedback.
