High Memory In The Linux Kernel (Cited)

From: http://kerneltrap.org/node/2450, by Amit Shah

As RAM increasingly becomes a commodity, prices drop and computer users are able to buy more. 32-bit architectures face certain limitations with regard to accessing these growing amounts of RAM. To better understand the problem and the various solutions, we begin with an overview of Linux memory management. With a grasp of how basic memory management works, we can better define the problem and finally review the various solutions.

This article was written by examining the Linux 2.6 kernel source code for the x86 architecture types.


Overview of Linux memory management

32-bit architectures can reference 4 GB of physical memory (2^32 bytes). Processors that have an MMU (Memory Management Unit) support the concept of virtual memory: the kernel sets up page tables which map "virtual addresses" to "physical addresses". This basically means that each process can address 4 GB of virtual memory, thinking it's the only process running on the machine (much like multi-tasking, in which each process is made to think that it's the only process executing on a CPU).

The virtual address to physical address mappings are done by the kernel. When a new process is "fork()"ed, the kernel creates a new set of page tables for the process. The addresses referenced within a process in user-space are virtual addresses. They do not necessarily map directly to the same physical address. The virtual address is passed to the MMU (Memory Management Unit of the processor) which converts it to the proper physical address based on the tables set up by the kernel. Hence, two processes can refer to memory address 0x08329, but they would refer to two different locations in memory.
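To make this concrete, here is a minimal user-space sketch (not from the original article): after a fork(), parent and child print the same virtual address for a local variable, yet each sees its own value once the child writes to it, because the shared virtual address ends up backed by different physical pages (Linux uses copy-on-write here).

    /* Minimal user-space sketch (not from the original article). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int value = 1;
        pid_t pid = fork();

        if (pid < 0) {
            perror("fork");
            return EXIT_FAILURE;
        }
        if (pid == 0) {
            /* Child: the write triggers copy-on-write, giving the child
             * its own physical page behind the same virtual address. */
            value = 2;
            printf("child:  &value = %p, value = %d\n", (void *)&value, value);
            return EXIT_SUCCESS;
        }
        wait(NULL);
        printf("parent: &value = %p, value = %d\n", (void *)&value, value);
        return EXIT_SUCCESS;
    }

Both lines print the same %p address, demonstrating that virtual addresses are per-process and only the kernel's page tables decide which physical page they hit.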

The Linux kernel splits the 4 GB virtual address space of a process in two parts: 3 GB and 1 GB. The lower 3 GB of the process virtual address space is accessible as the user-space virtual addresses and the upper 1 GB space is reserved for the kernel virtual addresses. This is true for all processes.

+----------+ 4 GB
|          |
|          |
|  Kernel  |
|  Virtual |                 +----------+
|  Space   |                 |   High   |
|  (1 GB)  |                 |  Memory  |
|          |                 | (unused) |
+----------+ 3 GB            +----------+ 1 GB
|          |                 |          |
|          |                 |  Kernel  |
|          |                 | Physical |
|User-space|                 |  Space   |
|  Virtual |                 |          |
|  Space   |                 |          |
|  (3 GB)  |                 +----------+ 0 GB
|          |                   Physical
|          |                    Memory
|          |
+----------+ 0 GB
   Virtual
   Memory

The kernel virtual area (3 - 4 GB address space) maps to the first 1 GB of physical RAM. The 3 GB addressable RAM available to each process is mapped to the available physical RAM.

The Problem

So, the basic problem here is that the kernel can only address 1 GB of virtual addresses, which translates to a maximum of 1 GB of physical memory. This is because the kernel directly maps all available kernel virtual space addresses to the available physical memory.
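On x86 this direct mapping is just a fixed offset. A simplified sketch of the macros involved (the real 2.6 definitions live in include/asm-i386/page.h):

    /* Simplified sketch of the x86 2.6 direct-mapping macros. Kernel
     * virtual addresses in the direct map differ from physical addresses
     * by the constant PAGE_OFFSET (0xC0000000 for the 3G / 1G split). */
    #define PAGE_OFFSET 0xC0000000UL

    #define __pa(x) ((unsigned long)(x) - PAGE_OFFSET)            /* virt -> phys */
    #define __va(x) ((void *)((unsigned long)(x) + PAGE_OFFSET))  /* phys -> virt */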

Solutions

There are some solutions which address this problem:

  1. 2G / 2G, 1G / 3G split
  2. HIGHMEM solution for using up to 4 GB of memory
  3. HIGHMEM solution for using up to 64 GB of memory

1. 2G / 2G, 1G / 3G split

Instead of splitting the virtual address space in the traditional 3G / 1G way (3 GB for user-space, 1 GB for kernel space), third-party patches exist to split it 2G / 2G or 1G / 3G. The 1G / 3G split is a bit extreme in that the kernel can map up to 3 GB of physical memory, but user-space applications cannot grow beyond 1 GB. It could work for simple applications; but if one has more than 3 GB of physical RAM, one probably isn't running simple applications on it, right?

The 2G / 2G split seems to be a balanced approach to using more than 1 GB of RAM without the HIGHMEM patches. However, server applications like databases always want as much virtual address space as possible, so this approach may not work in those scenarios.

There's a patch by Andrea Arcangeli for 2.4.23 that adds a config-time option for selecting the user / kernel split values. It is available at his kernel page. It's a simple patch, and making it work on 2.6 should not be too difficult.
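The idea behind such a patch boils down to where the kernel's page-offset constant is set. A hedged sketch (CONFIG_2G_2G_SPLIT is a hypothetical option name, not one taken from Arcangeli's actual patch):

    /* Illustrative sketch only: the user/kernel split is decided by the
     * kernel's page-offset constant. CONFIG_2G_2G_SPLIT is a hypothetical
     * option name used here for illustration. */
    #ifdef CONFIG_2G_2G_SPLIT
    #define __PAGE_OFFSET 0x80000000UL  /* 2 GB user, 2 GB kernel */
    #else
    #define __PAGE_OFFSET 0xC0000000UL  /* default: 3 GB user, 1 GB kernel */
    #endif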

Before looking at solutions 2 & 3, let's take a look at some more Linux Memory Management issues.

Zones

In Linux, the memory available from all banks is classified into "nodes". These nodes indicate how much memory each bank has. This classification is mainly useful for NUMA architectures, but it's also used for UMA architectures, where the number of nodes is just 1.

Memory in each node is divided into "zones". The zones currently defined are ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM.

ZONE_DMA covers the lower physical memory range (up to 16 MB) and is used for data transfer by devices, such as older ISA devices, that can only address this region.

Memory in the ZONE_NORMAL region is mapped by the kernel in the upper region of the linear address space. Most operations can only take place in ZONE_NORMAL; so this is the most performance critical zone. ZONE_NORMAL goes from 16 MB to 896 MB.

To address memory beyond 896 MB (ZONE_HIGHMEM), the kernel has to map pages from high memory into the kernel virtual address space.

Some area of memory is reserved for storing several kernel data structures that hold information about the memory map and page tables. On x86 this is 128 MB. Hence, of the 1 GB of virtual address space the kernel has, 128 MB is reserved, meaning the kernel virtual addresses in this 128 MB range are not mapped to physical memory. This leaves a maximum of 896 MB for ZONE_NORMAL. So, even if one has 1 GB of physical RAM, just 896 MB will actually be directly available.
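As a small illustration of how the zones come into play at allocation time (a sketch, not from the original article; these GFP flags and functions are from the 2.6 page allocator):

    /* Sketch (kernel code): the GFP flags passed to the page allocator
     * decide which zone a page is taken from. */
    #include <linux/gfp.h>
    #include <linux/mm.h>

    static void zone_example(void)
    {
        struct page *dma  = alloc_pages(GFP_DMA, 0);      /* ZONE_DMA, < 16 MB   */
        struct page *norm = alloc_pages(GFP_KERNEL, 0);   /* ZONE_NORMAL         */
        struct page *high = alloc_pages(GFP_HIGHUSER, 0); /* may be ZONE_HIGHMEM */

        if (dma)
            __free_pages(dma, 0);
        if (norm)
            __free_pages(norm, 0);
        if (high)
            __free_pages(high, 0);
    }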

Back to the solutions:

2. HIGHMEM solution for using up to 4 GB of memory

Since Linux can't access memory which hasn't been directly mapped into its address space, to use memory beyond the directly mapped region the physical pages have to be mapped into the kernel virtual address space first. This means that pages in ZONE_HIGHMEM have to be mapped into the kernel's address space before they can be accessed.

The reserved space we talked about earlier (128 MB in the case of x86) contains an area in which pages from high memory are mapped into the kernel address space.

To create a permanent mapping, the "kmap" function is used. Since this function may sleep, it may not be used in interrupt context. The number of permanent mappings is limited (if it weren't, we could have directly mapped all of high memory into the address space), so pages mapped this way should be "kunmap"ped when no longer needed.
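A minimal sketch of the pattern, assuming process context and a page that may have come from high memory:

    /* Sketch: map a possibly-high page, use it, then release the slot.
     * kmap() may sleep, so this must run in process context. */
    #include <linux/highmem.h>
    #include <linux/mm.h>
    #include <linux/string.h>

    static void zero_high_page(struct page *page)
    {
        void *vaddr = kmap(page);     /* map into the kernel's kmap area */
        memset(vaddr, 0, PAGE_SIZE);  /* the page is now addressable */
        kunmap(page);                 /* free the scarce mapping slot */
    }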

Temporary mappings can be created via "kmap_atomic". This function doesn't block, so it can be used in interrupt context. "kunmap_atomic" un-maps the mapped high memory page. A temporary mapping is only valid until the next temporary mapping is created in the same slot. Moreover, since the mapping and un-mapping functions also disable / enable preemption, it's a bug not to kunmap_atomic a page mapped via kmap_atomic.
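A sketch of the temporary-mapping pattern (in the 2.6 kernels this article covers, kmap_atomic takes a fixed per-CPU slot argument such as KM_USER0):

    /* Sketch: temporarily map a high page; safe in interrupt context. */
    #include <linux/highmem.h>
    #include <linux/string.h>
    #include <linux/types.h>

    static void copy_from_high_page(void *dst, struct page *page, size_t len)
    {
        void *vaddr = kmap_atomic(page, KM_USER0); /* disables preemption */
        memcpy(dst, vaddr, len);
        kunmap_atomic(vaddr, KM_USER0);            /* re-enables preemption */
    }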

3. HIGHMEM solution for using up to 64 GB of memory

This is enabled via PAE (Physical Address Extension), an extension introduced with the Pentium Pro processors. PAE addresses the 4 GB physical memory limitation and is seen as Intel's answer to AMD's x86-64. PAE allows processors to access up to 64 GB of physical memory (a 36-bit physical address). However, since the virtual address space is just 32 bits wide, each process still can't grow beyond 4 GB. The mechanism used to access memory from 4 GB to 64 GB is essentially the same as that for accessing the 1 GB - 4 GB RAM via the HIGHMEM solution discussed above.
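Concretely, 2^36 bytes = 64 GB, and on 2.6 x86 kernels this corresponds to the following configuration choice (an illustrative .config fragment):

    # Illustrative 2.6 x86 .config fragment for the 64 GB solution:
    # CONFIG_NOHIGHMEM is not set
    # CONFIG_HIGHMEM4G is not set
    CONFIG_HIGHMEM64G=y
    CONFIG_X86_PAE=y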

Should I enable CONFIG_HIGHMEM for my 1 GB RAM system?

It is advised not to enable CONFIG_HIGHMEM in the kernel merely to utilize the extra 128 MB you would get on your 1 GB RAM system. I/O devices cannot directly address high memory from PCI space, so bounce buffers have to be used. Additionally, the extra mappings bring virtual memory management and paging costs. For details on bounce buffers, refer to Mel Gorman's documentation (see below).

For more information, see Mel Gorman's "Understanding the Linux Virtual Memory Manager", available at http://www.kernel.org/doc/gorman/.

Re: I have trouble understanding... 
laudney on February 21, 2004 - 2:05pm
This article is not clear in this part. The crucial point is: physical memory being mapped != physical memory being used. Mapping means the necessary page tables have been set up so that the physical memory is "addressable" from a virtual address space, either a process's or the kernel's. The kernel uses its 3 - 4 GB address space to map the physical memory from the very start. This doesn't prevent processes from mapping certain physical memory pages into their 0 - 3 GB address space, via "malloc" or the "brk" system call.

This article should have mentioned that a process can run in "user mode" or "kernel mode". When in user mode, the process can only access the 0 - 3 GB address space, while in "kernel mode", i.e. when the kernel runs in process context and executes on behalf of the process, all 4 GB of address space can be accessed. Besides, physical memory allocated to processes can all be accessed by the kernel, while kernel-private data structures stored in certain physical memory pages are invisible to processes; otherwise any process could mess up the kernel. That's why the kernel needs to map all physical memory into its address space, either directly (no high mem) or selectively (high mem).

-- Laudney


I'm not sure I understand you 
Anonymous on February 22, 2004 - 5:14am
I'm not sure I understand your concern.

The mapping of virtual addresses to physical addresses is pretty arbitrary (with a granularity of 4k on x86). On a machine with less than 1 GB of RAM, userspace virtual memory with addresses above 1 GB is simply mapped to physical addresses below 1 GB, if it is mapped to RAM at all.

It's not actually 3 GB of addressable RAM that is given to each process (which is where you may be getting confused). Rather, each process has 3 GB of address space to work with. Some of the data in that address space will probably be stored in physical RAM. Some of it will probably be stored on disk and not in physical RAM. And which portion of physical RAM each 4k chunk of virtual memory corresponds to (if any) is pretty arbitrary within that 3 GB.

I feel it is a mistake to consider the 3 GB address spaces given to processes as RAM. The 3 GB address spaces are abstractions that differ significantly from RAM, and from the physical address space abstraction. The 896 MB portion of the kernel address space can reasonably be considered RAM (or rather, whatever portion of it is mapped to RAM on systems with less than 1 GB). This, I suspect, is very helpful: the virtual and physical addresses differ, but the mapping between the two is simple. But on systems with more than 1 GB of RAM, there is not enough kernel virtual address space to map all the RAM, and so some of the RAM needs to be treated specially.


Re: I have trouble understanding... 
Amit Shah on February 23, 2004 - 7:21am
You're right in saying that physical addresses are sequential. On an off-topic note, there are patches which allow non-sequential physical memory regions: badram and NUMA.

I've written: "The 3 GB addressable RAM available to each process is mapped to the available physical RAM."

This means that the userspace application can grow to a maximum of 3 GB. It is not necessary to have 3 GB of physical memory to map it all. The kernel "swaps out" some of the physical pages to make room for processes that might need more RAM. If a process accesses a page that has been swapped out, it is brought back into RAM (possibly after swapping out some other page).

This is a wonderful concept; take a look at some OS theory or the i386 manuals for a complete description.

--
Amit Shah
http://amitshah.nav.to/



That's not what I was objecting to
Anonymous on February 23, 2004 - 12:21pm
That's not what I was objecting to. You said: "The kernel virtual area (3 - 4 GB address space) maps to the first 1 GB of physical RAM." In fact this is misleading, since it gives the feeling that the whole first GB of physical RAM is reserved for kernel use, which it certainly is not (otherwise a machine with 1.5 GB RAM would only give 512 MB max of physical RAM to userspace processes...). The ASCII drawing is misleading too.

Re: I have trouble understanding... 
Amit Shah on February 24, 2004 - 2:49am
it gives the feeling that the whole first GB of physical RAM is reserved for kernel use

In fact, it is. The kernel has to have control over the whole memory. It's the kernel's job to allocate/deallocate RAM to processes.

What the kernel does is map the entire available memory into its own space, so that it can access any memory region. It then gives out free pages to processes that need them.

A userspace process cannot allocate a page to itself on its own. It has to request some memory area from the kernel. Once the kernel has mapped a physical page into the process's space, the process can use that extra memory.



"What the kernel does is, map 
Anonymous on February 24, 2004 - 8:35am
"What the kernel does is, maps the entire available memory in its own space, so that it can access any memory region. It then gives out free pages to processes which need them."

Thanks, it's a lot clearer now! You should add these clarifications to your article.



Swap 
Anonymous on February 22, 2004 - 4:38am
I've got 2 questions:
- Is swap counted as available RAM? For instance, is 2 GB RAM + 4 GB swap over the 4 GB limit?
- The HIGHMEM I/O support (recent in 2.4) is not explained: does the bouncing the article talks about remain true if this option is set?

Re: Swap 
Anonymous on February 22, 2004 - 3:37pm
1) Swap is not counted as available RAM for this. There is no limit on the sum of swap + RAM, only a limit on a swap area's size (man mkswap says it is 2 GB). And with 2 GB of RAM you must use HIGHMEM support, since that is more than 1 GB of RAM.

2) No idea about this, but Documentation/Configure.help seems to say you are right. It seems, though, that drivers must do some work themselves to use it. Actually, from what I could check, the SCSI driver supports it: grep through the kernel for "blk_nohighio"; only drivers/scsi/ contains references to it and claims in a comment to support it.

However, at the very beginning of 2.5 the block API was completely changed; the 2.6 drivers using the new API (almost all of them) can more easily support highmem, if I'm not wrong. See the file Documentation/block/biodoc.txt in the 2.6 kernel sources.
