Chapter 8. Memory Management
8.1. Page Frame Management
8.1.1. Page Descriptors
State information of a page frame is kept in a page descriptor of type page.
All page descriptors are stored in the mem_map array.
virt_to_page(addr)
pfn_to_page(pfn)
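The relationship between mem_map and the two macros above can be sketched in user space. This is only a toy model, assuming a flat mem_map array and the 80x86 linear mapping that starts at the fourth gigabyte; the struct layout, the array size, and every toy_ name are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT  12              /* 4 KB page frames */
#define TOY_PAGE_OFFSET 0xC0000000UL    /* assumed start of the kernel linear mapping */
#define TOY_NUM_FRAMES  1024            /* hypothetical amount of RAM, in page frames */

/* Toy page descriptor; the real descriptor has many more fields. */
struct toy_page {
    unsigned long flags;
    int count;
};

/* One descriptor per page frame, indexed by page frame number. */
static struct toy_page mem_map[TOY_NUM_FRAMES];

/* Descriptor of page frame number pfn: plain array indexing. */
static struct toy_page *toy_pfn_to_page(unsigned long pfn)
{
    assert(pfn < TOY_NUM_FRAMES);
    return &mem_map[pfn];
}

/* Inverse mapping: pointer difference from the array base. */
static unsigned long toy_page_to_pfn(const struct toy_page *page)
{
    return (unsigned long)(page - mem_map);
}

/* Descriptor of the frame backing a directly mapped linear address. */
static struct toy_page *toy_virt_to_page(unsigned long addr)
{
    assert(addr >= TOY_PAGE_OFFSET);
    return toy_pfn_to_page((addr - TOY_PAGE_OFFSET) >> TOY_PAGE_SHIFT);
}
```

In this model, a linear address anywhere inside the sixth directly mapped page resolves to mem_map[5].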
8.1.2. Non-Uniform Memory Access (NUMA)
The physical memory inside each node can be split into several zones, as we will see in the next
section. Each node has a descriptor of type pg_data_t.
8.1.3. Memory Zones
Linux 2.6 partitions the physical memory of every memory node
into three zones. In the 80x86 UMA architecture the zones are:
ZONE_DMA
Contains page frames of memory below 16 MB
ZONE_NORMAL
Contains page frames of memory at and above 16 MB and below 896 MB
ZONE_HIGHMEM
Contains page frames of memory at and above 896 MB
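The zone boundaries listed above can be expressed as a small classification function. This is a sketch for the 80x86 UMA layout only; the enum and the function name are hypothetical and do not appear in the kernel.

```c
#include <assert.h>

#define TOY_MB (1024ULL * 1024ULL)

enum toy_zone_type { TOY_ZONE_DMA, TOY_ZONE_NORMAL, TOY_ZONE_HIGHMEM };

/* Classify a physical address using the 16 MB and 896 MB boundaries. */
static enum toy_zone_type toy_zone_of(unsigned long long phys_addr)
{
    if (phys_addr < 16 * TOY_MB)
        return TOY_ZONE_DMA;        /* below 16 MB */
    if (phys_addr < 896 * TOY_MB)
        return TOY_ZONE_NORMAL;     /* 16 MB up to, but not including, 896 MB */
    return TOY_ZONE_HIGHMEM;        /* 896 MB and above */
}
```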
The ZONE_DMA and ZONE_NORMAL zones include the "normal" page frames that can be directly accessed
by the kernel through the linear mapping in the fourth gigabyte of the linear address space (see the
section "Kernel Page Tables" in Chapter 2). Conversely, the ZONE_HIGHMEM zone includes page frames
that cannot be directly accessed by the kernel through the linear mapping in the fourth gigabyte of
linear address space (see the section "Kernel Mappings of High-Memory Page Frames" later in this
chapter). The ZONE_HIGHMEM zone is always empty on 64-bit architectures.
Each memory zone has its own descriptor of type zone. Its fields are shown in Table 8-4.
8.1.4. The Pool of Reserved Page Frames
The amount of reserved memory (in kilobytes) is stored in the min_free_kbytes variable.
Initially, min_free_kbytes cannot be lower than 128 or greater than 65,536.
The pages_min field of the zone descriptor stores the number of reserved page frames inside the
zone. As we'll see in Chapter 17, this field also plays a role in the page frame reclaiming algorithm,
together with the pages_low and pages_high fields. The pages_low field is always set to 5/4 of the
value of pages_min, and pages_high is always set to 3/2 of the value of pages_min.
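The arithmetic relating the three watermarks can be written out as follows. This is a minimal sketch: the struct and function names are hypothetical, and integer division is assumed, so fractions are rounded down.

```c
#include <assert.h>

/* Toy copy of the three watermark fields of a zone descriptor. */
struct toy_watermarks {
    unsigned long pages_min;
    unsigned long pages_low;
    unsigned long pages_high;
};

/* Derive pages_low (5/4) and pages_high (3/2) from pages_min. */
static struct toy_watermarks toy_set_watermarks(unsigned long pages_min)
{
    struct toy_watermarks w;

    w.pages_min  = pages_min;
    w.pages_low  = pages_min + pages_min / 4;   /* 5/4 of pages_min */
    w.pages_high = pages_min + pages_min / 2;   /* 3/2 of pages_min */
    return w;
}
```

For example, a zone with 128 reserved page frames gets pages_low = 160 and pages_high = 192.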
8.1.5. The Zoned Page Frame Allocator
8.1.5.1. Requesting and releasing page frames
alloc_pages(gfp_mask, order)
Macro used to request 2^order contiguous page frames. It returns the address of the descriptor
of the first allocated page frame or returns NULL if the allocation failed.
alloc_page(gfp_mask)
Macro used to get a single page frame; it expands to alloc_pages(gfp_mask, 0).
__get_free_pages(gfp_mask, order)
__get_free_page(gfp_mask)
get_zeroed_page(gfp_mask)
__get_dma_pages(gfp_mask, order)
Similar to alloc_pages( ), but these return the linear address of the first allocated page.
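Because every request is expressed as an order, a caller has to translate a size in bytes into the smallest order that covers it. The helper below is a user-space sketch of that conversion, assuming 4 KB page frames and a maximum order of 10; the toy_ names are hypothetical.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12                       /* 4 KB page frames */
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)
#define TOY_MAX_ORDER  10                       /* assumed largest block: 1024 page frames */

/* Number of bytes covered by a block of 2^order page frames. */
static unsigned long toy_order_to_bytes(unsigned int order)
{
    assert(order <= TOY_MAX_ORDER);
    return TOY_PAGE_SIZE << order;
}

/* Smallest order whose block holds at least size bytes. */
static unsigned int toy_size_to_order(unsigned long size)
{
    unsigned int order = 0;

    assert(size > 0 && size <= toy_order_to_bytes(TOY_MAX_ORDER));
    while (toy_order_to_bytes(order) < size)
        order++;
    return order;
}
```

A request for three pages is therefore rounded up to order 2, that is, four contiguous page frames.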
__free_pages(page, order)
__free_page(page)
The __free_pages( ) function checks the page descriptor pointed to by page; if the page frame is not reserved
(i.e., if the PG_reserved flag is equal to 0), it decreases the count field of the descriptor. If
count becomes 0, it assumes that 2^order contiguous page frames starting from the one
corresponding to page are no longer used. In this case, the function releases the page frames
as explained in a later section.
free_pages(addr, order)
free_page(addr)
Similar to __free_pages( ), but these receive as an argument the linear address addr of the first page frame to be released.
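The decision made when a page frame is released (skip reserved frames, decrement the usage counter, release only when it reaches zero) can be modeled as follows. This is a toy sketch: the flag value, the struct, and the function name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PG_RESERVED 0x1UL   /* hypothetical bit value for the reserved flag */

/* Toy page descriptor with only the fields this sketch needs. */
struct toy_page {
    unsigned long flags;
    int count;                  /* usage counter */
};

/*
 * Drop one reference to the page frame. Returns true when the caller
 * should hand the block back to the buddy system.
 */
static bool toy_drop_reference(struct toy_page *page)
{
    if (page->flags & TOY_PG_RESERVED)
        return false;           /* reserved frames are never released */
    assert(page->count > 0);
    page->count--;
    return page->count == 0;
}
```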
8.1.6. Kernel Mappings of High-Memory Page Frames
The kernel uses three different mechanisms to map page frames in high memory; they are called
permanent kernel mapping, temporary kernel mapping, and noncontiguous memory allocation. In
this section, we'll cover the first two techniques; the third one is discussed in the section
"Noncontiguous Memory Area Management" later in this chapter.
8.1.6.1. Permanent kernel mappings
page_address( )
The page_address( ) function returns the linear address associated with the page frame, or NULL if the page frame is in high memory and is not mapped.
kmap_high()
The kmap_high( ) function is invoked if the page frame really belongs to high memory.
kunmap( )
The kunmap( ) function destroys a permanent kernel mapping established previously by kmap( ).
8.1.6.2. Temporary kernel mappings
kmap_atomic( )
8.1.7. The Buddy System Algorithm
The technique adopted by Linux to solve the external fragmentation problem is based on the well-known
buddy system algorithm. All free page frames are grouped into 11 lists of blocks that contain
groups of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 contiguous page frames, respectively. The
largest request of 1024 page frames corresponds to a chunk of 4 MB of contiguous RAM. The
physical address of the first page frame of a block is a multiple of the group size. For example, the
initial address of a 16-page-frame block is a multiple of 16 x 2^12 (2^12 = 4,096, which is the regular
page size).
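The alignment rule can be checked with a single mask operation, since the block size is always a power of two. The sketch below assumes 4 KB page frames and orders 0 through 10; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096UL    /* 2^12 bytes */
#define TOY_MAX_ORDER 10        /* 11 lists: orders 0 through 10 */

/* True if phys_addr is a valid start address for a block of 2^order page frames. */
static bool toy_block_aligned(unsigned long phys_addr, unsigned int order)
{
    unsigned long block_bytes;

    assert(order <= TOY_MAX_ORDER);
    block_bytes = TOY_PAGE_SIZE << order;
    return (phys_addr & (block_bytes - 1)) == 0;
}
```

An address of 8 x 2^12 is a valid start for an 8-page-frame block (order 3) but not for a 16-page-frame block (order 4).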
8.1.7.1. Data structures
1: zone->zone_mem_map: pointer to the first page descriptor of the zone.
2: An array consisting of eleven elements of type free_area, one element for each group size.
The array is stored in the free_area field of the zone descriptor; the element that tracks
free blocks of 2^k page frames is zone->free_area[k].
8.1.7.2. Allocating a block
The __rmqueue( ) function is used to find a free block in a zone.
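The general buddy allocation strategy (search the lists from the requested order upward, then split a larger block and return the unused halves to the lower lists) can be sketched with per-order counters. This is not the kernel's code: the struct, the function name, and the use of counters instead of linked lists are simplifying assumptions.

```c
#include <assert.h>

#define TOY_NR_ORDERS 11    /* block sizes 1, 2, 4, ..., 1024 page frames */

/* Toy zone: only the number of free blocks of each order is tracked. */
struct toy_zone {
    unsigned long nr_free[TOY_NR_ORDERS];
};

/* Take one block of the given order; returns 0 on success, -1 on failure. */
static int toy_rmqueue(struct toy_zone *zone, unsigned int order)
{
    unsigned int k;

    assert(order < TOY_NR_ORDERS);
    for (k = order; k < TOY_NR_ORDERS; k++) {
        if (zone->nr_free[k] == 0)
            continue;               /* nothing free here, try a larger size */
        zone->nr_free[k]--;         /* remove the block from its list */
        while (k > order) {         /* split: keep one half, free the other */
            k--;
            zone->nr_free[k]++;
        }
        return 0;
    }
    return -1;                      /* no block large enough is free */
}
```

Starting from a single free 1024-page-frame block, a request of order 0 leaves one free block of every order from 0 to 9.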
8.1.7.3. Freeing a block
The __free_pages_bulk( )/__free_one_page( )
function implements the buddy system strategy for freeing page frames.
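The core of the freeing strategy is coalescing: a freed block is merged with its buddy whenever the buddy is free and has the same size, and the merge is retried at the next order. The following is a toy sketch over 16 page frames, assuming the buddy of a block is found by flipping bit `order` of its index; all names are hypothetical, and the data structure is deliberately simpler than the kernel's free lists.

```c
#include <assert.h>

#define TOY_FRAMES    16
#define TOY_MAX_ORDER 4     /* one order-4 block covers all 16 frames */
#define TOY_NOT_FREE  (-1)

/* free_order[i] is the order of the free block starting at frame i, or TOY_NOT_FREE. */
struct toy_buddy {
    int free_order[TOY_FRAMES];
};

static void toy_buddy_init(struct toy_buddy *b)
{
    int i;

    for (i = 0; i < TOY_FRAMES; i++)
        b->free_order[i] = TOY_NOT_FREE;
}

/* Free the block of 2^order frames starting at idx; returns the order after merging. */
static int toy_free_block(struct toy_buddy *b, unsigned int idx, int order)
{
    assert(order >= 0 && order <= TOY_MAX_ORDER);
    assert(idx < TOY_FRAMES && (idx & ((1u << order) - 1)) == 0);

    while (order < TOY_MAX_ORDER) {
        unsigned int buddy = idx ^ (1u << order);   /* index of the buddy block */

        if (b->free_order[buddy] != order)
            break;                                  /* buddy busy or of another size */
        b->free_order[buddy] = TOY_NOT_FREE;        /* remove buddy from its list */
        idx &= buddy;                               /* merged block starts at the lower index */
        order++;
    }
    b->free_order[idx] = order;
    return order;
}
```

Freeing frames 0 and 1 individually yields one order-1 block at index 0; freeing the order-1 block at index 2 afterward merges both into an order-2 block.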
8.1.8. The Per-CPU Page Frame Cache
The main data structure implementing the per-CPU page frame cache is an array of per_cpu_pageset
data structures stored in the pageset field of the memory zone descriptor. The array includes one
element for each CPU; this element, in turn, consists of two per_cpu_pages descriptors, one for the
hot cache and the other for the cold cache. The fields of the per_cpu_pages descriptor are listed in
Table 8-7.
Table 8-7. The fields of the per_cpu_pages descriptor
Type              Name   Description
int               count  Number of page frames in the cache
int               low    Low watermark for cache replenishing
int               high   High watermark for cache depletion
int               batch  Number of page frames to be added or subtracted from the cache
struct list_head  list   List of descriptors of the page frames included in the cache
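How the low, high, and batch fields interact can be illustrated with a counter-only model of one cache. This is a sketch under stated assumptions: the exact comparisons (at or below low, at or above high) are assumed, the buddy system is assumed to always supply or accept a full batch, and the toy_ names are hypothetical.

```c
#include <assert.h>

/* Toy per-CPU cache: the list of page descriptors is omitted. */
struct toy_per_cpu_pages {
    int count;  /* page frames currently in the cache */
    int low;    /* low watermark for replenishing */
    int high;   /* high watermark for depletion */
    int batch;  /* frames moved to or from the buddy system at once */
};

/* Take one frame from the cache; returns how many frames were pulled from the buddy system. */
static int toy_pcp_alloc(struct toy_per_cpu_pages *pcp)
{
    int refilled = 0;

    if (pcp->count <= pcp->low) {
        pcp->count += pcp->batch;   /* replenish in bulk */
        refilled = pcp->batch;
    }
    assert(pcp->count > 0);
    pcp->count--;
    return refilled;
}

/* Return one frame to the cache; returns how many frames were pushed back to the buddy system. */
static int toy_pcp_free(struct toy_per_cpu_pages *pcp)
{
    int drained = 0;

    if (pcp->count >= pcp->high) {
        pcp->count -= pcp->batch;   /* deplete in bulk */
        drained = pcp->batch;
    }
    pcp->count++;
    return drained;
}
```

The two watermarks keep most allocations and releases local to the CPU, touching the buddy system only once per batch.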
8.1.8.1. Allocating page frames through the per-CPU page frame caches
buffered_rmqueue( )
8.1.8.2. Releasing page frames to the per-CPU page frame caches
free_hot_cold_page( )
8.1.9. The Zone Allocator
__alloc_pages( ) --> zone_watermark_ok( )
__free_pages( ) --> __free_one_page( )
8.2. Memory Area Management
8.2.1. The Slab Allocator
Figure 8-3. The slab allocator components
8.2.2. Cache Descriptor
1: Each cache is described by a structure of type kmem_cache_t (e.g., kmem_cache).
Table 8-8. The fields of the kmem_cache_t descriptor
Type                     Name         Description
struct array_cache *[]   array        Per-CPU array of pointers to local caches of free objects (see the
                                      section "Local Caches of Free Slab Objects" later in this chapter).
unsigned int             batchcount   Number of objects to be transferred in bulk to or from the local caches.
unsigned int             limit        Maximum number of free objects in the local caches. This is tunable.
struct kmem_list3        lists        See next table.
unsigned int             objsize      Size of the objects included in the cache.
unsigned int             flags        Set of flags that describes permanent properties of the cache.
unsigned int             num          Number of objects packed into a single slab. (All slabs of the cache
                                      have the same size.)
unsigned int             free_limit   Upper limit of free objects in the whole slab cache.
spinlock_t               spinlock     Cache spin lock.
unsigned int             gfporder     Logarithm of the number of contiguous page frames included in a single slab.
unsigned int             gfpflags     Set of flags passed to the buddy system function when allocating page frames.
size_t                   colour       Number of colors for the slabs (see the section "Slab Coloring" later
                                      in this chapter).
unsigned int             colour_off   Basic alignment offset in the slabs.
unsigned int             colour_next  Color to use for the next allocated slab.
kmem_cache_t *           slabp_cache  Pointer to the general slab cache containing the slab descriptors
                                      (NULL if internal slab descriptors are used; see next section).
unsigned int             slab_size    The size of a single slab.
unsigned int             dflags       Set of flags that describe dynamic properties of the cache.
void *                   ctor         Pointer to constructor method associated with the cache.
void *                   dtor         Pointer to destructor method associated with the cache.
const char *             name         Character array storing the name of the cache.
struct list_head         next         Pointers for the doubly linked list of cache descriptors.
The CFLGS_OFF_SLAB flag in the flags field of the cache descriptor is set to one if the slab descriptor is stored outside the slab; it is set to zero otherwise.
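Two of the fields above lend themselves to a short illustration: flags is tested with a bit mask, and gfporder is a base-2 logarithm. The sketch below is a user-space model only; the bit value chosen for the flag, the reduced struct, and the function names are assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CFLGS_OFF_SLAB 0x80000000U  /* assumed bit value, for illustration */

/* Toy cache descriptor with only the fields used here. */
struct toy_kmem_cache {
    unsigned int flags;     /* permanent properties of the cache */
    unsigned int gfporder;  /* log2 of the page frames in one slab */
};

/* True if the slab descriptor is stored outside the slab. */
static bool toy_off_slab(const struct toy_kmem_cache *cachep)
{
    return (cachep->flags & TOY_CFLGS_OFF_SLAB) != 0;
}

/* Number of contiguous page frames included in a single slab. */
static unsigned long toy_slab_page_frames(const struct toy_kmem_cache *cachep)
{
    return 1UL << cachep->gfporder;
}
```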
2: The lists field of the kmem_cache_t descriptor
8.2.3. Slab Descriptor
kmem_cache->flags:
The CFLGS_OFF_SLAB flag in the flags field of the cache descriptor
is set to one if the slab descriptor is stored outside the slab; it is set to zero otherwise.
External slab descriptor
Internal slab descriptor