Memory Management

The kernel must keep track of the current status of each page frame. For instance, it must be able to distinguish the page frames used to contain pages belonging to processes from those that contain kernel code or kernel data structures; similarly, it must be able to determine whether a page frame in dynamic memory is free or not. This sort of state information is kept in an array of descriptors, one for each page frame. The descriptors of type struct page have the following format:

typedef struct page {
struct page *next;
struct page *prev;
struct inode *inode;
unsigned long offset;
struct page *next_hash;
atomic_t count;
unsigned long flags;
struct wait_queue *wait;
struct page **pprev_hash;
struct buffer_head * buffers;
} mem_map_t;
All the page frame descriptors on the system are included in an array called mem_map.Since each descriptor is less than 64 bytes long, mem_map requires about four page frames for each megabyte of RAM. The MAP_NR macro computes the number of the page frame whose address is passed as a parameter,and thus the index of the corresponding descriptor in mem_map:
#define MAP_NR(addr)		(((unsigned long)(addr) - PAGE_OFFSET) >> PAGE_SHIFT)

After having seen how the kernel allocates and initializes the data structures for page frame handling, we now look at how page frames are allocated and released. Page frames can be requested by making use of four slightly differing functions and macros:

__get_free_pages(gfp_mask, order)

Function used to request 2order contiguous page frames.

__get_dma_pages(gfp_mask, order)

Macro used to get page frames suitable for DMA; it expands to:

__get_free_pages(gfp_mask | GFP_DMA, order)

__get_free_page(gfp_mask)

Macro used to get a single page frame; it expands to:

__get_free_pages(gfp_mask, 0)

get_free_page(gfp_mask) :

Function that invokes:

__get_free_page(gfp_mask)

and then fills the page frame obtained with zeros.

The parameter gfp_mask specifies how to look for free page frames. It consists of the

following flags:

__GFP_WAIT

Set if the kernel is allowed to discard the contents of page frames in order to free memory before satisfying the request.

__GFP_IO

Set if the kernel is allowed to write pages to disk in order to free the corresponding page frames. (Since swapping can block the process in Kernel Mode, this flag must be cleared when handling interrupts or modifying critical kernel data structures.)

__GFP_DMA

Set if the requested page frames must be suitable for DMA. (The hardware limitation that gives rise to this fla was explained in Section6.1.)

__GFP_HIGH , __GFP_MED, __GFP_LOW

Specify the request priority. _ _GFP_LOW is usuallyassociated with dynamic memory requests issued by User Mode processes, while the otherpriorities are associated with kernel requests.

 

Page frames can be released through any of the followingthree functions and macros:

free_pages(addr, order)

This function checks the page descriptor of the page frame having physical address; if the page frame is not reserved (i.e., if the PG_reserved flag is equal to 0), it decrements the count field of the descriptor. If count becomes 0,it assumes that 2 order contiguous page frames starting from addr are nolonger used. In that case, the function invokes free_ pages_ok( ) to insert the page frame descriptor of the first free page in the proper list of free page frames(described in the following section).

__free_page(p)

Similar to the previous function, except that it releases the page frame whose descriptor is pointed to by parameter p.

free_page(addr)

Macro used to release the page frame having physical address addr; it expands

to free_pages(addr,0)

 

The kernel must establish a robust and efficient strategy for allocating groups of contiguous page frames. In doing so, it must deal with a well-known memory management problem called external fragmentation : frequent requests and releases of groups of contiguous page frames of different sizes may lead to a situation in which several small blocks of free page frames are "scattered" inside blocks of allocated page frames. As a result, it may become impossible to allocate a large block of contiguous page frames, even if there are enough free pages to satisfy the request.

 

The technique adopted by Linux to solve the external fragmentation problem is based on the well-known buddy system algorithm. All free page frames are grouped into 10 lists of blocks that contain groups of 1, 2, 4, 8, 16, 32, 64, 128, 256, and 512 contiguous page frames, respectively. The physical address of the first page frame of a block is a multiple of the group size: for example, the initial address of a 16-page-frame block is a multiple of 16 x 212. We'll show how the algorithm works through a simple example.

Assume there is a request for a group of 128 contiguous page frames (i.e., a half-megabyte). The algorithm checks first whether a free block in the 128-page-frame list exists. If there is no such block, the algorithm looks for the next larger block, that is, a free block in the 256-page frame list. If such a block exists, the kernel allocates 128 of the 256 page frames to satisfy the request and inserts the remaining 128 page frames into the list of free128-page-frame blocks.

If there is no free 256-page block, it then looks for the next larger block, that is, a free 512-page-frame block. If such a block exists, it allocates 128 ofthe 512 page frames to satisfy the request, inserts the first 256 of the remaining 384 page frames into the list of free 256-page frame blocks, and inserts the last 128 of the remaining 384 page frames into the list of free 128-page-frame blocks. If the list of 512-page-frame blocks is empty, the algorithm gives up and signals an error condition. The reverse operation, releasing blocks of page frames, gives rise to the name of this algorithm. The kernel attempts to mergetogether pairs of free buddy blocks of size b into a single block of size 2b. Two blocks are consideredbuddy if:

• Both blocks have thesame size, say b.

• They are located incontiguous physical addresses.

• The physical addressof the first page frame of the first block is a multiple of 2 x b x

212.

The algorithm is iterative; if it succeeds in merging released blocks, it doubles b and tries again so as to create even bigger blocks.

 

Linux makes use of two different buddy systems: one handles the page frames suitable for ISA DMA, while the other one handles the remaining page frames. Each buddy system relies on the following main data structures:

• The mem_map array introduced previously.

• An array having 10 elements of type free_area_struct, one element for each group size. The variable free_area[0] points tothe array used by the buddy system for the page frames that are not suitable for ISA DMA, while free_area[1] points to the array used by the buddy system for page frames suitable for ISA DMA.

• Ten binary arrays named bitmaps, one for each group size. Each buddy system has its own set of bitmaps, which it uses to keep track of the blocks it allocates.

Understanding the Linux Kernel 156 Each element of the free_area[0] and free_area[1] arrays is a structure of type free_area_struct, which is defined as follows:

struct free_area_struct {

struct page *next;

struct page *prev;

unsigned int *map;

unsigned long count;

};


Notice that the first two fields of this structure match the corresponding fields of a page descriptor; in fact, pointers to free_area_struct structures are sometimes used as pointers to page descriptors.

 

The k th element of either the free_area[0] or the free_area[1]array is associated with a doubly linked circular list of blocks of size 2k, implemented through the next and prev fields.

 

Each member of such a list is the descriptor of the first page frame of a block. The count field of each free_area_struct structurestores the number of elements in the corresponding list.

 

The map field points to a bitmap whose size dependson the number of existing page frames.

 

Each bit of the bitmap of the k th entry of either free_area[0] or free_area[1] describes

the status of two buddy blocks of size 2page frames. If a bit of the bit map is equal to 0, either both buddy blocks of the pair are free or both are busy; if it is equal to 1, exactly one of the blocks is busy. When both buddies are free, the kernel treats them as a single free block of size 2k+1.

Let us consider, for sake of illustration, a 128 MB RAM and the bitmaps associated with the non-DMA page frames. The 128 MB can be divided into 32768 single pages, 16384 groups of 2 pages each, or 8192 groups of 4 pages each and so on up to 64 groups of 512 pages each. So the bitmap corresponding to free_area[0][0] consists of 16384 bits, one for each pair of the 32768 existing page frames; the bitmap corresponding to free_area[0][1] consists of 8192 bits, one for each pair of blocks of two consecutive page frames; the last bitmap corresponding to free_area[0][9] consists of 32 bits, one for each pair of blocks of 512 contiguous page frames.

The figure illustrate with a simple example the use of the data structures introduced by the buddy system algorithm. The array mem_map contains nine free page frames grouped in one block of one (that is, a single page frame) at the topand two blocks of four further down. The double arrows denote doubly link edcircular lists implemented by the next and prev fields. Notice that the bitmaps are not drawn to scale.



评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值