Having covered the basic concepts of Linux memory management, this article introduces the data structures the Linux kernel uses to describe physical memory.
Data structures for physical memory
Physical memory is managed at three levels: bank (node), zone, and page. A node contains one to three zones, and each zone contains many fixed-size pages. The following sections describe how Linux represents each of these levels in software.
node
A memory node represents one bank of physical memory. It is the topmost and largest memory unit: a UMA system has exactly one node, while a NUMA system has several, all linked together on the pgdat_list list. The structure is as follows:
/*
* The pg_data_t structure is used in machines with CONFIG_DISCONTIGMEM
* (mostly NUMA machines?) to denote a higher-level memory zone than the
* zone_struct denotes.
*
* On NUMA machines, each NUMA node would have a pg_data_t to describe
* its memory layout.
*
* XXX: we need to move the global memory statistics (active_list, ...)
* into the pg_data_t to properly support NUMA.
*/
typedef struct pglist_data {
zone_t node_zones[MAX_NR_ZONES]; // zones contained in this node
zonelist_t node_zonelists[GFP_ZONEMASK+1]; // preferred zone order for allocations
int nr_zones; // number of zones in this node
struct page *node_mem_map; // first page of the node, somewhere inside the mem_map array
unsigned long *valid_addr_bitmap; // bitmap of valid addresses, for memory with holes
struct bootmem_data *bdata; // boot-time memory allocator data
unsigned long node_start_paddr; // starting physical address of the node
unsigned long node_start_mapnr; // offset into the global mem_map array
unsigned long node_size; // number of pages in the node
int node_id; // node ID, starting from 0
struct pglist_data *node_next; // next node in pgdat_list
} pg_data_t;
Under the UMA architecture there is only one node: the statically defined contig_page_data in mm/page_alloc.c.
zone
Each node is divided into zones; there are currently three types: DMA, Normal, and HighMem. Not every system has all three. The structure is as follows:
typedef struct zone_struct {
/*
* Commonly accessed fields:
*/
spinlock_t lock;
unsigned long free_pages; // number of free pages in this zone
unsigned long pages_min, pages_low, pages_high; // watermarks
int need_balance; // set when the zone needs balancing
/*
* free areas of different sizes
*/
free_area_t free_area[MAX_ORDER]; // per-order free lists and bitmaps, used by the buddy allocator
/*
* wait_table -- the array holding the hash table
* wait_table_size -- the size of the hash table array
* wait_table_shift -- wait_table_size
* == BITS_PER_LONG (1 << wait_table_bits)
*
* The purpose of all these is to keep track of the people
* waiting for a page to become available and make them
* runnable again when possible. The trouble is that this
* consumes a lot of space, especially when so few things
* wait on pages at a given time. So instead of using
* per-page waitqueues, we use a waitqueue hash table.
*
* The bucket discipline is to sleep on the same queue when
* colliding and wake all in that wait queue when removing.
* When something wakes, it must check to be sure its page is
* truly available, a la thundering herd. The cost of a
* collision is great, but given the expected load of the
* table, they should be so rare as to be outweighed by the
* benefits from the saved space.
*
* __wait_on_page() and unlock_page() in mm/filemap.c, are the
* primary users of these fields, and in mm/page_alloc.c
* free_area_init_core() performs the initialization of them.
*/
wait_queue_head_t * wait_table;
unsigned long wait_table_size;
unsigned long wait_table_shift; //the number of bits a page address must be shifted right to return an index within the table
/*
* Discontig memory support fields.
*/
struct pglist_data *zone_pgdat; // back-pointer to the parent node
struct page *zone_mem_map; // first page of this zone in mem_map
unsigned long zone_start_paddr; // starting physical address of the zone
unsigned long zone_start_mapnr; // offset of this zone into the global mem_map array
/*
* rarely used fields:
*/
char *name; // string name of the zone: "DMA", "Normal" or "HighMem"
unsigned long size;// the size of the zone in pages
} zone_t;
watermarks
Each zone keeps three watermarks, pages_min, pages_low and pages_high, which drive page reclaim: when free_pages drops below pages_low, the kswapd daemon is woken to free pages; below pages_min, the allocator must reclaim memory itself, synchronously; once free_pages rises back above pages_high, kswapd can go back to sleep.
page
A zone contains many pages, all of the same size. The page size depends on the hardware architecture; some hardware supports several page sizes, in which case the operating system picks one, but only one size can be in effect at a time, so from the system's point of view the page size is fixed.
/*
* Each physical page in the system has a struct page associated with
* it to keep track of whatever it is we are using the page for at the
* moment. Note that we have no way to track which tasks are using
* a page.
*
* Try to keep the most commonly accessed fields in single cache lines
* here (16 bytes or greater). This ordering should be particularly
* beneficial on 32-bit processors.
*
* The first line is data used in page cache lookup, the second line
* is used for linear searches (eg. clock algorithm scans).
*
* TODO: make this structure smaller, it could be as small as 32 bytes.
*/
typedef struct page {
struct list_head list; /* ->mapping has some page lists. */
struct address_space *mapping; /* The inode (or ...) we belong to. */
unsigned long index; /* Our offset within mapping. */
struct page *next_hash; /* Next page sharing our hash bucket in
the pagecache hash table. */
atomic_t count; /* Usage count, see below. */
unsigned long flags; /* atomic flags, some possibly
updated asynchronously */
struct list_head lru; /* Pageout list, eg. active_list;
protected by pagemap_lru_lock !! */
struct page **pprev_hash; /* Complement to *next_hash. */
struct buffer_head * buffers; /* Buffer maps us to a disk block. */
/*
* On machines where all RAM is mapped into kernel address space,
* we can simply calculate the virtual address. On machines with
* highmem some memory is mapped into kernel virtual memory
* dynamically, so we need a place to store that address.
* Note that this field could be 16 bits on x86 ... ;)
*
* Architectures with slow multiplication can define
* WANT_PAGE_VIRTUAL in asm/page.h
*/
#if defined(CONFIG_HIGHMEM) || defined(WANT_PAGE_VIRTUAL)
void *virtual; /* Kernel virtual address (NULL if
not kmapped, ie. highmem) */
#endif /* CONFIG_HIGHMEM || WANT_PAGE_VIRTUAL */
} mem_map_t;