The main memory-related structures in the kernel are struct pglist_data, struct zone, struct page, struct mm_struct, and struct vm_area_struct.
1. struct pglist_data:
In the kernel a struct pglist_data is called a "node". It sits at the top of the kernel's memory hierarchy and represents the memory belonging to one NUMA node, so there is one pglist_data per node (a UMA machine has exactly one). Each pglist_data is further divided into zones (ranges of physical memory): ZONE_DMA, ZONE_NORMAL, and ZONE_HIGHMEM. The definition is:
typedef struct pglist_data {
struct zone node_zones[MAX_NR_ZONES];
struct zonelist node_zonelists[GFP_ZONETYPES];
int nr_zones;
struct page *node_mem_map;
struct bootmem_data *bdata;
unsigned long node_start_pfn;
unsigned long node_present_pages; /* total number of physical pages */
unsigned long node_spanned_pages; /* total size of physical page
range, including holes */
int node_id;
struct pglist_data *pgdat_next;
wait_queue_head_t kswapd_wait;
struct task_struct *kswapd;
} pg_data_t;
The members are roughly as follows:
node_zones:
the zones making up this node, here ZONE_DMA, ZONE_NORMAL and ZONE_HIGHMEM;
node_zonelists:
arrays that order the zones; they are the policy input for deciding which zone an allocation should come from (see the comment at GFP_ZONETYPES);
nr_zones:
the number of zones in this node, usually 1 to 3;
node_mem_map:
the first struct page of this node;
bdata:
the bootmem allocator data used for boot-time memory management;
node_start_pfn:
pfn is short for page frame number; this is the frame number of the node's first page in physical memory;
node_present_pages:
the total number of physical pages in this node;
pgdat_next:
the next struct pglist_data;
kswapd_wait:
the wait queue head on which kswapd sleeps;
kswapd:
the kernel thread that keeps page allocation in this node's zones balanced by reclaiming memory.
2. struct zone:
Represents one memory zone, such as ZONE_DMA, ZONE_NORMAL, or ZONE_HIGHMEM:
/*
* On machines where it is needed (eg PCs) we divide physical memory
* into multiple physical zones. On a PC we have 3 zones:
*
* ZONE_DMA < 16 MB ISA DMA capable memory
* ZONE_NORMAL 16-896 MB direct mapped by the kernel
* ZONE_HIGHMEM > 896 MB only page cache and user processes
*/
struct zone {
/* Fields commonly accessed by the page allocator */
unsigned long free_pages;
unsigned long pages_min, pages_low, pages_high;
/*
* protection[] is a pre-calculated number of extra pages that must be
* available in a zone in order for __alloc_pages() to allocate memory
* from the zone. i.e., for a GFP_KERNEL alloc of "order" there must
* be "(1<<order) + protection[ZONE_NORMAL]" free pages in the zone
* for us to choose to allocate the page from that zone.
*
* It uses both min_free_kbytes and sysctl_lower_zone_protection.
* The protection values are recalculated if either of these values
* change. The array elements are in zonelist order:
* [0] == GFP_DMA, [1] == GFP_KERNEL, [2] == GFP_HIGHMEM.
*/
unsigned long protection[MAX_NR_ZONES];
struct per_cpu_pageset pageset[NR_CPUS];
/*
* free areas of different sizes
*/
spinlock_t lock;
struct free_area free_area[MAX_ORDER];
ZONE_PADDING(_pad1_)
/* Fields commonly accessed by the page reclaim scanner */
spinlock_t lru_lock;
struct list_head active_list;
struct list_head inactive_list;
unsigned long nr_scan_active;
unsigned long nr_scan_inactive;
unsigned long nr_active;
unsigned long nr_inactive;
unsigned long pages_scanned; /* since last reclaim */
int all_unreclaimable; /* All pages pinned */
/*
* prev_priority holds the scanning priority for this zone. It is
* defined as the scanning priority at which we achieved our reclaim
* target at the previous try_to_free_pages() or balance_pgdat()
* invokation.
*
* We use prev_priority as a measure of how much stress page reclaim is
* under - it drives the swappiness decision: whether to unmap mapped
* pages.
*
* temp_priority is used to remember the scanning priority at which
* this zone was successfully refilled to free_pages == pages_high.
*
* Access to both these fields is quite racy even on uniprocessor. But
* it is expected to average out OK.
*/
int temp_priority;
int prev_priority;
ZONE_PADDING(_pad2_)
/* Rarely used or read-mostly fields */
/*
* wait_table -- the array holding the hash table
* wait_table_size -- the size of the hash table array
* wait_table_bits -- wait_table_size == (1 << wait_table_bits)
*
* The purpose of all these is to keep track of the people
* waiting for a page to become available and make them
* runnable again when possible. The trouble is that this
* consumes a lot of space, especially when so few things
* wait on pages at a given time. So instead of using
* per-page waitqueues, we use a waitqueue hash table.
*
* The bucket discipline is to sleep on the same queue when
* colliding and wake all in that wait queue when removing.
* When something wakes, it must check to be sure its page is
* truly available, a la thundering herd. The cost of a
* collision is great, but given the expected load of the
* table, they should be so rare as to be outweighed by the
* benefits from the saved space.
*
* __wait_on_page_locked() and unlock_page() in mm/filemap.c, are the
* primary users of these fields, and in mm/page_alloc.c
* free_area_init_core() performs the initialization of them.
*/
wait_queue_head_t * wait_table;
unsigned long wait_table_size;
unsigned long wait_table_bits;
/*
* Discontig memory support fields.
*/
struct pglist_data *zone_pgdat;
struct page *zone_mem_map;
/* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
unsigned long zone_start_pfn;
unsigned long spanned_pages; /* total size, including holes */
unsigned long present_pages; /* amount of memory (excluding holes) */
/*
* rarely used fields:
*/
char *name;
} ____cacheline_maxaligned_in_smp;
The members have the following meanings:
free_pages:
the number of free pages in the zone;
pages_min, pages_low, pages_high:
the zone's watermarks: kswapd is woken when free_pages drops below pages_low, allocators must reclaim synchronously once it falls below pages_min, and kswapd sleeps again when free_pages climbs back to pages_high;
protection[MAX_NR_ZONES]:
per-allocation-class reserves that decide whether an allocation may take pages from this zone, indexed in zonelist order as the comment describes:
protection[0]: reserve checked for GFP_DMA allocations;
protection[1]: reserve checked for GFP_KERNEL allocations;
protection[2]: reserve checked for GFP_HIGHMEM allocations.
pageset[NR_CPUS]:
per-CPU caches of pages belonging to this zone, kept so each CPU can allocate and free single pages without taking the zone lock;
lock:
spinlock protecting the zone, chiefly free_area;
free_area[MAX_ORDER]:
the buddy allocator's free lists, one list of free blocks for each allocation order;
active_list:
LRU list of active (recently referenced) pages;
inactive_list:
LRU list of inactive pages, the first candidates for reclaim;
nr_scan_active:
number of active pages to scan;
nr_scan_inactive:
number of inactive pages to scan;
nr_active:
number of pages on the active list;
nr_inactive:
number of pages on the inactive list;
pages_scanned:
number of pages scanned since the last successful reclaim;
zone_pgdat:
the struct pglist_data this zone belongs to;
zone_mem_map:
the first struct page of the region this zone covers;
zone_start_pfn:
analogous to node_start_pfn: the frame number of the zone's first page in physical memory;
spanned_pages:
total pages spanned by the zone, including holes;
present_pages:
pages actually present in the zone, excluding holes;
name:
the zone's name, such as "DMA", "Normal", or "HighMem".
The code path that initializes zones is roughly as follows:
asmlinkage void __init start_kernel(void)
-->
void __init setup_arch(char **cmdline_p)
-->
void __init paging_init(struct machine_desc *mdesc)
-->
void __init bootmem_init(void)
-->
static void __init bootmem_free_node(int node, struct meminfo *mi)
-->
free_area_init_core(pgdat, zones_size, zholes_size);
3. struct page:
struct page represents each physical page in the system and records everything the kernel needs to know about how that page is being used:
/*
* Each physical page in the system has a struct page associated with
* it to keep track of whatever it is we are using the page for at the
* moment. Note that we have no way to track which tasks are using
* a page, though if it is a pagecache page, rmap structures can tell us
* who is mapping it.
*/
struct page {
unsigned long flags; /* Atomic flags, some possibly
* updated asynchronously */
atomic_t _count; /* Usage count, see below. */
union {
atomic_t _mapcount; /* Count of ptes mapped in mms,
* to show when page is mapped
* & limit reverse map searches.
*/
struct { /* SLUB */
u16 inuse;
u16 objects;
};
};
union {
struct {
unsigned long private; /* Mapping-private opaque data:
* usually used for buffer_heads
* if PagePrivate set; used for
* swp_entry_t if PageSwapCache;
* indicates order in the buddy
* system if PG_buddy is set.
*/
struct address_space *mapping; /* If low bit clear, points to
* inode address_space, or NULL.
* If page mapped as anonymous
* memory, low bit is set, and
* it points to anon_vma object:
* see PAGE_MAPPING_ANON below.
*/
};
#if USE_SPLIT_PTLOCKS
spinlock_t ptl;
#endif
struct kmem_cache *slab; /* SLUB: Pointer to slab */
struct page *first_page; /* Compound tail pages */
};
union {
pgoff_t index; /* Our offset within mapping. */
void *freelist; /* SLUB: freelist req. slab lock */
};
struct list_head lru; /* Pageout list, eg. active_list
* protected by zone->lru_lock !
*/
/*
* On machines where all RAM is mapped into kernel address space,
* we can simply calculate the virtual address. On machines with
* highmem some memory is mapped into kernel virtual memory
* dynamically, so we need a place to store that address.
* Note that this field could be 16 bits on x86 ... ;)
*
* Architectures with slow multiplication can define
* WANT_PAGE_VIRTUAL in asm/page.h
*/
#if defined(WANT_PAGE_VIRTUAL)
void *virtual; /* Kernel virtual address (NULL if
not kmapped, ie. highmem) */
#endif /* WANT_PAGE_VIRTUAL */
#ifdef CONFIG_WANT_PAGE_DEBUG_FLAGS
unsigned long debug_flags; /* Use atomic bitops on this */
#endif
#ifdef CONFIG_KMEMCHECK
/*
* kmemcheck wants to track the status of each byte in a page; this
* is a pointer to such a status block. NULL if not tracked.
*/
void *shadow;
#endif
};
The members most relevant to driver work are:
flags:
a set of bit flags describing the page's status, including PG_locked, which indicates the page is locked in memory, and PG_reserved, which keeps the memory-management system away from the page;
_count:
the page's reference count; when it drops to 0 the page is returned to the free list;
virtual:
the page's kernel virtual address if it is mapped, NULL otherwise (i.e. a highmem page that is not kmapped); low memory is always mapped, high memory usually is not.
4. struct mm_struct:
The memory descriptor. It represents a process's address space and holds all information related to that address space. Each process has its own mm_struct, shared by all threads of that process:
struct mm_struct {
struct vm_area_struct * mmap; /* list of VMAs */
struct rb_root mm_rb;
struct vm_area_struct * mmap_cache; /* last find_vma result */
unsigned long (*get_unmapped_area) (struct file *filp,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags);
void (*unmap_area) (struct mm_struct *mm, unsigned long addr);
unsigned long mmap_base; /* base of mmap area */
unsigned long task_size; /* size of task vm space */
unsigned long cached_hole_size; /* if non-zero, the largest hole below free_area_cache */
unsigned long free_area_cache; /* first hole of size cached_hole_size or larger */
pgd_t * pgd;
atomic_t mm_users; /* How many users with user space? */
atomic_t mm_count; /* How many references to "struct mm_struct" (users count as 1) */
int map_count; /* number of VMAs */
struct rw_semaphore mmap_sem;
spinlock_t page_table_lock; /* Protects page tables and some counters */
struct list_head mmlist; /* List of maybe swapped mm's. These are globally strung
* together off init_mm.mmlist, and are protected
* by mmlist_lock
*/
/* Special counters, in some configurations protected by the
* page_table_lock, in other configurations by being atomic.
*/
mm_counter_t _file_rss;
mm_counter_t _anon_rss;
unsigned long hiwater_rss; /* High-watermark of RSS usage */
unsigned long hiwater_vm; /* High-water virtual memory usage */
unsigned long total_vm, locked_vm, shared_vm, exec_vm;
unsigned long stack_vm, reserved_vm, def_flags, nr_ptes;
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
unsigned long arg_start, arg_end, env_start, env_end;
unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
struct linux_binfmt *binfmt;
cpumask_t cpu_vm_mask;
/* Architecture-specific MM context */
mm_context_t context;
/* Swap token stuff */
/*
* Last value of global fault stamp as seen by this process.
* In other words, this value gives an indication of how long
* it has been since this task got the token.
* Look at mm/thrash.c
*/
unsigned int faultstamp;
unsigned int token_priority;
unsigned int last_interval;
unsigned long flags; /* Must use atomic bitops to access the bits */
struct core_state *core_state; /* coredumping support */
#ifdef CONFIG_AIO
spinlock_t ioctx_lock;
struct hlist_head ioctx_list;
#endif
#ifdef CONFIG_MM_OWNER
/*
* "owner" points to a task that is regarded as the canonical
* user/owner of this mm. All of the following must be true in
* order for it to be changed:
*
* current == mm->owner
* current->mm != mm
* new_owner->mm == mm
* new_owner->alloc_lock is held
*/
struct task_struct *owner;
#endif
#ifdef CONFIG_PROC_FS
/* store ref to file /proc/<pid>/exe symlink points to */
struct file *exe_file;
unsigned long num_exe_file_vmas;
#endif
#ifdef CONFIG_MMU_NOTIFIER
struct mmu_notifier_mm *mmu_notifier_mm;
#endif
};
The main members are:
mm_users:
the number of users of this address space; if two threads share the space, mm_users is 2;
mmlist:
all mm_structs are linked into a doubly linked list through this field; the list is headed by init_mm, the memory descriptor of the init process's address space.
5. struct vm_area_struct:
The memory-region descriptor. It describes a single contiguous interval within a given address space. Note that a vm_area_struct covers one interval: each one corresponds to a unique region of the process address space, such as the process's code segment, data segment, or bss:
/*
* This struct defines a memory VMM memory area. There is one of these
* per VM-area/task. A VM area is any part of the process virtual memory
* space that has a special rule for the page-fault handlers (ie a shared
* library, the executable area etc).
*/
struct vm_area_struct {
struct mm_struct * vm_mm; /* The address space we belong to. */
unsigned long vm_start; /* Our start address within vm_mm. */
unsigned long vm_end; /* The first byte after our end address
within vm_mm. */
/* linked list of VM areas per task, sorted by address */
struct vm_area_struct *vm_next;
pgprot_t vm_page_prot; /* Access permissions of this VMA. */
unsigned long vm_flags; /* Flags, see mm.h. */
struct rb_node vm_rb;
/*
* For areas with an address space and backing store,
* linkage into the address_space->i_mmap prio tree, or
* linkage to the list of like vmas hanging off its node, or
* linkage of vma in the address_space->i_mmap_nonlinear list.
*/
union {
struct {
struct list_head list;
void *parent; /* aligns with prio_tree_node parent */
struct vm_area_struct *head;
} vm_set;
struct raw_prio_tree_node prio_tree_node;
} shared;
/*
* A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
* list, after a COW of one of the file pages. A MAP_SHARED vma
* can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
* or brk vma (with NULL file) can only be in an anon_vma list.
*/
struct list_head anon_vma_node; /* Serialized by anon_vma->lock */
struct anon_vma *anon_vma; /* Serialized by page_table_lock */
/* Function pointers to deal with this struct. */
const struct vm_operations_struct *vm_ops;
/* Information about our backing store: */
unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE
units, *not* PAGE_CACHE_SIZE */
struct file * vm_file; /* File we map to (can be NULL). */
void * vm_private_data; /* was vm_pte (shared mem) */
unsigned long vm_truncate_count;/* truncate_count or restart_addr */
#ifndef CONFIG_MMU
struct vm_region *vm_region; /* NOMMU mapping region */
#endif
#ifdef CONFIG_NUMA
struct mempolicy *vm_policy; /* NUMA policy for the VMA */
#endif
};
Its main members are:
vm_start, vm_end:
the start and end addresses of the region; vm_end is the first byte after the end, so the region is the half-open interval [vm_start, vm_end);
vm_flags:
flags describing the VMA's behavior; for example, to share this region among processes, VM_SHARED must be set in vm_flags;
vm_ops:
a vm_area_struct can be viewed as an object, and vm_ops is its operations table, invoked for events such as page faults.
Let's inspect a process's VMAs with a simple example:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(int argc,char **argv)
{
while(1)
{
usleep(1000);
}
return 0;
}
Compile and run it, then inspect its maps with cat /proc/<pid>/maps:
004dd000-004fd000 r-xp 00000000 fd:00 201453 /lib/ld-2.10.1.so
004fd000-004fe000 r--p 0001f000 fd:00 201453 /lib/ld-2.10.1.so
004fe000-004ff000 rw-p 00020000 fd:00 201453 /lib/ld-2.10.1.so
00501000-0066c000 r-xp 00000000 fd:00 201496 /lib/libc-2.10.1.so
0066c000-0066d000 ---p 0016b000 fd:00 201496 /lib/libc-2.10.1.so
0066d000-0066f000 r--p 0016b000 fd:00 201496 /lib/libc-2.10.1.so
0066f000-00670000 rw-p 0016d000 fd:00 201496 /lib/libc-2.10.1.so
00670000-00673000 rw-p 00670000 00:00 0
0094b000-0094c000 r-xp 0094b000 00:00 0 [vdso]
08048000-08049000 r-xp 00000000 fd:00 1743273 /home/seven/learn/vma
08049000-0804a000 rw-p 00000000 fd:00 1743273 /home/seven/learn/vma
b7f32000-b7f34000 rw-p b7f32000 00:00 0
bf835000-bf84a000 rw-p bffeb000 00:00 0 [stack]
The columns mean:
start-end  permissions  offset  major:minor  inode  file
Each line corresponds to one vm_area_struct. For example:
the fourth line is the code segment of the libc library;
the seventh line is the data segment of the libc library;
the eighth line is the bss segment of the libc library;
the last line is the process stack, 84 KB in size (0xbf84a000 - 0xbf835000 = 0x15000 = 84 KB).
The pmap tool shows the same information more clearly:
[root@seven linux-2.6.34]# pmap 2546
2546: ./vma
004dd000 128K r-x-- /lib/ld-2.10.1.so
004fd000 4K r---- /lib/ld-2.10.1.so
004fe000 4K rw--- /lib/ld-2.10.1.so
00501000 1452K r-x-- /lib/libc-2.10.1.so
0066c000 4K ----- /lib/libc-2.10.1.so
0066d000 8K r---- /lib/libc-2.10.1.so
0066f000 4K rw--- /lib/libc-2.10.1.so
00670000 12K rw--- [ anon ]
0094b000 4K r-x-- [ anon ]
08048000 4K r-x-- /home/seven/learn/vma
08049000 4K rw--- /home/seven/learn/vma
b7f32000 8K rw--- [ anon ]
bf835000 84K rw--- [ stack ]
total 1720K
[root@seven linux-2.6.34]#
[Note:] The process's own private region here takes only 84 KB, yet the total mapped space is 1720 KB, of which 1452 KB belongs to the C library. That region is shared: every process using the C library maps the same region instead of each being allocated an extra 1452 KB of memory.