Linux Memory Management Basics

System Startup: Linux Memory Management Basics


Keywords

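Non-Uniform Memory Access (NUMA) model, node, memory zone (Zone), Uniform Memory Access (UMA) model, kernel page tables, zone allocator (Buddy System), slab system, active page list, inactive page list, reserved physical page frames, TLB thrashing, memory management initialization…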

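Preliminary Notes

Direct mapping: a contiguous virtual memory range is mapped permanently onto a physical memory range, and the corresponding page table entries are created. Once created, these entries never change, so the mapping stays fixed until the operating system shuts down.

Dynamic mapping: the opposite of direct mapping. A virtual memory range is first mapped onto some physical memory range, and that mapping can later be changed so that the same virtual range maps onto a different physical range. For ease of description I assume the physical range is contiguous in the physical address space, but contiguity is not actually required [the physical range may be made up of scattered pages].

Permanent mapping: maps high-memory physical page frames into the persistent mapping area [PKMap area] of the kernel linear address space. Once a mapping is established, the corresponding page table entry cannot be changed to another mapping until the mapping is released.

Virtually contiguous range: a virtual memory range that is contiguous in the virtual address space; the physical memory backing it does not have to be contiguous in the physical address space.

Physically contiguous range: a physical memory range that is contiguous in the physical address space.

Virtual address: an address as referenced by code.

Linear address: the address obtained after a virtual address has gone through segmentation [without segmentation, the virtual address is the linear address].

Physical address: the address obtained after a linear address has gone through paging [without paging, the linear address is the physical address].

Kernel page directory: usually means the kernel's global page directory, the one that idle and init use while initializing the system. It is used only by these kernel initialization threads early in boot; afterwards no process uses it as its page directory. When the scheduler picks a kernel thread to run, the kernel thread borrows the mm [address space] of the process being scheduled out, so kernel threads have no address space of their own [the idle and init threads at boot are the exception, but later they behave like every other kernel thread and, on a process switch, use the page directory of the process that was scheduled out].

Process page directory: a process has its own address space mm and therefore its own page directory mm->pgd. When the kernel switches to that process, it updates the cr3 register so that the process's page directory is used.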
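The borrowing rule described above can be modelled in a few lines of C. This is only a sketch under simplifying assumptions: `struct mm`, `struct task`, the `cr3` variable and `switch_mm_sketch` are hypothetical stand-ins for the kernel's `mm_struct`, `task_struct`, the CR3 register and the context-switch path, not the kernel's actual code.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for mm_struct and task_struct. */
struct mm {
    unsigned long pgd_phys;      /* physical address of the page directory */
};

struct task {
    struct mm *mm;               /* NULL for a kernel thread */
    struct mm *active_mm;        /* address space actually in use */
};

unsigned long cr3;               /* models the CR3 register */

/* Model of what happens to the address space on a task switch. */
void switch_mm_sketch(struct task *prev, struct task *next)
{
    if (next->mm == NULL) {
        /* Kernel thread: borrow the address space of the outgoing task,
         * so CR3 is left untouched. */
        next->active_mm = prev->active_mm;
    } else {
        /* User process: switch to its own page directory. */
        next->active_mm = next->mm;
        cr3 = next->mm->pgd_phys;
    }
}
```

Switching from a user process to a kernel thread leaves `cr3` unchanged, which is why the kernel thread ends up running on the previous process's page directory.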

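1. Operating System Boot Process

References: 《linux内核完全注释》 and 《Linux Kernel Analysis》. The focus below is on memory management.

Note: the code in this article comes from Linux kernel version 2.6.11.

The code initially runs in real mode. After the BIOS has completed the power-on self-test and initialized the IVT [interrupt vector table], it reads the first sector of the boot device [configured or selected by the user]. The 512 bytes of this sector contain the system boot code [bootsect.s] and the disk partition table; the boot code [boot/bootsect.s] is one part of the bootloader [boot/bootsect.s+boot/setup.s].

Note: today's bootloader no longer has the original [boot/bootsect.s+boot/setup.s] form. Instead it is:
a third-party bootloader [GRUB…] [which does the job of bootsect.s plus some other things] + boot/setup.s

1.1. The main tasks of the bootloader:

Load the operating system kernel into memory, set up segmentation, and switch from real mode to protected mode.

The bootloader [boot/bootsect.s] first loads [setup.s] and [the kernel (including boot/compressed/head.s)] into memory, then jumps to setup.s.

Next, the bootloader [boot/setup.s] builds the system’s physical memory map and moves the kernel to physical address 0x1000 [4k] [small kernel] or 0x100000 [1M] [big kernel].

boot/compressed/head.s is located at this physical address.

It then sets up a provisional IDT and a provisional GDT [global descriptor table / segment descriptor table] such that:

virt addr = linear addr = phy addr. [an identity mapping between virtual and physical addresses]

Once the GDT is in place, segmentation is enabled and the CPU enters protected mode.

Here segmentation merely implements an identity mapping. Why bother? To enter protected mode and widen the addressable space [20-bit -> 32-bit addresses].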
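To make the 20-bit versus 32-bit point concrete, here is a small sketch of how an address is formed in each mode. The function names are made up for illustration; the real-mode formula (segment shifted left by 4, plus offset) and the flat protected-mode segment with base 0 are the standard x86 behaviour.

```c
#include <stdint.h>

/* Real mode: physical address = segment * 16 + 16-bit offset,
 * so only roughly 1 MB can be reached. */
uint32_t real_mode_phys(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Protected mode with a flat segment: linear address = base + 32-bit offset.
 * With base 0 (as in the provisional GDT) this is an identity mapping. */
uint32_t flat_linear(uint32_t base, uint32_t off)
{
    return base + off;
}
```

The highest address real mode can form is `real_mode_phys(0xFFFF, 0xFFFF)`, just above 1 MB, whereas `flat_linear(0, off)` can reach any 32-bit address, including the 0x100000 load address of a big kernel.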

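Note that at this point no page tables exist and paging is not enabled. The code here is written and linked against physical addresses [whoever writes the bootloader knows that it will initialize the GDT, set up an identity mapping between virtual and physical addresses, then turn on segmentation, enter protected mode and widen the addressable space].

After entering protected mode, the bootloader [boot/setup.s] makes the pc jump to 0x1000 [small kernel] or 0x100000 [big kernel] [boot/compressed/head.s] and execution continues there.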

//boot/setup.s

...

// jump to startup_32 in arch/i386/boot/compressed/head.
// NOTE: For high loaded big kernels we need a
//    jmpi    0x100000,__BOOT_CS
  ...
/* 
default:
jump to 0x10:0x1000 
For high loaded big kernels: 
jump to 0x10:0x100000 (segment number: 0x10; offset: 0x100000.)
*/

...

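head.s mainly decompresses the kernel image (decompress_kernel). The decompressed result always ends up at 0x100000 [1M] (for both big and small kernels). Control then passes to the kernel proper [kernel/head.s]: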

//boot/compressed/head.s
startup_32:
...

ljmp $(__KERNEL_CS), $0x100000 //the default address of startup_32 in kernel/head.s is 0x100000, so
  //after this instruction executes, pc is at startup_32 in kernel/head.s
 /*
 During decompression, kernel/head.s overwrites the original boot/compressed/head.s
 */
...

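After these steps the real kernel initialization begins. It includes:

  • enabling paging;
  • letting each part of the operating system (memory management, process management and so on) perform its own initialization, such as building its management data structures;
  • initializing external devices;
  • creating and starting user processes;
  • starting a shell or GUI and beginning to interact with the user.

1.2. Preparing for paging and enabling it

The kernel's very first initialization work is done by [kernel/head.s].

Size of the linear address space

  • A page directory occupies one page, 4KB, and each page directory entry is 4 bytes, so a page directory holds 1024 entries and can describe 1024 page tables.
  • A page table occupies one page, 4KB, and each page table entry is 4 bytes, so a page table holds 1024 entries and can describe 1024 pages.

A linear address consists of a page directory index, a page table index and an offset. Since an address space has exactly one page directory, the largest range a linear address can cover is 1024*1024 pages of 4KB = 4GB.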
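The 10 + 10 + 12 bit split of a 32-bit linear address can be written down directly. The helper names below are chosen for this sketch and are not kernel identifiers; they assume non-PAE two-level paging with 4KB pages, as described above.

```c
#include <stdint.h>

/* Top 10 bits: index into the page directory (0..1023). */
uint32_t pde_index(uint32_t linear)
{
    return linear >> 22;
}

/* Middle 10 bits: index into the selected page table (0..1023). */
uint32_t pte_index(uint32_t linear)
{
    return (linear >> 12) & 0x3ffu;
}

/* Low 12 bits: byte offset inside the 4KB page. */
uint32_t page_offset_of(uint32_t linear)
{
    return linear & 0xfffu;
}
```

For example, the linear address 0xC0100700 splits into page directory entry 0x300 (768), page table entry 0x100 and offset 0x700.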

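kernel/head.s enables paging

Note that the kernel code [including kernel/head.s] is linked into the linear address space above __PAGE_OFFSET, yet it is actually loaded at physical address 0x100000 [1M]. Until paging is on, you therefore have to check that each symbol reference [a linear address] is converted to the correct physical address.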
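The conversion that head.s performs by hand (`symbol - __PAGE_OFFSET`) is plain arithmetic. A minimal sketch, assuming the usual 3G/1G split with an offset of 0xC0000000; the function names are invented for this example, while the kernel's own `__pa()` macro appears later in this article.

```c
#include <stdint.h>

#define PAGE_OFFSET_SKETCH 0xC0000000u  /* assumed 3G/1G split */

/* Linked (linear) address of a kernel symbol -> where it really sits in RAM. */
uint32_t kernel_virt_to_phys(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET_SKETCH;
}

/* Physical address in the directly mapped region -> kernel linear address. */
uint32_t kernel_phys_to_virt(uint32_t paddr)
{
    return paddr + PAGE_OFFSET_SKETCH;
}
```

A symbol linked at 0xC0100000 therefore lives at physical address 0x00100000, which is exactly the 1M load address mentioned above.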

/*
 *  linux/arch/i386/kernel/head.S -- the 32-bit startup code.
 */

...
ENTRY(startup_32)
...
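  //loads the boot GDT (the final per-CPU GDT, cpu_gdt_descr, is loaded further down, after paging is enabled)
  lgdt boot_gdt_descr - __PAGE_OFFSET 
  //__PAGE_OFFSET = 0xC0000000[for 32bit os]
  /*
  boot_gdt_descr is linked at a linear address above __PAGE_OFFSET, but it physically resides at:
  boot_gdt_descr - __PAGE_OFFSET.
  */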

/*
 * builds provisional kernel page tables so that paging can be turned on
 * (the annotated code that builds these provisional page tables is given later in this article)
 */
...

/*
 * Enable paging
 */
  movl $swapper_pg_dir-__PAGE_OFFSET,%eax
  movl %eax,%cr3       /* set the page table pointer.. */
  movl %cr0,%eax
  orl $0x80000000,%eax
  movl %eax,%cr0      /* ..and set paging (PG) bit */
  ljmp $__BOOT_CS,$1f    /* Clear prefetch and normalize %eip */

     //With paging enabled and the provisional mapping in place, symbol references no longer need their
     //physical addresses computed by hand; the MMU translates them automatically according to the page tables.
...
    lgdt cpu_gdt_descr
...

  call setup_idt

...

/*
 *  setup_idt
 * 
 * creates the final interrupt descriptor table
 */
setup_idt:
...


/* This is the default interrupt "handler" :-) */
  ALIGN
ignore_int:
...
  iret
...

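//The data sections below are loaded into physical memory together with the kernel image, but the linker
//script places their symbols at linear addresses that are __PAGE_OFFSET above the physical addresses.
//Until the page tables are built and paging is enabled, these symbols must be used with great care: the
//code has to compute their real physical addresses by hand. head.s therefore builds the provisional page
//tables and enables paging as early as it can.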

/*
 * BSS section
 */
.section ".bss.page_aligned","w"
ENTRY(swapper_pg_dir)
  .fill 1024,4,0
ENTRY(empty_zero_page)
  .fill 4096,1,0

/*
 * This starts the data section.
 */
.data

ENTRY(stack_start)//kernel stack
  .long init_thread_union+THREAD_SIZE
  .long __BOOT_DS

...

.globl boot_gdt_descr
.globl idt_descr
.globl cpu_gdt_descr

  ALIGN

# early boot GDT descriptor (must use 1:1 address mapping)

  .word 0             # 32 bit align gdt_desc.address
boot_gdt_descr:
  .word __BOOT_DS+7
  .long boot_gdt_table - __PAGE_OFFSET

  .word 0             # 32-bit align idt_desc.address
idt_descr:
  .word IDT_ENTRIES*8-1       # idt contains 256 entries
  .long idt_table


# boot GDT descriptor (later on used by CPU#0):

  .word 0             # 32 bit align gdt_desc.address
cpu_gdt_descr:
  .word GDT_ENTRIES*8-1
  .long cpu_gdt_table

  .fill NR_CPUS-1,8,0     # space for the other GDT descriptors

/*
 * The boot_gdt_table must mirror the equivalent in setup.S and is
 * used only for booting.
 */
  .align L1_CACHE_BYTES
ENTRY(boot_gdt_table)
  .fill GDT_ENTRY_BOOT_CS,8,0
  .quad 0x00cf9a000000ffff    /* kernel 4GB code at 0x00000000 */
  .quad 0x00cf92000000ffff    /* kernel 4GB data at 0x00000000 */

/*
 * The Global Descriptor Table contains 28 quadwords, per-CPU.
 */
  .align PAGE_SIZE_asm
ENTRY(cpu_gdt_table)
  .quad 0x0000000000000000    /* NULL descriptor */
  .quad 0x0000000000000000    /* 0x0b reserved */
     ...
  .quad 0x0000000000000000    /* 0x33 TLS entry 1 */
  .quad 0x0000000000000000    /* 0x3b TLS entry 2 */
  .quad 0x0000000000000000    /* 0x43 TLS entry 3 */
   ...

  .quad 0x00cf9a000000ffff    /* 0x60 kernel 4GB code at 0x00000000 */
  .quad 0x00cf92000000ffff    /* 0x68 kernel 4GB data at 0x00000000 */
  .quad 0x00cffa000000ffff    /* 0x73 user 4GB code at 0x00000000 */
  .quad 0x00cff2000000ffff    /* 0x7b user 4GB data at 0x00000000 */
  ...

1.2.1. Building the provisional page tables

Mappings are created both at virtual address 0 (identity mapping) and PAGE_OFFSET for up to _end+sizeof(page tables)+INIT_MAP_BEYOND_END.

/*
 * This is how much memory *in addition to the memory covered up to
 * and including _end* we need mapped initially.  We need one bit for
 * each possible page, but only in low memory, which means
 * 2^32/4096/8 = 128K worst case (4G/4G split.)
 * This should be a multiple of a page.
 */

#define INIT_MAP_BEYOND_END   (128*1024)

//128KB, used as a bitmap covering all pages:
//each bit stands for one physical page frame, so 128K can describe 2^32/4096 frames, i.e. 4G of physical memory

 cld //clear the direction flag (DF) in EFLAGS

/*
 * builds provisional kernel page tables so that paging can be turned on
 */
page_pde_offset = (__PAGE_OFFSET >> 20);//byte offset, within the page directory, of the PDE for __PAGE_OFFSET
/* 
__PAGE_OFFSET is 0xc0000000, so page_pde_offset = 3072 = 0xc00, which selects entry
3072/4 = 768 of the page directory: PDE[768]
*/  

//pg0 starts at _end
//swapper_pg_dir starts at the beginning of BSS

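  movl $(pg0 - __PAGE_OFFSET), %edi //physical address of page table 0
  movl $(swapper_pg_dir - __PAGE_OFFSET), %edx //physical address of the page directory
  movl $0x007, %eax  /* 0x007 = PRESENT+RW+USER, the flag bits for each entry */ 

10://outer loop: fill in one pair of PDEs

  //ecx = edi + 0x007; as the code below shows, edi has advanced by one page
  //(page_size) by the next time execution reaches this point.
  leal 0x007(%edi),%ecx           /* Create PDE entry */

    //identity-mapped space: store the physical address of page table i in PDE[i]
  movl %ecx,(%edx)            /* Store identity PDE entry */

  //kernel linear space [above __PAGE_OFFSET]: store the physical address of page table i
  //page_pde_offset bytes further into the directory, i.e. in PDE[768+i]
  movl %ecx,page_pde_offset(%edx)     /* Store kernel PDE entry */

  addl $4,%edx //each outer iteration: i++
  movl $1024, %ecx //the inner loop runs 1024 times


11://inner loop: fill in the PTEs

//stosl stores the value of eax at the address es:edi; if the direction flag in EFLAGS is set
//(STD executed before STOSL) edi is decremented by 4, otherwise (CLD) edi is incremented by 4
  stosl//%eax starts as $0 + $0x007, where $0 is the starting physical address of page frame 0

  addl $0x1000,%eax //0x1000=4k, so each iteration writes the physical address of the next
      //physical page into the next page table entry

    //LOOP decrements ECX and jumps back until ECX reaches 0
  loop 11b //since %ecx=1024 this loops 1024 times; when it ends, one page table is fully filled

    /* End condition: we must map up to and including INIT_MAP_BEYOND_END */
    /* bytes beyond the end of our own page tables; the +0x007 is the attribute bits */

    //ebp = edi + INIT_MAP_BEYOND_END (128K) + 0x007 
  leal (INIT_MAP_BEYOND_END+0x007)(%edi),%ebp
  cmpl %ebp,%eax
      //%eax-0x007 = physical address of the next physical page to be mapped
      //%ebp-0x007 = physical address just past the last page table entry written so far + INIT_MAP_BEYOND_END 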
      /*
      What the comparison above expresses:
      +----------+
      |----------|
--if--+----------+ 8M <--- if the physical space mapped so far reaches here, the loop can exit
            ...
      +----------+      
      |           |
------+----------+------if condition
      |     128K  |
      +----------+
      |----4b----|<------ last page table entry: init_pg_tables_end
      +----------+
            ...
      |----------|
      +----------+ _end <---- start of the page tables: pg0
            ...
      +----------+ __bss_start <---- start of the page directory: swapper_pg_dir
            ...
      |----------|
--if--+----------+ 4M <--- if the mapped physical space only reaches here, more must be mapped: continue the outer loop
      |----------|
      +----------+
        ...
   1M +----------+ _text <--- the kernel code is loaded here 
        ...
      +----------+

   physical memory layout

      So the mapping must cover the physical space that holds the page tables themselves, plus at
      least 128k of extra space [bitmap]

      */

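  jb 10b 
      //%eax<%ebp means that not enough physical page frames are mapped yet: jump back to the outer loop.
      //It’s certain that no bootable kernel will be greater than 8MB in size,
      //so in the case shown here only the first 8M of physical memory ends up mapped:
      //linear address[0,8M]-->physical address[0,8M] 
      //and
      //linear address[3G,3G+8M]-->physical address[0,8M]


    //store the physical address of the end of the provisional page tables in init_pg_tables_end
  movl %edi,(init_pg_tables_end - __PAGE_OFFSET)


    /*
    The page directory occupies one page [4k] and holds 4k/4=1024 page directory entries. Each directory
    entry refers to one page table [4k, 1024 page table entries], and each page table entry refers to one
    page of physical memory [4k], so a fully populated page directory maps:
    1024*1024*4k=4G of linear address space

    Here, however, only PDE[0..1] and PDE[768..769] are filled in, mapping the physical page
    range [0,8M]
    */ 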

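The resulting provisional page tables look like this:


[Figure: provisional page tables]

In the figure, entries 0 and 1 of swapper_pg_dir point to pg0 and pg1, and the page table entries in those tables map each linear address onto the identical physical address [entries 0x300 [4G x 3/4] and 0x301 are explained below]:

linear addr = phy addr, so that: virt addr = linear addr = phy addr. [Note: the relation linear addr = phy addr does not hold for entries 0x300 and 0x301]

For example: an access to virtual address 0x00100300 becomes linear address 0x00100300 after GDT segmentation, and physical address 0x00100300 after page table translation.


[Figure: provisional page tables (same figure as above)]

The same figure again, but now look at entries 0x300 and 0x301. Entry 0x300 points to the same page table as entry 0, and entry 0x301 to the same page table as entry 1. This is where the trick lies:

An access to virtual address 0xc0100700 becomes linear address 0xc0100700 after GDT segmentation, and physical address 0x00100700 after page table translation.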
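The two translations above can be replayed with a small user-space simulation of the head.S loop. This is a sketch, not the kernel's code: the arrays stand in for swapper_pg_dir and pg0/pg1, and the physical address of pg0 (i.e. _end) is an input that I simply assume to be 4M in the example. Also note the size of the extra region: one bit per page for 4G is 2^32/4096/8 = 128K, which is where INIT_MAP_BEYOND_END comes from.

```c
#include <stdint.h>
#include <string.h>

#define ENTRIES 1024u
#define MAX_TABLES 4u                        /* room for pg0..pg3 in this sketch */
#define INIT_MAP_BEYOND_END (128u * 1024u)

static uint32_t pgdir[ENTRIES];              /* stands in for swapper_pg_dir */
static uint32_t pgtables[MAX_TABLES][ENTRIES]; /* stand in for pg0, pg1, ... */

/* Mirror of the head.S loops. pg0_phys is the assumed, page-aligned physical
 * address of pg0. Returns the number of page tables that were filled. */
unsigned build_provisional_tables(uint32_t pg0_phys)
{
    uint32_t edi = pg0_phys;   /* physical address of the next PTE slot */
    uint32_t eax = 0x007;      /* next PTE value: frame 0 + PRESENT|RW|USER */
    unsigned n = 0;

    memset(pgdir, 0, sizeof pgdir);
    do {
        uint32_t pde = edi + 0x007;
        pgdir[n] = pde;                /* identity PDE */
        pgdir[768 + n] = pde;          /* kernel PDE, same page table */
        for (unsigned i = 0; i < ENTRIES; i++) {
            pgtables[n][i] = eax;      /* stosl */
            eax += 0x1000;
            edi += 4;
        }
        n++;
        /* jb 10b: keep going while eax < edi + INIT_MAP_BEYOND_END + 0x007 */
    } while (eax < edi + INIT_MAP_BEYOND_END + 0x007 && n < MAX_TABLES);
    return n;
}

/* Two-level walk over the simulated tables. Returns 1 and stores the
 * physical address in *phys, or returns 0 if the address is unmapped. */
int translate(uint32_t pg0_phys, uint32_t linear, uint32_t *phys)
{
    uint32_t pde = pgdir[linear >> 22];
    if (!(pde & 1u))
        return 0;
    uint32_t table = ((pde & ~0xfffu) - pg0_phys) / 4096u;
    uint32_t pte = pgtables[table][(linear >> 12) & 0x3ffu];
    if (!(pte & 1u))
        return 0;
    *phys = (pte & ~0xfffu) | (linear & 0xfffu);
    return 1;
}
```

With pg0 assumed at 4M, the loop fills two page tables (8M mapped), and both 0x00100300 and 0xc0100700 resolve through the same page table, to 0x00100300 and 0x00100700 respectively. An address just past 8M, such as 0x00800000, is not mapped at all.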

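The final layout looks like this:

Note: above pg1 there is an additional 128k region [bitmap].


[Figure: memory layout after kernel/head.s]

Once paging is on, kernel code no longer has to worry about how symbol references are resolved. Next the kernel lets a series of subsystems perform their own initialization.

When kernel/head.s has finished, it jumps to init/main.c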

//kernel/head.s

call start_kernel //init/main.c::start_kernel

1.3. init/main.c::start_kernel

init/main.c::start_kernel:

start_kernel carries out all of the kernel's initialization work.

asmlinkage void __init start_kernel(void)
{
   char * command_line;
...  
    page_address_init();

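  //for i386: arch\i386\kernel\setup.c::setup_arch
  //builds the page tables that map the first 896M and initializes node, zone, mem_map, the buddy system,
  //the kmap area and the other structures that describe physical memory
    setup_arch(&command_line);

  /*
  initializes the process environment and creates the system's first process: idle
  */
    sched_init();

    page_alloc_init();

  /* sort the table of exception handling fixups */
    sort_main_extable();

  /* set up the interrupt vector table again */
    trap_init();
...
   /* early initialization of the virtual file system */ 
    vfs_caches_init_early();

  /* 
  memory initialization: frees all pages that were marked as reserved earlier 
  initializes the kernel's memory management subsystem. It also prints a tabulation 
  of all available memory and the memory occupied by the kernel.
  */
    mem_init();

    kmem_cache_init();//initializes the slab allocator, which is built on top of the buddy system

...

    anon_vma_init();//initializes anonymous virtual memory areas

    fork_init(num_physpages);  /* computes how many processes may be created, based on the amount of physical memory */ 

  /*
  proc_caches_init(), buffer_init(), unnamed_dev_init(), vfs_caches_init(), 
  signals_init() and similar functions set up dedicated slab caches for the various management mechanisms.
  */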
    proc_caches_init();

    buffer_init();

...
    signals_init();

    /* rootfs populating might need page-writeback */
    page_writeback_init();

#ifdef CONFIG_PROC_FS
    proc_root_init();//initializes the virtual file system /proc
#endif

    acpi_early_init();

    /* Do the rest non-__init'ed, we're now alive */
    rest_init();
}

1.4. The PKMap area

1.4.1. mm\highmem.c::page_address_init

Linux high memory

/*
Initializes the data structures that manage high-memory physical pages, used by kmap [PKMap area].
*/

static struct page_address_map page_address_maps[LAST_PKMAP];//1024 pages, 4M in total
void __init page_address_init(void)
{
  int i;

  /*
  page_address_pool: a global list of page_address_map, initialized with 1024 page_address_map entries. 
  It serves as a cache pool of page_address_map. When a page_address_map is needed and
  page_address_pool is not empty, there is no need to create one: an entry is simply taken
  off page_address_pool. [a cache that speeds up allocation]
  */
  INIT_LIST_HEAD(&page_address_pool);

  for (i = 0; i < ARRAY_SIZE(page_address_maps); i++)
      list_add(&page_address_maps[i].list, &page_address_pool);

      /*
   Initializes the page_address_htable array, 128 entries [128 buckets] in total, used to quickly
   find the page_address_map of a page whose page table entry has been created and whose mapping
   is established. Once the page_address_map of a page has been found in the table, the linear
   address at which that page is mapped is available as page_address_map->virtual.
   */

  /*
  The high-memory physical pages hanging off page_address_htable [those already mapped into the
  linear address space] are all mapped into the kernel's PKMap area
  */

  //every bucket of page_address_htable holds a list of page_address_map
  for (i = 0; i < ARRAY_SIZE(page_address_htable); i++) {
      INIT_LIST_HEAD(&page_address_htable[i].lh);
      spin_lock_init(&page_address_htable[i].lock);
  }
  spin_lock_init(&pool_lock);
}


//--------------------related data structures below------------------------

/*
 * page_address_map freelist, allocated from page_address_maps.
 */
static struct list_head page_address_pool;    /* freelist */
static spinlock_t pool_lock;          /* protects page_address_pool */

/*
 * Describes one page->virtual association
 */
struct page_address_map {
  struct page *page;
  void *virtual;
   //the list field links this entry either into the global pool list page_address_pool or into
   //page_address_htable[hash_ptr(page,PA_HASH_ORDER)].lh
  struct list_head list;//links page_address_map instances together
};


#define PA_HASH_ORDER 7


/*
 * Hash table bucket
 */
static struct page_address_slot {
  struct list_head lh;            /* List of page_address_maps */
  spinlock_t lock;            /* Protect this bucket's list */
} ____cacheline_aligned_in_smp page_address_htable[1<<PA_HASH_ORDER];


#ifdef CONFIG_X86_PAE
#define LAST_PKMAP 512
#else
#define LAST_PKMAP 1024
#endif
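To show how the pool and the hash table cooperate, here is a user-space sketch of the same idea. It is a simplification under stated assumptions: it uses singly linked lists instead of `list_head`, no spinlocks, a tiny pool, and a made-up hash function in place of the kernel's `hash_ptr(page, PA_HASH_ORDER)`. All names (`struct pam`, `pam_init`, `pam_set`, `pam_lookup`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PA_HASH_ORDER 7
#define NR_BUCKETS (1u << PA_HASH_ORDER)   /* 128 buckets, as in the kernel */
#define POOL_SIZE 8                        /* tiny pool; the kernel uses LAST_PKMAP */

struct pam {                 /* simplified page_address_map */
    const void *page;        /* the page being described */
    void *virtual;           /* where it is mapped */
    struct pam *next;        /* pool or bucket linkage */
};

static struct pam pool_storage[POOL_SIZE];
static struct pam *pool;                   /* free list, like page_address_pool */
static struct pam *buckets[NR_BUCKETS];    /* like page_address_htable */

/* Hypothetical hash: drop low pointer bits, keep PA_HASH_ORDER bits. */
static unsigned bucket_of(const void *page)
{
    return (unsigned)(((uintptr_t)page >> 5) & (NR_BUCKETS - 1));
}

/* Counterpart of page_address_init(): fill the pool, empty the buckets. */
void pam_init(void)
{
    pool = NULL;
    for (unsigned i = 0; i < NR_BUCKETS; i++)
        buckets[i] = NULL;
    for (unsigned i = 0; i < POOL_SIZE; i++) {
        pool_storage[i].next = pool;
        pool = &pool_storage[i];
    }
}

/* Record page -> virtual. Returns 0 on success, -1 if the pool is empty. */
int pam_set(const void *page, void *virtual)
{
    struct pam *m = pool;
    if (m == NULL)
        return -1;
    pool = m->next;                        /* take one entry off the pool */
    m->page = page;
    m->virtual = virtual;
    unsigned b = bucket_of(page);
    m->next = buckets[b];                  /* hang it on its hash bucket */
    buckets[b] = m;
    return 0;
}

/* Find the linear address a page is mapped at, or NULL if it is not mapped. */
void *pam_lookup(const void *page)
{
    for (struct pam *m = buckets[bucket_of(page)]; m != NULL; m = m->next)
        if (m->page == page)
            return m->virtual;
    return NULL;
}
```

Taking entries from a preallocated pool means that recording a mapping never needs a memory allocation, which matches the "cache" remark in the comments above. Removing a mapping (moving the entry back to the pool) is left out of this sketch.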

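2. Initializing the description of physical memory

The main work in this part of the initialization is:

initializing node, zone and mem_map, building the kernel page tables, initializing the PKMap area, and initializing the fixed-mapping area.

2.1. arch\i386\kernel\setup.c::setup_arch

Figure 1: kernel address space layout


[Figure: kernel address space 1]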

/*
 * Determine if we were loaded by an EFI loader.  If so, then we have also been
 * passed the efi memmap, systab, etc., so we should use these data structures
 * for initialization.  Note, the efi init code path is determined by the
 * global efi_enabled. This allows the same kernel image to be used on existing
 * systems (with a traditional BIOS) as well as on EFI systems.
 */
void __init setup_arch(char **cmdline_p)
{
  unsigned long max_low_pfn;

...
  //record the start and end virtual addresses of the kernel code segment
  init_mm.start_code = (unsigned long) _text;
  init_mm.end_code = (unsigned long) _etext;

  init_mm.end_data = (unsigned long) _edata;
  init_mm.brk = init_pg_tables_end + PAGE_OFFSET;

    //record the start and end physical addresses of the kernel code segment
  code_resource.start = virt_to_phys(_text);
  code_resource.end = virt_to_phys(_etext)-1;

  data_resource.start = virt_to_phys(_etext);
  data_resource.end = virt_to_phys(_edata)-1;

  //describe physical memory: low memory and high memory.
  max_low_pfn = setup_memory();

  /*
   * NOTE: before this point _nobody_ is allowed to allocate
   * any memory using the bootmem allocator.  Although the
   * alloctor is now initialised only the first 8Mb of the kernel
   * virtual address space has been mapped.  All allocations before
   * paging_init() has completed must use the alloc_bootmem_low_pages()
   * variant (which allocates DMA'able memory) and care must be taken
   * not to exceed the 8Mb limit.
   */
...

  /*
 * paging_init() sets up the page tables - note that the first 8MB are
 * already mapped by head.S.
 *
 * This routines also unmaps the page at virtual kernel address 0, so
 * that we can trap those pesky NULL-reference errors in the kernel.
 */
  paging_init();

  /*
   * NOTE: at this point the bootmem allocator is fully available.
   */
...

  //Request address space for all standard resources
  register_memory();

...
}
2.1.1. arch\i386\kernel\setup.c::setup_memory

Describing Physical Memory

//describing physical memory
/*
- Find the start and ending PFN for low memory (min_low_pfn, max_low_pfn), the 
start and end PFN for high memory (highstart_pfn, highend_pfn) and the PFN for 
the last page in the system (max_pfn).

- Initialise the bootmem_data structure and declare which pages may be used by 
the boot memory allocator

- Mark all pages usable by the system as “free” and then reserve the pages used 
by the bitmap representing the pages

- Reserve pages used by the SMP config or the initrd image if one exists
*/

static unsigned long __init setup_memory(void)
{
  unsigned long bootmap_size, start_pfn, max_low_pfn;

  /*
   * partially used pages are not usable - thus
   * we are rounding upwards:
   */
  //some physical page frames cannot be used [they are occupied by the kernel image and so on], so
   //start_pfn records the first usable physical page frame number
  start_pfn = PFN_UP(init_pg_tables_end);//rounded up [to a whole page]

  find_max_pfn();//initializes the global variable max_pfn with the highest physical frame number

 //finds the highest page frame addressable in ZONE_NORMAL [the frame at the 896M boundary]
  max_low_pfn = find_max_low_pfn();

#ifdef CONFIG_HIGHMEM
  highstart_pfn = highend_pfn = max_pfn;
  if (max_pfn > max_low_pfn) {
      highstart_pfn = max_low_pfn;
     //initializes the global variable highstart_pfn [first frame number beyond 896M]
  }
#endif

  /*
   * Initialize the boot-time allocator (with low memory only):
   */
  bootmap_size = init_bootmem(start_pfn, max_low_pfn);
  /*
      Sets the global variable max_low_pfn = max_low_pfn;
      sets the global variable min_low_pfn = start_pfn;
      Initialises the appropriate struct bootmem_data_t and inserts the node 
      into the linked list of nodes pgdat_list.
  */

  /*  
  register_bootmem_low_pages() reads the e820 map and calls free_bootmem() 
  for all usable pages in the running system. This is what marks the pages 
  marked as reserved during initialisation as free

  Marks all usable physical memory in the system as available
  */
  register_bootmem_low_pages(max_low_pfn);

  /*
  Reserve the pages that are being used to store the bitmap representing the 
  pages
   */
  reserve_bootmem(HIGH_MEMORY, (PFN_PHYS(start_pfn) +
           bootmap_size + PAGE_SIZE-1) - (HIGH_MEMORY));

  /*
  Reserve page 0 as it is often a special page used by the bios
   */
  reserve_bootmem(0, PAGE_SIZE);

  /* reserve EBDA region, it's a 4K region */
  reserve_ebda_region();

   /* could be an AMD 768MPX chipset. Reserve a page  before VGA to prevent
      PCI prefetch into it (errata #56). Usually the page is reserved anyways,
      unless you have no PS/2 mouse plugged in. */
  if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
      boot_cpu_data.x86 == 6)
      reserve_bootmem(0xa0000 - 4096, 4096);


#ifdef CONFIG_SMP

  /*
   * But first pinch a few for the stack/trampoline stuff
   * FIXME: Don't need the extra page at 4K, but need to fix
   * trampoline before removing it. (see the GDT stuff)
   */
  reserve_bootmem(PAGE_SIZE, PAGE_SIZE);

#endif


#ifdef CONFIG_ACPI_SLEEP

  /*
   * Reserve low memory region for sleep support.
   */
  acpi_reserve_bootmem();

#endif


#ifdef CONFIG_X86_FIND_SMP_CONFIG

  /*
   * Find and reserve possible boot-time SMP configuration:
   */
  find_smp_config();

#endif


...
  return max_low_pfn;
}
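
The page frame arithmetic in setup_memory() is easy to check in isolation. The sketch below reuses the PFN_UP/PFN_DOWN/PFN_PHYS macro definitions as I remember them from setup.c; `struct pfn_layout`, `layout_sketch` and the fixed 896M low-memory limit are assumptions made for this example (the real find_max_low_pfn() handles more cases, such as kernels built without CONFIG_HIGHMEM).

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE (1ul << PAGE_SHIFT)

#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)  /* round up to a frame */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)                    /* round down to a frame */
#define PFN_PHYS(x) ((x) << PAGE_SHIFT)                    /* frame number -> address */

/* Assumed low-memory limit: 896M expressed in page frames. */
#define MAXMEM_PFN_SKETCH PFN_DOWN(896ul << 20)

struct pfn_layout {
    unsigned long start_pfn;      /* first frame after the provisional page tables */
    unsigned long max_low_pfn;    /* end of low memory */
    unsigned long highstart_pfn;  /* start of high memory */
    unsigned long highend_pfn;    /* end of high memory */
};

/* init_pg_tables_end: physical end of the provisional page tables.
 * max_pfn: number of page frames of installed RAM. */
struct pfn_layout layout_sketch(unsigned long init_pg_tables_end,
                                unsigned long max_pfn)
{
    struct pfn_layout l;

    l.start_pfn = PFN_UP(init_pg_tables_end);
    l.max_low_pfn = max_pfn < MAXMEM_PFN_SKETCH ? max_pfn : MAXMEM_PFN_SKETCH;
    l.highstart_pfn = l.highend_pfn = max_pfn;
    if (max_pfn > l.max_low_pfn)
        l.highstart_pfn = l.max_low_pfn;
    return l;
}
```

With 1G of RAM (262144 frames), low memory ends at frame 229376 (896M) and high memory covers frames 229376 to 262144. With 512M of RAM, highstart_pfn equals highend_pfn, so there is no high memory.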
2.1.2. arch\i386\kernel\setup.c::paging_init

Linux memory management: an introduction to highmemory
Describing Physical Memory
Page Table Management

//kernel page table initialization
/*
 * paging_init() sets up the page tables - note that the first 8MB are
 * already mapped by head.S.
 *
 * This routines also unmaps the page at virtual kernel address 0, so
 * that we can trap those pesky NULL-reference errors in the kernel.
 */
void __init paging_init(void)
{
...
      //pagetable_init() is responsible for setting up a static page table using 
      //swapper_pg_dir as the PGD
      //builds the page tables that map the first 896M of kernel space
  pagetable_init();//1

  load_cr3(swapper_pg_dir);//load the page directory base address into cr3
...
  /*
  This function only exists if CONFIG_HIGHMEM is set during compile time. It is 
  responsible for caching where the beginning of the kmap region is, the PTE 
  referencing it and the protection for the page tables. This means the PGD will 
  not have to be checked every time kmap() is used.
  */
  //initialises the region of pagetables reserved for use with kmap() 
  //kmap_pte: the page table entry corresponding to FIX_KMAP_BEGIN; entries are set from the top down.
  kmap_init();//see the comments below
   //--------------load_cr3 and kmap_init-------------- 
    /*
  #define load_cr3(pgdir) \
  asm volatile("movl %0,%%cr3": :"r" (__pa(pgdir)))

  #define __pa(x)     ((unsigned long)(x)-PAGE_OFFSET)      
  */

    /*
  void __init kmap_init(void)
{
  unsigned long kmap_vstart;

  //cache the first kmap pte 
  kmap_vstart = __fix_to_virt(FIX_KMAP_BEGIN);
  kmap_pte = kmap_get_fixmap_pte(kmap_vstart);

  kmap_prot = PAGE_KERNEL;
}


#define kmap_get_fixmap_pte(vaddr)                      \
     pte_offset_kernel(pmd_offset(pgd_offset_k(vaddr), (vaddr)), (vaddr))
*/


  //This is the top-level function which is used to initialise each of the  
  //zones. The size of the zones in PFNs was discovered during setup_memory().
  zone_sizes_init();//2

}


--------------The code below can be skipped if you are not interested--------------

//-----------------1.pagetable_init-----------------
/*
This function is responsible for statically initialising a pagetable starting with 
a statically defined PGD called swapper_pg_dir. 
At the very least, a PTE will be available that points to every page frame in 
ZONE_NORMAL.
*/  

static void __init pagetable_init (void)
{
  unsigned long vaddr;
  pgd_t *pgd_base = swapper_pg_dir;
...
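  //initializes the kernel page table entries for low memory in the kernel linear space and maps them to physical pages
  kernel_physical_mapping_init(pgd_base);//1.1
...

//At this point, page table entries have been setup which reference all parts of 
//ZONE_NORMAL. The remaining regions needed are those for fixed mappings and 
//those needed for mapping high memory pages with kmap().

  //initialization of the fixed-mapping area and the permanent-mapping area

  /*
   * Fixed mappings, only the page table structure has to be
   * created - mappings will be set by set_fixmap():
   */
    /*
  Page table entries may already have been allocated for the fixed-mapping area earlier, and the
  temporary kernel mapping range needs its page tables to be contiguous, so contiguous page table
  space is allocated again here and the original page table contents are copied into it. Note
  that, unlike the low-memory page tables, these page tables are only allocated: their PTEs are
  not initialized. That is left to the code for each part of the fixed-mapping area, which later
  calls set_fixmap() to associate the fixed-mapping page tables with physical memory.    

  Reference: http://blog.csdn.net/hanchaoman/article/details/6942140
  */
  vaddr = __fix_to_virt(__end_of_fixed_addresses - 1) & PMD_MASK;
  //this function does not modify pgd_base
  page_table_range_init(vaddr, 0, pgd_base);//1.2 


  //calls page_table_range_init internally and, as above, copies the original page table into a new page;
  //the page frames for the new page tables are allocated with alloc_bootmem_low_pages(PAGE_SIZE)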
  permanent_kmaps_init(pgd_base);//1.3
...
}
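
The address computed by `__fix_to_virt()` counts pages downwards from the top of the address space. A minimal sketch, assuming FIXADDR_TOP is 0xfffff000 (the value I believe i386 used in this kernel generation, but treat it as an assumption) and non-PAE paging where PMD_MASK clears the low 22 bits; the `_sketch` names are invented for this example.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define FIXADDR_TOP_SKETCH 0xfffff000u  /* assumed top of the fixmap area */
#define PMD_MASK_SKETCH    0xffc00000u  /* assumed non-PAE: one PMD covers 4M */

/* Fixmap index -> linear address: index 0 is the highest page. */
uint32_t fix_to_virt_sketch(unsigned idx)
{
    return FIXADDR_TOP_SKETCH - ((uint32_t)idx << PAGE_SHIFT);
}

/* Linear address inside the fixmap area -> fixmap index. */
unsigned virt_to_fix_sketch(uint32_t vaddr)
{
    return (FIXADDR_TOP_SKETCH - (vaddr & ~0xfffu)) >> PAGE_SHIFT;
}

/* Start of the 4M region whose page table must exist for this index,
 * mirroring "__fix_to_virt(...) & PMD_MASK" in pagetable_init(). */
uint32_t fixmap_table_start_sketch(unsigned last_idx)
{
    return fix_to_virt_sketch(last_idx) & PMD_MASK_SKETCH;
}
```

Because larger indices give lower addresses, the last fixmap index yields the lowest address of the area; rounding it down with PMD_MASK gives the start passed to page_table_range_init(), and the end argument 0 means "up to the top of the address space" (the address wraps around to 0).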

//-----------------1.1.kernel_physical_mapping_init-----------------
/*
 * This maps the physical memory to kernel virtual address space, a total 
 * of max_low_pfn pages, by creating page tables starting from address 
 * PAGE_OFFSET.
 */

//linearly maps the first 896M of physical memory to the addresses above PAGE_OFFSET:
//virtual address = physical address + PAGE_OFFSET
static void __init kernel_physical_mapping_init(pgd_t *pgd_base)
{
  unsigned long pfn;
  pgd_t *pgd;
  pmd_t *pmd;
  pte_t *pte;
  int pgd_idx, pmd_idx, pte_ofs;
  ...
}

//-----------------1.2.page_table_range_init-----------------
/*
 * This function initializes a certain range of kernel virtual memory 
 * with new bootmem page tables, everywhere page tables are missing in
 * the given range.
 */

/*
 * NOTE: The pagetables are allocated contiguous on the physical space 
 * so we can cache the place of the first one and move around without 
 * checking the pgd every time.
 */
static void __init page_table_range_init (unsigned long start, unsigned long 
                                          end, pgd_t *pgd_base)
{
  pgd_t *pgd;
  pud_t *pud;
  pmd_t *pmd;
  int pgd_idx, pmd_idx;
  unsigned long vaddr;

  vaddr = start;
  pgd_idx = pgd_index(vaddr);
  pmd_idx = pmd_index(vaddr);
  pgd = pgd_base + pgd_idx;

  for ( ; (pgd_idx < PTRS_PER_PGD) && (vaddr != end); pgd++, pgd_idx++) {
      if (pgd_none(*pgd)) 
          one_md_table_init(pgd);//allocate a new page middle directory and store its physical address in the pgd entry
      pud = pud_offset(pgd, vaddr);
      pmd = pmd_offset(pud, vaddr);
      for (; (pmd_idx < PTRS_PER_PMD) && (vaddr != end); pmd++, pmd_idx++) {
          if (pmd_none(*pmd)) 
              one_page_table_init(pmd);//allocate a new page table and store its physical address in the pmd entry

          vaddr += PMD_SIZE;
      }
      pmd_idx = 0;
  }
}
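
The effect of page_table_range_init() is easiest to see in the non-PAE case, where the pmd level is folded and each page directory slot covers 4M. The sketch below only counts which slots would receive a new page table; the `present` array, the function name and the example addresses (0xff800000 for the PKMap start, 0xffc00000 for the fixmap start) are assumptions for illustration.

```c
#include <stdint.h>

#define PGDIR_SHIFT 22
#define PGDIR_SIZE (1u << PGDIR_SHIFT)   /* 4M per directory slot (non-PAE) */
#define PTRS_PER_PGD 1024u

/* present[i] != 0 means directory slot i already has a page table.
 * Walks [start, end) in 4M steps and "allocates" a table for every empty
 * slot. Returns how many tables were newly allocated. As in the kernel,
 * end == 0 means "until the address wraps around at 4G". */
unsigned range_init_sketch(uint32_t start, uint32_t end,
                           unsigned char present[PTRS_PER_PGD])
{
    unsigned allocated = 0;
    uint32_t vaddr = start;
    unsigned idx = start >> PGDIR_SHIFT;

    for (; idx < PTRS_PER_PGD && vaddr != end; idx++) {
        if (!present[idx]) {
            present[idx] = 1;      /* stands in for one_page_table_init() */
            allocated++;
        }
        vaddr += PGDIR_SIZE;       /* uint32_t arithmetic wraps to 0 at 4G */
    }
    return allocated;
}
```

Running it for an assumed PKMap range of 4M starting at 0xff800000 allocates exactly one page table (slot 1022); running it again over a range that is already covered allocates nothing, which is the "everywhere page tables are missing" behaviour described in the kernel comment.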

//-----------------1.3.permanent_kmaps_init-----------------
void __init permanent_kmaps_init(pgd_t *pgd_base)
{
  pgd_t *pgd;
  pud_t *pud;
  pmd_t *pmd;
  pte_t *pte;
  unsigned long vaddr;

  vaddr = PKMAP_BASE;
  page_table_range_init(vaddr, vaddr + PAGE_SIZE*LAST_PKMAP, pgd_base);

  pgd = swapper_pg_dir + pgd_index(vaddr);
  pud = pud_offset(pgd, vaddr);
  pmd = pmd_offset(pud, vaddr);
  pte = pte_offset_kernel(pmd, vaddr);
  pkmap_page_table = pte; 
}

//-----------------2.zone_sizes_init-----------------