e820 -- retrieve memory map from BIOS

e820是x86系统中BIOS报告物理内存布局的方式,通过int 15h调用访问。Linux内核通过detect_memory_e820()获取e820表,并在default_machine_specific_memory_setup()中进行复制和排序。此外,本文还介绍了用于操作和排序e820表的重要API,如e820_add_region()、e820_search_gap()等。
摘要由CSDN通过智能技术生成

e820 – retrieve memory map from BIOS

What is e820

Definition from Wikipedia:

e820 is shorthand to refer to the facility by which the BIOS of x86-based computer systems reports the memory map to the operating system or boot loader.

It is accessed via the int 15h call, by setting the AX register to value E820 in hexadecimal. It reports which memory address ranges are usable and which are reserved for use by the BIOS.

Let me summarize it:

e820 is a kind of hardware to detects physical memory arrangement
and
software can talk with it by int 15th

This means on x86 architecture, any system what to know the memory layout, it
needs to talk to e820. Different architectures use different mechanism to
detect memory layout. For example, on ppc64, it uses device tree.

e820 Table looks like this

If you are using linux, you could type following command in your terminal:

dmesg | grep e820

Then you will see:

[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000003ffdffff] usable
[    0.000000] BIOS-e820: [mem 0x000000003ffe0000-0x000000003fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved

Yes, it is simple. It is just a table. And each entry represents a memory
range with start, end and type. As you can image, the data structure of the
entry is self explain.

struct e820entry {
    __u64 addr; /* start of memory segment */
    __u64 size; /* size of memory segment */
    __u32 type; /* type of memory segment */
} __attribute__((packed));

How linux kernel utilize this

Take a look at the kernel code to see how it works~

First, retrieve the e820 table from BIOS

The function access e820 and retrieves the memory layout is
detect_memory_e820().

static int detect_memory_e820(void)
{
    int count = 0;
    struct biosregs ireg, oreg;
    struct e820entry *desc = boot_params.e820_map;
    static struct e820entry buf; /* static so it is zeroed */

    initregs(&ireg);
    ireg.ax  = 0xe820;
    ireg.cx  = sizeof buf;
    ireg.edx = SMAP;
    ireg.di  = (size_t)&buf;

    /*
     * Note: at least one BIOS is known which assumes that the
     * buffer pointed to by one e820 call is the same one as
     * the previous call, and only changes modified fields.  Therefore,
     * we use a temporary buffer and copy the results entry by entry.
     *
     * This routine deliberately does not try to account for
     * ACPI 3+ extended attributes.  This is because there are
     * BIOSes in the field which report zero for the valid bit for
     * all ranges, and we don't currently make any use of the
     * other attribute bits.  Revisit this if we see the extended
     * attribute bits deployed in a meaningful way in the future.
     */

    do {
        intcall(0x15, &ireg, &oreg);
        ireg.ebx = oreg.ebx; /* for next iteration... */

        /* BIOSes which terminate the chain with CF = 1 as opposed
           to %ebx = 0 don't always report the SMAP signature on
           the final, failing, probe. */
        if (oreg.eflags & X86_EFLAGS_CF)
            break;

        /* Some BIOSes stop returning SMAP in the middle of
           the search loop.  We don't know exactly how the BIOS
           screwed up the map at that point, we might have a
           partial map, the full map, or complete garbage, so
           just return failure. */
        if (oreg.eax != SMAP) {
            count = 0;
            break;
        }

        *desc++ = buf;
        count++;
    } while (ireg.ebx && count < ARRAY_SIZE(boot_params.e820_map));

    return boot_params.e820_entries = count;
}

The logic is also simple, it iterates on the BIOS table with int 0x15 and
stores the information to boo_params.e820_map.

Second, store and sort e820 table to kernel area

As we can see from the previous code, the memory layout is stored in
boot_param, which is not a safe place. And the raw table is not sorted. So
kernel copy the range to a safe place and sorts it.

This is done is function default_machine_specific_memory_setup().

char *__init default_machine_specific_memory_setup(void)
{
    char *who = "BIOS-e820";
    u32 new_nr;
    /*
     * Try to copy the BIOS-supplied E820-map.
     *
     * Otherwise fake a memory map; one section from 0k->640k,
     * the next section from 1mb->appropriate_mem_k
     */
    new_nr = boot_params.e820_entries;
    sanitize_e820_map(boot_params.e820_map,
            ARRAY_SIZE(boot_params.e820_map),
            &new_nr);
    boot_params.e820_entries = new_nr;
    if (append_e820_map(boot_params.e820_map, boot_params.e820_entries)
      < 0) {
        u64 mem_size;

        /* compare results from other methods and take the greater */
        if (boot_params.alt_mem_k
            < boot_params.screen_info.ext_mem_k) {
            mem_size = boot_params.screen_info.ext_mem_k;
            who = "BIOS-88";
        } else {
            mem_size = boot_params.alt_mem_k;
            who = "BIOS-e801";
        }

        e820.nr_map = 0;
        e820_add_region(0, LOWMEMSIZE(), E820_RAM);
        e820_add_region(HIGH_MEMORY, mem_size << 10, E820_RAM);
    }

    /* In case someone cares... */
    return who;
}

The details are hidden in the following two functions:
1. sanitize_e820_map(), sort e820 table
2. append_e820_map(), copy e820 table in boot_param to safe place

Take a big picture

It is time to see at which stage linux kernel accomplish the upper two tasks.

    /* Retrieve the e820 table from BIOS */
    main(), arch/x86/boot/main.c, called from arch/x86/boot/header.S
          detect_memory()
               detect_memory_e820(), save bios info to boot_params.e820_entries

    /* store and sort e820 table to kernel area */
    start_kernel()
          setup_arch()
               setup_memory_map(), 
                     default_machine_specific_memory_setup(), store and sort e820 table
                     e820_print_map()

The final function e820_print_map() is the one who print the message we
grabbed by “dmesg | grep e820”. Well, that’s it~

Some important APIs

I would like to talk more about some APIs of e820, which make e820 an
infrastructure to other part of the kernel. All these functions are defined in
“arch/x86/kernel/e820.c”.

e820_add_region()/e820_remove_range()

These two functions are a pair to manipulate the e820 map. And importantly
they don’t care about the order of the e820 map.

__e820_update_range()

This one update the “type” of a memory range.

For example, from E820_RAM to E820_RESERVED.

e820_search_gap()

This one looks for a “gap” in physical memory, which will be used in device
MMIO.

memblock_x86_fill()

This function fill the memblock layer with e820 input. This is a very
important function which builds a generic memory management layer at the very
beginning of linux kernel boots.

We will touch the memblock in another thread.

Some interesting algorithms

sanitize_e820_map()

This is a function which sorts or sanitize the original e820 map from BIOS.
And, it is interesting.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值