Chapter 7. Kernel Infrastructure for Component Initialization

To fully understand a kernel component, you have to know not only what a given set of routines does, but also when those routines are invoked and by whom. The initialization of a subsystem is one of the basic tasks handled by the kernel according to its own model. This infrastructure is worth studying to help you understand how core components of the networking stack are initialized, including NIC device drivers.

The purpose of this chapter is to show how the kernel handles routines used to initialize kernel components, both for components statically included into the kernel and those loaded as kernel modules, with a special emphasis on network devices. We will therefore see:

How initialization functions are named and identified by special macros
How these macros are defined, based on the kernel configuration, to optimize memory usage and make sure that the various initializations are done in the correct order
When and how the functions are executed

We will not cover all details of the initialization infrastructure, but you'll have a sufficient overview to navigate the source code comfortably.

7.1. Boot-Time Kernel Options

Linux allows users to pass kernel configuration options to their boot loaders, which then pass the options to the kernel; experienced users can use this mechanism to fine-tune the kernel at boot time.^[*] During the boot phase, as shown in Figure 5-1 in Chapter 5, the two calls to parse_args take care of the boot-time configuration input. We will see in the next section why parse_args is called twice, with details in the later section "Two-Pass Parsing."

^[*] You can find some documentation and examples of the use of boot options in the Linux BootPrompt HOWTO.

parse_args is a routine that parses an input string with parameters in the form name_variable=value, looking for specific keywords and invoking the right handlers. parse_args is also used when loading a module, to parse the command-line parameters provided (if any).
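As a rough illustration of the name=value form described above, here is a minimal userspace sketch of a parser that splits an option string and hands each pair to a callback. It is not the kernel's parse_args: the names demo_parse_args, demo_param_cb, demo_seen, and demo_record are made up for the example, and quoting and other details of the real routine are ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one parameter name and its value
 * (NULL when the token has no '='). Returns 0 on success. */
typedef int (*demo_param_cb)(const char *name, const char *value, void *ctx);

/* Hypothetical context used by the sample callback below. */
struct demo_seen {
    int n;                   /* number of parameters seen so far */
    const char *last_name;   /* name of the most recent parameter */
    const char *last_value;  /* its value, or NULL */
};

/* Sample callback: remember the most recent name/value pair. */
int demo_record(const char *name, const char *value, void *ctx)
{
    struct demo_seen *seen = ctx;

    seen->n++;
    seen->last_name = name;
    seen->last_value = value;
    return 0;
}

/*
 * Split a writable, space-separated option string into name/value pairs
 * and pass each pair to cb. Returns the number of tokens handled, or -1
 * as soon as cb reports an error.
 */
int demo_parse_args(char *line, demo_param_cb cb, void *ctx)
{
    int count = 0;
    char *p = line;

    while (*p != '\0') {
        char *tok, *eq;
        const char *value = NULL;

        p += strspn(p, " ");        /* skip separators */
        if (*p == '\0')
            break;
        tok = p;
        p += strcspn(p, " ");       /* find the end of this token */
        if (*p != '\0')
            *p++ = '\0';            /* terminate it and step past */

        eq = strchr(tok, '=');
        if (eq != NULL) {
            *eq = '\0';             /* terminate the name in place */
            value = eq + 1;         /* everything after the first '=' */
        }
        if (cb(tok, value, ctx) != 0)
            return -1;
        count++;
    }
    return count;
}
```

The input buffer is modified in place, so it must be writable (a char array rather than a string literal).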

We do not need to know the details of how parse_args implements the parsing, but it is interesting to see how a kernel component can register a handler for a keyword and how the handler is invoked. To have a clear picture we need to learn:

How a kernel component can register a keyword, along with the associated handler that will be executed when that keyword is provided with the boot string.
How the kernel resolves the association between keywords and handlers. I will offer a high-level overview of how the kernel parses the input string.
How the networking device subsystem uses this feature.

All the parsing code is in kernel/params.c. We'll cover the points in the list one by one.

7.1.1. Registering a Keyword

Kernel components can register a keyword and the associated handler with the __setup macro, defined in include/linux/init.h. This is its syntax:

__setup(string, function_handler)

where string is the keyword and function_handler is the associated handler. The example just shown instructs the kernel to execute function_handler when the input boot-time string includes string. string has to end with the = character to make the parsing easier for parse_args. Any text following the = will be passed as input to function_handler.

The following is an example from net/core/dev.c, where netdev_boot_setup is registered as the handler for the netdev= keyword:

__setup("netdev=", netdev_boot_setup);

The same handler can be associated with different keywords. For instance net/ethernet/eth.c registers the same handler, netdev_boot_setup, for the ether= keyword.
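The keyword-to-handler association can be pictured with a small userspace sketch. A plain array stands in for the kernel's registration table, and every demo_ name is hypothetical. It only shows the idea: two keywords share one handler, and the text after the = is what the handler receives.

```c
#include <stddef.h>
#include <string.h>

/* One registration: a keyword ending in '=' and its handler. */
struct demo_keyword {
    const char *str;
    int (*handler)(char *arg);   /* called with the text after '=' */
};

/* Stand-in for a real handler: it only remembers its last argument. */
char demo_last_arg[64];

int demo_netdev_handler(char *arg)
{
    strncpy(demo_last_arg, arg, sizeof(demo_last_arg) - 1);
    demo_last_arg[sizeof(demo_last_arg) - 1] = '\0';
    return 1;
}

/* Two keywords registered with the same handler. */
const struct demo_keyword demo_keywords[] = {
    { "netdev=", demo_netdev_handler },
    { "ether=",  demo_netdev_handler },
};

/*
 * Look for a registered keyword at the start of option. Returns the
 * handler's result on a match, or 0 when no keyword matches.
 */
int demo_dispatch(char *option)
{
    size_t i;

    for (i = 0; i < sizeof(demo_keywords) / sizeof(demo_keywords[0]); i++) {
        size_t n = strlen(demo_keywords[i].str);

        if (strncmp(option, demo_keywords[i].str, n) == 0)
            return demo_keywords[i].handler(option + n);
    }
    return 0;
}
```

Because each keyword ends with =, a simple prefix comparison is enough to tell where the keyword stops and the handler's input begins.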

When a piece of code is compiled as a module, the __setup macro is ignored (i.e., defined as a no-op). You can check how the definition of the __setup macro changes in include/linux/init.h depending on whether the code that includes the latter file is a module.
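The pattern of a registration macro that vanishes in module builds can be sketched as follows. This is not the kernel's definition: DEMO_MODULE, demo_setup, and demo_setup_entry are hypothetical stand-ins chosen for the illustration.

```c
#include <stddef.h>

/* One registration record: keyword plus handler. */
struct demo_setup_entry {
    const char *str;
    int (*fn)(char *);
};

#ifdef DEMO_MODULE
/* "Module" build: the registration expands to nothing. */
#define demo_setup(str, fn)
#else
/* "Built-in" build: each registration emits one record named after
 * the handler, so two registrations never collide. */
#define demo_setup(str, fn) \
    struct demo_setup_entry demo_setup_##fn = { str, fn }
#endif

/* Hypothetical handler: reports whether it was given an argument. */
int demo_handler(char *arg)
{
    return arg != NULL;
}

/* The same source line works in both builds. */
demo_setup("demo=", demo_handler);
```

The point of the design is that the registering code does not need its own conditional: the line that uses the macro stays the same, and only the macro definition changes with the build type.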

The reason why start_kernel calls parse_args twice to parse the boot configuration string is that boot-time options are actually divided into two classes, and each call takes care of one class:

Default options

Most options fall into this category. These options are defined with the __setup macro and are handled by the second call to parse_args.

Early options

Some options need to be handled earlier than others during the kernel boot. The kernel provides the early_param macro to declare these options instead of __setup. They are then taken care of by parse_early_params. The only difference between early_param and __setup is that the former sets a special flag so that the kernel will be able to distinguish between the two cases. The flag is part of the obs_kernel_param data structure that we will see in the section ".init.setup Memory Section."

The handling of boot-time options has changed with the 2.6 kernel, but not all the kernel code has been updated accordingly. Before the latest changes, there used to be only the __setup macro. Because of this, legacy code that is to be updated now uses the macro __obsolete_setup. When the user passes the kernel an option that is declared with the __obsolete_setup macro, the kernel prints a message warning about its obsolete status and provides a pointer to the file and source code line where the latter is declared.

Figure 7-1 summarizes the relationship between the various macros: all of them are wrappers around the generic routine __setup_param.

Note that the input routine passed to __setup is placed into the .init.setup memory section. The effect of this action will become clear in the section "Boot-Time Initialization Routines."

Figure 7-1. __setup_param macro and its wrappers

7.1.2. Two-Pass Parsing

Because boot-time options used to be handled differently in previous kernel versions, and not all of them have been converted to the new model, the kernel handles both models. When the new infrastructure fails to recognize a keyword, it asks the obsolete infrastructure to handle it. If the obsolete infrastructure also fails, the keyword and value are passed on to the init process that will be invoked at the end of the init kernel thread via run_init_process (shown in Figure 5-1 in Chapter 5). The keyword and value are added either to the arg parameter list or to the envp environment variable list.
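The fallback chain just described can be sketched in userspace as follows. All demo_ names and the keyword "demo.level=" are hypothetical. The rule used to choose between the argument list and the environment list (options containing = go to the environment, bare words go to the arguments) is an assumption made for the example, since the text above does not spell it out.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_INIT 8

/* Lists that would be handed to the init process (NULL-terminated). */
const char *demo_argv[DEMO_MAX_INIT + 1];
const char *demo_envp[DEMO_MAX_INIT + 1];

static int demo_has_prefix(const char *opt, const char *prefix)
{
    return strncmp(opt, prefix, strlen(prefix)) == 0;
}

/* Stand-ins for the two infrastructures; each returns 1 when it
 * recognizes the option. */
int demo_new_model(const char *opt)
{
    return demo_has_prefix(opt, "demo.level=");
}

int demo_obsolete_model(const char *opt)
{
    return demo_has_prefix(opt, "ether=");
}

/* Append opt to the first free slot of list; -1 when the list is full. */
static int demo_append(const char **list, const char *opt)
{
    int i;

    for (i = 0; i < DEMO_MAX_INIT; i++) {
        if (list[i] == NULL) {
            list[i] = opt;
            return 0;
        }
    }
    return -1;
}

/*
 * Returns 0 if the new model handled the option, 1 if the obsolete
 * model did, 2 if it was queued for init, and -1 if there was no room.
 */
int demo_handle_option(const char *opt)
{
    if (demo_new_model(opt))
        return 0;
    if (demo_obsolete_model(opt))
        return 1;
    /* Assumed rule: name=value goes to the environment, a bare word
     * goes to the argument list. */
    if (demo_append(strchr(opt, '=') ? demo_envp : demo_argv, opt) != 0)
        return -1;
    return 2;
}
```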

The previous section explained that, to allow early options to be handled in the necessary order, boot-string parsing and handler invocation are handled in two passes, shown in Figure 7-2 (the figure shows a snapshot from start_kernel, introduced in Chapter 5):

The first pass looks only for higher-priority options that must be handled early, which are identified by a special flag (early).
The second pass takes care of all other options. Most of the options fall into this category. All options following the obsolete model are handled in this pass.
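The two passes can be illustrated with a small userspace sketch. The table, the demo_ names, and the choice of which keyword is flagged as early are all made up for the example. What it shows is that an early option is handled first even when it appears later in the boot string.

```c
#include <stddef.h>
#include <string.h>

struct demo_opt {
    const char *str;        /* keyword, ending in '=' */
    int (*fn)(char *arg);   /* handler */
    int early;              /* nonzero: handle in the first pass */
};

/* Records the order in which handlers ran, one letter per call. */
char demo_order[8];
size_t demo_order_len;

static void demo_note(char tag)
{
    if (demo_order_len < sizeof(demo_order) - 1)
        demo_order[demo_order_len++] = tag;
}

int demo_early_fn(char *arg)  { (void)arg; demo_note('X'); return 1; }
int demo_ether_fn(char *arg)  { (void)arg; demo_note('E'); return 1; }
int demo_netdev_fn(char *arg) { (void)arg; demo_note('N'); return 1; }

const struct demo_opt demo_opts[] = {
    { "ether=",    demo_ether_fn,  0 },
    { "netdev=",   demo_netdev_fn, 0 },
    { "earlyopt=", demo_early_fn,  1 },   /* hypothetical early option */
};

/* Run the handlers of matching options whose early flag equals want_early. */
void demo_pass(char *const tokens[], size_t ntok, int want_early)
{
    size_t t, i;

    for (t = 0; t < ntok; t++) {
        for (i = 0; i < sizeof(demo_opts) / sizeof(demo_opts[0]); i++) {
            size_t n = strlen(demo_opts[i].str);

            if (!demo_opts[i].early != !want_early)
                continue;
            if (strncmp(tokens[t], demo_opts[i].str, n) == 0)
                demo_opts[i].fn(tokens[t] + n);
        }
    }
}

/* First pass: early options only. Second pass: everything else. */
void demo_boot(char *const tokens[], size_t ntok)
{
    demo_pass(tokens, ntok, 1);
    demo_pass(tokens, ntok, 0);
}
```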

The second pass first checks whether there is a match with the options implemented according to the new infrastructure. These options are stored in kernel_param data structures, filled in by the module_param macro introduced in the section "Module Options" in Chapter 5. The same macro makes sure that all of those data structures are placed into a specific memory section (__param), delimited by the pointers __start___param and __stop___param.

When one of these options is recognized, the associated parameter is initialized to the value provided with the boot string. When there is no match for an option, unknown_bootoption tries to see whether the option should be handled by the obsolete model handler (Figure 7-2).

Figure 7-2. Two-pass option parsing

Obsolete and new model options are placed into two different memory areas:

__setup_start ... __setup_end

We will see in a later section that this area is freed at the end of the boot phase: once the kernel has booted, these options are not needed anymore. The user cannot view or change them at runtime.

__start___param ... __stop___param

This area is not freed. Its content is exported to /sys, where the options are exposed to the user.

See Chapter 5 for more details on module parameters.

Also note that all obsolete model options, regardless of whether they have the early flag set, are placed into the __setup_start ... __setup_end memory area.

7.1.3. .init.setup Memory Section

The two inputs to the __setup macro we introduced in the previous section are placed into a data structure of type obs_kernel_param, defined in include/linux/init.h:

struct obs_kernel_param {
    const char *str;
    int (*setup_func)(char*);
    int early;
};

str is the keyword, setup_func is the handler, and early is the flag we introduced in the section "Two-Pass Parsing."

The __setup_param macro places all of the obs_kernel_param instances into a dedicated memory area. This is done mainly for two reasons:

It is easier to walk through all of the instances, for instance when doing a lookup based on the str keyword. We will see how the kernel uses the two pointers __setup_start and __setup_end, which point respectively to the start and end of the previously mentioned area (as shown later in Figure 7-3), when doing a keyword lookup.
The kernel can quickly free all of the data structures when they are not needed anymore. We will go back to this point in the section "Memory Optimizations."
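The idea of collecting scattered records into one memory area and walking it between two boundary pointers can be tried in userspace. The sketch below assumes GCC or Clang with GNU ld on an ELF platform such as Linux, where the linker provides __start_ and __stop_ symbols for any section whose name is a valid C identifier. The section name demo_setup and all demo_ identifiers are hypothetical, and this is not how the kernel's own linker script defines its boundaries.

```c
#include <stddef.h>
#include <string.h>

struct demo_param {
    const char *str;
    int (*setup_func)(char *);
    int early;
};

/*
 * Emit one record into the "demo_setup" section. The explicit alignment
 * keeps the compiler from spacing the records further apart than
 * sizeof(struct demo_param), so the section can be walked as an array.
 */
#define DEMO_SETUP(str_, fn_)                                   \
    static struct demo_param demo_param_##fn_                   \
    __attribute__((used, section("demo_setup"),                 \
                   aligned(sizeof(void *)))) = { str_, fn_, 0 }

/* Section boundaries, provided by the linker (toolchain assumption). */
extern struct demo_param __start_demo_setup[];
extern struct demo_param __stop_demo_setup[];

int demo_seen_alpha, demo_seen_beta;

int demo_alpha(char *arg) { demo_seen_alpha = (int)strlen(arg); return 1; }
int demo_beta(char *arg)  { demo_seen_beta = (int)strlen(arg); return 1; }

/* Registrations can sit anywhere in the file, or in other files. */
DEMO_SETUP("alpha=", demo_alpha);
DEMO_SETUP("beta=", demo_beta);

/* Number of records the linker gathered into the section. */
size_t demo_count(void)
{
    return (size_t)(__stop_demo_setup - __start_demo_setup);
}

/* Keyword lookup: walk the section from its start to its end.
 * Returns the handler's result, or -1 when no keyword matches. */
int demo_lookup(char *option)
{
    struct demo_param *p;

    for (p = __start_demo_setup; p < __stop_demo_setup; p++) {
        size_t n = strlen(p->str);

        if (strncmp(option, p->str, n) == 0)
            return p->setup_func(option + n);
    }
    return -1;
}
```

No central table has to be edited when a new keyword is added, and the whole set of records occupies one contiguous range, which is what makes both the lookup and a later bulk release straightforward.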

7.1.4. Use of Boot Options to Configure Network Devices

In light of what we saw in the previous sections, let's see how the networking code uses boot options.

We already mentioned in the section "Registering a Keyword" that both the ether= and netdev= keywords are registered to use the same handler, netdev_boot_setup. When this handler is invoked to process the input parameters (i.e., the string that follows the matching keyword), it stores the result into data structures of type netdev_boot_setup, defined in include/linux/netdevice.h. The handler and the data structure type happen to share the same name, so make sure you do not confuse the two.

struct netdev_boot_setup {
    char name[IFNAMSIZ];
    struct ifmap map;
};

name is the device's name, and ifmap, defined in include/linux/if.h, is the data structure that stores the input configuration:

struct ifmap
{
    unsigned long mem_start;
    unsigned long mem_end;
    unsigned short base_addr;
    unsigned char irq;
    unsigned char dma;
    unsigned char port;
    /* 3 bytes spare */
};

The same keyword can be provided multiple times (for different devices) in the boot-time string, as in the following example:

LILO: linux ether=5,0x260,eth0 ether=15,0x300,eth1

However, the maximum number of devices that can be configured at boot time with this mechanism is NETDEV_BOOT_SETUP_MAX, which is also the size of the static array dev_boot_setup used to store the configurations:

static struct netdev_boot_setup dev_boot_setup[NETDEV_BOOT_SETUP_MAX];

netdev_boot_setup is pretty simple: it extracts the input parameters from the string, fills in an ifmap structure, and adds the latter to the dev_boot_setup array with netdev_boot_setup_add.
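The parse-and-store step can be approximated in userspace as shown below. This is a sketch, not the kernel's code: the demo_ names are hypothetical, the structures are trimmed copies of the ones shown earlier, and the field order (irq, base_addr, then optionally mem_start and mem_end, followed by the device name) is an assumption based on the ether=5,0x260,eth0 example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_IFNAMSIZ       16
#define DEMO_BOOT_SETUP_MAX 8

struct demo_ifmap {
    unsigned long mem_start;
    unsigned long mem_end;
    unsigned short base_addr;
    unsigned char irq;
};

struct demo_boot_setup {
    char name[DEMO_IFNAMSIZ];
    struct demo_ifmap map;
};

/* Fixed-size table; an empty name marks a free slot. */
struct demo_boot_setup demo_table[DEMO_BOOT_SETUP_MAX];

/* Store one configuration in the first free slot.
 * Returns 1 on success, 0 when the table is full. */
int demo_boot_setup_add(const char *name, const struct demo_ifmap *map)
{
    int i;

    for (i = 0; i < DEMO_BOOT_SETUP_MAX; i++) {
        if (demo_table[i].name[0] == '\0') {
            strncpy(demo_table[i].name, name, DEMO_IFNAMSIZ - 1);
            demo_table[i].name[DEMO_IFNAMSIZ - 1] = '\0';
            demo_table[i].map = *map;
            return 1;
        }
    }
    return 0;
}

/* Parse "irq,base_addr[,mem_start[,mem_end]],name" and store it.
 * Returns 1 on success, 0 on a missing name or a full table. */
int demo_boot_setup(const char *str)
{
    unsigned long v[4] = { 0, 0, 0, 0 };
    struct demo_ifmap map;
    int n = 0;

    /* Consume up to four numbers that are each followed by a comma. */
    while (n < 4) {
        char *end;
        unsigned long x = strtoul(str, &end, 0);

        if (end == str || *end != ',')
            break;              /* not "number," so the name starts here */
        v[n++] = x;
        str = end + 1;
    }
    if (*str == '\0')
        return 0;               /* no device name given */

    memset(&map, 0, sizeof(map));
    map.irq       = (unsigned char)v[0];
    map.base_addr = (unsigned short)v[1];
    map.mem_start = v[2];
    map.mem_end   = v[3];
    return demo_boot_setup_add(str, &map);
}
```

Passing base 0 to strtoul lets the same call accept both decimal values such as 5 and hexadecimal values such as 0x260.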

At the end of the booting phase, the networking code can use the netdev_boot_setup_check function to check whether a given interface is associated with a boot-time configuration. The lookup on the array dev_boot_setup is based on the device name dev->name:

int netdev_boot_setup_check(struct net_device *dev)
{
    struct netdev_boot_setup *s = dev_boot_setup;
    int i;

    for (i = 0; i < NETDEV_BOOT_SETUP_MAX; i++) {
        /* Skip unused slots, then compare against the stored name. */
        if (s[i].name[0] != '\0' && s[i].name[0] != ' ' &&
            !strncmp(dev->name, s[i].name, strlen(s[i].name))) {
            dev->irq        = s[i].map.irq;
            dev->base_addr  = s[i].map.base_addr;
            dev->mem_start  = s[i].map.mem_start;
            dev->mem_end    = s[i].map.mem_end;
            return 1;
        }
    }
    return 0;
}
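One detail of the listing above is worth isolating: the comparison length is the length of the stored name, so only that many leading characters of dev->name are compared. The following userspace sketch (demo_name_matches is a hypothetical helper, not a kernel function) reproduces just that condition so its behavior can be checked on a few names.

```c
#include <string.h>

/*
 * Same condition as in the lookup loop: the entry must be in use, and
 * the first strlen(entry) characters of dev_name must equal entry.
 */
int demo_name_matches(const char *dev_name, const char *entry)
{
    return entry[0] != '\0' && entry[0] != ' ' &&
           strncmp(dev_name, entry, strlen(entry)) == 0;
}
```

With this condition, a stored name of eth1 matches a device named eth1 and also one named eth10, while an empty slot never matches.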

Devices with special capabilities, features, or limitations can define their own keywords and handlers if they need additional parameters on top of the basic ones provided by ether= and netdev= (one driver that does this is PLIP).