Linux Filesystem Analysis and Practice

渣哥笔记

Article link: https://blog.csdn.net/dragonbody/article/details/103899712

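Overview

To quote Mao Decao in "Linux 内核源代码情景分析" (Linux Kernel Source Code Scenario Analysis):

If one asks what the most important components of an "operating system" are, the answer has to be process management and the filesystem. In fact, some operating systems (such as some "embedded" systems) may have process management but no filesystem, while others (such as MSDOS) have a filesystem but no process management. But if both are missing, the result cannot be called an "operating system".

This shows how important the filesystem is to an operating system.

In the usual sense, a filesystem means something like FAT32, NTFS, or ext4: a method for organizing and managing data on a storage medium. This can be called a filesystem in the narrow sense.

In Linux the filesystem does far more than that. The basic philosophy of Linux is "everything is a file": regular files, directories, links, character devices, block devices, FIFOs, and sockets are all tied to the filesystem.

Linux also implements a number of special filesystems for system management, for example ramfs, tmpfs, rootfs (the initial root directory), procfs (for /proc), devtmpfs (for /dev), sysfs (for /sys), and debugfs (for debugging).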

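Data Structures

dentry

In Linux a directory is also a file, and its content is a set of directory entries. Each entry represents a subdirectory or a file and carries the file name, mode, owner, size, modification time, and so on. Beyond this basic information it also holds the index information needed to reach the file's content, so a directory entry is the entry point for accessing a directory or file.

Linux uses two data structures to represent a directory entry: dentry and inode. The dentry holds the file name and a pointer to the inode; the inode holds everything else. Why the split? To support hard links. Creating a hard link only creates a new dentry that points to the inode of the linked file. The two files can live in different directories and have different names, but they refer to the same file: everything except the name is shared and exists only once.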
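The name/inode split can be pictured with a small user-space model. This is only a sketch: toy_inode, toy_dentry, and toy_link are hypothetical names invented for illustration and are far simpler than the kernel structures shown next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's struct inode/dentry. */
struct toy_inode {
    unsigned long ino;    /* inode number */
    unsigned int  nlink;  /* number of names (dentries) referring to it */
    long          size;   /* everything except the name lives here */
};

struct toy_dentry {
    const char       *name;   /* the only per-name data */
    struct toy_inode *inode;  /* shared between hard links */
};

/* Give an existing inode a first or an additional name. */
static struct toy_dentry toy_link(struct toy_inode *inode, const char *name)
{
    struct toy_dentry d;

    assert(inode != NULL && name != NULL);
    d.name = name;
    d.inode = inode;
    inode->nlink++;
    return d;
}
```

Calling toy_link twice on the same toy_inode yields two dentries with different names whose inode pointers are equal, so a size change made through one name is visible through the other.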

struct dentry_operations {
    int (*d_revalidate)(struct dentry *, unsigned int);
    int (*d_weak_revalidate)(struct dentry *, unsigned int);
    int (*d_hash)(const struct dentry *, struct qstr *);
    int (*d_compare)(const struct dentry *,
            unsigned int, const char *, const struct qstr *);
    int (*d_delete)(const struct dentry *);
    int (*d_init)(struct dentry *);
    void (*d_release)(struct dentry *);
    void (*d_prune)(struct dentry *);
    void (*d_iput)(struct dentry *, struct inode *);
    char *(*d_dname)(struct dentry *, char *, int);
    struct vfsmount *(*d_automount)(struct path *);
    int (*d_manage)(const struct path *, bool);
    struct dentry *(*d_real)(struct dentry *, const struct inode *,
                 unsigned int);
} ____cacheline_aligned;

struct dentry {
    struct dentry *d_parent;    /* parent directory */
    struct qstr d_name;
    struct inode *d_inode;      /* Where the name belongs to - NULL is
                     * negative */
    unsigned char d_iname[DNAME_INLINE_LEN];    /* small names */
    const struct dentry_operations *d_op;
    struct super_block *d_sb;   /* The root of the dentry tree */
    // other members omitted...
};

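Besides the file name and the inode pointer, a dentry contains d_sb, a pointer to the super block, and d_op, a pointer to a dentry_operations table. The super block is the entry point of the whole filesystem; dentry_operations is a set of functions that operate on dentries.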

inode

Like the dentry, the inode holds more than the file's basic information: i_op points to an inode_operations table, i_sb points to the super_block, and i_fop points to a file_operations table.

struct inode {
    umode_t         i_mode;
    unsigned short      i_opflags;
    kuid_t          i_uid;
    kgid_t          i_gid;
    unsigned int        i_flags;

    const struct inode_operations   *i_op;
    struct super_block  *i_sb;
    struct address_space    *i_mapping;

    /* Stat data, not accessed from path walking */
    unsigned long       i_ino;

    dev_t           i_rdev;
    loff_t          i_size;
    struct timespec     i_atime;
    struct timespec     i_mtime;
    struct timespec     i_ctime;
    spinlock_t      i_lock; /* i_blocks, i_bytes, maybe i_size */
    unsigned short          i_bytes;
    unsigned int        i_blkbits;
    blkcnt_t        i_blocks;

    const struct file_operations    *i_fop; /* former ->i_op->default_file_ops */
    // other members omitted...
};

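inode_operations is a set of functions that operate on inodes: lookup finds a subdirectory or file inside a directory; create, link, and unlink create and remove files; mkdir and rmdir create and remove directories; mknod creates device files; setattr and getattr set and read attributes; and so on.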

struct inode_operations {
    struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
    const char * (*get_link) (struct dentry *, struct inode *, struct delayed_call *);
    int (*permission) (struct inode *, int);
    struct posix_acl * (*get_acl)(struct inode *, int);
    int (*readlink) (struct dentry *, char __user *,int);
    int (*create) (struct inode *,struct dentry *, umode_t, bool);
    int (*link) (struct dentry *,struct inode *,struct dentry *);
    int (*unlink) (struct inode *,struct dentry *);
    int (*symlink) (struct inode *,struct dentry *,const char *);
    int (*mkdir) (struct inode *,struct dentry *,umode_t);
    int (*rmdir) (struct inode *,struct dentry *);
    int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
    int (*rename) (struct inode *, struct dentry *,
            struct inode *, struct dentry *, unsigned int);
    int (*setattr) (struct dentry *, struct iattr *);
    int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
    ssize_t (*listxattr) (struct dentry *, char *, size_t);
    int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
              u64 len);
    int (*update_time)(struct inode *, struct timespec *, int);
    int (*atomic_open)(struct inode *, struct dentry *,
               struct file *, unsigned open_flag,
               umode_t create_mode, int *opened);
    int (*tmpfile) (struct inode *, struct dentry *, umode_t);
    int (*set_acl)(struct inode *, struct posix_acl *, int);
} ____cacheline_aligned;

file_operations

file_operations is a set of functions that operate on open files: open and release handle opening and closing, read and write handle I/O, and so on.

struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
    ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
    int (*iterate) (struct file *, struct dir_context *);
    int (*iterate_shared) (struct file *, struct dir_context *);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
    long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *, fl_owner_t id);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, loff_t, loff_t, int datasync);
    int (*fasync) (int, struct file *, int);
    int (*lock) (struct file *, int, struct file_lock *);
    ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
    unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
    int (*check_flags)(int);
    int (*flock) (struct file *, int, struct file_lock *);
    // other members omitted...
};
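All of these *_operations structures follow the same pattern: a table of function pointers that a generic layer calls through. A minimal user-space sketch of that pattern follows. The toy_ names and the memfile "filesystem" are hypothetical, and returning -EINVAL for a missing operation is an assumption made for the demo, not a statement about every kernel path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Hypothetical, much smaller cousin of struct file_operations. */
struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t len);
    long (*write)(struct toy_file *f, const char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_operations *f_op;
    const char *data;   /* backing content for the demo */
};

/* One concrete file type: read copies from f->data, write is absent. */
static long memfile_read(struct toy_file *f, char *buf, size_t len)
{
    size_t n = strlen(f->data);

    if (n > len)
        n = len;
    memcpy(buf, f->data, n);
    return (long)n;
}

static const struct toy_file_operations memfile_fops = {
    .read = memfile_read,
    /* .write left NULL: this file type is read-only */
};

/* Generic layer: dispatch through the table, reject missing operations. */
static long toy_vfs_read(struct toy_file *f, char *buf, size_t len)
{
    assert(f != NULL && f->f_op != NULL);
    if (!f->f_op->read)
        return -EINVAL;
    return f->f_op->read(f, buf, len);
}

static long toy_vfs_write(struct toy_file *f, const char *buf, size_t len)
{
    assert(f != NULL && f->f_op != NULL);
    if (!f->f_op->write)
        return -EINVAL;
    return f->f_op->write(f, buf, len);
}
```

The generic functions never need to know which file type they are dealing with; a new file type only has to supply its own table.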

super_block

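The super block is the main entry point of the whole filesystem. To locate a file, you follow the chain super block -> root directory dentry -> subdirectory dentry -> ... -> dentry of the target file. The on-disk super block usually sits at a fixed location; FAT32, for example, keeps its equivalent data in sector 0 (the boot sector) of the volume.

super_block contains s_type (a pointer to file_system_type), s_op (a pointer to super_operations), and s_root, which points to the root directory of the filesystem. super_operations is a set of functions that operate on the super block.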

struct super_operations {
    struct inode *(*alloc_inode)(struct super_block *sb);
    void (*destroy_inode)(struct inode *);

    void (*dirty_inode) (struct inode *, int flags);
    int (*write_inode) (struct inode *, struct writeback_control *wbc);
    int (*drop_inode) (struct inode *);
    void (*evict_inode) (struct inode *);
    void (*put_super) (struct super_block *);
    int (*sync_fs)(struct super_block *sb, int wait);
    int (*freeze_super) (struct super_block *);
    int (*freeze_fs) (struct super_block *);
    int (*thaw_super) (struct super_block *);
    int (*unfreeze_fs) (struct super_block *);
    int (*statfs) (struct dentry *, struct kstatfs *);
    int (*remount_fs) (struct super_block *, int *, char *);
    void (*umount_begin) (struct super_block *);

    int (*show_options)(struct seq_file *, struct dentry *);
    int (*show_devname)(struct seq_file *, struct dentry *);
    int (*show_path)(struct seq_file *, struct dentry *);
    int (*show_stats)(struct seq_file *, struct dentry *);
#ifdef CONFIG_QUOTA
    ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
    ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
    struct dquot **(*get_dquots)(struct inode *);
#endif
    int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
    long (*nr_cached_objects)(struct super_block *,
                  struct shrink_control *);
    long (*free_cached_objects)(struct super_block *,
                    struct shrink_control *);
};

struct super_block {
    struct list_head    s_list;     /* Keep this first */
    dev_t           s_dev;      /* search index; _not_ kdev_t */
    unsigned char       s_blocksize_bits;
    unsigned long       s_blocksize;
    loff_t          s_maxbytes; /* Max file size */
    struct file_system_type *s_type;
    const struct super_operations   *s_op;
    struct dentry       *s_root;
    struct block_device *s_bdev;
    const struct dentry_operations *s_d_op; /* default d_op for dentries */
    struct list_head    s_inodes;   /* all inodes */
    // other members omitted...
};

file_system_type

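file_system_type describes one concrete filesystem type. Passing it to register_filesystem registers that type with the kernel.

The mount function pointer is called when a filesystem of this type is mounted and returns the dentry of the filesystem's root directory. kill_sb is called on unmount.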

struct file_system_type {
    const char *name;
    int fs_flags;
    struct dentry *(*mount) (struct file_system_type *, int,
               const char *, void *);
    void (*kill_sb) (struct super_block *);
    struct module *owner;
    struct file_system_type * next;
    struct hlist_head fs_supers;
    // other members omitted...
};

mount

Every time a filesystem is mounted, a struct mount is created. The following two mount commands, for example, create two struct mount instances.

# mount -t ext4 /dev/sda1 /
# mount -t ext4 /dev/sdb1 /mnt

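For the second struct mount, mnt_parent points to the first mount; mnt_mountpoint points to the dentry of the mount point, that is, the /mnt directory on /dev/sda1; mnt_devname holds the device name given at mount time ("/dev/sdb1"); mnt_ns points to the mount namespace; mnt.mnt_root points to the dentry of the root directory of /dev/sdb1; and mnt.mnt_sb points to the super block of /dev/sdb1.

Since vfsmount already holds a super_block pointer, and super_block holds a pointer to the root directory dentry, why does vfsmount need mnt_root at all? Wouldn't mnt_sb->s_root be enough? The reason is that a mount does not have to expose the whole filesystem: a bind mount (mount --bind) can attach any subdirectory, in which case mnt_root points to that subdirectory's dentry while mnt_sb->s_root still points to the real root of the filesystem.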

struct mountpoint {
    struct hlist_node m_hash;
    struct dentry *m_dentry;
    struct hlist_head m_list;
    int m_count;
};

struct vfsmount {
    struct dentry *mnt_root;    /* root of the mounted tree */
    struct super_block *mnt_sb; /* pointer to superblock */
    int mnt_flags;
};

struct mount {
    struct hlist_node mnt_hash;
    struct mount *mnt_parent;
    struct dentry *mnt_mountpoint;
    struct vfsmount mnt;
    const char *mnt_devname;    /* Name of device e.g. /dev/dsk/hda1 */
    struct mnt_namespace *mnt_ns;   /* containing namespace */
    struct mountpoint *mnt_mp;  /* where is it mounted */
    struct hlist_node mnt_mp_list;  /* list mounts with the same mountpoint */
    // other members omitted...
};
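The relationships between these fields can be sketched with miniature user-space structures. Everything prefixed toy_ is hypothetical. The "fresh mount is its own parent" convention mirrors what vfs_kern_mount does in the excerpt later in this article, and toy_is_partial models the bind-mount case where mnt_root differs from s_root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniatures of the kernel structures shown above. */
struct toy_dentry { const char *name; };
struct toy_sb     { struct toy_dentry *s_root; };

struct toy_vfsmount {
    struct toy_dentry *mnt_root;
    struct toy_sb     *mnt_sb;
};

struct toy_mount {
    struct toy_mount   *mnt_parent;
    struct toy_dentry  *mnt_mountpoint;
    struct toy_vfsmount mnt;
};

/* A fresh mount is its own parent and sits on its own root. */
static void toy_mount_init(struct toy_mount *m, struct toy_sb *sb,
                           struct toy_dentry *root)
{
    assert(m != NULL && sb != NULL && root != NULL);
    m->mnt.mnt_sb = sb;
    m->mnt.mnt_root = root;
    m->mnt_mountpoint = root;
    m->mnt_parent = m;
}

/* Attach `child` on directory `where` of `parent`. */
static void toy_attach(struct toy_mount *child, struct toy_mount *parent,
                       struct toy_dentry *where)
{
    child->mnt_parent = parent;
    child->mnt_mountpoint = where;
}

/* True when the mount exposes only part of its filesystem. */
static bool toy_is_partial(const struct toy_mount *m)
{
    return m->mnt.mnt_root != m->mnt.mnt_sb->s_root;
}
```

Modeling the two mount commands above means initializing one toy_mount per device and then calling toy_attach on the second with the first mount and its "mnt" dentry.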

path

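Every file or directory named by an absolute or relative path string is eventually converted inside the kernel into a struct path. A path refers to a file or directory, so it needs a dentry; it also needs a vfsmount that identifies the mount through which the file is reached.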

struct path {
    struct vfsmount *mnt;
    struct dentry *dentry;
};

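Common Flows

Registering a filesystem

register_filesystem registers a filesystem type with the kernel. The implementation is simple: it walks the file_systems linked list looking for an entry with the same name. If none exists, the new type is appended at the tail of the list; otherwise the call fails with -EBUSY.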

static struct file_system_type **find_filesystem(const char *name, unsigned len)
{
    struct file_system_type **p;
    for (p = &file_systems; *p; p = &(*p)->next)
        if (strncmp((*p)->name, name, len) == 0 &&
            !(*p)->name[len])
            break;
    return p;
}

int register_filesystem(struct file_system_type * fs)
{
    int res = 0;
    struct file_system_type ** p;

    BUG_ON(strchr(fs->name, '.'));
    if (fs->next)
        return -EBUSY;
    write_lock(&file_systems_lock);
    p = find_filesystem(fs->name, strlen(fs->name));
    if (*p)
        res = -EBUSY;
    else
        *p = fs;
    write_unlock(&file_systems_lock);
    return res;
}
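The pointer-to-pointer idiom in find_filesystem is worth isolating: by returning the address of a link rather than a node, the same return value serves both "found" and "append here". Below is a minimal user-space sketch with hypothetical toy_ names; locking is deliberately left out, whereas the kernel code takes file_systems_lock.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct file_system_type. */
struct toy_fs_type {
    const char *name;
    struct toy_fs_type *next;
};

static struct toy_fs_type *toy_file_systems;   /* list head, initially NULL */

/*
 * Return the address of the link that either points at the entry named
 * `name` or is the NULL link at the tail of the list.
 */
static struct toy_fs_type **toy_find(const char *name)
{
    struct toy_fs_type **p;

    for (p = &toy_file_systems; *p; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

static int toy_register(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    assert(fs != NULL && fs->name != NULL);
    if (fs->next)
        return -EBUSY;          /* already linked into the list */
    p = toy_find(fs->name);
    if (*p)
        return -EBUSY;          /* a type with this name exists */
    *p = fs;                    /* append by writing the tail link */
    return 0;
}
```

Because toy_find hands back a link address, appending needs no special case for an empty list: the "tail link" is simply the head pointer itself.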

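Mounting a filesystem

do_new_mount mounts a filesystem on a directory. The path parameter is the directory to mount on, fstype is the filesystem type, and name is the name of the device to mount.

The function first converts the filesystem name string into a file_system_type, then calls vfs_kern_mount to allocate and initialize a new mount, and finally attaches the new mount to the path of the mount point.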

static int do_new_mount(struct path *path, const char *fstype, int flags,
            int mnt_flags, const char *name, void *data)
{
    struct file_system_type *type;
    struct vfsmount *mnt;
    int err;

    if (!fstype)
        return -EINVAL;

    type = get_fs_type(fstype);
    if (!type)
        return -ENODEV;

    mnt = vfs_kern_mount(type, flags, name, data);
    if (!IS_ERR(mnt) && (type->fs_flags & FS_HAS_SUBTYPE) &&
        !mnt->mnt_sb->s_subtype)
        mnt = fs_set_subtype(mnt, fstype);

    put_filesystem(type);
    if (IS_ERR(mnt))
        return PTR_ERR(mnt);

    if (mount_too_revealing(mnt, &mnt_flags)) {
        mntput(mnt);
        return -EPERM;
    }

    err = do_add_mount(real_mount(mnt), path, mnt_flags);
    if (err)
        mntput(mnt);
    return err;
}

vfs_kern_mount allocates a new mount and, through mount_fs, calls the filesystem type's mount function (file_system_type->mount) to create the filesystem's super block and root dentry.

struct vfsmount *
vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void *data)
{
    struct mount *mnt;
    struct dentry *root;

    if (!type)
        return ERR_PTR(-ENODEV);

    mnt = alloc_vfsmnt(name);
    if (!mnt)
        return ERR_PTR(-ENOMEM);

    if (flags & MS_KERNMOUNT)
        mnt->mnt.mnt_flags = MNT_INTERNAL;

    root = mount_fs(type, flags, name, data);
    if (IS_ERR(root)) {
        mnt_free_id(mnt);
        free_vfsmnt(mnt);
        return ERR_CAST(root);
    }

    mnt->mnt.mnt_root = root;
    mnt->mnt.mnt_sb = root->d_sb;
    mnt->mnt_mountpoint = mnt->mnt.mnt_root;
    mnt->mnt_parent = mnt;
    lock_mount_hash();
    list_add_tail(&mnt->mnt_instance, &root->d_sb->s_mounts);
    unlock_mount_hash();
    return &mnt->mnt;
}

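Path lookup

Path lookup converts a path string into the corresponding struct path. Every system call that takes a path goes through it, for example opening a file or changing into a directory.

filename_lookup performs the lookup. The dfd parameter gives the starting point, either the current directory (AT_FDCWD) or an already open file descriptor; name is the path to look up, such as "/usr/local/bin"; path receives the result; root, if given, specifies the directory to start from.

nameidata holds the intermediate state of a lookup: path is the location reached so far; last is the component currently being looked up inside path (the full path is split into components at each "/" separator); inode is the inode of path's dentry; last_type is the type of the current component.

filename_lookup first calls set_nameidata to do basic initialization of nd and then calls path_lookupat to perform the lookup. It tries with LOOKUP_RCU first, retries without it on -ECHILD, and retries with LOOKUP_REVAL on -ESTALE.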

#define EMBEDDED_LEVELS 2
struct nameidata {
    struct path path;
    struct qstr last;
    struct path root;
    struct inode    *inode; /* path.dentry.d_inode */
    unsigned int    flags;
    unsigned    seq, m_seq;
    int     last_type;
    unsigned    depth;
    int     total_link_count;
    struct saved {
        struct path link;
        struct delayed_call done;
        const char *name;
        unsigned seq;
    } *stack, internal[EMBEDDED_LEVELS];
    struct filename *name;
    struct nameidata *saved;
    struct inode    *link_inode;
    unsigned    root_seq;
    int     dfd;
};

static void set_nameidata(struct nameidata *p, int dfd, struct filename *name)
{
    struct nameidata *old = current->nameidata;
    p->stack = p->internal;
    p->dfd = dfd;
    p->name = name;
    p->total_link_count = old ? old->total_link_count : 0;
    p->saved = old;
    current->nameidata = p;
}

static int filename_lookup(int dfd, struct filename *name, unsigned flags,
               struct path *path, struct path *root)
{
    int retval;
    struct nameidata nd;
    if (IS_ERR(name))
        return PTR_ERR(name);
    if (unlikely(root)) {
        nd.root = *root;
        flags |= LOOKUP_ROOT;
    }
    set_nameidata(&nd, dfd, name);
    retval = path_lookupat(&nd, flags | LOOKUP_RCU, path);
    if (unlikely(retval == -ECHILD))
        retval = path_lookupat(&nd, flags, path);
    if (unlikely(retval == -ESTALE))
        retval = path_lookupat(&nd, flags | LOOKUP_REVAL, path);

    if (likely(!retval))
        audit_inode(name, path->dentry, flags & LOOKUP_PARENT);
    restore_nameidata();
    putname(name);
    return retval;
}
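The three path_lookupat calls in filename_lookup form a fallback ladder: cheapest mode first, stricter modes only when the previous one reports that it cannot finish. A user-space sketch of just that control flow is shown here. The TOY_LOOKUP_* values and the callback type are hypothetical, and the kernel's real flag values differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel's LOOKUP_* constants differ. */
#define TOY_LOOKUP_RCU   0x1u
#define TOY_LOOKUP_REVAL 0x2u

typedef int (*toy_walk_fn)(unsigned flags, void *ctx);

/*
 * Try the lockless walk first, fall back to the reference-taking walk
 * on -ECHILD, and force revalidation on -ESTALE.
 */
static int toy_lookup(toy_walk_fn walk, unsigned flags, void *ctx)
{
    int ret;

    assert(walk != NULL);
    ret = walk(flags | TOY_LOOKUP_RCU, ctx);
    if (ret == -ECHILD)
        ret = walk(flags, ctx);
    if (ret == -ESTALE)
        ret = walk(flags | TOY_LOOKUP_REVAL, ctx);
    return ret;
}

/* Demo walker: fails in RCU mode, succeeds otherwise; counts calls in ctx. */
static int toy_walk_needs_refs(unsigned flags, void *ctx)
{
    ++*(int *)ctx;
    return (flags & TOY_LOOKUP_RCU) ? -ECHILD : 0;
}
```

With the demo walker, toy_lookup makes exactly two attempts: the RCU attempt that returns -ECHILD and the plain retry that succeeds.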

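path_lookupat first calls path_init to set up the starting directory nd->path, using these rules:

If nd->root was specified, start from it;
If the path name begins with "/", start from the current process's root directory, current->fs->root;
If nd->dfd equals AT_FDCWD, start from the current process's working directory, current->fs->pwd;
Otherwise start from the dentry behind nd->dfd.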
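The four rules above amount to a small decision function. The following sketch encodes them in the order listed; the enum, the function name, and TOY_AT_FDCWD are hypothetical (the value -100 is assumed here to match Linux's AT_FDCWD), and the real path_init does considerably more work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_AT_FDCWD (-100)   /* assumption: mirrors Linux's AT_FDCWD value */

enum toy_start {
    START_ND_ROOT,   /* caller-supplied root */
    START_FS_ROOT,   /* current->fs->root */
    START_FS_PWD,    /* current->fs->pwd */
    START_DFD        /* directory behind the dfd descriptor */
};

/* Pick the starting point, checking the rules in the order listed above. */
static enum toy_start toy_pick_start(bool have_nd_root, const char *name,
                                     int dfd)
{
    assert(name != NULL);
    if (have_nd_root)
        return START_ND_ROOT;
    if (name[0] == '/')
        return START_FS_ROOT;
    if (dfd == TOY_AT_FDCWD)
        return START_FS_PWD;
    return START_DFD;
}
```

Note that for an absolute path the dfd argument is never consulted, which is why openat-style calls ignore dfd when given a path starting with "/".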

Finally it calls link_path_walk to resolve the intermediate components and lookup_last to resolve the final component.

static int path_lookupat(struct nameidata *nd, unsigned flags, struct path *path)
{
    const char *s = path_init(nd, flags);
    int err;

    if (IS_ERR(s))
        return PTR_ERR(s);
    while (!(err = link_path_walk(s, nd))
        && ((err = lookup_last(nd)) > 0)) {
        s = trailing_symlink(nd);
        if (IS_ERR(s)) {
            err = PTR_ERR(s);
            break;
        }
    }
    if (!err)
        err = complete_walk(nd);

    if (!err && nd->flags & LOOKUP_DIRECTORY)
        if (!d_can_lookup(nd->path.dentry))
            err = -ENOTDIR;
    if (!err) {
        *path = nd->path;
        nd->path.mnt = NULL;
        nd->path.dentry = NULL;
    }
    terminate_walk(nd);
    return err;
}

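The main loop of component lookup is implemented in link_path_walk. Different kinds of component need different handling:

".": stay in the current directory
"..": move to the parent directory
symbolic link: follow it or not, depending on flags
mount point: cross into the root directory of the mounted filesystem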
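Of the four cases, only "." and ".." can be recognized from the component name alone; symbolic links and mount points are only known once the dentry has been looked up. A sketch of the name-based part, using a hypothetical enum and function name:

```c
#include <assert.h>
#include <stddef.h>

enum toy_last_type { TOY_NORM, TOY_DOT, TOY_DOTDOT };

/* Classify one path component of length `len` (not NUL-terminated). */
static enum toy_last_type toy_classify(const char *name, size_t len)
{
    assert(name != NULL);
    if (len == 1 && name[0] == '.')
        return TOY_DOT;
    if (len == 2 && name[0] == '.' && name[1] == '.')
        return TOY_DOTDOT;
    return TOY_NORM;
}
```

Taking an explicit length matters because a component inside a longer path is terminated by "/" rather than by a NUL byte; names such as "..." or ".git" fall through to the normal case.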

Before the lookup starts, all leading "/" separators are skipped so that name points at the first component to look up.

static int link_path_walk(const char *name, struct nameidata *nd)
{
    int err;

    while (*name=='/')
        name++;
    if (!*name)
        return 0;

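Then comes the main loop. It first hashes the current component name together with the current directory dentry and stores the name and the hash in nd->last. It then advances name to the next component, skipping the "/" separators between the two. Finally it calls walk_component to look up nd->last in nd->path; on success the result is written back to nd->path, which becomes the current directory for the next iteration. The loop repeats until every component has been processed.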

name += hashlen_len(hash_len);
if (!*name)
    goto OK;
/*
 * If it wasn't NUL, we know it was '/'. Skip that
 * slash, and continue until no more slashes.
 */
do {
    name++;
} while (unlikely(*name == '/'));
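The slash-skipping logic can be tried out in user space. The sketch below splits a path into components the same way: skip any run of "/", then take characters up to the next "/" or the end. toy_next_component and toy_count_components are hypothetical helpers; no hashing or dentry lookup is modeled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Advance *cursor past leading slashes, return a pointer to the next
 * component and store its length in *len. Returns NULL at end of path.
 */
static const char *toy_next_component(const char **cursor, size_t *len)
{
    const char *p, *start;

    assert(cursor != NULL && *cursor != NULL && len != NULL);
    p = *cursor;
    while (*p == '/')
        p++;
    if (!*p)
        return NULL;
    start = p;
    while (*p && *p != '/')
        p++;
    *len = (size_t)(p - start);
    *cursor = p;
    return start;
}

/* Count components; "/usr//local/bin/" and "usr/local/bin" both give 3. */
static size_t toy_count_components(const char *path)
{
    size_t len, n = 0;

    while (toy_next_component(&path, &len))
        n++;
    return n;
}
```

Repeated and trailing separators produce no empty components, which matches the behavior described above for the kernel loop.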

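Initializing rootfs

During initialization the kernel mounts rootfs as the very first root directory "/", backed by either ramfs or tmpfs. Both are in-memory filesystems. The differences: ramfs has no size limit, its data cannot be swapped out, and only the superuser may write to it; tmpfs has a size limit, its data can be swapped out, and ordinary users may write to it as well.

Registering rootfs

init_rootfs registers rootfs: it calls register_filesystem and then initializes the ramfs or tmpfs backend.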

static struct file_system_type rootfs_fs_type = {
    .name       = "rootfs",
    .mount      = rootfs_mount,
    .kill_sb    = kill_litter_super,
};

int __init init_rootfs(void)
{
    int err = register_filesystem(&rootfs_fs_type);

    if (err)
        return err;

    if (IS_ENABLED(CONFIG_TMPFS) && !saved_root_name[0] &&
        (!root_fs_names || strstr(root_fs_names, "tmpfs"))) {
        err = shmem_init();
        is_tmpfs = true;
    } else {
        err = init_ramfs_fs();
    }

    if (err)
        unregister_filesystem(&rootfs_fs_type);

    return err;
}

mount rootfs

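init_mount_tree mounts rootfs: it calls vfs_kern_mount to create a mount, calls create_mnt_ns to create a mount namespace and installs it as init_task's namespace, and finally points the current process's root directory and working directory at the root of rootfs.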

static void __init init_mount_tree(void)
{
    struct vfsmount *mnt;
    struct mnt_namespace *ns;
    struct path root;
    struct file_system_type *type;

    type = get_fs_type("rootfs");
    if (!type)
        panic("Can't find rootfs type");
    mnt = vfs_kern_mount(type, 0, "rootfs", NULL);
    put_filesystem(type);
    if (IS_ERR(mnt))
        panic("Can't create rootfs");

    ns = create_mnt_ns(mnt);
    if (IS_ERR(ns))
        panic("Can't allocate initial namespace");

    init_task.nsproxy->mnt_ns = ns;
    get_mnt_ns(ns);

    root.mnt = mnt;
    root.dentry = mnt->mnt_root;
    mnt->mnt_flags |= MNT_LOCKED;

    set_fs_pwd(current->fs, &root);
    set_fs_root(current->fs, &root);
}

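In Practice

Remounting rootfs

As explained in the initramfs article (initramfs详解), the kernel extracts the files in the initramfs into the root directory during initialization, and that root directory is rootfs. /init then mounts the real root filesystem on /root and switches the root directory to /root to finish initialization.

Once the system is up, rootfs is no longer used. Can it still be mounted somewhere to look at the files inside? Let's just try: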

root@debian:~# mount -t rootfs rootfs /rootfs
mount: unknown filesystem type 'rootfs'

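The result is rather far from what was expected. Checking the code shows that rootfs_mount limits the number of mounts: only one mount is allowed, and any later attempt returns -ENODEV, which mount reports as an unknown filesystem type. After removing the limit, try again: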

root@debian:~# mount -t rootfs rootfs /rootfs
root@debian:~# cd /rootfs
root@debian:/rootfs# ls
root@debian:/rootfs#

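This time the mount succeeds, but the directory is empty, although the files from the initramfs should be visible. Looking at rootfs_mount again, it mounts through mount_nodev, which creates a new filesystem instance on every mount. What we want is for repeated mounts to refer to the first instance, so mount_nodev is replaced with mount_single. The modified code: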

static struct dentry *rootfs_mount(struct file_system_type *fs_type,
    int flags, const char *dev_name, void *data)
{
    static unsigned long once;
    void *fill = ramfs_fill_super;

    //if (test_and_set_bit(0, &once))
        //return ERR_PTR(-ENODEV);

    if (IS_ENABLED(CONFIG_TMPFS) && is_tmpfs)
        fill = shmem_fill_super;

    //return mount_nodev(fs_type, flags, data, fill);
    return mount_single(fs_type, flags, data, fill);
}
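The difference between the three behaviors (mount once only, new instance per mount, one shared instance) can be modeled in a few lines of user-space C. This is a sketch only: the toy_ functions are hypothetical and merely mimic the observable effect of the once guard, mount_nodev, and mount_single; they are not how the kernel implements them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a super block instance. */
struct toy_sb { int id; };

static int toy_next_id;
static struct toy_sb *toy_single_sb;   /* the one shared instance */

/* Like mount_nodev in spirit: every call builds a new, empty instance. */
static struct toy_sb *toy_mount_nodev(void)
{
    struct toy_sb *sb = malloc(sizeof(*sb));

    assert(sb != NULL);   /* demo only: no graceful out-of-memory path */
    sb->id = toy_next_id++;
    return sb;
}

/* Like mount_single in spirit: the first call creates, later calls reuse. */
static struct toy_sb *toy_mount_single(void)
{
    if (!toy_single_sb)
        toy_single_sb = toy_mount_nodev();
    return toy_single_sb;
}

/* The original guard: succeed once, then fail with -ENODEV. */
static int toy_mount_once(struct toy_sb **out)
{
    static int once;

    assert(out != NULL);
    if (once)
        return -ENODEV;
    once = 1;
    *out = toy_mount_nodev();
    return 0;
}
```

Two toy_mount_nodev calls return different instances, which corresponds to the empty /rootfs seen above, while every toy_mount_single call returns the same pointer.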

Rebuild the kernel and try again:

root@debian:~# mount -t rootfs rootfs /rootfs
root@debian:~# cd /rootfs
root@debian:/rootfs# ls
root
root@debian:/rootfs#

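Only a root directory is visible under /rootfs. Where did the other files go? The /init script in the initramfs shows that, after mounting the real filesystem on /root, it calls switch_root to change the root directory. Before switching, that command deletes everything except the /root directory. So switch_root is replaced with chroot. Try once more: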

root@debian:~# mount -t rootfs rootfs /rootfs
root@debian:~# cd /rootfs
root@debian:/rootfs# ls
bin conf dev etc init lib proc root run sbin scripts sys tmp var
root@debian:/rootfs#

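This time all the files and directories from the initramfs are visible.

Enabling rootfs/root

As noted earlier, the real root filesystem is mounted on /root at boot, and after boot rootfs was successfully mounted on the /rootfs directory of the real root filesystem. What happens when we now enter /rootfs/root? Do we see the same files as in the root directory "/"? A quick test shows that /rootfs/root is empty. Why?

Mount points are looked up by __lookup_mnt. It hashes the address of the current directory's dentry together with the address of the vfsmount to pick a hash bucket and then searches that bucket for the matching mount. Every mount allocates a new vfsmount, so although the two mounts refer to the same filesystem instance, their own addresses differ, and entering /rootfs/root does not cross into the mounted filesystem.

So m_hash and __lookup_mnt are modified: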

static inline struct hlist_head *m_hash(struct vfsmount *mnt, struct dentry *dentry)
{
    //unsigned long tmp = ((unsigned long)mnt / L1_CACHE_BYTES);
    unsigned long tmp = ((unsigned long)mnt->mnt_root / L1_CACHE_BYTES);
    tmp += ((unsigned long)dentry / L1_CACHE_BYTES);
    tmp = tmp + (tmp >> m_hash_shift);
    return &mount_hashtable[tmp & m_hash_mask];
}

struct mount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
{
    struct hlist_head *head = m_hash(mnt, dentry);
    struct mount *p;

    hlist_for_each_entry_rcu(p, head, mnt_hash)
        //if (&p->mnt_parent->mnt == mnt && p->mnt_mountpoint == dentry)
        if (p->mnt_parent->mnt.mnt_sb == mnt->mnt_sb && p->mnt_mountpoint == dentry)
            return p;
    return NULL;
}
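Why comparing super blocks helps can be shown with a tiny user-space model. In this sketch a linear scan stands in for the hash table, and all toy_ names are hypothetical. Two vfsmount objects share one super block; a child mount was attached through the first, and the lookup arrives through the second.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sb     { int id; };
struct toy_dentry { const char *name; };
struct toy_vfsmount { struct toy_sb *mnt_sb; };

struct toy_mount {
    struct toy_vfsmount *parent;       /* mount we are attached to */
    struct toy_dentry   *mountpoint;   /* directory we cover */
};

/* Original rule: the parent must be this exact vfsmount. */
static struct toy_mount *toy_lookup_by_mnt(struct toy_mount *tab, size_t n,
                                           struct toy_vfsmount *mnt,
                                           struct toy_dentry *dentry)
{
    size_t i;

    assert(tab != NULL && mnt != NULL);
    for (i = 0; i < n; i++)
        if (tab[i].parent == mnt && tab[i].mountpoint == dentry)
            return &tab[i];
    return NULL;
}

/* Modified rule: any vfsmount of the same super block matches. */
static struct toy_mount *toy_lookup_by_sb(struct toy_mount *tab, size_t n,
                                          struct toy_vfsmount *mnt,
                                          struct toy_dentry *dentry)
{
    size_t i;

    assert(tab != NULL && mnt != NULL);
    for (i = 0; i < n; i++)
        if (tab[i].parent->mnt_sb == mnt->mnt_sb &&
            tab[i].mountpoint == dentry)
            return &tab[i];
    return NULL;
}
```

A lookup through the second vfsmount fails under the pointer rule and succeeds under the super block rule. In the real code the hash key has to change as well, otherwise the two vfsmount addresses would select different buckets before the comparison is even reached.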

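The hash in m_hash is now computed from mnt->mnt_root instead of mnt, and the comparison in __lookup_mnt uses mnt_sb instead of the mount itself. The mnt pointers differ, but the root dentry and the super block they point to are the same. Try again after the change: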

root@debian:~# cd /rootfs
root@debian:/rootfs# ls
(nothing but the files under /dev ...)

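Why does entering /rootfs now land in /dev? As described in the devtmpfs article (devtmpfs分析), devtmpfs initialization creates a kernel thread, devtmpfsd, which mounts devtmpfs on the root directory of rootfs. Doesn't that mount interfere with the initialization of other modules? No: a mount performed by one thread only affects other threads in the same mount namespace, and devtmpfsd calls sys_unshare(CLONE_NEWNS) before mounting to create its own namespace, so other modules are unaffected.

With __lookup_mnt modified, "cd /rootfs" first crosses into the root directory of rootfs; because devtmpfs is mounted on the rootfs root, the walk then ends up in the root directory of devtmpfs. To get the intended behavior, mounts that are not in the same namespace must be filtered out, so a check_mnt(p) test is added to the if condition in __lookup_mnt.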
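The effect of the extra namespace test can be sketched as a filter over candidate mounts. The names below are hypothetical; toy_check_mnt is modeled on the idea behind check_mnt (is the mount in the caller's mount namespace?) and takes the current namespace as an explicit argument, which is a simplification made for this demo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_ns { int id; };

struct toy_mount {
    struct toy_ns *mnt_ns;   /* namespace the mount belongs to */
    const char    *fstype;
};

/* Same namespace as the caller? */
static bool toy_check_mnt(const struct toy_mount *m, const struct toy_ns *cur)
{
    return m->mnt_ns == cur;
}

/* Return the first candidate mount visible from namespace `cur`. */
static struct toy_mount *toy_pick_visible(struct toy_mount *cand, size_t n,
                                          const struct toy_ns *cur)
{
    size_t i;

    assert(cand != NULL && cur != NULL);
    for (i = 0; i < n; i++)
        if (toy_check_mnt(&cand[i], cur))
            return &cand[i];
    return NULL;
}
```

With a devtmpfs mount in the devtmpfsd namespace and an ext4 mount in the caller's namespace as candidates on the same directory, only the ext4 mount is returned.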

The final result:

root@debian:~# cd /rootfs/root
root@debian:/rootfs/root# ls
bin boot dev etc home initrd.img lib lost+found media mnt opt proc root rootfs run sbin srv sys tmp usr var vmlinuz
root@debian:/rootfs/root# cd /rootfs/root/rootfs/root
root@debian:/rootfs/root/rootfs/root# ls
bin boot dev etc home initrd.img lib lost+found media mnt opt proc root rootfs run sbin srv sys tmp usr var vmlinuz
root@debian:/rootfs/root/rootfs/root#

"/rootfs/root/..." can now be nested indefinitely, which is rather fun.
