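Linux Kernel Study Notes (1): The Virtual Filesystem (VFS)

What Is the VFS

The Virtual Filesystem (VFS) is the kernel subsystem that gives user-space programs a uniform interface to files and filesystems. Thanks to the VFS, filesystems of different types (for example NTFS and ext3) can interoperate, so files can be moved or copied between them.

  • From the point of view of a user-space program, the VFS provides a single abstraction and interface. A program can issue the same system calls against any filesystem without caring which filesystem type sits underneath.
  • From the point of view of a filesystem, the VFS provides a common file model, based on Unix-style filesystems, that can represent the general features and operations of any filesystem type. A filesystem gains Linux support by providing the interfaces and data structures that the VFS requires.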
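
To make the user-space side concrete, here is a minimal sketch of a file copy written only with open(), read(), and write(). The helper name copy_file is made up for this note. Nothing in it depends on which filesystem either path lives on, which is exactly what the VFS abstraction provides.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Copy src to dst using only generic system calls.
 * Returns 0 on success, -1 on any error.
 * copy_file is a hypothetical helper, not a kernel or libc function. */
int copy_file(const char *src, const char *dst)
{
    char buf[4096];
    ssize_t n = 0;
    int in, out;

    assert(src != NULL && dst != NULL);
    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                 /* write() may be partial */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                n = -1;                   /* remember the failure */
                break;
            }
            off += w;
        }
        if (n < 0)
            break;
    }
    close(in);
    if (close(out) < 0)
        n = -1;
    return n < 0 ? -1 : 0;                /* n == 0 means clean EOF */
}
```

The same function works whether src is on ext3 and dst on NTFS or the other way round; the kernel routes each call to the right filesystem.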

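VFS Data Structures

The VFS is object-oriented: its data structures contain both data and pointers to the functions that operate on that data. They are implemented as plain C structures, but the design follows the same ideas as object-oriented programming.

The common file model of the VFS consists of four primary object types:

  • Superblock object: represents a specific mounted filesystem.
  • Inode object: represents a specific file.
  • Dentry object: represents a directory entry (dentry). Every individual component of a path is a dentry. The VFS has no directory object; a directory is just a kind of file.
  • File object: represents an open file as associated with a process.

Each object type has a corresponding table of operations (the equivalent of the object's methods).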
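
The "data plus a table of function pointers" pattern can be sketched in a few lines of ordinary C. All names below (toy_counter, toy_counter_ops, and so on) are invented for illustration and do not exist in the kernel; the point is only that generic code calls through the table and never knows which implementation it is talking to.

```c
#include <assert.h>
#include <stddef.h>

struct toy_counter;                       /* hypothetical object type */

/* The "method table": one function pointer per operation. */
struct toy_counter_ops {
    void (*increment)(struct toy_counter *self);
    int  (*value)(const struct toy_counter *self);
};

/* The object: data plus a pointer to its method table. */
struct toy_counter {
    int count;
    const struct toy_counter_ops *ops;
};

static void step_by_one(struct toy_counter *self) { self->count += 1; }
static void step_by_two(struct toy_counter *self) { self->count += 2; }
static int  read_count(const struct toy_counter *self) { return self->count; }

/* Two implementations behind one interface, like two filesystems. */
const struct toy_counter_ops by_one_ops = {
    .increment = step_by_one, .value = read_count
};
const struct toy_counter_ops by_two_ops = {
    .increment = step_by_two, .value = read_count
};

/* Generic code talks only to the table, never to a concrete implementation. */
int bump_and_read(struct toy_counter *c)
{
    assert(c != NULL && c->ops != NULL);
    c->ops->increment(c);
    return c->ops->value(c);
}
```

Swapping the ops pointer changes the behavior of bump_and_read() without touching its code, which is how the VFS can drive very different filesystems through the same call sites.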

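The Superblock Object

Every filesystem type must implement the superblock object, which stores information describing the filesystem. The superblock object usually corresponds to the filesystem superblock or filesystem control block on disk. Filesystems that are not disk-based (for example sysfs, a memory-based filesystem) must generate the superblock object on the fly and keep it in memory.

The code that creates, manages, and destroys superblock objects lives in fs/super.c.

The VFS stores the superblock object in struct super_block. A superblock object is created and initialized by alloc_super(). When a filesystem is mounted, it invokes this function, reads its superblock from disk, and fills in the super_block structure.

struct super_block is defined in <linux/fs.h>; only some of its fields are shown here.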

struct super_block 
{
    struct list_head        s_list;           /* list of all superblocks */ 
    dev_t                   s_dev;            /* identifier */ 
    unsigned long           s_blocksize;      /* block size in bytes */ 
    unsigned char           s_blocksize_bits; /* block size in bits */ 
    unsigned char           s_dirt;           /* dirty flag */ 
    unsigned long long      s_maxbytes;       /* max file size */ 
    struct file_system_type *s_type;          /* filesystem type */ 
    struct super_operations *s_op;            /* superblock methods */ 
    struct dquot_operations *dq_op;           /* quota methods */ 
    struct quotactl_ops     *s_qcop;          /* quota control methods */ 
    struct export_operations *s_export_op;    /* export methods */ 
    unsigned long            s_flags;         /* mount flags */ 
    unsigned long            s_magic;         /* filesystem’s magic number */ 
    struct dentry            *s_root;         /* directory mount point */ 
    struct rw_semaphore      s_umount;        /* unmount semaphore */ 
    struct semaphore         s_lock;          /* superblock semaphore */ 
    int                      s_count;         /* superblock ref count */ 
    int                      s_need_sync;     /* not-yet-synced flag */ 
    atomic_t                 s_active;        /* active reference count */ 
    void                     *s_security;     /* security module */ 
    struct xattr_handler  **s_xattr;  /* extended attribute handlers */
    struct list_head      s_inodes;        /* list of inodes */ 
    struct list_head      s_dirty;         /* list of dirty inodes */ 
    struct list_head      s_io;            /* list of writebacks */ 
    struct list_head      s_more_io;       /* list of more writeback */ 
    struct hlist_head     s_anon;          /* anonymous dentries */ 
    struct list_head      s_files;         /* list of assigned files */ 
    struct list_head      s_dentry_lru;    /* list of unused dentries */ 
    int                   s_nr_dentry_unused; /* number of dentries on list */ 
    struct block_device   *s_bdev;         /* associated block device */ 
    struct mtd_info       *s_mtd;          /* memory disk information */ 
    struct list_head      s_instances;     /* instances of this fs */ 
    struct quota_info     s_dquot;         /* quota-specific options */ 
    int                   s_frozen;        /* frozen status */ 
    wait_queue_head_t     s_wait_unfrozen; /* wait queue on freeze */ 
    char                  s_id[32];        /* text name */ 
    void                  *s_fs_info;      /* filesystem-specific info */ 
    fmode_t               s_mode;          /* mount permissions */ 
    struct semaphore      s_vfs_rename_sem; /* rename semaphore */ 
    u32                   s_time_gran;     /* granularity of timestamps */ 
    char                  *s_subtype;      /* subtype name */ 
    char                  *s_options;      /* saved mount options */
};

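Superblock Operations

The most important member of the superblock object is the s_op pointer, which points to the superblock operations table, struct super_operations. It is defined in <linux/fs.h>; only some of its operations are listed below.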

struct super_operations { 
    struct inode *(*alloc_inode)(struct super_block *sb); 
    void (*destroy_inode)(struct inode *); 
    void (*dirty_inode) (struct inode *); 
    int (*write_inode) (struct inode *, int); 
    void (*drop_inode) (struct inode *); 
    void (*delete_inode) (struct inode *); 
    void (*put_super) (struct super_block *); 
    void (*write_super) (struct super_block *); 
    int (*sync_fs)(struct super_block *sb, int wait); 
    int (*freeze_fs) (struct super_block *); 
    int (*unfreeze_fs) (struct super_block *);
    int (*statfs) (struct dentry *, struct kstatfs *); 
    int (*remount_fs) (struct super_block *, int *, char *); 
    void (*clear_inode) (struct inode *); 
    void (*umount_begin) (struct super_block *); 
    int (*show_options)(struct seq_file *, struct vfsmount *); 
    int (*show_stats)(struct seq_file *, struct vfsmount *); 
    ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); 
    ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); 
    int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
};

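This is a function table: each pointer refers to a function that operates on a superblock object (creating and destroying superblocks is not part of the table; that code is in fs/super.c). These functions perform low-level operations on the filesystem and its inodes. When a filesystem needs to invoke one of them, for example to write its superblock, it follows the pointers from the superblock pointer sb: sb->s_op->write_super(sb). The sb pointer has to be passed explicitly because C has no object-oriented support (nothing like the this pointer in C++).

Some functions in the table are optional. A filesystem that does not implement one sets its pointer to NULL, and the VFS then either calls a generic function or does nothing, depending on the operation.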
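
The calling convention and the NULL check can be sketched in user space as follows. The toy_ structures are stand-ins invented for this note, not the kernel's definitions, and this sketch covers only the "do nothing when the method is missing" case.

```c
#include <assert.h>
#include <stddef.h>

struct toy_super_block;

struct toy_super_operations {
    void (*write_super)(struct toy_super_block *sb);   /* optional method */
};

struct toy_super_block {
    int dirty;        /* set while in-memory state differs from "disk" */
    int writes;       /* how many times filesystem-specific code ran */
    const struct toy_super_operations *s_op;
};

/* A filesystem-specific method; sb plays the role of C++'s `this`. */
static void toyfs_write_super(struct toy_super_block *sb)
{
    sb->writes++;
    sb->dirty = 0;
}

/* One filesystem implements the method, the other leaves it NULL. */
const struct toy_super_operations toyfs_sops = { .write_super = toyfs_write_super };
const struct toy_super_operations nofs_sops  = { .write_super = NULL };

/* Caller-side dispatch: returns 1 if a method ran, 0 if it was skipped. */
int toy_sync_super(struct toy_super_block *sb)
{
    assert(sb != NULL && sb->s_op != NULL);
    if (sb->s_op->write_super == NULL)
        return 0;                       /* optional method not provided */
    sb->s_op->write_super(sb);          /* same shape as sb->s_op->write_super(sb) */
    return 1;
}
```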

The descriptions of some of these functions are excerpted below.

struct inode *(*alloc_inode)(struct super_block *sb)
Creates and initializes a new inode object under the given superblock.

void (*destroy_inode)(struct inode *)
Deallocates the given inode.

int (*write_inode)(struct inode *, int)
Writes the given inode to disk.

void (*delete_inode)(struct inode *)
Deletes the given inode from the disk.

void (*put_super)(struct super_block *)
Called by the VFS on unmount to release the given superblock object.

void (*write_super)(struct super_block *)
Updates the on-disk superblock with the specified superblock.

int (*sync_fs)(struct super_block *sb, int wait)
Synchronizes filesystem metadata with the on-disk filesystem.

int (*statfs)(struct dentry *, struct kstatfs *)
Called by the VFS to obtain filesystem statistics.

void (*clear_inode)(struct inode *)
Called by the VFS to release the inode and clear any pages containing related data.

void (*umount_begin)(struct super_block *)
Called by the VFS to interrupt a mount operation. It is used by network filesystems,
such as NFS.

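The Inode Object

The inode object holds all the information the kernel needs to manipulate a file or directory. For Unix-style filesystems this information is simply read from the on-disk inode. Filesystems without inodes must derive the information from whatever is stored on disk and fill in the in-memory inode object.

The inode object is stored in struct inode, defined in <linux/fs.h>.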

struct inode
{
    struct hlist_node       i_hash;              /* hash list */ 
    struct list_head        i_list;              /* list of inodes */ 
    struct list_head        i_sb_list;           /* list of superblocks */ 
    struct list_head        i_dentry;            /* list of dentries */ 
    unsigned long           i_ino;               /* inode number */ 
    atomic_t                i_count;             /* reference counter */ 
    unsigned int            i_nlink;             /* number of hard links */ 
    uid_t                   i_uid;               /* user id of owner */ 
    gid_t                   i_gid;               /* group id of owner */ 
    kdev_t                  i_rdev;              /* real device node */ 
    u64                     i_version;           /* versioning number */ 
    loff_t                  i_size;              /* file size in bytes */ 
    seqcount_t              i_size_seqcount;     /* serializer for i_size */ 
    struct timespec         i_atime;             /* last access time */ 
    struct timespec         i_mtime;             /* last modify time */ 
    struct timespec         i_ctime;             /* last change time */ 
    unsigned int            i_blkbits;           /* block size in bits */ 
    blkcnt_t                i_blocks;            /* file size in blocks */ 
    unsigned short          i_bytes;             /* bytes consumed */ 
    umode_t                 i_mode;              /* access permissions */ 
    spinlock_t              i_lock;              /* spinlock */ 
    struct rw_semaphore     i_alloc_sem;         /* nests inside of i_sem */ 
    struct semaphore        i_sem;               /* inode semaphore */ 
    struct inode_operations *i_op;               /* inode ops table */ 
    struct file_operations  *i_fop;              /* default inode ops */ 
    struct super_block      *i_sb;               /* associated superblock */ 
    struct file_lock  *i_flock;            /* file lock list */ 
    struct address_space    *i_mapping;          /* associated mapping */ 
    struct address_space    i_data;              /* mapping for device */ 
    struct dquot            *i_dquot[MAXQUOTAS]; /* disk quotas for inode */ 
    struct list_head        i_devices;           /* list of block devices */ 
    union 
    {
        struct pipe_inode_info  *i_pipe;         /* pipe information */ 
        struct block_device     *i_bdev;         /* block device driver */ 
        struct cdev             *i_cdev;         /* character device driver */
    }; 
    unsigned long           i_dnotify_mask;      /* directory notify mask */ 
    struct dnotify_struct   *i_dnotify;          /* dnotify */ 
    struct list_head        inotify_watches;     /* inotify watches */ 
    struct mutex  inotify_mutex;  /* protects inotify_watches */ 
    unsigned long           i_state;             /* state flags */ 
    unsigned long           dirtied_when;        /* first dirtying time */ 
    unsigned int            i_flags;             /* filesystem flags */ 
    atomic_t                i_writecount;        /* count of writers */ 
    void                    *i_security;         /* security module */ 
    void                    *i_private;          /* fs private pointer */
};

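Every file in a filesystem can be represented by an inode object, but the inode object is constructed in memory only when the file is accessed. Some fields relate to special files: i_pipe points to a named-pipe data structure, i_bdev to a block-device structure, and i_cdev to a character-device structure. The three pointers share a union because a given inode refers to at most one of them.
A filesystem may be unable to support some attributes of the inode object; some filesystems, for instance, have no access timestamp. In that case the filesystem is free to decide how to implement the feature (for example by storing zero in the timestamp).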
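
Several inode fields are visible from user space through stat(): st_ino, st_nlink, and st_size are the user-space views of the inode number, hard-link count, and file size. The sketch below uses two helper names made up for this note (same_inode, link_count) to show that two hard links are two directory entries for one inode.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both paths name the same inode on the same device,
 * 0 if they do not, and -1 if either stat() call fails.
 * same_inode is a hypothetical helper for this note. */
int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);
    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Number of hard links to path (st_nlink), or -1 on error. */
long link_count(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) < 0)
        return -1;
    return (long)st.st_nlink;
}
```

After link("a", "b") succeeds, same_inode("a", "b") reports a match and the link count of either name goes up by one, because both names resolve to the same inode.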

Inode Operations

The i_op pointer in the inode points to the table of functions that operate on inodes. The table is defined in <linux/fs.h>.

struct inode_operations 
{ 

    int (*create) (struct inode *,struct dentry *,int, struct nameidata *); 
    struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *); 
    int (*link) (struct dentry *,struct inode *,struct dentry *); 
    int (*unlink) (struct inode *,struct dentry *); 
    int (*symlink) (struct inode *,struct dentry *,const char *);
    int (*mkdir) (struct inode *,struct dentry *,int); 
    int (*rmdir) (struct inode *,struct dentry *); 
    int (*mknod) (struct inode *,struct dentry *,int,dev_t); 
    int (*rename) (struct inode *, struct dentry *,
                   struct inode *, struct dentry *); 
    int (*readlink) (struct dentry *, char __user *,int); 
    void * (*follow_link) (struct dentry *, struct nameidata *); 
    void (*put_link) (struct dentry *, struct nameidata *, void *); 
    void (*truncate) (struct inode *); 
    int (*permission) (struct inode *, int); 
    int (*setattr) (struct dentry *, struct iattr *); 
    int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *); 
    int (*setxattr) (struct dentry *, const char *,const void *,size_t,int); 
    ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t); 
    ssize_t (*listxattr) (struct dentry *, char *, size_t); 
    int (*removexattr) (struct dentry *, const char *); 
    void (*truncate_range)(struct inode *, loff_t, loff_t); 
    long (*fallocate)(struct inode *inode, int mode, loff_t offset,
                      loff_t len); 
    int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
    u64 len);
};

The descriptions of some of these functions are excerpted below.

int create(struct inode *dir, struct dentry *dentry, int mode)
The VFS calls this function from the creat() and open() system calls to create a new inode associated with the given dentry object with the specified initial access mode.

struct dentry* lookup(struct inode *dir, struct dentry *dentry)
This function searches a directory for an inode corresponding to a filename specified in the given dentry.

int link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
Invoked by the link() system call to create a hard link of the file old_dentry in the directory dir with the new filename dentry.

int unlink(struct inode *dir, struct dentry *dentry)
Called from the unlink() system call to remove the inode specified by the directory entry dentry from the directory dir.

void *follow_link(struct dentry *dentry, struct nameidata *nd)
Called by the VFS to translate a symbolic link to the inode to which it points.

int permission(struct inode *inode, int mask)
Checks whether the specified access mode is allowed for the file referenced by inode

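The Dentry Object

dentry is short for directory entry. A dentry is one specific component of a path, and every component of a path is a dentry. The path /bin/vi.txt, for example, contains three dentries: /, bin, and vi.txt.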
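
The component count can be checked with a short user-space sketch. count_components is a made-up helper that only mimics how a path breaks into dentries; it is not how the kernel's path walk is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the components of an absolute path, counting the root "/" as one.
 * Repeated slashes are treated as a single separator.
 * Returns 0 for a path that does not start with '/'. */
size_t count_components(const char *path)
{
    size_t n = 1;                        /* the root "/" */
    const char *p;

    assert(path != NULL);
    if (path[0] != '/')
        return 0;
    p = path;
    while (*p != '\0') {
        size_t len;

        p += strspn(p, "/");             /* skip separators */
        len = strcspn(p, "/");           /* length of the next component */
        if (len > 0)
            n++;
        p += len;
    }
    return n;
}
```

For /bin/vi.txt this yields three: the root, bin, and vi.txt.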

The dentry object is represented by struct dentry, defined in <linux/dcache.h>.

struct dentry
{
    atomic_t                 d_count;      /* usage count */ 
    unsigned int             d_flags;      /* dentry flags */ 
    spinlock_t               d_lock;       /* per-dentry lock */ 
    int                      d_mounted;    /* is this a mount point? */ 
    struct inode             *d_inode;     /* associated inode */ 
    struct hlist_node        d_hash;       /* list of hash table entries */ 
    struct dentry            *d_parent;    /* dentry object of parent */ 
    struct qstr              d_name;       /* dentry name */ 
    struct list_head         d_lru;        /* unused list */ 
    union 
    {
        struct list_head     d_child;      /* list of dentries within */ 
        struct rcu_head      d_rcu;        /* RCU locking */
    } d_u; 
    struct list_head         d_subdirs;    /* subdirectories */ 
    struct list_head         d_alias;  /* list of alias inodes */ 
    unsigned long            d_time;       /* revalidate time */ 
    struct dentry_operations *d_op;        /* dentry operations table */ 
    struct super_block       *d_sb;        /* superblock of file */ 
    void                     *d_fsdata;    /* filesystem-specific data */ 
    unsigned char            d_iname[DNAME_INLINE_LEN_MIN]; /* short name */
};

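Because a dentry object has no physical representation on disk, struct dentry has no field marking the object as modified (there is no need to decide whether it is dirty and must be written back to disk).

Dentry State

A dentry is in one of three states: used, unused, or negative.

used:
The dentry corresponds to a valid inode (its d_inode field points to a valid inode) and d_count is positive, meaning one or more users are using the dentry.

unused:
The dentry corresponds to a valid inode (d_inode points to a valid inode) but d_count is 0, meaning the VFS is not currently using it. Because it still points to a valid inode, the dentry is kept in the dentry cache in case it is needed again.

negative:
The dentry is not associated with a valid inode (d_inode is NULL), either because the inode object was destroyed or because the path name that was looked up was wrong. The dentry is still kept in the cache so that later lookups of the same path resolve quickly, straight from the dentry cache.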
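
The three states follow from just two fields, which the sketch below spells out. The toy_ structures and the enum are invented for this note; the kernel keeps no such explicit state value.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode { unsigned long i_ino; };

/* Only the two fields that determine the state are modeled here. */
struct toy_dentry {
    int d_count;                  /* usage count */
    struct toy_inode *d_inode;    /* NULL for a negative dentry */
};

enum dentry_state { DENTRY_USED, DENTRY_UNUSED, DENTRY_NEGATIVE };

/* Derive the state from d_inode and d_count as described in the text. */
enum dentry_state classify(const struct toy_dentry *d)
{
    assert(d != NULL && d->d_count >= 0);
    if (d->d_inode == NULL)
        return DENTRY_NEGATIVE;
    return d->d_count > 0 ? DENTRY_USED : DENTRY_UNUSED;
}
```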

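The Dentry Cache

The dentry cache consists of three parts:

  • Lists of used dentries: every inode object has an i_dentry field, a doubly linked list of the dentry objects associated with that inode (one inode can have many dentries).
  • A least-recently-used list: a doubly linked list holding unused and negative dentries. Entries are inserted at the head, so the most recently used dentries sit near the head and the least recently used ones at the tail. When dentries must be freed to reclaim memory, they are removed from the tail.
  • A hash table and hash function: the hash table maps paths to dentries. It is stored in the dentry_hashtable array, each element of which points to a list of dentries that share the same hash value. The hash function computes the hash value from the path. The exact computation is determined by the dentry operation d_hash(), which a filesystem may implement itself.

While a dentry sits in the cache, it holds a reference on its inode, so the inode's usage count stays above zero. In this way dentries pin their inodes in memory: as long as a dentry is cached, its inode is cached too (in the inode cache, or icache). Consequently, when a path lookup hits in the dentry cache, the associated inode is already in memory.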
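
The hash-table part can be sketched as an array of buckets with chaining. Everything here is a toy: the names are invented, the hash is a simple multiplicative string hash standing in for d_hash(), and only the name is hashed, whereas a real lookup also has to take the parent directory into account.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_HASH_BUCKETS 8

struct toy_dentry {
    const char *name;               /* one path component */
    struct toy_dentry *hash_next;   /* next dentry in the same bucket */
};

/* Each array element heads a chain of dentries with the same hash value. */
static struct toy_dentry *toy_hashtable[TOY_HASH_BUCKETS];

/* Stand-in for d_hash(): a simple multiplicative string hash. */
static unsigned int toy_hash(const char *name)
{
    unsigned int h = 0;

    while (*name != '\0')
        h = h * 31u + (unsigned char)*name++;
    return h % TOY_HASH_BUCKETS;
}

/* Add a dentry at the head of its bucket's chain. */
void toy_d_add(struct toy_dentry *d)
{
    unsigned int b;

    assert(d != NULL && d->name != NULL);
    b = toy_hash(d->name);
    d->hash_next = toy_hashtable[b];
    toy_hashtable[b] = d;
}

/* Walk one bucket; returns the cached dentry or NULL on a cache miss. */
struct toy_dentry *toy_d_lookup(const char *name)
{
    struct toy_dentry *d;

    assert(name != NULL);
    d = toy_hashtable[toy_hash(name)];
    while (d != NULL && strcmp(d->name, name) != 0)
        d = d->hash_next;
    return d;
}
```

A hit costs one hash computation plus a short chain walk, which is why caching even negative dentries pays off for repeated lookups.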

Dentry Operations

The d_op pointer in struct dentry points to the table of functions that operate on dentries. The table is defined in <linux/dcache.h>.

struct dentry_operations 
{
    int (*d_revalidate) (struct dentry *, struct nameidata *);
    int (*d_hash) (struct dentry *, struct qstr *); 
    int (*d_compare) (struct dentry *, struct qstr *, struct qstr *); 
    int (*d_delete) (struct dentry *); 
    void (*d_release) (struct dentry *); 
    void (*d_iput) (struct dentry *, struct inode *); 
    char *(*d_dname) (struct dentry *, char *, int);
};

The descriptions of some of these functions are excerpted below.

int d_revalidate(struct dentry *dentry, struct nameidata *)
Determines whether the given dentry object is valid. The VFS calls this function whenever it is preparing to use a dentry from the dcache. Most filesystems set this method to NULL because their dentry objects in the dcache are always valid.

int d_hash(struct dentry *dentry, struct qstr *name)
Creates a hash value from the given dentry.

int d_compare(struct dentry *dentry, struct qstr *name1, struct qstr *name2)
Called by the VFS to compare two filenames, name1 and name2. Most filesystems leave this at the VFS default, which is a simple string compare

int d_delete (struct dentry *dentry)
Called by the VFS when the specified dentry object's d_count reaches zero. This function requires the dcache_lock and the dentry's d_lock.

void d_release(struct dentry *dentry)
Called by the VFS when the specified dentry is going to be freed. The default function does nothing.

void d_iput(struct dentry *dentry, struct inode *inode)
Called by the VFS when a dentry object loses its associated inode (say, because the entry was deleted from the disk). By default, the VFS simply calls the iput() function to release the inode.

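The File Object

The file object is the in-memory representation of an open file, as seen by a process. Processes deal directly with file objects and never touch superblocks, inodes, or dentries. Because several processes can open the same file at once, a single file may correspond to several file objects in memory, whereas its inode and dentry are each unique in memory.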
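
The difference between "one file" and "one file object" is observable from user space, because the file offset (f_pos) lives in the file object. In the sketch below, which uses two helper names made up for this note, two open() calls on the same path get independent offsets, while a descriptor created with dup() shares the offset of the original.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Current offset of fd, read without moving it. */
off_t current_offset(int fd)
{
    return lseek(fd, 0, SEEK_CUR);
}

/* Opens path twice and dup()s the first descriptor, reads n bytes through
 * the first descriptor, then reports the offsets seen through the other two.
 * Returns 0 on success, -1 on any failure. Hypothetical helper. */
int offsets_after_read(const char *path, size_t n,
                       off_t *second_open, off_t *duplicate)
{
    char buf[64];
    int rc = -1;
    int fd1, fd2, fd3;

    assert(path != NULL && n <= sizeof buf);
    assert(second_open != NULL && duplicate != NULL);
    fd1 = open(path, O_RDONLY);
    fd2 = open(path, O_RDONLY);           /* a second, separate file object */
    fd3 = fd1 >= 0 ? dup(fd1) : -1;       /* refers to the same object as fd1 */
    if (fd1 >= 0 && fd2 >= 0 && fd3 >= 0 &&
        read(fd1, buf, n) == (ssize_t)n) {
        *second_open = current_offset(fd2);
        *duplicate = current_offset(fd3);
        rc = 0;
    }
    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    if (fd3 >= 0) close(fd3);
    return rc;
}
```

After reading four bytes through the first descriptor, the second open() still reports offset 0, and the dup()ed descriptor reports 4.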

The file object is represented by struct file, defined in <linux/fs.h>.

struct file
{
    union
    {
        struct list_head   fu_list;       /* list of file objects */ 
        struct rcu_head    fu_rcuhead;    /* RCU list after freeing */
    } f_u;
    struct path            f_path;        /* contains the dentry */ 
    struct file_operations *f_op;         /* file operations table */ 
    spinlock_t             f_lock;        /* per-file struct lock */ 
    atomic_t               f_count;       /* file object’s usage count */ 
    unsigned int           f_flags;       /* flags specified on open */ 
    mode_t                 f_mode;        /* file access mode */ 
    loff_t                 f_pos;         /* file offset (file pointer) */ 
    struct fown_struct     f_owner;       /* owner data for signals */ 
    const struct cred      *f_cred;       /* file credentials */ 
    struct file_ra_state   f_ra;  /* read-ahead state */ 
    u64                    f_version;     /* version number */ 
    void                   *f_security;   /* security module */ 
    void                   *private_data; /* tty driver hook */
    struct list_head       f_ep_links;    /* list of epoll links */
    spinlock_t             f_ep_lock;     /* epoll lock */ 
    struct address_space   *f_mapping;    /* page cache mapping */ 
    unsigned long          f_mnt_write_state; /* debugging state */
};

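Like the dentry object, the file object has no on-disk counterpart, so it carries no flag indicating whether it is dirty. A file object points to its dentry (the f_dentry pointer, which is f_path.dentry in the structure above), the dentry points to its inode, and the inode records whether the file itself is dirty.

File Operations

The f_op pointer in struct file points to the table of functions that operate on files. The table is defined in <linux/fs.h>.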

struct file_operations 
{ 
    struct module *owner; 
    loff_t (*llseek) (struct file *, loff_t, int); 
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); 
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); 
    ssize_t (*aio_read) (struct kiocb *, const struct iovec *,
                         unsigned long, loff_t); 
    ssize_t (*aio_write) (struct kiocb *, const struct iovec *,
                          unsigned long, loff_t); 
    int (*readdir) (struct file *, void *, filldir_t); 
    unsigned int (*poll) (struct file *, struct poll_table_struct *); 
    int (*ioctl) (struct inode *, struct file *, unsigned int,
                  unsigned long); 
    long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); 
    long (*compat_ioctl) (struct file *, unsigned int, unsigned long); 
    int (*mmap) (struct file *, struct vm_area_struct *); 
    int (*open) (struct inode *, struct file *); 
    int (*flush) (struct file *, fl_owner_t id); 
    int (*release) (struct inode *, struct file *); 
    int (*fsync) (struct file *, struct dentry *, int datasync); 
    int (*aio_fsync) (struct kiocb *, int datasync); 
    int (*fasync) (int, struct file *, int); 
    int (*lock) (struct file *, int, struct file_lock *); 
    ssize_t (*sendpage) (struct file *, struct page *,
                         int, size_t, loff_t *, int); 
    unsigned long (*get_unmapped_area) (struct file *,
                                        unsigned long,
                                        unsigned long, 
                                        unsigned long, 
                                        unsigned long);
    int (*check_flags) (int); 
    int (*flock) (struct file *, int, struct file_lock *); 
    ssize_t (*splice_write) (struct pipe_inode_info *,
                             struct file *, 
                             loff_t *, 
                             size_t, 
                             unsigned int);
    ssize_t (*splice_read) (struct file *, 
                            loff_t *, 
                            struct pipe_inode_info *, 
                            size_t, 
                            unsigned int);
    int (*setlease) (struct file *, long, struct file_lock **); 
};

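A filesystem can implement its own file operations or use the generic ones. The generic operations normally work correctly on standard Unix-based filesystems.

The descriptions of some of these functions are excerpted below.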

int open(struct inode *inode, struct file *file)
Creates a new file object and links it to the corresponding inode object. It is called by the open() system call.

loff_t llseek(struct file *file, loff_t offset, int origin)
Updates the file pointer to the given offset. It is called via the llseek() system call.

ssize_t read(struct file *file, char *buf, size_t count, loff_t *offset)
Reads count bytes from the given file at position offset into buf. The file pointer is then updated. This function is called by the read() system call.

ssize_t aio_read(struct kiocb *iocb, char *buf, size_t count, loff_t offset)
Begins an asynchronous read of count bytes into buf of the file described in iocb. This function is called by the aio_read() system call.

ssize_t write(struct file *file, const char *buf, size_t count, loff_t *offset)
Writes count bytes from buf into the given file at position offset. The file pointer is then updated. This function is called by the write() system call.

int readdir(struct file *file, void *dirent, filldir_t filldir)
Returns the next directory in a directory listing. This function is called by the readdir() system call.

unsigned int poll(struct file *file, struct poll_table_struct *poll_table)
Sleeps, waiting for activity on the given file. It is called by the poll() system call.

int ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
Sends a command and argument pair to a device. It is used when the file is an open device node. This function is called from the ioctl() system call. Callers must hold the BKL.

int mmap(struct file *file, struct vm_area_struct *vma)
Memory maps the given file onto the given address space and is called by the mmap() system call.

int flush(struct file *file)
Called by the VFS whenever the reference count of an open file decreases. Its purpose is filesystem-dependent.

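Data Structures Associated with Filesystems

The kernel uses two data structures to manage filesystem-related data: struct file_system_type describes a filesystem type, and struct vfsmount represents a mounted instance of a filesystem.

file_system_type

Because Linux supports so many filesystems, the kernel needs a dedicated structure to describe the capabilities and behavior of each one; that is the job of struct file_system_type.

file_system_type is defined in <linux/fs.h>.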

struct file_system_type 
{ 
    const char              *name;     /* filesystem’s name */ 
    int                     fs_flags;  /* filesystem type flags */
    struct super_block      *(*get_sb) (struct file_system_type *, int, char *, void *);
    void                    (*kill_sb) (struct super_block *);
    struct module           *owner;    /* module owning the filesystem */ 
    struct file_system_type *next;     /* next file_system_type in list */ 
    struct list_head        fs_supers; /* list of superblock objects */
    struct lock_class_key   s_lock_key; 
    struct lock_class_key   s_umount_key; 
    struct lock_class_key   i_lock_key; 
    struct lock_class_key   i_mutex_key; 
    struct lock_class_key   i_mutex_dir_key; 
    struct lock_class_key   i_alloc_sem_key;
};

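The get_sb() function reads the superblock from disk when the filesystem is loaded and uses that data to populate the in-memory superblock object. Each filesystem type has exactly one file_system_type, no matter how many instances of it exist (even zero).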
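
The "exactly one per type" rule can be sketched as a singly linked list keyed by name, using a next pointer like the one in the structure above. This is a user-space toy loosely modeled on that idea; the toy_ names are invented, and returning -EBUSY for a duplicate registration is a choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_fs_type {
    const char *name;              /* filesystem's name */
    struct toy_fs_type *next;      /* next registered type */
};

static struct toy_fs_type *toy_file_systems;   /* head of the list */

/* Find the link that points at `name`, or the terminating NULL link. */
static struct toy_fs_type **toy_find(const char *name)
{
    struct toy_fs_type **p = &toy_file_systems;

    while (*p != NULL && strcmp((*p)->name, name) != 0)
        p = &(*p)->next;
    return p;
}

/* Register a type once; a second registration of the same name fails. */
int toy_register_filesystem(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    assert(fs != NULL && fs->name != NULL);
    p = toy_find(fs->name);
    if (*p != NULL)
        return -EBUSY;             /* this name is already registered */
    fs->next = NULL;
    *p = fs;                       /* append at the end of the list */
    return 0;
}

/* Look a type up by name; NULL if it was never registered. */
struct toy_fs_type *toy_get_fs_type(const char *name)
{
    assert(name != NULL);
    return *toy_find(name);
}
```

Mounting the same filesystem type several times would create several superblocks, but every one of them would refer back to the single registered type.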

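vfsmount

A vfsmount structure is created when a filesystem is mounted. It represents one specific instance of a filesystem, that is, a mount point.

The definition of struct vfsmount, from <linux/mount.h>, is shown below.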

struct vfsmount 
{ 
    struct list_head   mnt_hash;        /* hash table list */
    struct vfsmount    *mnt_parent;     /* parent filesystem */ 
    struct dentry      *mnt_mountpoint; /* dentry of this mount point */ 
    struct dentry      *mnt_root;       /* dentry of root of this fs */ 
    struct super_block *mnt_sb;         /* superblock of this filesystem */ 
    struct list_head   mnt_mounts;      /* list of children */ 
    struct list_head   mnt_child;       /* list of children */ 
    int                mnt_flags;       /* mount flags */ 
    char               *mnt_devname;    /* device file name */ 
    struct list_head   mnt_list;        /* list of descriptors */ 
    struct list_head   mnt_expire;      /* entry in expiry list */ 
    struct list_head   mnt_share;       /* entry in shared mounts list */ 
    struct list_head   mnt_slave_list;  /* list of slave mounts */ 
    struct list_head   mnt_slave;       /* entry in slave list */ 
    struct vfsmount    *mnt_master;     /* slave’s master */ 
    struct mnt_namespace *mnt_namespace; /* associated namespace */ 
    int                mnt_id;           /* mount identifier */ 
    int                mnt_group_id;     /* peer group identifier */ 
    atomic_t           mnt_count;        /* usage count */ 
    int                mnt_expiry_mark;  /* is marked for expiration */ 
    int                mnt_pinned;       /* pinned count */ 
    int                mnt_ghosts;       /* ghosts count */ 
    atomic_t           __mnt_writers;    /* writers count */
};

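The vfsmount contains a pointer to the superblock object of the filesystem instance.

Data Structures Associated with a Process

Each process is tied to the VFS layer through three data structures, files_struct, fs_struct, and mnt_namespace, which record its list of open files, its root filesystem, its current working directory, and so on.

files_struct

The files pointer in the process descriptor points to the files_struct, which is defined in <linux/fdtable.h>.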

struct files_struct 
{ 
    atomic_t               count;              /* usage count */ 
    struct fdtable         *fdt;               /* pointer to other fd table */ 
    struct fdtable         fdtab;              /* base fd table */ 
    spinlock_t             file_lock;          /* per-file lock */ 
    int  next_fd;  /* cache of next available fd */ 
    struct embedded_fd_set close_on_exec_init; /* list of close-on-exec fds */ 
    struct embedded_fd_set open_fds_init;      /* list of open fds */ 
    struct file            *fd_array[NR_OPEN_DEFAULT]; /* base files array */
};

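fd_array holds the list of open files: fd_array[i] points to the file object for file descriptor i. NR_OPEN_DEFAULT is a constant, 64 on a 64-bit machine. When a process opens more files than that, the kernel allocates a new fdtable and points fdt at it.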
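
The idea of an embedded table that is swapped for a larger one on demand can be sketched in user space. All names are invented for this note, the embedded size is deliberately tiny, and doubling the table is an assumption of the sketch rather than a description of the kernel's growth policy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NR_OPEN_DEFAULT 4      /* tiny on purpose; not the kernel value */

struct toy_file { int dummy; };

struct toy_files {
    struct toy_file **fd;          /* current table: fd_array or heap memory */
    size_t max_fds;                /* capacity of the current table */
    struct toy_file *fd_array[TOY_NR_OPEN_DEFAULT];   /* embedded table */
};

void toy_files_init(struct toy_files *fs)
{
    assert(fs != NULL);
    memset(fs, 0, sizeof *fs);
    fs->fd = fs->fd_array;         /* start with the embedded table */
    fs->max_fds = TOY_NR_OPEN_DEFAULT;
}

/* Install f at the lowest free descriptor; returns it, or -1 if out of memory. */
int toy_fd_install(struct toy_files *fs, struct toy_file *f)
{
    size_t i;

    assert(fs != NULL && f != NULL);
    for (i = 0; i < fs->max_fds; i++)
        if (fs->fd[i] == NULL)
            break;
    if (i == fs->max_fds) {        /* table full: move to a bigger one */
        size_t new_max = fs->max_fds * 2;
        struct toy_file **bigger = calloc(new_max, sizeof *bigger);

        if (bigger == NULL)
            return -1;
        memcpy(bigger, fs->fd, fs->max_fds * sizeof *bigger);
        if (fs->fd != fs->fd_array)
            free(fs->fd);
        fs->fd = bigger;
        fs->max_fds = new_max;
    }
    fs->fd[i] = f;
    return (int)i;
}

/* Release the heap table, if one was ever allocated. */
void toy_files_destroy(struct toy_files *fs)
{
    assert(fs != NULL);
    if (fs->fd != fs->fd_array)
        free(fs->fd);
}
```

Most processes never outgrow the embedded array, so the common case needs no extra allocation at all.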

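fs_struct

The fs_struct structure stores filesystem information related to a process. The fs pointer in the process descriptor points to the process's fs_struct.

fs_struct is defined in <linux/fs_struct.h>.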

struct fs_struct 
{ 
    int         users;    /* user count */ 
    rwlock_t    lock;     /* per-structure lock */ 
    int         umask;    /* umask */ 
    int         in_exec;  /* currently executing a file */ 
    struct path root;     /* root directory */ 
    struct path pwd;      /* current working directory */
};

root holds the process's root directory, and pwd holds its current working directory.
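
From user space, the working directory is changed with chdir() and read back with getcwd(). A minimal sketch, with a helper name made up for this note:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Change the calling process's working directory and read it back.
 * Returns 0 and fills buf on success, -1 on failure.
 * change_and_report_cwd is a hypothetical helper. */
int change_and_report_cwd(const char *dir, char *buf, size_t size)
{
    assert(dir != NULL && buf != NULL);
    if (chdir(dir) < 0)
        return -1;
    return getcwd(buf, size) != NULL ? 0 : -1;
}
```

The change affects only the calling process (and whatever shares its filesystem information), which matches the per-process nature of fs_struct.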

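mnt_namespace

mnt_namespace gives each process its own view of the mounted filesystems. The mnt_namespace field in the process descriptor points to the process's mnt_namespace structure.

By default all processes on Linux share one namespace. A new namespace is created only when the CLONE_NEWNS flag is passed to clone().

mnt_namespace is defined in <linux/mnt_namespace.h>.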

struct mnt_namespace 
{ 
    atomic_t            count; /* usage count */ 
    struct vfsmount     *root; /* root directory */
    struct list_head    list;  /* list of mount points */ 
    wait_queue_head_t   poll;  /* polling waitqueue */ 
    int                 event; /* event count */
};

list is a doubly linked list that ties together all the mounted filesystems that make up the namespace.

References

Linux Kernel Development, 3rd Edition
Understanding the Linux Kernel, 3rd Edition
