Overview of the Linux Virtual File System

Original author: Richard Gooch <rgooch@atnf.csiro.au>

                     Last updated on June 24, 2007.

  Copyright (C) 1999 Richard Gooch
  Copyright (C) 2005 Pekka Enberg

  This file is released under the GPLv2.

Introduction
============

The Virtual File System (also known as the Virtual Filesystem Switch)
is the software layer in the kernel that provides the filesystem
interface to userspace programs. It also provides an abstraction
within the kernel which allows different filesystem implementations to
coexist.

VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so
on are called from a process context. Filesystem locking is described
in the document Documentation/filesystems/Locking.

Directory Entry Cache (dcache)
------------------------------

The VFS implements the open(2), stat(2), chmod(2), and similar system
calls. The pathname argument that is passed to them is used by the VFS
to search through the directory entry cache (also known as the dentry
cache or dcache). This provides a very fast look-up mechanism to
translate a pathname (filename) into a specific dentry. Dentries live
in RAM and are never saved to disc: they exist only for performance.

The dentry cache is meant to be a view into your entire filespace. As
most computers cannot fit all dentries in the RAM at the same time,
some bits of the cache are missing. In order to resolve your pathname
into a dentry, the VFS may have to resort to creating dentries along
the way, and then loading the inode. This is done by looking up the
inode.
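
The idea of resolving a pathname through a cache that is filled in along
the way can be illustrated with a small userspace toy model. This is only
a sketch under loud assumptions: toy_dentry, toy_resolve() and the
fixed-size table are hypothetical names invented for this example and are
not the kernel's struct dentry or its hash table. The model walks a
pathname one component at a time, reuses entries that are already cached,
and creates the missing ones.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_DENTRIES 64
#define TOY_NAME_MAX     32

/* Hypothetical stand-in for a cached directory entry. */
struct toy_dentry {
    int  parent;                /* index of the parent entry, -1 for "/" */
    char name[TOY_NAME_MAX];    /* one pathname component */
};

static struct toy_dentry toy_cache[TOY_MAX_DENTRIES] = { { -1, "/" } };
static int toy_cache_used = 1;  /* slot 0 is always the root */

/* Find the child called name[0..len) under parent, or return -1. */
static int toy_cache_find(int parent, const char *name, size_t len)
{
    for (int i = 1; i < toy_cache_used; i++) {
        if (toy_cache[i].parent == parent &&
            strlen(toy_cache[i].name) == len &&
            memcmp(toy_cache[i].name, name, len) == 0)
            return i;
    }
    return -1;
}

/*
 * Resolve an absolute path to a cache index, creating entries for
 * components that are not cached yet. *created counts the new entries.
 * Returns -1 if the path is not absolute, a component is too long,
 * or the table is full.
 */
int toy_resolve(const char *path, int *created)
{
    int cur = 0;

    *created = 0;
    if (path[0] != '/')
        return -1;

    while (*path) {
        while (*path == '/')        /* skip separators */
            path++;
        size_t len = strcspn(path, "/");
        if (len == 0)
            break;                  /* trailing slash or end of path */
        if (len >= TOY_NAME_MAX)
            return -1;

        int next = toy_cache_find(cur, path, len);
        if (next < 0) {             /* cache miss: create the entry */
            if (toy_cache_used == TOY_MAX_DENTRIES)
                return -1;
            next = toy_cache_used++;
            toy_cache[next].parent = cur;
            memcpy(toy_cache[next].name, path, len);
            toy_cache[next].name[len] = '\0';
            (*created)++;
        }
        cur = next;
        path += len;
    }
    return cur;
}
```

Resolving the same path twice shows the point of the cache: the first
call has to create an entry per component, the second creates none. The
toy never evicts anything and never consults a filesystem, both of which
the real dcache has to do.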

The Inode Object
----------------

An individual dentry usually has a pointer to an inode. Inodes are filesystem objects such as regular files, directories, FIFOs and other beasts.  They live either on the disc (for block device filesystems) or in the memory (for pseudo filesystems). Inodes that live on the disc are copied into the memory when required and changes to the inode are written back to disc. A single inode can be pointed to by multiple dentries (hard links, for example, do this).

To look up an inode requires that the VFS calls the lookup() method of the parent directory inode. This method is installed by the specific filesystem implementation that the inode lives in. Once the VFS has the required dentry (and hence the inode), we can do all those boring things like open(2) the file, or stat(2) it to peek at the inode data. The stat(2) operation is fairly simple: once the VFS has the dentry, it peeks at the inode data and passes some of it back to userspace.
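
From userspace, the statement that several pathnames can lead to one inode
can be checked with stat(2). The following is a minimal sketch, assuming a
POSIX system; same_inode() is a hypothetical helper name chosen for this
example. It compares the device and inode numbers that stat(2) passes back
for two pathnames.

```c
#include <sys/stat.h>

/*
 * Return 1 if both pathnames resolve to the same inode on the same
 * device, 0 if they do not, and -1 if either stat(2) call fails.
 */
int same_inode(const char *path_a, const char *path_b)
{
    struct stat a, b;

    if (stat(path_a, &a) != 0 || stat(path_b, &b) != 0)
        return -1;

    /* An inode number is only unique within one filesystem (st_dev). */
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

For example, same_inode("/", "/.") returns 1, and two hard links to the
same file would compare equal in the same way. The st_nlink field of the
same structure reports how many hard links refer to the inode.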

 

The File Object
---------------

Opening a file requires another operation: allocation of a file structure (this is the kernel-side implementation of file descriptors). The freshly allocated file structure is initialized with a pointer to the dentry and a set of file operation member functions. These are taken from the inode data. The open() file method is then called so the specific filesystem implementation can do its work. You can see that this is another switch performed by the VFS. The file structure is placed into the file descriptor table for the process.

Reading, writing and closing files (and other assorted VFS operations) are done by using the userspace file descriptor to grab the appropriate file structure, and then calling the required file structure method to do whatever is required. For as long as the file is open, it keeps the dentry in use, which in turn means that the VFS inode is still in use.
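
The "switch" described above is, in essence, a call through a table of
function pointers stored in the file structure. The following userspace
sketch models that idea only. Every name in it (toy_file, toy_file_ops,
the two pretend filesystems) is hypothetical and greatly reduced compared
with the kernel's struct file and struct file_operations.

```c
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Hypothetical, heavily reduced analogue of a file operations table. */
struct toy_file_ops {
    int    (*open)(struct toy_file *f);
    size_t (*read)(struct toy_file *f, char *buf, size_t len);
};

/* Hypothetical analogue of the per-open file structure. */
struct toy_file {
    const struct toy_file_ops *f_op;  /* installed when the file is opened */
    size_t pos;                       /* current read offset */
};

/* Pretend filesystem A: every file reads as an endless run of zero bytes. */
static int zero_open(struct toy_file *f) { f->pos = 0; return 0; }
static size_t zero_read(struct toy_file *f, char *buf, size_t len)
{
    memset(buf, 0, len);
    f->pos += len;
    return len;
}
const struct toy_file_ops zero_ops = { zero_open, zero_read };

/* Pretend filesystem B: every file contains the fixed text "hello". */
static int hello_open(struct toy_file *f) { f->pos = 0; return 0; }
static size_t hello_read(struct toy_file *f, char *buf, size_t len)
{
    static const char text[] = "hello";
    size_t left = sizeof(text) - 1 - f->pos;  /* bytes not yet read */

    if (len > left)
        len = left;
    memcpy(buf, text + f->pos, len);
    f->pos += len;
    return len;
}
const struct toy_file_ops hello_ops = { hello_open, hello_read };

/* Generic layer: install the method table, then call the open() method. */
int toy_open(struct toy_file *f, const struct toy_file_ops *ops)
{
    f->f_op = ops;
    return f->f_op->open(f);
}

/* Generic layer: dispatch to whichever filesystem owns the file. */
size_t toy_read(struct toy_file *f, char *buf, size_t len)
{
    return f->f_op->read(f, buf, len);
}
```

The caller of toy_read() never names a filesystem; the table chosen at
open time decides which implementation runs. In the kernel the table
comes from the inode rather than from the caller, as the text describes.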

 

Registering and Mounting a Filesystem
=====================================

To register and unregister a filesystem, use the following API
functions:

   #include <linux/fs.h>

   extern int register_filesystem(struct file_system_type *);
   extern int unregister_filesystem(struct file_system_type *);

The passed struct file_system_type describes your filesystem. When a
request is made to mount a filesystem onto a directory in your namespace,
the VFS will call the appropriate mount() method for the specific
filesystem.  New vfsmount referring to the tree returned by ->mount()
will be attached to the mountpoint, so that when pathname resolution
reaches the mountpoint it will jump into the root of that vfsmount.
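
The bookkeeping behind registration can be pictured as a singly linked
list of filesystem types keyed by name. The sketch below is a userspace
toy, not kernel code: toy_fs_type and the toy_* functions are hypothetical,
locking is left out entirely, and the return values (-EBUSY for a name
that is already taken, -EINVAL for a type that was never registered) are
modelled on how the kernel functions are understood to behave, so treat
them as an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reduced model: only the two members the list needs. */
struct toy_fs_type {
    const char *name;
    struct toy_fs_type *next;   /* must be NULL before registration */
};

static struct toy_fs_type *toy_file_systems;   /* head of the list */

/* Return the link that points at the entry called name, or the tail link. */
static struct toy_fs_type **toy_find(const char *name)
{
    struct toy_fs_type **p;

    for (p = &toy_file_systems; *p; p = &(*p)->next)
        if (strcmp((*p)->name, name) == 0)
            break;
    return p;
}

int toy_register_filesystem(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    if (fs->next)
        return -EBUSY;          /* already linked into a list */
    p = toy_find(fs->name);
    if (*p)
        return -EBUSY;          /* name already taken */
    *p = fs;                    /* append at the tail */
    return 0;
}

int toy_unregister_filesystem(struct toy_fs_type *fs)
{
    struct toy_fs_type **p;

    for (p = &toy_file_systems; *p; p = &(*p)->next) {
        if (*p == fs) {
            *p = fs->next;      /* unlink */
            fs->next = NULL;
            return 0;
        }
    }
    return -EINVAL;             /* was never registered */
}
```

The model also shows why the next member is expected to start out as
NULL: a non-NULL value is taken to mean that the structure is already on
the list.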

 

You can see all filesystems that are registered to the kernel in the
file /proc/filesystems.
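
A program can make the same check. The sketch below assumes the usual
layout of /proc/filesystems, where each line holds an optional "nodev"
marker, a tab, and then the filesystem name. fs_listed() and
fs_registered() are hypothetical helper names, and the 8 KiB buffer is an
assumption that is comfortable for typical systems but not guaranteed.

```c
#include <stdio.h>
#include <string.h>

/*
 * Search text in /proc/filesystems format for a filesystem name.
 * Returns 1 if the name is listed, 0 if it is not.
 */
int fs_listed(const char *text, const char *name)
{
    size_t name_len = strlen(name);

    while (*text) {
        size_t line_len = strcspn(text, "\n");
        /* The name is whatever follows the tab on each line. */
        const char *tab = memchr(text, '\t', line_len);

        if (tab) {
            size_t len = line_len - (size_t)(tab + 1 - text);
            if (len == name_len && memcmp(tab + 1, name, len) == 0)
                return 1;
        }
        text += line_len;
        if (*text == '\n')
            text++;
    }
    return 0;
}

/*
 * Read the real /proc/filesystems and look for name.
 * Returns 1 or 0 as above, or -1 if the file cannot be opened.
 */
int fs_registered(const char *name)
{
    char buf[8192];             /* assumed large enough for the listing */
    FILE *fp = fopen("/proc/filesystems", "r");
    size_t n;

    if (!fp)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    return fs_listed(buf, name);
}
```

Keeping the parser separate from the file access makes it possible to try
fs_listed() on a literal string, which is useful on systems where /proc is
not mounted.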

 

struct file_system_type
-----------------------

This describes the filesystem. As of kernel 2.6.39, the following
members are defined:

struct file_system_type {
        const char *name;
        int fs_flags;
        struct dentry *(*mount) (struct file_system_type *, int,
                       const char *, void *);
        void (*kill_sb) (struct super_block *);
        struct module *owner;
        struct file_system_type * next;
        struct list_head fs_supers;
        struct lock_class_key s_lock_key;
        struct lock_class_key s_umount_key;
};

 

  name: the name of the filesystem type, such as "ext2", "iso9660",
        "msdos" and so on

  fs_flags: various flags (e.g. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.)

  mount: the method to call when a new instance of this
        filesystem should be mounted

  kill_sb: the method to call when an instance of this filesystem
        should be shut down

  owner: for internal VFS use: you should initialize this to THIS_MODULE in
        most cases.

  next: for internal VFS use: you should initialize this to NULL

  s_lock_key, s_umount_key: lockdep-specific

 

The mount() method has the following arguments:

  struct file_system_type *fs_type: describes the filesystem, partly initialized
        by the specific filesystem code

  int flags: mount flags

  const char *dev_name: the device name we are mounting.

  void *data: arbitrary mount options, usually comes as an ASCII
        string (see "Mount Options" section)

The mount() method must return the root dentry of the tree requested by
caller.  An active reference to its superblock must be grabbed and the
superblock must be locked.  On failure it should return ERR_PTR(error).
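
Returning ERR_PTR(error) relies on a convention of encoding a small
negative error number in the pointer value itself, so that one return
value can carry either a valid pointer or an error. The following
userspace sketch imitates that convention. The toy_* names are
hypothetical, the 4095 limit is an assumption made for this example, and
toy_mount() is an invented stand-in rather than a real mount() method.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095      /* assumed upper bound on errno values */

/* Encode a negative errno value as a pointer. */
void *toy_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from such a pointer. */
long toy_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if ptr lies in the top TOY_MAX_ERRNO bytes of the address space. */
int toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Hypothetical mount-like function: fails when no device name is given. */
void *toy_mount(const char *dev_name)
{
    static int root;            /* stands in for a root dentry */

    if (dev_name == NULL)
        return toy_err_ptr(-EINVAL);
    return &root;
}
```

A caller tests the result with toy_is_err() before using it and, on
failure, extracts the error number with toy_ptr_err(). The scheme works
because no valid object is expected to live in the last few kilobytes of
the address space.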

 

The arguments match those of mount(2) and their interpretation
depends on filesystem type.  E.g. for block filesystems, dev_name is
interpreted as a block device name; that device is opened and, if it
contains a suitable filesystem image, the method creates and initializes
struct super_block accordingly, returning its root dentry to the caller.

->mount() may choose to return a subtree of an existing filesystem - it
doesn't have to create a new one.  The main result from the caller's
point of view is a reference to the dentry at the root of the (sub)tree
to be attached; creation of a new superblock is a common side effect.

The most interesting member of the superblock structure that the
mount() method fills in is the "s_op" field. This is a pointer to
a "struct super_operations" which describes the next level of the
filesystem implementation.

 

Usually, a filesystem uses one of the generic mount() implementations
and provides a fill_super() callback instead. The generic variants are:

  mount_bdev: mount a filesystem residing on a block device

  mount_nodev: mount a filesystem that is not backed by a device

  mount_single: mount a filesystem which shares the instance between
        all mounts

A fill_super() callback implementation has the following arguments:

  struct super_block *sb: the superblock structure. The callback
        must initialize this properly.

  void *data: arbitrary mount options, usually comes as an ASCII
        string (see "Mount Options" section)

  int silent: whether or not to be silent on error
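
The division of labour between a generic mount helper and a fill_super()
callback can be sketched in userspace as follows. This is a toy under
stated assumptions: toy_super_block, toy_mount_nodev() and
toyfs_fill_super() are hypothetical names, the helper is only loosely
inspired by the idea behind mount_nodev, and the "ro"/"rw" option
handling is invented for the example.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct super_operations. */
struct toy_super_ops {
    const char *label;
};

/* Hypothetical stand-in for struct super_block. */
struct toy_super_block {
    const struct toy_super_ops *s_op;
    int read_only;              /* set from the option string */
    int s_root;                 /* non-zero once a "root" has been set up */
};

typedef int (*toy_fill_super_t)(struct toy_super_block *sb,
                                void *data, int silent);

/*
 * Generic helper: allocate a superblock and let the filesystem's
 * callback initialize it. Returns NULL if allocation or the callback
 * fails; the caller frees the result with free().
 */
struct toy_super_block *toy_mount_nodev(void *data, int silent,
                                        toy_fill_super_t fill_super)
{
    struct toy_super_block *sb = calloc(1, sizeof(*sb));

    if (sb == NULL)
        return NULL;
    if (fill_super(sb, data, silent) != 0) {
        free(sb);
        return NULL;
    }
    return sb;
}

static const struct toy_super_ops toyfs_super_ops = { "toyfs" };

/* Filesystem-specific callback: parse the options and fill in sb. */
int toyfs_fill_super(struct toy_super_block *sb, void *data, int silent)
{
    const char *opts = data ? (const char *)data : "";

    (void)silent;   /* a real callback would use this to mute error output */

    if (strcmp(opts, "ro") == 0)
        sb->read_only = 1;
    else if (opts[0] != '\0' && strcmp(opts, "rw") != 0)
        return -EINVAL;         /* unknown option */

    sb->s_op = &toyfs_super_ops;    /* the "next level" of the filesystem */
    sb->s_root = 1;
    return 0;
}
```

The generic helper knows nothing about the filesystem; everything
specific, including the interpretation of the data argument, lives in the
callback. That is the reason a filesystem usually only has to supply
fill_super().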

 

Reposted from: https://my.oschina.net/jiuyueshouyi/blog/394898
