Notes on File Systems

foryun_wasaima

Tags: linux, kernel, operating system

Article link: https://blog.csdn.net/qq_37276615/article/details/106978476

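task_struct ---> open file1
               (---> corresponds to one struct file ---> whose file_operations are used to interact with the file
                   ---> each struct file points to one dentry) (the reverse direction is many-to-one: several files can share a dentry)
                       ---> each dentry points to one inode (the reverse direction is many-to-one: several dentries can share an inode)
               In the end, every struct file opened on an inode takes its f_op from that inode's i_fop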
==============================================================================================================================================
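The page cache is organized per inode: each inode has its own page cache.

Suppose the opened file is 1 MB and we want to read starting at offset 16 KB. The kernel first checks whether the page is already in the page cache (tracked in a radix tree).
On a hit, the disk is not read at all. On a miss, a page is allocated in memory, attached to the inode's radix tree, and filled with data read from the disk.
This path uses the address_space_operations member (note the relationship between file_operations and address_space_operations: the former hooks the file up to the VFS, the latter performs the page cache access, including issuing bios)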
===============================================================================================
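Each inode corresponds to one address space (inode->i_mapping, a struct address_space ---> a_ops, a struct address_space_operations). The a_ops members read and write pages: they bring disk contents into the page cache, or write page cache contents back to the disk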
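
The hit/miss logic described above can be illustrated with a small user-space sketch. It is not kernel code: the toy_ names are made up for this note, the "disk" is just a byte array, a plain array of page pointers stands in for the radix tree, and 4 KiB pages are assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SHIFT 12                       /* assumption: 4 KiB pages */
#define TOY_PAGE_SIZE  ((size_t)1 << TOY_PAGE_SHIFT)

/* Toy stand-in for an inode's address_space: one slot per page of the file. */
struct toy_mapping {
    const uint8_t *disk;    /* backing store, plays the role of the disk */
    size_t file_size;
    uint8_t **pages;        /* pages[i] == NULL means "page i is not cached" */
    size_t nr_pages;
    unsigned long misses;   /* how many times the "disk" had to be read */
};

static size_t toy_page_index(size_t offset)  { return offset >> TOY_PAGE_SHIFT; }
static size_t toy_page_offset(size_t offset) { return offset & (TOY_PAGE_SIZE - 1); }

/* Returns 0 on success, -1 if the slot array cannot be allocated. */
static int toy_mapping_init(struct toy_mapping *m, const uint8_t *disk, size_t size)
{
    m->disk = disk;
    m->file_size = size;
    m->nr_pages = (size + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT;
    m->pages = calloc(m->nr_pages, sizeof(*m->pages));
    m->misses = 0;
    return m->pages ? 0 : -1;
}

/* Read one byte through the cache; returns the byte, or -1 on error. */
static int toy_read_byte(struct toy_mapping *m, size_t offset)
{
    size_t idx, start, len;

    if (offset >= m->file_size)
        return -1;
    idx = toy_page_index(offset);
    if (!m->pages[idx]) {                       /* miss: allocate and fill the page */
        uint8_t *page = calloc(1, TOY_PAGE_SIZE);
        if (!page)
            return -1;
        start = idx << TOY_PAGE_SHIFT;
        len = m->file_size - start;
        if (len > TOY_PAGE_SIZE)
            len = TOY_PAGE_SIZE;
        memcpy(page, m->disk + start, len);     /* the "read from disk" step */
        m->pages[idx] = page;
        m->misses++;
    }
    return m->pages[idx][toy_page_offset(offset)];   /* hit path */
}
```

With 4 KiB pages, offset 16 KB falls in page index 4, so the first read at that offset is a miss that fills page 4, and later reads inside the same page are hits.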

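Steps to implement a file system:
1. Register a file system type (register_filesystem(&file_system_type))
   .mount -> fill_super ====> read the super block from disk; initialize the in-memory super_block (s_op); initialize the root inode (i_op and i_fop); set up the root directory dentry (sb->s_root). On umount, file_system_type.kill_sb() does the cleanup
2. super_block: overall information about the file system; super_operations.alloc_inode(), destroy_inode()
3. inode (a directory or a regular file): i_op (inode_operations) with .create/lookup/mkdir
4. dentry...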
=================================================================================================================================================
Definitions of the function pointers used when registering the file system:
file_system_type
struct file_system_type simplefs_fs_type = {
   .owner = THIS_MODULE,
   .name = "simplefs",
   .mount = simplefs_mount, /* called when the file system is mounted */
   .kill_sb = simplefs_kill_superblock,
   .fs_flags = FS_REQUIRES_DEV,
};
   --->simplefs_mount
       --->ret = mount_bdev(fs_type, flags, dev_name, data, simplefs_fill_super /* hook that fills in the super block */); /* generic mount helper */
           --->simplefs_fill_super--->int simplefs_fill_super(struct super_block *sb, void *data, int silent)
{
   struct inode *root_inode;
   struct buffer_head *bh;
   struct simplefs_super_block *sb_disk;
   int ret = -EPERM;

   bh = sb_bread(sb, SIMPLEFS_SUPERBLOCK_BLOCK_NUMBER); /* read the on-disk super block */
   BUG_ON(!bh);

   sb_disk = (struct simplefs_super_block *)bh->b_data;

   printk(KERN_INFO "The magic number obtained in disk is: [%llu]\n",
          sb_disk->magic);

   if (unlikely(sb_disk->magic != SIMPLEFS_MAGIC)) { /* verify that the device really holds a simplefs file system */
       printk(KERN_ERR
       "The filesystem that you try to mount is not of type simplefs. Magicnumber mismatch.");
       goto release;
   }

   if (unlikely(sb_disk->block_size != SIMPLEFS_DEFAULT_BLOCK_SIZE)) {
       printk(KERN_ERR
       "simplefs seem to be formatted using a non-standard block size.");
       goto release;
   }

   printk(KERN_INFO
   "simplefs filesystem of version [%llu] formatted with a block size of [%llu] detected in the device.\n",
   sb_disk->version, sb_disk->block_size);

   /* A magic number that uniquely identifies our filesystem type */
   sb->s_magic = SIMPLEFS_MAGIC;

   /* For all practical purposes, we will be using this s_fs_info as the super block */
   sb->s_fs_info = sb_disk;

   sb->s_maxbytes = SIMPLEFS_DEFAULT_BLOCK_SIZE;
   sb->s_op = &simplefs_sops; /* hook up the matching super_operations */

   root_inode = new_inode(sb); /* the root inode is initialized during mount */
   root_inode->i_ino = SIMPLEFS_ROOTDIR_INODE_NUMBER;
   inode_init_owner(root_inode, NULL, S_IFDIR);
   root_inode->i_sb = sb;
   root_inode->i_op = &simplefs_inode_ops; /* assign the matching inode_operations */
   {
       static struct inode_operations simplefs_inode_ops = {
           .create = simplefs_create, /* inodes come in two kinds, directories and regular files; each kind gets its own fops */
           .lookup = simplefs_lookup,
           .mkdir = simplefs_mkdir,
       };
   }
   root_inode->i_fop = &simplefs_dir_operations; /* assign the fops */
   {
       const struct file_operations simplefs_dir_operations = {
           .owner = THIS_MODULE,
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 11, 0)
           .iterate = simplefs_iterate,
           { /* in practice a recent kernel is normally used anyway */
               #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 11, 0)
               static int simplefs_iterate(struct file *filp, struct dir_context *ctx)
               #else
               static int simplefs_readdir(struct file *filp, void *dirent, filldir_t filldir)
               #endif
               {
                   loff_t pos;
                   struct inode *inode;
                   struct super_block *sb;
                   struct buffer_head *bh;
                   struct simplefs_inode *sfs_inode;
                   struct simplefs_dir_record *record;
                   int i;

               #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 11, 0)
                   pos = ctx->pos;
               #else
                   pos = filp->f_pos;
               #endif
                   inode = filp->f_dentry->d_inode;
                   sb = inode->i_sb;

                   if (pos) {
                       /* FIXME: We use a hack of reading pos to figure if we have filled in all data.
                       * We should probably fix this to work in a cursor based model and
                       * use the tokens correctly to not fill too many data in each cursor based call */
                       return 0;
                   }

                   sfs_inode = SIMPLEFS_INODE(inode);

                   if (unlikely(!S_ISDIR(sfs_inode->mode))) {
                       printk(KERN_ERR
                           "inode [%llu][%lu] for fs object [%s] not a directory\n",
                           sfs_inode->inode_no, inode->i_ino,
                           filp->f_dentry->d_name.name);
                       return -ENOTDIR;
                   }

                   bh = sb_bread(sb, sfs_inode->data_block_number);
                   BUG_ON(!bh);

                   record = (struct simplefs_dir_record *)bh->b_data;
                   for (i = 0; i < sfs_inode->dir_children_count; i++) {
               #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 11, 0)
                       dir_emit(ctx, record->filename, SIMPLEFS_FILENAME_MAXLEN,
                           record->inode_no, DT_UNKNOWN);
                       ctx->pos += sizeof(struct simplefs_dir_record);
               #else
                       filldir(dirent, record->filename, SIMPLEFS_FILENAME_MAXLEN, pos,
                           record->inode_no, DT_UNKNOWN);
                       filp->f_pos += sizeof(struct simplefs_dir_record);
               #endif
                       pos += sizeof(struct simplefs_dir_record);
                       record++;
                   }
                   brelse(bh);

                   return 0;
               }
           }
#else
           .readdir = simplefs_readdir,
#endif
       };
   }
   root_inode->i_atime = root_inode->i_mtime = root_inode->i_ctime =
       CURRENT_TIME;

   root_inode->i_private =
       simplefs_get_inode(sb, SIMPLEFS_ROOTDIR_INODE_NUMBER);

   /* TODO: move such stuff into separate header. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 3, 0)
   sb->s_root = d_make_root(root_inode);
#else
   sb->s_root = d_alloc_root(root_inode);
   if (!sb->s_root)
       iput(root_inode);
#endif

   if (!sb->s_root) {
       ret = -ENOMEM;
       goto release;
   }

   ret = 0;
release:
   brelse(bh);

   return ret;
}
================================================================================================================================
List the file system types registered on Linux:
cat /proc/filesystems
Create a 100-block image file (4096 bytes per block):
dd if=/dev/zero of=image bs=4096 count=100

mkfs_simplefs image

Mount it on /mnt ===> mount -t simplefs -o loop image /mnt

dumpe2fs image prints a disk's super block, bitmap, and similar information (for ext2/ext3/ext4 images)
Read block 18 of the image: dd if=image bs=4096 skip=18 | hexdump -C -n 32
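
What the dd command above does can be written out as a short user-space sketch: seek to block_no * block_size and read one block. The function name toy_read_block is made up for this note, and a short read is simply returned to the caller rather than retried.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read block `block_no` (each block is `block_size` bytes) from an image
 * file into `buf`. Roughly what "dd bs=block_size skip=block_no count=1" does.
 * Returns the number of bytes read (0 at end of file), or -1 on error.
 */
static ssize_t toy_read_block(const char *image_path, uint64_t block_no,
                              size_t block_size, void *buf)
{
    ssize_t n;
    int fd = open(image_path, O_RDONLY);

    if (fd < 0)
        return -1;
    /* pread() reads at an absolute byte offset without moving the file position */
    n = pread(fd, buf, block_size, (off_t)(block_no * block_size));
    close(fd);
    return n;
}
```

Calling toy_read_block("image", 18, 4096, buf) would fetch the same 4096 bytes that the dd pipeline hands to hexdump.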

===========================================================================================
Commonly used file system tools:
mkfs dumpe2fs dd
blkcat
debugfs
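-R 'icheck bs' /dev/sda1   maps a block number (bs) back to an inode number
-R 'ncheck ino' /dev/sda1  maps an inode number (ino) back to a path
For example, to read a file's contents straight off the disk:
debugfs -R 'stat /home/baohua/main.c' /dev/sda1
The last line of the output shows the data block number (sb) of the contents; blkcat /dev/sda1 sb then prints the file contents directly
blkcat needs to be installed first: ap
dd can also read the same data off the disk (sb * 8 converts a 4096-byte block number into 512-byte sectors, and count=1 copies only the first sector of the block):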
dd if=/dev/sda1 of=1 skip=$((sb * 8)) bs=512c count=1

fdisk /dev/sda shows the start sector of each partition on the disk
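
The arithmetic behind skip=$((sb * 8)) and the partition start sector can be spelled out in a small sketch. The function names are made up for this note, and 512-byte sectors are assumed.

```c
#include <stdint.h>

#define TOY_SECTOR_SIZE 512u   /* assumption: 512-byte sectors, as in the dd example */

/* Sector offset, relative to the start of the partition, of a file system block. */
static uint64_t toy_block_to_sector(uint64_t block_no, uint32_t block_size)
{
    return block_no * (block_size / TOY_SECTOR_SIZE);
}

/* Absolute sector on the whole disk, given the partition's start sector from fdisk. */
static uint64_t toy_partition_to_disk_sector(uint64_t part_start_sector,
                                             uint64_t sector_in_partition)
{
    return part_start_sector + sector_in_partition;
}
```

For a 4096-byte block size the factor is 4096 / 512 = 8, so block 18 starts at sector 144 of the partition. If the partition started at sector 2048 (a made-up value), the same data would sit at sector 2192 of /dev/sda.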
============================================================================================
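The free command shows memory statistics -----> nr_blockdev_pages(void)
cached counts pages filled by reading file contents through a mounted file system (the per-inode page cache), while buffers counts pages filled by accessing the raw disk device directly or by accessing file system metadata through a mount.
With O_DIRECT, however, I/O bypasses the page cache altogether; O_SYNC still goes through the page cache, but each write returns only after the data has reached the disk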
==============================================================================================
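app accesses the page cache ---> address_space_operations ---> read/write page ---> the file system maps each page to blocks on the disk

For example, take a 16 KB file whose data blocks are 100, 110-111, 300.
Each contiguous run of blocks corresponds to one bio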
struct bio {

   struct bio       *bi_next;   /* request queue link */
   struct block_device   *bi_bdev;
   unsigned long       bi_flags;   /* status, command, etc */
   unsigned long       bi_rw;       /* bottom bits READ/WRITE,
                       * top bits priority
                       */

   struct bvec_iter   bi_iter;

   /* Number of segments in this BIO after
   * physical address coalescing is performed.
   */
   unsigned int       bi_phys_segments;

   /*
   * To keep track of the max segment size, we account for the
   * sizes of the first and last mergeable segments in this bio.
   */
   unsigned int       bi_seg_front_size;
   unsigned int       bi_seg_back_size;

   atomic_t       bi_remaining;

   bio_end_io_t       *bi_end_io;

   void           *bi_private;
#ifdef CONFIG_BLK_CGROUP
   /*
   * Optional ioc and css associated with this bio. Put on bio
   * release. Read comment on top of bio_associate_current().
   */
   struct io_context   *bi_ioc;
   struct cgroup_subsys_state *bi_css;
#endif
   union {
#if defined(CONFIG_BLK_DEV_INTEGRITY)
       struct bio_integrity_payload *bi_integrity; /* data integrity */
#endif
   };

   unsigned short       bi_vcnt;   /* how many bio_vec's */

   /*
   * Everything starting with bi_max_vecs will be preserved by bio_reset()
   */

   unsigned short       bi_max_vecs;   /* max bvl_vecs we can hold */

   atomic_t       bi_cnt;       /* pin count */

   struct bio_vec       *bi_io_vec;   /* the actual vec list */

   struct bio_set       *bi_pool;

   /*
   * We can inline a number of vecs at the end of the bio, to avoid
   * double allocations for a small number of bio_vecs. This member
   * MUST obviously be kept at the very end of the bio.
   */
   struct bio_vec       bi_inline_vecs[0];
};
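Example: when data is written to a .txt file, the file system turns the write into accesses to sectors on the block device and calls ll_rw_block(); from this function on, the I/O enters the block device layer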
file_system_type
   --->mount
       --->xxx_mount
           --->(mount_bdev(fs_type, flags, dev_name, data, ext4_fill_super);)
           --->ext4_fill_super
               --->sb->s_op = &ext4_sops;
                   --->ext4_dir_inode_operations
                       --->ll_rw_block
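                           --->submit_bh // submit the buffer_head marked for writing: build the matching bio and submit it
                               --->submit_bio
                                   --->generic_make_request(): submits the bio to the request queue of the target block device
                                       --->__generic_make_request(): first gets the request queue q from the bio's block_device, then checks whether the device is a partition and, if so, remaps the sector address; finally calls q's make_request_fn member to hand the bio over
                                           --->__make_request(): applies the kernel's elevator (I/O scheduling) algorithm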
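Registering a block device (the calls below are the legacy request-queue APIs of older, 2.6-era kernels; recent kernels use blk-mq instead):
The steps are as follows:

1. In the module init function:

1) Use register_blkdev() to register the block device's major number
2) Use blk_init_queue() to allocate a request queue and set its request handler function
3) Use alloc_disk() to allocate a gendisk structure
4) Fill in the gendisk members
->4.1) Set the basic members (major, first_minor, disk_name, fops)
->4.2) Set the queue member to the request queue allocated earlier
->4.3) Use set_capacity() to set the capacity member to the number of sectors
5) Use kzalloc() to allocate the memory buffer that acts as the disk's sectors
6) Use add_disk() to register the gendisk structure
2. In the request queue handler function:

1) Loop with while and elv_next_request() to fetch every unprocessed request in the queue
2) Use rq_data_dir() to get each request's direction flag: 0 (READ) means read, 1 (WRITE) means write
3) Use memcpy() to read or write the sectors (the memory buffer)
4) Use end_request() to complete each fetched request
3. In the module exit function:

1) Use del_gendisk() and put_disk() to unregister and release the gendisk structure
2) Use kfree() to free the disk sector buffer
3) Use blk_cleanup_queue() to tear down the request queue
4) Use unregister_blkdev() to unregister the block device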
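
The core of step 2 (the request handler of a RAM-backed disk) can be tried out in user space. The sketch below is not a kernel module and uses none of the kernel calls listed above: the toy_ types and functions are made up for this note, calloc() stands in for kzalloc(), and an array of requests stands in for the request queue.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SECTOR_SIZE 512u

enum toy_dir { TOY_READ = 0, TOY_WRITE = 1 };   /* mirrors rq_data_dir(): 0 read, 1 write */

struct toy_request {
    uint64_t sector;        /* first sector of the transfer */
    uint32_t nr_sectors;    /* number of sectors to transfer */
    enum toy_dir dir;
    uint8_t *buffer;        /* caller's data buffer */
};

struct toy_ramdisk {
    uint8_t *data;          /* memory that acts as the disk's sectors */
    uint64_t nr_sectors;    /* the value set_capacity() would be given */
};

static int toy_ramdisk_init(struct toy_ramdisk *d, uint64_t nr_sectors)
{
    d->data = calloc(nr_sectors, TOY_SECTOR_SIZE);
    d->nr_sectors = nr_sectors;
    return d->data ? 0 : -1;
}

static void toy_ramdisk_free(struct toy_ramdisk *d)
{
    free(d->data);
    d->data = NULL;
}

/* Serve one request with memcpy(); returns -1 if it runs past the end of the disk. */
static int toy_handle_request(struct toy_ramdisk *d, const struct toy_request *rq)
{
    size_t offset, len;

    if (rq->sector > d->nr_sectors || rq->nr_sectors > d->nr_sectors - rq->sector)
        return -1;
    offset = (size_t)rq->sector * TOY_SECTOR_SIZE;
    len = (size_t)rq->nr_sectors * TOY_SECTOR_SIZE;
    if (rq->dir == TOY_WRITE)
        memcpy(d->data + offset, rq->buffer, len);   /* write: buffer -> disk */
    else
        memcpy(rq->buffer, d->data + offset, len);   /* read: disk -> buffer */
    return 0;
}

/* Drain the queue in order, like the while loop in the handler; returns the failure count. */
static int toy_handle_queue(struct toy_ramdisk *d, const struct toy_request *queue, size_t n)
{
    size_t i;
    int failed = 0;

    for (i = 0; i < n; i++)
        if (toy_handle_request(d, &queue[i]) != 0)
            failed++;
    return failed;
}
```

The bounds check matters even in a toy: a request whose last sector lies beyond nr_sectors must be rejected instead of copied.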
==============================================================================================
Continuing the 16 KB file example above:
the data blocks are 100, 110-111, 300
Each contiguous run of blocks corresponds to one bio, and each bio corresponds to one request
bio1 ----plug------- request1
bio2 ----plug------- request2
bio3 ----plug------- request3
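If the access then continues to the 20 KB offset, whose data block is 101, a bio4 is created for it and its request is merged into request1 (block 101 directly follows block 100)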
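
The grouping and merging in this example can be checked with a small sketch. It only models the block arithmetic (contiguous runs and a back merge), not the real bio or request structures; the toy_ names are made up for this note.

```c
#include <stddef.h>
#include <stdint.h>

/* A contiguous run of blocks [start, start + count); stands in for one bio/request. */
struct toy_extent {
    uint64_t start;
    uint64_t count;
};

/*
 * Split a file's block list (in file order) into contiguous runs.
 * Returns the number of runs written to `out`; stops early if `out` is full.
 */
static size_t toy_blocks_to_extents(const uint64_t *blocks, size_t n,
                                    struct toy_extent *out, size_t max_out)
{
    size_t i, used = 0;

    for (i = 0; i < n; i++) {
        if (used > 0 && out[used - 1].start + out[used - 1].count == blocks[i]) {
            out[used - 1].count++;          /* block continues the current run */
            continue;
        }
        if (used == max_out)
            break;
        out[used].start = blocks[i];        /* block starts a new run */
        out[used].count = 1;
        used++;
    }
    return used;
}

/*
 * Back merge: if `incoming` starts exactly where an existing request ends,
 * grow that request. Returns the index of the request merged into, or -1.
 */
static int toy_try_back_merge(struct toy_extent *reqs, size_t n, struct toy_extent incoming)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (reqs[i].start + reqs[i].count == incoming.start) {
            reqs[i].count += incoming.count;
            return (int)i;
        }
    }
    return -1;
}
```

Feeding in the blocks 100, 110, 111, 300 gives three runs, matching bio1 to bio3. An incoming run that starts at block 101 then merges into the first one, which is the request1 case described above.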
===============================================================================================
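When the plugs of multiple processes are flushed (unplugged), their requests arrive at the elevator scheduling layer, where they are sorted and merged further,
and finally reach the disk's request_queue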
====================================================================================
View the disk's I/O scheduler:
cd /sys/block/sda/queue
cat scheduler

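ionice sets a process's I/O priority
ionice -c 2 -n 0 dd if=/dev/sda of=/dev/null &   (best-effort class, priority level 0, the highest level in that class)
Use iotop to watch the read/write rates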
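
Under the hood ionice uses the ioprio_set system call. The sketch below shows how the class and level are packed into one value. glibc provides no wrapper for this call, so the constants are redefined here with a TOY_ prefix (their values follow the kernel's ioprio ABI), and the helper names are made up for this note.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Redefined locally because glibc does not export the ioprio constants. */
#define TOY_IOPRIO_CLASS_SHIFT 13
#define TOY_IOPRIO_WHO_PROCESS 1

enum {
    TOY_IOPRIO_CLASS_RT   = 1,   /* ionice -c 1: real time */
    TOY_IOPRIO_CLASS_BE   = 2,   /* ionice -c 2: best effort */
    TOY_IOPRIO_CLASS_IDLE = 3,   /* ionice -c 3: idle */
};

/* Pack a scheduling class and a level (0 = highest, 7 = lowest) into one value. */
static int toy_ioprio_value(int ioclass, int level)
{
    return (ioclass << TOY_IOPRIO_CLASS_SHIFT) | level;
}

/* Set the I/O priority of one process (pid 0 means the caller); returns 0 or -1. */
static long toy_set_io_priority(pid_t pid, int ioclass, int level)
{
    return syscall(SYS_ioprio_set, TOY_IOPRIO_WHO_PROCESS, pid,
                   toy_ioprio_value(ioclass, level));
}
```

toy_set_io_priority(0, TOY_IOPRIO_CLASS_BE, 0) would give the calling process the same class and level that "ionice -c 2 -n 0" requests. Whether the priority has any effect depends on the I/O scheduler in use.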
===================================================================================
Changing the I/O rate of different groups (blkio cgroups):
cd /sys/fs/cgroup/blkio/
mkdir A B
cd A; cat blkio.weight

Start two processes, one in each group, and compare their rates:
cgexec -g blkio:A dd if=/dev/sda of=/dev/null iflag=direct &
cgexec -g blkio:B dd if=/dev/sda of=/dev/null iflag=direct &

iotop

Then change the weight of one of the groups
