Adventures in the Linux Kernel: Reading the md Source Code, Part 9: The RAID5 Array Sync Function sync_request

wh8_2011

Please credit the source when reposting: http://blog.csdn.net/liumangxiong

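Let's review the whole scenario once more:

1) When the array is run, md_wakeup_thread is called to wake the main thread.

2) The main thread calls md_check_recovery to check whether a sync is needed.

3) If md_check_recovery finds that a sync is needed, it calls md_register_thread to create the sync thread.

4) The sync thread calls md_do_sync to handle the sync process.

5) md_do_sync manages the sync process: it advances the sync point step by step, records the point up to which sync has completed, and calls sync_request to perform the sync for the particular array level.

6) sync_request dispatches the sync data flow.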
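Steps 5 and 6 can be pictured with a small user-space sketch. This is not the kernel code: toy_do_sync, toy_sync_request and struct toy_stats are hypothetical names made up for illustration, and STRIPE_SECTORS is assumed to be 8 (a 4 KiB stripe unit in 512-byte sectors). It only shows the division of labour: the driver loop advances the sync point by whatever the per-level hook returns, then makes one last call with sector_nr at the end of the device so the hook can finish up.

```c
#include <stdint.h>

typedef uint64_t sector_t;

#define STRIPE_SECTORS 8  /* assumption: 4 KiB stripe unit / 512-byte sectors */

/* Hypothetical stand-in for a RAID personality's sync_request hook. */
typedef sector_t (*sync_request_fn)(sector_t sector_nr, sector_t max_sector,
                                    int *skipped, void *ctx);

struct toy_stats {
    unsigned long dispatched;   /* calls that synced one stripe */
    unsigned long finish_calls; /* calls made with sector_nr >= max_sector */
};

/* Toy per-level hook: syncs exactly one stripe per call. */
static sector_t toy_sync_request(sector_t sector_nr, sector_t max_sector,
                                 int *skipped, void *ctx)
{
    struct toy_stats *st = ctx;

    if (sector_nr >= max_sector) {   /* "just being told to finish up" */
        st->finish_calls++;
        return 0;
    }
    *skipped = 0;
    st->dispatched++;
    return STRIPE_SECTORS;
}

/* Toy driver in the spirit of md_do_sync: advance the sync point by
 * whatever the hook reports, then make one final call to finish up.
 * Returns the final sync point. */
static sector_t toy_do_sync(sector_t max_sector, sync_request_fn fn, void *ctx)
{
    sector_t j = 0;
    int skipped = 0;

    while (j < max_sector) {
        sector_t done = fn(j, max_sector, &skipped, ctx);

        if (done == 0)   /* no progress: this sketch treats it as an abort */
            break;
        j += done;
    }
    fn(max_sector, max_sector, &skipped, ctx);   /* finishing call */
    return j;
}
```

Calling toy_do_sync(64, toy_sync_request, &st) on a zeroed struct toy_stats dispatches eight stripes and makes one finishing call.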

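For a raid5 array, sync is dispatched with struct stripe_head as the basic unit. As an analogy, suppose we want to turn a potato into chips: first we cut the potato into slices, then we fry the slices in oil, and once they are done we scoop them out and box them. md_do_sync plays the role of slicing the potato, and the size of each slice is STRIPE_SECTORS. When sync_request receives a slice, it cannot drop it into the pot right away; it first has to wrap it in a struct stripe_head, which is like brushing a layer of seasoning onto the slice. It then calls handle_stripe to process the stripe and eventually issue it to the disks, which is like frying the slice in the pot. Finally, bitmap_cond_end_sync saves the record of completed sync work, which is like collecting the chips and boxing them.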

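There is one more detail: so that the sync progress can be saved periodically, every few seconds the code waits for all outstanding sync requests to return and then records the result. It is as if the frying pot were small and could hold only 20 potato slices at a time. At first we keep dropping slices in; once 20 are in, we stop and wait until every slice is cooked, scoop them all out in one go, then drop in another 20 and repeat.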
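The "small pot" behaviour can be sketched as follows. Note the deliberate simplification: the text above describes a checkpoint taken every few seconds, whereas this sketch triggers it after a fixed count of 20 stripes, purely to mirror the analogy. struct toy_sync, toy_wait_all, toy_submit and BATCH_STRIPES are hypothetical names, not kernel identifiers, and STRIPE_SECTORS is again assumed to be 8.

```c
#include <stdint.h>

typedef uint64_t sector_t;

#define STRIPE_SECTORS 8   /* assumption: 4 KiB stripe unit / 512-byte sectors */
#define BATCH_STRIPES  20  /* the "20 slices per pot" from the analogy */

struct toy_sync {
    sector_t checkpoint;   /* everything below this sector is known synced */
    unsigned in_flight;    /* stripes submitted but not yet confirmed */
    unsigned checkpoints;  /* how many times progress was recorded */
};

/* Pretend to block until every outstanding stripe has completed. */
static void toy_wait_all(struct toy_sync *s)
{
    s->in_flight = 0;
}

/* Submit one stripe starting at sector_nr. When the pot is full, first
 * wait for everything in flight, then record sector_nr as the new
 * checkpoint: all stripes below it have been written. */
static void toy_submit(struct toy_sync *s, sector_t sector_nr)
{
    if (s->in_flight == BATCH_STRIPES) {
        toy_wait_all(s);
        s->checkpoint = sector_nr;
        s->checkpoints++;
    }
    s->in_flight++;
}
```

The point of waiting before recording is that a checkpoint is only safe to persist once nothing below it is still in flight; otherwise a crash could leave an unsynced stripe behind the recorded position.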

With the mechanism above understood, the code becomes very easy to read.

 4453 static inline sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipped, int go_faster)  
 4454 {  
 4455         struct r5conf *conf = mddev->private;  
 4456         struct stripe_head *sh;  
 4457         sector_t max_sector = mddev->dev_sectors;  
 4458         sector_t sync_blocks;  
 4459         int still_degraded = 0;  
 4460         int i;  
 4461  
 4462         if (sector_nr >= max_sector) {  
 4463                 /* just being told to finish up .. nothing much to do */  
 4464  
 4465                 if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) {  
 4466                         end_reshape(conf);  
 4467                         return 0;  
 4468                 }  
 4469  
 4470                 if (mddev->curr_resync < max_sector) /* aborted */  
 4471                         bitmap_end_sync(mddev->bitmap, mddev->curr_resync,  
 4472                                         &sync_blocks, 1);  
 4473                 else /* completed sync */  
 4474                         conf->fullsync = 0;  
 4475                 bitmap_close_sync(mddev->bitmap);  
 4476  
 4477                 return 0;  
 4478         }  

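This part handles the end of a sync. A sync can end in two ways: it completes normally, or it is interrupted.

Line 4462: the sync has ended.

Line 4470: the sync was interrupted, so the bitmap is told that the last sync was aborted.

Line 4474: the sync completed successfully, so fullsync is reset to 0. fullsync indicates that the array must be forced through a complete sync.

Line 4475: the bitmap is told that the sync is finished.

Although this code sits near the top of the function, it is reached from the sync_request call at line 7521, after md_do_sync has left its sync loop. The part that follows is what sync_request executes inside the md_do_sync loop: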

 4480         /* Allow raid5_quiesce to complete */  
 4481         wait_event(conf->wait_for_overlap, conf->quiesce != 2);  
 4482  
 4483         if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))  
 4484                 return reshape_request(mddev, sector_nr, skipped);  
 4485  
 4486         /* No need to check resync_max as we never do more than one 
 4487          * stripe, and as resync_max will always be on a chunk boundary, 
 4488          * if the check in md_do_sync didn't fire, there is no chance 
 4489          * of overstepping resync_max here 
 4490          */  
 4491  
 4492         /* if there is too many failed drives and we are trying 
 4493          * to resync, then assert that we are finished, because there is 
 4494          * nothing we can do. 
 4495          */  
 4496         if (mddev->degraded >= conf->max_degraded &&  
 4497             test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {  
 4498                 sector_t rv = mddev->dev_sectors - sector_nr;  
 4499                 *skipped = 1;  
 4500                 return rv;  
 4501         }  
 4502         if (!bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, 1) &&  
 4503             !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&  
 4504             !conf->fullsync && sync_blocks >= STRIPE_SECTORS) {  
 4505                 /* we can skip this block, and probably more */  
 4506                 sync_blocks /= STRIPE_SECTORS;  
 4507                 *skipped = 1;  
 4508                 return sync_blocks * STRIPE_SECTORS; /* keep things rounded to whole stripes */  
 4509         }  
 4510  
 4511         bitmap_cond_end_sync(mddev->bitmap, sector_nr);  
 4512  
 4513         sh = get_active_stripe(conf, sector_nr, 0, 1, 0);  
 4514         if (sh == NULL) {  
 4515                 sh = get_active_stripe(conf, sector_nr, 0, 0, 0);  
 4516                 /* make sure we don't swamp the stripe cache if someone else 
 4517                  * is trying to get access 
 4518                  */  
 4519                 schedule_timeout_uninterruptible(1);  
 4520         }  
 4521         /* Need to check if array will still be degraded after recovery/resync 
 4522          * We don't need to check the 'failed' flag as when that gets set, 
 4523          * recovery aborts. 
 4524          */  
 4525         for (i = 0; i < conf->raid_disks; i++)  
 4526                 if (conf->disks[i].rdev == NULL)  
 4527                         still_degraded = 1;  
 4528  
 4529         bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, still_degraded);  
 4530  
 4531         set_bit(STRIPE_SYNC_REQUESTED, &sh->state);  
 4532  
 4533         handle_stripe(sh);  
 4534         release_stripe(sh);  
 4535  
 4536         return STRIPE_SECTORS;  
 4537 }  

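Line 4481: behind every wait_event there is a synchronization story. wait_event is like the traffic light at a crossroads: without it, cars coming from both directions at a steady speed would soon meet with disaster. The Linux kernel has the same problem: when several threads access the same resource non-atomically, the result can be unpredictable. The wait_event here exists because of such a resource conflict. Searching for wait_for_overlap turns up two kinds of users: normal read/write requests and sync requests. That amounts to two writers, or one reader and one writer, so access to the resource has to be ordered. In this particular call the condition is conf->quiesce != 2, so that, as the code comment says, raid5_quiesce is allowed to complete.

Line 4492: too many disks have failed, so there is no point in continuing the sync.

Line 4496: this is a resync and too many disks have failed. Syncing builds data redundancy; if the redundant disks are all gone, there is nothing left to do.

Lines 4498-4500: report the sync as finished by returning all remaining sectors with *skipped set to 1.

Line 4502: tell the bitmap that sync is starting at this position; its return value and sync_blocks show whether the region actually needs syncing.

Line 4506: good news, the bitmap says this region is already in sync, so skip it.

Line 4511: handle the case where the 20 potato slices are done frying and are scooped out, that is, the periodic recording of sync progress.

Line 4513: obtain a struct stripe_head.

Line 4525: check whether the array is degraded. If it is degraded, why sync at all? As noted earlier, syncing builds data redundancy. RAID5 has only one unit of redundancy, so a degraded RAID5 has nothing to sync. RAID6, however, has two, so with only one failed data disk it can still sync, but the bitmap is not updated; this is why still_degraded is passed to bitmap_start_sync.

Line 4529: tell the bitmap that sync is starting.

Line 4531: set the sync flag on the struct stripe_head; handle_stripe acts on this flag.

Line 4533: start processing the actual data flow, that is, frying the potato slice.

Line 4536: return STRIPE_SECTORS as the amount synced.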
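The three possible return values of this second half (lines 4496-4508 and 4536) can be condensed into a user-space sketch. It is a simplified model, not the kernel function: struct toy_array and toy_sync_step are hypothetical, the bitmap's answer is passed in as plain parameters (region_dirty, clean_blocks) instead of calling bitmap_start_sync, the recovery flags are plain ints, and STRIPE_SECTORS is assumed to be 8.

```c
#include <stdint.h>

typedef uint64_t sector_t;

#define STRIPE_SECTORS 8  /* assumption: 4 KiB stripe unit / 512-byte sectors */

/* Hypothetical, flattened view of the state the real function consults. */
struct toy_array {
    sector_t dev_sectors;  /* size of each member device in sectors */
    int degraded;          /* number of failed members */
    int max_degraded;      /* 1 for RAID5, 2 for RAID6 */
    int fullsync;          /* force a complete sync */
    int requested;         /* stands in for MD_RECOVERY_REQUESTED */
    int resync;            /* stands in for MD_RECOVERY_SYNC */
};

/* Decide how many sectors one call accounts for.
 * region_dirty : nonzero if the bitmap says this region needs syncing
 * clean_blocks : sectors the bitmap reports for this region
 * Sets *skipped to 1 when no I/O would be issued. */
static sector_t toy_sync_step(const struct toy_array *a, sector_t sector_nr,
                              int region_dirty, sector_t clean_blocks,
                              int *skipped)
{
    *skipped = 0;

    /* Too many failed disks during a resync: give up on the rest. */
    if (a->degraded >= a->max_degraded && a->resync) {
        *skipped = 1;
        return a->dev_sectors - sector_nr;
    }

    /* Bitmap says clean: skip, rounded down to whole stripes. */
    if (!region_dirty && !a->requested && !a->fullsync &&
        clean_blocks >= STRIPE_SECTORS) {
        *skipped = 1;
        return clean_blocks / STRIPE_SECTORS * STRIPE_SECTORS;
    }

    /* Otherwise exactly one stripe is synced. */
    return STRIPE_SECTORS;
}
```

For example, with a clean region of 21 sectors the skip is rounded down to 16 sectors (two whole stripes), matching the "keep things rounded to whole stripes" comment at line 4508.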