
Reading the Linux kernel md source code, part 9: the RAID5 array sync function sync_reque

Date: 2017/3/3 16:17:28 Editor: Linux內核


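Let us review the whole scenario once more:

1) When the array is run, md_wakeup_thread is called to wake the main thread.

2) The main thread calls md_check_recovery to check whether a sync is needed.

3) If md_check_recovery finds that a sync is needed, it calls md_register_thread to create the sync thread.

4) The sync thread calls md_do_sync to handle the sync process.

5) md_do_sync manages the sync process: it advances the sync point step by step, records the point up to which the sync has completed, and calls sync_request to perform the sync for whichever array level is in use.

6) sync_request dispatches the sync data flow.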

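For a raid5 array, sync work is dispatched in units of struct stripe_head. As an analogy, suppose we want to turn a potato into chips. First the potato is cut into slices, then the slices are fried in oil, and once they are done they are scooped out and boxed. md_do_sync plays the role of the slicer, and the size of each slice is STRIPE_SECTORS. When sync_request receives a slice it cannot put it straight into the pot; it first has to wrap it in a struct stripe_head, which is like brushing a layer of seasoning onto the slice. It then calls handle_stripe to process the stripe and finally send it down to the disks, which is the frying step. Lastly, bitmap_cond_end_sync is called to save the record of completed sync, which is like collecting the chips and boxing them.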
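The slicing step can be illustrated with a minimal sketch. This is not the kernel's md_do_sync; toy_do_sync and toy_sync_request are hypothetical stand-ins, and STRIPE_SECTORS is assumed to be 8 (a 4 KiB page expressed in 512-byte sectors). The only point it shows is that the caller advances its position by whatever the per-stripe function returns.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Assumption: one stripe unit is a 4 KiB page, i.e. 8 sectors of 512 bytes. */
#define STRIPE_SECTORS 8

/* Hypothetical stand-in for sync_request(): handles one slice starting at
 * sector_nr and reports how many sectors it covered. */
static sector_t toy_sync_request(sector_t sector_nr, unsigned *stripes_handled)
{
    (void)sector_nr;            /* a real implementation would use this */
    (*stripes_handled)++;
    return STRIPE_SECTORS;
}

/* Hypothetical stand-in for the md_do_sync loop: keeps slicing until the
 * whole device has been covered, and returns the number of slices. */
static unsigned toy_do_sync(sector_t max_sectors)
{
    unsigned stripes_handled = 0;
    sector_t j = 0;

    /* The sketch only handles devices that are a whole number of stripes. */
    assert(max_sectors % STRIPE_SECTORS == 0);

    while (j < max_sectors)
        j += toy_sync_request(j, &stripes_handled);

    return stripes_handled;
}
```

Calling toy_do_sync(80) walks an 80-sector device in ten slices. Because the loop adds the return value rather than a fixed step, the same loop also works when the per-stripe function reports that it skipped a larger region.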

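There is one more detail. To save the sync result periodically, every few seconds the code waits for all outstanding sync requests to return and then records the progress. It is as if the frying pot were small and could hold only 20 slices at a time. At first we keep dropping slices in; once 20 are in the pot we stop and wait until all of them are cooked, scoop them out in one go, then put in the next 20 and repeat.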
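The "pot of 20" idea can be sketched as follows. This models the analogy only: the text says the real checkpoint is triggered every few seconds, whereas this sketch triggers on a count, and struct toy_sync, toy_submit and BATCH are hypothetical names. Waiting for requests to return is modelled by simply resetting the in-flight counter.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

#define STRIPE_SECTORS 8   /* assumed stripe unit, in 512-byte sectors */
#define BATCH 20           /* the "20 slices in the pot" from the analogy */

struct toy_sync {
    unsigned in_flight;    /* requests submitted but not yet completed */
    sector_t checkpoint;   /* everything below this sector is recorded as synced */
    unsigned checkpoints;  /* how many times progress has been recorded */
};

/* Submit the slice starting at sector_nr. If the pot is already full,
 * first drain it and record how far the sync has safely progressed. */
static void toy_submit(struct toy_sync *s, sector_t sector_nr)
{
    if (s->in_flight == BATCH) {
        s->in_flight = 0;           /* models waiting for all requests to return */
        s->checkpoint = sector_nr;  /* all slices below sector_nr are finished */
        s->checkpoints++;
    }
    s->in_flight++;
}
```

Recording the checkpoint only after the pot has drained is what makes it safe: no request below the recorded sector can still be in progress at that moment.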

With the mechanism above understood, the code becomes very easy to read.

4453 static inline sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipped, int go_faster)  
4454 {  
4455         struct r5conf *conf = mddev->private;  
4456         struct stripe_head *sh;  
4457         sector_t max_sector = mddev->dev_sectors;  
4458         sector_t sync_blocks;  
4459         int still_degraded = 0;  
4460         int i;  
4461  
4462         if (sector_nr >= max_sector) {  
4463                 /* just being told to finish up .. nothing much to do */
4464  
4465                 if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) {  
4466                         end_reshape(conf);  
4467                         return 0;  
4468                 }  
4469  
4470                 if (mddev->curr_resync < max_sector) /* aborted */
4471                         bitmap_end_sync(mddev->bitmap, mddev->curr_resync,  
4472                                         &sync_blocks, 1);  
4473                 else /* completed sync */
4474                         conf->fullsync = 0;  
4475                 bitmap_close_sync(mddev->bitmap);  
4476  
4477                 return 0;  
4478         }

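This part handles the end of a sync. A sync can end in two ways: it either completes normally or it is interrupted.

Line 4462: the sync has ended, because sector_nr has reached max_sector.

Line 4470: the sync was interrupted, so the bitmap is told that the last sync was aborted.

Line 4474: the sync completed successfully, so fullsync is set to 0. fullsync indicates that the array must be forced to do a full sync.

Line 4475: tell the bitmap that the sync has finished.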
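The branch structure of lines 4462-4478 can be restated as a small decision function. This is a sketch for illustration, not kernel code: enum toy_finish and toy_classify_finish are hypothetical names, and the reshaping argument stands in for the MD_RECOVERY_RESHAPE bit test.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical labels for the outcomes of the "finish up" branch. */
enum toy_finish {
    TOY_NOT_FINISHING,  /* sector_nr < max_sector: normal per-stripe path */
    TOY_RESHAPE_END,    /* reshape in progress: end_reshape() path */
    TOY_ABORTED,        /* curr_resync never reached the end of the device */
    TOY_COMPLETED       /* full pass finished: fullsync can be cleared */
};

/* Mirrors the order of the tests in the listing: the position check first,
 * then reshape, then aborted versus completed. */
static enum toy_finish toy_classify_finish(sector_t sector_nr,
                                           sector_t max_sector,
                                           sector_t curr_resync,
                                           int reshaping)
{
    if (sector_nr < max_sector)
        return TOY_NOT_FINISHING;
    if (reshaping)
        return TOY_RESHAPE_END;
    if (curr_resync < max_sector)
        return TOY_ABORTED;
    return TOY_COMPLETED;
}
```

For example, toy_classify_finish(1000, 1000, 400, 0) yields TOY_ABORTED, because the caller asked to finish up while the recorded sync position was still short of the device size.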

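Although this code sits near the top of the function, it is only reached from the sync_request call at line 7521, after md_do_sync has left its sync loop. The part that sync_request actually executes inside the md_do_sync loop comes next: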

4480         /* Allow raid5_quiesce to complete */
4481         wait_event(conf->wait_for_overlap, conf->quiesce != 2);  
4482  
4483         if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))  
4484                 return reshape_request(mddev, sector_nr, skipped);  
4485  
4486         /* No need to check resync_max as we never do more than one 
4487          * stripe, and as resync_max will always be on a chunk boundary, 
4488          * if the check in md_do_sync didn't fire, there is no chance 
4489          * of overstepping resync_max here 
4490          */
4491  
4492         /* if there is too many failed drives and we are trying 
4493          * to resync, then assert that we are finished, because there is 
4494          * nothing we can do. 
4495          */
4496         if (mddev->degraded >= conf->max_degraded &&  
4497             test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {  
4498                 sector_t rv = mddev->dev_sectors - sector_nr;  
4499                 *skipped = 1;  
4500                 return rv;  
4501         }  
4502         if (!bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, 1) &&  
4503             !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&  
4504             !conf->fullsync && sync_blocks >= STRIPE_SECTORS) {  
4505                 /* we can skip this block, and probably more */
4506                 sync_blocks /= STRIPE_SECTORS;  
4507                 *skipped = 1;  
4508                 return sync_blocks * STRIPE_SECTORS; /* keep things rounded to whole stripes */
4509         }  
4510  
4511         bitmap_cond_end_sync(mddev->bitmap, sector_nr);  
4512  
4513         sh = get_active_stripe(conf, sector_nr, 0, 1, 0);  
4514         if (sh == NULL) {  
4515                 sh = get_active_stripe(conf, sector_nr, 0, 0, 0);  
4516                 /* make sure we don't swamp the stripe cache if someone else 
4517                  * is trying to get access 
4518                  */
4519                 schedule_timeout_uninterruptible(1);  
4520         }  
4521         /* Need to check if array will still be degraded after recovery/resync 
4522          * We don't need to check the 'failed' flag as when that gets set, 
4523          * recovery aborts. 
4524          */
4525         for (i = 0; i < conf->raid_disks; i++)  
4526                 if (conf->disks[i].rdev == NULL)  
4527                         still_degraded = 1;  
4528  
4529         bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, still_degraded);  
4530  
4531         set_bit(STRIPE_SYNC_REQUESTED, &sh->state);  
4532  
4533         handle_stripe(sh);  
4534         release_stripe(sh);  
4535  
4536         return STRIPE_SECTORS;  
4537 }

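Line 4481: every wait_event has a synchronization story behind it. wait_event is like the traffic light at an intersection: without it, cars coming from both directions at a steady speed would soon end in tragedy. The Linux kernel has the same problem. When several threads access the same resource non-atomically, the result is unpredictable. The wait_event here also exists because of a resource conflict. Searching for wait_for_overlap shows two kinds of users: normal read/write requests and sync requests. That amounts to two writers, or one reader and one writer, so they have to take turns accessing the resource. In this particular call the condition is conf->quiesce != 2, which, as the code comment says, allows raid5_quiesce to complete.

Line 4492: too many disks have failed, so there is no point in continuing the sync.

Line 4496: this is a resync and too many disks have failed. A sync exists to build data redundancy; if the redundant disks are gone, there is nothing left to do.

Lines 4498-4500: report the rest of the device as skipped, which tells the caller that the sync is finished.

Line 4502: tell the bitmap that a sync is starting.

Line 4506: good news, the bitmap says this region is already in sync, so it is skipped.

Line 4511: handles the case where the 20 potato slices are done and get scooped out.

Line 4513: obtain a struct stripe_head.

Line 4525: determine whether the array is degraded. Why sync at all if the array is degraded? As explained earlier, a sync builds data redundancy. RAID5 has only one unit of redundancy, so once it is degraded there is nothing to sync. RAID6, however, has two units of redundant data, so with only one failed data disk it can still sync, but the bitmap is not updated.

Line 4529: tell the bitmap that the sync is starting.

Line 4531: set the sync flag on the struct stripe_head; handle_stripe does the actual processing based on this flag.

Line 4533: start processing the actual data flow, in other words fry the potato.

Line 4536: return the synced size, which is STRIPE_SECTORS.

The next part begins the tour of the raid5 data flow, where the secret recipe for frying potatoes will be revealed.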
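Three of the checks described above are plain arithmetic or comparisons and can be tried in isolation. The sketch below uses hypothetical helper names (toy_resync_pointless, toy_skippable, toy_still_degraded), assumes STRIPE_SECTORS is 8, and represents each disk slot as an int flag instead of an rdev pointer.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

#define STRIPE_SECTORS 8   /* assumed stripe unit, in 512-byte sectors */

/* Line 4496: a resync is pointless once the number of failed disks has
 * reached the redundancy of the level (1 for RAID5, 2 for RAID6). */
static int toy_resync_pointless(int degraded, int max_degraded, int is_resync)
{
    return degraded >= max_degraded && is_resync;
}

/* Lines 4504-4508: when the bitmap reports sync_blocks clean sectors,
 * skip only whole stripes. Returns 0 if less than one stripe is clean. */
static sector_t toy_skippable(sector_t sync_blocks)
{
    if (sync_blocks < STRIPE_SECTORS)
        return 0;
    return (sync_blocks / STRIPE_SECTORS) * STRIPE_SECTORS;
}

/* Lines 4525-4527: the array stays degraded if any slot has no device.
 * present[i] != 0 models conf->disks[i].rdev != NULL. */
static int toy_still_degraded(const int *present, int raid_disks)
{
    int i;

    for (i = 0; i < raid_disks; i++)
        if (!present[i])
            return 1;
    return 0;
}
```

With these definitions, a RAID6-like setting (max_degraded of 2) with one missing disk is not "pointless" and still goes through the stripe path, while toy_still_degraded reports 1, which matches the remark that the sync proceeds but the bitmap is not updated.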

Source: http://blog.csdn.net/liumangxiong