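MooseFS is a distributed file system. I won't go into its details here; for background, see the three posts I wrote earlier:
Distributed File System MooseFS: Introduction
Distributed File System MooseFS: Deployment
Distributed File System MooseFS: Management and Tuning
First, a brief overview of the architecture of our current storage setup: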
Server configuration:
mfsmaster primary and mfsmaster standby
Model: Dell PowerEdge R720
CPU: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
Memory: 8 × 8 GB
Disks: 6 × 300 GB
RAID level: 10
mfschunkserver
Model: Dell PowerEdge R720
CPU: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
Memory: 4 × 8 GB
Disks: 6 × 2 TB
RAID level: 10
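Architecture overview:
The MooseFS deployment consists of two mfsmaster servers, one primary and one standby, with heartbeat + DRBD between them to make the master service highly available. Behind them sit three storage nodes (mfschunkserver) that provide the data storage service.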
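As a rough illustration of how heartbeat handles this stack, the sketch below takes the resource group string that appears in the heartbeat logs later in this post and prints the order in which the resources are started on takeover (left to right) and stopped on release (right to left). The helper names start_order and stop_order are made up for this example and are not part of heartbeat.

```shell
#!/bin/sh
# Resource group as it appears in the heartbeat logs of this drill:
# first word = preferred node, remaining words = resources.
HARESOURCES_LINE='mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster'

# start_order LINE: print the resources in takeover order (left to right).
# Hypothetical helper for illustration only.
start_order() {
    printf '%s\n' "$1" | awk '{ for (i = 2; i <= NF; i++) print $i }'
}

# stop_order LINE: print the resources in release order (right to left).
# Hypothetical helper for illustration only.
stop_order() {
    printf '%s\n' "$1" | awk '{ for (i = NF; i >= 2; i--) print $i }'
}

echo "Start order on takeover:"
start_order "$HARESOURCES_LINE"
echo "Stop order on release:"
stop_order "$HARESOURCES_LINE"
```

This matches what the logs show: on release the mfsmaster service is stopped first and the virtual IP last, while on takeover the virtual IP comes up first and mfsmaster last.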
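Disaster drill plan:
This drill starts from several factors that can trigger a failure. Each failure is simulated by hand so that it really occurs. We then record the state of the relevant services before and after the failure, the relevant service logs, and how the client's use of the service is affected, and analyze these records to judge whether the high-availability setup works as intended.
Goals of the recording points:
Record the service state before and after the failure to judge whether the service switched over correctly
Record the service logs at the time of the failure to analyze the decisions and actions taken by the high-availability software
Record the client's use of the service before and after the failure to judge how much the failure affects the client
The following disasters are simulated:
1. The heartbeat service crashes
2. The network on which the mfsmaster primary serves clients is interrupted
3. The DRBD replication network of the mfsmaster primary is interrupted
Note:
Before testing, I start a continuous-write script on the mfs client. It appends a line of text to a file in the mounted mfs directory once per second, and the resulting file is used to determine the mfs recovery time.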
[root@web-phy13-rj ~]# for i in {1..20000};do echo `date` $i >> /mfsdata/testxxxx;sleep 1;done
Fault 1: the heartbeat service crashes
Simulating the fault:
[root@kvm-phy11-rj ~]# /etc/init.d/heartbeat stop
Stopping High-Availability services: Done.
State of the mfsmaster primary and standby before the fault:
mfsmaster primary:
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 48931 1 0 20:03 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2015-06-25 17:20:34
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:10824 nr:3912 dw:12716 dr:16539 al:11 bm:14 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
mfsmaster standby:
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2015-06-25 17:20:33
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:3912 nr:10896 dw:106643992 dr:16616 al:10 bm:6414 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
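When the same check is repeated before and after every fault, it can help to boil /proc/drbd down to the three fields that matter for the comparison: connection state (cs), roles (ro) and disk states (ds). The following is a minimal sketch; drbd_state is a hypothetical helper name, and it assumes the DRBD 8.4 status line layout shown above.

```shell
#!/bin/sh
# drbd_state: read /proc/drbd-style text on stdin and print one summary
# line per DRBD device, e.g. "cs=Connected ro=Primary/Secondary ds=UpToDate/UpToDate".
# Hypothetical helper; assumes the 8.4-style "cs:... ro:... ds:..." status line.
drbd_state() {
    awk '/cs:/ {
        out = ""
        for (i = 1; i <= NF; i++) {
            # Keep only the cs:, ro: and ds: fields of the status line.
            if ($i ~ /^(cs|ro|ds):/) {
                split($i, kv, ":")
                out = out (out == "" ? "" : " ") kv[1] "=" kv[2]
            }
        }
        print out
    }'
}

# Typical use on a master node (not executed here):
#   drbd_state < /proc/drbd
```

On the primary before the fault this would print cs=Connected ro=Primary/Secondary ds=UpToDate/UpToDate, which is easy to diff against the output captured after the fault.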
Logs on the mfsmaster primary and standby at the time of the fault:
mfsmaster primary:
Jun 25 20:05:29 mfs-master01-rj.btr heartbeat: [48425]: info: Heartbeat shutdown in progress. (48425)
Jun 25 20:05:29 mfs-master01-rj.btr heartbeat: [48980]: info: Giving up all HA resources.
ResourceManager(default)[48993]: 2015/06/25_20:05:29 info: Releasing resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
ResourceManager(default)[48993]: 2015/06/25_20:05:29 info: Running /etc/ha.d/resource.d/mfsmaster stop
ResourceManager(default)[48993]: 2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 stop
Filesystem(Filesystem_/dev/drbd0)[49056]: 2015/06/25_20:05:31 INFO: Running stop for /dev/drbd0 on /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[49056]: 2015/06/25_20:05:31 INFO: Trying to unmount /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[49056]: 2015/06/25_20:05:31 INFO: unmounted /usr/local/mfs successfully
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[49048]: 2015/06/25_20:05:31 INFO: Success
ResourceManager(default)[48993]: 2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/drbddisk drbd stop
ResourceManager(default)[48993]: 2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 stop
IPaddr(IPaddr_10.1.1.26)[401]: 2015/06/25_20:05:31 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[373]: 2015/06/25_20:05:31 INFO: Success
Jun 25 20:05:31 mfs-master01-rj.btr heartbeat: [48980]: info: All HA resources relinquished.
Jun 25 20:05:32 mfs-master01-rj.btr heartbeat: [48425]: WARN: 1 lost packet(s) for [mfs-master02-rj.btr] [2673:2675]
Jun 25 20:05:32 mfs-master01-rj.btr heartbeat: [48425]: info: No pkts missing from mfs-master02-rj.btr!
Jun 25 20:05:32 mfs-master01-rj.btr heartbeat: [48425]: info: killing /usr/lib64/heartbeat/ipfail process group 48439 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBFIFO process 48428 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBWRITE process 48429 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBREAD process 48430 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBWRITE process 48431 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBREAD process 48432 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBWRITE process 48433 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBREAD process 48434 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBWRITE process 48435 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBREAD process 48436 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48430 exited. 9 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48433 exited. 8 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48431 exited. 7 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48428 exited. 6 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48432 exited. 5 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48434 exited. 4 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48435 exited. 3 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48436 exited. 2 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48429 exited. 1 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: mfs-master01-rj.btr Heartbeat shutdown complete.
mfsmaster standby:
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [44613]: info: Received shutdown notice from 'mfs-master01-rj.btr'.
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [44613]: info: Resources being acquired from mfs-master01-rj.btr.
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [47668]: info: acquire local HA resources (standby).
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [47669]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys mfs-master02-rj.btr] to acquire.
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [47668]: info: local HA resource acquisition completed (standby).
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [44613]: info: Standby resource acquisition done [foreign].
harc(default)[47694]: 2015/06/25_20:05:31 info: Running /etc/ha.d//rc.d/status status
mach_down(default)[47711]: 2015/06/25_20:05:31 info: Taking over resource group IPaddr::10.1.1.26/24/em4
ResourceManager(default)[47738]: 2015/06/25_20:05:31 info: Acquiring resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[47766]: 2015/06/25_20:05:31 INFO: Resource is stopped
ResourceManager(default)[47738]: 2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 start
IPaddr(IPaddr_10.1.1.26)[47889]: 2015/06/25_20:05:31 INFO: Adding inet address 10.1.1.26/24 with broadcast address 10.1.1.255 to device em4
IPaddr(IPaddr_10.1.1.26)[47889]: 2015/06/25_20:05:31 INFO: Bringing device em4 up
IPaddr(IPaddr_10.1.1.26)[47889]: 2015/06/25_20:05:31 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.1.1.26 em4 10.1.1.26 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[47863]: 2015/06/25_20:05:31 INFO: Success
ResourceManager(default)[47738]: 2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/drbddisk drbd start
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[48017]: 2015/06/25_20:05:32 INFO: Resource is stopped
ResourceManager(default)[47738]: 2015/06/25_20:05:32 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 start
Filesystem(Filesystem_/dev/drbd0)[48103]: 2015/06/25_20:05:32 INFO: Running start for /dev/drbd0 on /usr/local/mfs
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[48095]: 2015/06/25_20:05:32 INFO: Success
ResourceManager(default)[47738]: 2015/06/25_20:05:32 info: Running /etc/ha.d/resource.d/mfsmaster start
mach_down(default)[47711]: 2015/06/25_20:05:32 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[47711]: 2015/06/25_20:05:32 info: mach_down takeover complete for node mfs-master01-rj.btr.
Jun 25 20:05:32 mfs-master02-rj.btr heartbeat: [44613]: info: mach_down takeover complete.
Jun 25 20:05:42 mfs-master02-rj.btr heartbeat: [44613]: WARN: node mfs-master01-rj.btr: is dead
Jun 25 20:05:42 mfs-master02-rj.btr heartbeat: [44613]: info: Dead node mfs-master01-rj.btr gave up resources.
Jun 25 20:05:42 mfs-master02-rj.btr ipfail: [44630]: info: Status update: Node mfs-master01-rj.btr now has status dead
Jun 25 20:05:42 mfs-master02-rj.btr heartbeat: [44613]: info: Link mfs-master01-rj.btr:em2 dead.
Jun 25 20:05:44 mfs-master02-rj.btr ipfail: [44630]: info: NS: We are still alive!
Jun 25 20:05:44 mfs-master02-rj.btr ipfail: [44630]: info: Link Status update: Link mfs-master01-rj.btr/em2 now has status dead
Jun 25 20:05:45 mfs-master02-rj.btr ipfail: [44630]: info: Asking other side for ping node count.
Jun 25 20:05:45 mfs-master02-rj.btr ipfail: [44630]: info: Checking remote count of ping nodes.
State of the mfsmaster primary and standby after the fault:
mfsmaster primary:
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2015-06-25 17:20:34
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:11876 nr:4536 dw:14392 dr:16551 al:12 bm:14 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
mfsmaster standby:
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 48197 1 0 20:05 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2015-06-25 17:20:33
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:4604 nr:11864 dw:106645652 dr:18533 al:12 bm:6414 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
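To put a number on the server-side takeover, a small helper can measure the time between two events in the standby's heartbeat log. This is only a sketch: seconds_between is a hypothetical helper, it looks only at syslog-style lines whose third field is HH:MM:SS, and it assumes both events happen on the same day.

```shell
#!/bin/sh
# seconds_between START_PATTERN END_PATTERN
# Reads a heartbeat log on stdin and prints the number of seconds between
# the first line matching START_PATTERN and the next line matching
# END_PATTERN. Hypothetical helper; only syslog-style lines
# ("Jun 25 20:05:31 host heartbeat: ...") are considered.
seconds_between() {
    awk -v start_pat="$1" -v end_pat="$2" '
    function secs(hms,    t) {
        split(hms, t, ":")
        return t[1] * 3600 + t[2] * 60 + t[3]
    }
    # Skip resource-agent lines, whose third field is not a time of day.
    $3 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { next }
    !have_start && $0 ~ start_pat { start = secs($3); have_start = 1; next }
    have_start && $0 ~ end_pat { print secs($3) - start; exit }
    '
}

# Typical use, assuming the standby log was saved to a file named standby.log:
#   seconds_between 'Received shutdown notice' 'mach_down takeover complete' < standby.log
```

Applied to the standby log above, the interval from "Received shutdown notice" (20:05:31) to "mach_down takeover complete." (20:05:32) is 1 second.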
Service interruption as seen by the client:
Thu Jun 25 18:52:31 CST 2015 86
Thu Jun 25 18:52:32 CST 2015 87
Thu Jun 25 18:52:33 CST 2015 88
Thu Jun 25 18:52:34 CST 2015 89
######## Thu Jun 25 18:52:47 CST 2015 90 ########
Thu Jun 25 18:52:48 CST 2015 91
Thu Jun 25 18:52:49 CST 2015 92
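Rather than scanning the test file by eye, the gaps can be found mechanically. The sketch below assumes every line has the shape written by the test loop (default `date` output followed by the counter) and compares only the time of day; find_gaps is a hypothetical helper name.

```shell
#!/bin/sh
# find_gaps [MAX_SECONDS]
# Reads lines like "Thu Jun 25 18:52:34 CST 2015 89" on stdin and reports
# every pair of consecutive entries more than MAX_SECONDS apart (default 1).
# Hypothetical helper; assumes the default `date` output format.
find_gaps() {
    awk -v max="${1:-1}" '
    {
        gsub(/#/, "")                      # drop hand-added ######## markers, if any
        if (NF < 7) next                   # ignore lines that are not log entries
        split($4, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (seen) {
            delta = now - prev
            if (delta < 0) delta += 86400  # the test ran past midnight
            if (delta > max)
                printf "gap of %d s between entry %s and entry %s\n", delta, prev_id, $NF
        }
        prev = now; prev_id = $NF; seen = 1
    }'
}

# Typical use on the client (not executed here):
#   find_gaps 1 < /mfsdata/testxxxx
```

On the excerpt above this reports a 13-second gap between entries 89 and 90. The counter itself does not skip, which suggests the write stalled during the switchover rather than being lost.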
Recovery from Fault 1: after the heartbeat service on the mfsmaster primary is restored
Restoring the heartbeat service:
[root@mfs-master01-rj ~]# /etc/init.d/heartbeat start
Starting High-Availability services: INFO: Resource is stopped
Done.
Logs on the mfsmaster primary and standby during recovery:
mfsmaster primary:
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [653]: info: Pacemaker support: false
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [653]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [653]: info: **************************
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [653]: info: Configuration validated. Starting heartbeat 3.0.4
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: heartbeat: version 3.0.4
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Heartbeat generation: 1435221812
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: glib: UDP multicast heartbeat started for group 225.0.0.192 port 694 interface em2 (ttl=1 loop=0)
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: glib: ping heartbeat started.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: glib: ping heartbeat started.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: glib: ping heartbeat started.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: G_main_add_TriggerHandler: Added signal manual handler
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: G_main_add_TriggerHandler: Added signal manual handler
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Local status now set to: 'up'
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Status update for node 10.1.1.27: status ping
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Link 10.1.1.27:10.1.1.27 up.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Status update for node 10.1.1.28: status ping
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Link 10.1.1.28:10.1.1.28 up.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Link mfs-master02-rj.btr:em2 up.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Status update for node mfs-master02-rj.btr: status active
harc(default)[667]: 2015/06/25_20:09:29 info: Running /etc/ha.d//rc.d/status status
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Link 10.1.1.1:10.1.1.1 up.
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Status update for node 10.1.1.1: status ping
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Comm_now_up(): updating status to active
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Local status now set to: 'active'
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (497,496)
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [685]: info: Starting "/usr/lib64/heartbeat/ipfail" as uid 497 gid 496 (pid 685)
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: remote resource transition completed.
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: remote resource transition completed.
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Local Resource acquisition completed. (none)
Jun 25 20:09:31 mfs-master01-rj.btr heartbeat: [654]: info: mfs-master02-rj.btr wants to go standby [foreign]
Jun 25 20:09:33 mfs-master01-rj.btr heartbeat: [654]: info: standby: acquire [foreign] resources from mfs-master02-rj.btr
Jun 25 20:09:33 mfs-master01-rj.btr heartbeat: [688]: info: acquire local HA resources (standby).
ResourceManager(default)[701]: 2015/06/25_20:09:34 info: Acquiring resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[729]: 2015/06/25_20:09:34 INFO: Resource is stopped
ResourceManager(default)[701]: 2015/06/25_20:09:34 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 start
IPaddr(IPaddr_10.1.1.26)[855]: 2015/06/25_20:09:34 INFO: Adding inet address 10.1.1.26/24 with broadcast address 10.1.1.255 to device em4
IPaddr(IPaddr_10.1.1.26)[855]: 2015/06/25_20:09:34 INFO: Bringing device em4 up
IPaddr(IPaddr_10.1.1.26)[855]: 2015/06/25_20:09:34 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.1.1.26 em4 10.1.1.26 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[828]: 2015/06/25_20:09:34 INFO: Success
ResourceManager(default)[701]: 2015/06/25_20:09:34 info: Running /etc/ha.d/resource.d/drbddisk drbd start
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[985]: 2015/06/25_20:09:34 INFO: Resource is stopped
ResourceManager(default)[701]: 2015/06/25_20:09:34 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 start
Filesystem(Filesystem_/dev/drbd0)[1071]: 2015/06/25_20:09:34 INFO: Running start for /dev/drbd0 on /usr/local/mfs
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[1063]: 2015/06/25_20:09:34 INFO: Success
ResourceManager(default)[701]: 2015/06/25_20:09:34 info: Running /etc/ha.d/resource.d/mfsmaster start
Jun 25 20:09:34 mfs-master01-rj.btr heartbeat: [688]: info: local HA resource acquisition completed (standby).
Jun 25 20:09:34 mfs-master01-rj.btr heartbeat: [654]: info: Standby resource acquisition done [foreign].
Jun 25 20:09:34 mfs-master01-rj.btr heartbeat: [654]: info: Initial resource acquisition complete (auto_failback)
Jun 25 20:09:35 mfs-master01-rj.btr heartbeat: [654]: info: remote resource transition completed.
Jun 25 20:09:42 mfs-master01-rj.btr ipfail: [685]: info: Ping node count is balanced.
Jun 25 20:09:42 mfs-master01-rj.btr ipfail: [685]: info: Giving up foreign resources (auto_failback).
Jun 25 20:09:42 mfs-master01-rj.btr ipfail: [685]: info: Delayed giveup in 4 seconds.
Jun 25 20:09:46 mfs-master01-rj.btr ipfail: [685]: info: giveup() called (timeout worked)
Jun 25 20:09:47 mfs-master01-rj.btr heartbeat: [654]: info: mfs-master01-rj.btr wants to go standby [foreign]
Jun 25 20:09:47 mfs-master01-rj.btr heartbeat: [654]: info: standby: mfs-master02-rj.btr can take our foreign resources
Jun 25 20:09:47 mfs-master01-rj.btr heartbeat: [1166]: info: give up foreign HA resources (standby).
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [1166]: info: foreign HA resource release completed (standby).
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: info: Local standby process completed [foreign].
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: WARN: 1 lost packet(s) for [mfs-master02-rj.btr] [2816:2818]
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: info: remote resource transition completed.
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: info: No pkts missing from mfs-master02-rj.btr!
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: info: Other node completed standby takeover of foreign resources.
mfsmaster standby:
Jun 25 20:09:29 mfs-master02-rj.btr heartbeat: [44613]: info: Heartbeat restart on node mfs-master01-rj.btr
Jun 25 20:09:29 mfs-master02-rj.btr heartbeat: [44613]: info: Link mfs-master01-rj.btr:em2 up.
Jun 25 20:09:29 mfs-master02-rj.btr heartbeat: [44613]: info: Status update for node mfs-master01-rj.btr: status init
Jun 25 20:09:29 mfs-master02-rj.btr ipfail: [44630]: info: Link Status update: Link mfs-master01-rj.btr/em2 now has status up
Jun 25 20:09:29 mfs-master02-rj.btr ipfail: [44630]: info: Status update: Node mfs-master01-rj.btr now has status init
Jun 25 20:09:29 mfs-master02-rj.btr heartbeat: [44613]: info: Status update for node mfs-master01-rj.btr: status up
Jun 25 20:09:29 mfs-master02-rj.btr ipfail: [44630]: info: Status update: Node mfs-master01-rj.btr now has status up
harc(default)[48262]: 2015/06/25_20:09:29 info: Running /etc/ha.d//rc.d/status status
harc(default)[48279]: 2015/06/25_20:09:29 info: Running /etc/ha.d//rc.d/status status
Jun 25 20:09:30 mfs-master02-rj.btr heartbeat: [44613]: info: Status update for node mfs-master01-rj.btr: status active
Jun 25 20:09:30 mfs-master02-rj.btr ipfail: [44630]: info: Status update: Node mfs-master01-rj.btr now has status active
harc(default)[48296]: 2015/06/25_20:09:30 info: Running /etc/ha.d//rc.d/status status
Jun 25 20:09:30 mfs-master02-rj.btr heartbeat: [44613]: info: remote resource transition completed.
Jun 25 20:09:30 mfs-master02-rj.btr heartbeat: [44613]: info: mfs-master02-rj.btr wants to go standby [foreign]
Jun 25 20:09:31 mfs-master02-rj.btr heartbeat: [44613]: info: standby: mfs-master01-rj.btr can take our foreign resources
Jun 25 20:09:31 mfs-master02-rj.btr heartbeat: [48313]: info: give up foreign HA resources (standby).
ResourceManager(default)[48326]: 2015/06/25_20:09:31 info: Releasing resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
ResourceManager(default)[48326]: 2015/06/25_20:09:31 info: Running /etc/ha.d/resource.d/mfsmaster stop
Jun 25 20:09:32 mfs-master02-rj.btr ipfail: [44630]: info: Asking other side for ping node count.
ResourceManager(default)[48326]: 2015/06/25_20:09:33 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 stop
Filesystem(Filesystem_/dev/drbd0)[48389]: 2015/06/25_20:09:33 INFO: Running stop for /dev/drbd0 on /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[48389]: 2015/06/25_20:09:33 INFO: Trying to unmount /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[48389]: 2015/06/25_20:09:33 INFO: unmounted /usr/local/mfs successfully
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[48381]: 2015/06/25_20:09:33 INFO: Success
ResourceManager(default)[48326]: 2015/06/25_20:09:33 info: Running /etc/ha.d/resource.d/drbddisk drbd stop
ResourceManager(default)[48326]: 2015/06/25_20:09:33 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 stop
IPaddr(IPaddr_10.1.1.26)[48545]: 2015/06/25_20:09:33 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[48519]: 2015/06/25_20:09:33 INFO: Success
Jun 25 20:09:33 mfs-master02-rj.btr heartbeat: [48313]: info: foreign HA resource release completed (standby).
Jun 25 20:09:33 mfs-master02-rj.btr heartbeat: [44613]: info: Local standby process completed [foreign].
Jun 25 20:09:34 mfs-master02-rj.btr heartbeat: [44613]: WARN: 1 lost packet(s) for [mfs-master01-rj.btr] [13:15]
Jun 25 20:09:34 mfs-master02-rj.btr heartbeat: [44613]: info: remote resource transition completed.
Jun 25 20:09:34 mfs-master02-rj.btr heartbeat: [44613]: info: No pkts missing from mfs-master01-rj.btr!
Jun 25 20:09:34 mfs-master02-rj.btr heartbeat: [44613]: info: Other node completed standby takeover of foreign resources.
Jun 25 20:09:42 mfs-master02-rj.btr ipfail: [44630]: info: No giveup timer to abort.
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [44613]: info: mfs-master01-rj.btr wants to go standby [foreign]
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [44613]: info: standby: acquire [foreign] resources from mfs-master01-rj.btr
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [48610]: info: acquire local HA resources (standby).
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [48610]: info: local HA resource acquisition completed (standby).
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [44613]: info: Standby resource acquisition done [foreign].
Jun 25 20:09:48 mfs-master02-rj.btr heartbeat: [44613]: info: remote resource transition completed.
State of the mfsmaster primary and standby after recovery:
mfsmaster primary:
[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs 1165 1 0 20:09 ? 00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master01-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master01-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2015-06-25 17:20:34
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:12288 nr:5600 dw:15868 dr:18468 al:13 bm:14 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
mfsmaster standby:
[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master02-rj ~]# ip a|grep em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2015-06-25 17:20:33
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:5600 nr:12324 dw:106647108 dr:18541 al:12 bm:6414 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Data recovery as seen by the mfs client:
Thu Jun 25 18:56:33 CST 2015 314
Thu Jun 25 18:56:34 CST 2015 315
Thu Jun 25 18:56:35 CST 2015 316
Thu Jun 25 18:56:36 CST 2015 317
####### Thu Jun 25 18:56:49 CST 2015 318 #######
Thu Jun 25 18:56:50 CST 2015 319
Thu Jun 25 18:56:51 CST 2015 320
Thu Jun 25 18:56:52 CST 2015 321