Testing Heartbeat, the Linux-HA Open Source Software

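How can you tell whether an HA cluster is working properly? Testing in a simulated environment is a good approach. Before putting a Heartbeat high-availability cluster into production, run the following five tests to confirm that HA behaves as expected.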

1. Stopping and restarting heartbeat normally on the master node

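First, run "service heartbeat stop" on the master node, node1, to stop the Heartbeat process cleanly. Then check the master's network interfaces with ifconfig. Under normal conditions, the master has released the cluster service IP address and unmounted the shared disk partition. On the backup node, you should see that it has taken over the cluster service IP and automatically mounted the shared disk partition.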
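The ifconfig check can be scripted so that it is repeatable on both nodes. The following is a minimal sketch, not part of the original procedure: the function name holds_address is hypothetical, and it assumes you have already captured the text output of ifconfig on the node in question.

```python
import re

def holds_address(ifconfig_output: str, address: str) -> bool:
    """Return True if `address` is listed as an IPv4 address in ifconfig output."""
    # Older net-tools print "inet addr:1.2.3.4"; newer versions print "inet 1.2.3.4".
    # "inet6" lines are skipped because the pattern requires a space after "inet".
    pattern = re.compile(r"\binet (?:addr:)?(\d+\.\d+\.\d+\.\d+)\b")
    return address in pattern.findall(ifconfig_output)
```

After "service heartbeat stop", holds_address(output, "192.168.60.200") would be expected to return False for node1's output and True for node2's, where 192.168.60.200 is the service IP that appears in the logs later in this article.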

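During this process, ping the cluster service IP. The address stays reachable throughout, with no delay or blocking. In other words, when the master node is shut down cleanly, the switchover between master and backup is seamless, and the service that HA provides keeps running without interruption.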
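Watching ping output by eye can miss a brief gap. As a rough sketch, assuming the ping output has been saved to text and uses the "icmp_seq=N" reply format printed by the common Linux ping, the hypothetical helper below lists the sequence numbers that never received a reply.

```python
import re
from typing import List

def missing_sequences(ping_output: str) -> List[int]:
    """Return icmp_seq numbers with no reply, between the first and last reply seen."""
    seen = {int(n) for n in re.findall(r"icmp_seq=(\d+)", ping_output)}
    if not seen:
        return []
    # Any number inside the observed range that never appeared is a lost reply.
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]
```

An empty list for a capture that spans the switchover is consistent with the seamless behavior described above. It cannot detect replies lost before the first or after the last reply in the capture.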

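Next, start heartbeat on the master node again. Once it is running, the backup node automatically releases the cluster service IP and unmounts the shared disk partition, and the master node takes over the service IP and mounts the partition again. The backup node releasing resources and the master node acquiring them happen in step, so this switch is seamless as well.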

2. Unplugging the network cable on the master node

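After the cable connecting the master node to the public network is unplugged, the heartbeat plugin ipfail detects the connection failure through its ping test right away, and the master node then releases its resources automatically. At the same time, ipfail on the backup node detects that the master node has a network fault. Once the master has finished releasing its resources, the backup node takes over the cluster resources, so the network service keeps running without interruption.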

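Likewise, when the master node's network recovers, the cluster resources automatically switch back from the backup node to the master node, because the "auto_failback on" option is set.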
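Before running this test it helps to confirm that the option really is set on both nodes. The sketch below is an illustration only: read_directive is a hypothetical helper, and it assumes the usual ha.cf layout of one "name value" directive per line with "#" starting a comment.

```python
from typing import Optional

def read_directive(ha_cf_text: str, name: str) -> Optional[str]:
    """Return the value of the last occurrence of directive `name`, or None if absent."""
    value = None
    for raw in ha_cf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        parts = line.split(None, 1)          # directive name, then the rest of the line
        if parts[0] == name:
            value = parts[1].strip() if len(parts) > 1 else ""
    return value
```

For example, read_directive(text, "auto_failback") returns "on" when the file contains the line "auto_failback on", and None when the directive is missing or commented out.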

After the cable is unplugged, the log on the master node looks like this:

Nov 26 09:04:09 node1 heartbeat: [3689]: info: Link node2:eth0 dead.

Nov 26 09:04:09 node1 heartbeat: [3689]: info: Link 192.168.60.1:192.168.60.1 dead.

Nov 26 09:04:09 node1 ipfail: [3712]: info: Status update: Node 192.168.60.1 now has status dead

Nov 26 09:04:09 node1 harc[4279]: info: Running /etc/ha.d/rc.d/status status

Nov 26 09:04:10 node1 ipfail: [3712]: info: NS: We are dead. :<

Nov 26 09:04:10 node1 ipfail: [3712]: info: Link Status update: Link node2/eth0 now has status dead

...... (middle portion omitted) ......

Nov 26 09:04:20 node1 heartbeat: [3689]: info: node1 wants to go standby [all]

Nov 26 09:04:20 node1 heartbeat: [3689]: info: standby: node2 can take our all resources

Nov 26 09:04:20 node1 heartbeat: [4295]: info: give up all HA resources (standby).

Nov 26 09:04:21 node1 ResourceManager[4305]: info: Releasing resource group: node1 192.168.60.200/24/eth0 Filesystem::/dev/sdb5::/webdata::ext3

Nov 26 09:04:21 node1 ResourceManager[4305]: info: Running /etc/ha.d/resource.d/Filesystem /dev/sdb5 /webdata ext3 stop

Nov 26 09:04:21 node1 Filesystem[4343]: INFO: Running stop for /dev/sdb5 on /webdata

Nov 26 09:04:21 node1 Filesystem[4343]: INFO: Trying to unmount /webdata

Nov 26 09:04:21 node1 Filesystem[4343]: INFO: unmounted /webdata successfully

Nov 26 09:04:21 node1 Filesystem[4340]: INFO: Success

Nov 26 09:04:22 node1 ResourceManager[4305]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.60.200/24/eth0 stop

Nov 26 09:04:22 node1 IPaddr[4428]: INFO: /sbin/ifconfig eth0:0 192.168.60.200 down

Nov 26 09:04:22 node1 avahi-daemon[1854]: Withdrawing address record for 192.168.60.200 on eth0.

Nov 26 09:04:22 node1 IPaddr[4407]: INFO: Success

The log on the backup node as it takes over the master node's resources looks like this:

Nov 26 09:02:58 node2 heartbeat: [2110]: info: Link node1:eth0 dead.

Nov 26 09:02:58 node2 ipfail: [2134]: info: Link Status update: Link node1/eth0 now has status dead

Nov 26 09:02:59 node2 ipfail: [2134]: info: Asking other side for ping node count.

Nov 26 09:02:59 node2 ipfail: [2134]: info: Checking remote count of ping nodes.

Nov 26 09:03:02 node2 ipfail: [2134]: info: Telling other node that we have more visible ping nodes.

Nov 26 09:03:09 node2 heartbeat: [2110]: info: node1 wants to go standby [all]

Nov 26 09:03:10 node2 heartbeat: [2110]: info: standby: acquire [all] resources from node1

Nov 26 09:03:10 node2 heartbeat: [2281]: info: acquire all HA resources (standby).

Nov 26 09:03:10 node2 ResourceManager[2291]: info: Acquiring resource group: node1 192.168.60.200/24/eth0 Filesystem::/dev/sdb5::/webdata::ext3

Nov 26 09:03:10 node2 IPaddr[2315]: INFO: Resource is stopped

Nov 26 09:03:11 node2 ResourceManager[2291]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.60.200/24/eth0 start

Nov 26 09:03:11 node2 IPaddr[2393]: INFO: Using calculated netmask for 192.168.60.200: 255.255.255.0

Nov 26 09:03:11 node2 IPaddr[2393]: DEBUG: Using calculated broadcast for 192.168.60.200: 192.168.60.255

Nov 26 09:03:11 node2 IPaddr[2393]: INFO: eval /sbin/ifconfig eth0:0 192.168.60.200 netmask 255.255.255.0 broadcast 192.168.60.255

Nov 26 09:03:12 node2 avahi-daemon[1844]: Registering new address record for 192.168.60.200 on eth0.

Nov 26 09:03:12 node2 IPaddr[2393]: DEBUG: Sending Gratuitous Arp for 192.168.60.200 on eth0:0 [eth0]

Nov 26 09:03:12 node2 IPaddr[2372]: INFO: Success

Nov 26 09:03:12 node2 Filesystem[2482]: INFO: Resource is stopped

Nov 26 09:03:12 node2 ResourceManager[2291]: info: Running /etc/ha.d/resource.d/Filesystem /dev/sdb5 /webdata ext3 start

Nov 26 09:03:13 node2 Filesystem[2523]: INFO: Running start for /dev/sdb5 on /webdata

Nov 26 09:03:13 node2 kernel: kjournald starting. Commit interval 5 seconds

Nov 26 09:03:13 node2 kernel: EXT3 FS on sdb5, internal journal

Nov 26 09:03:13 node2 kernel: EXT3-fs: mounted filesystem with ordered data mode.

Nov 26 09:03:13 node2 Filesystem[2520]: INFO: Success
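The two excerpts above can also be used to measure how long a takeover took. The sketch below is one way to do that, under several assumptions: the helper names are hypothetical, the log uses the syslog timestamp format shown above with English month names, and the takeover is taken to start at the first line ending in "dead." and to end at the last "INFO: Success" line after it. Because the node1 and node2 timestamps in the excerpts are visibly not synchronized (node1 starts at 09:04:09, node2 at 09:02:58), the measurement is meaningful only within a single node's log.

```python
import re
from datetime import datetime
from typing import Optional

# Matches the leading syslog timestamp, e.g. "Nov 26 09:03:13".
STAMP = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) ")

def parse_time(line: str) -> Optional[datetime]:
    """Parse the syslog timestamp at the start of a line, or return None."""
    match = STAMP.match(line.strip())
    if match is None:
        return None
    # Syslog omits the year; a fixed placeholder year (an assumption) is
    # prepended so that only differences within one log are meaningful.
    return datetime.strptime("2000 " + match.group(1), "%Y %b %d %H:%M:%S")

def takeover_seconds(log_text: str) -> Optional[float]:
    """Seconds from the first 'dead.' line to the last 'INFO: Success' line after it."""
    start = end = None
    for line in log_text.splitlines():
        stamp = parse_time(line)
        if stamp is None:
            continue  # blank lines and lines without a timestamp are ignored
        text = line.rstrip()
        if start is None and text.endswith("dead."):
            start = stamp
        elif start is not None and text.endswith("INFO: Success"):
            end = stamp
    if start is None or end is None:
        return None
    return (end - start).total_seconds()
```

Applied to the node2 excerpt above, the first "dead." line is stamped 09:02:58 and the last "INFO: Success" line 09:03:13, so the function would report 15.0 seconds.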

3. Unplugging the power cable on the master node

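After power is cut on the master node, the heartbeat process on the backup node immediately receives notice that the master node has shut down. If a STONITH device is configured in the cluster, the backup node uses it to power off or reset the master node. Only after the STONITH device has completed its operations does the backup node obtain ownership of the master node's resources and take them over.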

After power is cut on the master node, the backup node logs output similar to the following:

Nov 26 09:24:54 node2 heartbeat: [2110]: info: Received shutdown notice from 'node1'.

Nov 26 09:24:54 node2 heartbeat: [2110]: info: Resources being acquired from node1.

Nov 26 09:24:54 node2 heartbeat: [2712]: info: acquire local HA resources (standby).

Nov 26 09:24:55 node2 ResourceManager[2762]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.60.200/24/eth0 start

Nov 26 09:24:57 node2 ResourceManager[2762]: info: Running /etc/ha.d/resource.d/Filesystem /dev/sdb5 /webdata ext3 start

4. Cutting all network connections on the master node

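After the heartbeat link is disconnected on the master node, both nodes log an "eth1 dead" message, but no resource switchover takes place. If the cable connecting the master node to the public network is then unplugged as well, a switchover does occur and the resources move from the master node to the backup node. Now reconnect the master node's heartbeat link and watch the system log: the heartbeat process on the backup node restarts and then takes control of the cluster resources again. Finally, reconnect the master node's public network cable, and the cluster resources move from the backup node back to the master node. That is the complete switchover sequence.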

5. Killing the heartbeat daemon abnormally on the master node

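Kill the heartbeat process on the master node with "killall -9 heartbeat". Because the process is terminated abnormally, the resources that heartbeat controls are not released. After a short time without a response from the master node, the backup node concludes that the master has failed and takes over its resources. The result is resource contention: both nodes hold the same resource at once, which leads to data conflicts. The kernel watchdog module that Linux provides can address this. With watchdog integrated into heartbeat, the watchdog automatically reboots the system if heartbeat terminates abnormally or the system fails, which releases the cluster resources and avoids the data conflict.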
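When running this test, the contention shows up as both nodes holding the service IP or the mounted partition at the same time. The following sketch only illustrates that check: classify_ownership is a hypothetical helper, and it assumes you have already determined, per node, whether the resource is held there (for example from ifconfig or mount output).

```python
from typing import Dict

def classify_ownership(holders: Dict[str, bool]) -> str:
    """Classify cluster state from a mapping of node name to 'holds the resource'."""
    owners = sorted(node for node, holds in holders.items() if holds)
    if len(owners) > 1:
        # More than one owner is the contention case described above.
        return "split-brain: " + ", ".join(owners)
    if len(owners) == 1:
        return "ok: " + owners[0]
    return "down: no node holds the resource"
```

After "killall -9 heartbeat" on a cluster without watchdog, classify_ownership({"node1": True, "node2": True}) would return "split-brain: node1, node2".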

In this article we did not configure watchdog in the cluster. If watchdog is configured, running "killall -9 heartbeat" produces a message like the following in /var/log/messages:

Softdog: WDT device closed unexpectedly. WDT will not stop!

This message tells us that the system has run into a problem and is about to reboot.

This article comes from the "技術成就夢想" blog. Please keep this source reference: http://ixdba.blog.51cto.com/2895551/548627
