Using EMS on HP-UX

Date: 2017/2/28 11:24:18   Editor: About Unix


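1. Introduction to EMS
EMS (Event Monitoring Service) is an integrated HP-UX service that monitors host hardware in real time and reports monitoring information to system maintainers through a configured notification method. This helps operations staff detect host faults promptly and accurately, helps pinpoint where the fault lies, and so improves host uptime.
EMS is managed through MRM (Monitoring Request Manager). MRM is used to set the scope of monitoring, the conditions that trigger an event alert, and the way event alerts are delivered.
To start MRM:
(1) Log in to the host as root
(2) Run /etc/opt/resmon/lbin/monconfig
(3) Configure monitoring from the Monitoring Request Manager (MRM) Main Menu
From the MRM menu you can show, check, add, modify, and delete monitoring requests, and enable or disable monitoring.
The main menu looks like this: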
============================================================================
=================== Event Monitoring Service ===================
=================== Monitoring Request Manager ===================
============================================================================

EVENT MONITORING IS CURRENTLY ENABLED.
EMS Version : A.04.10
STM Version : C.46.15

============================================================================
============== Monitoring Request Manager Main Menu ==============
============================================================================

Note: Monitoring requests let you specify the events for monitors
to report and the notification methods to use.

Select:
(S)how monitoring requests configured via monconfig
(C)heck detailed monitoring status
(L)ist descriptions of available monitors
(A)dd a monitoring request
(D)elete a monitoring request
(M)odify an existing monitoring request
(E)nable Monitoring
(K)ill (disable) monitoring
(H)elp
(Q)uit
Enter selection: [s]

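The following example creates a monitoring request to show how MRM is configured:
(1) Log in to the system as root
(2) Run /etc/opt/resmon/lbin/monconfig to enter the MRM main menu (shown above)
(3) Enter a and press Enter to select (A)dd a monitoring request
(4) The hardware monitors available for monitoring are listed; normally select all of them by entering a and pressing Enter
(5) Select the base event severity level; 2) MINOR WARNING is recommended
(6) Select the comparison that triggers the alert: choose 4) >=
(7) Select the notification method for event information: choose 6) EMAIL
(8) Enter the recipient of the alert mail. Enter whichever user name you need, for example: monitor
(9) Add a comment describing this monitoring request: choose (A)dd
(10) At Client Configuration File, choose (C)lear
(11) Save the configuration; you are then returned to the main menu
(12) From the main menu, choose (S)how monitoring requests configured via monconfig to confirm that the new request exists
(13) Back at the MRM main menu, choose (C)heck detailed monitoring status to see all active monitoring. The result depends on the host configuration: EMS ignores hardware that is not present on the host, even if all hardware was selected in step (4)
(14) Choose (E)nable Monitoring to turn on the EMS service
Note: the request created by these steps monitors all hardware modules in real time (step 4), but only events whose severity is greater than or equal to Minor Warning (steps 5 and 6) are reported, by email (step 7), to the user monitor (step 8).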
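
The effect of steps (5) to (8) can be sketched in Python as a simple threshold check. This is only an illustrative model, not how EMS itself is implemented: the class and its field names are hypothetical, and the numeric levels other than MINOR WARNING (2) and CRITICAL (5), which both appear in this article, are assumed.

```python
import operator
from dataclasses import dataclass

# Severity levels. MINOR WARNING = 2 and CRITICAL = 5 appear in this article;
# the other values are assumptions made for illustration.
SEVERITY = {
    "INFORMATION": 1,
    "MINOR WARNING": 2,
    "MAJOR WARNING": 3,
    "SERIOUS": 4,
    "CRITICAL": 5,
}

# Comparisons a request could use; the walkthrough above picks ">=".
OPERATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "=": operator.eq, "!=": operator.ne,
}


@dataclass
class MonitoringRequest:
    """Hypothetical model of the request built in steps (5) to (8)."""
    threshold: str = "MINOR WARNING"  # step 5
    op: str = ">="                    # step 6
    method: str = "EMAIL"             # step 7
    recipient: str = "monitor"        # step 8

    def matches(self, severity: str) -> bool:
        """Return True if an event of this severity would be reported."""
        compare = OPERATORS[self.op]
        return compare(SEVERITY[severity], SEVERITY[self.threshold])


request = MonitoringRequest()
for level in SEVERITY:
    if request.matches(level):
        print(f"{level}: {request.method} to {request.recipient}")
    else:
        print(f"{level}: no notification")
```

With the defaults above, only events below MINOR WARNING are filtered out, which is why the disk removal event shown later in this article, rated CRITICAL(5), produces a mail.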

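2. Getting information from event mail
Event alert mail generated by EMS can be received over the internal network; no separate domain name server needs to be configured. The mail is sent to the predefined target user, monitor, and can be collected with a mail client on a PC (Outlook, for example).
Taking Outlook as an example: to receive event mail, create a new mail account in the client. The user name is the HP-UX user name specified in MRM, the password is that user's HP-UX password, and the POP3/SMTP server is the IP address of the monitored host. Set Outlook to check for new mail automatically at a regular interval so that event information from EMS arrives promptly.
Notes:
(1) Because of HP-UX's own security mechanism, root's e-mail cannot be retrieved with client software, so specify an ordinary user as the recipient of event mail in MRM. In this example a new user named monitor was created for the purpose
(2) The network must allow the POP3 and POP2 ports, 110 and 109
(3) The user that receives event mail is an HP-UX user and can also log in to the host. Change that user's HP-UX password regularly, and update the password in Outlook to match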
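
As an alternative to a desktop client, the same mailbox can be read with a short script. The sketch below uses Python's standard poplib module and assumes that the monitored host runs a POP3 server on port 110, as described above. The function name and the pop_factory parameter are hypothetical, introduced only so the function can be exercised without a real server. Plain POP3 sends the password unencrypted, so this is only suitable on a trusted internal network.

```python
import poplib
from email import message_from_bytes


def fetch_event_mails(host, user, password, port=110, pop_factory=poplib.POP3):
    """Download every message in the POP3 mailbox and return them parsed.

    host     -- IP address or name of the monitored HP-UX host
    user     -- the HP-UX user specified in MRM (for example "monitor")
    password -- that user's HP-UX password
    """
    conn = pop_factory(host, port)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _mailbox_size = conn.stat()
        messages = []
        for index in range(1, count + 1):  # POP3 message numbers start at 1
            _response, lines, _octets = conn.retr(index)
            messages.append(message_from_bytes(b"\r\n".join(lines)))
        return messages
    finally:
        conn.quit()


# Example call (placeholder values, not run here):
#   for msg in fetch_event_mails("<host IP>", "monitor", "<password>"):
#       print(msg["Subject"])
```

The messages are left on the server, so a desktop client can still collect them afterwards.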

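The following is an example of an event alert mail generated by EMS. The fault was caused by deliberately pulling out a hard disk while the system was powered on. (The notes appended to some lines are annotations added for this article and are not part of the mail.)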

>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Wed Jun 8 23:26:18 2005 <-- time the event was triggered
hpux1 sent Event Monitor notification information: <-- shows the host name
/storage/events/disks/default/0_0_1_1.15.0 is >= 2. <-- hardware module and trigger condition
Its current value is CRITICAL(5). <-- severity of this event
User Comments:
Just a test:)
Event data from monitor:
Event Time..........: Wed Jun 8 23:26:16 2005
Severity............: CRITICAL
Monitor.............: disk_em
Event #.............: 101
System..............: hpux1
Summary: <-- event summary
Disk at hardware path 0/0/1/1.15.0 : Device removed from monitoring

Description of Error: <-- fault description
The device has been removed from the list of devices being monitored by
this monitor.
Probable Cause / Recommended Action: <-- probable cause / recommended action
The device was removed from the system, has stopped responding to the
system or it has been replaced with a device that is not supported by this
monitor.
Run ioscan to determine the state and type of the device.
Check the /var/stm/data/os_decode_xref for the information indicating
which devices are supported by this monitor.
Check other monitors to determine if they are now monitoring the
device by running /etc/opt/resmon/lbin/monconfig and using the "Check
monitoring" command.
Additional Event Data:
System IP Address...: 15.85.114.14 <-- host IP address
Event Id............: 0x42a70e1800000000
Monitor Version.....: B.01.01
Event Class.........: I/O <-- event class
Client Configuration File...........:
/var/stm/config/tools/monitor/default_disk_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/A500-44 <-- host model number
OS Version......................: B.11.11 <-- operating system version
STM Version.....................: A.45.00
EMS Version.....................: A.04.00
Latest information on this event:

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v

Component Data:
Physical Device Path...: 0/0/1/1.15.0 <-- physical path of the failed device
Device Class...........: Disk <-- device type
Inquiry Vendor ID......: SEAGATE <-- device manufacturer
Inquiry Product ID.....: ST34572WC <-- product ID
Firmware Version.......: HP03 <-- firmware version
Serial Number..........: JKJ118650QPJCX <-- serial number of the failed part
>---------- End Event Monitoring Service Event Notification ----------<

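The event mail shows when the fault occurred, the host name, the event severity, the physical path of the failed disk, the disk's product ID, suggested checks, the host model, the operating system version, and more. All of this helps in detecting and troubleshooting host hardware faults.
However, a host hardware fault is not always a simple failure of a single component, so the Probable Cause / Recommended Action given in the event mail may not match the fault that is finally identified. This is normal. Fault analysis often needs additional tools and methods.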
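
The fields listed above can also be pulled out of the mail body with a script, which is convenient when many event mails have to be reviewed. The following is a minimal sketch, assuming the "Key....: value" layout seen in the sample mail; the function name is hypothetical, and the sample text is an excerpt of the mail above without the annotations.

```python
import re

# Matches lines such as "Severity............: CRITICAL":
# a key, at least two dots, a colon, then the value.
FIELD = re.compile(r"^\s*(?P<key>[^.:]+?)\s*\.{2,}:\s*(?P<value>.*)$")


def parse_event_fields(mail_text):
    """Return a dict of the dotted "Key....: value" fields in an event mail.

    Lines with an empty value (the value continues on the next line) are
    skipped, and the first occurrence of a repeated key wins.
    """
    fields = {}
    for line in mail_text.splitlines():
        match = FIELD.match(line)
        if match and match.group("value"):
            fields.setdefault(match.group("key"), match.group("value").strip())
    return fields


SAMPLE = """
Event Time..........: Wed Jun 8 23:26:16 2005
Severity............: CRITICAL
Monitor.............: disk_em
Event #.............: 101
System..............: hpux1
Physical Device Path...: 0/0/1/1.15.0
Serial Number..........: JKJ118650QPJCX
"""

fields = parse_event_fields(SAMPLE)
for key in ("System", "Severity", "Physical Device Path", "Serial Number"):
    print(f"{key}: {fields[key]}")
```

Lines without the dotted separator, such as Notification Time or the free-text Summary, are not captured by this pattern and would need separate handling.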