Raise your Uptime - How to monitor heterogeneous server environments with Linux
- 2. Raise your Uptime
How to monitor heterogeneous server environments with Linux
LPI Forum Warsaw, 28th September 2012
Slide 2/15
- 4. 1) Introduction
who I am:
● Werner Fischer
● Linux user since 2001
● Teamlead R&D
who I'm not:
● Kernel or hardware developer
- 5. 2) Why monitoring?
● You'll get alerts in real time
● It tells you the “SOMETHING”
● It'll save you a lot of time!
- 6. 2) Why monitoring?
● So why do monitoring?
● Check Availability
→ send real-time alerts
● Check Performance
→ discover trends
● Collect SLA Data
→ prove uptimes
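Proving uptimes from SLA data comes down to one ratio: availability = (period − downtime) / period. The following Python snippet is a minimal sketch that converts measured downtime into a percentage and shows how much downtime a given target allows. The function names and the 99.9 % target are illustrative assumptions. They are not part of Icinga.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(period_minutes: float, downtime_minutes: float) -> float:
    """Availability = (period - downtime) / period, as a percentage."""
    if period_minutes <= 0:
        raise ValueError("period must be positive")
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime must lie within the period")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def allowed_downtime_minutes(period_minutes: float, target_percent: float) -> float:
    """Largest downtime (in minutes) that still meets the availability target."""
    return period_minutes * (100.0 - target_percent) / 100.0


if __name__ == "__main__":
    month = 30 * MINUTES_PER_DAY  # 43,200 minutes in a 30-day month
    print(f"{availability_percent(month, 43.2):.3f} %")        # measured availability
    print(f"{allowed_downtime_minutes(month, 99.9):.1f} min")  # budget for a 99.9 % SLA
```

In a 30-day month (43,200 minutes), 43.2 minutes of downtime gives exactly 99.9 % availability. That is also the whole downtime budget of a 99.9 % SLA.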
- 7. 2) What can I monitor?
● Hardware
  ● Server (IPMI)
  ● Storage Systems
  ● Environment
● Operating Systems
  ● CPU, Memory, Disk
  ● Processes
  ● Log files
  ● ...
● Services
  ● e.g. DNS, FTP, HTTP
  ● SSH, SMTP, …
  ● TCP & UDP ports
● Applications
  ● SAP
  ● all databases
  ● Directory services
  ● ...
- 8. 3) Icinga Setup
● To set up your monitoring environment:
  ● Install Ubuntu 12.04
  ● sudo apt-get install icinga
● To get nice diagrams (performance graphs):
  ● sudo apt-get install pnp4nagios
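Icinga, like Nagios, runs check plugins and reads two things from each one. The first is the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The second is one line of text, where anything after a `|` is performance data that pnp4nagios turns into diagrams. The Python code below is a minimal sketch of such a check for a "higher is worse" value. The `check_value` name and the disk-usage example are hypothetical.

```python
# Exit codes of the Nagios plugin API, which Icinga uses as well.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_value(label, value, warn, crit, uom=""):
    """Rate a 'higher is worse' value and build the plugin output line.

    Returns (exit_code, output). The part after '|' is performance data
    ('label'=value[UOM];warn;crit) that pnp4nagios can store and graph.
    """
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    perfdata = f"'{label}'={value}{uom};{warn};{crit}"
    return code, f"{STATE_NAMES[code]} - {label} is {value}{uom} | {perfdata}"


def main() -> int:
    # Hypothetical reading: root file system usage in percent.
    code, output = check_value("disk_root", 91, warn=80, crit=90, uom="%")
    print(output)
    return code


if __name__ == "__main__":
    # An installed plugin would end with sys.exit(main()) so that Icinga
    # receives the state as the process exit code.
    main()
```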
- 10. 4) IPMI Introduction
● IPMI = Intelligent Platform Management Interface
● Developed in 1998 by Intel, HP, NEC and Dell
● Current version: IPMI v2.0 (since 2004)
● Purpose:
  ● Monitoring (temp., fans, ...)
  ● Logging (system event log)
  ● Recovery control (power on/off/reset)
  ● Inventory (FRU information)
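To make the four purposes concrete, the Python sketch below maps each one to a standard ipmitool subcommand: `sdr list`, `sel list`, `chassis power status` and `fru print`. The lookup table and the function name are illustrative assumptions. Only the subcommands themselves come from ipmitool.

```python
import shlex

# Hypothetical lookup table: one example ipmitool subcommand per IPMI purpose.
IPMI_PURPOSES = {
    "monitoring": ["sdr", "list"],               # sensor readings (temp., fans, ...)
    "logging": ["sel", "list"],                  # System Event Log entries
    "recovery": ["chassis", "power", "status"],  # 'on', 'off', 'reset' act on the server
    "inventory": ["fru", "print"],               # Field Replaceable Unit data
}


def ipmitool_command(purpose: str) -> str:
    """Return a local ipmitool command line for one of the IPMI purposes."""
    try:
        args = IPMI_PURPOSES[purpose]
    except KeyError:
        raise ValueError(f"unknown purpose: {purpose!r}") from None
    return shlex.join(["ipmitool", *args])


if __name__ == "__main__":
    for purpose in IPMI_PURPOSES:
        print(f"{purpose:<10} -> {ipmitool_command(purpose)}")
```

The sketch uses `chassis power status` for recovery control because it only reads state. The `on`, `off` and `reset` variants actually act on the server.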
- 11. 4) IPMI Introduction
[Figure: IPMI block diagram. The Baseboard Management Controller (BMC) on the motherboard links sensors & controls (fans, temperatures, power and reset control), non-volatile storage (SDR, SEL, FRU), the system interface (access requires root privileges), the LAN and serial/modem interfaces (access requires username & password), and via IPMB/ICMB to satellite controllers, other chassis and a remote management card (KVM over IP, ...).]
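The diagram shows two access paths to the BMC. The local system interface requires root privileges, and the LAN interface requires a username & password. The Python sketch below builds the matching ipmitool argument list. It assumes two things: local access uses the OpenIPMI driver (`-I open`), and remote access uses IPMI v2.0 RMCP+ (`-I lanplus`). The host and user names are made up for illustration.

```python
import shlex


def ipmitool_argv(subcommand, host=None, user=None):
    """Build an ipmitool argument list for local or remote BMC access.

    Local (host is None): system interface through the OpenIPMI driver
    ('-I open'); this path needs root privileges on the server itself.
    Remote: LAN interface using IPMI v2.0 RMCP+ ('-I lanplus'); this path
    needs a BMC username and password. '-E' tells ipmitool to read the
    password from the IPMI_PASSWORD environment variable, so it does not
    appear on the command line.
    """
    if host is None:
        return ["ipmitool", "-I", "open", *subcommand]
    if not user:
        raise ValueError("remote access needs a BMC username")
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *subcommand]


if __name__ == "__main__":
    # Hypothetical BMC host name and monitoring user.
    print(shlex.join(ipmitool_argv(["sdr", "list"])))
    print(shlex.join(ipmitool_argv(["sdr", "list"], host="bmc01.example.com", user="monitor")))
```

Remote access over LAN lets one monitoring server query many BMCs without installing anything on the monitored hosts. Local access avoids storing BMC credentials, but it has to run as root on each server.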
- 12. 4) IPMI Sensor Classes
● No need to configure threshold values
Discrete sensors:
[root@test ~]# ipmitool sdr get "PS2 Status"
 Sensor ID              : PS2 Status (0x71)
 Entity ID              : 10.2 (Power Supply)
 Sensor Type (Discrete) : Power Supply
 States Asserted        : Power Supply
                          [Presence detected]
                          [Power Supply AC lost]
 Assertion Events       : Power Supply
                          [Presence detected]
                          [Power Supply AC lost]
 Assertions Enabled     : Power Supply
                          [Presence detected]
                          [Failure detected]
                          [Predictive failure]
                          [Power Supply AC lost]
                          [...]
 Deassertions Enabled   : Power Supply
                          [...]

Threshold sensors:
[root@test ~]# ipmitool sdr get "Fan 1"
 Sensor ID              : Fan 1 (0x50)
 Entity ID              : 29.1 (Fan Device)
 Sensor Type (Analog)   : Fan
 Sensor Reading         : 5719 (+/- 0) RPM
 Status                 : ok
 Nominal Reading        : 6708.000
 Normal Minimum         : 2451.000
 Normal Maximum         : 10965.000
 Lower critical         : 1720.000
 Lower non-critical     : 1978.000
 Positive Hysteresis    : 86.000
 Negative Hysteresis    : 86.000
 Minimum sensor range   : Unspecified
 Maximum sensor range   : Unspecified
 Event Message Control  : Per-threshold
 Readable Thresholds    : lcr lnc
 Settable Thresholds    : lcr lnc
 Threshold Read Mask    : lcr lnc
 Assertion Events       :
 Assertions Enabled     : lnc lcr
 Deassertions Enabled   : lnc lcr
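The BMC already stores the threshold values, so a check can read them back instead of defining its own. The Python sketch below parses `ipmitool sdr get` output like the "Fan 1" example and rates the reading against the BMC's lower thresholds. It relies on the labels printed by the ipmitool version on this slide ("Analog"/"Discrete", "Lower critical", "Lower non-critical"). It ignores hysteresis and upper thresholds. It is a simplified illustration, not the logic of any particular plugin.

```python
# Threshold-sensor output as printed on the slide (abbreviated).
FAN_1_OUTPUT = """\
 Sensor ID              : Fan 1 (0x50)
 Entity ID              : 29.1 (Fan Device)
 Sensor Type (Analog)   : Fan
 Sensor Reading         : 5719 (+/- 0) RPM
 Status                 : ok
 Lower critical         : 1720.000
 Lower non-critical     : 1978.000
"""


def parse_sdr_get(text):
    """Turn the 'Key : value' lines of `ipmitool sdr get` into a dict.

    Lines without a colon (e.g. the '[Presence detected]' state lists of
    discrete sensors) are skipped in this simplified sketch.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def sensor_class(fields):
    """Discrete or threshold sensor, based on the labels this ipmitool prints."""
    if "Sensor Type (Discrete)" in fields:
        return "discrete"
    if "Sensor Type (Analog)" in fields:
        return "threshold"
    return "unknown"


def lower_threshold_state(fields):
    """Rate a reading against the BMC's lower thresholds (no hysteresis)."""
    reading = float(fields["Sensor Reading"].split()[0])
    if "Lower critical" in fields and reading < float(fields["Lower critical"]):
        return "CRITICAL"
    if "Lower non-critical" in fields and reading < float(fields["Lower non-critical"]):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    fan = parse_sdr_get(FAN_1_OUTPUT)
    print(fan["Sensor ID"], sensor_class(fan), lower_threshold_state(fan))
```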
- 13. 4) IPMI Plugin
● Developed by Thomas Krenn
● Open Source (GPL v3)
● www.thomas-krenn.com/en/oss
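A hardware check like this plugin reads many sensors but must report one Icinga state. A common approach is "worst state wins". This sketch assumes that approach; it does not describe how the Thomas Krenn plugin does it. The Python code below uses hypothetical sensor names and a hypothetical output format.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Order in which states override each other ("worst state wins").
SEVERITY_ORDER = (CRITICAL, WARNING, UNKNOWN)


def overall_state(sensor_states):
    """Combine per-sensor states into one service state."""
    if not sensor_states:
        return UNKNOWN  # no sensors read: better not to claim OK
    present = set(sensor_states.values())
    for state in SEVERITY_ORDER:
        if state in present:
            return state
    return OK


def summary(sensor_states):
    """Return (exit_code, output line) listing only the sensors that are not OK."""
    code = overall_state(sensor_states)
    problems = sorted(name for name, state in sensor_states.items() if state != OK)
    details = ", ".join(f"{name}={NAMES[sensor_states[name]]}" for name in problems)
    line = f"IPMI sensors: {NAMES[code]}"
    if details:
        line += f" [{details}]"
    return code, line


if __name__ == "__main__":
    # Hypothetical per-sensor results, e.g. from a parser of `ipmitool sdr` output.
    states = {"Fan 1": OK, "PS2 Status": CRITICAL, "Temp 1": WARNING}
    print(summary(states)[1])
```

Ranking UNKNOWN below WARNING is a design choice. Some setups rank it above WARNING so that unreadable sensors get more attention.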
- 14. 4) IPMI Service Check
● IPMI service check shows hardware issues:
- 15. 5) Conclusions
● Monitor hardware with Icinga & IPMI
● Problems? They will tell you!
● It'll save you time & money