In this talk we’ll introduce an open source project being used to monitor large Power Systems clusters, such as in the IBM collaboration with the Oak Ridge and Lawrence Livermore laboratories for the Summit project, a large deployment of custom AC922 Power Systems nodes augmented by GPUs that together form the (currently) largest supercomputer in the world.
Data is collected out-of-band directly from the firmware layer and then redistributed to various components by an open source component called CRASSD. In addition, in-band operating system and service level metrics, logs and alerts can be collected and used to enrich the visualization dashboards. Open source components such as the Elastic Stack (Elasticsearch, Logstash, Kibana and select Beats) and Netdata are used for the monitoring scenarios appropriate to each tool’s strengths, with other components such as Prometheus and Grafana in the process of being implemented. We’ll briefly discuss our experience putting these components together and the decisions we had to make in order to automate their deployment and configuration. Finally, we’ll lay out collaboration possibilities and future directions that could make this project a convenient starting point for others in the open source community to easily monitor their own Power Systems environments.
OSMC 2019 | Monitoring Alerts and Metrics on Large Power Systems Clusters by Marcelo Perazolo
1. Monitoring Alerts and Metrics on Large Power Systems Clusters
Marcelo Perazolo
Cognitive Systems Architect
IBM Systems
mperazolo@us.ibm.com
Nuremberg, Nov 4-7, 2019
http://osmc.de
2. Agenda
• Introduction
• CORAL & Summit Supercomputer case study
• Power Firmware Monitoring – the CRASSD open source project
• Power-Ops – an open source collaboration
• Demo
• Conclusion
3. Why Power/OpenPOWER is popular for certain Workloads
• Open Hardware Architecture
• Multiple vendors
• OpenPOWER Foundation
4. Case Study: The Summit Supercomputer
• CORAL: Collaboration of Oak Ridge, Argonne and Lawrence Livermore
• Summit is located at the Oak Ridge National Laboratory and is used for civilian research
• Sister project: the Sierra supercomputer at Lawrence Livermore (nuclear weapons research)
• First supercomputer to reach exaOps performance
• ~185 miles of fiber-optic cable interconnect
• ~5,600 sq ft of data center floor space
• ~340 tons of hardware and overhead infrastructure
• ~13 MW power consumption
• 4,608 AC922 nodes, each with two 22-core POWER9 CPUs
• 27,648 NVIDIA GPUs (6 per node)
• 250 petabytes of storage
• 200 Gb/s InfiniBand bandwidth between nodes
• Peaks at 200 petaFLOPS / 3 exaOps
• Helps researchers with AI, big data, analytics and HPC workloads
5. Summit: The Most Energy-Efficient Supercomputer
“The world’s smartest supercomputer is sharing data with its cooling plant, reducing energy consumption and cost”
• “Summit is also the most energy-efficient supercomputer in its Green500 class—based on gigaflops per watt—outranking systems a 10th as fast.”
• “We wanted to couple Summit’s mechanical cooling system with its computational workload to optimize efficiency, which can translate to significant cost savings for a system of this size.”
• “We’ve developed the infrastructure architecture to scale to millions of events per second using containerized microservices and popular enterprise open-source software.”
• “On each Summit node OpenBMC provides real-time data readings from dozens of sensors totaling more than 460,000 metrics per second that describe power consumption, temperature, and performance for the entire supercomputer.”
• “Facility staff can now visualize Summit behavior across all 4,608 nodes with a temperature heat map, a power consumption map, and power and consumption data broken down by CPUs and GPUs.”
• “Capturing all possible data in real time allows operators and researchers to gain powerful insights into job behavior, machine performance, and cooling response.”
*** Quoted from: https://www.hpcwire.com/off-the-wire/olcf-and-providentia-worldwide-build-intelligence-system-for-supercomputer-cooling-plant/
6. Summit: High-Level Hardware/Architecture View
[Diagram: Power nodes → CRASSD servers → visualization tools]
Firmware alerts & telemetry from Power nodes flow to CRASSD servers and then on to open visualization tools such as Grafana and the Elastic Stack. Data includes power consumption, frequencies, cooling, etc.
7. CRASSD: Open tooling for Power Firmware Monitoring
CRASSD Facts
▪ CORAL required telemetry data for all nodes/layers in the Power cluster
▪ The proposed RAS architecture had a flaw: no method existed to route errors from the BMC
▪ Built CRASSD as an open tool:
  – Collects error events and sorts them using policy tables
  – The daemon was extended to gather sensor readings to fulfill ORNL telemetry requirements
  – Provides an API that makes it easy to develop plug-ins for various open source monitoring tools (see the sketch below)
▪ The results have been impressive, and many more use cases are being developed
▪ CRASSD is currently being incorporated into other solutions where the same requirements exist, e.g. the Power-Ops stack
Available at: https://github.com/open-power-ref-design-toolkit/ibm-crassd
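To make the plug-in model concrete, here is a minimal sketch of what a CRASSD-style plug-in could look like. The hook names (`initialize`, `notify_event`), the event fields and the policy-table shape are illustrative assumptions, not the actual ibm-crassd plug-in contract; it only shows the pattern of sorting BMC events through a policy table and forwarding the survivors to a monitoring tool.

```python
# Hypothetical sketch of a CRASSD-style plug-in. Hook names, event fields
# and the policy-table layout are assumptions for illustration only; see
# the ibm-crassd repository for the real plug-in API.
import json
import socket

LOGSTASH = ("logstash.example.com", 5000)  # assumed TCP/JSON Logstash input

# A tiny stand-in for a policy table: maps an error ID to a disposition.
POLICY_TABLE = {
    "EXAMPLE-PWR-0034": {"severity": "critical", "forward": True},
    "EXAMPLE-INF-0011": {"severity": "info", "forward": False},
}

def initialize(config):
    """One-time setup when the daemon loads the plug-in (hypothetical hook)."""
    return True

def notify_event(event):
    """Called per BMC error event (hypothetical hook).

    Sorts the event through the policy table and ships the ones marked
    for forwarding to Logstash as a single JSON line.
    """
    policy = POLICY_TABLE.get(event.get("errorID"), {"forward": True})
    if not policy["forward"]:
        return
    event["severity"] = policy.get("severity", "unknown")
    with socket.create_connection(LOGSTASH, timeout=5) as sock:
        sock.sendall((json.dumps(event) + "\n").encode("utf-8"))
```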
8. Beyond Power Firmware Monitoring: the Power-Ops project
Motivations
• Replace legacy tools and solutions with modern/open alternatives for Power clusters
  • Monitoring for x86 is feature-rich and commoditized, with extensive support
  • Not so for Power, e.g. Elastic on Power is still on v5.x; the newer v7.x ships binaries for x86 only
  • Power users often need to port / build / configure these tools from scratch!
  ➔ This raises the cost of maintenance, and may influence the decision to use Power at all
• Automate a complete ecosystem of tools that fits all the needs of a modern Ops stack
  • Types of data: logs/alerts vs. telemetry
  • Analysis: historical vs. real-time
  • Multi-layer aggregation: firmware, OS, services, etc.
  • Single system or cluster-wide
  ➔ Popular stacks use Grafana & Prometheus, ELK, Nagios / Icinga / Zabbix, Netdata, etc., and are deployed/configured by tools such as Ansible, Terraform, Salt, Puppet, etc.
Proposal: build & curate a key set of modern open tools for Power systems; engage Power systems users and the open source monitoring/ops community
Value 1: reduce the cost of modernizing operations for existing Power clusters (legacy → open)
Value 2: enable adding Power nodes easily into data centers that already use modern Ops tooling
Value 3: reduce the entry cost of operations for new solutions interested in Power’s advantages
9. Power-Ops: Open tooling for Power Cluster Operations
Power-Ops Facts
▪ The management stack runs on the Power LE architecture
▪ Supported managed endpoints are Power Linux (the tooling could also easily be used on x86):
  ▪ RedHat family of OSs
  ▪ Debian/Ubuntu family of OSs
  ▪ AIX (limited; starting to be supported as endpoints)
▪ Composed of automation components built on Ansible playbooks
▪ Three main goals:
  ▪ Bring up and pre-configure target platforms (bare metal, virtual machines, containers*)
  ▪ Build components not currently available on the Power platform
  ▪ Deploy and configure tooling and start-up dashboards that work off the shelf with Power
▪ Growing community of interested end users
10. Power-Ops: Bring-Up
The Bring-Up Process
▪ A DevOps professional triggers the process on a CI/CD platform
▪ The CI/CD tools invoke Ansible (a driver script is sketched below)
▪ Ansible playbooks interact with the IaaS of choice
▪ Nodes are brought up targeted for different roles:
  – Builders
  – Controllers
  – Endpoints
▪ Bring-up includes powering up nodes (if needed) and laying down the prerequisites for building or deployment:
  – OS
  – Packages & libraries
  – Access configuration
  – Software configuration
[Diagram: devops → CI/CD → builders / controllers / endpoints]
The IaaS could be one of several choices, e.g. bare metal, hypervisors on Power, Power hyperconverged infrastructure, containers on OpenShift, etc. (integrations are easy – just drop in a playbook).
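As a concrete illustration, the snippet below shows one way a CI/CD job could drive role-based bring-up with Ansible. The playbook and inventory file names are hypothetical, not the actual Power-Ops repository layout; only the ansible-playbook CLI itself is assumed.

```python
# Illustrative CI/CD glue for role-based bring-up. Playbook/inventory names
# are placeholders, not the real Power-Ops layout.
import subprocess

ROLES = ["builders", "controllers", "endpoints"]

def bring_up(role, inventory="inventory.ini"):
    """Run the bring-up playbook for one node role."""
    subprocess.run(
        ["ansible-playbook", "-i", inventory, f"bringup-{role}.yml"],
        check=True,  # fail the pipeline if the playbook fails
    )

if __name__ == "__main__":
    for role in ROLES:
        bring_up(role)
```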
11. Power-Ops: Build
The Build Process
▪ Many components are already available on Power, but there are exceptions
▪ CRASSD: source on GitHub
  – The build process generates packages for Debian and RedHat
▪ Go language
  – Go daemon binaries must be recompiled on Power (a builder step is sketched below)
▪ Elastic Stack
  – Up to v5.x the code is implemented in Java
  – Newer releases bundle platform-specific binaries (Power not yet supported)
  – Beats must be re-packaged for Debian and RedHat
▪ All relevant packages are then stored in a local repository
▪ The build doesn’t have to run frequently
  – DevOps orgs could automate upstream integration
[Diagram: devops → CI/CD → builders → repo (libs)]
The builders generate binaries/packages for Power that are not yet widely available on public repos. The long-term goal is to integrate Power packages into upstream repositories.
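To illustrate the builder role, here is a hedged sketch of a build step that recompiles a Go daemon for linux/ppc64le and stages the result in a local repository directory. The source and repo paths are placeholders, and the actual Power-Ops build playbooks may differ; the sketch assumes a pure-Go daemon (so GOOS/GOARCH alone is enough, whether building natively on Power or cross-compiling).

```python
# Illustrative builder step: rebuild a Go daemon for ppc64le and stage it
# in a local repo. SRC_DIR and REPO_DIR are placeholders, not Power-Ops paths.
import os
import shutil
import subprocess

SRC_DIR = "/build/go-daemon"   # hypothetical source checkout
REPO_DIR = "/srv/local-repo"   # hypothetical local package repository

def build_for_power():
    """Compile the daemon for linux/ppc64le using the standard Go toolchain."""
    env = dict(os.environ, GOOS="linux", GOARCH="ppc64le")
    out = os.path.join(SRC_DIR, "daemon-ppc64le")
    subprocess.run(["go", "build", "-o", out, "."],
                   cwd=SRC_DIR, env=env, check=True)
    return out

if __name__ == "__main__":
    shutil.copy(build_for_power(), REPO_DIR)
```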
12. Power-Ops: Deploy
The Deploy Process
▪ Choose a deployment topology
  ▪ Where each component is deployed to
  ▪ How the components interconnect with each other
▪ Deploy tooling to nodes
  ▪ Elastic Stack, Netdata and CRASSD go to controller nodes
  ▪ Beats (Filebeat, Metricbeat) go to endpoint nodes
▪ Deploy configuration & visualizations/dashboards
  ▪ CRASSD is configured to collect firmware data: telemetry data goes to Netdata, alerting data goes to Logstash
  ▪ Filebeat collects logs and sends them to Logstash
  ▪ Metricbeat collects telemetry and sends it to Elasticsearch
  ▪ Visualizations/dashboards are deployed to Netdata and Kibana
▪ Operators can then access the user interfaces in Kibana and Netdata (a post-deploy check is sketched below)
[Diagram: devops → CI/CD → repo → CRASSD and tooling on controllers/endpoints]
Deployment is flexible across both controllers and endpoints.
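After deployment, a quick smoke test can confirm that the controller services are actually up before operators are pointed at the dashboards. The sketch below assumes the default Elasticsearch (9200) and Netdata (19999) ports and a hypothetical controller host name.

```python
# Post-deploy smoke test. The host name is a placeholder; the ports are the
# Elasticsearch and Netdata defaults.
import requests

CONTROLLER = "controller.example.com"  # hypothetical controller node

def elasticsearch_ok(host=CONTROLLER):
    """Cluster health should be green or yellow after deployment."""
    r = requests.get(f"http://{host}:9200/_cluster/health", timeout=5)
    r.raise_for_status()
    return r.json()["status"] in ("green", "yellow")

def netdata_ok(host=CONTROLLER):
    """Netdata's dashboard/API port should answer."""
    r = requests.get(f"http://{host}:19999/api/v1/info", timeout=5)
    return r.ok

if __name__ == "__main__":
    print("elasticsearch:", elasticsearch_ok())
    print("netdata:", netdata_ok())
```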
13. Demo Overview
[Diagram: demo topology]
▪ Controllers: wmdepos (P8 bare metal) and launchgr01 (P9 bare metal), running crassd
▪ Endpoints: pops-ubuntu-ept (VM), pops-redhat-ept (VM), pops-aix-ept (VM), and bos-1 (P9) with firmware at 192.168.10.25 providing IPMI/OpenBMC telemetry (*)
▪ Deployment playbooks are driven from Marcelo’s laptop, pulling from the github repos
▪ Data flows: f/w alerts, telemetry + logs from the endpoints to the controllers
▪ Dashboards:
  – F/W Alerts (Kibana)
  – Logs/Infrastructure (Kibana)
  – Cluster Metrics (Kibana)
  – OS & F/W Metrics (Netdata)
(*) F/W data supported on Power9 systems
15. Next Steps
Grow the community
1. Engage with traditional Power systems users (e.g. AIX, legacy Power), promoting modernization
2. Engage with the Power Linux community; foster sharing of solutions for everybody’s benefit
3. Engage with open source communities; promote support of Power out of the box (where it doesn’t yet exist)
4. Use the project as a catalyst for monitoring new large Power clusters (taking advantage of the lower cost of entry on Power)
Enhance the Operational Stack
• Add Call Home support to CRASSD
• Support more deployment use cases, such as:
  • Containers (development under way)
  • Broader integration targeting other IaaS/PaaS solutions (e.g. OpenShift clusters)
• Support additional tools, such as:
  • Prometheus / Grafana (development planned)
  • Zabbix and/or Nagios / Icinga, others… (feel free to suggest / collaborate!)
• Support additional hardware, such as:
  • Other/newer BMC firmware interfaces such as Redfish (see the sketch below)
  • GPUs, networking & storage equipment
  • More Power / OpenPOWER system models
• Currency work to support and maintain newer releases of tooling, e.g.:
  • Migrate to Elastic Stack v7.x (needs automation)
  • Add support for more Beats
  • More AIX support
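As a taste of the Redfish direction, the sketch below walks the standard /redfish/v1 chassis collection and reads thermal sensors. The BMC address and credentials are placeholders, and appending /Thermal to the chassis path follows the conventional Redfish layout rather than discovering the link from the resource itself.

```python
# Hedged Redfish polling sketch. BMC address and credentials are placeholders;
# resource names follow the standard Redfish schema.
import requests

BMC = "https://bmc.example.com"   # placeholder BMC address
AUTH = ("admin", "password")      # placeholder credentials

def read_thermal(bmc=BMC):
    """Yield (sensor name, reading in °C) for each chassis the BMC exposes."""
    chassis = requests.get(f"{bmc}/redfish/v1/Chassis",
                           auth=AUTH, verify=False, timeout=10).json()
    for member in chassis.get("Members", []):
        thermal = requests.get(f"{bmc}{member['@odata.id']}/Thermal",
                               auth=AUTH, verify=False, timeout=10).json()
        for t in thermal.get("Temperatures", []):
            yield t.get("Name"), t.get("ReadingCelsius")

if __name__ == "__main__":
    for name, celsius in read_thermal():
        print(f"{name}: {celsius} °C")
```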