1. STAT: A Debugging Tool for Extreme Scale
Martin Schulz
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
ASC STAT Team: Greg Lee, Dong Ahn (LLNL), Dane Gardner (LANL)
Developed at LLNL, University of Wisconsin & University of New Mexico
Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551
This work was performed under the auspices of the U.S. Department of Energy by
Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344
LLNL-PRES-426152
2. STAT: Debugging Support at Scale
The debugging challenge at scale
• Traditional debuggers break down at scale
• Data and control for too many tasks
• Sequential paradigm
How can STAT help?
• Identify equivalence classes
• Pre-analysis for subset debugging
Typical use case
• Application hang (livelock or deadlock)
• Answer the question: What is my code doing now?
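The idea of equivalence classes can be illustrated with a minimal sketch: tasks whose stack traces are identical are grouped together, so that only one representative per group needs closer inspection. The function name, the snapshot data, and the frame names below are hypothetical and are not taken from STAT itself.

```python
from collections import defaultdict


def equivalence_classes(traces):
    """Group task ranks whose stack traces are identical.

    traces: dict mapping rank -> list of frame names, outermost frame first.
    Returns a dict mapping each distinct trace (as a tuple) to a sorted rank list.
    """
    classes = defaultdict(list)
    for rank in sorted(traces):
        classes[tuple(traces[rank])].append(rank)
    return dict(classes)


# Hypothetical snapshot of a hung six-task MPI job.
snapshot = {
    0: ["main", "solve", "MPI_Waitall"],
    1: ["main", "solve", "MPI_Barrier"],
    2: ["main", "solve", "MPI_Barrier"],
    3: ["main", "solve", "MPI_Barrier"],
    4: ["main", "solve", "MPI_Barrier"],
    5: ["main", "solve", "MPI_Barrier"],
}

for frames, ranks in equivalence_classes(snapshot).items():
    print(" > ".join(frames), "ranks:", ranks)
```

In this made-up snapshot, six tasks collapse into two classes, so a traditional debugger would only need to attach to one task from each class (subset debugging).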
4. Gathering Stack Traces
STAT gathers stack traces from
• Multiple processes
• Multiple samples per process
[Figure: 2D trace/space call graph prefix tree and 3D trace/space/time call graph prefix tree, with MPI calls at the leaves]
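Merging stack traces from multiple processes and multiple samples into a call graph prefix tree can be sketched as follows. This is an illustration under assumptions, not STAT's implementation: the `Node` class, the function names, and the sample data are hypothetical. Each tree node records the set of ranks that were ever observed at that call path.

```python
class Node:
    """One call-path prefix, with the ranks observed at or below it."""

    def __init__(self, name):
        self.name = name
        self.ranks = set()
        self.children = {}


def build_prefix_tree(samples):
    """Merge (rank, frames) samples into one prefix tree.

    frames lists function names, outermost first. A rank may appear
    several times, once per sample taken over time.
    """
    root = Node("/")
    for rank, frames in samples:
        node = root
        node.ranks.add(rank)
        for frame in frames:
            if frame not in node.children:
                node.children[frame] = Node(frame)
            node = node.children[frame]
            node.ranks.add(rank)
    return root


def render(node, depth=0):
    """Return the tree as indented text lines, children in name order."""
    lines = ["  " * depth + node.name + " " + str(sorted(node.ranks))]
    for name in sorted(node.children):
        lines.extend(render(node.children[name], depth + 1))
    return lines


# Hypothetical data: three tasks, two samples each.
samples = [
    (0, ["main", "compute", "MPI_Send"]),
    (1, ["main", "MPI_Barrier"]),
    (2, ["main", "MPI_Barrier"]),
    (0, ["main", "compute", "MPI_Recv"]),
    (1, ["main", "MPI_Barrier"]),
    (2, ["main", "MPI_Barrier"]),
]

tree = build_prefix_tree(samples)
print("\n".join(render(tree)))
```

Feeding in only the samples from a single point in time gives the 2D trace/space view; feeding in all samples, as here, gives the 3D trace/space/time view.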
7. Availability
Platform | Ver. | Usage | Documentation | POC
LLNL/TLCC OCF | 0.9.4 | STATGUI, STAT | https://computing.llnl.gov/code/STAT/ | Greg Lee, lee218@llnl.gov
LLNL/TLCC SCF | 0.9.4 | STATGUI, STAT | https://computing.llnl.gov/code/STAT/ | Greg Lee, lee218@llnl.gov
LLNL/uBGL | 0.9.0 beta | STAT | https://computing.llnl.gov/code/STAT/ | Greg Lee, lee218@llnl.gov
LLNL/Dawn | 0.9.4 beta | STATGUI, STAT | https://computing.llnl.gov/code/STAT/ | Greg Lee, lee218@llnl.gov
SNL/Glory | 0.9.2 | see below | https://computing.llnl.gov/code/STAT/ | Mahesh Rajan, mrajan@sandia.gov
LANL/Yellow Turing | 0.9.1b | Mod: hpc-tools, Mod: stat | man stat | consult@lanl.gov
LANL/Turquoise Lobo | 0.9.2 | Mod: hpc-tools, Mod: stat | man stat | consult@lanl.gov
Usage for SNL/Glory:
module switch mpi mpi/mvapich-1.1_intel-11.1-f064-c064
module load /home/jgalaro/privatemodules/openss-mvapich
Note: Red Storm has a poor man's STAT-like utility called fast_where. Try "man fast_where" for usage instructions.
8. Usage Instructions
Option 1: Graphical User Interface
• Launch GUI: STATGUI
• Attach, create stacktraces & views through GUI
Option 2: Command line
• STAT <MPI launcher pid>
− -t: number of traces
− -T: time between traces
• Reports the name of the output file on stdout
• STATview <output file>
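For scripted use, the command line above can be assembled programmatically. The sketch below only builds the argument list from the options named on this slide (`-t`, `-T`, and the MPI launcher pid). The helper name, the example pid, and the example values are hypothetical; placing the options before the pid is an assumption, and the slides do not state the unit of `-T`, so consult `man STAT` before relying on it.

```python
import shlex


def stat_command(launcher_pid, num_traces=None, trace_interval=None):
    """Build an argv list for the STAT command line.

    Assumption: options are placed before the launcher pid.
    """
    argv = ["STAT"]
    if num_traces is not None:
        argv += ["-t", str(num_traces)]       # -t: number of traces
    if trace_interval is not None:
        argv += ["-T", str(trace_interval)]   # -T: time between traces
    argv.append(str(launcher_pid))
    return argv


# Hypothetical pid and values, for illustration only.
print(shlex.join(stat_command(12345, num_traces=5, trace_interval=100)))
```

On a system where STAT is installed, the resulting list could be passed to `subprocess.run(argv, check=True)`.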
Additional information
• man STAT / STAT -h
• acroread /usr/local/tools/stat/doc/*.pdf
9. Advanced Topics
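Scalable Implementation
• Tree-based overlay networks
− Data aggregation on the fly
− Tree depth configurable
• Parameters to STAT
• Useful for 10,000+ tasks
[Figure: tree-based overlay network with a front end (FE) at the root, communication processes (CP) as interior nodes, and back ends (BE) at the leaves]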
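Aggregation on the fly in a tree-based overlay network can be sketched as a level-by-level reduction: each interior node merges the partial results of its children before passing them up, so the front end receives one already merged result. The function names, the fan-out parameter, and the data are hypothetical; this is a sequential simulation of the idea, not STAT's implementation.

```python
def merge(partials):
    """Union several trace -> rank-set maps into a new map."""
    merged = {}
    for partial in partials:
        for trace, ranks in partial.items():
            merged.setdefault(trace, set()).update(ranks)
    return merged


def reduce_tree(leaves, fanout):
    """Merge leaf results level by level.

    Returns (merged_result, levels), where levels is the number of
    merge levels between the leaves and the root.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not leaves:
        raise ValueError("need at least one leaf")
    level, levels = list(leaves), 0
    while len(level) > 1:
        level = [merge(level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
        levels += 1
    return level[0], levels


# Hypothetical job: 16 back ends, one task each; ranks 0 and 8 are
# stuck in MPI_Waitall, all other ranks wait in MPI_Barrier.
waitall = ("main", "MPI_Waitall")
barrier = ("main", "MPI_Barrier")
leaves = [{(waitall if rank % 8 == 0 else barrier): {rank}}
          for rank in range(16)]

result, levels = reduce_tree(leaves, fanout=4)
print(levels, {trace: sorted(ranks) for trace, ranks in result.items()})
```

A larger fan-out gives a shallower tree with more merging work per node; a smaller fan-out gives a deeper tree, which is the trade-off a configurable tree depth exposes.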
Temporal Analysis
• Finer-grained analysis of process location
• Disambiguation of iteration instances
• Employs static analysis to determine loop variables
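The temporal analysis idea can be illustrated as follows: tasks that share the same stack trace are further separated by how far they have progressed, using the value of a loop variable. In this sketch the loop variable values are simply given as input (in STAT they would come from the analysis described above). The function name and data are hypothetical, and ordering by source line within one iteration assumes a straight-line loop body.

```python
from collections import defaultdict


def order_by_progress(observations):
    """Order tasks that share a stack trace by their progress.

    observations: dict mapping rank -> (source_line, loop_iteration).
    Returns a list of ((loop_iteration, source_line), ranks) pairs,
    least progressed first.
    """
    groups = defaultdict(list)
    for rank in sorted(observations):
        line, iteration = observations[rank]
        groups[(iteration, line)].append(rank)
    return sorted(groups.items())


# Hypothetical data: four tasks inside the same loop.
observations = {
    0: (42, 10),
    1: (42, 10),
    2: (42, 9),   # one iteration behind the others
    3: (57, 10),  # further along in the loop body
}

for (iteration, line), ranks in order_by_progress(observations):
    print("iteration", iteration, "line", line, "ranks", ranks)
```

The first group in the result contains the least progressed tasks, which are often a good starting point when looking for the cause of a hang.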
10. Reference & Demo Session
Usage documentation
• https://computing.llnl.gov/code/STAT/
Man page
• man STAT or man STATGUI
• STAT -h
Background information
• http://www.paradyn.org/STAT/STAT.html
Demo Session / Track 3