2. Agenda
• TotalView on Blue Gene
– A little history
– Current status
• Recent TotalView improvements
– ReplayEngine (reverse debugging)
– Remote Display
– TotalView Script (batch debugging)
• Future work
– BG/*
– Heterogeneous systems
– Many core, transactional memory, speculative execution
– Petascale debugging
TotalView Technologies – Confidential and Proprietary – Plans Subject to Change without Notice
www.totalviewtech.com
3. Supported Blue Gene Architectures and Compilers
• Blue Gene/L and Blue Gene/P
• Languages / Compilers
– C/C++, Fortran, Assembly
– GNU Compilers
– IBM Compilers
– IBM OpenMP (on BG/P)
• Parallel Environments
– IBM MPI
– IBM OpenMP (on BG/P)
– Pthreads (BG/P)
• Runtime linking/loading (BG/P)
– Shared libraries
– Dynamically loaded shared libraries
4. Blue Gene Architecture
• TotalView client (GUI/CLI) runs on the Front End node
• Client communicates with the TotalView debugger servers running on the I/O nodes via a socket
• The debugger servers communicate with the CIOD to control processes and threads running on the Compute nodes
• Fanout ratios (CNs/server)
– BG/L: 32 to 64, 2 cores/CN, 128 threads/server
– BG/P: 128 to 256, 4 cores/CN, 1024 threads/server
– Ratio increasing (8K threads/server?)
– Parallelize server operation
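The threads-per-server figures above follow from multiplying compute nodes per server by cores per compute node. A minimal sketch of that arithmetic, assuming the fanout figures are ranges (32 to 64 CNs per server on BG/L, 128 to 256 on BG/P) and one thread per core; the names `FANOUT`, `threads_per_server`, and `summarize` are illustrative only and are not part of TotalView:

```python
# Threads one debugger server manages = compute nodes per server x cores per node.
# The CN ranges below are an assumed reading of the slide's fanout figures.

FANOUT = {
    "BG/L": {"cns_per_server": (32, 64), "cores_per_cn": 2},
    "BG/P": {"cns_per_server": (128, 256), "cores_per_cn": 4},
}


def threads_per_server(compute_nodes: int, cores_per_node: int) -> int:
    """Assume one thread per core on every compute node behind one server."""
    return compute_nodes * cores_per_node


def summarize(fanout=FANOUT):
    """Return {system: (threads at low fanout, threads at high fanout)}."""
    rows = {}
    for system, cfg in fanout.items():
        low, high = cfg["cns_per_server"]
        cores = cfg["cores_per_cn"]
        rows[system] = (
            threads_per_server(low, cores),
            threads_per_server(high, cores),
        )
    return rows


if __name__ == "__main__":
    for system, (low, high) in summarize().items():
        print(f"{system}: {low} to {high} threads per server")
```

Under these assumptions the upper end of each range reproduces the 128 and 1024 threads/server figures quoted on the slide.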
5. TotalView Blue Gene/L Support
• TotalView involvement since 2003
• Support for Blue Gene/L since 2005
• Debugging interfaces developed via close collaboration with IBM
• Used on DOE/NNSA/LLNL's Blue Gene/L system containing 212K cores
– Heap memory debugging support added
– Blue Gene/L scaling and performance tuning project
– TotalView has debugged jobs as large as 8,192 processes (LLNL)
• Work on Blue Gene/L facilitated Blue Gene/P support
6. TotalView Blue Gene/P Support
• Blue Gene/P supported since Q4 2007
• Continued close collaboration with IBM to develop multithreaded debugging interfaces
• Support for shared libraries and dynamically loaded libraries
• Scalability improvements
• TotalView has debugged jobs as large as 32K (Jülich)
7. TotalView Blue Gene/P Sites
• Currently running at over 30 sites in Germany, France, UK, and US, including
– Argonne
– Boston University
– Daresbury
– IDRIS
– Jülich
– LLNL
– Max Planck
– ORNL
– Princeton University
– Rensselaer Polytechnic Institute
• Jülich workshop, March 2008
• Argonne workshop, May 2008
8. Recent TotalView Improvements on Blue Gene and Linux
• Remote Display
– Run a remote version of the TotalView GUI…
– …display it locally, with fast, interactive performance
– Easy, fast, secure
• tvscript
– Simplifies debugging batch jobs
– Event/action paradigm
– Configurable
• ReplayEngine
– Step execution back in time
– Uses reverse debugging technology
– Currently available on Linux x86 and x86-64 only
9. Remote Display
• Presents a window on your machine that displays TotalView executing on a remote system
• Two components:
– Client, which runs on the local system; available for Linux x86 and x86-64, and Windows XP and Vista
– Server, which runs on any system supported by TotalView, invisibly managing the connections between the host and client
• The Client also provides for submission of jobs to the batch queuing systems PBS Pro and LoadLeveler
10. Batch Scripting
• Designed for debugging in a batch environment
• tvscript lets you define the events to act on and the actions to take when an event occurs
• Typical events
– Action point (e.g., breakpoint)
– Memory error (e.g., malloc returns 0, guard block corruption)
– Errors (e.g., SEGV, FPE)
• Typical actions
– Display a backtrace
– List memory leaks
– Print variables and arrays
• Configurable
– Supports external script files
– Allows generation of even more complex actions and events
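The event/action paradigm described above can be illustrated with a small dispatcher: events are names, actions are callables, and a table maps one to the other. This is a generic sketch in Python, not tvscript's syntax or API; `EventActionTable`, `display_backtrace`, `print_variable`, and the shape of `context` are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class EventActionTable:
    """Maps event names to lists of actions (hypothetical; not the tvscript API)."""

    def __init__(self):
        self._actions = defaultdict(list)

    def on(self, event, action):
        """Register an action to run whenever the named event occurs."""
        self._actions[event].append(action)

    def fire(self, event, context):
        """Run every action registered for the event; return their report lines."""
        return [action(context) for action in self._actions.get(event, [])]


def display_backtrace(context):
    """Action: format the call stack, innermost frame first."""
    return "backtrace: " + " <- ".join(context["stack"])


def print_variable(name):
    """Action factory: report the value of one named variable."""
    def action(context):
        return f"{name} = {context['variables'].get(name)!r}"
    return action


# Example wiring: an error produces a backtrace; an action point also prints a variable.
table = EventActionTable()
table.on("error", display_backtrace)
table.on("actionpoint", print_variable("i"))
table.on("actionpoint", display_backtrace)

# Stand-in for the debugger state at the moment an event fires.
context = {"stack": ["compute", "main"], "variables": {"i": 7}}

if __name__ == "__main__":
    for line in table.fire("actionpoint", context):
        print(line)
```

Keeping events and actions in a table rather than hard-coding them is what makes such a scheme configurable: new pairings can be added from an external script without changing the dispatcher.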
11. ReplayEngine
• Intuitive user interface, integrated with TotalView
• Forward controls
– Step forward over functions
– Step forward into functions
– Advance forward out of the current function, to after the call
– Advance forward to the selected line
– Advance forward to the “live” session
• Backward controls
– Step backward over functions
– Step backward into functions
– Advance backward out of the current function, to before the call
– Advance backward to the selected line
12. Possible Future Blue Gene Work
• BG/* support
– Support future generations of Blue Gene
• Fast conditional breakpoints/watchpoints
– Expressions compiled/patched into the target, executed in parallel, about 10 µs/expression
• Asynchronous thread control
– Thread barrier breakpoint, thread single stepping
• User-programmable visual data
– Allows users to define complex data access functions
• Debugging optimized code
• Postmortem debugging
• Fast DLL debugging interface
• LLNL collaboration for scalable subset attach
– Integrates with lightweight tools such as STAT
13. Possible Other Future Work
• Scalability/performance
– Continue scalability and performance improvements
– Tree-based infrastructure for logarithmic scaling
– Petascale debugging
– Hundreds of thousands of threads
• Heterogeneous systems
– IBM Roadrunner (x86-64/Cell)
– GPUs
• Emerging technologies
– Many core
– Transactional memory
– Speculative execution
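The "logarithmic scaling" item under Scalability/performance can be made concrete with a generic fanout tree: if each node aggregates results from a fixed number of children, the number of levels grows with the logarithm of the process count. The slides do not describe the planned infrastructure, so the following is only a sketch of the general idea; `tree_depth`, `tree_reduce`, and the fanout values are assumptions for illustration.

```python
def tree_depth(leaves: int, fanout: int) -> int:
    """Smallest number of levels such that fanout**levels >= leaves."""
    if leaves < 1 or fanout < 2:
        raise ValueError("need leaves >= 1 and fanout >= 2")
    depth, reach = 0, 1
    while reach < leaves:
        reach *= fanout
        depth += 1
    return depth


def tree_reduce(values, fanout, combine):
    """Combine values in groups of `fanout`, level by level.

    Returns (result, rounds), where rounds is the number of aggregation levels.
    """
    level = list(values)
    if not level:
        raise ValueError("values must not be empty")
    rounds = 0
    while len(level) > 1:
        # Each parent combines the partial results of up to `fanout` children.
        level = [combine(level[i:i + fanout]) for i in range(0, len(level), fanout)]
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    for n in (1_000, 200_000):
        print(f"{n} leaves, fanout 32: {tree_depth(n, 32)} levels")
    total, rounds = tree_reduce(range(1000), 10, sum)
    print(f"sum={total} in {rounds} rounds")
```

In a flat layout a single front end talks to every server directly, so its work grows linearly with job size; in a tree each node handles at most `fanout` children and the number of rounds grows only logarithmically.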
14. Questions?
More Information
• Blue Gene Technical Development Interest Group
– Contact chris.gottbrath@totalviewtech.com
• Technical support
– support@totalviewtech.com
• BG LLNL case study
– www.totalviewtech.com/pdf/case_study_scientific_computing.pdf
• Customer training or webinars
– contacttraininggroup@totalviewtech.com
• Web site
– www.totalviewtech.com