Scalable Debugging with TotalView
          on Blue Gene 

         John DelSignore, CTO
        TotalView Technologies
Agenda

    •   TotalView on Blue Gene
        –   A little history
        –   Current status
    •   Recent TotalView improvements
        –   ReplayEngine (reverse debugging)
        –   Remote Display
        –   TotalView Script (batch debugging)
    •   Future work
        –   BG/*
        –   Heterogeneous systems
        –   Many core, transactional memory, speculative execution
        –   Petascale debugging


        TotalView Technologies – Confidential and Proprietary – Plans Subject to Change without Notice
                                           www.totalviewtech.com
Supported Blue Gene 
    Architectures and Compilers

    •   Blue Gene/L and Blue Gene/P
    •   Languages / Compilers
        –   C/C++, Fortran, Assembly
        –   GNU Compilers
        –   IBM Compilers
        –   IBM OpenMP (on BG/P)
    •   Parallel Environments
        –   IBM MPI 
        –   IBM OpenMP (on BG/P)
        –   Pthreads (BG/P)
    •   Runtime linking/loading (BG/P)
        –   Shared libraries
        –   Dynamically loaded shared libraries

Blue Gene Architecture

    •   TotalView client (GUI/CLI) 
        runs on the Front End node
    •   Client communicates with 
        the TotalView debugger 
        servers running on the I/O 
        nodes via a socket
    •   The debugger servers 
        communicate with the 
        CIOD to control processes 
        and threads running on the 
        Compute nodes
    •   Fan-out ratios (CNs/server)
         –   BG/L: 32-64, 2 cores/CN, 
             128 threads/server
         –   BG/P: 128-256, 4 cores/CN, 
             1024 threads/server
         –   Ratio increasing (8K thr/svr?)
         –   Parallelize server operation
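
The threads-per-server figures on this slide are consistent with multiplying the upper end of each fan-out range by the cores per compute node. A minimal sketch of that arithmetic, assuming one thread per core; the function name is illustrative and not part of TotalView:

```python
def threads_per_server(compute_nodes_per_server: int, cores_per_node: int) -> int:
    """Threads one debugger server must control, assuming one thread per core."""
    return compute_nodes_per_server * cores_per_node


# Upper fan-out ratios and core counts taken from the slide.
bgl = threads_per_server(64, 2)    # BG/L
bgp = threads_per_server(256, 4)   # BG/P
print(bgl, bgp)
```

Under the same assumption, the projected 8K threads per server would correspond to any node count and core count whose product is 8,192.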

TotalView Blue Gene/L Support

    •   TotalView involvement since 2003
    •   Support for Blue Gene/L since 2005
    •   Debugging interfaces developed via close 
        collaboration with IBM
    •   Used on DOE/NNSA/LLNL's Blue Gene/L system 
        containing 212K cores
        –   Heap memory debugging support added
        –   Blue Gene/L scaling and performance tuning project
        –   TotalView has debugged jobs as large as 8,192 processes 
            (LLNL)
    •   Work on Blue Gene/L facilitated Blue Gene/P 
        support

TotalView Blue Gene/P Support

    •   Blue Gene/P supported since Q4 2007
    •   Continued close collaboration with IBM to 
        develop multi-threaded debugging interfaces
    •   Support for shared libraries and dynamically 
        loaded libraries
    •   Scalability improvements
    •   TotalView has debugged jobs as large as 32K 
        (Jülich)




TotalView Blue Gene/P Sites

    •   Currently running at over 30 sites in Germany, 
        France, UK, and US, including
        –   Argonne
        –   Boston University
        –   Daresbury
        –   IDRIS
        –   Jülich
        –   LLNL
        –   Max Planck
        –   ORNL
        –   Princeton University
        –   Rensselaer Polytechnic Institute
    •   Jülich workshop, March 08
    •   Argonne workshop, May 08

Recent TotalView Improvements
    on Blue Gene and Linux

    •   Remote Display
        –   Run a remote version of the TotalView GUI…
        –   …display it locally, with fast, interactive performance
        –   Easy, fast, secure
    •   tvscript
        –   Simplifies debugging batch jobs
        –   Event/action paradigm
        –   Configurable
    •   ReplayEngine
        –   Step execution back in time
        –   Uses reverse debugging technology
        –   Currently Linux x86 and x86-64 only


Remote Display

    •   Presents a window on your machine that displays 
        TotalView executing on a remote system
    •   Two components:
        –   Client, which runs on the local system, available for
              Linux x86, x86-64
              Windows XP, Vista
        –   Server, which runs on any system supported by 
            TotalView, invisibly managing the connections 
            between the host and client
    •   The Client also provides for submission of jobs to the 
        batch queuing systems PBS Pro and LoadLeveler

Batch Scripting

     •   Designed for debugging in a batch environment
     •   tvscript lets you define the events to act on and the actions to 
         take when an event occurs
     •   Typical events
          –   Action point (e.g., breakpoint)
          –   Memory error (e.g., malloc returns 0, guard block corruption)
          –   Errors (e.g., SEGV, FPE)
     •   Typical actions
          –   Display a backtrace
          –   List memory leaks
          –   Print variables and arrays
     •   Configurable
          –   Supports external script files
          –   Allows generation of even more complex actions and events
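
The event/action paradigm described above can be sketched as a table that maps event names to the actions run when that event occurs. This is a generic illustration in Python, not tvscript's actual syntax or API; `EventActionTable`, the event names, and the action functions are all hypothetical:

```python
from collections import defaultdict
from typing import Callable, Dict, List

Action = Callable[[dict], str]


class EventActionTable:
    """Maps event names to the actions that run when the event occurs."""

    def __init__(self) -> None:
        self._actions: Dict[str, List[Action]] = defaultdict(list)

    def on(self, event: str, action: Action) -> None:
        self._actions[event].append(action)

    def fire(self, event: str, context: dict) -> List[str]:
        # Run every action registered for this event, in registration order.
        return [action(context) for action in self._actions.get(event, [])]


def display_backtrace(ctx: dict) -> str:
    return "backtrace: " + " <- ".join(ctx["stack"])


def print_variable(name: str) -> Action:
    def action(ctx: dict) -> str:
        return f"{name} = {ctx['vars'][name]}"
    return action


table = EventActionTable()
table.on("error", display_backtrace)            # e.g. SEGV, FPE
table.on("actionpoint", print_variable("i"))    # e.g. a breakpoint

ctx = {"stack": ["compute", "main"], "vars": {"i": 7}}
for line in table.fire("error", ctx):
    print(line)
```

Keeping events and actions as separate registrations is what makes such a scheme configurable: new actions can be attached to an existing event without touching the code that detects it.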


ReplayEngine


     •   Intuitive user interface, integrated with TotalView


     •   Forward and backward controls
          –   Step forward over functions / Step backward over functions
          –   Step forward into functions / Step backward into functions
          –   Advance forward out of current function, after the call / 
              Advance backward out of current function, to before the call
          –   Advance forward to selected line / Advance backward to selected line
          –   Advance forward to “live” session
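
The slides do not describe how ReplayEngine records execution. As a rough mental model only, the stepping controls above can be pictured as a cursor moving over a recorded history of program states. `ReplaySession` and `increment` below are hypothetical names for a toy model, not TotalView interfaces:

```python
class ReplaySession:
    """Toy record/replay model: keeps every recorded state so a cursor can move back."""

    def __init__(self, initial_state: dict) -> None:
        self._history = [dict(initial_state)]
        self._cursor = 0

    @property
    def state(self) -> dict:
        return self._history[self._cursor]

    def is_live(self) -> bool:
        return self._cursor == len(self._history) - 1

    def step_forward(self, execute) -> dict:
        if self.is_live():
            # At the live end: actually execute and record the new state.
            self._history.append(execute(dict(self.state)))
        # Otherwise the step only replays an already recorded state.
        self._cursor += 1
        return self.state

    def step_backward(self) -> dict:
        if self._cursor > 0:
            self._cursor -= 1
        return self.state

    def go_live(self) -> dict:
        self._cursor = len(self._history) - 1
        return self.state


def increment(state: dict) -> dict:
    state["x"] += 1
    return state


session = ReplaySession({"x": 0})
session.step_forward(increment)   # executes: x becomes 1
session.step_forward(increment)   # executes: x becomes 2
session.step_backward()           # replays: back to the state where x is 1
print(session.state, session.is_live())
```

Storing a full copy of the state at every step is only practical for a toy example; it is used here because it makes the backward step trivially correct.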

Possible Future Blue Gene Work

     •   BG/* support
         –   Support future generations of Blue Gene
     •   Fast conditional breakpoints/watchpoints
         –   Expressions compiled/patched into target, execute in parallel, 
             about 10 µs/expression
     •   Asynchronous thread control
         –   Thread barrier breakpoint, thread single stepping
     •   User programmable visual data
         –   Allows users to define complex data access functions
     •   Debugging optimized code
     •   Post-mortem debugging
     •   Fast DLL debugging interface
     •   LLNL collaboration for scalable subset attach
         –   Integrates with lightweight tools such as STAT

Possible Other Future Work

           •   Scalability/performance
               –   Continue scalability and performance improvements
               –   Tree-based infrastructure for logarithmic scaling
               –   Petascale debugging
               –   Hundreds of thousands of threads
           •   Heterogeneous systems
               –   IBM Roadrunner (x86-64/Cell)
               –   GPUs
           •   Emerging technologies
               –   Many core
               –   Transactional memory
               –   Speculative execution
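
To illustrate why a tree-based infrastructure gives logarithmic scaling: with a fixed fan-out per tree node, the number of levels between the client and the leaves grows with the logarithm of the leaf count. A small sketch of that relationship; the fan-out of 32 and the leaf counts are assumed values for illustration, not figures from the slides:

```python
def tree_depth(leaves: int, fanout: int) -> int:
    """Smallest number of levels such that fanout**levels >= leaves."""
    if leaves < 1 or fanout < 2:
        raise ValueError("need leaves >= 1 and fanout >= 2")
    depth, reach = 0, 1
    while reach < leaves:
        reach *= fanout
        depth += 1
    return depth


# Growing the leaf count 256-fold adds only two levels at fan-out 32.
for n in (1_024, 32_768, 262_144):
    print(n, tree_depth(n, fanout=32))
```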


Questions?  
            More Information


    •   Blue Gene Technical Development Interest Group
        –   Contact chris.gottbrath@totalviewtech.com
    •   Technical support
        –   support@totalviewtech.com
    •   BG LLNL case study
        –   www.totalviewtech.com/pdf/case_study_scientific_computing.pdf
    •   Customer training or webinars
        –   contacttraininggroup@totalviewtech.com
    •   Web site
        –   www.totalviewtech.com
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

TotalView Debugger On Blue Gene

  • 1. Scalable Debugging with TotalView on Blue Gene
      John DelSignore, CTO, TotalView Technologies
  • 2. Agenda
      • TotalView on Blue Gene
          – A little history
          – Current status
      • Recent TotalView improvements
          – ReplayEngine (reverse debugging)
          – Remote Display
          – TotalView Script (batch debugging)
      • Future work
          – BG/*
          – Heterogeneous systems
          – Many core, transactional memory, speculative execution
          – Peta-scale debugging
  • 3. Supported Blue Gene Architectures and Compilers
      • Blue Gene/L and Blue Gene/P
      • Languages / Compilers
          – C/C++, Fortran, Assembly
          – GNU Compilers
          – IBM Compilers
          – IBM OpenMP (on BG/P)
      • Parallel Environments
          – IBM MPI
          – IBM OpenMP (on BG/P)
          – Pthreads (BG/P)
      • Runtime linking/loading (BG/P)
          – Shared libraries
          – Dynamically loaded shared libraries
  • 4. Blue Gene Architecture
      • TotalView client (GUI/CLI) runs on the Front End node
      • Client communicates with the TotalView debugger servers running on the I/O nodes via a socket
      • The debugger servers communicate with the CIOD to control processes and threads running on the Compute nodes
      • Fan-out ratios (CNs/server)
          – BG/L: 32-64, 2 cores/CN, 128 threads/server
          – BG/P: 128-256, 4 cores/CN, 1024 threads/server
          – Ratio increasing (8K thr/svr?)
          – Parallelize server operation
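The threads-per-server figures on slide 4 follow from multiplying the upper fan-out ratio by the cores per compute node. The short check below is only an illustration of that arithmetic; the function and dictionary names are hypothetical, and it assumes one thread per core.

```python
# Threads per debugger server = compute nodes per server * cores per compute node.
# Figures are taken from the slide; names are illustrative, and one thread
# per core is an assumption.

FAN_OUT = {
    "BG/L": {"cn_range": (32, 64), "cores_per_cn": 2},
    "BG/P": {"cn_range": (128, 256), "cores_per_cn": 4},
}


def threads_per_server(compute_nodes: int, cores_per_node: int) -> int:
    """Threads one debugger server controls, assuming one thread per core."""
    return compute_nodes * cores_per_node


def max_threads(system: str) -> int:
    """Thread count at the upper end of the system's fan-out range."""
    cfg = FAN_OUT[system]
    return threads_per_server(cfg["cn_range"][1], cfg["cores_per_cn"])


if __name__ == "__main__":
    for name in FAN_OUT:
        print(name, max_threads(name))
```

At the upper end of each range this reproduces the slide's 128 threads/server for BG/L and 1024 threads/server for BG/P.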
  • 5. TotalView Blue Gene/L Support
      • TotalView involvement since 2003
      • Support for Blue Gene/L since 2005
      • Debugging interfaces developed via close collaboration with IBM
      • Used on DOE/NNSA/LLNL's Blue Gene/L system containing 212K cores
          – Heap memory debugging support added
          – Blue Gene/L scaling and performance tuning project
          – TotalView has debugged jobs as large as 8,192 processes (LLNL)
      • Work on Blue Gene/L facilitated Blue Gene/P support
  • 6. TotalView Blue Gene/P Support
      • Blue Gene/P supported since Q4 2007
      • Continued close collaboration with IBM to develop multi-threaded debugging interfaces
      • Support for shared libraries and dynamically loaded libraries
      • Scalability improvements
      • TotalView has debugged jobs as large as 32K (Jülich)
  • 7. TotalView Blue Gene/P Sites
      • Currently running at over 30 sites in Germany, France, UK, and US, including
          – Argonne
          – Boston University
          – Daresbury
          – IDRIS
          – Jülich
          – LLNL
          – Max Planck
          – ORNL
          – Princeton University
          – Rensselaer Polytechnic Institute
      • Jülich workshop, March 08
      • Argonne workshop, May 08
  • 8. Recent TotalView Improvements on Blue Gene and Linux
      • Remote Display
          – Run a remote version of the TotalView GUI…
          – …display it locally, with fast, interactive performance
          – Easy, fast, secure
      • tvscript
          – Simplifies debugging batch jobs
          – Event/action paradigm
          – Configurable
      • ReplayEngine
          – Step execution back in time
          – Uses reverse debugging technology
          – Linux x86 and x86-64 (currently only)
  • 9. Remote Display
      • Presents a window on your machine that will display TotalView executing on a remote system
      • Two components:
          – Client, runs on the local system, available for Linux x86, x86-64; Windows XP, Vista
          – Server, which runs on any system supported by TotalView, invisibly managing the connections between the host and client
      • The Client also provides for submission of jobs to batch queuing systems PBS Pro and LoadLeveler
  • 10. Batch Scripting
      • Designed for debugging in a batch environment
      • tvscript lets you define the events to act on and the actions to take when an event occurs
      • Typical events
          – Action point (e.g., breakpoint)
          – Memory error (e.g., malloc returns 0, guard block corruption)
          – Errors (e.g., SEGV, FPE)
      • Typical actions
          – Display a backtrace
          – List memory leaks
          – Print variables and arrays
      • Configurable
          – Supports external script files
          – Allows generation of even more complex actions and events
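The event/action paradigm on slide 10 can be pictured as a table that maps each event name to the actions to run when that event occurs. The sketch below is a generic Python illustration of that idea only. It is not tvscript syntax and does not use any TotalView API; the class, event names, and action functions are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# An action receives a context describing the stopped program and returns
# the text it would write to the batch log. All names here are hypothetical.
Action = Callable[[dict], str]


class EventActionTable:
    """Maps event names to the ordered list of actions registered for them."""

    def __init__(self) -> None:
        self._actions: Dict[str, List[Action]] = defaultdict(list)

    def on(self, event: str, action: Action) -> None:
        """Register an action to run whenever the named event occurs."""
        self._actions[event].append(action)

    def fire(self, event: str, context: dict) -> List[str]:
        """Run every action registered for the event, in registration order."""
        return [action(context) for action in self._actions.get(event, [])]


def display_backtrace(ctx: dict) -> str:
    return "backtrace: " + " <- ".join(ctx["frames"])


def print_variable(ctx: dict) -> str:
    return f"{ctx['var']} = {ctx['value']}"


table = EventActionTable()
table.on("error", display_backtrace)        # e.g. SEGV or FPE
table.on("actionpoint", print_variable)     # e.g. a breakpoint was hit
table.on("actionpoint", display_backtrace)

if __name__ == "__main__":
    ctx = {"frames": ["compute", "main"], "var": "i", "value": 7}
    for line in table.fire("actionpoint", ctx):
        print(line)
```

Because nobody is present to react when a batch job stops, every response has to be declared before the run starts, which is what the table captures.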
  • 11. Replay Engine
      • Intuitive user interface, integrated with TotalView
          – Step forward over functions
          – Step backward over functions
          – Step forward into functions
          – Step backward into functions
          – Advance forward out of current function, after the call
          – Advance backward out of current function, to before the call
          – Advance forward to selected line
          – Advance backward to selected line
          – Advance forward to “live” session
  • 12. Possible Future Blue Gene Work
      • BG/* support
          – Support future generations of Blue Gene
      • Fast conditional breakpoints/watchpoints
          – Expressions compiled/patched into target, execute in parallel, about 10 µs/expression
      • Asynchronous thread control
          – Thread barrier breakpoint, thread single stepping
      • User programmable visual data
          – Allows the user to define complex data access functions
      • Debugging optimized code
      • Post-mortem debugging
      • Fast DLL debugging interface
      • LLNL collaboration for scalable subset attach
          – Integrates with lightweight tools such as STAT
  • 13. Possible Other Future Work
      • Scalability/performance
          – Continue scalability and performance improvements
          – Tree-based infrastructure for logarithmic scaling
          – Peta-scale debugging
          – Hundreds of thousands of threads
      • Heterogeneous systems
          – IBM Roadrunner (x86-64/Cell)
          – GPUs
      • Emerging technologies
          – Many core
          – Transactional memory
          – Speculative execution
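The "logarithmic scaling" claim for a tree-based infrastructure on slide 13 can be made concrete: if every node in the tree forwards to a fixed number of children, the number of hops from the client to any leaf grows with the logarithm of the leaf count. The sketch below computes that depth. The fan-out values and leaf counts are hypothetical examples, not figures from the talk.

```python
def tree_depth(leaves: int, fan_out: int) -> int:
    """Smallest depth d such that fan_out ** d >= leaves.

    Integer arithmetic avoids floating-point rounding in log().
    """
    if leaves < 1:
        raise ValueError("leaves must be at least 1")
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    depth, reach = 0, 1
    while reach < leaves:
        reach *= fan_out
        depth += 1
    return depth


if __name__ == "__main__":
    # Hypothetical sizes: the depth grows slowly as the leaf count grows.
    for leaves in (1_024, 32_768, 262_144):
        print(leaves, tree_depth(leaves, fan_out=32))
```

With a fan-out of 32, going from 1,024 to 262,144 leaves adds only two levels, whereas a flat client-to-server layout has the client handling every connection itself.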
  • 14. Questions? More Information
      • Blue Gene Technical Development Interest Group
          – Contact chris.gottbrath@totalviewtech.com
      • Technical support
          – support@totalviewtech.com
      • BG LLNL case study
          – www.totalviewtech.com/pdf/case_study_scientific_computing.pdf
      • Customer training or webinars
          – contacttraininggroup@totalviewtech.com
      • Web site
          – www.totalviewtech.com