STAT: A Debugging Tool
                 For Extreme Scale


                        Martin Schulz
           Center for Applied Scientific Computing
           Lawrence Livermore National Laboratory
ASC STAT Team: Greg Lee, Dong Ahn (LLNL), Dane Gardner (LANL)
        Developed at LLNL, University of Wisconsin &
                  University of New Mexico
         Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551

            This work performed under the auspices of the U.S. Department of Energy by
            Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344

                                                                                         LLNL-PRES-426152
STAT: Debugging Support at Scale
  The debugging challenge at scale
    • Traditional debuggers break down at scale
    • Data and control for too many tasks
    • Sequential paradigm
  How can STAT help?
    • Identify equivalence classes
    • Pre-analysis for subset debugging
  Typical use case
    • Application hang (livelock or deadlock)
    • Answer the question: What is my code doing now?
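
A minimal sketch of the equivalence-class idea above, assuming each task's stack trace is already available as a tuple of frame names. The traces, frame names, and the helper `equivalence_classes` are hypothetical and only illustrate the grouping; this is not STAT's implementation.

```python
from collections import defaultdict


def equivalence_classes(traces):
    """Group task ranks by identical call path.

    traces: dict mapping task rank -> tuple of frame names, outermost first.
    Returns a dict mapping each distinct call path -> sorted list of ranks.
    """
    classes = defaultdict(list)
    for rank, frames in traces.items():
        classes[tuple(frames)].append(rank)
    return {path: sorted(ranks) for path, ranks in classes.items()}


# Hypothetical traces from a hung four-task MPI job.
traces = {
    0: ("main", "solve", "MPI_Barrier"),
    1: ("main", "solve", "MPI_Barrier"),
    2: ("main", "solve", "exchange", "MPI_Recv"),
    3: ("main", "solve", "MPI_Barrier"),
}

classes = equivalence_classes(traces)

# One representative per class is a candidate for subset debugging.
representatives = sorted(ranks[0] for ranks in classes.values())

for path, ranks in classes.items():
    print(" > ".join(path), "tasks:", ranks)
print("attach a debugger to tasks:", representatives)
```

Here three tasks wait in the barrier and one is still receiving, so a full-featured debugger only needs to look at one task from each group.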


Stacktraces: The Basis for STAT




Gathering Stack Traces
  STAT gathers stack traces from
    • Multiple processes
    • Multiple samples per process




  [Figure: 2D trace/space call graph prefix tree and 3D trace/space/time call graph prefix tree, built from the stack traces of MPI processes]
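
The merge of stack traces into a call graph prefix tree can be sketched as follows. This is an illustration under assumptions, not STAT's data structure: the nested-dict node layout, the helper names `merge_trace` and `render`, and the sample traces are all hypothetical. Merging traces from several tasks corresponds to the trace/space view; merging several samples from the same task adds the time dimension.

```python
def merge_trace(tree, frames, rank):
    """Insert one stack trace into a prefix tree.

    tree:   nested dict {frame: {"tasks": set, "children": {...}}}
    frames: frame names, outermost first
    rank:   task that produced this trace
    """
    level = tree
    for frame in frames:
        node = level.setdefault(frame, {"tasks": set(), "children": {}})
        node["tasks"].add(rank)
        level = node["children"]
    return tree


def render(tree, depth=0):
    """Return the tree as indented text lines, one node per line."""
    lines = []
    for frame in sorted(tree):
        node = tree[frame]
        tasks = ",".join(str(t) for t in sorted(node["tasks"]))
        lines.append("  " * depth + f"{frame} [{tasks}]")
        lines.extend(render(node["children"], depth + 1))
    return lines


# Hypothetical samples: (task rank, stack trace). Task 2 is sampled twice.
samples = [
    (0, ("main", "solve", "MPI_Barrier")),
    (1, ("main", "solve", "MPI_Barrier")),
    (2, ("main", "solve", "exchange", "MPI_Recv")),
    (2, ("main", "solve", "exchange", "MPI_Waitall")),
]

tree = {}
for rank, frames in samples:
    merge_trace(tree, frames, rank)

print("\n".join(render(tree)))
```

Shared prefixes such as `main > solve` are stored once and labeled with every task that passed through them, which is what keeps the merged tree small even for many tasks.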


Interpreting Stacktrace Trees




  [Figure: stack trace tree with branches labeled Task 0, Task 1, and Task 2; label: Your Favorite Debugger]




STAT GUI




Availability
Platform              Ver.         Usage                       Documentation                           POC
LLNL/TLCC OCF         0.9.4        STATGUI, STAT               https://computing.llnl.gov/code/STAT/   Greg Lee, lee218@llnl.gov
LLNL/TLCC SCF         0.9.4        STATGUI, STAT               https://computing.llnl.gov/code/STAT/   Greg Lee, lee218@llnl.gov
LLNL/uBGL             0.9.0 beta   STAT                        https://computing.llnl.gov/code/STAT/   Greg Lee, lee218@llnl.gov
LLNL/Dawn             0.9.4 beta   STATGUI, STAT               https://computing.llnl.gov/code/STAT/   Greg Lee, lee218@llnl.gov
SNL/Glory             0.9.2        see below                   https://computing.llnl.gov/code/STAT/   Mahesh Rajan, mrajan@sandia.gov
LANL/Yellow Turing    0.9.1b       Mod: hpc-tools, Mod: stat   man stat                                consult@lanl.gov
LANL/Turquoise Lobo   0.9.2        Mod: hpc-tools, Mod: stat   man stat                                consult@lanl.gov



Usage for SNL/Glory:
  module switch mpi mpi/mvapich-1.1_intel-11.1-f064-c064
  module load /home/jgalaro/privatemodules/openss-mvapich

Note: Red Storm has a poor man's STAT-like utility called fast_where.
Try "man fast_where" for usage instructions.

Usage Instructions
  Option 1: Graphical User Interface
    • Launch GUI: STATGUI
    • Attach, create stacktraces & views through GUI
  Option 2: Command line
    • STAT <MPI launcher pid>
       − -t: number of traces
       − -T: time between traces
    • Reports output file to stdout
    • STATview <output file>
  Additional information
    • man STAT / STAT -h
    • acroread /usr/local/tools/stat/doc/*.pdf
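
As a small illustration of Option 2 above, the sketch below assembles a STAT command line from the two options named on the slide. Only `STAT`, `-t`, `-T`, and the launcher pid argument come from the slide; the helper `stat_command`, the option placement before the pid, and the example values are assumptions, and the slide does not state the unit of `-T`. Check `man STAT` or `STAT -h` on your platform before relying on it.

```python
def stat_command(launcher_pid, num_traces=None, trace_interval=None):
    """Build an argv list for the STAT command line.

    launcher_pid:   pid of the MPI launcher process to attach to
    num_traces:     value for -t (number of traces), omitted if None
    trace_interval: value for -T (time between traces), omitted if None;
                    the unit is not given on the slide
    """
    argv = ["STAT"]
    if num_traces is not None:
        argv += ["-t", str(num_traces)]
    if trace_interval is not None:
        argv += ["-T", str(trace_interval)]
    # Assumption: options are placed before the launcher pid.
    argv.append(str(launcher_pid))
    return argv


# Hypothetical launcher pid and option values.
print(" ".join(stat_command(12345, num_traces=10, trace_interval=100)))
```

The resulting list can be handed to `subprocess.run`; the output file that STAT reports on stdout is then the argument to `STATview`.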
Advanced Topics
  Scalable Implementation
    • Tree-based overlay networks
       − Data aggregation on the fly
       − Tree depth configurable
    • Parameters to STAT
    • Useful for 10,000+ tasks
    [Figure: overlay tree with FE at the root, two levels of CP nodes, and BE nodes at the leaves]


  Temporal Analysis
    • Finer grain analysis of process location
    • Disambiguation of iteration instances
    • Employs static analysis to determine loop variables
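
The on-the-fly aggregation listed under Scalable Implementation above can be sketched as a level-by-level reduction of per-task prefix trees. This is a single-process simulation under assumptions, not STAT's overlay network code: the helper names, the `fanout` parameter (which stands in for the configurable tree depth), and the sample traces are hypothetical.

```python
from functools import reduce


def trace_to_tree(rank, frames):
    """Build a one-path prefix tree for a single task's stack trace."""
    tree = {}
    level = tree
    for frame in frames:
        node = level.setdefault(frame, {"tasks": set(), "children": {}})
        node["tasks"].add(rank)
        level = node["children"]
    return tree


def merge_trees(a, b):
    """Merge prefix tree b into a and return a; b is left unchanged."""
    for frame, node in b.items():
        target = a.setdefault(frame, {"tasks": set(), "children": {}})
        target["tasks"] |= node["tasks"]
        merge_trees(target["children"], node["children"])
    return a


def aggregate(trees, fanout):
    """Reduce leaf trees toward the root, merging `fanout` trees per node.

    Returns (merged_tree, number_of_merge_levels).
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not trees:
        raise ValueError("need at least one tree")
    levels = 0
    while len(trees) > 1:
        # Each group of `fanout` trees is merged by one intermediate node.
        trees = [
            reduce(merge_trees, trees[i:i + fanout], {})
            for i in range(0, len(trees), fanout)
        ]
        levels += 1
    return trees[0], levels


# Hypothetical job: eight tasks, task 5 is stuck in a receive.
leaves = [
    trace_to_tree(r, ("main", "MPI_Recv") if r == 5 else ("main", "MPI_Barrier"))
    for r in range(8)
]

merged, depth = aggregate(leaves, fanout=2)
print("merge levels:", depth)
print("tasks in MPI_Recv:", sorted(merged["main"]["children"]["MPI_Recv"]["tasks"]))
```

A smaller fan-out gives a deeper tree with less work per node, a larger fan-out gives a shallower tree; in both cases the root only ever sees already-merged trees rather than one trace per task.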

Reference & Demo Session
  Usage documentation
    • https://computing.llnl.gov/code/STAT/
  Man page
    • man STAT or man STATGUI
    • STAT -h
  Background information
    • http://www.paradyn.org/STAT/STAT.html

  Demo Session / Track 3



